gneiss.regression.mixedlm

gneiss.regression.mixedlm(formula, table, metadata, tree, groups, **kwargs)[source]

Linear Mixed Effects Models applied to balances.

A linear mixed effects model is performed on nonzero relative abundance data given a list of covariates, or explanatory variables such as ph, treatment, etc to test for specific effects. The relative abundance data is transformed into balances using the ILR transformation, using a tree to specify the groupings of the features. The linear mixed effects model is applied to each balance separately. Only positive data will be accepted, so if there are zeros present, consider using a zero imputation method such as skbio.stats.composition.multiplicative_replacement or add a pseudocount.

Parameters:
  • formula (str) – Formula representing the statistical equation to be evaluated. These strings are similar to how equations are handled in R. Note that the dependent variable in this string should not be specified, since this method will be run on each of the individual balances. See patsy for more details.
  • table (pd.DataFrame) – Contingency table where samples correspond to rows and features correspond to columns.
  • metadata (pd.DataFrame) – Metadata table that contains information about the samples contained in the table object. Samples correspond to rows and covariates correspond to columns.
  • tree (skbio.TreeNode) – Tree object where the leaves correspond to the columns contained in the table.
  • groups (str) – Column names in metadata that specifies the groups. These groups are often associated with individuals repeatedly sampled, typically longitudinally.
  • **kwargs (dict) – Other arguments accepted into statsmodels.regression.linear_model.MixedLM
Returns:

Container object that holds information about the overall fit.

Return type:

RegressionResults

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from skbio.stats.composition import ilr_inv
>>> from skbio import TreeNode
>>> from gneiss.regression import mixedlm

Here, we will define a table of proportions with 3 features a, b, and c across 12 samples.

>>> table = pd.DataFrame({
...         'u1':  [0.804248, 0.195526, 0.000226],
...         'u2':  [0.804369, 0.195556, 0.000075],
...         'u3':  [0.825711, 0.174271, 0.000019],
...         'x1':  [0.751606, 0.158631, 0.089763],
...         'x2':  [0.777794, 0.189095, 0.033111],
...         'x3':  [0.817855, 0.172613, 0.009532],
...         'y1':  [0.780774, 0.189819, 0.029406],
...         'y2':  [0.797332, 0.193845, 0.008824],
...         'y3':  [0.802058, 0.194994, 0.002948],
...         'z1':  [0.825041, 0.174129, 0.000830],
...         'z2':  [0.804248, 0.195526, 0.000226],
...         'z3':  [0.825667, 0.174261, 0.000072]}
...         index=['a', 'b', 'c']).T

Now we are going to define some of the external variables to test for in the model. Here we will be testing a hypothetical longitudinal study across 3 time points, with 4 patients x, y, z and u, where x and y were given treatment 1 and z and u were given treatment 2.

>>> metadata = pd.DataFrame({
...         'patient': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
...         'treatment': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
...         'time': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
...     }, index=['x1', 'x2', 'x3', 'y1', 'y2', 'y3',
...               'z1', 'z2', 'z3', 'u1', 'u2', 'u3'])

Finally, we need to define a bifurcating tree used to convert the proportions to balances. If the internal nodes aren’t labels, a default labeling will be applied (i.e. y1, y2, ...)

>>> tree = TreeNode.read(['(c, (b,a)Y2)Y1;'])
>>> print(tree.ascii_art())
          /-c
-Y1------|
         |          /-b
          \Y2------|
                    \-a

Now we can run the linear mixed effects model on the proportions. Underneath the hood, the proportions will be transformed into balances, so that the linear mixed effects models can be run directly on balances. Since each patient was sampled repeatedly, we’ll specify them separately in the groups. In the linear mixed effects model time and treatment will be simultaneously tested for with respect to the balances.

>>> res = mixedlm('time + treatment', table, metadata, tree,
...               groups='patient')

See also

statsmodels.regression.linear_model.MixedLM(), skbio.stats.composition.multiplicative_replacement(), ols()