This study involved 18 patients with cystic fibrosis (CF). The hypothesis was that two main microbial communities are at play in the CF lung: one that thrives at low pH and one that thrives at high pH. To test this, the sputum samples were divided among 8 tubes, and each tube was perturbed to a different pH. Here we will calculate balances and use linear mixed effects models to test how these balances change with respect to pH.

First, we'll import the datasets we want to process into QIIME 2.

In [1]:
!qiime tools import \
    --input-path otu_table.biom \
    --output-path cfstudy.biom.qza \
    --type 'FeatureTable[Frequency]'

!qiime tools import \
    --input-path cfstudy_taxonomy.txt \
    --output-path cfstudy_taxonomy.qza \
    --type 'FeatureData[Taxonomy]'

As in the 88soils example, we'll filter out low-abundance OTUs, here those with a total frequency below 500 across all samples. This not only removes potential confounders but can also reduce the problem of zeros in the table.

In [2]:
!qiime feature-table filter-features \
    --i-table cfstudy.biom.qza \
    --o-filtered-table cfstudy_common_filt500.biom.qza \
    --p-min-frequency 500
Saved FeatureTable[Frequency] to: cfstudy_common_filt500.biom.qza
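To make the filtering rule concrete, here is a minimal pure-Python sketch of what `--p-min-frequency 500` does, assuming (as we understand the option) that an OTU is kept when its count summed over all samples is at least 500. The table and OTU names are hypothetical toy values.

```python
# Hypothetical toy OTU table: OTU id -> counts in each sample.
table = {
    "otu_a": [300, 250, 100],  # total 650
    "otu_b": [2, 0, 5],        # total 7, and contains a zero
    "otu_c": [120, 200, 190],  # total 510
}


def filter_features(table, min_frequency):
    """Keep OTUs whose total count across all samples is at least min_frequency."""
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts) >= min_frequency
    }


filtered = filter_features(table, 500)
zeros_before = sum(c == 0 for counts in table.values() for c in counts)
zeros_after = sum(c == 0 for counts in filtered.values() for c in counts)
print(sorted(filtered), zeros_before, zeros_after)
```

In this toy table the rare OTU is also the only one with a zero, so dropping it removes the zero as well, which is the side benefit mentioned above.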

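As before, we'll build the tree using pH: gradient clustering groups OTUs according to where they are found along the pH gradient. Note that the OTU table also needs to be ordered to match the tree before the balances are calculated.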

In [3]:
!qiime gneiss gradient-clustering \
    --i-table cfstudy_common_filt500.biom.qza \
    --m-gradient-file cfstudy_modified_metadata.txt \
    --m-gradient-category ph \
    --o-clustering ph_tree.nwk.qza \
    --p-weighted
Saved Hierarchy to: ph_tree.nwk.qza
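The idea behind gradient clustering can be sketched as follows, assuming it starts from each OTU's mean niche: the abundance-weighted average pH of the samples the OTU is found in. OTUs with similar mean niches then end up close together in the tree. The data below are made up, and this sketch stops at ordering OTUs by their mean pH; the actual plugin goes on to run hierarchical clustering on these values and, we believe, normalizes each sample to proportions first, which is skipped here for brevity.

```python
# Hypothetical toy data: the pH of each sample and OTU counts per sample.
ph = [5.0, 6.0, 7.0, 8.0]
counts = {
    "acid_lover": [40, 20, 5, 0],
    "neutral": [10, 30, 30, 10],
    "base_lover": [0, 5, 20, 50],
}


def mean_niche(counts_row, gradient):
    """Abundance-weighted mean of the gradient for a single OTU."""
    total = sum(counts_row)
    return sum(c * g for c, g in zip(counts_row, gradient)) / total


niches = {otu: mean_niche(row, ph) for otu, row in counts.items()}
# OTUs sorted from lowest to highest mean pH; neighbours in this order
# are the ones a clustering step on these values would group first.
order = sorted(niches, key=niches.get)
print(order)
```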

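Before computing balances and running the linear mixed effects models, we'll replace zeros with a pseudocount. Balances are built from log ratios, and the logarithm of zero is undefined; the pseudocount serves as a rough approximation of the small, uncertain probability of observing an OTU that was not detected.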

In [4]:
!qiime composition add-pseudocount \
    --i-table cfstudy_common_filt500.biom.qza \
    --p-pseudocount 1 \
    --o-composition-table cf_composition.qza
Saved FeatureTable[Composition] to: cf_composition.qza
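A minimal sketch of the pseudocount step for a single sample, assuming the operation is simply adding the same constant to every count. We also rescale to proportions (closure) to show that the result is a valid composition whose logarithms are all defined; the counts are hypothetical.

```python
import math

counts = [0, 3, 7, 0]  # hypothetical counts for one sample
pseudocount = 1

# The log of a raw zero count is undefined.
try:
    math.log(counts[0])
    log_of_zero_failed = False
except ValueError:
    log_of_zero_failed = True

# Add the pseudocount to every entry so that no component is zero.
shifted = [c + pseudocount for c in counts]
# Closure: rescale to proportions that sum to 1.
total = sum(shifted)
proportions = [c / total for c in shifted]
logs = [math.log(p) for p in proportions]  # now every log is finite
print([round(p, 3) for p in proportions])
```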
In [5]:
!qiime gneiss ilr-transform \
    --i-table cf_composition.qza \
    --i-tree ph_tree.nwk.qza \
    --o-balances cf_balances.qza
Saved FeatureTable[Balance] to: cf_balances.qza
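Each balance compares two groups of OTUs split at an internal node of the tree. A common form of the ILR balance is $b = \sqrt{\tfrac{rs}{r+s}}\,\ln\tfrac{g(x_r)}{g(x_s)}$, where $r$ and $s$ are the numbers of OTUs on each side of the split and $g(\cdot)$ is the geometric mean. The sketch below computes one balance for one sample; which side gneiss treats as the numerator is an assumption here, so the sign may be flipped relative to the plugin's output.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator, denominator):
    """ILR balance between two groups of strictly positive parts of one sample."""
    r, s = len(numerator), len(denominator)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(numerator) / geometric_mean(denominator))


# Hypothetical proportions for one sample: two OTUs on one side of a split,
# one OTU on the other side.
b = balance([0.4, 0.4], [0.1])
# The same parts as raw counts (every value multiplied by 10).
b_counts = balance([4, 4], [1])
print(round(b, 4), round(b_counts, 4))
```

Because only ratios enter the formula, multiplying every part by the same constant leaves the balance unchanged, which is why it does not matter whether counts or proportions are used; swapping the two groups only flips the sign.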

Now we can run the linear mixed effects models. pH is the only covariate being tested, and patients are accounted for by passing host_subject_id to --p-groups. This is necessary because the differences in microbial composition between patients are much larger than the effects of pH, so we need to correct for these between-patient differences by allowing each patient its own baseline. This is why a linear mixed effects model is used rather than an ordinary linear regression.

In [6]:
!qiime gneiss lme-regression \
    --p-formula "ph" \
    --i-table cf_balances.qza \
    --i-tree ph_tree.nwk.qza \
    --m-metadata-file cfstudy_modified_metadata.txt \
    --p-groups host_subject_id \
    --o-visualization cf_linear_mixed_effects_model
Saved Visualization to: cf_linear_mixed_effects_model.qzv
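Why grouping by patient matters can be seen with a deliberately simple stand-in. The sketch below does not fit a mixed model; it compares a pooled least-squares slope with a within-patient slope, computed after subtracting each patient's own mean pH and mean balance. That within-patient estimator is a simpler relative of the random-intercept model that lme-regression fits, not the same estimator. The patients, pH values and balance values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of (patient, pH, balance). Each patient has a very
# different baseline, but within each patient the balance rises by 0.5 per pH unit.
rows = [
    ("p1", 5.0, 10.0), ("p1", 6.0, 10.5), ("p1", 7.0, 11.0),
    ("p2", 6.0, -4.0), ("p2", 7.0, -3.5), ("p2", 8.0, -3.0),
]


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_patient_slope(rows):
    """Slope after removing each patient's mean pH and mean balance."""
    by_patient = defaultdict(list)
    for patient, x, y in rows:
        by_patient[patient].append((x, y))
    xs, ys = [], []
    for obs in by_patient.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        xs.extend(x - mx for x, _ in obs)
        ys.extend(y - my for _, y in obs)
    return ols_slope(xs, ys)


pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
within = within_patient_slope(rows)
print(round(pooled, 3), round(within, 3))
```

With these large per-patient offsets, the pooled slope even has the wrong sign, while removing each patient's baseline recovers the slope of 0.5. To fit an actual random-intercept model in Python, statsmodels' `statsmodels.formula.api.mixedlm` with `groups=` set to the patient column is one option.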

These summary results can be viewed with the QIIME 2 visualization framework; check out view.qiime2.org.

Let's further summarize the results of the linear mixed effects model by plotting how one of the top balances, y2, changes with respect to pH.

In [7]:
!qiime gneiss balance-taxonomy \
    --i-balances cf_balances.qza \
    --i-tree ph_tree.nwk.qza \
    --i-taxonomy cfstudy_taxonomy.qza \
    --p-taxa-level 4 \
    --p-balance-name 'y2' \
    --m-metadata-file 'cfstudy_modified_metadata.txt' \
    --m-metadata-category 'ph' \
    --o-visualization y2_taxa_summary.qzv
Saved Visualization to: y2_taxa_summary.qzv
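Here --p-taxa-level 4 summarizes the OTUs on each side of the balance at the fourth taxonomic rank (order, for Greengenes-style strings of the form k__; p__; c__; o__; ...). The sketch below shows that kind of truncation and tally in plain Python; the taxonomy strings are placeholders rather than results from this study, and the exact summary drawn by the visualization may differ.

```python
from collections import Counter


def truncate(taxonomy, level):
    """Keep the first `level` ranks of a semicolon-delimited taxonomy string."""
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return "; ".join(ranks[:level])


# Placeholder taxonomy strings for OTUs on each side of a balance.
numerator = [
    "k__Bacteria; p__PhylumA; c__ClassA; o__OrderA; f__FamilyA",
    "k__Bacteria; p__PhylumA; c__ClassA; o__OrderA; f__FamilyB",
    "k__Bacteria; p__PhylumB; c__ClassB; o__OrderB; f__FamilyC",
]
denominator = [
    "k__Bacteria; p__PhylumC; c__ClassC; o__OrderC; f__FamilyD",
    "k__Bacteria; p__PhylumC; c__ClassC; o__OrderC; f__FamilyE",
]

# Count how many OTUs on each side fall into each order.
num_summary = Counter(truncate(t, 4) for t in numerator)
den_summary = Counter(truncate(t, 4) for t in denominator)

for label, summary in (("numerator", num_summary), ("denominator", den_summary)):
    print(label)
    for taxon, n in summary.most_common():
        print(f"  {n}  {taxon}")
```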

As in the 88soils example, there is a clear transition from low-pH organisms to high-pH organisms as pH increases. However, because every patient harbors different microbes, it is difficult to test the abundances of individual microbes across patients. What the patients do share are microbes that behave the same way with respect to pH. Balances are a powerful tool for addressing this, because they allow entire subcommunities to be tested rather than just individual OTUs.