Here we will be continuing the cystic fibrosis study, and examining the results in depth using the Python API.
import os import qiime2 import numpy as np import pandas as pd from skbio import TreeNode %matplotlib inline # Obtain raw OTU counts table_art = qiime2.Artifact.load('cfstudy_common_filt500.biom.qza') table = table_art.view(pd.DataFrame) # Obtain the balances balance_art = qiime2.Artifact.load('cf_balances.qza') balances = balance_art.view(pd.DataFrame) # Obtain the tree tree_art = qiime2.Artifact.load('ph_tree.nwk.qza') tree = tree_art.view(TreeNode) # Obtain the taxa taxa_art = qiime2.Artifact.load('cfstudy_taxonomy.qza') taxa = taxa_art.view(pd.DataFrame) # Obtain the metadata metadata = pd.read_table('cfstudy_modified_metadata.txt', index_col=0) # Unpack the results from the regression viz = qiime2.Visualization.load('cf_linear_mixed_effects_model.qzv') viz.export_data('regression_summary_dir') pvals = pd.read_csv('regression_summary_dir/pvalues.csv', index_col=0) resid = pd.read_csv('regression_summary_dir/residuals.csv', index_col=0)
We will visualize the entire tree, and resize the internal nodes reflect the pvalues with respect to ph. Specifically, we want to test to determine if pH is a driving factor. We also want to visualize the mean ph values for each OTU and load the tree.
from gneiss.sort import mean_niche_estimator mean_ph = mean_niche_estimator(table, metadata.ph)
Then we'll want to populate the tree with attributes that we would like to visualize.
for n in tree.postorder(): n.color = '#FF00FF' # color all nodes magenta if n.is_tip(): # display mean ph in hover tool n.mean_ph = mean_ph.loc[n.name] else: # resize node by pvalue pval = pvals.loc[n.name, 'ph'] n.ph_pvalue = -np.log(pval) / 10 # scale down for visual purposes
Now we can specify the machinery to visualize the plots.
from gneiss.plot import radialplot from bokeh.io import show, output_notebook output_notebook() p = radialplot(tree, node_size='ph_pvalue', node_color='color', edge_color='edge_color', hover_var='mean_ph') show(p)
Of course, we'll want to validate these findings by investigating the raw balances. We'll look at the most significant balance y1.
import seaborn as sns data = pd.merge(balances, metadata, left_index=True, right_index=True) grid = sns.factorplot(x="ph", y="y2", hue='host_subject_id', data=data, palette="viridis", legend=False)
Similar to the 88soils example, there is a very obvious transition from low pH organisms to high pH organism as the pH increases. However, given that every patient has different microbes, so it is difficult to test for individual microbes abundances across patients. However, every patient has microbes that behave the same with respect to pH. Balances is a very powerful tool for addressing this, as it can allow for entire subcommunities to be tested, rather than just individual OTUs.