Calour microbiome databases interface tutorial

Setup

In [1]:
import calour as ca
In [2]:
ca.set_log_level(11)
In [3]:
%matplotlib notebook

Load the data

We will use the chronic fatigue syndrome data from:

Giloteaux, L., Goodrich, J.K., Walters, W.A., Levine, S.M., Ley, R.E. and Hanson, M.R., 2016. Reduced diversity and altered composition of the gut microbiome in individuals with myalgic encephalomyelitis/chronic fatigue syndrome. Microbiome, 4(1), p.30.

In [4]:
cfs=ca.read_amplicon('data/chronic-fatigue-syndrome.biom',
                     'data/chronic-fatigue-syndrome.sample.txt',
                     normalize=10000,min_reads=1000)
2018-07-26 13:09:44 INFO loaded 87 samples, 2129 features
2018-07-26 13:09:44 WARNING These have metadata but do not have data - dropped: {'ERR1331814'}
2018-07-26 13:09:44 INFO After filtering, 87 remaining
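
As a rough picture of what the min_reads and normalize arguments in the call above ask for, the following plain-Python sketch drops shallow samples and rescales the rest to a constant total. It is a toy stand-in under stated assumptions (a sample is kept when it has at least min_reads reads, and each kept sample is scaled to sum to 10000); it is not calour's implementation, and the function name and read counts are hypothetical.

```python
# Toy illustration only: a plain-Python stand-in for per-sample filtering and
# normalization. This is NOT calour's code; the ">= min_reads" rule is an assumption.

def filter_and_normalize(samples, min_reads=1000, total=10000):
    """Drop samples with fewer than min_reads reads, then rescale each
    remaining sample so that its counts sum to `total`."""
    result = {}
    for sample_id, counts in samples.items():
        depth = sum(counts)
        if depth < min_reads:
            continue  # too few reads: discard the sample
        result[sample_id] = [c * total / depth for c in counts]
    return result


# Hypothetical read counts for three samples and three bacteria
raw = {
    'S1': [500, 1500, 0],    # 2000 reads
    'S2': [10, 20, 30],      # 60 reads, below the threshold
    'S3': [4000, 0, 1000],   # 5000 reads
}
normalized = filter_and_normalize(raw)
print(normalized)
```

After this step every remaining sample has the same total, so abundances can be compared across samples regardless of sequencing depth.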

Preprocess

Remove non-interesting (low-abundance) bacteria, cluster the bacteria, and sort the samples by disease status.

In [5]:
cfs=cfs.filter_abundance(10)
2018-07-26 13:09:45 INFO After filtering, 1100 remaining
In [6]:
cfs=cfs.cluster_features()
2018-07-26 13:09:45 INFO After filtering, 1100 remaining
In [7]:
cfs=cfs.sort_samples('Subject')
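
The abundance filter at the start of this section can be pictured with the toy sketch below. It assumes that filter_abundance(10) keeps a bacterium when its abundance summed over all samples is at least 10; that reading, the helper name, and the numbers are assumptions made for illustration, not taken from the calour source.

```python
# Toy illustration only. Summing across samples and the ">=" comparison are
# assumptions about what filter_abundance(10) does.

def filter_by_total_abundance(table, min_abundance=10):
    """Keep features whose abundance summed over all samples is >= min_abundance."""
    return {feature: values for feature, values in table.items()
            if sum(values) >= min_abundance}


# Hypothetical normalized abundances: feature -> one value per sample
table = {
    'seq_A': [0, 3, 2],       # total 5, removed
    'seq_B': [120, 80, 40],   # total 240, kept
    'seq_C': [0, 0, 10],      # total 10, kept
}
kept = filter_by_total_abundance(table)
print(sorted(kept))
```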

Viewing database annotations

In the interactive heatmap, clicking on a bacterium shows a list of all database results for it.

We can choose which databases to use with the databases=['dbbact',...] parameter. The available databases depend on which database modules are installed.

Currently supported microbiome database interfaces include:

  • dbBact - a community database for manual annotations about bacteria (interface installation instructions at dbbact-calour).
  • SpongeEMP - an automatic database for sea sponge samples (interface installation instructions at spongeworld-calour).
  • phenoDB - phenotypic information about selected bacteria (interface installation instructions at pheno-calour).

By default, calour uses the dbBact database for microbiome data.

In [8]:
cfs.plot(sample_field='Subject',gui='jupyter')
Out[8]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a10abebe0>

dbBact enrichment of selected bacteria

By selecting a set of bacteria (using shift+click or ctrl+click) and choosing the “Enrichment” button, we get a list of terms that are significantly enriched in the selected bacteria compared to the rest of the bacteria in the plot.

Adding dbBact annotations

(Only possible using the gui='qt5' GUI)

To add a new annotation to the selected set of bacteria, choose the “Annotate” button.

Detailed instructions are available at the dbBact.org website.

Differential abundance

We now find the bacteria that differ significantly between samples with ‘Control’ (healthy) and ‘Patient’ (sick) in the ‘Subject’ field.

In [9]:
dd=cfs.diff_abundance(field='Subject',val1='Control',val2='Patient', random_seed=2018)
2018-07-26 13:09:57 INFO 87 samples with both values
2018-07-26 13:09:57 INFO After filtering, 1100 remaining
2018-07-26 13:09:57 INFO 39 samples with value 1 (['Control'])
2018-07-26 13:09:58 INFO method meandiff. number of higher in ['Control'] : 38. number of higher in ['Patient'] : 16. total 54
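
The log above reports the method as meandiff. As a hedged illustration of the general idea only, here is a label-permutation test on the difference of group means for a single bacterium. It does not reproduce calour's procedure and does not attempt the multiple-testing correction that a real analysis over 1100 bacteria needs; the function names, the default of 1000 permutations, and the toy abundances are all assumptions.

```python
import random

# Toy illustration only: a permutation test on the mean difference for ONE feature.
# Not calour's implementation; names, n_perm and data are hypothetical.

def mean_diff(values, labels):
    """Mean of the values labelled True minus mean of the values labelled False."""
    group1 = [v for v, is_first in zip(values, labels) if is_first]
    group2 = [v for v, is_first in zip(values, labels) if not is_first]
    return sum(group1) / len(group1) - sum(group2) / len(group2)


def permutation_pvalue(values, labels, n_perm=1000, seed=2018):
    """Two-sided p-value: how often shuffled labels give a difference at least
    as large (in absolute value) as the observed one."""
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = abs(mean_diff(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_diff(values, shuffled)) >= observed:
            hits += 1
    # Adding 1 counts the observed labelling itself, so p is never exactly 0
    return (hits + 1) / (n_perm + 1)


# Hypothetical abundances of one bacterium in 5 'Control' and 5 'Patient' samples
values = [9, 8, 10, 9, 11, 1, 2, 0, 1, 2]
labels = [True] * 5 + [False] * 5
pval = permutation_pvalue(values, labels)
print(pval)
```

With 1000 permutations the smallest value this estimator can return is 1/1001 ≈ 0.000999. That matches the smallest pvals in the enrichment tables of this tutorial, although the tutorial itself does not state how many permutations calour uses.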

Plot the significant bacteria

When clicking on a bacterium, we’ll get both dbBact and SpongeEMP information, since these are the two databases passed in the databases parameter.

In [10]:
dd.plot(sample_field='Subject', gui='jupyter', databases=['dbbact','sponge'])
Out[10]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a16f33f98>

dbBact term enrichment (diff_abundance_enrichment)

We can ask what is special about the bacteria that are significantly higher in the Control group compared to the Patient group, and vice versa.

  • Note: since we need to get the per-feature annotations from dbBact, a live internet connection is required to run this command.

Default parameters

In [11]:
ax, enriched=dd.plot_diff_abundance_enrichment()
2018-07-26 13:10:03 INFO Getting dbBact annotations for 54 sequences, please wait...
2018-07-26 13:10:08 INFO Got 2328 annotations
2018-07-26 13:10:08 INFO Added annotation data to experiment. Total 705 annotations, 54 terms
2018-07-26 13:10:08 INFO removed 0 terms

The enriched terms are returned as a calour experiment (terms are features, bacteria are samples), so we can list the enriched terms together with their p-value (pvals) and effect size (odif).

In [12]:
enriched.feature_metadata
Out[12]:
odif pvals term
little physical activity {*single exp 63*} -18.562500 0.000999 little physical activity {*single exp 63*}
LOWER IN physical activity {*single exp 63*} -18.562500 0.000999 LOWER IN physical activity {*single exp 63*}
LOWER IN rural community -18.518092 0.000999 LOWER IN rural community
LOWER IN control -17.807566 0.000999 LOWER IN control
LOWER IN small village -17.452303 0.000999 LOWER IN small village
LOWER IN tunapuco {*single exp 276*} -16.430921 0.000999 LOWER IN tunapuco {*single exp 276*}
LOWER IN peru {*single exp 276*} -16.430921 0.000999 LOWER IN peru {*single exp 276*}
crohn's disease -15.587171 0.000999 crohn's disease
chronic fatigue syndrome {*single exp 12*} -15.187500 0.000999 chronic fatigue syndrome {*single exp 12*}
LOWER IN adult -14.254934 0.001998 LOWER IN adult
mus musculus -13.588816 0.000999 mus musculus
age > 1 year -12.478618 0.004995 age > 1 year
kingdom of denmark {*single exp 273*} -12.434211 0.006993 kingdom of denmark {*single exp 273*}
mouse -11.945724 0.000999 mouse
state of oklahoma -11.501645 0.007992 state of oklahoma
age 1 year -11.368421 0.000999 age 1 year
LOWER IN plant diet {*single exp 74*} -11.368421 0.002997 LOWER IN plant diet {*single exp 74*}
stroke {*single exp 333*} -11.235197 0.001998 stroke {*single exp 333*}
age one year {*single exp 273*} -11.190789 0.003996 age one year {*single exp 273*}
research facility -10.968750 0.005994 research facility
infant -10.657895 0.018981 infant
LOWER IN male -10.347039 0.002997 LOWER IN male
LOWER IN age 30-40 {*single exp 330*} -10.125000 0.000999 LOWER IN age 30-40 {*single exp 330*}
msw {*single exp 344*} -10.036184 0.007992 msw {*single exp 344*}
finland -10.036184 0.024975 finland
heterosexual {*single exp 344*} -10.036184 0.007992 heterosexual {*single exp 344*}
LOWER IN age <1 year {*single exp 240*} -9.858553 0.019980 LOWER IN age <1 year {*single exp 240*}
age -9.858553 0.031968 age
animal product diet -9.769737 0.005994 animal product diet
obsolete_juvenile stage -9.547697 0.038961 obsolete_juvenile stage
... ... ... ...
LOWER IN state of oklahoma 14.210526 0.000999 LOWER IN state of oklahoma
msm {*single exp 344*} 14.388158 0.000999 msm {*single exp 344*}
gay {*single exp 344*} 14.388158 0.000999 gay {*single exp 344*}
homosexual {*single exp 344*} 14.388158 0.000999 homosexual {*single exp 344*}
cron diet {*single exp 293*} 15.009868 0.000999 cron diet {*single exp 293*}
caloric restriction diet {*single exp 293*} 15.009868 0.000999 caloric restriction diet {*single exp 293*}
LOWER IN united states of america 15.631579 0.000999 LOWER IN united states of america
sus scrofa 15.942434 0.000999 sus scrofa
pig 15.942434 0.000999 pig
right colon {*single exp 256*} 16.164474 0.000999 right colon {*single exp 256*}
left colon {*single exp 256*} 16.164474 0.000999 left colon {*single exp 256*}
LOWER IN city 16.342105 0.000999 LOWER IN city
influent {*single exp 53*} 17.230263 0.000999 influent {*single exp 53*}
sewage {*single exp 53*} 17.230263 0.000999 sewage {*single exp 53*}
LOWER IN effluent 17.230263 0.000999 LOWER IN effluent
wastewater treatment plant 17.452303 0.000999 wastewater treatment plant
LOWER IN finland 17.629934 0.000999 LOWER IN finland
tanzania {*single exp 190*} 17.763158 0.000999 tanzania {*single exp 190*}
hadza {*single exp 190*} 17.763158 0.000999 hadza {*single exp 190*}
egypt {*single exp 62*} 18.118421 0.000999 egypt {*single exp 62*}
amerindian {*single exp 75*} 19.184211 0.000999 amerindian {*single exp 75*}
venezuela {*single exp 75*} 19.184211 0.000999 venezuela {*single exp 75*}
south america {*single exp 53*} 20.250000 0.000999 south america {*single exp 53*}
peru {*single exp 276*} 21.315789 0.000999 peru {*single exp 276*}
tunapuco {*single exp 276*} 21.315789 0.000999 tunapuco {*single exp 276*}
rural community 21.937500 0.000999 rural community
hunter gatherer 22.026316 0.000999 hunter gatherer
el salvador {*single exp 53*} 22.026316 0.000999 el salvador {*single exp 53*}
LOWER IN infant 23.269737 0.000999 LOWER IN infant
small village 23.758224 0.000999 small village

135 rows × 3 columns
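
A table of this size is easier to read once it is split by the sign of odif and thresholded on pvals. The sketch below does this in plain Python on a few rows copied from the output above. Negative odif values appear to mark terms associated with the Patient-enriched bacteria (for example, 'chronic fatigue syndrome' has a negative odif), but that reading is an inference from the table rather than something the tutorial states. The helper name and the 0.01 cutoff are hypothetical.

```python
# A few rows copied from the table above, as (term, odif, pvals) tuples
rows = [
    ('chronic fatigue syndrome {*single exp 12*}', -15.187500, 0.000999),
    ('LOWER IN control', -17.807566, 0.000999),
    ('infant', -10.657895, 0.018981),
    ('finland', -10.036184, 0.024975),
    ('small village', 23.758224, 0.000999),
    ('hunter gatherer', 22.026316, 0.000999),
]


def split_by_direction(rows, max_pval=0.01):
    """Return (negative, positive) lists of terms with pvals <= max_pval,
    each ordered from the strongest effect to the weakest."""
    significant = [r for r in rows if r[2] <= max_pval]
    negative = sorted((r for r in significant if r[1] < 0), key=lambda r: r[1])
    positive = sorted((r for r in significant if r[1] > 0), key=lambda r: -r[1])
    return [r[0] for r in negative], [r[0] for r in positive]


negative_terms, positive_terms = split_by_direction(rows)
print(negative_terms)
print(positive_terms)
```

On the real pandas DataFrame the same selection is usually written with boolean indexing on the pvals and odif columns.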

We can plot the enriched terms as a heatmap to see the term scores for each bacterium.

Note that now rows are the bacteria and columns are the terms.

In [16]:
enriched.plot(gui='jupyter', databases=[], feature_field='term',sample_field='group',
              yticklabel_kwargs={'rotation': 0, 'size': 7})
Out[16]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a1c678c50>

Look at the behavior of a single term

We want to see all the annotations in which a given term appears, and which bacteria from either group (CFS or healthy) appear in those annotations. To do this, we use dbbact.show_term_details_diff(). The output of this function is an experiment where each column is a bacterium and each row is an annotation, showing whether each bacterium appears in the annotation. Color indicates the annotation type.

In [38]:
dbbact=ca.database._get_database_class('dbbact')
In [40]:
term_info_exp = dbbact.show_term_details_diff('small village',dd,gui='jupyter')
2018-07-26 13:24:01 INFO found 12 annotations with term
2018-07-26 13:24:01 WARNING Do you forget to normalize your data? It is required before running this function
2018-07-26 13:24:01 INFO After filtering, 12 remaining

Getting enriched annotations instead of terms

Each annotation comes from a single experiment (as opposed to terms, which can come from annotations in multiple experiments).

In [17]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='annotation')
2018-07-26 13:12:53 INFO removed 0 terms
In [18]:
enriched.feature_metadata
Out[18]:
odif pvals term
higher in individuals with low physical activity ( high in little physical activity compared to physical activity in feces homo sapiens united states of america -18.562500 0.000999 higher in individuals with low physical activi...
high in united states of america city state of oklahoma compared to peru small village tunapuco rural community in feces homo sapiens adult -16.430921 0.000999 high in united states of america city state o...
high in children with Crohn's disease compared to healthy adult controls ( high in crohn's disease child obsolete_juvenile stage compared to control adult in feces homo sapiens glasgow -15.187500 0.000999 high in children with Crohn's disease compared...
high in chronic fatigue syndrome compared to control in feces homo sapiens new york county -15.187500 0.000999 high in chronic fatigue syndrome compared to...
high in female compared to male in feces homo sapiens united states of america -15.187500 0.000999 high in female compared to male in feces ho...
Higher in animal product diet compared to plant diet ( high in diet animal product diet compared to plant diet in feces homo sapiens united states of america -11.368421 0.000999 Higher in animal product diet compared to plan...
common feces, homo sapiens, infant, kingdom of norway, oslo, age 1 year, -10.125000 0.000999 common feces, homo sapiens, infant, kingdom o...
high in infant age 1 year compared to adult age 30-40 in feces homo sapiens kingdom of norway oslo -10.125000 0.000999 high in infant age 1 year compared to adult ...
higher in stroke patients compared to healthy controls ( high in stroke compared to control in feces homo sapiens china adult guangzhou city prefecture -10.125000 0.001998 higher in stroke patients compared to healthy ...
lower in infants age<1 year compared to 1-3 years in baby feces ( high in age age > 1 year compared to age <1 year in feces homo sapiens infant finland -9.858553 0.016983 lower in infants age<1 year compared to 1-3 ye...
lower in gay (msm) individuals compared to heterosexual (msw) ( high in heterosexual msw compared to gay homosexual msm in feces homo sapiens united states of america state of colorado denver -9.414474 0.002997 lower in gay (msm) individuals compared to het...
high in age 1 year compared to age 2 months in feces homo sapiens female infant state of california -9.414474 0.001998 high in age 1 year compared to age 2 months ...
higher in lean participants in human feces ( high in low bmi compared to high bmi in feces homo sapiens united states of america adult -8.437500 0.001998 higher in lean participants in human feces ( h...
common feces, homo sapiens, infant, kingdom of denmark, age one year, -8.437500 0.000999 common feces, homo sapiens, infant, kingdom o...
high in age age one month compared to age one week in feces homo sapiens infant kingdom of denmark -7.726974 0.006993 high in age age one month compared to age on...
high in healthy dogs compared to EPI dogs without treatment ( high in control compared to exocrine pancreatic insufficiency in feces united states of america canis lupus familiaris dog -7.726974 0.006993 high in healthy dogs compared to EPI dogs with...
common feces, homo sapiens, china, city, adult, -7.371711 0.061938 common feces, homo sapiens, china, city, adult,
high in age age > 1 year compared to age < 1 year in feces homo sapiens united states of america infant -7.016447 0.019980 high in age age > 1 year compared to age < 1...
negatively correlated with age (30-80 years) ( high in age age 30-50 years compared to age 50-80 years in feces homo sapiens south korea -7.016447 0.017982 negatively correlated with age (30-80 years) (...
common feces, homo sapiens, diarrhea, state of michigan, clostridium difficile intestinal infectious disease, -6.750000 0.005994 common feces, homo sapiens, diarrhea, state o...
high in city compared to small village rural community in feces homo sapiens china adult -6.750000 0.006993 high in city compared to small village rural...
high in old (14-28 days) compared to young (0-3 day) chickens ( high in age old age compared to young age in united states of america caecum gallus gallus chicken -6.750000 0.010989 high in old (14-28 days) compared to young (0-...
common feces, united states of america, canis lupus familiaris, iowa, -6.750000 0.002997 common feces, united states of america, canis...
common in infants age <3 years (common feces, homo sapiens, infant, finland, age < 3 years, -6.750000 0.006993 common in infants age <3 years (common feces,...
positively correlated with bmi ( high in body mass index high bmi compared to low bmi in feces homo sapiens united kingdom -6.305921 0.053946 positively correlated with bmi ( high in body ...
common feces, united states of america, canis lupus familiaris, dog, -6.305921 0.035964 common feces, united states of america, canis...
common feces, homo sapiens, china, crohn's disease, adult, -6.305921 0.044955 common feces, homo sapiens, china, crohn's di...
higher in feces of individuals with kidney stones ( high in nephrolithiasis compared to control in feces homo sapiens china adult nanning city prefecture age 50-60 years -5.328947 0.057942 higher in feces of individuals with kidney sto...
higher in babies from finland compared to estonia ( high in finland compared to estonia in feces homo sapiens infant age < 3 years -5.062500 0.014985 higher in babies from finland compared to esto...
common united states of america, colon, canis lupus familiaris, dog, -5.062500 0.020979 common united states of america, colon, canis...
... ... ... ...
common feces, ethiopia, monkey, chlorocebus djamdjamensis, bale monkey, 9.236842 0.018981 common feces, ethiopia, monkey, chlorocebus d...
common in feces of homosexual males (common feces, homo sapiens, united states of america, state of colorado, denver, gay, homosexual, msm, 9.414474 0.025974 common in feces of homosexual males (common f...
common feces, homo sapiens, brazil, 9.592105 0.042957 common feces, homo sapiens, brazil,
common duodenum, jejunum, ileum, sus scrofa, united kingdom, pig, 9.680921 0.014985 common duodenum, jejunum, ileum, sus scrofa, ...
high in healthy adult controls compared to children with Crohn's disease ( high in control adult compared to crohn's disease child obsolete_juvenile stage in feces homo sapiens glasgow 9.947368 0.005994 high in healthy adult controls compared to chi...
higher in caloric restriction (CRON) diet compared to american diet ( high in diet cron diet caloric restriction diet compared to american diet in feces homo sapiens united states of america adult 9.947368 0.004995 higher in caloric restriction (CRON) diet comp...
common feces, homo sapiens, adult, india, 9.947368 0.007992 common feces, homo sapiens, adult, india,
low in diarrhea compared to recovery period ( high in control compared to diarrhea in feces homo sapiens adult bangladesh 10.036184 0.016983 low in diarrhea compared to recovery period ( ...
common feces, homo sapiens, united states of america, adult, cron diet, caloric restriction diet, 10.125000 0.013986 common feces, homo sapiens, united states of ...
common feces, chlorocebus aethiops, ethiopia, monkey, grivet monkey, 10.657895 0.009990 common feces, chlorocebus aethiops, ethiopia,...
common feces, chlorocebus aethiops, ethiopia, monkey, vervet monkey, 10.657895 0.004995 common feces, chlorocebus aethiops, ethiopia,...
higher in gay (msm) individuals compared to heterosexual (msw) ( high in gay homosexual msm compared to heterosexual msw in feces homo sapiens united states of america state of colorado denver 10.657895 0.007992 higher in gay (msm) individuals compared to he...
high in wet season compared to dry season in feces homo sapiens tanzania hunter gatherer hadza 10.657895 0.003996 high in wet season compared to dry season i...
common feces, homo sapiens, united states of america, child, obsolete_juvenile stage, 11.190789 0.003996 common feces, homo sapiens, united states of ...
high in control compared to chronic fatigue syndrome in feces homo sapiens new york county 11.368421 0.003996 high in control compared to chronic fatigue ...
higher in babies from russia compared to finland ( high in russia compared to finland in feces homo sapiens infant age < 3 years 11.812500 0.004995 higher in babies from russia compared to finla...
common feces, homo sapiens, city, lima, shantytown, 11.812500 0.004995 common feces, homo sapiens, city, lima, shant...
common feces, homo sapiens, tanzania, hunter gatherer, hadza, 12.789474 0.000999 common feces, homo sapiens, tanzania, hunter ...
lower in small intestine compared to colon in pigs ( high in caecum left colon right colon compared to duodenum jejunum ileum in sus scrofa united kingdom pig 12.789474 0.000999 lower in small intestine compared to colon in ...
high in adult age 30-40 compared to infant age 1 year in feces homo sapiens kingdom of norway oslo 13.233553 0.002997 high in adult age 30-40 compared to infant a...
high in peru small village tunapuco rural community compared to united states of america city state of oklahoma in feces homo sapiens adult 14.210526 0.001998 high in peru small village tunapuco rural com...
common caecum, left colon, right colon, sus scrofa, united kingdom, pig, 14.654605 0.000999 common caecum, left colon, right colon, sus s...
common feces, homo sapiens, child, egypt, obsolete_juvenile stage, 15.009868 0.000999 common feces, homo sapiens, child, egypt, obs...
high in male compared to female in feces homo sapiens united states of america 15.631579 0.000999 high in male compared to female in feces ho...
lower in babies from finland compared to estonia ( high in estonia compared to finland in feces homo sapiens infant age < 3 years 16.342105 0.000999 lower in babies from finland compared to eston...
lower in wastewater plant effluent compared to influent and sewer in south america ( high in sewage influent compared to effluent in city wastewater treatment plant south america 17.230263 0.000999 lower in wastewater plant effluent compared to...
common feces, homo sapiens, venezuela, amerindian, hunter gatherer, 19.184211 0.000999 common feces, homo sapiens, venezuela, amerin...
common feces, homo sapiens, adult, peru, small village, tunapuco, rural community, 19.184211 0.000999 common feces, homo sapiens, adult, peru, smal...
common feces, homo sapiens, city, el salvador, small village, 20.605263 0.000999 common feces, homo sapiens, city, el salvador...
high in adult compared to infant age < 1 year in feces homo sapiens india 22.470395 0.000999 high in adult compared to infant age < 1 yea...

79 rows × 3 columns

Getting both enriched terms and annotations

In [19]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='combined')
2018-07-26 13:13:05 INFO removed 0 terms
In [20]:
enriched.feature_metadata
Out[20]:
odif pvals term
higher in individuals with low physical activity ( high in little physical activity compared to physical activity in feces homo sapiens united states of america -18.562500 0.000999 higher in individuals with low physical activi...
LOWER IN physical activity {*single exp 63*} -18.562500 0.000999 LOWER IN physical activity {*single exp 63*}
little physical activity {*single exp 63*} -18.562500 0.000999 little physical activity {*single exp 63*}
LOWER IN rural community -18.518092 0.000999 LOWER IN rural community
LOWER IN control -17.807566 0.000999 LOWER IN control
LOWER IN small village -17.452303 0.000999 LOWER IN small village
LOWER IN peru {*single exp 276*} -16.430921 0.000999 LOWER IN peru {*single exp 276*}
high in united states of america city state of oklahoma compared to peru small village tunapuco rural community in feces homo sapiens adult -16.430921 0.000999 high in united states of america city state o...
LOWER IN tunapuco {*single exp 276*} -16.430921 0.000999 LOWER IN tunapuco {*single exp 276*}
crohn's disease -15.587171 0.000999 crohn's disease
high in children with Crohn's disease compared to healthy adult controls ( high in crohn's disease child obsolete_juvenile stage compared to control adult in feces homo sapiens glasgow -15.187500 0.000999 high in children with Crohn's disease compared...
high in chronic fatigue syndrome compared to control in feces homo sapiens new york county -15.187500 0.000999 high in chronic fatigue syndrome compared to...
high in female compared to male in feces homo sapiens united states of america -15.187500 0.000999 high in female compared to male in feces ho...
chronic fatigue syndrome {*single exp 12*} -15.187500 0.000999 chronic fatigue syndrome {*single exp 12*}
LOWER IN adult -14.254934 0.001998 LOWER IN adult
mus musculus -13.588816 0.000999 mus musculus
age > 1 year -12.478618 0.000999 age > 1 year
kingdom of denmark {*single exp 273*} -12.434211 0.001998 kingdom of denmark {*single exp 273*}
mouse -11.945724 0.000999 mouse
state of oklahoma -11.501645 0.007992 state of oklahoma
LOWER IN plant diet {*single exp 74*} -11.368421 0.001998 LOWER IN plant diet {*single exp 74*}
age 1 year -11.368421 0.000999 age 1 year
Higher in animal product diet compared to plant diet ( high in diet animal product diet compared to plant diet in feces homo sapiens united states of america -11.368421 0.001998 Higher in animal product diet compared to plan...
stroke {*single exp 333*} -11.235197 0.001998 stroke {*single exp 333*}
age one year {*single exp 273*} -11.190789 0.004995 age one year {*single exp 273*}
research facility -10.968750 0.011988 research facility
infant -10.657895 0.014985 infant
LOWER IN male -10.347039 0.001998 LOWER IN male
LOWER IN age 30-40 {*single exp 330*} -10.125000 0.000999 LOWER IN age 30-40 {*single exp 330*}
common feces, homo sapiens, infant, kingdom of norway, oslo, age 1 year, -10.125000 0.000999 common feces, homo sapiens, infant, kingdom o...
... ... ... ...
high in male compared to female in feces homo sapiens united states of america 15.631579 0.000999 high in male compared to female in feces ho...
pig 15.942434 0.001998 pig
sus scrofa 15.942434 0.001998 sus scrofa
left colon {*single exp 256*} 16.164474 0.000999 left colon {*single exp 256*}
right colon {*single exp 256*} 16.164474 0.000999 right colon {*single exp 256*}
lower in babies from finland compared to estonia ( high in estonia compared to finland in feces homo sapiens infant age < 3 years 16.342105 0.000999 lower in babies from finland compared to eston...
LOWER IN city 16.342105 0.000999 LOWER IN city
sewage {*single exp 53*} 17.230263 0.000999 sewage {*single exp 53*}
LOWER IN effluent 17.230263 0.000999 LOWER IN effluent
influent {*single exp 53*} 17.230263 0.000999 influent {*single exp 53*}
lower in wastewater plant effluent compared to influent and sewer in south america ( high in sewage influent compared to effluent in city wastewater treatment plant south america 17.230263 0.000999 lower in wastewater plant effluent compared to...
wastewater treatment plant 17.452303 0.000999 wastewater treatment plant
LOWER IN finland 17.629934 0.000999 LOWER IN finland
hadza {*single exp 190*} 17.763158 0.000999 hadza {*single exp 190*}
tanzania {*single exp 190*} 17.763158 0.000999 tanzania {*single exp 190*}
egypt {*single exp 62*} 18.118421 0.000999 egypt {*single exp 62*}
venezuela {*single exp 75*} 19.184211 0.000999 venezuela {*single exp 75*}
amerindian {*single exp 75*} 19.184211 0.000999 amerindian {*single exp 75*}
common feces, homo sapiens, venezuela, amerindian, hunter gatherer, 19.184211 0.000999 common feces, homo sapiens, venezuela, amerin...
common feces, homo sapiens, adult, peru, small village, tunapuco, rural community, 19.184211 0.000999 common feces, homo sapiens, adult, peru, smal...
south america {*single exp 53*} 20.250000 0.000999 south america {*single exp 53*}
common feces, homo sapiens, city, el salvador, small village, 20.605263 0.000999 common feces, homo sapiens, city, el salvador...
tunapuco {*single exp 276*} 21.315789 0.000999 tunapuco {*single exp 276*}
peru {*single exp 276*} 21.315789 0.000999 peru {*single exp 276*}
rural community 21.937500 0.000999 rural community
el salvador {*single exp 53*} 22.026316 0.000999 el salvador {*single exp 53*}
hunter gatherer 22.026316 0.000999 hunter gatherer
high in adult compared to infant age < 1 year in feces homo sapiens india 22.470395 0.000999 high in adult compared to infant age < 1 yea...
LOWER IN infant 23.269737 0.000999 LOWER IN infant
small village 23.758224 0.000999 small village

215 rows × 3 columns

Ignoring selected experiments already in dbBact

If our experiment is already in dbBact, or if there are other experiments in dbBact we do not want to include in the enrichment analysis, we can specify them using the ignore_exp=[expID,...] parameter.

In our case, the cfs experiment has already been added to dbBact, so let’s ignore its annotations when doing the analysis. By looking at dbBact.org we know its experimentID is 12. Alternatively, we can use ignore_exp=True to automatically detect the current experimentID if it exists in dbBact (using the md5 hash of the data and mapping files).

In [21]:
ax, enriched=dd.plot_diff_abundance_enrichment(term_type='combined', ignore_exp=[12])
2018-07-26 13:13:12 INFO removed 0 terms
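
Conceptually, ignoring an experiment means discarding its annotations before the enrichment is computed, so that the experiment cannot "rediscover" itself. The sketch below shows that filtering step on a hypothetical list of annotation records. The 'expid' key, the helper name, and the abbreviated annotation texts are assumptions for illustration and do not reflect dbBact's actual data structures.

```python
# Toy illustration only: the record layout ('expid', 'text') is hypothetical.

def drop_experiments(annotations, ignore_exp):
    """Return the annotations whose experiment ID is not in ignore_exp."""
    ignored = set(ignore_exp)
    return [a for a in annotations if a['expid'] not in ignored]


# Abbreviated annotation texts modelled on the tables in this tutorial
annotations = [
    {'expid': 12, 'text': 'high in chronic fatigue syndrome compared to control'},
    {'expid': 276, 'text': 'high in peru small village tunapuco rural community'},
    {'expid': 12, 'text': 'high in control compared to chronic fatigue syndrome'},
]
remaining = drop_experiments(annotations, ignore_exp=[12])
print([a['expid'] for a in remaining])
```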

Adding common dbBact terms to features (add_terms_to_features)

We can attach to each bacterium the most common dbBact term associated with it.

The terms are selected from all of the dbBact terms, or can be selected from a supplied list.
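
The selection described above can be pictured as a per-bacterium vote restricted to an allowed term list. The following sketch is a simplified stand-in, not the calour or dbBact implementation: the helper name, the term lists per sequence, and the use of None when no allowed term is found are all assumptions. The allowed list is the same four terms passed to use_term_list in this section.

```python
from collections import Counter

# Toy illustration only: names and data are hypothetical.

def most_common_term(terms, use_term_list=None):
    """Return the most frequent term, optionally restricted to use_term_list.
    Returns None when no candidate term remains (an assumption of this sketch)."""
    if use_term_list is not None:
        allowed = set(use_term_list)
        terms = [t for t in terms if t in allowed]
    if not terms:
        return None
    return Counter(terms).most_common(1)[0][0]


# Hypothetical terms collected from the annotations of three sequences
feature_terms = {
    'seq_A': ['feces', 'feces', 'homo sapiens', 'saliva'],
    'seq_B': ['homo sapiens', 'homo sapiens', 'skin'],
    'seq_C': ['soil'],
}
allowed_terms = ['feces', 'saliva', 'skin', 'mus musculus']
common = {seq: most_common_term(terms, allowed_terms)
          for seq, terms in feature_terms.items()}
print(common)
```

Restricting the vote to a short list keeps very general terms (such as 'homo sapiens' in this toy data) from dominating the labels.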

In [22]:
cfs=cfs.add_terms_to_features(dbname='dbbact',use_term_list=['feces','saliva','skin','mus musculus'])
2018-07-26 13:13:20 INFO Getting dbBact annotations for 1100 sequences, please wait...
2018-07-26 13:13:32 INFO Got 24053 annotations
2018-07-26 13:13:32 INFO Added annotation data to experiment. Total 2151 annotations, 1100 terms
In [23]:
tt=cfs.sort_by_metadata('common_term',axis='feature')
In [24]:
tt.plot(sample_field='Subject', feature_field='common_term', gui='jupyter')
Out[24]:
<calour.heatmap.plotgui_jupyter.PlotGUI_Jupyter at 0x1a1cbe0e48>

Get enriched terms using all bacteria

Instead of just comparing the bacteria enriched in the two groups (and then comparing terms between them), we can compute a weighted term average for each group using all bacteria (weighting the terms of each bacterium by its frequency in the sample). This can be useful when no strong set of bacteria separates the two groups.
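
One simple way to read "weighted term average" is: for each sample, the fraction of the total abundance carried by bacteria annotated with a given term. The sketch below computes that score for two hypothetical samples. It is an illustration under that assumption and may differ from the scoring calour and dbBact actually use; the function name, sequences, and abundances are made up.

```python
# Toy illustration only: one possible reading of a frequency-weighted term score.

def weighted_term_score(frequencies, feature_terms, term):
    """Fraction of a sample's total abundance that belongs to bacteria
    annotated with `term`."""
    total = sum(frequencies.values())
    if total == 0:
        return 0.0
    with_term = sum(freq for seq, freq in frequencies.items()
                    if term in feature_terms.get(seq, ()))
    return with_term / total


# Hypothetical annotations and normalized abundances
feature_terms = {
    'seq_A': {'small village', 'feces'},
    'seq_B': {'feces'},
    'seq_C': {'small village'},
}
control_sample = {'seq_A': 6000, 'seq_B': 3000, 'seq_C': 1000}
patient_sample = {'seq_A': 1000, 'seq_B': 9000, 'seq_C': 0}

control_score = weighted_term_score(control_sample, feature_terms, 'small village')
patient_score = weighted_term_score(patient_sample, feature_terms, 'small village')
print(control_score, patient_score)
```

Computing such a score per sample yields a terms-by-samples table, on which the two groups can then be compared in the same way as bacterial abundances.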

In [25]:
dbbact=ca.database._get_database_class('dbbact')
In [32]:
enriched=dbbact.sample_enrichment(cfs,'Subject','Control','Patient',
                                  term_type='combined',ignore_exp=[12])
2018-07-26 13:17:22 INFO 87 samples with both values
2018-07-26 13:17:22 WARNING Do you forget to normalize your data? It is required before running this function
2018-07-26 13:17:22 INFO After filtering, 2704 remaining
2018-07-26 13:17:22 INFO 39 samples with value 1 (['Control'])
2018-07-26 13:17:24 INFO method meandiff. number of higher in ['Control'] : 455. number of higher in ['Patient'] : 51. total 506
In [27]:
enriched.feature_metadata
Out[27]:
term num_features _calour_diff_abundance_effect _calour_diff_abundance_pval _calour_diff_abundance_group
enzyme supplement enzyme supplement 20 -1.467864 0.000999 Patient
-no enzyme supplement -no enzyme supplement 20 -1.252388 0.000999 Patient
high in EPI dogs with enzyme supplement compared to no supplement ( high in enzyme supplement compared to no enzyme supplement in feces united states of america exocrine pancreatic insufficiency canis lupus familiaris dog high in EPI dogs with enzyme supplement compar... 20 -1.252388 0.000999 Patient
-gastric bypass -gastric bypass 4 -1.009475 0.000999 Patient
lower in people with Roux-en-Y gastric bypass compared to controls ( high in control compared to gastric bypass in feces homo sapiens united states of america lower in people with Roux-en-Y gastric bypass ... 4 -1.009475 0.000999 Patient
-physical activity -physical activity 49 -0.964541 0.001998 Patient
higher in individuals with low physical activity ( high in little physical activity compared to physical activity in feces homo sapiens united states of america higher in individuals with low physical activi... 49 -0.964541 0.001998 Patient
little physical activity little physical activity 49 -0.931222 0.002997 Patient
high in children with Crohn's disease compared to healthy adult controls ( high in crohn's disease child obsolete_juvenile stage compared to control adult in feces homo sapiens glasgow high in children with Crohn's disease compared... 53 -0.874353 0.000999 Patient
-age 30-40 -age 30-40 16 -0.832852 0.000999 Patient
high in infant age 1 year compared to adult age 30-40 in feces homo sapiens kingdom of norway oslo high in infant age 1 year compared to adult ... 16 -0.832852 0.000999 Patient
salmune vaccination salmune vaccination 12 -0.821224 0.001998 Patient
-salmune vaccination -salmune vaccination 28 -0.812220 0.001998 Patient
-vaccination -vaccination 28 -0.812220 0.001998 Patient
higher in non-vaccinated chickens ( high in control compared to vaccination salmune vaccination in united states of america caecum gallus gallus chicken higher in non-vaccinated chickens ( high in co... 28 -0.812220 0.001998 Patient
pulsed antibiotic treatment, macrolide tylosin tartrate pulsed antibiotic treatment, macrolide tylosin... 9 -0.771024 0.004995 Patient
exocrine pancreatic insufficiency exocrine pancreatic insufficiency 33 -0.728781 0.002997 Patient
highfreq feces, acinonyx jubatus, namibia, highfreq feces, acinonyx jubatus, namibia, 7 -0.661063 0.000999 Patient
common united states of america, caecum, gallus gallus, chicken, age 14-28 days, common united states of america, caecum, gall... 23 -0.626557 0.001998 Patient
higher in stroke patients compared to healthy controls ( high in stroke compared to control in feces homo sapiens china adult guangzhou city prefecture higher in stroke patients compared to healthy ... 31 -0.610551 0.001998 Patient
high in old (14-28 days) compared to young (0-3 day) chickens ( high in age old age compared to young age in united states of america caecum gallus gallus chicken high in old (14-28 days) compared to young (0-... 36 -0.601091 0.003996 Patient
high in control compared to diarrhea in feces felis catus state of california high in control compared to diarrhea in fec... 10 -0.592252 0.006993 Patient
higher in babies from finland compared to estonia ( high in finland compared to estonia in feces homo sapiens infant age < 3 years higher in babies from finland compared to esto... 26 -0.589219 0.012987 Patient
canis mesomelas canis mesomelas 33 -0.569207 0.001998 Patient
smj: higher in female mice feces treated with antibiotics ( high in antibiotic pulsed antibiotic treatment, macrolide tylosin tartrate compared to control in feces united states of america female research facility mus musculoides nyulmc nod/shiltj (no. 001976, jackson labs) smj: higher in female mice feces treated with ... 9 -0.563986 0.011988 Patient
common feces, namibia, canis mesomelas, common feces, namibia, canis mesomelas, 33 -0.559916 0.002997 Patient
acinonyx jubatus acinonyx jubatus 17 -0.553638 0.002997 Patient
-dust day -dust day 21 -0.550642 0.012987 Patient
higher in dust storm compared to clear day in israel air ( high in clear day compared to dust day in air dust israel size < 10um higher in dust storm compared to clear day in ... 21 -0.550642 0.012987 Patient
namibia namibia 40 -0.538952 0.001998 Patient
... ... ... ... ... ...
peru peru 234 0.851086 0.000999 Control
tunapuco tunapuco 229 0.856866 0.000999 Control
-tibetan pig -tibetan pig 26 0.866270 0.001998 Control
-tibetan swine -tibetan swine 26 0.866270 0.001998 Control
high in sus scrofa pig compared to tibetan pig tibetan swine in china farm caecum tibet autonomous region cecal content high in sus scrofa pig compared to tibetan p... 26 0.866270 0.001998 Control
high in male compared to female in feces homo sapiens united states of america high in male compared to female in feces ho... 129 0.878053 0.000999 Control
lower in lean participants in human feces ( high in high bmi compared to low bmi in feces homo sapiens united states of america adult lower in lean participants in human feces ( hi... 25 0.914211 0.001998 Control
lower in babies from finland compared to estonia ( high in estonia compared to finland in feces homo sapiens infant age < 3 years lower in babies from finland compared to eston... 110 0.943118 0.000999 Control
lower in small intestine compared to colon in pigs ( high in caecum left colon right colon compared to duodenum jejunum ileum in sus scrofa united kingdom pig lower in small intestine compared to colon in ... 137 0.962579 0.000999 Control
-irritable bowel syndrome -irritable bowel syndrome 65 0.969604 0.000999 Control
high in control compared to irritable bowel syndrome in feces homo sapiens adult kingdom of spain high in control compared to irritable bowel ... 65 0.969604 0.000999 Control
-united states of america -united states of america 195 0.979835 0.002997 Control
-camp hukamako -camp hukamako 6 0.993227 0.000999 Control
lower in Hadza camp Hukamako compared to hadza camp Sengeli ( high in camp sengeli compared to camp hukamako in feces homo sapiens tanzania hunter gatherer hadza lower in Hadza camp Hukamako compared to hadza... 6 0.993227 0.000999 Control
common feces, ethiopia, monkey, theropithecus gelada, common feces, ethiopia, monkey, theropithecus... 17 1.005345 0.001998 Control
plant based diet plant based diet 4 1.033398 0.000999 Control
-little physical activity -little physical activity 84 1.051961 0.001998 Control
higher in individuals with high physical activity ( high in physical activity compared to little physical activity in feces homo sapiens united states of america higher in individuals with high physical activ... 84 1.051961 0.001998 Control
high in male compared to female in feces homo sapiens toronto high in male compared to female in feces ho... 11 1.053019 0.000999 Control
physical activity physical activity 84 1.072794 0.001998 Control
-city -city 184 1.084257 0.000999 Control
highfreq caecum, left colon, right colon, sus scrofa, united kingdom, pig, highfreq caecum, left colon, right colon, sus... 10 1.108316 0.000999 Control
hiv infection hiv infection 67 1.112100 0.000999 Control
-state of oklahoma -state of oklahoma 177 1.132083 0.000999 Control
high in peru small village tunapuco rural community compared to united states of america city state of oklahoma in feces homo sapiens adult high in peru small village tunapuco rural com... 177 1.132083 0.000999 Control
camp sengeli camp sengeli 6 1.177868 0.000999 Control
-msw -msw 82 1.255350 0.000999 Control
-heterosexual -heterosexual 82 1.255350 0.000999 Control
higher in gay (msm) individuals compared to heterosexual (msw) ( high in gay homosexual msm compared to heterosexual msw in feces homo sapiens united states of america state of colorado denver higher in gay (msm) individuals compared to he... 82 1.255350 0.000999 Control
high in hiv infection compared to control in feces homo sapiens united states of america high in hiv infection compared to control i... 65 1.518273 0.000999 Control

516 rows × 5 columns
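
With 516 rows, the table is easier to read once it is narrowed to the strongest terms in each group. The following is a minimal sketch of one way to do that with pandas, not part of the original notebook. It rebuilds a handful of rows copied from the output above; the column names (`term`, `count`, `odif`, `pvals`, `group`) and the helper `top_terms` are assumptions made for illustration, since the header row is not visible in this part of the output. In the rows shown, negative effect sizes go with the Patient group and positive ones with the Control group.

```python
import pandas as pd

# A few rows copied from the output above.
# ASSUMPTION: the column names are hypothetical; the real header is not
# shown in this part of the output.
rows = [
    ("-physical activity", 49, -0.964541, 0.001998, "Patient"),
    ("little physical activity", 49, -0.931222, 0.002997, "Patient"),
    ("namibia", 40, -0.538952, 0.001998, "Patient"),
    ("peru", 234, 0.851086, 0.000999, "Control"),
    ("physical activity", 84, 1.072794, 0.001998, "Control"),
    ("hiv infection", 67, 1.112100, 0.000999, "Control"),
]
enriched = pd.DataFrame(rows, columns=["term", "count", "odif", "pvals", "group"])


def top_terms(df, group, max_pval=0.002, n=3):
    """Return up to n rows of one group, strongest absolute effect first.

    Hypothetical helper: keeps rows whose p-value is at most max_pval,
    then sorts by the absolute value of the effect-size column.
    """
    subset = df[(df["group"] == group) & (df["pvals"] <= max_pval)]
    ranked = subset.assign(abs_odif=subset["odif"].abs())
    ranked = ranked.sort_values("abs_odif", ascending=False)
    return ranked.drop(columns="abs_odif").head(n)


control_top = top_terms(enriched, "Control")
patient_top = top_terms(enriched, "Patient")
print(control_top)
print(patient_top)
```

If the real table uses different column names, only the strings passed to the indexing expressions need to change; the filtering and ranking logic stays the same.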
