calour.analysis.diff_abundance

calour.analysis.diff_abundance(exp: calour.experiment.Experiment, field, val1, val2=None, method='meandiff', transform='rankdata', numperm=1000, alpha=0.1, fdr_method='dsfdr', random_seed=None)

Differential abundance test between 2 groups of samples for all the features.

It uses a permutation-based nonparametric test and then applies multiple-hypothesis correction. The idea is to compute a chosen statistic for each feature and compare it to the distribution of the same statistic computed over many random permutations of the group labels.
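The permutation idea described above can be sketched in plain NumPy. This is a minimal illustration, not calour's implementation: the helper name `permutation_pvalues`, the feature-by-sample layout of `data`, the 0/1 coding of `labels`, and the two-sided comparison are all assumptions made for the example.

```python
import numpy as np


def permutation_pvalues(data, labels, numperm=1000, random_seed=None):
    """Two-sided permutation p-values for the mean difference of each feature.

    data   : 2D array, assumed to have one row per feature, one column per sample
    labels : 1D array of 0/1 group membership, one entry per sample
    """
    rng = np.random.default_rng(random_seed)
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)

    def meandiff(lab):
        # mean of group 1 minus mean of group 0, computed per feature
        return data[:, lab == 1].mean(axis=1) - data[:, lab == 0].mean(axis=1)

    observed = meandiff(labels)
    # Count permutations whose statistic is at least as extreme as the observed one
    exceed = np.zeros(data.shape[0])
    for _ in range(numperm):
        permuted = rng.permutation(labels)
        exceed += np.abs(meandiff(permuted)) >= np.abs(observed)
    # Add 1 to numerator and denominator so a p-value is never exactly 0
    pvals = (exceed + 1) / (numperm + 1)
    return observed, pvals


# Toy data: feature 0 differs strongly between the groups, feature 1 does not
data = np.array([[10., 12., 11., 13., 1., 2., 0., 1.],
                 [5., 3., 4., 6., 4., 5., 3., 6.]])
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
observed, pvals = permutation_pvalues(data, labels, numperm=1000, random_seed=0)
```

The resulting per-feature p-values would then be passed to a multiple-hypothesis correction step, which this sketch leaves out.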

Note

This function is also available as a class method, Experiment.diff_abundance().

Parameters:


exp (Experiment) – Input experiment object.
field (str) – The field from sample metadata to group samples.
val1 (str or list of str) – The values in the field column for the first group.
val2 (str or list of str or None (optional)) – The values in the field column to select the second group. None (default) to compare to all other samples (excluding val1).
method (str or function) –
The method used to compute the statistic. Options:
- 'meandiff' : mean(A)-mean(B)
- 'stdmeandiff' : (mean(A)-mean(B))/(std(A)+std(B))
- callable : a function used to calculate the statistic (input is data, labels; output is an array of float)
transform (str or None) –
The transformation to apply to the data before calculating the statistic.
- 'rankdata' : rank-transform the reads of each OTU
- 'log2data' : calculate log2 for each OTU using a minimal cutoff of 2
- 'normdata' : normalize the data to a constant sum per sample
- 'binarydata' : convert to binary absence/presence
alpha (float (optional)) – the desired FDR control level
numperm (int (optional)) – number of permutations to perform
fdr_method (str (optional)) –
The method used to control the False Discovery Rate. Options are:
- 'dsfdr' : the discrete FDR control method
- 'bhfdr' : Benjamini-Hochberg FDR method
- 'byfdr' : Benjamini-Yekutieli FDR method
- 'filterBH' : Benjamini-Hochberg FDR method following removal of all features whose minimal possible p-value is greater than alpha (e.g. a feature that appears in only 1 sample can obtain a minimal p-value of 0.5 and will therefore be removed when, say, alpha=0.1)
random_seed (int or None (optional)) – If an int, the numpy random seed is set to this number before running the random permutation test. None to leave the numpy random seed unset.
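As an illustration of the callable form of the method parameter, here is a hedged sketch of a custom statistic. The function name `median_diff` is hypothetical, and the layout it assumes (one row per feature, one column per sample, labels coded as 1 for the first group and 0 for the second) is an assumption for the example rather than something stated above; check it against the calour source before relying on it.

```python
import numpy as np


def median_diff(data, labels):
    """Hypothetical custom statistic: median(A) - median(B) for every feature.

    Assumes data has one row per feature and one column per sample, and that
    labels marks the first group with 1 and the second group with 0.
    Returns a 1D float array with one value per feature.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    median_a = np.median(data[:, labels == 1], axis=1)
    median_b = np.median(data[:, labels == 0], axis=1)
    return median_a - median_b


# Quick check on toy data before handing the function to the test
data = np.array([[1., 2., 9., 0., 0., 1.],
                 [4., 4., 4., 4., 4., 4.]])
labels = np.array([1, 1, 1, 0, 0, 0])
stat = median_diff(data, labels)

# Possible use with the signature documented above; 'group', 'A' and 'B'
# are made-up field and value names:
# dd = exp.diff_abundance('group', 'A', 'B', method=median_diff, random_seed=2024)
```

Passing an int as random_seed, as in the commented call, should make the permutation results repeatable between runs.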

Returns:

A new experiment containing only the features with a significant difference (FDR <= alpha), sorted according to the effect size.

The new experiment contains additional feature_metadata_fields that include:

'_calour_diff_abundance_pval' : the p-value for the feature,
'_calour_diff_abundance_effect' : the effect size (t-statistic),
'_calour_diff_abundance_group' : the value (in field) where the statistic is higher

Return type:

Experiment
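The added feature metadata fields can be inspected with ordinary pandas operations. The sketch below assumes the returned experiment exposes its feature metadata as a pandas DataFrame; the table is a made-up stand-in with invented feature names and values, so that the example runs without calour installed.

```python
import pandas as pd

# Made-up stand-in for the feature metadata of a returned experiment
feature_metadata = pd.DataFrame(
    {
        '_calour_diff_abundance_pval': [0.001, 0.004, 0.002],
        '_calour_diff_abundance_effect': [3.5, -2.0, 1.2],
        '_calour_diff_abundance_group': ['sick', 'healthy', 'sick'],
    },
    index=['feature_a', 'feature_b', 'feature_c'],
)

# Number of significant features that are higher in each group
per_group = feature_metadata['_calour_diff_abundance_group'].value_counts()

# Feature with the largest absolute effect size
strongest = feature_metadata['_calour_diff_abundance_effect'].abs().idxmax()

# Features that are higher in one particular group
higher_in_sick = feature_metadata[
    feature_metadata['_calour_diff_abundance_group'] == 'sick'
].index.tolist()
```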