calour.amplicon_experiment.AmpliconExperiment

class calour.amplicon_experiment.AmpliconExperiment(*args, **kwargs)

Bases: calour.experiment.Experiment

This class stores amplicon data and associated metadata.

This is a child class of Experiment.

Parameters:

data (numpy.ndarray or scipy.sparse.csr_matrix) – The abundance table for OTUs, metabolites, genes, etc. Samples are in rows and features are in columns
sample_metadata (pandas.DataFrame) – The metadata on the samples
feature_metadata (pandas.DataFrame) – The metadata on the features
description (str) – name of experiment
sparse (bool) – whether to store the data array as scipy.sparse.csr_matrix (True) or numpy.ndarray (False)

Variables:

data (numpy.ndarray or scipy.sparse.csr_matrix) – The abundance table for OTUs, metabolites, genes, etc. Samples are in rows and features are in columns
sample_metadata (pandas.DataFrame) – The metadata on the samples
feature_metadata (pandas.DataFrame) – The metadata on the features
exp_metadata (dict) – metadata about the experiment (data md5, filenames, etc.)
shape (tuple of (int, int)) – the dimension of data
sparse (bool) – whether the data is stored as a sparse matrix (scipy.sparse.csr_matrix) or a dense numpy array
description (str) – name of the experiment
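
The layout described above (samples in rows, features in columns, stored either densely or in compressed sparse row form) can be sketched in plain Python. This is an illustration only, not calour code: the sample and feature IDs, the counts, and the helper names `to_csr` and `csr_get` are all hypothetical.

```python
# Plain-Python sketch of the data layout; not calour's implementation.
# Rows are samples, columns are features (hypothetical IDs and counts).
sample_ids = ["S1", "S2", "S3"]
feature_ids = ["featA", "featB", "featC", "featD"]
dense = [
    [10, 0, 0, 5],
    [0, 0, 3, 0],
    [7, 1, 0, 0],
]

# shape is (number of samples, number of features)
shape = (len(dense), len(dense[0]))


def to_csr(rows):
    """Return the (data, indices, indptr) triplet of the CSR layout."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # stored non-zero value
                indices.append(col)   # its column (feature) index
        indptr.append(len(data))      # where the next row starts
    return data, indices, indptr


def csr_get(csr, row, col):
    """Look up one entry in the CSR triplet (zero if it is not stored)."""
    data, indices, indptr = csr
    for pos in range(indptr[row], indptr[row + 1]):
        if indices[pos] == col:
            return data[pos]
    return 0


csr = to_csr(dense)
# Abundance of featC in sample S2, looked up by ID.
abundance = csr_get(csr, sample_ids.index("S2"), feature_ids.index("featC"))
```

Only the non-zero entries are stored in the sparse form, which is why it tends to suit abundance tables where most features are absent from most samples.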

Methods

`copy.deepcopy(ae)`	Implement deepcopy, since pandas has problems deep-copying an empty DataFrame.
`ae1 == ae2`	Check equality.
`ae[k]`	Get the abundance at (sampleid, featureid)
`ae1 != ae2`	Return self!=value.
`repr(ae)`	Return a string representation of this object.
`add_sample_metadata_as_features`(fields[, …])	Add covariates from sample metadata to the data table as features for machine learning.
`add_terms_to_features`(dbname[, …])	Add a field to the feature metadata, with the most common term for each feature.
`aggregate_by_metadata`(field[, agg, axis, …])	Aggregate all samples/features that have the same value in the given field.
`binarize`([threshold, inplace])	Binarize the data with a threshold.
`center_log_ratio`([method, centralize, inplace])	Performs a clr transform to normalize each sample.
`classify`(fields, estimator[, cv, predict, …])	Evaluate classification during cross validation.
`cluster_data`([transform, axis, metric, inplace])	Cluster the samples/features.
`cluster_features`([min_abundance, inplace])	Cluster features.
`collapse_taxonomy`([level, inplace])	Collapse all features sharing the same taxonomy up to level into a single feature
`copy`()	Copy the object (deeply).
`correlation`(field[, method, nonzero, …])	Find features with correlation to a numeric metadata field
`diff_abundance`(field, val1[, val2, method, …])	Differential abundance test between 2 groups of samples for all the features.
`diff_abundance_kw`(field[, transform, …])	Test the differential expression between multiple sample groups using the Kruskal Wallis test.
`downsample`(field[, axis, num_keep, inplace, …])	Downsample the data set.
`enrichment`(features, dbname, *args, **kwargs)	Get the list of enriched annotation terms in features compared to all other features in exp.
`export_html`([sample_field, feature_field, …])	Export an interactive html heatmap for the experiment.
`filter_abundance`([cutoff])	Filter features with sum abundance across all samples less than the cutoff.
`filter_by_data`(predicate[, axis, field, …])	Filter samples or features by the data matrix.
`filter_by_metadata`(field, select[, axis, …])	Filter samples or features by metadata.
`filter_fasta`(filename[, negate, inplace])	Filter features from experiment based on fasta file
`filter_ids`(ids[, axis, negate, inplace])	Filter samples or features based on a list of IDs.
`filter_mean_abundance`([cutoff, field])	Filter features whose mean is at least cutoff of the mean total abundance per sample.
`filter_orig_reads`(minreads, **kwargs)	Filter keeping only samples with >= minreads in the original reads column. Note that this function uses the _calour_original_abundance field rather than the current sum of sequences per sample.
`filter_prevalence`(fraction[, cutoff, field])	Filter features keeping only ones present in more than certain fraction of all samples.
`filter_sample_categories`(field[, …])	Filter sample categories that have too few samples.
`filter_samples`(field, values[, negate, inplace])	Shortcut for filtering samples.
`filter_samples_`([cutoff, inplace])
`filter_taxonomy`(values[, negate, inplace, …])	Filter keeping only observations whose taxonomy string matches the given taxonomy.
`find_lowest_taxonomy`([field, new_field])	Create a new column that contains the taxonomy of lowest possible level.
`from_pandas`(df[, exp])	Convert a Pandas DataFrame into an experiment.
`get_data`([sparse, copy])	Get the data as a 2d array
`heatmap`([sample_field, feature_field, …])	Plot a heatmap for the experiment.
`join_experiments`(other[, field_name, prefixes])	Combine two `Experiment` objects into one.
`join_experiments_featurewise`(other[, …])	Combine two `Experiment` objects into one.
`join_metadata_fields`(field1, field2[, …])	Join two sample/feature metadata fields into a single new field
`learning_curve_depths`(field[, groups, …])	Compute the learning curve with regard to sequencing depths.
`log_n`([n, inplace])	Log transform the data
`normalize`([total, axis, inplace])	Normalize each sample (axis=0) or feature (axis=1) so that it sums to total.
`normalize_by_subset_features`(features[, …])	Normalize each sample by its total sum, excluding a list of features.
`normalize_compositional`([min_frac, total, …])	Normalize each sample while ignoring the features with mean>=min_frac across the whole experiment.
`plot`([title, barx_fields, barx_width, …])	Plot the interactive heatmap and its associated axes.
`plot_abund_prevalence`(field[, log, …])	Plot abundance against prevalence.
`plot_core_features`([field, steps, cutoff, …])	Plot the percentage of core features shared in increasing number of samples.
`plot_diff_abundance_enrichment`([max_show, …])	Plot the term enrichment of differentially abundant bacteria
`plot_enrichment`(enriched[, max_show, …])	Plot a horizontal bar plot for enriched terms
`plot_feature_matrix`(fields, feature_ids[, …])	Plot an array of scatter plots of each feature against the specified sample metadata.
`plot_hist`([ax])	Plot histogram of all the values in data.
`plot_stacked_bar`([field, sample_color_bars, …])	Plot the stacked bar for feature abundances.
`random_permute_data`([normalize])	Independently shuffle the reads of each feature.
`regress`(field, estimator[, cv, params])	Evaluate regression during cross validation.
`reorder`(new_order[, axis, inplace])	Reorder according to indices in the new order.
`rescale`([total, axis, inplace])	Rescale the data so that the mean sum of all samples (axis=0) or features (axis=1) is total.
`save`(prefix[, fmt])	Save the experiment data to disk.
`save_biom`(f[, fmt, add_metadata])	Save experiment to biom format
`save_fasta`(f[, seqs])	Save a list of sequences to fasta.
`save_metadata`(f[, axis])	Save sample/feature metadata to file.
`scale`([axis, inplace])	Standardize a dataset along an axis
`sort_abundance`([subgroup])	Sort features based on their abundance in a subset of the samples.
`sort_by_data`([axis, subset, key, inplace, …])	Sort features based on their mean frequency.
`sort_by_metadata`(field[, axis, inplace])	Sort samples or features based on metadata values in the field.
`sort_centroid`([transform, inplace])	Sort the features based on the center of mass
`sort_ids`(ids[, axis, inplace])	Sort the features or samples by the given ids.
`sort_samples`(field, **kwargs)	Sort samples by field. A convenience function for sort_by_metadata.
`sort_taxonomy`([inplace])	Sort the features based on the taxonomy
`split_taxonomy`([field, sep, names])	Split taxonomy column into individual column per level.
`split_train_test`(test_size[, train_size, …])	Split experiment into train experiment and test experiment.
`subsample_count`(total[, replace, inplace, …])	Randomly subsample each sample to the same number of counts.
`to_pandas`([sample_field, feature_field, sparse])	Get a pandas DataFrame of the abundances. Samples are rows, features are columns.
`transform`([steps, inplace])	Chain transformations together.
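
Several of the transforms listed above (`normalize`, `binarize`, `center_log_ratio`) are per-sample arithmetic on the abundance table. The sketch below illustrates that arithmetic in plain Python. It is not calour's implementation: the function names, the strict `>` comparison against the threshold, the pseudocount added before taking logs, and the example counts are all assumptions made for illustration.

```python
import math

# Hypothetical counts: 2 samples (rows) x 3 features (columns).
table = [
    [10.0, 0.0, 30.0],
    [5.0, 5.0, 0.0],
]


def normalize_rows(rows, total=10000):
    """Scale each sample (row) so that its abundances sum to `total`."""
    out = []
    for row in rows:
        row_sum = sum(row)
        # All-zero samples stay all-zero (avoids dividing by zero).
        scale = total / row_sum if row_sum else 0.0
        out.append([value * scale for value in row])
    return out


def binarize_rows(rows, threshold=1):
    """Map values strictly above `threshold` to 1, others to 0 (assumed rule)."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]


def clr_rows(rows, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the row's mean log.

    The pseudocount is an assumed way to handle zeros before the log.
    """
    out = []
    for row in rows:
        logs = [math.log(value + pseudocount) for value in row]
        mean_log = sum(logs) / len(logs)
        out.append([value - mean_log for value in logs])
    return out


normalized = normalize_rows(table, total=100)
presence = binarize_rows(table, threshold=1)
clr = clr_rows(table)
```

After the centered log-ratio step each sample's values sum to zero, which is a quick sanity check when experimenting with the transform.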

Attributes

`shape`
`sparse`