calour.amplicon_experiment.AmpliconExperiment

class calour.amplicon_experiment.AmpliconExperiment(*args, **kwargs)

Bases: calour.experiment.Experiment

This class stores amplicon data and associated metadata.

It is a child class of Experiment.

Variables:
  • data (numpy.ndarray or scipy.sparse.csr_matrix) – The abundance table for OTUs, metabolites, genes, etc. Samples are in rows and features are in columns
  • sample_metadata (pandas.DataFrame) – The metadata on the samples
  • feature_metadata (pandas.DataFrame) – The metadata on the features
  • exp_metadata (dict) – metadata about the experiment (data md5, filenames, etc.)
  • shape (tuple of (int, int)) – the dimension of data
  • sparse (bool) – whether the data is stored as a sparse matrix (scipy.sparse.csr_matrix) or as a dense numpy array
  • description (str) – name of the experiment
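
The layout described above can be illustrated with a small, dependency-free sketch. It does not use calour: ToyExperiment is a hypothetical stand-in that only mirrors the documented attribute names, with plain lists and dicts in place of numpy arrays and pandas DataFrames.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExperiment:
    """Hypothetical stand-in mirroring the documented attribute names."""
    data: list                # rows = samples, columns = features
    sample_metadata: dict     # sample id -> metadata for that sample
    feature_metadata: dict    # feature id -> metadata for that feature
    exp_metadata: dict = field(default_factory=dict)
    description: str = ""

    @property
    def shape(self):
        # (number of samples, number of features)
        n_samples = len(self.data)
        n_features = len(self.data[0]) if self.data else 0
        return (n_samples, n_features)


toy = ToyExperiment(
    data=[[10, 0, 5],
          [0, 3, 7]],
    sample_metadata={"S1": {"group": "case"}, "S2": {"group": "control"}},
    feature_metadata={"F1": {}, "F2": {}, "F3": {}},
    description="toy example",
)
print(toy.shape)
```

Each row of data lines up with one entry of sample_metadata and each column with one entry of feature_metadata, which is why shape is reported as (samples, features).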

Methods

copy.deepcopy(ae) Implement deepcopy, since pandas has problems deep-copying an empty DataFrame
ae1 == ae2 Check equality.
ae[k] Get the abundance at (sampleid, featureid)
ae1 != ae2 Return self!=value.
repr(ae) Return a string representation of this object.
add_sample_metadata_as_features(fields[, …]) Add covariates from sample metadata to the data table as features for machine learning.
add_terms_to_features(dbname[, …]) Add a field to the feature metadata, with most common term for each feature
aggregate_by_metadata(field[, agg, axis, …]) Aggregate all samples/features that have the same value in the given field.
binarize([threshold, inplace]) Binarize the data with a threshold.
center_log_ratio([method, centralize, inplace]) Performs a clr transform to normalize each sample.
classify(fields, estimator[, cv, predict, …]) Evaluate classification during cross validation.
cluster_data([transform, axis, metric, inplace]) Cluster the samples/features.
cluster_features([min_abundance, inplace]) Cluster features.
collapse_taxonomy([level, inplace]) Collapse all features sharing the same taxonomy up to level into a single feature
copy() Copy the object (deeply).
correlation(field[, method, nonzero, …]) Find features with correlation to a numeric metadata field
diff_abundance(field, val1[, val2, method, …]) Differential abundance test between 2 groups of samples for all the features.
diff_abundance_kw(field[, transform, …]) Test the differential expression between multiple sample groups using the Kruskal Wallis test.
downsample(field[, axis, num_keep, inplace, …]) Downsample the data set.
enrichment(features, dbname, *args, **kwargs) Get the list of enriched annotation terms in features compared to all other features in exp.
export_html([sample_field, feature_field, …]) Export an interactive html heatmap for the experiment.
filter_abundance([cutoff]) Filter features with sum abundance across all samples less than the cutoff.
filter_by_data(predicate[, axis, field, …]) Filter samples or features by the data matrix.
filter_by_metadata(field, select[, axis, …]) Filter samples or features by metadata.
filter_fasta(filename[, negate, inplace]) Filter features from experiment based on fasta file
filter_ids(ids[, axis, negate, inplace]) Filter samples or features based on a list IDs.
filter_mean_abundance([cutoff, field]) Filter features with a mean at least cutoff of the mean total abundance/sample
filter_orig_reads(minreads, **kwargs) Filter keeping only samples with >= minreads in the original reads column. Note that this function uses the _calour_original_abundance field rather than the current sum of sequences per sample.
filter_prevalence(fraction[, cutoff, field]) Filter features, keeping only those present in more than a certain fraction of all samples.
filter_sample_categories(field[, …]) Filter sample categories that have too few samples.
filter_samples(field, values[, negate, inplace]) Shortcut for filtering samples.
filter_samples_([cutoff, inplace])
filter_taxonomy(values[, negate, inplace, …]) Filter, keeping only observations whose taxonomy string matches the given values
find_lowest_taxonomy([field, new_field]) Create a new column that contains the taxonomy of lowest possible level.
from_pandas(df[, exp]) Convert a Pandas DataFrame into an experiment.
get_data([sparse, copy]) Get the data as a 2d array
heatmap([sample_field, feature_field, …]) Plot a heatmap for the experiment.
join_experiments(other[, field_name, prefixes]) Combine two Experiment objects into one.
join_experiments_featurewise(other[, …]) Combine two Experiment objects into one.
join_metadata_fields(field1, field2[, …]) Join two sample/feature metadata fields into a single new field
learning_curve_depths(field[, groups, …]) Compute the learning curve with regard to sequencing depth.
log_n([n, inplace]) Log transform the data
normalize([total, axis, inplace]) Normalize the sum of each sample (axis=0) or feature (axis=1) to sum total
normalize_by_subset_features(features[, …]) Normalize each sample by its total sum, computed without a given list of features
normalize_compositional([min_frac, total, …]) Normalize each sample by ignoring the features with mean>=min_frac in all the experiment
plot([title, barx_fields, barx_width, …]) Plot the interactive heatmap and its associated axes.
plot_abund_prevalence(field[, log, …]) Plot abundance against prevalence.
plot_core_features([field, steps, cutoff, …]) Plot the percentage of core features shared in increasing number of samples.
plot_diff_abundance_enrichment([max_show, …]) Plot the term enrichment of differentially abundant bacteria
plot_enrichment(enriched[, max_show, …]) Plot a horizontal bar plot for enriched terms
plot_feature_matrix(fields, feature_ids[, …]) Plot an array of scatter plots of each feature against the specified sample metadata.
plot_hist([ax]) Plot histogram of all the values in data.
plot_stacked_bar([field, sample_color_bars, …]) Plot the stacked bar for feature abundances.
random_permute_data([normalize]) Independently shuffle the reads of each feature
regress(field, estimator[, cv, params]) Evaluate regression during cross validation.
reorder(new_order[, axis, inplace]) Reorder according to indices in the new order.
rescale([total, axis, inplace]) Rescale the data so that the mean sum over all samples (axis=0) or features (axis=1) equals total.
save(prefix[, fmt]) Save the experiment data to disk.
save_biom(f[, fmt, add_metadata]) Save experiment to biom format
save_fasta(f[, seqs]) Save a list of sequences to fasta.
save_metadata(f[, axis]) Save sample/feature metadata to file.
scale([axis, inplace]) Standardize a dataset along an axis
sort_abundance([subgroup]) Sort features based on their abundance in a subset of the samples.
sort_by_data([axis, subset, key, inplace, …]) Sort features based on their mean frequency.
sort_by_metadata(field[, axis, inplace]) Sort samples or features based on metadata values in the field.
sort_centroid([transform, inplace]) Sort the features based on the center of mass
sort_ids(ids[, axis, inplace]) Sort the features or samples by the given ids.
sort_samples(field, **kwargs) Sort samples by field. A convenience function for sort_by_metadata
sort_taxonomy([inplace]) Sort the features based on the taxonomy
split_taxonomy([field, sep, names]) Split taxonomy column into individual column per level.
split_train_test(test_size[, train_size, …]) Split experiment into train experiment and test experiment.
subsample_count(total[, replace, inplace, …]) Randomly subsample each sample to the same number of counts.
to_pandas([sample_field, feature_field, sparse]) Get a pandas DataFrame of the abundances. Samples are rows, features are columns.
transform([steps, inplace]) Chain transformations together.
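
To make two of the operations listed above concrete, here is a minimal plain-Python sketch of per-sample normalization (as described for normalize with axis=0) and prevalence filtering (as described for filter_prevalence). It illustrates the ideas only and is not calour's implementation: the function names, the default total of 10000, and the rule that a feature counts as "present" when its abundance is strictly greater than cutoff are all assumptions made for this example.

```python
def normalize_samples(data, total=10000):
    """Scale each row (sample) so that its values sum to `total`.

    The default of 10000 is an assumption for this sketch.
    """
    out = []
    for row in data:
        row_sum = sum(row)
        # A sample with zero total stays all zeros; this avoids dividing by zero.
        scale = total / row_sum if row_sum else 0.0
        out.append([value * scale for value in row])
    return out


def prevalent_feature_indices(data, fraction, cutoff=0):
    """Return column indices of features present in more than `fraction` of samples.

    "Present" is assumed here to mean abundance strictly greater than `cutoff`.
    """
    n_samples = len(data)
    n_features = len(data[0]) if data else 0
    keep = []
    for j in range(n_features):
        present = sum(1 for row in data if row[j] > cutoff)
        if n_samples and present / n_samples > fraction:
            keep.append(j)
    return keep


# Rows are samples, columns are features.
counts = [
    [10, 0, 30],
    [5, 0, 15],
    [0, 2, 8],
]
normalized = normalize_samples(counts, total=100)
kept = prevalent_feature_indices(counts, fraction=0.5)
print(normalized[0], kept)
```

In this toy table the second feature appears in only one of three samples, so it is dropped at fraction=0.5, while every normalized sample sums to 100.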

Attributes

shape
sparse