calour.io.read_amplicon

calour.io.read_amplicon(data_file, sample_metadata_file=None, *, min_reads, normalize, **kwargs)[source]

Load an amplicon experiment.

Fix taxonomy, normalize reads, and filter low-abundance samples. This wraps read(). Also convert the feature metadata index (sequences) to upper case.

Parameters:
  • sample_metadata_file (None or str, optional) – None (default) to just use sample names (no additional metadata).
  • min_reads (int or None) – remove all samples with fewer than min_reads reads. None to keep all samples.
  • normalize (int or None) – normalize each sample to the specified count. None to not normalize.
Keyword Arguments:
 
  • data_file (str) – file path to the biom table.
  • sample_metadata_file (None or str, optional) – None (default) to just use sample names (no additional metadata). If not None, the file path to the sample metadata (also known as the mapping file in QIIME).
  • feature_metadata_file (None or str, optional) – file path to the feature metadata.
  • description (str) – description of the experiment
  • sparse (bool) – read the biom table into a sparse or dense array
  • data_file_type (str, optional) – the data_file format. Options:
      'biom': a biom table (biom-format.org) (default)
      'tsv': a tab-separated table (samples in columns, features in rows)
      'openms': an OpenMS bucket table csv (rows are features, columns are samples)
      'openms_transpose': an OpenMS bucket table csv (columns are features, rows are samples)
      'gnps_ms': an OpenMS bucket table tsv with samples as columns (exported from GNPS)
      'qiime2': a qiime2 biom table artifact (requires qiime2 to be installed)
  • sample_metadata_kwargs, feature_metadata_kwargs (dict, optional) – keyword arguments passed to pandas.read_table() when reading the sample metadata or feature metadata. For example, you can set sample_metadata_kwargs={'dtype': {'ph': int}, 'encoding': 'latin-8'} to read the ph column in the sample metadata as int and parse the file as latin-8 instead of utf-8. By default, the first column in the metadata files is assumed to hold the sample/feature IDs and is read in as the row index. To avoid this, provide {'index_col': False}.
  • cls (class, optional) – what class object to read the data into (Experiment by default)
  • table_sample_id_proc, table_feature_id_proc (None or callable, optional) – if not None, modify each sample/feature ID in the table using the callable. The callable accepts a list of str and returns a list of str (the sample/feature IDs after processing). Useful in metabolomics experiments, where the sample IDs in the data table contain additional information compared to the mapping file (using a '_' separator), and this needs to be removed in order to sync the sample IDs between the table and the mapping file.
  • sample_in_row (bool, optional) – False if data table columns are samples, True if rows are samples
  • normalize (int or None) – normalize each sample to the specified read count. None to not normalize
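The ID-processing callables described above take a list of str and return a list of str. A minimal sketch of such a callable follows, assuming the extra information is everything after the first '_' in each ID. The function name strip_id_suffix and the sample IDs are hypothetical and serve only as illustration.

```python
def strip_id_suffix(ids):
    """Return each ID truncated at its first '_' (hypothetical helper).

    Accepts a list of str and returns a list of str, which is the
    shape described for table_sample_id_proc / table_feature_id_proc.
    """
    # split("_", 1)[0] leaves IDs without a '_' unchanged
    return [one_id.split("_", 1)[0] for one_id in ids]


# Hypothetical sample IDs as they might appear in a data table
table_ids = ["S1_batchA", "S2_batchB", "S3"]
cleaned = strip_id_suffix(table_ids)
print(cleaned)
```

A function like this could then be supplied as table_sample_id_proc=strip_id_suffix so that the table IDs match those in the mapping file.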
Returns:

The experiment after removing low-read samples and normalizing.

Return type:

AmpliconExperiment

See also

read()
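To make the effect of min_reads and normalize concrete, here is a plain-Python sketch of the two steps on a toy table. It is an illustration only, not calour's implementation: the function filter_and_normalize, the dict-of-lists layout, the order of the steps (filter first, then normalize), and the use of proportional scaling are all assumptions.

```python
def filter_and_normalize(counts, min_reads=None, normalize=None):
    """Illustrative stand-in for the min_reads / normalize steps.

    counts maps a sample ID to a list of per-feature read counts.
    Assumption: samples are filtered on their total reads before scaling.
    """
    result = {}
    for sample_id, reads in counts.items():
        total = sum(reads)
        # Drop samples with fewer than min_reads reads in total
        if min_reads is not None and total < min_reads:
            continue
        # Scale the sample so that its reads sum to `normalize`
        if normalize is not None and total > 0:
            reads = [r * normalize / total for r in reads]
        result[sample_id] = list(reads)
    return result


# Toy data: S2 has only 50 reads and falls below min_reads=1000
counts = {"S1": [600, 400], "S2": [30, 20], "S3": [1500, 500]}
normalized = filter_and_normalize(counts, min_reads=1000, normalize=10000)
print(normalized)
```

Passing None for either argument skips that step, mirroring the "None to keep all samples" and "None to not normalize" behavior described in the parameters.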