Parameters: |
- data_file (str) – The name of the data table (mzmine2 output/bucket table/biom table) containing the per-metabolite abundances.
- sample_metadata_file (str or None (optional)) –
None (default) to not load metadata per sample
str to specify name of sample mapping file (tsv).
Note: sample names in the bucket table and the sample metadata file must match. If the bucket table sample names contain additional
information, you can split them at the separator character (usually ‘_’) and keep only the first part by setting cut_sample_id_sep='_'
(see below).
- gnps_file (str or None (optional)) – name of the gnps clusterinfosummarygroup_attributes_withIDs_arbitraryattributes/XXX.tsv file, for use with the ‘gnps’ database.
This enables identification of the metabolites with known MS2 (for the interactive heatmap and sorting/filtering etc), as well as linking
to the gnps page for each metabolite (from the interactive heatmap - by double clicking on the metabolite database information).
Note: requires the gnps-calour database interface module (see the Calour installation instructions for details).
- feature_metadata_file (str or None (optional)) – Name of table containing additional metadata about each feature
None (default) to not load
- data_file_type (str, optional) –
The data file format. Options include:
- ‘mzmine2’: load the mzmine2 output csv file.
  - MZ and RT are obtained from this file.
  - GNPS linking is direct via the unique id column.
  - Table is csv, columns are samples.
- ‘biom’: load a biom table for the features.
  - MZ and RT are obtained via the featureID (first column), which is assumed to be MZ_RT.
  - GNPS linking is indirect via the mz and rt threshold windows.
  - Table is a tsv/json/hdf5 biom table, columns are samples.
- ‘openms’: load an openms output table.
  - MZ and RT are obtained via the featureID (first column), which is assumed to be MZ_RT.
  - GNPS linking is indirect via the mz and rt threshold windows.
  - Table is a csv table, columns are samples.
- ‘gnps-ms2’: load a gnps exported biom table.
  - MZ and RT are obtained via the gnps_file if available, otherwise are NA.
  - GNPS linking is direct via the first column (featureID).
  - Table is a tsv/json/hdf5 biom table, columns are samples.
- sample_in_row (bool or None, optional) – False indicates rows in the data table file are features, True indicates rows are samples.
None to use default value according to data_file_type
- direct_ids (bool or None, optional) – True indicates the feature ids in the data table file are the same ids used in the gnps_file.
False indicates feature ids are not the same as in the gnps_file (such as when the ids are the MZ_RT)
None to use default value according to data_file_type
- get_mz_rt_from_feature_id (bool or None, optional) – True indicates the data table file feature ids contain the MZ/RT of the feature.
False to not obtain MZ/RT from the feature id
None to use default value according to data_file_type
- use_gnps_id_from_AllFiles (bool, optional) – True (default) to link the data table file gnps ids to the AllFiles column in the gnps_file.
False to link the data table file gnps ids to the ‘cluster index’ column in the gnps_file.
- cut_sample_id_sep (str or None, optional) – str (typically ‘_’) to split the sampleID in the data table file, keeping only the first part.
Useful when the sampleIDs in the data table contain additional information compared to the
mapping file (using a ‘_’ separator), and this needs to be removed in order to sync the sampleIDs between table and mapping file.
None (default) to not change the data table file sampleID
- mz_rt_sep (str or None, optional) – The separator character between the MZ and RT parts of the featureID, if it contains them (usually ‘_’).
If not supplied, the separator is autodetected.
Note: this is used only if get_mz_rt_from_feature_id=True.
- mz_thresh (float, optional) – The tolerance for M/Z to match features to the gnps_file. Used only if parameter direct_ids=False.
- rt_thresh (float, optional) – The tolerance for retention time to match features to the gnps_file. Used only if parameter direct_ids=False.
- description (str or None (optional)) – Name of the experiment (for display purposes).
None (default) to use the file name.
- sparse (bool (optional)) – False (default) to store data as dense matrix (faster but more memory)
True to store as sparse (CSR)
- normalize (int or None) – normalize each sample to the specified number of reads. None to not normalize.
|
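The sample-ID trimming, MZ_RT parsing, and tolerance-window matching described by cut_sample_id_sep, mz_rt_sep, mz_thresh and rt_thresh can be sketched in plain Python. This is only an illustration of the logic as the parameter descriptions state it, not Calour's implementation: the helper names (cut_sample_id, parse_mz_rt, match_features), the inclusive comparison, and all IDs and threshold values are assumptions made up for the example.

```python
from typing import List, Optional, Tuple


def cut_sample_id(sample_id: str, sep: Optional[str] = "_") -> str:
    """Keep only the part of the sample ID before the first separator."""
    if sep is None:
        return sample_id  # None: leave the sample ID unchanged
    return sample_id.split(sep)[0]


def parse_mz_rt(feature_id: str, sep: str = "_") -> Tuple[float, float]:
    """Split an 'MZ_RT' style feature ID into (mz, rt) floats."""
    mz_str, rt_str = feature_id.split(sep)
    return float(mz_str), float(rt_str)


def match_features(
    mz: float,
    rt: float,
    reference: List[Tuple[str, float, float]],
    mz_thresh: float,
    rt_thresh: float,
) -> List[str]:
    """Return ids of reference entries inside both tolerance windows.

    Assumption: a match requires |mz difference| <= mz_thresh AND
    |rt difference| <= rt_thresh (inclusive bounds).
    """
    return [
        ref_id
        for ref_id, ref_mz, ref_rt in reference
        if abs(ref_mz - mz) <= mz_thresh and abs(ref_rt - rt) <= rt_thresh
    ]


# Hypothetical IDs and values, for illustration only.
sample_ids = [cut_sample_id(s) for s in ["S1_plate3_A01", "S2_plate3_A02"]]
mz, rt = parse_mz_rt("181.0707_3.25")
reference = [
    ("c1", 181.0712, 3.30),  # close in both mz and rt
    ("c2", 181.0709, 9.80),  # close in mz, far in rt
    ("c3", 255.2320, 3.20),  # far in mz, close in rt
]
matches = match_features(mz, rt, reference, mz_thresh=0.001, rt_thresh=0.1)
```

Here only the first reference entry falls inside both windows, which is why indirect linking depends on choosing thresholds that suit the instrument's mass accuracy and retention-time drift.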
|
- data_file (str) – file path to the biom table.
- sample_metadata_file (None or str, optional) – None (default) to just use sample names (no additional metadata).
if not None, file path to the sample metadata (aka mapping file in QIIME).
- feature_metadata_file (None or str, optional) – file path to the feature metadata.
- description (str) – description of the experiment
- sparse (bool) – read the biom table into sparse or dense array
- data_file_type (str, optional) – the data_file format. options:
‘biom’ : a biom table (biom-format.org) (default)
‘tsv’ : a tab-separated table (samples in columns, features in rows)
‘openms’ : an OpenMS bucket table csv (rows are features, columns are samples)
‘openms_transpose’ : an OpenMS bucket table csv (columns are features, rows are samples)
‘gnps_ms’ : an OpenMS bucket table tsv with samples as columns (exported from GNPS)
‘qiime2’ : a qiime2 biom table artifact (need to have qiime2 installed)
- sample_metadata_kwargs, feature_metadata_kwargs (dict, optional) – keyword arguments passed to
pandas.read_table() when reading the sample metadata
or feature metadata. For example, you can set sample_metadata_kwargs={'dtype':
{'ph': int}, 'encoding': 'latin-8'} to read the ph column in the sample metadata
as int and parse the file as latin-8 instead of utf-8. By default, the first column in
the metadata files is assumed to hold the sample/feature IDs and is read in as the row index. To avoid this, provide
{'index_col': False}.
- cls (class, optional) – the class to read the data into (Experiment by default)
- table_sample_id_proc, table_feature_id_proc (None or callable, optional) – if not None, modify each sample/feature id in the table using the callable.
The callable accepts a list of str and returns a list of str (the sample/feature ids after processing).
Useful in metabolomics experiments, where the sampleIDs in the data table contain additional information compared to the
mapping file (using a ‘_’ separator), and this must be removed in order to sync the sampleIDs between the table and the mapping file.
- sample_in_row (bool, optional) – False if data table columns are samples, True if rows are samples
- normalize (int or None) – normalize each sample to the specified read count.
None to not normalize
|
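As a rough illustration of two of the parameters above, the sketch below shows a callable with the shape table_sample_id_proc expects (a list of str in, a list of str out) and one plausible reading of normalize. The function names keep_first_part and normalize_to_total are hypothetical, the sample IDs and counts are invented, and treating normalization as scaling each sample so its counts sum to the target is an assumption rather than a statement of how the library computes it.

```python
from typing import Dict, List


def keep_first_part(ids: List[str]) -> List[str]:
    """ID-processing callable: list of str in, list of str out.

    Drops everything after the first '_' so table IDs can be synced
    with the IDs used in the mapping file.
    """
    return [one_id.split("_")[0] for one_id in ids]


def normalize_to_total(counts: List[float], total: float) -> List[float]:
    """Scale one sample's counts so they sum to `total`.

    Assumption: this total-sum scaling is only one interpretation of
    "normalize each sample to the specified read count".
    """
    current = sum(counts)
    if current == 0:
        return list(counts)  # an all-zero sample cannot be rescaled
    return [value * total / current for value in counts]


# Keyword-argument dict in the form the text gives for the metadata reader.
sample_metadata_kwargs: Dict[str, object] = {"dtype": {"ph": int}}

table_ids = ["S1_plate3_A01", "S2_plate3_A02"]  # made-up sample IDs
processed_ids = keep_first_part(table_ids)
normalized = normalize_to_total([10, 30, 60], total=10000)
```

A function such as keep_first_part would be passed as table_sample_id_proc, and the dict as sample_metadata_kwargs; writing the callable over the whole list rather than a single ID matches the signature described in the parameter list.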