calour.experiment.Experiment.add_sample_metadata_as_features

Experiment.add_sample_metadata_as_features(fields, sparse=None, inplace=False)

Add covariates from sample metadata to the data table as features for machine learning.

Columns of categorical strings are converted with a one-hot encoding scheme and added to the data table as new features. Numeric columns, such as ph in the example below, are added as they are.
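To make the one-hot scheme concrete, here is a minimal sketch in plain pandas. It is not calour's implementation; it only assumes that each category level becomes its own 0/1 column, named in the `column=value` style seen in the example on this page. The variable names are hypothetical.

```python
import pandas as pd

# Sample metadata mirroring the example on this page.
sample_metadata = pd.DataFrame(
    {"category": ["A", "B"], "ph": [6.6, 7.7]},
    index=["s1", "s2"],
)

# One-hot encode the nominal column: one 0/1 column per category level.
# prefix_sep="=" is chosen here only to imitate the "category=A" naming.
onehot = pd.get_dummies(
    sample_metadata["category"], prefix="category", prefix_sep="=", dtype=float
)

# Numeric columns need no encoding, so they are appended unchanged.
features = pd.concat([onehot, sample_metadata[["ph"]]], axis=1)
print(features)
```

The resulting three columns correspond to the first three columns of `new.data` in the example on this page.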

Note

This is only for numeric and/or nominal covariates in the sample metadata. To add an ordinal column as a feature, first convert it to a numeric column with pandas.Series.map.
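The conversion of an ordinal column mentioned in the note could look like the sketch below. The `severity` column, its levels, and the chosen ranks are hypothetical and serve only as an illustration; pick the ordering that matches your own data.

```python
import pandas as pd

# Hypothetical sample metadata with an ordinal column.
sample_metadata = pd.DataFrame(
    {"severity": ["low", "high", "medium"]},
    index=["s1", "s2", "s3"],
)

# Assumed ordering of the levels, expressed as numeric ranks.
severity_rank = {"low": 0, "medium": 1, "high": 2}

# Series.map replaces each label with its rank; labels missing from the
# dictionary would become NaN, so check that every level is covered.
sample_metadata["severity_num"] = sample_metadata["severity"].map(severity_rank)
print(sample_metadata)
```

The new numeric column (`severity_num` here) is then the one to name in `fields`, so that the order of the levels is kept instead of being split into unrelated one-hot columns.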

Examples

>>> exp = Experiment(np.array([[1,2], [3, 4]]), sparse=False,
...                  sample_metadata=pd.DataFrame({'category': ['A', 'B'],
...                                                'ph': [6.6, 7.7]},
...                                               index=['s1', 's2']),
...                  feature_metadata=pd.DataFrame({'motile': ['y', 'n']}, index=['otu1', 'otu2']))
>>> exp
Experiment with 2 samples, 2 features

Let’s add the category and ph columns to the data table as features:

>>> new = exp.add_sample_metadata_as_features(['category', 'ph'])
>>> new
Experiment with 2 samples, 5 features
>>> new.feature_metadata
           motile
category=A    NaN
category=B    NaN
ph            NaN
otu1            y
otu2            n
>>> new.data  
array([[1. , 0. , 6.6, 1. , 2. ],
       [0. , 1. , 7.7, 3. , 4. ]])
Parameters:
  • fields (list of str) – the column names in the sample metadata. Categorical columns are converted to one-hot numeric codes, and all listed columns are then concatenated to the data table.
  • sparse (bool or None, optional) – whether to use a sparse or dense data matrix. If None, the sparsity of the current data table in the Experiment object is kept.
  • inplace (bool) – whether to change the Experiment object in place or return a changed copy.
Returns:

Return type:

Experiment

See also

sklearn.preprocessing.OneHotEncoder()