calour.training.classify

calour.training.classify(exp: calour.experiment.Experiment, fields, estimator, cv=<sklearn.model_selection._split.RepeatedStratifiedKFold object>, predict='predict_proba', params=None)

Evaluate classification during cross validation.

Note

This function is also available as the Experiment method Experiment.classify().

Parameters:

exp (Experiment) – Input experiment object.
fields (str or list of str) – column name(s) in the sample metadata containing the classes to predict. If it is a list of str, this function performs multi-task (aka multioutput-multiclass) classification, and you must provide a multi-task classifier as the estimator. See http://scikit-learn.org/stable/modules/multiclass.html for more information.
estimator (estimator object implementing fit and predict) – scikit-learn estimator, e.g. sklearn.ensemble.RandomForestClassifier.
cv (int, cross-validation generator or an iterable) – similar to the cv parameter in sklearn.model_selection.GridSearchCV.
predict ({'predict', 'predict_proba'}) – the method used to predict the validation sets. Some estimators provide both a method that predicts the class and one that predicts the probability of each class for a sample; for example, see sklearn.ensemble.RandomForestClassifier.
params (dict of string to sequence, or sequence of such) – the parameter sets to evaluate, for example the output of sklearn.model_selection.ParameterGrid or sklearn.model_selection.ParameterSampler. By default, the estimator's scikit-learn default parameters are used.
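As a sketch, the estimator, cv and params arguments can be built from standard scikit-learn objects as below. The fold and repeat counts and the n_estimators/max_depth grid are illustrative assumptions, not values used by calour.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, RepeatedStratifiedKFold

# estimator: any scikit-learn classifier implementing fit and predict
estimator = RandomForestClassifier(random_state=0)

# cv: a cross-validation generator; the split/repeat counts are illustrative
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

# params: a sequence of parameter dicts, here the 2 x 2 = 4 combinations
# produced by ParameterGrid; each dict is evaluated as one parameter set
params = ParameterGrid({'n_estimators': [100, 500], 'max_depth': [None, 5]})
```

These objects would then be passed as calour.training.classify(exp, 'field_name', estimator, cv=cv, params=params), where 'field_name' is a placeholder for a sample metadata column.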

Yields:

pandas.DataFrame – The per-sample prediction results for one parameter set. It contains the following columns:

Y_TRUE: the true class for the samples
SAMPLE: sample IDs
CV: which split of the cross validation
Y_PRED: the predicted class for the samples (if “predict”)
multiple columns, each containing the predicted probability of one class (if “predict_proba”)
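The evaluation loop and the shape of each yielded DataFrame can be sketched with plain scikit-learn and pandas. This is a hypothetical re-implementation for illustration, not calour's source: the function name classify_sketch, the NumPy feature matrix standing in for the Experiment's data, and the synthetic data are all assumptions, and only a single field (not multi-task) is handled.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, StratifiedKFold


def classify_sketch(X, y, sample_ids, estimator, cv,
                    predict='predict_proba', params=None):
    """Hypothetical sketch: yield one per-sample DataFrame per parameter set.

    X is a (samples x features) array and y holds the class labels.
    """
    if params is None:
        params = [{}]  # empty dict: keep the estimator's default parameters
    for param in params:
        model = clone(estimator).set_params(**param)
        frames = []
        for i, (train, test) in enumerate(cv.split(X, y)):
            model.fit(X[train], y[train])
            pred = getattr(model, predict)(X[test])
            if predict == 'predict':
                df = pd.DataFrame({'Y_PRED': pred})
            else:
                # predict_proba: one probability column per class label
                df = pd.DataFrame(pred, columns=model.classes_)
            df['Y_TRUE'] = y[test]
            df['SAMPLE'] = sample_ids[test]
            df['CV'] = i  # index of the cross-validation split
            frames.append(df)
        yield pd.concat(frames, ignore_index=True)


# Usage on synthetic data: 40 samples, 5 features, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.array(['healthy', 'sick'] * 20)
X[y == 'sick'] += 1.0  # shift one class so it is learnable
sample_ids = np.array([f's{i}' for i in range(40)])

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
grid = ParameterGrid({'n_estimators': [10, 50]})
results = list(classify_sketch(X, y, sample_ids,
                               RandomForestClassifier(random_state=0),
                               cv, params=grid))
```

Because the function is a generator, each parameter set is fitted and evaluated only when its DataFrame is requested. With a repeated splitter such as RepeatedStratifiedKFold, each sample would appear once per repeat.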