Binary Prediction

Prediction of binary classes such as tumor vs. normal patients.

Module for elastic net regression with nested cross validation.

This workflow trains an elastic net model for a binary classification task (e.g., tumor vs. normal patients). Training uses nested cross validation, and the number of folds in both the outer and inner loops can be selected. The model can be easily swapped because most models in scikit-learn (the machine learning library used by this package) require the same input.
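The nested cross validation described above can be sketched with plain scikit-learn. This is a minimal illustration, not the package's implementation: the function name `nested_cv_auc`, the choice of `LogisticRegression` with an elastic net penalty, the feature scaling step, and the synthetic data are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_auc(x, y, outer_splits=3, inner_splits=3,
                  l1_ratios=(0.1, 0.5, 0.9), max_iter=1000, seed=0):
    """Hypothetical helper: return one AUC-ROC score per outer fold."""
    outer_cv = StratifiedKFold(n_splits=outer_splits, shuffle=True, random_state=seed)
    inner_cv = StratifiedKFold(n_splits=inner_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in outer_cv.split(x, y):
        # Assumed model: logistic regression with an elastic net penalty.
        model = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="elasticnet", solver="saga", max_iter=max_iter),
        )
        # Inner loop: choose l1_ratio using only the outer training fold.
        search = GridSearchCV(
            model,
            param_grid={"logisticregression__l1_ratio": list(l1_ratios)},
            scoring="roc_auc",
            cv=inner_cv,
        )
        search.fit(x[train_idx], y[train_idx])
        # Outer loop: score the refitted best model on the held-out fold.
        proba = search.predict_proba(x[test_idx])[:, 1]
        scores.append(roc_auc_score(y[test_idx], proba))
    return scores


# Synthetic stand-in for a samples-by-pathways score matrix.
x_demo, y_demo = make_classification(
    n_samples=120, n_features=10, n_informative=4, random_state=0
)
auc_scores = nested_cv_auc(x_demo, y_demo)
```

Because the hyperparameter search only ever sees the outer training fold, the outer scores are not biased by the tuning step. Swapping the estimator inside the pipeline is enough to try a different model, provided the parameter grid is updated to match.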

pathway_forte.prediction.binary.ssgsea_nes_to_df(ssgsea_scores_csv, classes_file, removed_random=None)

Create a DataFrame of Normalized Enrichment Scores (NES) from ssGSEA of TCGA expression data.

Parameters

ssgsea_scores_csv – text file containing normalized enrichment scores (ES) for pathways from each sample
test_size – Default test size is 0.25
removed_random (Optional[int]) – percentage of the DataFrame to remove

pathway_forte.prediction.binary.get_l1_ratios()

Return a list of values that are used by the elastic net as hyperparameters.
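The exact values returned by get_l1_ratios() are not listed here. As a rough illustration of what such a hyperparameter list can look like, the sketch below builds an evenly spaced grid between 0 and 1. The function name `example_l1_ratios` and the spacing are assumptions for the example, not the package's values. In scikit-learn's elastic net, an l1_ratio of 0 corresponds to a pure L2 penalty and 1 to a pure L1 penalty.

```python
def example_l1_ratios(n_steps=10):
    """Hypothetical grid: n_steps + 1 evenly spaced l1_ratio values in [0, 1]."""
    return [i / n_steps for i in range(n_steps + 1)]


l1_ratio_grid = example_l1_ratios()
```

A list like this can be passed wherever a List[float] of l1_ratio candidates is expected.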

pathway_forte.prediction.binary.train_elastic_net_model(x, y, outer_cv_splits, inner_cv_splits, l1_ratio, model_name, max_iter=None, export=True)

Train an elastic net model via nested cross validation, given expression data.

Uses a defined hyperparameter space for l1_ratio.

Parameters

x (numpy.array) – 2D matrix of pathway scores and samples
y (list) – class labels of samples
outer_cv_splits (int) – number of folds for cross validation split in outer loop
inner_cv_splits (int) – number of folds for cross validation split in inner loop
l1_ratio (List[float]) – list of hyper-parameters for l1 and l2 priors
model_name (str) – name of the model
max_iter (Optional[int]) – maximum number of iterations; defaults to 1000 to ensure convergence
export (bool) – Export the models using joblib

Return type

Tuple[List[float], List[float]]

Returns

A list of AUC-ROC scores
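The export parameter above states that models are exported using joblib. The sketch below shows what saving and reloading a fitted scikit-learn model with joblib looks like in general. The file name, the temporary directory, and the plain `LogisticRegression` model are assumptions for the example; the paths and naming scheme this package actually uses are not documented here.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Fit a small stand-in model on synthetic data.
x_demo, y_demo = make_classification(n_samples=60, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(x_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; one file per trained model.
    path = os.path.join(tmp_dir, "example_model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The reloaded model should reproduce the original predictions.
same_predictions = bool((restored.predict(x_demo) == model.predict(x_demo)).all())
```

Exported models should be reloaded with a compatible scikit-learn version, since joblib files are pickles of the estimator objects.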