Binary Prediction

Prediction of binary classes such as tumor vs. normal patients.

Module for Elastic Net regression with nested cross validation.

This workflow trains an elastic net model for a binary classification task (e.g., tumor vs. normal patients). Training uses nested cross validation, and the number of folds in both the outer and inner loops can be selected. The model can easily be swapped, since most models in scikit-learn (the machine learning library used by this package) require the same input.
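The nested cross validation described above can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not this package's implementation: the helper name `nested_cv_auc`, the choice of `LogisticRegression` with an elastic net penalty, the scaling step, and the synthetic data are all assumptions made for the example. The inner loop selects `l1_ratio` and the outer loop estimates AUC-ROC on folds that were not used for that selection.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_auc(x, y, outer_cv_splits=3, inner_cv_splits=3,
                  l1_ratios=(0.1, 0.5, 0.9), max_iter=1000):
    """Hypothetical helper: return one AUC-ROC score per outer fold."""
    inner_cv = StratifiedKFold(n_splits=inner_cv_splits, shuffle=True, random_state=0)
    outer_cv = StratifiedKFold(n_splits=outer_cv_splits, shuffle=True, random_state=1)

    # Assumed model: elastic net penalized logistic regression on scaled features.
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, max_iter=max_iter),
    )

    # Inner loop: choose l1_ratio by cross-validated AUC-ROC.
    search = GridSearchCV(
        model,
        param_grid={"logisticregression__l1_ratio": list(l1_ratios)},
        scoring="roc_auc",
        cv=inner_cv,
    )

    # Outer loop: refit the whole search on each training split and
    # score the selected model on the held-out split.
    return cross_val_score(search, x, y, scoring="roc_auc", cv=outer_cv)


# Synthetic stand-in for a (samples x pathway scores) matrix and binary labels.
x, y = make_classification(n_samples=120, n_features=10, random_state=0)
scores = nested_cv_auc(x, y)
print(scores)
```

Because any scikit-learn estimator with `fit` and a scoring method can be placed in the pipeline, swapping the model only requires changing the estimator and the parameter grid.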

pathway_forte.prediction.binary.ssgsea_nes_to_df(ssgsea_scores_csv, classes_file, removed_random=None)

Create a DataFrame of Normalized Enrichment Scores (NES) from ssGSEA of TCGA expression data.

Parameters
  • ssgsea_scores_csv – Text file containing normalized ES for pathways from each sample

  • test_size – Default test size is 0.25

  • removed_random (Optional[int]) – Percentage of the DataFrame to remove

pathway_forte.prediction.binary.get_l1_ratios()

Return a list of values that are used by the elastic net as hyperparameters.
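As a rough illustration of what such a hyperparameter list can look like, the sketch below builds an evenly spaced grid of `l1_ratio` values. The function name `example_l1_ratios` and the step size are hypothetical; the values actually returned by `get_l1_ratios()` are not stated here. In scikit-learn's elastic net models, `l1_ratio=0` corresponds to a pure L2 penalty and `l1_ratio=1` to a pure L1 penalty.

```python
def example_l1_ratios(step=0.1):
    """Hypothetical helper: evenly spaced l1_ratio values from 0 to 1 inclusive."""
    n_steps = round(1 / step)
    # Round each value to avoid floating point noise such as 0.30000000000000004.
    return [round(i * step, 10) for i in range(n_steps + 1)]


ratios = example_l1_ratios()
print(ratios)
```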

pathway_forte.prediction.binary.train_elastic_net_model(x, y, outer_cv_splits, inner_cv_splits, l1_ratio, model_name, max_iter=None, export=True)

Train elastic net model via a nested cross validation given expression data.

Uses a defined hyperparameter space for l1_ratio.

Parameters
  • x (numpy.array) – 2D matrix of pathway scores and samples

  • y (list) – class labels of samples

  • outer_cv_splits (int) – number of folds for cross validation split in outer loop

  • inner_cv_splits (int) – number of folds for cross validation split in inner loop

  • l1_ratio (List[float]) – list of hyper-parameters for l1 and l2 priors

  • model_name (str) – name of the model

  • max_iter (Optional[int]) – maximum number of iterations; defaults to 1000 to ensure convergence

  • export (bool) – Export the models using joblib

Return type

Tuple[List[float], List[float]]

Returns

A list of AUC-ROC scores