Functional Class Scoring

This module contains the functional class scoring methods implemented in PathwayForte.

Currently, these are GSEA and single-sample GSEA (ssGSEA).

pathway_forte.pathway_enrichment.functional_class.create_cls_file(gene_expression_file, normal_sample_file, tumor_sample_file, data)

Create a categorical (e.g., tumor vs. normal) class file in .cls format for input into GSEA.

Parameters
  • gene_expression_file (str) – Text file containing expression values for each gene from each sample.

  • normal_sample_file (str) –

  • tumor_sample_file (str) –

  • data
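For reference, a categorical .cls file for GSEA has three lines: the number of samples, the number of classes and a literal 1; a line starting with # that names the classes; and one class label per sample, in the same order as the sample columns of the expression file. The sketch below builds that content from lists of normal and tumor sample IDs. The helper name make_categorical_cls and the labels Normal and Tumor are assumptions for illustration, not PathwayForte's implementation.

```python
from typing import Iterable, List


def make_categorical_cls(
    sample_ids: List[str],
    normal_ids: Iterable[str],
    tumor_ids: Iterable[str],
) -> str:
    """Build a two-class GSEA .cls file as a string (hypothetical helper).

    ``sample_ids`` must follow the column order of the expression file.
    """
    normal, tumor = set(normal_ids), set(tumor_ids)
    labels = []
    for sample in sample_ids:
        if sample in normal:
            labels.append("Normal")
        elif sample in tumor:
            labels.append("Tumor")
        else:
            raise ValueError(f"Sample {sample!r} is neither a normal nor a tumor sample")

    lines = [
        f"{len(labels)} 2 1",  # number of samples, number of classes, constant 1
        "# Normal Tumor",      # class names
        " ".join(labels),      # one class label per sample, in column order
    ]
    return "\n".join(lines) + "\n"
```

For example, make_categorical_cls(["s1", "s2", "s3"], ["s1"], ["s2", "s3"]) returns "3 2 1\n# Normal Tumor\nNormal Tumor Tumor\n", which can then be written to disk with open(path, "w").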

pathway_forte.pathway_enrichment.functional_class.run_gsea(gene_exp, gene_set, phenotype_class, permutations=500, output_dir='/home/docs/checkouts/readthedocs.org/user_builds/pathwayforte/checkouts/latest/data/results/gsea')

Run GSEA on a given dataset with a given gene set.
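At its core, GSEA walks down a ranked gene list and keeps a running sum that steps up at gene set members (weighted by their correlation with the phenotype) and steps down at all other genes; the enrichment score (ES) is the running sum's maximum deviation from zero. The sketch below computes the ES for one gene set and one already-ranked list, with a weight exponent of 1 by default. It is illustrative only: the function name is hypothetical, and it omits the permutation testing (cf. the permutations parameter) and the normalization to NES that a full GSEA run performs.

```python
from typing import Sequence, Set


def enrichment_score(
    ranked_genes: Sequence[str],
    correlations: Sequence[float],
    gene_set: Set[str],
    weight: float = 1.0,
) -> float:
    """Running-sum enrichment score of one gene set (illustrative sketch).

    ``ranked_genes`` and ``correlations`` are sorted from the gene most
    positively to the gene most negatively correlated with the phenotype.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_misses = len(hits) - sum(hits)
    # Normalizing constant: summed |correlation|^weight over gene set members.
    hit_norm = sum(abs(c) ** weight for c, hit in zip(correlations, hits) if hit)
    if hit_norm == 0 or n_misses == 0:
        raise ValueError("Gene set needs scored members and must not cover the whole ranking")

    running_sum = 0.0
    max_deviation = 0.0
    for correlation, hit in zip(correlations, hits):
        if hit:
            running_sum += abs(correlation) ** weight / hit_norm  # step up at a member
        else:
            running_sum -= 1.0 / n_misses  # step down at a non-member
        if abs(running_sum) > abs(max_deviation):
            max_deviation = running_sum
    # Positive: set concentrated at the top of the ranking; negative: at the bottom.
    return max_deviation
```

For the ranking A, B, C, D with correlations 2, 1, -1, -2, the set {A, B} sits at the top of the list and reaches a maximum deviation of 1.0, while {C, D} reaches -1.0.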

Parameters
  • gene_exp (str) – file with gene expression data

  • gene_set (str) – .gmt file containing pathway gene sets

  • phenotype_class (str) – .cls file containing the class labels

  • permutations (int) – number of permutations

  • output_dir (str) – output directory

Returns

pathway_forte.pathway_enrichment.functional_class.filter_gsea_results(gsea_results_path, source, kegg_manager=None, reactome_manager=None, wikipathways_manager=None, p_value=None, absolute_nes_filter=None, geneset_set_filter_minimum_size=None, geneset_set_filter_maximum_size=None)

Get the top- and bottom-ranked pathways from GSEA results.

Parameters
  • gsea_results_path (str) – path to GSEA results in .tsv file format

  • source – pathway source (i.e., database name or ‘MPath’)

  • kegg_manager (Optional[Manager]) – KEGG manager

  • reactome_manager (Optional[Manager]) – Reactome manager

  • wikipathways_manager (Optional[Manager]) – WikiPathways manager

  • p_value (Optional[float]) – maximum p-value allowed

  • absolute_nes_filter (Optional[float]) – filter by the magnitude (absolute value) of the normalized enrichment score (NES)

  • geneset_set_filter_minimum_size (Optional[int]) – minimum number of genes a gene set must contain to be kept

  • geneset_set_filter_maximum_size (Optional[int]) – maximum number of genes a gene set may contain to be kept

Return type

DataFrame

Returns

pathways with the highest and lowest significant enrichment scores
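The filtering and ranking described above can be approximated with pandas. This is a sketch under assumptions: the column names pathway_id, nes, pval and geneset_size are hypothetical, "absolute NES filter" is taken to mean keeping rows whose |NES| is at least the threshold, and pathway names are not resolved through the managers. PathwayForte's actual columns and comparison operators may differ.

```python
from typing import Optional

import pandas as pd


def filter_and_rank(
    results: pd.DataFrame,
    p_value: Optional[float] = None,
    absolute_nes_filter: Optional[float] = None,
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
) -> pd.DataFrame:
    """Filter GSEA results and sort them from highest to lowest NES (sketch)."""
    mask = pd.Series(True, index=results.index)
    if p_value is not None:
        mask &= results["pval"] <= p_value
    if absolute_nes_filter is not None:
        mask &= results["nes"].abs() >= absolute_nes_filter
    if min_size is not None:
        mask &= results["geneset_size"] >= min_size
    if max_size is not None:
        mask &= results["geneset_size"] <= max_size
    # Most positively enriched pathways first, most negatively enriched last.
    return results[mask].sort_values("nes", ascending=False).reset_index(drop=True)
```

Each filter is applied only when its argument is given, mirroring the Optional parameters above; the head and tail of the returned table then hold the top and bottom rankings.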

pathway_forte.pathway_enrichment.functional_class.merge_statistics(merged_pathways_df, dataset)

Get statistics for the pathways included in the merged gene sets DataFrame.

These include the proportion of pathways from each source database and the proportion of pathways deriving from two or more primary resources.

Parameters
  • merged_pathways_df (DataFrame) – DataFrame containing pathways from multiple databases

Returns

statistics on the contents of the merged dataset
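The two statistics described above can be computed from a mapping of each merged pathway to the primary resources it derives from. That dict-of-sets representation and the function name are hypothetical; PathwayForte reads this information from the merged DataFrame, whose exact layout is not documented here.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def merged_pathway_statistics(
    resources_by_pathway: Dict[str, Set[str]],
) -> Tuple[Dict[str, float], float]:
    """Return (proportion of merged pathways per resource, proportion from 2+ resources)."""
    total = len(resources_by_pathway)
    if total == 0:
        raise ValueError("No merged pathways given")

    # Count how many merged pathways include each primary resource.
    counts = Counter(
        resource
        for resources in resources_by_pathway.values()
        for resource in resources
    )
    per_resource = {resource: count / total for resource, count in counts.items()}

    # Merged pathways that combine two or more primary resources.
    multi = sum(1 for resources in resources_by_pathway.values() if len(resources) >= 2)
    return per_resource, multi / total
```

Because one merged pathway can derive from several resources, the per-resource proportions may sum to more than 1.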

pathway_forte.pathway_enrichment.functional_class.rearrange_df_columns(df)

Rearrange the order of the columns of a DataFrame.

Return type

DataFrame

pathway_forte.pathway_enrichment.functional_class.get_pathway_names(database, pathway_df, kegg_manager=None, reactome_manager=None, wikipathways_manager=None)

Get pathway names from database-specific pathway IDs.

Parameters
  • database (str) – database name

  • pathway_df (DataFrame) –

  • kegg_manager (Optional[Manager]) – KEGG manager

  • reactome_manager (Optional[Manager]) – Reactome manager

  • wikipathways_manager (Optional[Manager]) – WikiPathways manager

Returns

pathway_forte.pathway_enrichment.functional_class.pathway_names_to_df(filtered_gsea_results_df, all_pathway_ids, source, kegg_manager=None, reactome_manager=None, wikipathways_manager=None)

Get the pathway names for the given pathway IDs and return them as a DataFrame.

Parameters
  • filtered_gsea_results_df – filtered GSEA results

  • all_pathway_ids – list of pathway IDs

  • source – pathway source (i.e., database name or ‘MPath’)

  • kegg_manager (Optional[Manager]) – KEGG manager

  • reactome_manager (Optional[Manager]) – Reactome manager

  • wikipathways_manager (Optional[Manager]) – WikiPathways manager

Return type

DataFrame
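Conceptually, this step looks up a name for each pathway ID and attaches it to the results table. In the sketch below a plain dictionary stands in for the KEGG, Reactome and WikiPathways managers; the dictionary, the column names pathway_id and pathway_name, and the function name are assumptions for illustration and do not reflect the managers' API.

```python
from typing import Dict

import pandas as pd


def add_pathway_names(results: pd.DataFrame, id_to_name: Dict[str, str]) -> pd.DataFrame:
    """Return a copy of ``results`` with a ``pathway_name`` column (sketch).

    IDs missing from ``id_to_name`` get NaN, so unresolved pathways stay visible.
    """
    named = results.copy()  # leave the caller's DataFrame untouched
    named["pathway_name"] = named["pathway_id"].map(id_to_name)
    return named
```

Working on a copy keeps the filtered results reusable for other sources, and leaving unresolved names as NaN makes missing lookups easy to spot.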

pathway_forte.pathway_enrichment.functional_class.gsea_results_to_filtered_df(dataset, kegg_manager=None, reactome_manager=None, wikipathways_manager=None, p_value=None, absolute_nes_filter=None, geneset_set_filter_minimum_size=None, geneset_set_filter_maximum_size=None)

Get filtered GSEA results DataFrames.

pathway_forte.pathway_enrichment.functional_class.get_pathways_by_resource(pathways, resource)

Return the pathways belonging to a given resource.

Return type

list

pathway_forte.pathway_enrichment.functional_class.get_analogs_comparison_numbers(kegg_reactome_pathway_df, reactome_wikipathways_pathway_df, wikipathways_kegg_pathway_df, *, pathway_column='pathway_id')

Get the number of existing versus expected pairwise mappings.

pathway_forte.pathway_enrichment.functional_class.get_pairwise_mapping_numbers(kegg_pathway_df, reactome_pathway_df, wikipathways_pathway_df)

Get the number of existing versus expected pairwise mappings.

pathway_forte.pathway_enrichment.functional_class.get_pairwise_mappings(kegg_pathway_df, reactome_pathway_df, wikipathways_pathway_df)

Get pairwise mappings.

pathway_forte.pathway_enrichment.functional_class.compare_database_results(df_1, resource_1, df_2, resource_2, mapping_dict, check_contradiction=False)

Compare pathways in two enrichment result DataFrames to evaluate the concordance between similar pathways.

pathway_forte.pathway_enrichment.functional_class.get_matching_pairs(df_1, resource_1, df_2, resource_2, equivalent_mappings_dict)

Get equivalent pathways and their direction of change.
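Given enrichment results from two resources and a mapping of equivalent pathways between them, the direction of change can be read from the sign of each pathway's NES, and a pair is concordant when both pathways move the same way. The sketch below uses plain dictionaries (pathway ID to NES) and a hypothetical function name; it illustrates the idea and is not PathwayForte's implementation, which works on DataFrames.

```python
from typing import Dict, List, Tuple


def matching_pairs(
    nes_1: Dict[str, float],
    nes_2: Dict[str, float],
    equivalent: Dict[str, List[str]],
) -> List[Tuple[str, str, bool]]:
    """List equivalent pathway pairs present in both results and whether their directions agree."""
    pairs = []
    for pathway_1, equivalents in equivalent.items():
        if pathway_1 not in nes_1:
            continue  # not enriched in the first resource's results
        for pathway_2 in equivalents:
            if pathway_2 not in nes_2:
                continue  # equivalent pathway missing from the second results
            # Direction of change = sign of NES; an NES of exactly 0 counts as "down".
            concordant = (nes_1[pathway_1] > 0) == (nes_2[pathway_2] > 0)
            pairs.append((pathway_1, pathway_2, concordant))
    return pairs
```

The share of concordant pairs in the returned list gives a simple measure of agreement between the two resources.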

pathway_forte.pathway_enrichment.functional_class.run_ssgsea(filtered_expression_data, gene_set, output_dir='/home/docs/checkouts/readthedocs.org/user_builds/pathwayforte/checkouts/latest/data/results/ssgsea', processes=1, max_size=3000, min_size=15)

Run single-sample GSEA (ssGSEA) on a filtered gene expression dataset.

Parameters
  • filtered_expression_data (DataFrame) – filtered gene expression values for samples

  • gene_set (str) – .gmt file containing gene sets

  • output_dir (str) – output directory

Return type

SingleSampleGSEA

Returns

ssGSEA results, stored in the output directory
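Unlike GSEA, ssGSEA produces one score per sample and gene set. Genes are ranked by expression within the sample, and the score sums, over every position in the ranking, the difference between a cumulative distribution over gene set members (weighted by rank^alpha) and an unweighted cumulative distribution over the remaining genes. The sketch below follows that definition with the commonly used alpha = 0.25 and applies no further normalization. The function name is hypothetical, and this is not the code that run_ssgsea executes (which also applies the gene set size limits min_size and max_size).

```python
from typing import Dict, Set


def ssgsea_score(expression: Dict[str, float], gene_set: Set[str], alpha: float = 0.25) -> float:
    """Unnormalized single-sample enrichment score for one gene set (illustrative sketch)."""
    # Order genes from highest to lowest expression; the top gene gets rank N.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n_genes = len(ordered)
    ranks = {gene: n_genes - position for position, gene in enumerate(ordered)}

    members = [gene for gene in ordered if gene in gene_set]
    n_misses = n_genes - len(members)
    if not members or n_misses == 0:
        raise ValueError("Gene set must overlap the sample without covering all genes")

    hit_norm = sum(ranks[gene] ** alpha for gene in members)
    hit_cdf = miss_cdf = score = 0.0
    for gene in ordered:
        if gene in gene_set:
            hit_cdf += ranks[gene] ** alpha / hit_norm  # rank-weighted CDF of members
        else:
            miss_cdf += 1.0 / n_misses  # unweighted CDF of non-members
        # Sum the CDF difference at every position (not the maximum, as in GSEA).
        score += hit_cdf - miss_cdf
    return score
```

A gene set made of the sample's most highly expressed genes scores positive, and one made of its least expressed genes scores negative; repeating the call per sample column yields the sample-by-gene-set score matrix.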

pathway_forte.pathway_enrichment.functional_class.filter_gene_exp_data(expression_data, gmt_file)

Filter gene expression data to include only the genes found in the gene set file.

Parameters
  • expression_data (DataFrame) – gene expression values for samples

  • gmt_file (str) – .gmt file containing gene sets

Returns

Filtered gene expression data, with genes that do not appear in any gene set removed

Return type

pandas.core.frame.DataFrame
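A .gmt file stores one gene set per line as tab-separated fields: the set name, a description, then the member genes. Assuming the expression DataFrame is indexed by gene name (an assumption; the real function may expect a gene-name column instead), the filtering step can be sketched as follows. Both helper names are hypothetical.

```python
from typing import Iterable, Set

import pandas as pd


def gmt_genes(lines: Iterable[str]) -> Set[str]:
    """Collect every gene that appears in any gene set of .gmt-formatted lines."""
    genes: Set[str] = set()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # fields[0] is the set name and fields[1] the description; genes follow.
        genes.update(gene for gene in fields[2:] if gene)
    return genes


def filter_expression(expression: pd.DataFrame, genes: Set[str]) -> pd.DataFrame:
    """Keep only the rows (genes) that occur in at least one gene set (sketch)."""
    return expression[expression.index.isin(genes)]
```

Usage, with a hypothetical file name: with open("pathways.gmt") as handle: filtered = filter_expression(expression_df, gmt_genes(handle)). Separating parsing from filtering keeps the parser testable on in-memory lines.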