Functional Class Scoring

This module contains the functional class methods implemented in PathwayForte.

Currently this includes GSEA and ssGSEA.

pathway_forte.pathway_enrichment.functional_class.create_cls_file(gene_expression_file, normal_sample_file, tumor_sample_file, data)

Create a categorical class file (.cls format, e.g. tumor vs. normal) for input into GSEA.

Parameters

gene_expression_file (str) – Text file containing expression values for each gene from each sample.
normal_sample_file (str) –
tumor_sample_file (str) –
data –
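
For orientation, a categorical .cls file as read by GSEA has three lines: a header giving the number of samples, the number of classes and the constant 1; a line starting with # that names the classes; and one label per sample, in the column order of the expression file. The sketch below builds that text from lists of sample IDs. The helper name build_cls_text, its arguments and the labels "Normal" and "Tumor" are hypothetical and are not part of PathwayForte.

```python
def build_cls_text(sample_ids, normal_ids, tumor_ids):
    """Return the text of a two-class .cls file (hypothetical helper)."""
    normal, tumor = set(normal_ids), set(tumor_ids)
    labels = []
    for sample in sample_ids:
        if sample in normal:
            labels.append("Normal")
        elif sample in tumor:
            labels.append("Tumor")
        else:
            raise ValueError(f"Sample {sample!r} is in neither class list")
    # Class names are listed in order of first appearance among the labels,
    # so the second line stays consistent with the third.
    classes = list(dict.fromkeys(labels))
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"
```

The sample IDs passed as sample_ids are assumed to follow the column order of the gene expression file.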

pathway_forte.pathway_enrichment.functional_class.run_gsea(gene_exp, gene_set, phenotype_class, permutations=500, output_dir='/home/docs/checkouts/readthedocs.org/user_builds/pathwayforte/checkouts/latest/data/results/gsea')

Run GSEA on a given dataset with a given gene set.

Parameters

gene_exp (str) – file with gene expression data
gene_set (str) – .gmt file containing pathway gene sets
phenotype_class (str) – .cls file containing the class labels
permutations (int) – number of permutations
output_dir (str) – output directory

Returns

pathway_forte.pathway_enrichment.functional_class.filter_gsea_results(gsea_results_path, source, kegg_manager=None, reactome_manager=None, wikipathways_manager=None, p_value=None, absolute_nes_filter=None, geneset_set_filter_minimum_size=None, geneset_set_filter_maximum_size=None)

Get top and bottom rankings from GSEA results.

Parameters

gsea_results_path (str) – path to GSEA results in .tsv file format
source –
kegg_manager (Optional[Manager]) – KEGG manager
reactome_manager (Optional[Manager]) – Reactome manager
wikipathways_manager (Optional[Manager]) – WikiPathways manager
p_value (Optional[float]) – maximum p value allowed
absolute_nes_filter (Optional[float]) – filter by magnitude of normalized enrichment scores
geneset_set_filter_minimum_size (Optional[int]) – filter to include a minimum number of genes in a gene set
geneset_set_filter_maximum_size (Optional[int]) – filter to include a maximum number of genes in a gene set

Return type

DataFrame

Returns

list of pathways ranked as having the highest and lowest significant enrichment scores
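
The optional filters above can be read as successive row filters over the GSEA results table. The following is a minimal sketch of that idea using plain dictionaries instead of a DataFrame to stay dependency-free. The function name filter_results, the row keys term, nes, pval and geneset_size, and the choice of inclusive thresholds are all assumptions made for illustration; they are not taken from the PathwayForte source.

```python
def filter_results(rows, p_value=None, absolute_nes_filter=None,
                   min_size=None, max_size=None):
    """Keep rows that pass every filter that is not None (illustrative only)."""
    kept = []
    for row in rows:
        if p_value is not None and row["pval"] > p_value:
            continue  # p value above the allowed maximum
        if absolute_nes_filter is not None and abs(row["nes"]) < absolute_nes_filter:
            continue  # enrichment too weak in either direction
        if min_size is not None and row["geneset_size"] < min_size:
            continue  # gene set too small
        if max_size is not None and row["geneset_size"] > max_size:
            continue  # gene set too large
        kept.append(row)
    # Most positively enriched pathways first, most negatively enriched last.
    return sorted(kept, key=lambda r: r["nes"], reverse=True)
```

With all filters left as None, the sketch only sorts the rows by normalized enrichment score.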

pathway_forte.pathway_enrichment.functional_class.merge_statistics(merged_pathways_df, dataset)

Get statistics for pathways included in the merged gene sets DataFrame.

These include the proportion of pathways from each of the other databases and the proportion of pathways deriving from two or more primary resources.

Parameters

merged_pathways_df (DataFrame) – DataFrame containing pathways from multiple databases

Returns

statistics of contents in merged dataset
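
One possible reading of these statistics is sketched below: given the set of primary resources behind each merged pathway, compute the share of pathways that each resource contributes to and the share backed by two or more resources. The input layout (a dict from pathway ID to a set of resource names), the function name merged_statistics and the output key two_or_more are assumptions for illustration only; the actual function operates on a DataFrame whose layout is not described here.

```python
from collections import Counter


def merged_statistics(pathway_sources):
    """Return per-resource and multi-resource proportions (illustrative only)."""
    total = len(pathway_sources)
    if total == 0:
        return {}
    per_resource = Counter()
    multi_resource = 0
    for sources in pathway_sources.values():
        unique_sources = set(sources)
        per_resource.update(unique_sources)  # count each resource once per pathway
        if len(unique_sources) >= 2:
            multi_resource += 1
    stats = {resource: count / total for resource, count in per_resource.items()}
    stats["two_or_more"] = multi_resource / total
    return stats
```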

pathway_forte.pathway_enrichment.functional_class.rearrange_df_columns(df)

Rearrange order of columns.

Return type

DataFrame

pathway_forte.pathway_enrichment.functional_class.get_pathway_names(database, pathway_df, kegg_manager=None, reactome_manager=None, wikipathways_manager=None)

Get pathway names from database-specific pathway IDs.

Parameters

database (str) –
pathway_df (DataFrame) –
kegg_manager (Optional[Manager]) –
reactome_manager (Optional[Manager]) –
wikipathways_manager (Optional[Manager]) –

Returns

pathway_forte.pathway_enrichment.functional_class.pathway_names_to_df(filtered_gsea_results_df, all_pathway_ids, source, kegg_manager=None, reactome_manager=None, wikipathways_manager=None)

Get pathway names.

Parameters

filtered_gsea_results_df –
all_pathway_ids – list of pathway IDs
source – pathway source (i.e., database name or ‘MPath’)
kegg_manager (Optional[Manager]) – KEGG manager
reactome_manager (Optional[Manager]) – Reactome manager
wikipathways_manager (Optional[Manager]) – WikiPathways manager

Return type

DataFrame

pathway_forte.pathway_enrichment.functional_class.gsea_results_to_filtered_df(dataset, kegg_manager=None, reactome_manager=None, wikipathways_manager=None, p_value=None, absolute_nes_filter=None, geneset_set_filter_minimum_size=None, geneset_set_filter_maximum_size=None)

Get filtered GSEA results DataFrames.

pathway_forte.pathway_enrichment.functional_class.get_pathways_by_resource(pathways, resource)

Return pathways by resource.

Return type

list

pathway_forte.pathway_enrichment.functional_class.get_analogs_comparison_numbers(kegg_reactome_pathway_df, reactome_wikipathways_pathway_df, wikipathways_kegg_pathway_df, *, pathway_column='pathway_id')

Get number of existing versus expected pairwise mappings.

pathway_forte.pathway_enrichment.functional_class.get_pairwise_mapping_numbers(kegg_pathway_df, reactome_pathway_df, wikipathways_pathway_df)

Get number of existing versus expected pairwise mappings.

pathway_forte.pathway_enrichment.functional_class.get_pairwise_mappings(kegg_pathway_df, reactome_pathway_df, wikipathways_pathway_df)

Get pairwise mappings.

pathway_forte.pathway_enrichment.functional_class.compare_database_results(df_1, resource_1, df_2, resource_2, mapping_dict, check_contradiction=False)

Compare pathways in the DataFrames from enrichment results to evaluate the concordance in similar pathways.

pathway_forte.pathway_enrichment.functional_class.get_matching_pairs(df_1, resource_1, df_2, resource_2, equivalent_mappings_dict)

Get equivalent pathways and their direction of change.
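
As a rough illustration of pairing equivalent pathways and reporting their direction of change, the sketch below takes the sign of the normalized enrichment score as the direction and flags whether the two resources agree. Everything here is an assumption made for illustration: the function name matching_pairs, the representation of each result set as a dict from pathway ID to NES, the flat one-to-one shape of the mapping dict, and the placeholder IDs in the usage example.

```python
def matching_pairs(nes_1, nes_2, equivalents):
    """Pair equivalent pathways found in both result sets (illustrative only)."""
    pairs = []
    for id_1, id_2 in equivalents.items():
        if id_1 not in nes_1 or id_2 not in nes_2:
            continue  # the pair is only reported when both sides have a result
        # A score of exactly zero is treated as "down" in this sketch.
        direction_1 = "up" if nes_1[id_1] > 0 else "down"
        direction_2 = "up" if nes_2[id_2] > 0 else "down"
        pairs.append((id_1, direction_1, id_2, direction_2, direction_1 == direction_2))
    return pairs


# Placeholder IDs, not real database identifiers.
pairs = matching_pairs(
    {"K1": 1.9, "K2": -1.5, "K3": 2.0},
    {"R1": 2.3, "R2": 1.2},
    {"K1": "R1", "K2": "R2", "K3": "R9"},
)
```

A pair whose final element is False marks equivalent pathways that change in opposite directions in the two result sets.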

pathway_forte.pathway_enrichment.functional_class.run_ssgsea(filtered_expression_data, gene_set, output_dir='/home/docs/checkouts/readthedocs.org/user_builds/pathwayforte/checkouts/latest/data/results/ssgsea', processes=1, max_size=3000, min_size=15)

Run single-sample GSEA (ssGSEA) on a filtered gene expression data set.

Parameters

filtered_expression_data (DataFrame) – filtered gene expression values for samples
gene_set (str) – .gmt file containing gene sets
output_dir (str) – output directory

Return type

SingleSampleGSEA

Returns

ssGSEA results in respective directory

pathway_forte.pathway_enrichment.functional_class.filter_gene_exp_data(expression_data, gmt_file)

Filter gene expression data to include only the genes that are found in the gene set file.

Parameters

expression_data (DataFrame) – gene expression values for samples
gmt_file (str) – .gmt file containing gene sets

Returns

Filtered gene expression data, with genes that do not appear in any gene set removed

Return type

pandas.core.frame.DataFrame
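
The filtering step can be pictured as follows. A .gmt file is tab-separated with one gene set per line: the set name, a description, then the member genes. The sketch collects every gene named in the file and keeps only the expression rows for those genes. It uses a plain dict keyed by gene name in place of a DataFrame to stay dependency-free, and the helper names read_gmt_genes and filter_expression are hypothetical.

```python
def read_gmt_genes(gmt_lines):
    """Collect all gene names from the lines of a .gmt file (illustrative only)."""
    genes = set()
    for line in gmt_lines:
        fields = line.rstrip("\n").split("\t")
        # fields[0] is the gene set name and fields[1] its description.
        genes.update(gene for gene in fields[2:] if gene)
    return genes


def filter_expression(expression, gmt_genes):
    """Keep only expression rows whose gene appears in the gene sets."""
    return {gene: values for gene, values in expression.items() if gene in gmt_genes}
```

Because read_gmt_genes accepts any iterable of lines, an open file handle can be passed directly.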