Faster Insights on Omics Data Signatures with Polly Discover

Yogesh Lakhotia, Omnya Mohamed Izzeldin

February 12, 2024

What are the upregulated and downregulated genes in response to a treatment?
Are there specific gene signatures associated with disease subtypes or stages?
How are signalling pathways affected by genetic mutations?
How does your in-house data compare with the publicly available data?

These are fundamental questions when analyzing gene expression data to identify candidate genes and biomarkers associated with diseases. However, answering them with public databases is far from trivial. Data quality and variability remain persistent concerns because experimental protocols, sample sizes, and platforms differ between studies. These factors introduce noise and bias, so finding and extracting meaningful gene signatures from the available data can feel like searching for a needle in a haystack.

Challenges While Exploring Public Bulk RNA-seq Data

Bioinformaticians face several challenges when exploring publicly available bulk RNA-seq data. These challenges arise from the complexity and volume of the data, as well as the need to ensure data quality and extract meaningful biological insights. A few notable roadblocks include:

Data Heterogeneity: Publicly available RNA-seq data often come from different laboratories, platforms, and experimental conditions. This heterogeneity makes it difficult to compare and integrate datasets effectively.

Inconsistency in Data Quality and Preprocessing: For instance, GEO (Gene Expression Omnibus) hosts a multitude of gene expression profiles from various experiments, platforms, and sources. Of these, only 2.9% of the records (studies, in layman’s terms) have been curated retrospectively. Researchers must therefore apply rigorous quality control and preprocessing steps to make the data suitable for analysis.

Lack of Transparency: Inadequate documentation and clarity in data processing and analysis pipelines pose challenges to the interpretation, optimization, and comparability of RNA-seq data across studies, potentially undermining its reliability and utility in scientific research.

Our Solution: Polly

Elucidata's data harmonization platform, Polly, tackles the challenges of data heterogeneity in open-source databases by integrating and standardizing diverse datasets. Polly ensures data quality through rigorous preprocessing and provides transparent documentation of the analysis pipelines, enabling researchers to derive reliable insights efficiently.

Omics Data Signatures — How does Polly make Data ML-ready?

Feature	Description
Metadata Harmonization and Data Standardization	Polly's harmonization engine standardizes and harmonizes data related to samples and experimental conditions.
Stringent Quality Checks in Data Ingestion and Processing	Rigorous quality checks during the data ingestion and processing stages identify and rectify errors or anomalies.
Customizable Processing	Data processing pipelines can be tailored to meet the unique requirements of different research projects and applications.
Ensuring Transparency in the End-to-End Process	The steps, parameters, and methods applied in the process are documented, which facilitates understanding and reproducibility of analyses.

These high-quality datasets form a solid foundation for extracting relevant molecular signatures. For further exploration and analysis of these signatures, the platform also provides Polly Discover.

What is Polly Discover?

Polly Discover is an analysis module on the platform that helps users find, extract, and explore biologically important signatures from relevant curated datasets, as well as from comparisons (of cohorts) within datasets. The module provides interactive visualizations that facilitate the interpretation of expression results. Users can enhance these results by incorporating existing knowledge bases and integrating them into meta-analysis methods, machine learning applications, and other tools. For those seeking more advanced visualizations, the data can be streamed to tools like Spotfire using APIs.

Polly Discover - Key Features

High-quality metadata curation customized to research needs, with human-readable comparison names grouped into appropriate categories for easy findability.
Full control over the data processing pipelines used, so that all data is comparable with in-house findings.
360-degree findability journeys (based on genes, pathways, and other metadata fields) to search across public and in-house data.
Fast turnaround times and predictable delivery timelines through tech-enabled processes.
Discover robust and consistent gene expression signatures across various comparisons.
Integrate seamlessly with other open-source knowledge bases to enrich signatures.

Use Case: Finding the Gene Signatures Associated with Ulcerative Colitis in a Few Clicks.

A researcher studying ulcerative colitis aimed to identify specific gene signatures linked to the disease. By comparing their in-house bulk RNA-seq data with publicly available information, they sought to validate their findings and pinpoint potential targets with greater confidence.

To begin, datasets from sources such as GEO and ArrayExpress were audited to find all ulcerative colitis-related datasets, which were then stored in an Atlas. Public and in-house data were processed using the same pipeline, enabling users to generate and compare insights from both seamlessly.

With Polly Discover,

The datasets were deeply curated with the Polly Harmonization Engine to make the following key fields available to users: disease, tissue, drug, cell line, cell type, mouse/rat strain, experimental factors, comparison types, and more. This curation enabled users to find relevant curated datasets within minutes.
Each dataset was carefully curated to identify relevant groups and suitable comparisons. For instance, within the GSE112057 dataset, comparisons included Crohn’s Disease vs. normal, Crohn’s Disease vs. colitis, and Polyarticular Arthritis vs. colitis, among others. For each of these comparisons, differentially expressed genes (computed with DESeq2) and enriched pathways from MSigDB are precomputed and stored in Polly’s Atlas. This streamlined approach makes it convenient and efficient to identify gene signatures and grasp the functional significance of the differentially expressed genes.

In this case study, we picked 5 datasets in which ulcerative colitis samples are compared with normal samples. Here’s how one dataset can be explored with Polly Discover on Polly:

A curated comparison makes it possible to identify genes that are known to be biologically relevant to ulcerative colitis. Here, the comparison includes 55 control samples and 43 perturbation samples and yields 837 upregulated genes.

The differentially expressed genes in the dataset can be analyzed further by visualizing a volcano plot of each gene's log fold change and p-value. The gene list can be downloaded and compared with in-house proprietary bulk RNA-seq data for validation.
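
As a rough illustration of what a volcano plot encodes, the sketch below labels each gene as up, down, or not significant and then intersects the upregulated set with an in-house list. The cutoffs (|log2FC| of at least 1, adjusted p-value below 0.05), the gene names, and all values are assumptions made for this example; they are not Polly's export format or thresholds.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    gene: str
    log2_fc: float
    padj: float


def classify(result, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label one gene the way a volcano plot usually colours it."""
    if result.padj < padj_cutoff and result.log2_fc >= lfc_cutoff:
        return "up"
    if result.padj < padj_cutoff and result.log2_fc <= -lfc_cutoff:
        return "down"
    return "ns"  # not significant under the assumed cutoffs


# Made-up rows standing in for a downloaded gene list (placeholder names).
public_results = [
    DEResult("GENE_A", 2.4, 0.001),
    DEResult("GENE_B", -1.8, 0.010),
    DEResult("GENE_C", 0.3, 0.600),
    DEResult("GENE_D", 1.2, 0.030),
]

# Hypothetical in-house list of upregulated genes.
in_house_up = {"GENE_A", "GENE_D", "GENE_E"}

labels = {r.gene: classify(r) for r in public_results}
public_up = {gene for gene, label in labels.items() if label == "up"}

# Genes upregulated in both the public comparison and the in-house data.
validated = sorted(public_up & in_house_up)
print(labels)
print(validated)
```

Keeping the cutoffs as function parameters makes it easy to check how sensitive the overlap is to the chosen thresholds.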

More robust validation of in-house findings can be achieved by cross-comparing log fold change (logFC) values across the 5 datasets. Consistent patterns of gene expression change across datasets help researchers identify more reliable gene signatures associated with ulcerative colitis.

Notably, all genes consistently demonstrate similar expression patterns across the various studies.

Upregulated genes across 5 datasets for the comparison 'Ulcerative Colitis vs. Normal'.

This approach strengthens the results by demonstrating that gene expression patterns in ulcerative colitis are consistent across studies conducted by different groups, despite heterogeneity in experimental conditions, data sources, and time points.
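
The idea of cross-comparing logFC values can be sketched in a few lines: keep only genes that appear in every dataset and whose logFC points in the same direction, above a cutoff, in all of them. The dataset labels, gene names, values, and the cutoff of 1.0 are hypothetical and chosen only to make the example runnable; this is not a description of how Polly Discover computes its view.

```python
def consistent_genes(logfc_by_dataset, min_abs_lfc=1.0):
    """Return (up, down): genes found in every dataset whose logFC passes
    the cutoff in the same direction in all of them."""
    datasets = list(logfc_by_dataset.values())
    # Genes measured in every dataset.
    shared = set(datasets[0]).intersection(*datasets[1:])
    up, down = [], []
    for gene in sorted(shared):
        values = [dataset[gene] for dataset in datasets]
        if all(v >= min_abs_lfc for v in values):
            up.append(gene)
        elif all(v <= -min_abs_lfc for v in values):
            down.append(gene)
    return up, down


# Invented logFC values for three hypothetical datasets.
logfc_by_dataset = {
    "DS1": {"GENE_A": 2.1, "GENE_B": -1.5, "GENE_C": 1.3},
    "DS2": {"GENE_A": 1.7, "GENE_B": -2.0, "GENE_C": -0.4},
    "DS3": {"GENE_A": 3.0, "GENE_B": -1.1, "GENE_C": 1.9, "GENE_D": 2.2},
}

up, down = consistent_genes(logfc_by_dataset)
print(up, down)
```

In this toy input, GENE_C is dropped because its direction flips in one dataset, and GENE_D is dropped because it is missing from two of the three datasets.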

With Polly Discover, identifying common genes across all curated datasets takes barely a minute. Further analysis can be done using open-source tools such as GOProfiler, NetworkAnalyst, and Cytoscape.

Downstream step	Tool	Output
Functional relevance of the gene sets	GOProfiler	Pathways impacted by the set of consistently upregulated genes
Drug repurposing	NetworkAnalyst	Drugs that can be used for a given gene target
Gene signaling regulation	NetworkAnalyst	Gene signaling regulation
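
For readers unfamiliar with how pathway impact is typically scored, here is a minimal sketch of a one-sided hypergeometric over-representation test, a common approach to this kind of enrichment question. It is a generic illustration, not a description of what any of the tools in the table do internally, and the gene identifiers and set sizes are invented.

```python
from math import comb


def overrepresentation_p(signature, pathway, background):
    """One-sided hypergeometric test: probability of seeing at least the
    observed overlap between signature and pathway by chance."""
    background = set(background)
    sig = set(signature) & background
    path = set(pathway) & background
    n_background, n_pathway, n_signature = len(background), len(path), len(sig)
    overlap = len(sig & path)
    total = comb(n_background, n_signature)
    # Sum the upper tail P(X >= overlap).
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_signature - i)
        for i in range(overlap, min(n_pathway, n_signature) + 1)
    )
    return overlap, tail / total


# Toy universe of 20 placeholder genes, a 5-gene pathway, a 4-gene signature.
background = {f"G{i}" for i in range(20)}
pathway = {"G0", "G1", "G2", "G3", "G4"}
signature = {"G0", "G1", "G2", "G10"}

overlap, p_value = overrepresentation_p(signature, pathway, background)
print(overlap, p_value)
```

In practice, p-values from many pathways would also need multiple-testing correction, which is omitted here for brevity.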

Using DisGeNET, the researchers identified the genes predominantly mutated in individuals with ulcerative colitis, namely NOD2, ATG16L1, IL23R, ABCB1, TNFSF15, STAT3, NR1I2, and TLR4. Their objective was to explore where these genes are differentially expressed under various biological conditions. With Polly Discover, they could search for and discover 99 distinct comparisons across biological conditions in which these genes were differentially expressed.
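
Conceptually, this kind of search filters a collection of precomputed comparisons down to those in which at least one query gene passes a differential expression cutoff. The sketch below shows that filter on invented data: the comparison names, the (logFC, adjusted p-value) pairs, and the cutoffs are all assumptions for illustration and do not reflect Polly's actual interface or results. Only the gene symbols come from the text above.

```python
QUERY_GENES = {
    "NOD2", "ATG16L1", "IL23R", "ABCB1", "TNFSF15", "STAT3", "NR1I2", "TLR4",
}


def comparisons_with_hits(comparisons, query_genes,
                          lfc_cutoff=1.0, padj_cutoff=0.05):
    """Map each comparison name to the query genes that are differentially
    expressed in it; comparisons without any hit are left out."""
    hits = {}
    for name, results in comparisons.items():
        matched = sorted(
            gene for gene in query_genes
            if gene in results
            and abs(results[gene][0]) >= lfc_cutoff
            and results[gene][1] < padj_cutoff
        )
        if matched:
            hits[name] = matched
    return hits


# Invented (logFC, adjusted p-value) pairs for three hypothetical comparisons.
comparisons = {
    "comparison_1": {"NOD2": (1.6, 0.01), "STAT3": (0.2, 0.70)},
    "comparison_2": {"TLR4": (-2.1, 0.002), "IL23R": (1.4, 0.04)},
    "comparison_3": {"ABCB1": (0.5, 0.30)},
}

hits = comparisons_with_hits(comparisons, QUERY_GENES)
print(hits)
```

Using the absolute logFC means both upregulation and downregulation count as a hit, which matches the idea of "differential expression" in either direction.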

Impact

1. By using Polly Discover, the researcher was able to validate the in-house findings of their ulcerative colitis study, saving 70% of the time required by traditional methods.

2. The researcher efficiently identified gene signatures and enriched pathways associated with the disease, enhancing their understanding of ulcerative colitis.

3. With a few clicks, the researcher swiftly identified 99 distinct comparisons across biological conditions showing differential expression of the key genes predominantly mutated in individuals with ulcerative colitis.

Conclusion

Polly Discover on Elucidata's Polly simplifies the complexities of transcriptomics data analysis, providing researchers with a one-stop solution. By addressing challenges in publicly available RNA-seq data, Polly Discover ensures high-quality, harmonized data for efficient exploration.

The ulcerative colitis use case illustrates this. Through Polly's harmonization engine, researchers can compare in-house bulk RNA-seq data with public data, ensuring high confidence in target identification. The platform's curated datasets, comparisons, and precomputed gene signatures streamline the process and make data exploration efficient.

Connect with us or reach out at info@elucidata.io to learn more.
