Comparing gene signatures against available datasets has proven to be a powerful technique for biopharma R&D teams working on drug discovery, biomarker identification, development, and personalized medicine.
This technique lets researchers analyze the expression levels of large numbers of genes in samples from individuals with a particular condition or disease and compare them with a gene signature: a conserved cluster of genes whose expression levels are most strongly associated with that condition.
This gene signature can then be used to search public databases of gene expression data for drugs or compounds that revert the disease signature, indicating a potential therapeutic effect.
However, extracting associated signatures from public databases can be challenging because sources differ in their processing pipelines, syntaxes, schemas, and metadata annotations. We address these challenges through Polly’s RNA-Seq OmixAtlas.
This blog discusses how users can compare signatures using Polly's RNA-Seq OmixAtlas.
Polly is a biomedical data platform for life sciences R&D, primarily delivering bulk and single-cell RNA-seq data, along with 24 other data types. It delivers 155 TB of FAIR and ML-ready biomedical data from ~30 different public and proprietary sources to customers. Polly’s RNA-Seq OmixAtlas (OA) contains curated RNA-seq datasets collected from Gene Expression Omnibus (GEO). This richly curated resource gives researchers a good base for finding datasets whose transcriptional profiles are similar to their gene sets of interest.
All datasets on Polly are:
The first step in comparing gene signatures is to create a query in which the gene of interest is searched against a dataset to identify a closely associated gene cluster. To generate a query signature, the following steps are required:
Alternatively, you can contact Polly experts, who will work with your scientists to customize these steps as needed, capture transcriptome profiles, and generate queries (gene signature vectors) that run on Polly’s signature database. The query consists of gene clusters that were significantly differentially expressed in the experiment, along with their log fold change, p-values, and adjusted p-values.
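As a rough illustration of what such a query might look like in code, the sketch below keeps only the significantly differentially expressed genes and maps each to its log fold change. The `DEResult` type, the `build_query_signature` helper, the cutoffs, and the gene names are all hypothetical and chosen for illustration; this is not Polly's API or its actual filtering criteria.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    """One row of a differential expression table (hypothetical structure)."""
    gene: str
    log_fc: float
    p_value: float
    adj_p_value: float


def build_query_signature(results, adj_p_cutoff=0.05, min_abs_log_fc=1.0):
    """Keep significant genes and map each to its log fold change.

    The cutoffs are assumed defaults, not values taken from Polly.
    """
    return {
        r.gene: r.log_fc
        for r in results
        if r.adj_p_value < adj_p_cutoff and abs(r.log_fc) >= min_abs_log_fc
    }


# Toy values invented for illustration only.
results = [
    DEResult("GENE_A", 2.3, 1e-6, 1e-4),
    DEResult("GENE_B", -1.8, 2e-5, 3e-3),
    DEResult("GENE_C", 0.2, 0.4, 0.6),    # dropped: small effect, not significant
    DEResult("GENE_D", 1.5, 0.01, 0.08),  # dropped: adjusted p-value above cutoff
]
signature = build_query_signature(results)
print(signature)
```

Filtering on the adjusted p-value rather than the raw p-value is a common way to control for multiple testing, which is why it is used as the cutoff in this sketch.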
Example of a query: given an input gene set and its log fold change values, search for the datasets whose differential expression results show the highest cosine similarity scores with the input genes.
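The example query can be sketched in a few lines of Python. The snippet below assumes each signature is a plain mapping from gene name to log fold change and computes cosine similarity over the genes shared by the query and each dataset. The function names, dataset names, and values are hypothetical, and restricting the comparison to shared genes is an assumption made here, not a description of how Polly scores signatures.

```python
import math


def cosine_similarity(query, reference):
    """Cosine similarity over the genes present in both signatures."""
    shared = set(query) & set(reference)
    if not shared:
        return 0.0
    dot = sum(query[g] * reference[g] for g in shared)
    norm_q = math.sqrt(sum(query[g] ** 2 for g in shared))
    norm_r = math.sqrt(sum(reference[g] ** 2 for g in shared))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_datasets(query, signatures):
    """Return (dataset, score) pairs, most similar first."""
    scores = {name: cosine_similarity(query, sig) for name, sig in signatures.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy log fold change values invented for illustration only.
query = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
signatures = {
    "dataset_1": {"GENE_A": 1.8, "GENE_B": -1.2, "GENE_C": 0.4},
    "dataset_2": {"GENE_A": -2.0, "GENE_B": 1.0, "GENE_C": -0.5},
    "dataset_3": {"GENE_A": 0.1, "GENE_B": 0.9, "GENE_D": 2.0},
}
ranked = rank_datasets(query, signatures)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Cosine similarity compares the direction of the two log fold change vectors rather than their magnitude, so a dataset with the same pattern of up- and down-regulation scores close to 1 even if its effect sizes are smaller overall.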
This signature database can now be queried to identify datasets with transcriptional profiles similar to the query signature. For instance, users can run complex SQL queries to identify:
We used signature reversal and multivariate gene expression signatures to identify potential drug combinations for COVID-19. To do this, we compiled, processed, and curated publicly available transcriptomics data from COVID-19 studies and drug signatures from LINCS. All datasets were ingested through Polly's proprietary curation pipeline, enriched with ontology-backed metadata, and engineered into a queryable .gct format.
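To make the idea of signature reversal concrete, here is a minimal sketch that scores single drug signatures against a disease signature and keeps those that are strongly anti-correlated with it. It is a simplified illustration only: it does not model drug combinations or the multivariate approach used in the study, and the `reversal_candidates` helper, the threshold, and the drug and gene names are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity over the genes present in both signatures."""
    genes = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in genes)
    norm_a = math.sqrt(sum(a[g] ** 2 for g in genes))
    norm_b = math.sqrt(sum(b[g] ** 2 for g in genes))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reversal_candidates(disease, drug_signatures, threshold=-0.5):
    """Return drugs scoring at or below the threshold, strongest reversal first.

    The threshold is an assumed cutoff chosen for this example.
    """
    scored = [(name, cosine(disease, sig)) for name, sig in drug_signatures.items()]
    hits = [pair for pair in scored if pair[1] <= threshold]
    return sorted(hits, key=lambda pair: pair[1])


# Toy log fold change values invented for illustration only.
disease = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 1.0}
drug_signatures = {
    "drug_x": {"GENE_A": -1.9, "GENE_B": 1.4, "GENE_C": -0.8},  # opposes the disease pattern
    "drug_y": {"GENE_A": 1.0, "GENE_B": -1.0, "GENE_C": 0.5},   # mimics the disease pattern
    "drug_z": {"GENE_A": -0.5, "GENE_B": -1.0, "GENE_C": -1.0}, # only weakly opposed
}
candidates = reversal_candidates(disease, drug_signatures)
for name, score in candidates:
    print(f"{name}: {score:.3f}")
```

A score near -1 means the drug shifts expression in the opposite direction to the disease across the shared genes, which is the pattern that signature reversal looks for.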
Polly delivers curated, ML-ready biomedical molecular data to accelerate drug discovery. Hosting a rich repository of ~45,000 RNA-Seq datasets, Polly is a customizable platform for comprehensive analysis of integrated biomedical data.
Want to perform gene signature comparisons effectively? Talk to us!