OmixAtlas: A Data-Centric Approach to Research with ML-Ready Biomolecular Data

Jayashree

October 20, 2022

Biological multi-omics data holds tremendous potential for reuse and discovery. An enormous amount of data is being generated and made public by academic labs and organizations worldwide. However, the data is scattered across multiple sources and lacks standardization. It is not FAIR, because the availability of data does not equate to its usability. Elucidata’s data warehouse, OmixAtlas, is a repository of FAIR (Findable, Accessible, Interoperable, Reusable) data. It is a collection of millions of datasets from public, proprietary, and licensed sources that have been curated, harmonized, and made ready for downstream machine learning and analytical applications. It is one central location for accessing data across more than 26 data types from over 30 public repositories and licensed sources.

All datasets on Polly, Elucidata’s data-centric platform, go through a two-step process:

Data Engineering: The data is transformed to fit a proprietary schema that is uniform across data types. With everything in one consistent schema, users can query multiple data types on a single data infrastructure.
Metadata Harmonization: Each sample and dataset is tagged with terms from a uniform ontology.
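The harmonization step can be pictured with a small sketch. This is not Polly’s curation pipeline; the synonym table, the `harmonize` helpers, and the sample record below are all hypothetical and only illustrate the idea of mapping free-text labels to one canonical term.

```python
# Toy synonym table: free-text labels -> one canonical term.
# The entries are illustrative assumptions, not Polly's actual ontology.
CANONICAL_TERMS = {
    "ibd": "inflammatory bowel diseases",
    "inflammatory bowel disease": "inflammatory bowel diseases",
    "homo sapiens": "Homo sapiens",
    "human": "Homo sapiens",
}


def harmonize(label: str) -> str:
    """Return the canonical term for a raw label, or the cleaned label if unknown."""
    cleaned = " ".join(label.strip().split())  # collapse stray whitespace
    return CANONICAL_TERMS.get(cleaned.lower(), cleaned)


def harmonize_sample(sample: dict) -> dict:
    """Harmonize every metadata value of one sample record."""
    return {field: harmonize(value) for field, value in sample.items()}


raw_sample = {"disease": " IBD ", "organism": "human"}
print(harmonize_sample(raw_sample))
```

Labels that are missing from the table pass through unchanged apart from whitespace cleanup, so unknown terms remain visible for manual review.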

How Is Data Structured on OmixAtlas?

Data schema: The data in OmixAtlas is organized into defined indexes according to the information it contains. These indexes are:

Dataset-level metadata (index: files): Contains curated fields like drug, disease, tissue, organism, etc., for each dataset.
Sample-level metadata (indexes: gct_metadata, h5ad_metadata, and biom_metadata): Contains curated fields like cell lines, experimental design, etc., for each sample.
Feature-level metadata (indexes: gct_row_metadata, h5ad_data, and biom_data): Contains the gene/molecule symbol along with the feature intensity for each sample.
Variant-related data (index: variant_data): Contains the schema for variant-related information present in VCF files.
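The index layout above can be summarized as a simple lookup table. The index names are taken from the list above; the level labels used as keys and the `level_of` helper are hypothetical names chosen for this sketch.

```python
# Index names come from the list above; the dictionary keys are hypothetical labels.
OMIXATLAS_INDEXES = {
    "dataset": ["files"],
    "sample": ["gct_metadata", "h5ad_metadata", "biom_metadata"],
    "feature": ["gct_row_metadata", "h5ad_data", "biom_data"],
    "variant": ["variant_data"],
}


def level_of(index_name: str) -> str:
    """Return the metadata level that an index belongs to."""
    for level, indexes in OMIXATLAS_INDEXES.items():
        if index_name in indexes:
            return level
    raise KeyError(f"unknown index: {index_name}")


print(level_of("h5ad_metadata"))
```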

Accessing OmixAtlas

OmixAtlas provides access to thousands of tissue-derived or disease-specific multi-omics datasets from multiple sources in one place. The data can be accessed and analyzed on the same computational infrastructure.

The datasets on Polly can be accessed through the GUI or programmatically with Polly Python.

The Polly Python library exposes the following functionality through Python functions:

Creating and updating an OmixAtlas
Querying data and metadata
Downloading any dataset
Working with Workspaces
Working with the data schema
Ingesting data on OmixAtlas
Working with cohorts

The Polly Python library allows access to data in OmixAtlas from any computational platform, such as SageMaker or Polly itself.
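Since metadata is queried with SQL, it can help to see what such a query might look like. The sketch below only builds the query string and does not call Polly Python. The table name `my_atlas.files` and the column names are assumptions for illustration; the real table naming and the function that executes the query should be taken from the Polly Python documentation.

```python
def sql_literal(value) -> str:
    """Render a Python value as a SQL literal, escaping single quotes in strings."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_query(table: str, filters: dict, limit: int = 100) -> str:
    """Build a SELECT statement with equality filters joined by AND."""
    conditions = " AND ".join(
        f"{field} = {sql_literal(value)}" for field, value in filters.items()
    )
    where = f" WHERE {conditions}" if conditions else ""
    return f"SELECT * FROM {table}{where} LIMIT {int(limit)}"


# Hypothetical table and column names, used only to show the shape of a query.
query = build_query(
    "my_atlas.files",
    {"disease": "obesity", "organism": "Homo sapiens"},
    limit=10,
)
print(query)
```

The resulting string would then be handed to the library’s query function from any environment where Polly Python is installed and authenticated.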

Dataset details can also be explored through the UI.

General details such as the number of tissues, diseases, organisms, etc., are displayed on the summary page of an OmixAtlas.
OmixAtlas offers filtering options that let a user narrow down search results across metadata fields such as dataset ID, number of samples, description, drugs, cell type, cell line, disease, etc.
Datasets can also be downloaded through the UI.

Features of High Significance on Polly

• When handling large volumes of data across different omics datasets, do you need to group samples from multiple OmixAtlases so that data from different datasets and repositories is easier to analyze?

Look no further! Cohorting lets you group datasets or samples on Polly based on the metadata of interest. This feature enables you to study the difference between two cohorts, for example, diseased vs. normal or cancerous vs. non-cancerous cells.
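The idea behind cohorting can be sketched in a few lines of plain Python. This is not the Polly cohort API; the sample records, field names, and the `build_cohorts` helper are hypothetical and only show how samples from several datasets can be grouped by one metadata field.

```python
from collections import defaultdict


def build_cohorts(samples, field):
    """Group sample IDs by the value of one metadata field."""
    cohorts = defaultdict(list)
    for sample in samples:
        cohorts[sample.get(field, "unknown")].append(sample["sample_id"])
    return dict(cohorts)


# Hypothetical samples drawn from two different datasets.
samples = [
    {"sample_id": "S1", "dataset_id": "D1", "disease": "Diseased"},
    {"sample_id": "S2", "dataset_id": "D1", "disease": "Normal"},
    {"sample_id": "S3", "dataset_id": "D2", "disease": "Diseased"},
]

cohorts = build_cohorts(samples, "disease")
print(cohorts)
```

Because the grouping key is a harmonized metadata field rather than a dataset ID, samples S1 and S3 end up in the same cohort even though they come from different datasets.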

• Missing out on datasets while querying just because your search term does not match the ontological term?

For instance, when querying datasets for the disease IBD, the ideal result set should include datasets annotated with the diseases ‘inflammatory bowel diseases’, ‘inflammatory bowel diseases, Crohn’s disease’, and ‘inflammatory bowel diseases 8’. Without keyword expansion under the hood, however, the query returns fewer valid hits.

To overcome this, Polly has the ‘Ontology Recommendations’ functionality integrated into Polly Python. It aims to return more valid hits with less user effort: the keyword is expanded implicitly, which reduces manual intervention.
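A minimal sketch of keyword expansion, assuming a toy lookup table, is shown here. It is not how Ontology Recommendations is implemented; the `EXPANSIONS` table, helper names, and dataset records are hypothetical, and the expanded terms are the IBD annotations quoted above.

```python
# Toy expansion table built from the IBD terms quoted in the text.
# A real ontology is far larger; this table is an illustrative assumption.
EXPANSIONS = {
    "ibd": [
        "inflammatory bowel diseases",
        "inflammatory bowel diseases, Crohn's disease",
        "inflammatory bowel diseases 8",
    ],
}


def expand_keyword(keyword):
    """Return the keyword followed by any related ontology terms."""
    return [keyword] + EXPANSIONS.get(keyword.lower(), [])


def matching_datasets(keyword, datasets):
    """Return IDs of datasets whose disease label matches any expanded term."""
    terms = {term.lower() for term in expand_keyword(keyword)}
    return [d["dataset_id"] for d in datasets if d["disease"].lower() in terms]


# Hypothetical dataset-level metadata records.
datasets = [
    {"dataset_id": "D1", "disease": "inflammatory bowel diseases"},
    {"dataset_id": "D2", "disease": "obesity"},
    {"dataset_id": "D3", "disease": "inflammatory bowel diseases 8"},
]

print(matching_datasets("IBD", datasets))
```

Without the expansion step, the literal keyword "IBD" would match none of these records.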

For example, if the user queries datasets for the disease ‘obesity’, the ontology recommendations extend the search to related ontology terms as well.

• With tons of data generated and published in public repositories every year, do you find it challenging to find the right resources to curate and harmonize that data to your needs?

Our Curation app is the solution to your curation woes. It helps you curate, standardize, and harmonize the clinical data you’ve generated, in a double-blinded manner, and convert it into analysis-ready formats.

Along with standard metadata curation, we also offer custom metadata curation, in which users can curate a field of their choice, such as cancer stage or BMI. The user can define the custom column header and the ontology to be used, if any.

• Visualization apps

Spotfire integration is readily available on Polly to help you visualize data. It saves time because it runs on the same platform and there is no need to set up a separate integration. Datasets on Polly are richly annotated with a variety of metadata labels, so the data can be easily analyzed, visualized, and shared in the form of dashboards.
Cellxgene is a widely used tool for visualizing processed single-cell datasets. It is an interactive application used primarily for preliminary analysis and exploration of single-cell datasets.

Public and Enterprise OmixAtlas

Public OmixAtlas is a repository of more than 1.5 million datasets and 4.1 million samples aggregated from 32 publicly available sources. In-house data can also be managed at scale with Enterprise OmixAtlas, where proprietary data is standardized and curated. This significantly reduces the time spent on processing datasets.

Benefits of Public OmixAtlas:

Querying millions of records in a single request using SQL.
Analyzing data using algorithms of choice.
Accessing data using any computational infrastructure.
Converting data formats to suit analysis needs.
Augmenting internally generated data with public studies.

[Figure: Elucidata Public OmixAtlas]

Benefits of Enterprise OmixAtlas:

Using Polly’s technology for storing, curating, and managing biomolecular data at scale.
Finding and interpreting data in Enterprise OmixAtlas through built-in filters on Polly’s UI or other GUI-based platforms, as well as through Polly’s programmatic interface.

[Figure: Elucidata Enterprise OmixAtlas. Proprietary data standardized and curated, enabling analysis and novel insights]

Contact us if you want to learn more about using our 1.5 million curated datasets to train your models, or about using our data-centric platform, Polly, to find and analyze relevant datasets.

