Biological multi-omics data holds tremendous potential for reuse and discovery. Academic labs and organizations worldwide generate and publish an enormous amount of it. However, the data is scattered across multiple sources and lacks standardization. It is not FAIR, because the availability of data does not equate to its usability. Elucidata’s data warehouse, OmixAtlas, is a repository of FAIR (Findable, Accessible, Interoperable, Reusable) data. It is a collection of millions of datasets from public, proprietary, and licensed sources that have been curated, harmonized, and made ready for downstream machine learning and analytical applications. It offers one central location to access data across more than 26 data types from over 30 public repositories and licensed sources.
All datasets on Polly go through a 2-step process:
Data schema: the data available within OmixAtlas is curated within defined indexes on the basis of the information it contains. These indexes are:
OmixAtlas provides access to thousands of tissue-derived or disease-specific multi-omics datasets from multiple sources in one place. The data can be accessed and analyzed on the same computational infrastructure.
The datasets on Polly can be accessed through the GUI or programmatically with Polly Python.
The Polly Python library provides convenient access to the functionalities listed below through Python functions.
The library allows access to data in OmixAtlas from any computational platform, such as SageMaker or Polly.
The details of datasets can also be easily visualized in the UI.
• When working with large volumes of data across different omics datasets, do you need to group samples from multiple OmixAtlases so that data from different datasets and repositories is easier to analyze?
Look no further! We’ve got you covered with Cohorting, a feature that lets you group datasets or samples based on metadata of interest on Polly. It enables you to study the difference between two cohorts, for example diseased vs. normal or cancerous vs. non-cancerous cells.
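To make the idea of cohorting concrete, here is a minimal sketch in plain Python. It does not use the Polly Python API. The sample records, field names (`sample_id`, `dataset_id`, `disease`), and the `build_cohorts` helper are all hypothetical, chosen only to show how samples from different datasets can be grouped by one metadata field.

```python
from collections import defaultdict

# Hypothetical sample records drawn from two datasets.
# Field names are illustrative and are not Polly's schema.
samples = [
    {"sample_id": "S1", "dataset_id": "DS_A", "disease": "Normal"},
    {"sample_id": "S2", "dataset_id": "DS_A", "disease": "Crohn's disease"},
    {"sample_id": "S3", "dataset_id": "DS_B", "disease": "Normal"},
    {"sample_id": "S4", "dataset_id": "DS_B", "disease": "Crohn's disease"},
]


def build_cohorts(records, field):
    """Group sample IDs by the value of one metadata field."""
    cohorts = defaultdict(list)
    for record in records:
        cohorts[record[field]].append(record["sample_id"])
    return dict(cohorts)


# Two cohorts, each spanning both datasets.
cohorts = build_cohorts(samples, "disease")
```

Each resulting cohort mixes samples from both datasets, which is what makes a diseased vs. normal comparison across repositories possible.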
• Missing out on datasets while querying just because your search term does not match the ontological term?
For instance, when querying datasets for the disease IBD, the ideal result set must include datasets annotated with the diseases ‘inflammatory bowel diseases’, ‘inflammatory bowel diseases, Crohn's disease’, and ‘inflammatory bowel diseases 8’. However, when the keyword is not expanded under the hood, the query returns fewer valid hits.
To overcome this, Polly has the ‘Ontology Recommendations’ functionality integrated into Polly Python. This functionality aims to provide more valid hits with less user effort. The keyword is expanded implicitly, reducing manual intervention.
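The effect of implicit keyword expansion can be illustrated with a small, self-contained sketch. This is not how Polly implements Ontology Recommendations. The `EXPANSIONS` table, the `expand_query` and `search` helpers, and the dataset records are hypothetical, and the expanded terms are taken from the IBD example above.

```python
# Hypothetical expansion table. A real system would derive these
# terms from an ontology rather than a hard-coded dictionary.
EXPANSIONS = {
    "ibd": [
        "inflammatory bowel diseases",
        "inflammatory bowel diseases, Crohn's disease",
        "inflammatory bowel diseases 8",
    ],
}


def expand_query(term):
    """Return the search term plus any related ontology terms."""
    return [term] + EXPANSIONS.get(term.strip().lower(), [])


def search(datasets, term):
    """Return IDs of datasets whose disease matches any expanded term."""
    wanted = {t.lower() for t in expand_query(term)}
    return [d["dataset_id"] for d in datasets if d["disease"].lower() in wanted]


# Hypothetical dataset annotations.
datasets = [
    {"dataset_id": "DS1", "disease": "inflammatory bowel diseases"},
    {"dataset_id": "DS2", "disease": "inflammatory bowel diseases 8"},
    {"dataset_id": "DS3", "disease": "obesity"},
]

# Exact matching on the raw keyword finds nothing.
naive_hits = [d["dataset_id"] for d in datasets if d["disease"].lower() == "ibd"]

# Matching on the expanded term set finds the annotated datasets.
hits = search(datasets, "IBD")
```

In this toy setup the raw keyword misses every dataset, while the expanded query recovers the two annotated with inflammatory bowel disease terms.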
For example, if the user tries to query the dataset for the disease ‘obesity’, the result set of ontological recommendations would also include the searches for the terms -
• With tons of data generated and published in public repositories every year, do you find it challenging to find the right resources to curate and harmonize it to your needs?
Our Curation app is the solution to your curation woes. It helps you curate, standardize, and harmonize the clinical data that you’ve generated, in a double-blinded manner, to convert it into analysis-ready formats!
Along with standard metadata curation, we also offer custom metadata curation, in which users can curate a field of their choice, for instance cancer stage or BMI. The user can define the custom column header and the ontology to be used, if any.
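As a rough illustration of what a custom curated field involves, the sketch below adds a user-named column holding a normalized cancer stage extracted from free text. It is an assumption-laden toy and not the Curation app's method. The regular expression, the `curate_cancer_stage` and `add_custom_column` helpers, the column header, and the records are all hypothetical.

```python
import re

# Hypothetical pattern: "stage" followed by a Roman (I-IV) or Arabic (1-4) numeral.
STAGE_PATTERN = re.compile(r"\bstage\s*(iv|iii|ii|i|[1-4])\b", re.IGNORECASE)
ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def curate_cancer_stage(text):
    """Return a normalized label such as 'Stage III', or None if no stage is found."""
    match = STAGE_PATTERN.search(text)
    if match is None:
        return None
    raw = match.group(1).upper()
    return "Stage " + ARABIC_TO_ROMAN.get(raw, raw)


def add_custom_column(records, header, source_field, curator):
    """Return copies of the records with a new user-named curated column."""
    return [{**r, header: curator(r[source_field])} for r in records]


# Hypothetical free-text sample descriptions.
records = [
    {"sample_id": "S1", "description": "tumor stage III, female"},
    {"sample_id": "S2", "description": "Stage 2 adenocarcinoma"},
    {"sample_id": "S3", "description": "healthy control"},
]

curated = add_custom_column(
    records, "curated_cancer_stage", "description", curate_cancer_stage
)
```

The column header is passed in by the caller, mirroring the idea that the user defines the custom field name. Swapping in a different curator function would cover another field such as BMI.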
• Visualization apps
Public OmixAtlas is a repository of more than 1.5 million datasets and 4.1 million samples aggregated from 32 publicly available sources. In addition, managing in-house data at scale can also be done with our Enterprise OmixAtlas where proprietary data is standardized and curated. This helps in significantly reducing the time spent on processing datasets.
Benefits of Public OmixAtlas:
Benefits of Enterprise OmixAtlas:
Contact us if you want to learn more about using our 1.5 million curated datasets to train your models or to take advantage of our data-centric platform Polly, to find and analyze relevant datasets.