Harmonize and Curate Public Data on Polly

Polly curates and harmonizes datasets from public repositories: it processes measurements, links them to ontology-backed metadata, and transforms them into a unified data model to deliver pristine-quality data for faster insights.

Public Repositories Supported

ArrayExpress

How Does Polly Make Data from Public Repositories ML-ready?

Polly curates datasets of your choice from public repositories, harmonizes them, and delivers them in a pristine-quality, ML-ready format fit for downstream analysis.

The Polly Difference

Deeply Curated, Highest Quality Data

Polly retrieves datasets from public repositories, then curates and harmonizes the data you choose. Polly's harmonization engine processes measurements, links them to ontology-backed metadata, and transforms them into a Unified Data Model. Datasets are mapped to ontologies at the dataset and sample level using 6 standard fields (customizable up to ~30 fields). Commonly used ontologies are MeSH, BRENDA Tissue Ontology, NCBI Taxonomy, Cellosaurus, Cell Ontology, and PubChem, for disease, tissue, organism, cell line, cell type, and drug, respectively.
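As a rough illustration of the field-to-ontology mapping described above, the sketch below pairs each of the six standard fields with the ontology named for it. The key names (`disease`, `cell_line`, and so on), the helper functions, and the sample record are all hypothetical and chosen only for illustration; this is not Polly's actual schema or API.

```python
# Hypothetical sketch: the six standard fields and the ontology named
# for each in the text. Key names are assumptions, not Polly's schema.
FIELD_ONTOLOGIES = {
    "disease": "MeSH",
    "tissue": "BRENDA Tissue Ontology",
    "organism": "NCBI Taxonomy",
    "cell_line": "Cellosaurus",
    "cell_type": "Cell Ontology",
    "drug": "PubChem",
}


def annotate(sample_metadata: dict) -> dict:
    """Pair each standard field's value (or None) with its ontology name."""
    return {
        name: {"value": sample_metadata.get(name), "ontology": ontology}
        for name, ontology in FIELD_ONTOLOGIES.items()
    }


def missing_fields(sample_metadata: dict) -> list:
    """List the standard fields that have no value in this record."""
    return [name for name in FIELD_ONTOLOGIES if not sample_metadata.get(name)]


# Illustrative sample-level record (made-up values, not real Polly output).
sample = {
    "disease": "Breast Neoplasms",
    "organism": "Homo sapiens",
    "cell_line": "MCF-7",
}

annotated = annotate(sample)
gaps = missing_fields(sample)
```

Keeping the mapping in a single dictionary means that adding further fields, as with the customizable set of up to ~30, would only require new entries rather than new code paths.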

35+ TB

Biomedical data harmonized every month.

25+

Data types supported across multi-omics, assay, and clinical data.

30+

Data sources supported, including GEO, PRIDE, CPTAC, and more.

2500+

Diseases covered across oncology, metabolic disorders, immunology, and more.

Case studies

24x Faster Proteomics Research with PRIDE, $500K Savings

Oncology Company Achieves ~80% Faster Gene Target ID & Validation

Pharma-AI Collaboration Cuts Costs by ~$3M with Curated Public Data
