How to Perform Patient Stratification on Polly


Anurag Srivastava, Shruti Malavade
November 6, 2023

Patient stratification involves dividing a patient population into subgroups based on characteristics such as disease subtype, risk, or likely response to treatment. This approach is crucial for understanding the underlying pathology of a disease and enables physicians to tailor therapeutic interventions to individual patients. Patient stratification is key to precision medicine and to the discovery of novel therapeutic targets. Although AI models and multi-omics approaches have made patient stratification easier, significant challenges remain.

This blog post discusses the main challenges in performing patient stratification and how users can address them with the Polly harmonization engine.

Challenges Faced by the Industry in Performing Patient Stratification

Ideally, for personalized medicine to succeed, clinicians would know each patient's risk classification and the right drug to administer beforehand. In reality, patient stratification is difficult even with the multi-omics datasets at our disposal. Hurdles include poor data quality, small sample sizes, and limited data availability.

More than 50% of datasets in public repositories such as the Gene Expression Omnibus (GEO) lack annotations, and just 2% are harmonized.

Nearly 80% of available data is unstructured and unFAIR (not Findable, Accessible, Interoperable, and Reusable), which limits its usefulness. Poor data quality, unFAIR data, missing metadata, and small sample sizes can produce faulty predictive models and, in turn, suboptimal results.

One common strategy for patient stratification relies on cell type differentiation, which has proven effective in classifying patients with autoimmune diseases and cancer. However, implementing it at scale is difficult because of the data-related issues above. Another significant challenge is the lack of reliable biomarkers, as in pancreatic cancer. Disease heterogeneity adds a further layer of complexity, since primary tumor sites vary among patients. The key to overcoming these challenges lies in data quality and harmonization.

Polly by Elucidata addresses these challenges. Polly's harmonization engine provides the means to enhance data quality, harmonize multi-modal datasets, and train patient classifiers.

Polly's capabilities include data harmonization, metadata annotation (adding essential information such as tumor site), and seamless integration of various data types. Polly further addresses quality issues by harmonizing multi-modal data into an ML-ready resource, ensuring that all data is clean, consistently processed, linked to critical metadata, and statistically robust.

Our Approach:

1. Curate a Disease-Specific Atlas

The first step in patient stratification at scale using the cell type differentiation method is to aggregate a large multi-omics data corpus. This corpus provides a holistic view of the disease, and integrating multiple omics types and samples makes the resulting models both more robust and more clinically relevant. To build this data warehouse, we use the Polly harmonization engine, which can assemble disease-specific atlases of ML-ready datasets. Researchers can merge multi-modal datasets from diverse sources and harmonize them to common standards.
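As a rough illustration of what merging and harmonizing to common standards involves, the Python sketch below maps free-text cell-type labels to one vocabulary and restricts expression matrices to the genes every source shares before combining them. It does not use Polly's API; the synonym table, column names, and helper functions are hypothetical, and a real pipeline would also need consistent reprocessing and batch correction, which are omitted here.

```python
import pandas as pd

# Hypothetical controlled vocabulary mapping free-text cell-type labels from
# different sources to one standard term (illustrative entries only).
CELL_TYPE_SYNONYMS = {
    "hsc": "hematopoietic stem cell",
    "hematopoietic stem cells": "hematopoietic stem cell",
    "monocytes": "monocyte",
    "cd14+ monocyte": "monocyte",
}


def harmonize_metadata(meta: pd.DataFrame, source: str) -> pd.DataFrame:
    """Map cell_type labels to the standard vocabulary and record each sample's source."""
    out = meta.copy()
    raw = out["cell_type"].str.strip().str.lower()
    # Labels not in the vocabulary are kept (lower-cased) so a curator can review them.
    out["cell_type"] = raw.map(CELL_TYPE_SYNONYMS).fillna(raw)
    out["source"] = source
    return out


def merge_datasets(datasets):
    """Merge (expression, metadata, source) triples on the genes all datasets share.

    Expression frames are genes x samples; metadata frames are indexed by sample ID.
    """
    shared = set.intersection(*(set(expr.index) for expr, _, _ in datasets))
    genes = sorted(shared)
    expr_all = pd.concat([expr.loc[genes] for expr, _, _ in datasets], axis=1)
    meta_all = pd.concat([harmonize_metadata(meta, src) for _, meta, src in datasets])
    return expr_all, meta_all


# Toy example: two sources that use different labels and gene coverage.
expr_a = pd.DataFrame({"s1": [1.0, 2.0, 3.0]}, index=["GATA1", "SPI1", "CEBPA"])
meta_a = pd.DataFrame({"cell_type": ["HSC"]}, index=["s1"])
expr_b = pd.DataFrame({"s2": [4.0, 5.0]}, index=["SPI1", "CEBPA"])
meta_b = pd.DataFrame({"cell_type": ["CD14+ Monocyte"]}, index=["s2"])

expr, meta = merge_datasets([(expr_a, meta_a, "public"), (expr_b, meta_b, "proprietary")])
print(expr)
print(meta)
```

Restricting to shared genes loses coverage as more sources are added, which is one reason large atlases invest heavily in consistent reprocessing rather than simply intersecting whatever each source reported.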

2. Define Genetic Signatures

The next step is to define genetic signatures for each stage of cell differentiation using the harmonized data from the disease-specific atlas. We use the cell type annotations and the gene rankings within each dataset to build the classifier model, which classifies cell types by comparing gene pairs. Because pairwise comparisons alone cannot determine the cell differentiation stage, we apply additional modeling techniques. Once the genetic markers for each cell type at each differentiation step are defined, we acquire patient samples from public sources such as TCGA.
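The post does not name the classification algorithm. Ranking genes within each sample and comparing gene pairs is, however, the core idea of top-scoring-pair (TSP) classifiers, so the sketch below implements a minimal single-pair TSP for two cell types. It illustrates that technique under this assumption and is not Polly's model; the function names and toy data are hypothetical.

```python
import numpy as np


def top_scoring_pair(X, y):
    """Return the gene pair (i, j) whose relative order best separates classes 0 and 1.

    X: samples x genes expression matrix; y: 0/1 labels (e.g. two cell types).
    Score of a pair = |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|. Only the ordering
    within each sample is used, so any strictly increasing per-sample transform
    (log, library-size scaling) leaves the result unchanged.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when gene i is expressed below gene j in sample s.
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)
    p1 = less[y == 1].mean(axis=0)
    scores = np.abs(p0 - p1)
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    # Class to predict when X_i < X_j: the class in which that ordering is more frequent.
    class_if_less = 0 if p0[i, j] > p1[i, j] else 1
    return int(i), int(j), class_if_less, float(scores[i, j])


def predict(X, i, j, class_if_less):
    """Classify samples by checking whether gene i is below gene j."""
    X = np.asarray(X, dtype=float)
    return np.where(X[:, i] < X[:, j], class_if_less, 1 - class_if_less)


# Toy data: 4 samples x 3 genes; gene 0 is high in class 0, gene 1 is high in class 1.
X = np.array([[5.0, 1.0, 3.0],
              [6.0, 2.0, 1.0],
              [1.0, 7.0, 4.0],
              [2.0, 8.0, 0.0]])
y = np.array([0, 0, 1, 1])

i, j, cls, score = top_scoring_pair(X, y)
print(i, j, cls, score)                                        # 0 1 1 1.0
print(predict([[9.0, 2.0, 5.0], [1.0, 9.0, 5.0]], i, j, cls))  # [0 1]
```

Because the score depends only on within-sample orderings, it tolerates normalization differences between datasets, which helps when combining sources. The dense samples x genes x genes comparison array is only practical for a short list of candidate genes; extensions such as k-TSP (voting over several pairs) or one-vs-rest schemes are the usual way to handle more than two classes.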

3. Train Classifier Model

The classifier model is trained on harmonized datasets to categorize patients by their cell differentiation stage and to assign them to low- and high-risk groups. Differential expression analysis between the two patient cohorts then yields a list of differentially expressed genes, which forms the basis of a genetic signature for each patient population. Users can then apply transcription factor enrichment analysis to refine these signatures and identify potential drug targets.
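The post does not specify which statistical tests are used. Under that caveat, the sketch below shows one simple, common combination: Welch's t-test per gene between the high- and low-risk cohorts with Benjamini-Hochberg false-discovery-rate correction, followed by a one-sided hypergeometric test of whether a transcription factor's target genes are over-represented among the differentially expressed genes. The gene names, TF target set, and thresholds are hypothetical; for RNA-seq counts, dedicated tools such as DESeq2, edgeR, or limma-voom are usually preferred over a plain t-test.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def differential_expression(expr_high, expr_low, genes, alpha=0.05):
    """Welch's t-test per gene; inputs are samples x genes on a log scale.

    Returns {gene: mean log difference} for genes with BH-adjusted p < alpha.
    """
    _, p = stats.ttest_ind(expr_high, expr_low, axis=0, equal_var=False)
    q = benjamini_hochberg(p)
    log_fc = np.mean(expr_high, axis=0) - np.mean(expr_low, axis=0)
    return {g: float(fc) for g, fc, qv in zip(genes, log_fc, q) if qv < alpha}


def tf_enrichment_pvalue(de_genes, tf_targets, universe):
    """One-sided hypergeometric test: are a TF's targets over-represented among DE genes?"""
    universe = set(universe)
    de = set(de_genes) & universe
    targets = set(tf_targets) & universe
    k = len(de & targets)  # DE genes that are targets of this TF
    # P(X >= k) with M = all tested genes, n = TF targets, N = DE genes drawn.
    return float(stats.hypergeom.sf(k - 1, len(universe), len(targets), len(de)))


high = np.array([[5.0, 2.0, 3.0],
                 [5.2, 2.1, 3.5],
                 [4.9, 1.9, 2.5],
                 [5.1, 2.0, 3.0]])
low = np.array([[1.0, 2.0, 3.1],
                [1.1, 1.9, 2.9],
                [0.9, 2.1, 3.4],
                [1.2, 2.0, 2.6]])
de = differential_expression(high, low, genes=["GENE_A", "GENE_B", "GENE_C"])
print(de)  # only GENE_A passes, with a mean log difference of about 4.0

universe = [f"G{k}" for k in range(10)]
p_enrich = tf_enrichment_pvalue(["G0", "G1", "G2"], ["G0", "G1", "G2"], universe)
print(p_enrich)  # about 0.0083 (= 1/120): all 3 targets are among the 3 DE genes
```

In practice the enrichment test is repeated for every transcription factor in a target-set collection, so those p-values also need multiple-testing correction, for which the same `benjamini_hochberg` helper can be reused.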

4. Target Prioritization

To turn the output of patient stratification into precise targets, the candidate genes must be prioritized further. Our experts work with your team to rank drug targets by druggability scores and supporting literature evidence.
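The post does not say how druggability scores and literature evidence are combined, and in practice prioritization involves expert judgment. As a hypothetical illustration only, the sketch below ranks candidate genes by a weighted sum of a druggability score (assumed to be pre-scaled to 0-1) and log-scaled publication counts. The weights, the scaling, and the gene names are assumptions, not Polly's scoring scheme.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    druggability: float  # assumed to be pre-scaled to 0-1
    n_publications: int  # supporting papers for this gene in the disease context


def literature_score(n_publications: int, max_publications: int) -> float:
    """Log-scale publication counts to 0-1 so a few heavily studied genes don't dominate."""
    if max_publications == 0:
        return 0.0
    return math.log1p(n_publications) / math.log1p(max_publications)


def prioritize(candidates, w_drug=0.6, w_lit=0.4):
    """Rank candidates by a weighted sum of druggability and literature support (hypothetical weights)."""
    max_pubs = max(c.n_publications for c in candidates)
    scored = [
        (c.gene, round(w_drug * c.druggability + w_lit * literature_score(c.n_publications, max_pubs), 3))
        for c in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = prioritize([
    Candidate("GENE_A", druggability=0.9, n_publications=3),
    Candidate("GENE_B", druggability=0.4, n_publications=120),
    Candidate("GENE_C", druggability=0.8, n_publications=0),
])
print(ranked)  # [('GENE_A', 0.656), ('GENE_B', 0.64), ('GENE_C', 0.48)]
```

Log-scaling keeps a heavily studied gene from outranking a more druggable but less-studied one purely on publication volume: in the example, GENE_B has 40 times as many papers as GENE_A but still ranks second.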

Case Study: Identifying Potential Drug Targets for AML

We used Polly OmixAtlas and patient stratification to identify potential drug targets for acute myeloid leukemia (AML). To do this, more than 10,000 multi-omics datasets related to AML and normal hematopoiesis were consolidated from public and proprietary sources. Polly cleaned the datasets and linked them to harmonized metadata such as stage of differentiation, cell line, and cell type. This curation enabled multiple datasets to be integrated into high-quality multi-omics signatures.

Figures: The Polly Platform; Shortlisting genes that act as markers of differentiation; Training patient classifiers with harmonized AML datasets

  • 2+ data-centric, patient stratification-based targets were identified in AML using an integrative multi-omics approach
  • 6 months to identify and validate targets with Polly, significantly faster than the average 2-year timeline


Polly For Enhanced Quality of Patient Stratification

By using Polly, researchers can streamline the multi-omics analysis process from data retrieval through downstream analysis and interpretation. Polly saves time, provides expert guidance, and simplifies the complex tasks involved in multi-omics analysis, ultimately improving the efficiency and accuracy of research.

Polly aims to empower researchers by augmenting their capabilities, accelerating the pace of discovery, and facilitating breakthroughs in various scientific fields.
