Two patients with the same diagnosis often show very different disease trajectories and treatment responses. For clinical and research teams, the goal is clear: identify meaningful patient subgroups that go beyond traditional labels.
One approach to this problem is to combine real-world clinical data from electronic health records (EHRs) with molecular datasets (like TCGA) and apply unsupervised learning to uncover hidden patterns. But this is where most efforts break down.
EHR data is fragmented across systems, inconsistent in coding (ICD, SNOMED, proprietary formats), and often buried in free text. At the same time, public datasets like TCGA are structured but lack the clinical context needed for real-world applications. Without a common representation, it becomes difficult to generate reliable patient features, making downstream clustering and stratification noisy and hard to reproduce.
In this session, we walk through how this can be addressed in practice.
We’ll show how standardizing clinical data using the OMOP Common Data Model, combined with LLM-assisted workflows, can transform fragmented datasets into consistent, analysis-ready patient cohorts. Using real examples, we’ll demonstrate how heterogeneous clinical tables can be converted into model-ready datasets that support meaningful patient clustering and discovery.
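To make the workflow concrete, here is a minimal sketch of the idea: start from an OMOP-style condition table, pivot it into a patient-by-condition feature matrix, and cluster patients on their condition profiles. The data is a toy example with invented patient and concept IDs, and KMeans stands in for whichever clustering method your analysis calls for.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy OMOP-style CONDITION_OCCURRENCE table: one row per (patient, condition).
# Column names follow the OMOP CDM; the IDs themselves are invented.
conditions = pd.DataFrame({
    "person_id":            [1, 1, 2, 2, 3, 3, 4],
    "condition_concept_id": [201826, 316866, 201826, 316866,
                             443392, 4329847, 443392],
})

# Pivot the long table into a binary patient-by-condition feature matrix:
# rows are patients, columns are condition concepts, cells are 0/1.
features = (
    conditions.assign(present=1)
    .pivot_table(index="person_id", columns="condition_concept_id",
                 values="present", fill_value=0)
)

# Cluster patients on their condition profiles.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(features)
print(dict(zip(features.index, labels)))
```

In this toy setup, patients 1 and 2 share one pair of conditions and patients 3 and 4 another, so the two groups land in separate clusters. The same pivot-then-cluster pattern scales to real OMOP tables once codes have been mapped to standard concepts.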
Scaling clinico-genomic data integration: Large pharmaceutical organizations working with external data providers used Polly to build interoperable clinico-genomic data products 6x faster.
Although purchased datasets are often labeled as "clean," they still lack interoperability—Polly's pipelines bridge this gap with robust integration and harmonization.
Information Retrieval: Drug safety monitoring teams used Polly's Knowledge Graph-powered co-scientist to conversationally retrieve the right cohorts and assess drug response, cutting discovery time by 70%.
If you’re working with complex biological data, you may be asking:
Can generative AI truly assist in scientific reasoning, not just data analysis?
What does it mean for hypothesis generation, literature review, or even designing experiments?
Could this accelerate—not replace—my discovery pipeline?
Whether you're skeptical, curious, or already experimenting with AI in your lab, this session is designed to ground your understanding in evidence, not speculation.