A conversation with Prof. Kim, an oncologist and CEO of IMBDx—a publicly traded diagnostics company based in South Korea—left me reflecting on a harsh truth that many in the field grapple with.
By the time most cancer patients walk into his clinic, it’s already too late to intervene.
Early detection isn’t just a clinical aspiration. It’s the very promise of precision medicine. And yet, despite all our advancements, help comes a little too late for many of us.
That’s the promise. The vision is clear: early detection, personalized intervention, and better patient outcomes. But the question diagnostics companies are asking is how we actually get there.
As Peter put it, realizing this promise is hard. The biggest barrier? As Amit Agrawal, CSO of the Diagnostics Platform at Danaher Corp, pointed out, it’s a fractured ecosystem: fragmented data, siloed systems, and a lack of true interoperability between research and clinical workflows.
Even though many large diagnostics companies are now forming “research-use only” partnerships—connecting pharma, instrument manufacturers, research institutions, and diagnostic labs—a critical bottleneck remains.
Yes, tech infrastructure is essential to making these partnerships efficient, scalable, and secure. But let’s not miss the first rung of the ladder.
The primary mandate of a successful tech infrastructure should be to ensure Data Quality!
You can have secure pipelines, compliance-ready environments, and scalable architectures, but if the data flowing through them is inconsistent, fragmented, or incomplete, everything built on top of that poor-quality (or unknown-quality) data is compromised.
That’s the quiet reality facing many diagnostics teams today.
Across diagnostics, unknown data quality quietly becomes the hidden risk in everything from new product discovery to regulatory filings.
It's rarely discussed upfront, and contingency plans for dealing with poor data quality are rarer still. Yet it's often the reason behind slow clinical validation, poor model performance, and missed opportunities in product development.
Imagine acquiring millions of dollars’ worth of clinical and diagnostics data without an objective assessment of its quality. Only later do your data engineers discover (as is often the case) inconsistencies, missing fields, or formatting errors.
The data that should power your next diagnostic product or regulatory submission is instead fragmented, unreliable, and unusable.
And yet, because the stakes are high, diagnostics teams push forward, piecing together fragmented information to detect disease, refine biomarkers, and guide treatment decisions.
But when the foundation is shaky, even the best analytical tools can only go so far.
However, the science doesn't stop.
While teams struggle with missing pieces of the puzzle, competitors move faster, launching new tests, products, and regulatory filings.
Data is a responsibility.
Time and again, I've observed companies invest heavily in acquiring EHRs, multi-omics datasets, imaging, and clinical trial data. Yet without structured data quality processes, teams are often caught flat-footed because they have no objective estimate of the quality of the data they've acquired.
In the worst case, the data is beyond salvage.
In the best case, they identify the quality issues and fix them, at the cost of unanticipated delays in new product development.
A Gartner report found that data quality issues are one of the leading causes of poor AI project performance, costing businesses millions of dollars annually (Gartner). In diagnostics, the consequences extend beyond operational inefficiencies.
Not too long ago, we came across a scenario that, frankly, isn’t unusual in diagnostics.
A company working across inflammatory disease had amassed over 30 million patient records from multiple academic partners and vendors—data that spanned EHRs, omics, imaging, and clinical trials.
But the volume of data was only part of the story.
The real challenge was that they had no objective assessment of the data they had acquired. Was it a goldmine or a pile of garbage? Or something in between?
Gaps in key fields and non-standard labels made meaningful analysis nearly impossible without months of manual rework.
Among the most persistent issues? Timestamps. Dates ranged from 1888 to 2048, with missing day values, mismatched formats, and irregular time zones.
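To make that concrete, here is a minimal sketch of the kind of timestamp plausibility audit that surfaces such values early; the column name and the 1900-to-today window are illustrative assumptions, not details taken from the actual dataset.

```python
import pandas as pd

# Illustrative plausibility window; real bounds would come from the data contract.
MIN_DATE = pd.Timestamp("1900-01-01", tz="UTC")
MAX_DATE = pd.Timestamp.now(tz="UTC")

def audit_timestamps(df: pd.DataFrame, column: str = "event_date") -> pd.DataFrame:
    """Flag values that fail to parse or fall outside the plausible window."""
    parsed = pd.to_datetime(df[column], errors="coerce", utc=True)
    report = pd.DataFrame({
        "raw_value": df[column],
        "parsed": parsed,
        "unparseable": parsed.isna() & df[column].notna(),
        "out_of_range": parsed.notna() & ((parsed < MIN_DATE) | (parsed > MAX_DATE)),
    })
    return report[report["unparseable"] | report["out_of_range"]]

# Dates like 1888 or 2048 surface immediately in the audit output.
sample = pd.DataFrame({"event_date": ["1888-07-04", "2048-01-15", "2021-03-09", "not a date"]})
print(audit_timestamps(sample))
```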
This isn’t an isolated problem—it’s the kind of hidden complexity that quietly derails timelines and undercuts the value of even the most expensive datasets.
In this case, resolving it required standardizing 293 fields, validating for completeness and consistency, and harmonizing across data models like OMOP CDM.
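As a rough sketch of what that validation and harmonization can look like in code (the source column names here are hypothetical, and only a handful of OMOP CDM person-table fields are shown):

```python
import pandas as pd

# Hypothetical source-to-OMOP mapping; a real project covers hundreds of fields
# and maps raw values (e.g. "F"/"M") to OMOP concept IDs in a separate step.
OMOP_PERSON_MAP = {
    "patient_id": "person_id",
    "patient_sex": "gender_source_value",
    "birth_year": "year_of_birth",
}

def completeness_report(df: pd.DataFrame, required: list[str]) -> pd.Series:
    """Fraction of non-null values per required field, worst first."""
    return df[required].notna().mean().sort_values()

def to_omop_person(df: pd.DataFrame) -> pd.DataFrame:
    """Rename mapped source columns to OMOP person-table names and drop the rest."""
    return df.rename(columns=OMOP_PERSON_MAP)[list(OMOP_PERSON_MAP.values())]

source = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "patient_sex": ["F", None, "M"],
    "birth_year": [1975, 1988, None],
})
print(completeness_report(source, list(OMOP_PERSON_MAP)))
print(to_omop_person(source))
```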
And that’s the real point here: high-value diagnostics work doesn’t just rely on data—it relies on structured, usable, quality-assured data.
Regulatory bodies like the FDA and EMA have set strict data integrity guidelines, especially as AI becomes more embedded in clinical and diagnostic workflows. The shift toward Quality 4.0—which integrates AI and digital tools with quality management—is gaining momentum.
According to a Forbes Tech Council report, the diagnostics industry is beginning to embrace continuous data quality monitoring rather than treating it as an afterthought (Forbes).
In practice, this means building quality checks into data pipelines themselves, so issues surface as data arrives rather than months later during analysis or review.
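A recurring check can be as simple as computing a few metrics on every delivery and alerting when they drift past agreed thresholds; the metrics, thresholds, and column names below are illustrative assumptions, not a prescribed standard.

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dq_monitor")

# Illustrative thresholds; in practice these come from a data contract or SLA.
THRESHOLDS = {"completeness": 0.95, "duplicate_rate": 0.01}

def run_quality_checks(df: pd.DataFrame, key: str = "record_id") -> dict:
    """Compute basic quality metrics for one batch and log threshold breaches."""
    metrics = {
        "completeness": float(df.notna().mean().mean()),               # share of non-null cells
        "duplicate_rate": float(df.duplicated(subset=[key]).mean()),   # share of duplicate keys
    }
    for name, value in metrics.items():
        limit = THRESHOLDS[name]
        ok = value >= limit if name == "completeness" else value <= limit
        (log.info if ok else log.warning)("%s=%.3f (limit %.3f)", name, value, limit)
    return metrics

# In practice this would run on every new data delivery, e.g. from a scheduler.
batch = pd.DataFrame({"record_id": [1, 2, 2], "lab_value": [5.1, None, 4.8]})
run_quality_checks(batch)
```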
We discussed these challenges with Dmitrii Calzago from Danaher Corp in our recent webinar: how fragmented, unstructured data limits discovery, and how even organizations like Danaher struggle to enforce quality standards at scale. The discussion made one thing clear: data quality should not be an afterthought. For diagnostics companies, it is the foundation for accurate testing, biomarker discovery, and regulatory success.
And we’re not stopping there.
Our next session in April will take a deeper dive into the nuts and bolts of building scalable, reusable data products. If the last session was about understanding the problem, this one is about fixing it.
Reliable research goes beyond powerful models and extensive datasets—it starts with trustworthy data behind every insight, prediction, and decision. Even the best tools fall short if the data lacks integrity. Ensuring high-quality data leads to better science, more confident decisions, and faster product development with optimized costs.
Because in the end, data quality is the key to accurate diagnoses, advancing research, and improving patient outcomes.