FAIR Data

FAIRifying Research Data: Your Resolution for 2023

Trisha Dhawan
January 3, 2023

Dear Reader, it’s 2023 already!

Looking back at 2022, it is clear that groundbreaking discoveries and advances in life science research, aided by artificial intelligence and machine learning (AI/ML), had a significant impact on multiple fronts. We live in times where the pace of innovation is unprecedented, and with each passing day our understanding of how to integrate AI/ML into life science research improves.

In the past few editions of Polly Bits, I talked extensively about cutting-edge developments in life science research driven by AI/ML. This new year promises exciting new products and advances. With the launch of AI-based tools such as ChatGPT towards the end of 2022, it is only fair to expect transformational technological change. Needless to say, the integration of technology into life science research is inevitable, and it pushes us to prioritize collaboration and rapid time-to-results.

In this edition, I want to emphasize the importance of Findable, Accessible, Interoperable, and Reusable (FAIR) research data in drug discovery and development. Integrating technology and automating drug discovery and development processes with AI/ML requires high-quality, structured data that can be easily stored, managed, and used. The attributes that make research data FAIR are critical for:

  • automating certain processes to save valuable resources;
  • enabling collaboration across disciplines;
  • hastening insight derivation.

None of these should be pursued in isolation; together they form a concerted effort towards accelerating drug discovery. Let’s see how!

The Big Data Problem in Biotechnology

The past decade or so saw a mammoth wave of innovation in biotechnology. As a result, what we can now accomplish in just a few hours with modern biomedical equipment and techniques was once unfathomable but has become commonplace.

Be it non-invasive genetic testing or cancer diagnosis, CRISPR/Cas9 gene editing, personalized medicine, stem-cell research, or synthetic biology, we have invented technologies that aid and enhance our understanding of how biology interacts with drugs. With the advent of new, high-throughput technology come enormous volumes of data. For example, one study estimated that for every 3 billion bases of human genome sequence, approximately 100 gigabases of data must be collected, implying a need for 2 to 40 exabytes of storage capacity by 2025.

This experimental data is precious and critical to further drug discovery. Yet current practices for storing, managing, and sharing data are not aligned with the ultimate goal of data availability and reusability. To give you a sense of the gravity of the problem: according to one survey, data scientists spend a whopping 80% of their time just collecting and organizing datasets, and only about 20% on mining the data for insights!

We are surrounded by data, but starved for insights.

This further emphasizes that it is time to adopt tools and software to store, manage and share data using FAIR guiding principles to enable scientists to make better use of it.

The Intervention

The FAIR guiding principles have been around for a while; however, implementing and complying with them is still a work in progress. Research data available in public and proprietary repositories remains largely unstructured and unusable.

Of late, funding agencies have recognized the problems associated with data sharing and reuse and have started taking steps to improve data warehousing and sharing.

In 2022, the OSTP and the NIH issued directives on data sharing for government-funded projects. The Office of Science and Technology Policy (OSTP) issued new guidelines mandating that data from federally funded research become accessible to all without an embargo starting in 2026. The Scientific Data Sharing Directive by the NIH aims to ensure the availability of high-quality data from NIH-funded projects. The Data Management and Sharing (DMS) Policy, effective January 25, 2023, urges researchers to include a plan and a budget for the management and storage of research data in their grants.

These are commendable steps that will encourage and ensure data availability and reusability, accelerating time-to-insights and collaboration, but they shouldn’t be limited to government-funded projects. More importantly, researchers should strive to adhere to the FAIR principles to contribute to the scientific community at large. A plethora of tools and services can be easily integrated into pipelines and processes to help handle, visualize, and analyze data, but only if the data is well structured to begin with. This is what digital transformation is all about!
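To make "well structured" a little more concrete, here is a minimal Python sketch of a dataset metadata record with one or two fields per FAIR attribute, plus a helper that reports which fields are still empty. The class name, field names, and example values are hypothetical and chosen only for illustration; they are not an official FAIR schema or the format of any particular tool.

```python
from dataclasses import dataclass, field


# Hypothetical field names for illustration only; not an official FAIR schema.
@dataclass
class DatasetRecord:
    identifier: str = ""                          # Findable: persistent ID such as a DOI
    keywords: list = field(default_factory=list)  # Findable: searchable descriptive terms
    access_url: str = ""                          # Accessible: where the data can be retrieved
    file_format: str = ""                         # Interoperable: open, documented format
    license: str = ""                             # Reusable: terms under which data may be reused
    provenance: str = ""                          # Reusable: how the data was generated


def missing_fair_fields(record: DatasetRecord) -> list:
    """Return the names of the fields that are still empty on this record."""
    return [name for name, value in vars(record).items() if not value]


# Example record with placeholder values (assumed, not real identifiers).
record = DatasetRecord(
    identifier="doi:10.0000/example",
    access_url="https://example.org/data.csv",
    file_format="csv",
)
print(missing_fair_fields(record))  # ['keywords', 'license', 'provenance']
```

A check like this only confirms that the metadata is present, not that it is accurate, but even that small step makes gaps visible before a dataset is shared.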

Using the FAIR principles to store, manage, and share data is a step toward being better prepared not only for a crisis such as a pandemic but also for expediting the discovery of safer, better drugs. This should be our resolution for this year!

You can read more about FAIR data here or you could email us at info@elucidata.io for more details.

This post was originally published in Polly Bits, our biweekly newsletter on LinkedIn.
