Are You Being FAIR to Your Data?

Swetabh Pathak
September 27, 2022

In 1907, a paper in The American Journal of Psychology described a peculiar phenomenon. The authors observed that looking at a word or phrase for too long can render it meaningless to the reader. In his doctoral thesis, published in 1962 at McGill, Leon James coined the phrase “Semantic Satiation” to describe this phenomenon. He explained it as a process in which meaningful words fall prey to irrelevance upon repetition. Working in the drug-target discovery space, we cannot help but wonder if the conversation around reproducible research is heading the same way.

Are You Playing FAIR?

The data revolution driven by the Human Genome Project, and later by high-throughput technologies, has propelled us towards a big data-driven discovery paradigm. As a consequence, a single experiment in pre-clinical research today can produce terabytes of complex data in hours or days, and the data accumulates in ever-growing public data repositories. The 3 Vs of Big Data (Volume, Velocity, and Variety), along with the complexity of the data, make manual data wrangling unfeasible and mandate FAIRification of data. They also drive home the fact that data access, use, and management are not isolated goals, but rather critical requirements for enabling innovation and discovery.

Ensure Accurate, Reproducible Results for Biopharma R&D

Implementing FAIR principles is critical for reusing legacy and newly generated data to tackle high-value healthcare challenges. The NIH and ELIXIR have been key supporters of efforts to establish standards for data curation and metadata annotation, so that Big Data can be reused and integrated in line with the FAIR principles. The recent OSTP directive was another commendable step in this direction.

“The FAIR principles put the onus on organizations that own and publish data to make it ‘machine-actionable’, i.e., a machine can read the metadata that describes the data, and this enables the machine to access and utilize the data for various applications.”
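To make "machine-actionable" a little more concrete, here is a minimal Python sketch of a program reading a metadata record and deciding whether it has enough information to find, fetch, and reuse the data. Everything in it is an assumption made for illustration: the field names, the example identifier and URL, and the mapping from each FAIR principle to required fields are hypothetical and do not come from any particular standard or from Polly.

```python
import json

# Hypothetical metadata record; every field name and value is illustrative.
RECORD = json.loads("""
{
  "identifier": "doi:10.0000/example-dataset",
  "title": "Example RNA-seq dataset",
  "access_url": "https://example.org/datasets/example.tsv",
  "format": "text/tab-separated-values",
  "license": "CC-BY-4.0",
  "organism": "Homo sapiens"
}
""")

# Assumed mapping from each FAIR principle to the fields a program would need.
REQUIRED_FIELDS = {
    "findable": ["identifier", "title"],
    "accessible": ["access_url"],
    "interoperable": ["format"],
    "reusable": ["license"],
}


def missing_fields(record):
    """Return, per FAIR principle, the required fields that are absent or empty."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[principle] = absent
    return gaps


def is_machine_actionable(record):
    """True when no required field is missing under the assumed mapping."""
    return not missing_fields(record)


if __name__ == "__main__":
    print(is_machine_actionable(RECORD))
    print(missing_fields({"identifier": "doi:10.0000/x", "title": "Untitled"}))
```

The point of the sketch is the division of labor: the publisher supplies structured metadata once, and any downstream tool can then check it and act on it without a person reading a README first.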

Currently, for most organizations, data generation, storage, analysis, and insight derivation are owned by different stakeholders. A significant bottleneck is the disconnect between these stakeholders. FAIRly stored, managed, and shared data facilitates data reuse and enables verification of the credibility and accuracy of both the data and the insights derived from it. Further, it enables interdisciplinary collaboration and innovation, accelerating drug discovery.

To ensure accurate and reproducible outcomes, the obvious solution is a comprehensive, interactive platform rather than an in-house mishmash of datasets and tools. Whether it is building high-throughput workflows with independent modules or creating cloud infrastructure for scalable data analysis, computing environments that interact effectively with FAIRified data to generate insights are the need of the hour.

Contact us if you want to learn more about using our 1.5 million FAIRified datasets to train your models, or about using our data-centric platform, Polly, to find and analyze relevant datasets.

On that note, Elucidata is hosting DataFAIR 2022, which will feature experts who have successfully transformed data practices within their organizations to de-risk AI/ML initiatives. The primary tracks are ‘Data-Centric AI’ and ‘FAIR Transformation’.



  1. E. Severance and M. F. Washburn, The American Journal of Psychology (1907).
