GenAI, Biopharma R&D and El

Every day, we witness the emergence of new products and services centered on large language models. Generative AI (GenAI) models are increasingly handling routine tasks across industries, a sign of how widely the technology has already been adopted.

It is undeniable that GenAI will revolutionize the development, production, and commercialization of new treatments in the biopharma sector. The pivotal questions are "when" and "how." We tried to sketch answers to both in a candid conversation with Jainik Dedhia, our in-house GenAI expert and senior product manager at Elucidata. Read on for the key takeaways from the conversation.

Q: Let's begin by exploring the progress of GenAI. What are your thoughts on GenAI's recent progress and adoption in the biopharma industry?

Jainik: I've noticed a significant evolution in GenAI, particularly in language models, with advancements like GPT-4. However, when it comes to GenAI within life sciences, progress has been relatively slow. Reflecting on the broader picture, I've observed that other industries tend to embrace new technologies faster than our domain. For instance, while Docker became prevalent in some sectors around 2014, it took until around 2019 for bioinformatics professionals in our customer circles to start using it.

In terms of GenAI, there's a growing interest among researchers, fueled by buzz-worthy papers on models like scGPT, GenePT, and AlphaFold.

I believe AlphaFold stands out as a remarkable research achievement, mostly because it addressed a long-standing problem that seemed almost impossible to solve without advances in AI. Protein structures are complex enough to be challenging for human comprehension, and AlphaFold tackled that complexity successfully. Progress in pure research has also been notable in imaging, for example in predicting conditions from data like X-rays or CT scans. While it's unclear whether these advances are in production use for patient care, the potential of models such as scGPT in areas like drug and target discovery is evident.

Many are acknowledging the potential of GenAI in research and are keen to explore its applications. Conversations with our customers and peers suggest that most bioinformatics professionals are in the early stages of understanding these methodologies. They're delving into research papers and contemplating how to integrate GenAI into their daily tasks. Despite the excitement, successful implementation is expected to take some time, possibly a year or two, as researchers work to distill these cutting-edge concepts into tangible tools for everyday use.

Q: So, you feel the rate of adoption of GenAI in the biopharma industry is relatively slow. Why so? Could you point out some key challenges that hinder its integration into our industry?

Jainik: Diving into the world of GenAI adoption in biopharma has been quite a journey. One of the major challenges is ensuring the quality of what's being generated. With cutting-edge tools like AlphaFold and GPT, it's easy to be captivated by their seemingly magical capabilities. However, the reality is far more nuanced. These models are only as effective as the data we provide them. If we feed them unrefined or poorly curated data, they'll inevitably produce outputs that mirror the quality of that input. Essentially, it boils down to the principle of "garbage in, garbage out."

This graph (from one of our conference presentations) illustrates the importance of correct labeling for model performance.
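The effect of label quality can be sketched with a toy experiment. The example below is not the analysis behind the graph above; it is a minimal sketch on synthetic one-dimensional data, with a 1-nearest-neighbor classifier chosen only for simplicity and the noise rates picked arbitrarily. It flips a fraction of the training labels at random and measures accuracy on a cleanly labeled test set.

```python
import random


def make_data(rng, n_per_class):
    """Two synthetic 1-D Gaussian classes: label 0 centered at 0.0, label 1 at 4.0."""
    data = []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            data.append((rng.gauss(center, 1.0), label))
    return data


def flip_labels(rng, data, rate):
    """Return a copy of `data` with each label flipped with probability `rate`."""
    return [(x, (1 - y) if rng.random() < rate else y) for x, y in data]


def predict_1nn(train, x):
    """Predict the label carried by the training point closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(rng, 200)   # training set, labels to be corrupted
test = make_data(rng, 200)    # test set, labels kept clean

# Train on increasingly noisy labels, always evaluate on clean labels.
results = {
    rate: accuracy(flip_labels(rng, train, rate), test)
    for rate in (0.0, 0.1, 0.3)
}

for rate, acc in results.items():
    print(f"label noise {rate:.0%}: test accuracy {acc:.2f}")
```

On data like this, accuracy typically falls as the share of flipped labels grows, because a 1-nearest-neighbor model simply copies whatever label its closest training point carries.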

Then there's the whole infrastructure challenge. Biotech companies often find themselves in a bit of a bind when trying to set up the tech stack for GenAI adoption. They recognize the value in prototypes shared through research articles, but when they try to replicate these methods, they realize that their existing infrastructure cannot effectively power certain AI methodologies. Let's take the example of one of our customers. We've been providing them with both single-cell and bulk RNA sequencing data, which they used to fuel their internal knowledge graph or network. Initially, they were adamant about keeping this project entirely proprietary and self-built. However, a couple of months back, one of their bioinformatics team members mentioned that they're now seeking a managed AI infrastructure solution to support their knowledge graph. This shift highlights the struggle of managing the required AI infrastructure.

Another major hurdle is expertise. In biotech companies, there's a notable expertise gap in areas outside their scientific focus. These organizations understandably prioritize scientific endeavors as their core competency. However, challenges arise when they need to incorporate new technologies or embrace cloud engineering. Such initiatives can be daunting without in-house expertise, which makes it hard to manage projects efficiently or attract the right talent. Additionally, hiring dedicated personnel for these specific tasks may not be cost-effective, given that these endeavors aren't their primary focus. Balancing the need for technological advancement with the core scientific work poses a unique challenge, especially for smaller biotech companies.

Q: As an early adopter of GenAI, Elucidata has pioneered the integration of this cutting-edge technology into its workflows. Could you provide insights into the specific use cases where GenAI has been successfully implemented?

Jainik: Elucidata specializes in ingesting and providing harmonized biomedical data from diverse sources in standardized machine learning (ML)-ready formats. Our cloud platform, Polly, delivers harmonized biomedical data to accelerate key research milestones.

Over a year back, we upgraded our automated curation process by adopting a ChatGPT-based model, which has significantly improved efficiency and accuracy compared to our earlier BERT-based models. Initially, we used BERT models along with manual quality checks for curation. ChatGPT has some distinct advantages over BERT. It's a language model designed to follow instructions, which makes it versatile across a variety of tasks. It also demonstrates higher accuracy in task execution. Preliminary estimates indicate that this shift has doubled our curation speed, resulting in a substantial improvement in efficiency and cost-effectiveness. Anyone who wants to know more can check out the detailed blog on our website.

We also wanted to expand the horizons of GenAI in biomedical data analysis. We developed PollyGPT, a GPT model fine-tuned on biomedical data, to transform how users engage with multi-omics data. It interprets natural language prompts and conducts statistical analyses, catering to users without Python or SQL expertise. There is an interesting short video that showcases PollyGPT's ability to interpret natural language prompts for identifying normal/diseased samples, performing complex statistical analyses like PCA and differential gene expression, and generating insightful visuals.
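To give a sense of the kind of analysis such a prompt might resolve to, here is a minimal sketch of a differential expression summary in plain Python. It is not PollyGPT's implementation: the gene names, the expression values, and the helper functions `log2_fold_change` and `welch_t` are all hypothetical and chosen purely for illustration.

```python
import math
from statistics import mean, variance

# Toy expression values, made up for illustration: gene -> per-sample values.
normal = {
    "GENE_A": [10.0, 12.0, 11.0, 9.0],
    "GENE_B": [50.0, 48.0, 52.0, 51.0],
}
diseased = {
    "GENE_A": [40.0, 44.0, 38.0, 42.0],
    "GENE_B": [49.0, 53.0, 50.0, 47.0],
}


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount guards against division by zero."""
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


def welch_t(case, control):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(case) / len(case) + variance(control) / len(control)
    )
    return (mean(case) - mean(control)) / standard_error


# Per-gene summary: (log2 fold change, Welch t statistic), diseased vs. normal.
summary = {
    gene: (
        log2_fold_change(diseased[gene], normal[gene]),
        welch_t(diseased[gene], normal[gene]),
    )
    for gene in normal
}

for gene, (lfc, t_stat) in summary.items():
    print(f"{gene}: log2FC={lfc:+.2f}, t={t_stat:+.2f}")
```

In this toy data, GENE_A is strongly up in the diseased group while GENE_B barely moves. A real analysis would also need normalization, p-values, and multiple-testing correction, which this sketch leaves out.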

Q: How do you see the biopharma research-GenAI landscape evolving in the coming years?

Jainik: I believe single-cell RNA sequencing analysis holds immense potential, and with the advent of technologies like scGPT, there's a promising future. Many are discussing the use of data augmentation, though it's not widely implemented in production yet. Personally, I anticipate substantial growth in this industry, possibly exceeding a CAGR of 20%. While the initial growth might be gradual, it could accelerate to 100% year-on-year. It's challenging to predict when saturation might occur, but currently, we seem to be on a trajectory of rapid, exponential expansion.
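For a sense of scale, the two growth rates mentioned above compound very differently. The sketch below uses the standard compound-growth formula; the five-year horizon is an assumption made purely for illustration, not a forecast from the conversation.

```python
def growth_multiple(annual_rate, years):
    """Multiple of the starting value after compounding `annual_rate` for `years`."""
    return (1 + annual_rate) ** years


def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value, and duration."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumed five-year horizon, for illustration only.
print(f"{growth_multiple(0.20, 5):.2f}x")  # 2.49x at a 20% CAGR
print(f"{growth_multiple(1.00, 5):.2f}x")  # 32.00x at 100% year-on-year
print(f"{cagr(1.0, 32.0, 5):.0%}")         # 100%: the rate implied by 32x in 5 years
```

Over the assumed five years, a 20% CAGR roughly multiplies the starting value by 2.5, while doubling every year multiplies it by 32.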

Companies like Novartis, Cygnal, and Branch Bio are making waves with their evolving team compositions and approaches. Celsius and Owkin are even ahead of the curve, adopting a platform-oriented and scalable approach. However, the build vs. buy debate is real. Organizations often start off wanting to do everything in-house, but reality kicks in, and they realize they need external expertise because of resource constraints and cost considerations.

Conclusion

GenAI is a versatile technology, and a wide range of perspectives on it can coexist. It's essential to remain open-minded and adaptable as our understanding evolves. Over the past year, we've engaged in enlightening discussions with fellow early adopters, enthusiasts, and skeptics across various platforms. We conducted a GenAI workshop in December last year focused on implementing GenAI models in practical scenarios, and we are gearing up to host another one soon. Looking forward, I feel that scaling up solutions is the name of the game, with a growing recognition of the importance of infrastructure and engineering expertise. In case you missed the workshop, there is a video recording in which our CTO, Swetabh Pathak, spoke about the tools needed to set your GenAI initiatives up for success. Connect with us if you want to know more!
