A Deep Dive into Single-cell Analysis in Life Sciences R&D

Single-cell RNA-sequencing (scRNA-seq) is a next-generation sequencing (NGS) approach designed for the genome-wide measurement of transcriptomic information in individual cells. The development of advanced plate-based and microfluidic platforms, coupled with high-throughput sequencing methods, has enabled the quantification of single-cell gene expression profiles in a massively parallel manner, scaling up to hundreds of thousands of cells in a single experiment.

The analysis of high-throughput scRNA-seq data has proven revolutionary for life sciences research. It has enabled the discovery of rare cell populations, particularly in disease contexts; the revision of previously proposed interaction networks and the identification of novel regulatory relationships between genes, made possible by the increased resolution of cell-to-cell variability; and the tracking of distinct cell populations along their trajectories in development, disease, and therapy.

This blog discusses the cutting-edge applications of scRNA-seq, highlights the challenges associated with single-cell analysis, and outlines measures to address those challenges.

Importance of Single-cell Analysis

Single-cell analysis plays a pivotal role in biopharma R&D by offering insights into cellular heterogeneity and function that bulk measurements cannot provide. By examining individual cells, researchers can uncover subtle differences that are averaged out in bulk data, and these differences are often crucial for understanding disease mechanisms, drug responses, and therapeutic development.

This granularity enables precise characterization of cell populations, identification of rare subtypes, and dissection of complex cellular interactions. Single-cell technologies support the identification of biomarkers for patient stratification and personalized medicine, improving the efficiency of drug discovery and development. They also help elucidate drug resistance mechanisms, paving the way for the design of more effective therapies. In this way, single-cell analysis speeds up the translation of basic research findings into clinical applications.

Cutting-Edge Applications of Single-cell Data Analysis

Single-cell technologies have already been used to improve the efficiency of the drug discovery and development process in multiple ways: from more efficient drug screening and target identification through highly multiplexed functional screens, to improved disease models that enable a finer understanding of the underlying molecular mechanisms, to the characterization of drug-tolerant cell subpopulations.

Some of the cutting-edge applications of scRNA-seq include:

1. Drug Screening and Target Identification

The application of scRNA-seq in drug discovery and target screening has accelerated the identification of promising therapeutic targets and streamlined the drug development process. For discovery, novel high-throughput screening (HTS) approaches use single cells as “targets” for examining the effects of potential drug molecules. The resulting single-cell expression data can be analyzed to generate an unbiased and detailed view of the response to the administered drugs at the gene level while accounting for cell-to-cell variability. Additionally, library multiplexing techniques in scRNA-seq can extend standard HTS tests to profile hundreds of compounds across multiple doses, time points, and cell types, making the assay much more comprehensive than a conventional screen.

Single-cell techniques have also been successfully applied in target identification, particularly for cancers. For example, Abdelfattah et al. performed an integrative analysis of 201,986 single cells (glioma, immune, and other stromal cells) isolated from 44 samples from 18 patients with low- and high-grade glioma, including glioblastoma (GBM). They identified S100A4 as a novel therapeutic target in GBM. Deletion of S100A4 in non-cancer cells reprogrammed the immune landscape and significantly improved survival (Abdelfattah et al. 2022).

2. Characterizing Drug Resistance

Drug resistance is the principal limiting factor in the treatment of certain cancers and various infectious diseases. Single-cell technologies are emerging as a powerful tool for studying the biological mechanisms of drug resistance at cellular resolution. Analysis of the gene expression profiles of resistant cells can reveal novel gene sets, biological processes, and pathways not previously associated with drug tolerance. This type of data can also guide screening and targeting strategies: computational drug prediction methods can use it to identify novel candidates for targeting the resistant cells, and patients stratified by markers of tolerant cell populations can be enrolled in clinical trials of combination therapies (Aissa et al. 2021).

3. Assessing Cellular Heterogeneity in Disease

ScRNA-seq can also enable the unbiased detection of rare cell types that drive pathobiology. Single-cell technologies have been useful both in providing detailed knowledge of underlying disease mechanisms and in investigating novel therapeutic approaches for a range of complex diseases, including cancer, neurodegenerative diseases, inflammatory and autoimmune diseases, and infectious diseases.

For instance, a cancer metastasis study using single-cell analysis of circulating tumour cells (CTCs) revealed the spatial heterogeneity and immune-evasion mechanism of CTCs in hepatocellular carcinoma (HCC). It identified the chemokine CCL5 as an important mediator of CTC immune evasion and highlighted a potential anti-metastatic therapeutic strategy in HCC (Sun et al. 2021). Thus, given their enhanced resolution of disease state, single-cell approaches can serve as more effective, robust models for understanding pathogenesis across a wide range of contexts.

4. Biomarker Discovery and Patient Stratification

Several published studies have applied scRNA-seq to profile diseased tissues and reported biomarkers predictive of drug response or resistance, illustrating another promising area of application for this technology (Leader et al. 2021, Martin et al. 2019, Zhang et al. 2021). Patients can be stratified into refined populations based on these novel markers. Prognostic and predictive biomarkers can then be used as eligibility criteria in clinical trials to identify patients who are more likely to have disease progression or to respond to a drug, respectively.

Challenges in Single-Cell Analysis

ScRNA-seq data has undoubtedly revolutionized the field of transcriptomics by allowing us to study cell-to-cell heterogeneity. This granularity of information has in turn enhanced our understanding of cell identity, diversity, and function in the context of normal development and disease. However, despite its successes, scRNA-seq analysis is not without its challenges, which arise from technical, methodological, and even biological limitations.

1. Low Depth and Coverage: A typical scRNA-seq experiment begins with library preparation, which involves the isolation of individual cells followed by mRNA capture and sequencing. However, current high-throughput scRNA-seq protocols capture only a fraction (5-20%) of the molecules physically present in a cell. This causes non-uniform coverage across genes, with an overrepresentation of highly expressed genes and potential dropout of information from lowly expressed genes.

2. Cell Selection, Dissociation, and Handling Errors: As mentioned above, individual cells must be isolated during the library preparation stage. However, the dissociation of cells from tissues or organs is a stressful event that can cause the loss of delicate cell types or alter their gene expression profiles. In some cases, multiple cells may be captured within a single sequencing droplet or well, producing doublets that confound the resulting data. Careful handling and optimization of cell dissociation protocols are essential to minimize these effects and obtain accurate results.

3. Biological Variability: Biological phenomena like the cell cycle and transcriptional bursting introduce variability that can distort the results when these phenomena are unrelated to the scope of the study.

4. Batch Effects: A batch effect occurs when non-biological factors cause changes in the data produced by the experiment. The library preparation stage of scRNA-seq is susceptible to the introduction of batch effects. These can arise for a variety of reasons, from the sequencing protocol used to the laboratory where the experiment is performed, and even the person who performed it. Batch effects introduce variability in gene expression, potentially leading to inaccurate conclusions.

5. Data Complexity: The first step in the scRNA-seq data analysis process is base-calling. Depending on the technology, the resulting data files may need to be converted to another format, such as FASTQ, for further processing. Once the files are in the appropriate format, indexes, unique molecular identifiers (UMIs), and other molecular barcodes must be used to demultiplex the reads and remove duplicates. This process can be challenging due to cryptic data formats and a lack of standardization in files and formats between different technologies. The count matrices generated at the end of this process also come in heterogeneous formats, making them difficult for the novice user to work with.

6. Computational Requirements: scRNA-seq data generated from complex, large-scale experiments can be very high dimensional, capturing information from hundreds of thousands of cells. Such large-volume data requires both appropriate computational resources and scalable methods, particularly for complex downstream analyses.

7. Lack of Standardization in Analytic Approaches: scRNA-seq data analysis can be split into pre-processing and post-processing of the data. Pre-processing includes the initial analyses to count and clean the data, whereas post-processing involves dimensionality reduction, clustering, cell type annotation, and visualization. Data integration and batch correction are optional steps that may be required for effective analysis. While the general steps in a single-cell data analysis pipeline are reasonably well defined, multiple tools are available at each step, each with its own advantages, specific use cases, and limitations. Due to these factors and the heterogeneity in technologies and research questions, standardization of analysis pipelines is challenging.

8. Challenges in Utilizing Open-source Data: As scRNA-seq technologies have significantly advanced over the years, increasing numbers of datasets are being deposited in public archives. Data from an estimated 3,000 scRNA-seq studies have been submitted to NCBI’s Gene Expression Omnibus (GEO), EMBL-EBI’s ArrayExpress, and the European Nucleotide Archive (ENA) in recent years. This data is potentially a remarkable and comprehensive resource for accelerating biological discovery. However, the utilization of open-source datasets suffers from a reproducibility crisis due to a widespread lack of metadata about the scRNA-seq experiments. Minimum standards for reporting data and metadata for the various scRNA-seq assays must be established.
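To make the UMI-based duplicate removal described under Data Complexity more concrete, here is a minimal Python sketch. It assumes the reads have already been demultiplexed, aligned, and parsed into (cell barcode, UMI, gene) tuples. The function name `count_umis`, the tuple layout, and the toy barcodes are illustrative assumptions, not the output format of any particular technology or tool.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into per-cell, per-gene counts of distinct UMIs.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples. This is a
    hypothetical, already-parsed representation; real pipelines derive it
    from FASTQ/BAM files.
    """
    seen = defaultdict(set)  # (cell, gene) -> set of UMIs observed
    for cell, umi, gene in reads:
        # Reads sharing cell, gene, and UMI are treated as PCR duplicates
        # of one original molecule, so the set keeps only one copy.
        seen[(cell, gene)].add(umi)

    counts = defaultdict(dict)  # cell -> {gene: number of distinct UMIs}
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


# Toy input: the second read duplicates the first and is counted once.
reads = [
    ("AAAC", "TTGCA", "GENE1"),
    ("AAAC", "TTGCA", "GENE1"),
    ("AAAC", "GGATC", "GENE1"),
    ("AAAC", "TTGCA", "GENE2"),
    ("CCGT", "TTGCA", "GENE1"),
]
umi_counts = count_umis(reads)
print(umi_counts)
```

This sketch treats barcodes and UMIs as error-free. Production tools typically also correct sequencing errors in barcodes and UMIs before counting, which is one reason the formats and outputs differ between technologies.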

Addressing Challenges in scRNA-seq Data Analysis with Polly

ScRNA-seq analyses are prone to technical, biological, and analytical challenges. Owing to its complexity, effective handling and utilization of this data requires powerful tools and expertise. Polly, an AI-enabled, cloud-based platform developed by Elucidata, helps address the challenges associated with leveraging the power of biological multi-omics data, including scRNA-seq.

Polly’s harmonization engine standardizes and integrates multi-source single-cell datasets and delivers high-quality data fit for diverse analysis methods and pipelines. It processes measurements, links them to harmonized metadata, and transforms them into a Unified Data Model. This process significantly reduces the time and effort required for data cleaning, allowing researchers to focus on scientific discovery.

Polly addresses the challenges associated with scRNA-seq datasets in the following ways:

1. Data Harmonization: Polly’s powerful harmonization engine standardizes scRNA-seq data from diverse public and in-house sources. Comprehensive data validation checks ensure that all cell- and dataset-level metadata annotations are human-readable and accurately assigned.

2. Comprehensive QC Checks: All single-cell datasets delivered by Polly undergo ~50 QA checks to ensure quality and provenance. QC pipelines implemented on Polly are consistent with best practices determined in the field. Users also have access to comprehensive QA reports detailing the processing methodology.

3. Normalization & Batch Effect Correction: These are key analytical challenges in single-cell data processing. All scRNA-seq datasets delivered by Polly are normalized to eliminate technical variations related to sequencing depth. Batch effect correction is applied wherever necessary to ensure meaningful comparisons between cells.

4. Cell Type Annotation: Polly-processed scRNA-seq datasets also include cell type annotation results, which are essential for many secondary analyses.

5. Improved Reproducibility: Standardized data ensures the reproducibility of results, a critical aspect of scientific research. Polly's harmonization engine contributes to the robustness and reliability of scRNA-seq data analysis.

6. Enhanced Collaboration: Polly facilitates collaboration by providing a standardized framework for data sharing. This fosters a collaborative environment where researchers can seamlessly exchange and build upon each other's work.
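As a rough illustration of what normalizing for sequencing depth (point 3 above) can look like, the following Python sketch applies a common library-size normalization: each cell's counts are scaled to a shared total and then log-transformed. This is not a description of Polly's actual pipeline. The function name `log_normalize`, the scale factor of 10,000, and the toy cells are assumptions made for the example.

```python
import math


def log_normalize(counts, scale=10_000):
    """Scale each cell's counts to a common total, then apply log(1 + x).

    `counts` maps cell -> {gene: raw count}. The `scale` value is a
    conventional but arbitrary choice, assumed here for illustration.
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            # A cell with no counts cannot be scaled; keep it at zero.
            normalized[cell] = {gene: 0.0 for gene in genes}
            continue
        normalized[cell] = {
            gene: math.log1p(count / total * scale)
            for gene, count in genes.items()
        }
    return normalized


# Two toy cells with the same expression proportions, but the second
# was sequenced ten times more deeply.
counts = {
    "cell_shallow": {"GENE1": 10, "GENE2": 30},
    "cell_deep": {"GENE1": 100, "GENE2": 300},
}
norm = log_normalize(counts)
print(norm)
```

After normalization, the two cells have matching values for each gene, so the difference in sequencing depth no longer looks like a difference in expression. Batch effect correction is a separate step and is not covered by this sketch.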

Case Studies and Success Stories

To highlight the real-world impact of Polly in single-cell transcriptomics research, let's explore a case study where Polly has played a transformative role:

A Boston-based pharmaceutical company wanted to leverage Elucidata’s high-quality, curated single-cell RNA sequencing (scRNA-seq) datasets to fast-track the identification and validation of gene targets, and thereby drug discovery, for inflammatory disease in a specific cell type.

The primary challenges faced by the company included finding relevant and good quality datasets; harmonizing publicly available data from multiple sources; and a lack of expertise in analyzing the data appropriately.

Elucidata provided them with a two-fold solution to tackle the above challenges:

Through Polly, we were able to provide a curated collection of 156 scRNA-seq datasets highly relevant to inflammatory disease and their preferred cell type of interest.

Elucidata’s experts in scRNA-seq data management and analysis identified meta-analysis as the appropriate strategy for identifying targets and ensuring their specificity to the cell type and disease of interest. Further, they devised both biased and unbiased meta-analysis strategies to a) explore the targets pre-identified by the company and b) identify new targets. The unbiased analysis used classifier models and correlation scores to identify targets in the desired cell type for inflammatory disease.

Impact

Elucidata's approach delivered results beyond the client’s expectations, validating 5 pre-identified targets and shortlisting 4 novel targets for further exploration in inflammatory disease.

They also achieved up to a 4x acceleration in target identification.

Targets were identified and validated within a span of 2.5 months, a process that typically takes 8-10 months.

Conclusion

In conclusion, scRNA-seq is a revolutionary development for advancing life sciences R&D, offering unprecedented insights into cellular complexity and heterogeneity. The evolution of scRNA-seq technologies has opened new avenues in critical application areas like drug development and target identification, biomarker discovery, and a deeper understanding of cellular processes. However, the challenges associated with accessing and utilizing the vast amounts of scRNA-seq data underscore the need for innovative solutions.

Platforms like Polly are built to address the challenges associated with leveraging powerful multi-omics data, including scRNA-seq. Polly's harmonization engine can be leveraged to provide access to FAIR data, making data more accessible and usable for researchers. By streamlining the data analysis workflow, enhancing collaboration, and ensuring data quality, Polly contributes to the acceleration of transcriptomics research.