"Rarity Has an Inherent Value" - Single Cell Omics and Rare Cell Detection

In biology, rare cell populations often play a pivotal role in both normal function and disease. "Rare" refers to an infrequent population of cells, metabolites, and biomolecules that, despite their low abundance, drive critical biological phenomena. These rare cells, including cancer stem cells, circulating tumor cells, antigen-specific T-cells, and stem cells, can profoundly influence cellular function, repair mechanisms, and disease progression. Studying these populations is crucial for understanding how diseases initiate and evolve.

For instance, the early detection of cancer relies on identifying signals from rare entities such as circulating tumor DNA, tumor cells, exosomes, or specific blood markers. These signatures can indicate the presence of a tumor in its nascent stages, providing critical insights before it fully develops.

Disorders Are Often Propelled Forward by Rare Cellular Events

The relevance of rare cells is underscored by numerous studies. For example, the study by Montoro et al., "A revised airway epithelial hierarchy includes CFTR-expressing ionocytes," highlights the role of previously unknown Foxi1+ ionocytes. Although these cells are rare, comprising merely 0.39% of all epithelial cells, they are crucial in driving cystic fibrosis because they produce the majority of the cystic fibrosis transmembrane conductance regulator (CFTR) in the CF lung. Several other examples show that the initial progenitor population, or even the final contributing population, for a disease can be a rare one. To understand biology and disease, it is therefore critical to consider approaches that can discover these rare populations.

Can Single-cell Omics Capture Rare Cells?

Advances in single-cell technologies, such as droplet-based sequencing methods, have enabled high sequencing depths, significantly enhancing our ability to capture even low-abundance signals. As these technologies continue to improve, the limit of detection for rare signals will increasingly depend on the computational approaches used to identify them. Downstream methods and analysis thus become critical components in the discovery of rare cell populations.
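How many cells need to be profiled before a rare population even shows up in the data? As a rough, back-of-the-envelope sketch, the Python snippet below treats every captured cell as an independent draw and uses the 0.39% ionocyte frequency quoted above as its example. The function names, the target of 50 rare cells, and the 95% confidence level are illustrative assumptions, and the model ignores capture bias, doublets, and dropout that affect real experiments.

```python
from math import exp, lgamma, log, log1p


def prob_at_least(k, n_cells, freq):
    """P(at least k rare cells among n_cells) under a binomial model.

    Assumes 0 < freq < 1 and that cells are sampled independently.
    """
    if k > n_cells:
        return 0.0
    below = 0.0
    for i in range(k):
        # log of C(n_cells, i) * freq**i * (1 - freq)**(n_cells - i)
        log_term = (
            lgamma(n_cells + 1) - lgamma(i + 1) - lgamma(n_cells - i + 1)
            + i * log(freq) + (n_cells - i) * log1p(-freq)
        )
        below += exp(log_term)
    return 1.0 - below


def cells_needed(k, freq, confidence=0.95):
    """Smallest number of profiled cells giving >= k rare cells with the given confidence."""
    lo, hi = k - 1, max(k, 1)
    # Grow the upper bound until it is large enough.
    while prob_at_least(k, hi, freq) < confidence:
        lo, hi = hi, hi * 2
    # Binary search: prob(lo) < confidence <= prob(hi) holds throughout.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prob_at_least(k, mid, freq) >= confidence:
            hi = mid
        else:
            lo = mid
    return hi


freq = 0.0039  # 0.39% of cells, the example frequency from the text
print("Expected rare cells in 10,000 profiled cells:", 10_000 * freq)
print("P(at least 1 rare cell in 1,000 cells):", prob_at_least(1, 1_000, freq))
print("Cells needed for >= 50 rare cells (95%):", cells_needed(50, freq))
```

Under these assumptions, a handful of rare cells is almost guaranteed even in a modest experiment, but collecting enough of them to form a recognizable cluster takes a far larger sample, which is where the downstream computational method starts to matter.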

Discovering the Rare Through Single Cell Methods

Leiden clustering, which groups cells by detecting communities in a cell neighborhood graph, is a widely used method for clustering cells and has become a crucial step in most single-cell workflows. The review by Zhang et al. [2] highlights that Leiden clustering (as part of Seurat) can perform well even compared to some dedicated rare cell detection algorithms, especially for identifying clusters of rare cell populations in large datasets. However, it requires careful parameter tuning to identify these populations effectively. Because such tuning can introduce bias and subjectivity, methods designed specifically for rare cell detection remain essential for identifying rare cell populations reliably.
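To see why the tuning matters, here is a minimal pure-Python sketch. It is not Leiden itself and is not taken from Zhang et al. It only scores two candidate partitions of a toy cell graph with resolution-weighted modularity, one quality function commonly optimized with Leiden. The toy graph, the group sizes, and the function names are made up for illustration.

```python
from itertools import combinations


def clique(nodes):
    """All pairwise edges among the given nodes (a fully connected group)."""
    return list(combinations(nodes, 2))


def modularity(edges, labels, resolution=1.0):
    """Resolution-weighted modularity of a partition of an undirected graph.

    Q = sum over communities of [internal_edges / m - resolution * (degree / 2m)^2]
    """
    m = len(edges)
    internal = {}  # community -> number of edges inside it
    degree = {}    # community -> summed degree of its nodes
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - resolution * (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


# Toy graph: a common cell type, a smaller neighboring type, and 3 rare cells.
common, neighbor, rare = range(0, 20), range(20, 25), range(25, 28)
edges = clique(common) + clique(neighbor) + clique(rare) + [(0, 20), (24, 25)]

# Candidate 1: the rare cells form their own cluster.
split = {n: "common" for n in common}
split.update({n: "neighbor" for n in neighbor})
split.update({n: "rare" for n in rare})

# Candidate 2: the rare cells are absorbed into the neighboring cluster.
merged = dict(split)
merged.update({n: "neighbor" for n in rare})

for resolution in (1.0, 3.0):
    q_split = modularity(edges, split, resolution)
    q_merged = modularity(edges, merged, resolution)
    winner = "rare cluster kept" if q_split > q_merged else "rare cluster absorbed"
    print(f"resolution={resolution}: split={q_split:.4f} merged={q_merged:.4f} -> {winner}")
```

In this toy graph, the partition that absorbs the three rare cells scores higher at a resolution of 1.0, while keeping them separate scores higher at 3.0, so the same rare group can appear or vanish depending on a single parameter.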

Dedicated Rare Cell Detection Methods

Reliable rare cell detection methods are needed to accurately identify rare cell populations within a dataset. Although the lack of a comprehensive review of all available methods makes it difficult to determine the best option for each technical scenario, this blog highlights a few widely adopted and cited methods for identifying rare cell populations in single-cell data. These include:

FiRE: Finder of Rare Entities (FiRE) by Jindal et al. [3] uses a density-based approach that estimates the local density around each data point. Its methodology is robust to variation and scales to large datasets, and it runs faster than other methods.
GiniClust2: An extension of the original GiniClust [4] algorithm, GiniClust2 [5] utilizes the Gini index and Fano factor to improve the detection of rare cell types. While effective, its distance-based approach can be slower than FiRE. The recently released GiniClust3 [6] aims to be faster by changing the clustering algorithm from DBSCAN to Leiden, but it still needs more benchmarking.
RaceID3: Published in the same paper as the cell fate inference method FateID, RaceID3 [7] improves upon the initial RaceID method for identifying rare cell populations. It uses a distance-based k-medoid clustering approach and includes an efficient feature selection step to enhance rare cell detection.
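To make the Gini index and Fano factor mentioned for GiniClust2 more concrete, the short Python sketch below computes both statistics for two genes. It illustrates the statistics only and is not the GiniClust2 pipeline, which involves further steps. The expression values are invented toy counts and the function names are illustrative.

```python
from statistics import fmean, pvariance


def gini_index(values):
    """Gini index of non-negative values: 0 = evenly spread, near 1 = concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def fano_factor(values):
    """Fano factor: variance divided by mean (0.0 if the mean is zero)."""
    mean = fmean(values)
    if mean == 0:
        return 0.0
    return pvariance(values) / mean


# Toy counts across 100 cells (invented for illustration).
rare_marker = [0] * 97 + [50, 60, 55]    # expressed in only 3 cells
housekeeping = [20, 22, 19, 21] * 25     # expressed at a similar level everywhere

for name, counts in (("rare_marker", rare_marker), ("housekeeping", housekeeping)):
    print(f"{name}: gini={gini_index(counts):.3f} fano={fano_factor(counts):.3f}")
```

A gene expressed in only a few cells concentrates almost all of its counts in those cells and so has a Gini index close to 1, whereas an evenly expressed gene scores close to 0. That contrast is what makes the Gini index a natural way to flag candidate rare cell markers.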

Several newer methods, such as scSID [8], scBalance [9], and CIARA [10], have also shown promise in finding rare cell populations. These methods need wider use and comparison before benchmarks for their scientific accuracy and technical efficiency can be established.

Conclusion

The ability to accurately detect rare cell populations is therefore crucial for advancing our understanding of diseases and developing targeted treatments. While many options are available, selecting the right method depends on the specific technical scenario, such as the sequencing protocol and library complexity.

At Elucidata, we provide custom solutions for single-cell analysis, regardless of downstream requirements. To learn more, reach out to us here or email us at info@elucidata.io.