High-Value Single-Cell RNA-seq Datasets on Polly

Deepthi Das
February 17, 2023

Single-cell datasets are not easy to come by. The data must be mined from many publications and repositories, such as the Single Cell Portal (SCP), the Human Cell Atlas (HCA), and the Gene Expression Omnibus (GEO). And this is just the beginning. Researchers then spend a large share of their time extracting and preprocessing the data. The data arrive in different file formats that may not be readily accessible or compatible with every analysis tool.

Another challenge is incomplete or poorly described metadata (cell types, experimental conditions, sample information, etc.), which makes the results difficult to interpret.

Here’s a glimpse of the single-cell datasets on Polly. Polly is Elucidata’s data-centric MLOps platform, which hosts the world's largest collection of highly curated, ML-ready single-cell RNA-seq datasets. Polly hosts data from high-impact publications and popular repositories. All of these data are processed consistently through standard pipelines and annotated with a standard ontology.

Find these and many more such valuable datasets with high sample counts on Polly.

1. Comprehensive Profiling of Cancer Cells and Their Microenvironment in Advanced NSCLC

Year of Publication: 2021
No. of cells: 58 396
Organism: Homo sapiens
Source: GEO
Dataset ID: GSE148071_GPL20795
Reference link: Single-cell profiling of tumor heterogeneity and the microenvironment in advanced non-small cell lung cancer

Lung cancer is a highly heterogeneous disease. Cancer cells and the cells of the tumor microenvironment together determine disease progression, as well as response to or escape from treatment. The authors set out to map the cell type-specific transcriptome landscape of cancer cells and their microenvironment in advanced non-small cell lung cancer (NSCLC). They analyzed 42 tissue biopsy samples from patients with stage III/IV NSCLC by single-cell RNA sequencing. They present large-scale, single-cell-resolution profiles of advanced NSCLC.

In addition to the cell types described in previous single-cell studies of early-stage lung cancer, the authors identified rare cell types in tumors, such as follicular dendritic cells and T helper 17 cells. Tumors from different patients display marked heterogeneity in cellular composition, chromosomal structure, developmental trajectory, intercellular signaling network, and phenotype dominance. The study also reveals a correlation between tumor heterogeneity and tumor-associated neutrophils, which may help shed light on the function of these neutrophils in NSCLC.

t-SNE plot showing the different cell types

2. Mapping the Developing Human Immune System Across Organs

Year of Publication: 2022
No. of cells: 589 390
Source: Publication
Polly ID: HSC_immune_cells_all_hematopoietic-derived_cells
Organism: Homo sapiens
Reference link: Mapping the developing human immune system across organs

Although recent single-cell genomics studies have offered profound insights into the developing human immune system, they have not conceptualized the immune system as a distributed network across many tissues. Suo et al. integrated single-cell RNA sequencing, antigen-receptor sequencing, and spatial transcriptomics of nine prenatal tissues to reconstruct the immune system’s development through time and space.

They describe the late acquisition of immune effector functions by macrophages and natural killer cells. They also describe the maturation of monocytes and T cells before these cells seed peripheral tissues. In addition, they show that blood and immune cell development occurs not only in primary hematopoietic organs but also across peripheral tissues. Finally, the authors characterize the development of various prenatal innate-like B and T cell populations, including B1 cells.

UMAP showing immune cells at different developmental stages

3. Construction of a Human Cell Landscape at the Single-Cell Level

Year of Publication: 2020
Organism: Homo sapiens
No. of cells: 599 926
Polly ID: Construction_of_a_human_cell_landscape_at_single-cell_level
Source: Publication
Reference link: Construction of a human cell landscape at single-cell level

Single-cell analysis is a valuable tool for dissecting cellular heterogeneity in complex systems. However, no comprehensive single-cell atlas had yet been built for humans. In this study, the authors used single-cell mRNA sequencing to determine the cell-type composition of all major human organs and to construct a scheme for the human cell landscape (HCL). They uncovered a single-cell hierarchy for many tissues that had not been well characterized. They also established a 'single-cell HCL analysis' pipeline that helps define human cell identity.

Finally, they performed a single-cell comparative analysis of landscapes from human and mouse samples to identify conserved genetic networks. The study found that stem and progenitor cells exhibit strong transcriptomic stochasticity, whereas differentiated cells are more distinct. The results provide a useful resource for the study of human biology.

UMAP showing the distribution of different cells in the human cell landscape

4. Defining T Cell States Associated with Response to Checkpoint Immunotherapy in Melanoma

Year of Publication: 2018
Organism: Homo sapiens
No. of cells: 14 151
Source: GEO
Dataset ID: GSE120575_GPL18573
Reference link: Defining T Cell States Associated with Response to Checkpoint Immunotherapy in Melanoma

Cancer treatment has been revolutionized by immune checkpoint blockade therapies. Despite the high rate of response in advanced melanoma, the majority of patients succumb to the disease. To identify factors associated with the success or failure of checkpoint therapy, the authors profiled transcriptomes of immune cells from 48 tumor samples of melanoma patients treated with checkpoint inhibitors. Two distinct states of CD8+ T cells were defined by clustering and associated with patient tumor regression or progression.

A single transcription factor, TCF7, was visualized within CD8+ T cells in fixed tumor samples and predicted positive clinical outcomes in an independent cohort of checkpoint-treated patients. They delineated the epigenetic landscape and clonality of these T cell states and demonstrated enhanced antitumor immunity by targeting novel combinations of factors in exhausted cells. This study of immune cell transcriptomes from tumors demonstrates a strategy for identifying predictors, mechanisms, and targets for enhancing checkpoint immunotherapy.

UMAP showing the distribution of immune cells

5. Spatial Multi-Omic Map of Human Myocardial Infarction

Year of Publication: 2022
Organism: Homo sapiens
No. of cells: 191 795
Source: Publication
Polly ID: All-snRNA-Spatial_multi-omic_map_of_human_myocardial_infarction
Reference link: Spatial multi-omic map of human myocardial infarction

Myocardial infarction is a leading cause of mortality worldwide. Acute treatment has advanced, but late-stage mortality remains high, partly because cardiac remodeling processes are incompletely understood. This study used single-cell gene expression, chromatin accessibility, and spatial transcriptomic profiling of human myocardium. The samples came from different physiological zones and time points after myocardial infarction, along with control myocardium. Together, these data produced an integrative, high-resolution map of cardiac remodeling.

This approach allowed the authors to increase the spatial resolution of cell-type composition. It also provided spatially resolved insights into the cardiac transcriptome and epigenome, identifying distinct cellular zones of injury, repair, and remodeling. The authors identified and validated mechanisms of fibroblast-to-myofibroblast differentiation that drive cardiac fibrosis. The study provides an integrative molecular map of human myocardial infarction and serves as a reference for advancing mechanistic and therapeutic studies of cardiac disease.

UMAP showing the distribution of cardiac cells

Connect with us to speed up finding relevant biomedical datasets, building cohorts, and visualizing and analyzing data. This can help you derive actionable insights and identify potential targets.
