
Visualizing Single Cell Datasets Using CellxGene

Jayashree
December 16, 2022

Recent technological advances such as next-generation sequencing (NGS) have enabled unprecedented insight into transcriptomics at the level of single cells. Single-cell RNA sequencing has emerged as the technique of choice for researchers trying to understand the cellular heterogeneity of tissue systems under physiological and pathological conditions. However, the sheer size of single-cell datasets makes them difficult for researchers to access and analyze.

Read on to learn how single-cell data is evolving and how it can be easily visualized with CellxGene.

What Is Single-cell RNA Sequencing?

RNA sequencing (RNA-seq) is a technique used to detect and quantify RNA in a biological sample and is useful for studying cellular responses. When done on a single cell, it is called single-cell RNA sequencing (scRNA-seq).

Thanks to innovative sample preparation and sequencing technologies, gene expression in individual cells can now be measured for thousands of cells in a single experiment. Since their introduction, single-cell RNA sequencing approaches have revolutionized transcriptomic studies by creating unprecedented opportunities to resolve cell heterogeneity through gene expression profiles at single-cell resolution.

With more detailed and accurate information, scRNA-seq greatly promotes the understanding of cellular functions, disease progression, and treatment response.

Challenges While Working with Single-cell Data

  1. Semi-structured, raw scRNA-seq data from public repositories are difficult to retrieve and integrate for cell-type and cell-function annotation exercises.
  2. Different research groups process data with different reference genome database versions or reference builds, so the same gene can appear under different names. For example, IL4R has the aliases CD124 and IL4RA. Inconsistent gene names create mapping issues, and additional steps are needed to identify common genes (differentially expressed or target genes) between datasets.
  3. Preliminary exploration or analysis of single-cell data has extensive memory requirements.
  4. Analysis and insight generation from different pipelines written by different users is often counterproductive to reproducibility. More importantly, comparing and interpreting different datasets requires a standard processing pipeline.
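
The gene-naming problem in point 2 can be illustrated with a minimal sketch. The alias table, function names, and gene lists below are hypothetical and exist only for illustration; a real workflow would draw aliases from a maintained gene nomenclature resource.

```python
# Hypothetical alias table (illustration only); maps known aliases to one canonical symbol.
ALIAS_TO_SYMBOL = {
    "CD124": "IL4R",
    "IL4RA": "IL4R",
}


def harmonize(symbols):
    """Map each gene name to its canonical symbol; unknown names pass through unchanged."""
    result = []
    for name in symbols:
        key = name.strip()
        result.append(ALIAS_TO_SYMBOL.get(key, key))
    return result


def shared_genes(genes_a, genes_b):
    """Return the sorted canonical symbols present in both gene lists."""
    return sorted(set(harmonize(genes_a)) & set(harmonize(genes_b)))


# Two toy datasets that name the same gene differently.
dataset_a = ["CD124", "TP53", "GAPDH"]
dataset_b = ["IL4RA", "GAPDH", "ACTB"]

naive_overlap = sorted(set(dataset_a) & set(dataset_b))
harmonized_overlap = shared_genes(dataset_a, dataset_b)

print(naive_overlap)       # ['GAPDH']
print(harmonized_overlap)  # ['GAPDH', 'IL4R']
```

Without harmonization, the two datasets appear to share only one gene; after mapping aliases to a common symbol, IL4R is recovered as a shared gene.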

Overcoming Challenges: Single-cell Data on Polly

Curated (harmonized, standardized, annotated) data processed using standard pipelines is the key to overcoming the above-mentioned challenges, especially in the context of single-cell research. Polly, Elucidata's data harmonization platform, addresses this need by hosting and managing FAIR (Findable, Accessible, Interoperable, and Reusable) multi-omics data from both public and proprietary sources. Within Polly, highly curated and machine-actionable single-cell data is available, offering a powerful resource to tackle the roadblocks encountered in single-cell research.

The curated single cell data within Polly is processed using standardized pipelines, ensuring consistency and reproducibility across experiments. By harmonizing and annotating the data, Polly provides researchers with a comprehensive understanding of the underlying biological context. This curated approach facilitates seamless integration of single cell datasets from diverse sources, enabling researchers to perform robust analyses and extract meaningful insights.

With access to highly curated single-cell data on Polly, researchers can overcome many of the challenges associated with batch effects, data variability, and integration across studies. By leveraging this resource, researchers can accelerate their discoveries, advance our understanding of cellular heterogeneity, and uncover novel insights into complex biological systems.

  • FAIR Data: On Polly, users get access to comprehensive single-cell RNA-seq data, which is structured and readily usable compared to the raw data available at single-cell data source repositories.
  • Comprehensive Resource: Polly's Single Cell Data repository consists of single-cell RNA sequencing datasets from multiple sources like Gene Expression Omnibus, Expression Atlas, Human Cell Atlas, and Single Cell Portal, to name a few.
  • Consistent Schema: All single-cell datasets are processed through a standard pipeline and made available in consistent tabular formats. This data is readily usable for downstream analyses.
  • Excellent Findability: Structured metadata attributes are (1) curated as per standard ontology terms and (2) organized for usability. End users can apply filters (through the GUI or a command-line interface) on columns such as disease, tissue, drug, cell line, organism, and phenotype to get relevant results quickly.
  • Improved Query Results: Each dataset on Polly is available with six standard harmonized metadata fields curated using Polly's proprietary NLP-based curation model, Polly-BERT.
  • Raw Data Processing: Polly's Single Cell Atlas can be processed through an expert-vetted workflow on Polly or a custom workflow of your choice. The resultant data can be utilized for seamless comparison with your proprietary data or data from other sources.
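
To make the findability point concrete, here is a minimal sketch of filtering datasets on curated metadata columns. It is not Polly's API: the records, field names, dataset IDs, and the `filter_datasets` helper are all hypothetical and only show why consistent, ontology-backed fields make filtering straightforward.

```python
# Hypothetical curated metadata records; IDs, field names, and values are illustrative only.
DATASETS = [
    {"dataset_id": "DS001", "disease": "asthma", "tissue": "lung", "organism": "Homo sapiens"},
    {"dataset_id": "DS002", "disease": "normal", "tissue": "lung", "organism": "Mus musculus"},
    {"dataset_id": "DS003", "disease": "asthma", "tissue": "blood", "organism": "Homo sapiens"},
]


def filter_datasets(records, **criteria):
    """Keep only the records whose fields equal every given criterion."""
    return [
        rec for rec in records
        if all(rec.get(field) == value for field, value in criteria.items())
    ]


hits = filter_datasets(DATASETS, disease="asthma", organism="Homo sapiens")
hit_ids = [rec["dataset_id"] for rec in hits]
print(hit_ids)  # ['DS001', 'DS003']
```

Exact-match filtering like this only works when every dataset uses the same term for the same concept, which is what curation against standard ontologies provides.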

Analyze Single-cell Data Using CellxGene

What is CellxGene?

CellxGene is a single-cell visualization platform developed by the Chan Zuckerberg Initiative. It allows users to explore single-cell RNA-seq (scRNA-seq) datasets in the web browser without any computational skills. This third-party app is hosted on Polly, where it helps you build visualizations and gain insights from your data. The explorer makes it easier for biologists to collaboratively explore and understand their single-cell RNA-seq data.

How Is Single-cell Data Stored: CellxGene on Polly

The data is available on Polly in h5ad format. This file format stores the data matrices together with the sample metadata, and additional information can be added as layers on top of the raw data. This is especially efficient for single-cell data, where results such as clustering, PCA, and UMAP need to be stored alongside the expression values. The structure of an h5ad file is shown in the image below:

Single-cell data format on Polly
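
The layout described above can also be sketched in plain Python. The class below is only a stand-in written for illustration, not the real library: in practice h5ad files are read with the `anndata` package (for example via `anndata.read_h5ad`), whose AnnData object exposes slots with the same names (`X`, `obs`, `var`, `obsm`, `layers`). The class name, toy values, and the `raw_counts` layer name are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SingleCellData:
    """Plain-Python stand-in for the h5ad layout (illustrative, not the anndata library)."""
    X: list    # expression matrix, one row per cell and one column per gene
    obs: dict  # per-cell metadata: column name -> one value per cell
    var: dict  # per-gene metadata: column name -> one value per gene
    obsm: dict = field(default_factory=dict)    # per-cell embeddings such as PCA or UMAP
    layers: dict = field(default_factory=dict)  # extra matrices with the same shape as X

    def shape(self):
        """Return (number of cells, number of genes)."""
        return (len(self.X), len(self.X[0]) if self.X else 0)

    def validate(self):
        """Check that every slot lines up with the dimensions of X."""
        n_cells, n_genes = self.shape()
        for name, column in self.obs.items():
            if len(column) != n_cells:
                raise ValueError(f"obs[{name!r}] needs one value per cell")
        for name, column in self.var.items():
            if len(column) != n_genes:
                raise ValueError(f"var[{name!r}] needs one value per gene")
        for name, coords in self.obsm.items():
            if len(coords) != n_cells:
                raise ValueError(f"obsm[{name!r}] needs one row per cell")
        for name, matrix in self.layers.items():
            if len(matrix) != n_cells or any(len(row) != n_genes for row in matrix):
                raise ValueError(f"layers[{name!r}] must match the shape of X")
        return True


# Toy dataset: 3 cells x 2 genes (values are made up).
data = SingleCellData(
    X=[[0.0, 1.2], [3.4, 0.0], [0.5, 0.7]],
    obs={"cell_type": ["T cell", "B cell", "T cell"], "cluster": ["0", "1", "0"]},
    var={"gene_symbol": ["IL4R", "GAPDH"]},
)
data.obsm["X_umap"] = [[0.1, 0.2], [1.5, -0.3], [0.2, 0.1]]  # 2-D embedding per cell
data.layers["raw_counts"] = [[0, 3], [9, 0], [1, 2]]         # hypothetical layer name

print(data.validate())  # True
print(data.shape())     # (3, 2)
```

Keeping the matrix, the metadata, and the derived embeddings in one object is what lets a viewer such as CellxGene color an embedding by any metadata column without additional joins.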

How Does CellxGene Help in Data Analysis on Polly?

One can use the CellxGene application to analyze data from sources like GEO. The major use cases of CellxGene are:

  1. Examining Categorical Metadata: Categorical metadata (such as tissue of origin or cell type) can be used in several ways within CellxGene, including coloring embedding plots (e.g., coloring a UMAP by cell type), looking at cell counts, making selections of cells, or viewing the interaction between different categorical metadata fields.
  2. Find Cells Where a Gene Is Expressed: Numerical metadata (such as gene expression features or QC metrics like the number of genes) can be examined on the embedding plot and be used to filter and select cells.
  3. Compare Expression of Multiple Genes: CellxGene allows you to compare the expression of multiple genes via bivariate plots.
  4. Using Gene Sets to Learn about Cell Population Functional Characteristics: CellxGene allows you to examine groups of genes via the gene sets feature.
  5. Find Marker Genes: CellxGene allows you to find marker genes between selected cell populations.
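
Two of the use cases above, finding cells where a gene is expressed and finding marker genes, can be sketched outside the GUI as follows. This is a toy illustration under stated assumptions: the expression values and helper names are made up, and the ranking uses a simple difference of group means rather than whatever statistical test CellxGene applies internally.

```python
from statistics import mean

GENES = ["IL4R", "CD3E", "MS4A1"]
# Hypothetical toy expression values (rows = cells, columns follow GENES).
EXPRESSION = [
    [2.0, 5.0, 0.0],
    [0.0, 4.0, 0.0],
    [1.0, 0.0, 6.0],
    [0.0, 0.0, 5.0],
]
CELL_TYPE = ["T cell", "T cell", "B cell", "B cell"]


def cells_expressing(gene, threshold=0.0):
    """Return indices of cells whose expression of `gene` exceeds `threshold`."""
    col = GENES.index(gene)
    return [i for i, row in enumerate(EXPRESSION) if row[col] > threshold]


def rank_markers(group_a, group_b):
    """Rank genes by mean expression in group_a minus mean in group_b, largest first."""
    scores = []
    for col, gene in enumerate(GENES):
        diff = (mean(EXPRESSION[i][col] for i in group_a)
                - mean(EXPRESSION[i][col] for i in group_b))
        scores.append((gene, diff))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Select the two populations from the categorical metadata.
t_cells = [i for i, label in enumerate(CELL_TYPE) if label == "T cell"]
b_cells = [i for i, label in enumerate(CELL_TYPE) if label == "B cell"]

il4r_cells = cells_expressing("IL4R")
markers = rank_markers(t_cells, b_cells)

print(il4r_cells)  # [0, 2]
print(markers)     # [('CD3E', 4.5), ('IL4R', 0.5), ('MS4A1', -5.5)]
```

Both steps depend on reliable cell-type labels in the metadata, which is why missing or inconsistent annotations make this kind of exploration difficult.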

Poorly labeled metadata and missing annotations present significant challenges for researchers analyzing single cell datasets. These issues can make it extremely difficult to extract meaningful insights from the data. However, Polly addresses these drawbacks by providing curated columns and cell type curation features, enhancing the usability of single cell datasets.

With Polly, users have access to well-organized metadata and comprehensive annotations, making it easier to visualize and explore single-cell datasets. By leveraging curated columns and cell type curation tools, researchers can quickly identify and analyze relevant cell populations, facilitating more efficient data interpretation and downstream analysis.

Furthermore, Polly integrates seamlessly with CellxGene for single-cell analysis, allowing users to use its full suite of features within the Polly platform. This integration streamlines workflows and enhances the overall user experience, empowering researchers to accelerate their research efforts.

Reach out to us today to learn more about how Polly can help accelerate your single cell research and unlock new insights into cellular heterogeneity and biological complexity.
