Curation – The missing link to Single Cell Data Analysis

Deepthi Das
February 24, 2022

Discovery teams working on single-cell data often spend days or even weeks on the initial step of sourcing relevant datasets from public data portals. Storing and analyzing this data is another roadblock. Let’s take a quick look at the recurring challenges scientists face while performing single-cell analysis (SCA) and at some solutions that could streamline their discovery process.

Single Cell RNA Data Analysis Workflow

Challenges of working with publicly available single-cell data

  • Semi-structured, raw scRNA-seq data from public repositories are difficult to retrieve and integrate for cell-type and cell-function annotation exercises. Each repository processes data differently and may lack adequate metadata annotations, which directly affects the findability of these datasets.
  • There is a lack of standards for the deposition of cell-level metadata. Although guidelines have recently been proposed for single-cell data deposition, they focus primarily on describing experimental aspects of the study. In most cases, even the cell types that investigators assigned to each cellular barcode are not reported.
  • Even preliminary exploration or analysis of single-cell data has extensive memory requirements. Researchers also spend significant amounts of time downloading the data, packages, and libraries into a computational environment.
  • When different users generate analyses and insights with their own pipelines, reproducibility suffers. More importantly, comparing and interpreting different datasets requires a standard processing pipeline.
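
To get a feel for the memory requirements mentioned above, here is a rough back-of-the-envelope sketch. The dataset size (100,000 cells by 20,000 genes), the 5% share of non-zero counts, and the 4-byte value and index widths are illustrative assumptions, not figures from any particular dataset; the sparse estimate assumes a compressed sparse row (CSR) layout.

```python
def dense_bytes(n_cells, n_genes, bytes_per_value=4):
    """Bytes needed to hold the full cells x genes count matrix densely."""
    return n_cells * n_genes * bytes_per_value


def csr_bytes(n_cells, n_genes, nonzero_fraction,
              bytes_per_value=4, bytes_per_index=4):
    """Approximate bytes for the same matrix in CSR layout.

    CSR stores one value and one column index per non-zero entry,
    plus a row-pointer array with n_cells + 1 entries.
    """
    nnz = round(n_cells * n_genes * nonzero_fraction)
    per_entry = bytes_per_value + bytes_per_index
    return nnz * per_entry + (n_cells + 1) * bytes_per_index


# Hypothetical dataset: 100,000 cells, 20,000 genes, 5% non-zero counts.
dense = dense_bytes(100_000, 20_000)
sparse = csr_bytes(100_000, 20_000, 0.05)
print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.1f} GB")
```

Under these assumptions the dense matrix alone takes about 8 GB and the sparse one about 0.8 GB, before any normalization, dimensionality reduction, or intermediate copies are created during analysis.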

Factors that can streamline the discovery process

  • Metadata harmonization: Standard metadata fields such as tissue, disease, number of samples, platform or sequencing technology (10x or Smart-seq), organism, sample cohorts, and cell types are key annotations that make relevant datasets easier to identify.
  • Scalable infrastructure and integrative platform for analysis: An ideal cloud platform would store data in different formats, such as h5ad or h5Seurat; run compute-intensive processing workflows such as Cell Ranger, Scanpy, and Seurat; and integrate with open-source algorithms and applications such as NicheNet (ligand-receptor analysis), CCA and Harmony (batch correction), and SingleR and SCSA (automated cell-type annotation).
  • Specific pipeline for consistent analysis: A standard single-cell analysis workflow (such as Scanpy or Seurat) should be applied across all datasets so that comparative studies can be carried out between single-cell data from different sources. The datasets can then be stored in a single format, such as h5ad, which is widely used in the single-cell sequencing community. The chosen format should be able to store large amounts of data and allow fast querying of parts of a file without loading the complete file into memory.
  • Find all the relevant data in one place: A repository that collates single-cell data from diverse areas, especially oncology, saves researchers a lot of time and effort that they can instead spend on deriving meaningful insights.
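
As an illustration of the metadata harmonization idea above, here is a minimal sketch that maps differently named fields and values from source repositories onto one standard schema. The alias tables, field names, and the `harmonize` function are hypothetical and chosen only for this example; a real curation pipeline would more likely rely on controlled vocabularies or ontologies than on hand-written lookups.

```python
# Hypothetical aliases: standard field name -> names seen in source records.
FIELD_ALIASES = {
    "tissue": {"tissue", "tissue_type", "organ", "source_tissue"},
    "disease": {"disease", "disease_state", "condition", "diagnosis"},
    "organism": {"organism", "species"},
    "platform": {"platform", "technology", "sequencing_technology"},
}

# Hypothetical value aliases: standard field -> {lowercased raw value: standard value}.
VALUE_ALIASES = {
    "organism": {
        "human": "Homo sapiens",
        "homo sapiens": "Homo sapiens",
        "mouse": "Mus musculus",
        "mus musculus": "Mus musculus",
    },
    "platform": {
        "10x": "10x",
        "10x genomics": "10x",
        "smartseq2": "Smart-seq2",
        "smart-seq2": "Smart-seq2",
    },
}


def harmonize(record):
    """Return a record with standard field names and values.

    Standard fields missing from the input are set to None so that
    gaps in the source metadata stay visible; unknown fields are dropped.
    """
    harmonized = {field: None for field in FIELD_ALIASES}
    for raw_key, raw_value in record.items():
        key = raw_key.strip().lower().replace(" ", "_")
        for field, aliases in FIELD_ALIASES.items():
            if key in aliases:
                text = str(raw_value).strip()
                value_map = VALUE_ALIASES.get(field, {})
                # Fall back to the cleaned raw value when no alias is known.
                harmonized[field] = value_map.get(text.lower(), text)
                break
    return harmonized


raw = {"Species": "human", "Tissue Type": "lung", "technology": "10X Genomics"}
print(harmonize(raw))
```

Once every dataset carries the same fields and values, a search such as "all human lung datasets generated on 10x" becomes a simple filter instead of a manual review of each repository's conventions.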

Consider partnering with us to address your curation woes...

Elucidata’s cloud platform, Polly, allows users to carry out integrative analysis on single-cell data. Our single-cell repository hosts more than 2,100 curated datasets from six different sources. Our curation pipelines, curated data, standard workflows, and scientific expertise are used by academia and industry across the globe to accelerate drug discovery. To know more, please get in touch with us at

