1. Distribution of Key Quality Control Metrics
Figure 1: These violin plots display the distribution of quality control metrics for each cell. Metrics include the number of genes detected, total transcript counts, and the percentage of mitochondrial transcripts.
A good-quality dataset typically has a reasonable number of genes detected per cell and a moderate total transcript count. High mitochondrial transcript percentages can indicate low-quality or dying cells. Note: some datasets contain no mitochondrial genes (MT- prefix), in which case the figure for the percentage of mitochondrial transcripts may be empty.
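As an illustration of how the three metrics above can be derived, here is a minimal sketch on a toy count matrix. The gene names, count values, and the `qc_metrics` helper are hypothetical, and the `MT-` prefix match (made case-insensitive here) is an assumption about how mitochondrial genes are named in the dataset.

```python
# Toy count matrix: rows are cells, columns are genes (hypothetical values).
gene_names = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH", "CD3E"]
counts = [
    [5, 5, 40, 30, 20],   # many genes, low mitochondrial share
    [30, 20, 10, 0, 0],   # high mitochondrial share
    [0, 0, 3, 0, 0],      # very few genes detected
]

def qc_metrics(cell_counts, genes, mt_prefix="MT-"):
    """Return (genes detected, total counts, percent mitochondrial) for one cell."""
    n_genes = sum(1 for c in cell_counts if c > 0)
    total = sum(cell_counts)
    # Assumption: mitochondrial genes are identified by their name prefix.
    mt = sum(c for c, g in zip(cell_counts, genes) if g.upper().startswith(mt_prefix))
    pct_mt = 100.0 * mt / total if total > 0 else 0.0
    return n_genes, total, pct_mt

metrics = [qc_metrics(row, gene_names) for row in counts]
for i, (n_genes, total, pct_mt) in enumerate(metrics):
    print(f"cell {i}: n_genes={n_genes}, total_counts={total}, pct_mt={pct_mt:.1f}")
```

If no gene name matches the prefix, every cell gets a mitochondrial percentage of zero, which is the situation in which the corresponding violin plot may be empty.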
2. UMAP Visualization of Cells Colored by Sample
Figure 2: UMAP embedding of cells, colored by sample, showing how cells from each sample are distributed across the clustering pattern.
If cells from the same sample cluster together, distinctly from cells of other samples, it may indicate the presence of batch effects. Ideally, cells should mix across samples and group according to their biological characteristics rather than their sample of origin, which indicates that the data is free of significant batch effects and that the samples are comparable.
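One rough way to put a number on sample mixing is to ask, for each cell, what fraction of its nearest neighbours in the embedding come from the same sample. The sketch below is a simple heuristic written for illustration, not the method used to produce this report; the coordinates, sample labels, `same_sample_fraction` helper, and the choice of `k=3` are all hypothetical.

```python
from collections import defaultdict
from math import dist

# Hypothetical 2-D UMAP coordinates with the sample label of each cell.
embedding = [
    ((0.0, 0.0), "S1"), ((0.1, 0.0), "S2"), ((0.0, 0.1), "S1"), ((0.1, 0.1), "S2"),
    ((5.0, 5.0), "S3"), ((5.1, 5.0), "S3"), ((5.0, 5.1), "S3"), ((5.1, 5.1), "S3"),
]

def same_sample_fraction(points, k=3):
    """For each cell, the fraction of its k nearest neighbours sharing its sample."""
    fractions = []
    for i, (xy, sample) in enumerate(points):
        # Distances from this cell to every other cell (brute force, fine for a toy example).
        others = [(dist(xy, p), s) for j, (p, s) in enumerate(points) if j != i]
        others.sort(key=lambda t: t[0])
        nearest = others[:k]
        fractions.append(sum(s == sample for _, s in nearest) / k)
    return fractions

fractions = same_sample_fraction(embedding)

# Average the per-cell values within each sample.
per_sample = defaultdict(list)
for (_, sample), frac in zip(embedding, fractions):
    per_sample[sample].append(frac)
mean_by_sample = {s: sum(v) / len(v) for s, v in per_sample.items()}
print(mean_by_sample)
```

In this toy data, S1 and S2 share a neighbourhood while S3 sits apart with only its own cells as neighbours, the pattern that would prompt a closer look for batch effects. The value expected under good mixing depends on the relative sample sizes, so the numbers are best compared against sample proportions rather than a fixed cutoff.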
3. Stacked Barplot of Cell Types Distributed Across Samples
Figure 3: The bar plot showcases the distribution and abundance of different cell types within each sample. Each color in a bar represents a different cell type with the height of the color segment indicating the count of that cell type in the sample.
A uniform distribution of cell types across samples may suggest that the sample preparation and preprocessing methods were effective and that there was minimal bias or variation in the processing steps. If the experimental design deliberately enriches a particular cell type in a sample, a non-uniform distribution is also valid.
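Because samples often differ in total cell number, comparing proportions rather than raw counts can make uniformity easier to judge. A minimal sketch, assuming each cell carries a sample label and a cell-type label; the labels and the `composition` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell annotations: (sample, cell type).
cells = [
    ("S1", "T cell"), ("S1", "T cell"), ("S1", "B cell"), ("S1", "Monocyte"),
    ("S2", "T cell"), ("S2", "B cell"), ("S2", "B cell"), ("S2", "Monocyte"),
]

def composition(pairs):
    """Map each sample to {cell type: fraction of that sample's cells}."""
    counts = defaultdict(Counter)
    for sample, cell_type in pairs:
        counts[sample][cell_type] += 1
    return {
        sample: {ct: n / sum(c.values()) for ct, n in c.items()}
        for sample, c in counts.items()
    }

fractions = composition(cells)
for sample, frac in fractions.items():
    print(sample, {ct: round(f, 2) for ct, f in frac.items()})
```

The same helper applies to cluster labels in place of cell-type labels.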
4. Stacked Barplot of Clusters Distributed Across Samples
Figure 4: The bar plot showcases the distribution and abundance of different clusters within each sample. Each color in a bar represents a different cluster with the height of the color segment indicating the count of that cluster in the sample.
Generally, a uniform distribution of clusters across samples suggests that there was minimal bias or variation in the processing steps.
5. Stacked Barplot of Cell-types Distributed Across Clusters
Figure 5: The bar plot showcases the distribution and abundance of different cell types within each cluster. Each color in a bar represents a different cell-type with the height of the color segment indicating the count of that cell-type in the cluster.
Generally, each cluster should contain only one cell type, which indicates accurate cell-type annotation. Corner cases arise when the authors have provided only a cell-ID-to-cell-type mapping and no marker genes. These cases need to be rectified manually.
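A quick way to find clusters that need manual review is to compute, per cluster, the share of cells belonging to its most common cell type. This is a minimal sketch with hypothetical labels; the `cluster_purity` helper and the 0.9 cutoff are illustrative assumptions, not values taken from this report.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell annotations: (cluster, cell type).
cells = [
    ("0", "T cell"), ("0", "T cell"), ("0", "T cell"), ("0", "B cell"),
    ("1", "B cell"), ("1", "B cell"),
    ("2", "Monocyte"), ("2", "T cell"),
]

def cluster_purity(pairs):
    """Map each cluster to (dominant cell type, fraction of cells with that type)."""
    by_cluster = defaultdict(Counter)
    for cluster, cell_type in pairs:
        by_cluster[cluster][cell_type] += 1
    result = {}
    for cluster, c in by_cluster.items():
        top_type, top_n = c.most_common(1)[0]
        result[cluster] = (top_type, top_n / sum(c.values()))
    return result

purity = cluster_purity(cells)

PURITY_CUTOFF = 0.9  # illustrative threshold, to be tuned per dataset
needs_review = sorted(cl for cl, (_, frac) in purity.items() if frac < PURITY_CUTOFF)
print(purity)
print("clusters to review:", needs_review)
```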
Figure 6a: The bar plot visualizes the total count of cells detected in each sample. Each bar corresponds to a different sample, with its height representing the number of cells.
This plot provides an understanding of the sample distribution in terms of cellularity. A wide variance in cell numbers across samples might indicate inconsistencies in cell isolation, sample preparation, or sequencing depth. Consistent cell counts across samples, however, would suggest a more uniform sampling process.
Figure 6b: The bar plot illustrates the median number of genes detected in each sample. Each bar represents a different sample, and its height corresponds to the median gene counts.
Consistently low gene counts might indicate low sequencing depth or poor-quality samples. On the other hand, large variances between samples or cell types might point to technical biases or true biological differences.
Figure 6c: The bar plot showcases the median percentage of mitochondrial gene transcripts across samples.
Consistently high mitochondrial gene percentages across samples might indicate a widespread issue with cell viability, while sporadic high values could suggest sample-specific issues; the affected samples can be removed before downstream analysis.
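The per-sample medians behind such a plot, and a simple flag for samples that stand out, can be sketched as follows. The per-cell percentages and the 15% cutoff are hypothetical; an appropriate threshold depends on tissue and protocol.

```python
from statistics import median

# Hypothetical per-cell mitochondrial percentages, grouped by sample.
pct_mt_by_sample = {
    "S1": [2.0, 3.5, 4.0, 5.0],
    "S2": [3.0, 4.0, 6.0],
    "S3": [18.0, 22.0, 25.0, 30.0],
}

MT_CUTOFF = 15.0  # illustrative threshold, not a recommended default

medians = {sample: median(values) for sample, values in pct_mt_by_sample.items()}
flagged = [sample for sample, m in medians.items() if m > MT_CUTOFF]

print(medians)
print("samples to inspect:", flagged)
```

If only one or two samples are flagged, the issue is likely sample-specific; if most are, the problem is probably upstream of any single sample.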
7. Gene Counts Distribution
Figure 7: The plot provides a smoothed representation of the distribution of detected genes across cells.
This plot gives an idea about the average gene richness in cells. High variability might indicate a mix of high and low-quality cells.
8. UMI Count Distribution
Figure 8: The plot provides a smoothed representation of the distribution of UMIs across cells.
This plot offers insight into the typical transcriptomic depth of the dataset. A broad distribution might indicate variability in sequencing depth across cells.
9. UMI Vs Gene Counts Distribution Scatter Plots Colored by Density
Figure 9: The scatter plot provides a visual representation of the relationship between the number of unique molecular identifiers (UMIs) and the number of genes detected in single cells. The color intensity indicates the density of data points in a particular region of the plot, allowing for the identification of trends and patterns.
Ideally, one would expect to see a positive correlation between UMIs and genes, indicating that cells with more transcripts also express more distinct genes. Areas with higher density may represent the most typical cells in the dataset, while outliers could indicate low-quality cells or potential doublets.
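The expected positive relationship can be checked numerically with a Pearson correlation between per-cell UMI counts and gene counts. A minimal sketch on hypothetical values; the `pearson` helper is written out here for self-containment rather than taken from any particular library.

```python
from math import sqrt

# Hypothetical per-cell totals.
umi_counts = [500, 1000, 2000, 4000, 8000]
gene_counts = [300, 550, 900, 1500, 2400]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

r = pearson(umi_counts, gene_counts)
print(f"Pearson r between UMI and gene counts: {r:.3f}")
```

A value close to 1 matches the expected trend. The coefficient is sensitive to a few extreme cells, so the density scatter plot remains the better guide to outliers.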