Effect of Diseases & Drugs on Blood Cells using scRNA-seq Data on Polly

Shraddha Dumawat, Deepthi Das
December 8, 2022

Peripheral blood is rich in immune cells and soluble molecules that together generate a systemic immune response. It is also the medium through which drugs and chemicals reach, and interact with, those cells. A peripheral blood mononuclear cell (PBMC) is any peripheral blood cell with a round nucleus, such as a lymphocyte (T cell, B cell, or NK cell) or a monocyte. PBMCs mount selective immune responses and are key players in the human body’s immunity. Because they come into close contact with drugs during blood circulation, these fundamentally important cells are readily influenced by them. Access to PBMCs from peripheral blood is therefore very important for researchers studying the toxicity of new drugs or chemical compounds.

On Polly, we have highly curated single-cell and bulk RNA-seq datasets across 169 diseases, which you can find and use to accelerate your research. Figure 1 shows the distribution of curated PBMC datasets on Polly. Here, we look at a few single-cell RNA-seq datasets on blood cells influenced by disease or drugs and show how they can be analyzed to gain meaningful information.

Figure 1: Single-cell datasets related to blood on Polly

1. Malaria

Dataset ID: GSE149728_GPL27467
Source: GEO, Single cell
Published on: Sep 30, 2020
Title: Chronic Malaria drives functional heterogenity in B cell subpopulations and expansion of unswitched atypical memory B cells [Malaria scRNA-seq]

Human chronic infectious diseases have been shown to alter the composition and phenotype of the B cell compartment, which can, in part, contribute to the failure to acquire protective immunity. However, the extent of such alterations is poorly understood. Here, using a combination of bulk and single-cell RNA-sequencing (scRNA-seq) of B cells from individuals living in malaria-endemic Africa, changes in naïve B cell, classical memory B cell (MBC), and atypical MBC subsets were characterized. Of particular interest were unswitched atypical MBCs that expanded in children upon the onset of febrile malaria. This subpopulation expressed IgD but only low levels of IgM (IgD+IgMlo), high levels of the atypical MBC markers Tbet and CD11c, and the intrinsically autoreactive VH4-34. IgD+IgMlo atypical MBCs were distinguished functionally by their acquisition of high antigen-affinity thresholds for activation, suggesting that their expansion during febrile malaria may reduce responses to low-affinity self-antigens during acute malaria. Figure 2 shows the distribution of cell types across the disease cohorts.

Figure 2: Plotting the distribution of cell type for each cohort
Figure 3: Performing Quality Control and Preprocessing the data

Figure 3 represents the following:

'n_genes_by_counts' - number of unique genes detected in each cell

'total_counts' - total number of molecules detected within a cell (correlates strongly with unique genes)

'pct_counts_mt' - the percentage of reads that map to the mitochondrial genome
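As a rough illustration of how these three metrics are derived, here is a minimal sketch in plain Python. The gene names and counts are made up for the example, the `MT-` prefix for mitochondrial genes is an assumption (it is the usual convention for human gene symbols), and the dictionary keys simply mirror the metric names listed above in their Scanpy spelling. It is not the exact pipeline used to produce Figure 3.

```python
# Toy count matrix: one dict of gene -> count per cell (hypothetical values).
cells = {
    "cell_1": {"CD19": 3, "MS4A1": 5, "MT-CO1": 2, "ACTB": 10},
    "cell_2": {"CD19": 0, "MS4A1": 1, "MT-CO1": 9, "ACTB": 0},
}


def qc_metrics(counts, mito_prefix="MT-"):
    """Compute the three per-cell QC metrics for one cell."""
    total = sum(counts.values())
    # A gene is "detected" if it has at least one count in this cell.
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Assumption: mitochondrial genes are recognized by their name prefix.
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    pct_mt = 100.0 * mito / total if total else 0.0
    return {
        "n_genes_by_counts": n_genes,
        "total_counts": total,
        "pct_counts_mt": pct_mt,
    }


qc = {name: qc_metrics(counts) for name, counts in cells.items()}
for name, metrics in qc.items():
    print(name, metrics)
```

In this toy example, cell_2 has few detected genes and a very high mitochondrial percentage, which is the kind of profile that quality control typically flags as a stressed or dying cell.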

Figure 4: Principal component analysis (PCA) is a mathematical procedure that transforms a number of possibly correlated variables (e.g., the expression of genes in a network) into a smaller number of uncorrelated variables called principal components (PCs).

Gene expression displays structured co-expression, and dimensionality reduction by principal component analysis groups the co-varying genes into principal components, ordered by the amount of variation they explain.
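To make the idea concrete, the sketch below finds the first principal component of a tiny, made-up expression matrix using power iteration on the covariance matrix. The data, function names, and iteration count are all illustrative assumptions; real analyses use an optimized PCA routine from a numerical library rather than this hand-rolled version.

```python
import math

# Hypothetical toy expression matrix: rows are cells, columns are genes.
# Genes 0 and 1 co-vary strongly; gene 2 is nearly constant.
X = [
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 5.1],
    [3.0, 6.0, 4.9],
    [4.0, 8.1, 5.0],
]


def center(rows):
    """Subtract each gene's mean so that every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix (genes x genes) of centered data."""
    n, p = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_pc(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance of the data along direction v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(len(v)))
        for i in range(len(v))
    )
    return v, eigenvalue


centered = center(X)
cov = covariance(centered)
pc1, var1 = first_pc(cov)
# Fraction of total variance explained by the first component.
explained = var1 / sum(cov[i][i] for i in range(len(cov)))
# Coordinates of each cell along PC1.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print(pc1, explained, scores)
```

Because genes 0 and 1 rise and fall together, a single component captures nearly all of the variation, and the nearly constant gene 2 contributes almost nothing to it.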

Figure 5: An alternative to PCA for visualizing scRNA-seq data is a tSNE plot. tSNE (t-Distributed Stochastic Neighbor Embedding) combines dimensionality reduction (e.g., PCA) with random walks on the nearest-neighbor network to map high-dimensional data to a 2-dimensional space.

2. Drug-Induced Hypersensitivity

Dataset ID: GSE132802_GPL21290_pbmc & GSE132802_GPL21290_skin
Source: GEO, Single cell
Published on: Jun 17, 2019
Title: Targeted therapy guided by single-cell transcriptomic analysis in drug-induced hypersensitivity syndrome: a case report

The aim of this study was to determine new therapeutic targets for a refractory case of drug-induced hypersensitivity syndrome/drug reaction with eosinophilia and systemic symptoms (DiHS/DRESS) using single-cell transcriptomic analysis. 200,000 PBMCs were cultured in 200 µL of RPMI-1640 supplemented with 10% human AB serum with (PBMC_T4_BACT) or without (PBMC_T4_CTRL) 48 µg/mL sulfamethoxazole/trimethoprim (SMX-TMP).

Trimethoprim/sulfamethoxazole is an antimicrobial used to treat and prevent many bacterial infections. FDA-approved indications include acute infective exacerbation of chronic bronchitis, otitis media (in pediatric patients only), treatment and prophylaxis of traveler's diarrhea, and urinary tract infections, among others. [1]
Mechanism of action of trimethoprim/sulfamethoxazole in bacteria: the two drugs inhibit bacterial folic acid synthesis, thereby blocking the production of DNA.

For the in vitro therapeutic experiments, PBMCs were cultured in the presence of SMX-TMP with (DRESS_Day4_TOFA) or without (DRESS_Day4_BACT) tofacitinib. The patient was treated with 10 mg/d of tofacitinib, a JAK3 inhibitor. Freshly isolated PBMCs were collected again two weeks after the start of the intervention (PBMC_POST2W). Single cells from the skin, freshly isolated PBMCs, and cultured PBMCs were captured using a droplet-based single-cell approach (10x Genomics), and sequencing libraries were prepared.

Design:

As seen here, the various cohorts of the dataset are mentioned with respect to their characteristics and cells.

Tofacitinib inhibits the process of intracellular signaling from the receptor to the cellular nucleus and inhibits the inflammation process via a new pathway (inhibition of the Janus kinases).

Mechanism of action of Tofacitinib

The study found that lymphocytes in the skin and PBMCs showed upregulation of the skin-homing chemokine receptors CCR4 and CCR10, as well as JAK3 and STAT1. Treatment with tofacitinib dramatically extinguished skin inflammation in this chronic refractory case of DiHS/DRESS. The authors concluded that the successful intervention with tofacitinib was guided by scRNA-seq, which demonstrated aberrant activity in the JAK-STAT pathway.

Analysis of the Study

Let us first look at the overall picture of the study to understand the dataset.

Figure 6: Overall picture

Figure 6 represents the following:

'gene_counts' - number of unique genes detected in each cell

'umi_counts' - total number of molecules detected within a cell (correlates strongly with unique genes)

'percent_mito' - the percentage of reads that map to the mitochondrial genome

The sunburst plot below (Figure 7) represents the sample distribution as per the longitudinal study with respect to the drug treatment.

Figure 7: Various cohorts of Drug Hypersensitivity were compared with healthy controls

To analyze this study, its cohorts can be compared to understand the overall differential expression, as shown in Figure 8.

Figure 8: Plotting the distribution of cell type for each disease

We can also look at how the cells are distributed across the cohorts.

Figure 9: Cohorts with respect to cell distribution on Polly

Here, we analyze the proportions of different cell types at different stages of the clinical study. The study can be divided into two parts: 1) one in which the cultured PBMCs are studied across the different cohorts, and 2) one in which treatment is given according to the disease. This breakdown can be used in downstream studies to understand the pathways that play an important role in differential expression.
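The kind of per-cohort breakdown described here can be sketched in a few lines of plain Python. The cohort labels PBMC_T4_CTRL and PBMC_T4_BACT come from the study design above; the cell-type labels and the number of cells in each group are invented purely for illustration and do not reflect the actual dataset.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell annotations: (cohort, cell type).
annotations = [
    ("PBMC_T4_CTRL", "T cell"), ("PBMC_T4_CTRL", "T cell"),
    ("PBMC_T4_CTRL", "B cell"), ("PBMC_T4_CTRL", "Monocyte"),
    ("PBMC_T4_BACT", "T cell"), ("PBMC_T4_BACT", "T cell"),
    ("PBMC_T4_BACT", "T cell"), ("PBMC_T4_BACT", "Monocyte"),
]


def composition(pairs):
    """Fraction of each cell type within each cohort."""
    counts = defaultdict(Counter)
    for cohort, cell_type in pairs:
        counts[cohort][cell_type] += 1
    return {
        cohort: {ct: n / sum(c.values()) for ct, n in c.items()}
        for cohort, c in counts.items()
    }


fractions = composition(annotations)
for cohort, parts in fractions.items():
    print(cohort, parts)
```

Working with fractions rather than raw counts keeps cohorts comparable even when they contain different numbers of cells.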

Figure 10 shows a UMAP plot of the dataset after dimensionality reduction. Other datasets can be processed in the same way.

Figure 10: Uniform Manifold Approximation and Projection (UMAP) is another nonlinear dimensionality reduction method. Like tSNE, UMAP is nondeterministic and requires that the random seed be fixed to ensure reproducibility. While tSNE optimizes for local structure, UMAP tries to balance the preservation of local and global structure. For this reason, we prefer UMAP over tSNE for exploratory analysis and general visualization.

Genes can be ranked by their expression-level score to identify the genes that are most differentially expressed in each cell type relative to the others.

Figure 11: Ranking genes
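One common way to score genes for such a ranking is a two-sample test statistic comparing one group of cells against the rest. The sketch below uses Welch's t-statistic as an assumed scoring method; the post does not state which score was actually used for Figure 11. CCR4 and JAK3 are genes mentioned in the study, ACTB is added here as a typical housekeeping gene, and all expression values are hypothetical.

```python
import math
from statistics import mean, variance

# Hypothetical normalized expression values per gene, split by group.
expression = {
    "CCR4": {"T cell": [2.0, 2.5, 3.0], "other": [0.5, 0.4, 0.6]},
    "JAK3": {"T cell": [1.5, 1.7, 1.6], "other": [1.0, 1.2, 0.8]},
    "ACTB": {"T cell": [5.0, 5.2, 4.8], "other": [5.1, 4.9, 5.0]},
}


def welch_t(a, b):
    """Welch's t-statistic: difference in means over its standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(data, group, reference):
    """Sort genes from most to least upregulated in group vs. reference."""
    scores = {g: welch_t(v[group], v[reference]) for g, v in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_genes(expression, "T cell", "other")
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

A gene scores highly when its mean expression differs between groups by much more than the within-group spread, so the housekeeping gene, with the same mean in both groups, ends up at the bottom of the list.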

Graph-based Clustering

  • Graph-based methods attempt to partition a pre-computed neighbor graph into modules (i.e., groups/clusters of cells) based on their connectivity.
  • Currently, the most widely used graph-based methods for single-cell data are variants of the Louvain algorithm.
  • The intuition behind the Louvain algorithm is that it looks for areas of the neighbor graph that are more densely connected than expected (based on the overall connectivity in the graph).
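The notion of "more densely connected than expected" can be made concrete with modularity, the score that Louvain-style methods try to maximize. The sketch below only evaluates modularity for a given partition of a tiny, made-up neighbor graph; it is not an implementation of the Louvain or Leiden search itself, and the graph and node names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical neighbor graph: two triangles joined by a single edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]


def modularity(edge_list, membership):
    """Modularity of a partition of an unweighted, undirected graph.

    For each cluster: (fraction of edges inside the cluster) minus
    (fraction expected if edges were placed at random given node degrees).
    """
    m = len(edge_list)
    internal = defaultdict(int)  # edges with both ends in the cluster
    degree = defaultdict(int)    # summed node degree per cluster
    for u, v in edge_list:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in "abcdef"}
mixed = {"a": 0, "d": 0, "b": 1, "e": 1, "c": 2, "f": 2}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
q_mixed = modularity(edges, mixed)
print(q_two, q_one, q_mixed)
```

Splitting the graph along its two triangles gives the highest score, putting everything in one cluster gives zero, and a partition that cuts across the triangles scores below zero, which is why a modularity-maximizing method would recover the two triangles as clusters.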

In Figure 12, we explore graph-based Leiden clustering, an improved version of the Louvain algorithm.

Figure 12: Grouping cells based on the similarity of their expression profiles allows us to identify cell types and states, as well as infer differences between groups. This is done either via clustering or community detection.

The table below provides insights into how one can look at high gene scores of each cluster in the single-cell expression data.

Table representing high gene scores of each cluster

With this, the top-scoring genes can be compared across the different cell types to understand gene expression in the context of the disease indication, as shown in Figure 14.

Figure 14: Differential expression across top genes in the dataset between the cell types

The given dataset gives us insights into the gene candidates that can be used as targets for countering drug hypersensitivity.


  1. Kim, Doyoung, et al. "Targeted therapy guided by single-cell transcriptomic analysis in drug-induced hypersensitivity syndrome: a case report." Nature Medicine 26.2 (2020): 236-243.
  2. Holla, Prasida, et al. "Shared transcriptional profiles of atypical B cells suggest common drivers of expansion and function in malaria, HIV, and autoimmunity." Science Advances 7.22 (2021): eabg8384.
