Noteworthy Datasets on Infectious Diseases

Deepthi Das
April 15, 2022

Multi-omics analysis, though widely used in cancer and other genetic diseases and disorders, has rarely been applied to infectious diseases, even though integrative multi-omics analysis lets researchers study them at a holistic level. When we look at the overall distribution of available data types, wet-lab data makes up a large share of it, which gives only a narrow view of infectious diseases.

Researchers have recently begun exploring integrative multi-omics analysis to get a deeper picture of the pathophysiology of infectious diseases, the host response to infection, and the response to treatment. Insights from integrative analysis can ultimately be used to validate or compare results from new drug or vaccine studies, accelerating drug discovery and development. On Polly, we have curated multi-omics data that is readily available for your analysis. Here's a snapshot of some of the interesting datasets.

The ‘Monthly Dataset Roundup’ series features datasets on Polly that are of scientific value, with the aim of promoting the sharing and reuse of biomedical molecular data. Polly Omixatlases contain ML-ready, curated datasets from diverse public data repositories, covering both omics data (transcriptomics, proteomics, metabolomics, single-cell data, etc.) and non-omics data (flow cytometry, lab measurements, immunological assays, etc.). They offer the unique advantage of letting users access, use and integrate diverse data types to perform truly multi-dimensional analyses of their research questions. This month, we are featuring datasets that capture the comprehensive molecular landscape of ‘infectious diseases’, curated versions of which can be found and analyzed on Polly.

Distribution of infectious disease datasets on Polly, showing the various data types available

Dataset 1

Analyze cytokine assay data to identify an inflammatory cytokine signature that predicts COVID-19 severity and survival.

Dataset ID: SDY1662_*

Year of Publication: 2020

Total Samples: 4642 from 2340 patients

Experiment type: Lab tests - Blood chemistry measurements, Blood cell count

Organism: Homo sapiens

Reference link: NCBI - Publication


Several studies have revealed that the hyper-inflammatory response induced by SARS-CoV-2 is a major cause of disease severity and death in infected patients. However, predictive biomarkers of pathogenic inflammation to help guide targetable immune pathways are critically lacking. The researchers implemented a rapid multiplex cytokine assay to measure serum IL-6, IL-8, TNF-α and IL-1β in hospitalized COVID-19 patients upon admission to the Mount Sinai Health System in New York.

To enhance the relevance of the cytokine assays, the team focused on four pathogenic cytokines (IL-6, IL-8, TNF-α and IL-1β) for which clinically available drugs exist, and chose the ELLA microfluidics platform to measure them rapidly (within 3 hours), making the results potentially actionable.

Types of lab measurements included in the dataset from 100 samples
Distribution of patient race from 100 samples

Dataset 2

Study blood cell counts from patient data to gain insights into immune cell proliferation after treatment with angiotensin-converting enzyme inhibitors (ACEIs) or angiotensin II type 1 receptor blockers (ARBs)

Dataset ID: SDY1641_*

Year of Publication: 2020

Total Samples: 84 from 42 patients

Experiment type: Lab measurements - Blood chemistry, Blood cell count

Organism: Homo sapiens

Reference link: NCBI - Publication

The dataset includes lab tests, such as blood chemistry tests and blood cell counts, from 42 subjects


Dysfunction of the renin-angiotensin system (RAS) has been observed in patients with coronavirus disease 2019 (COVID-19), but whether RAS inhibitors, such as angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin II type 1 receptor blockers (ARBs), are associated with clinical outcomes remains unknown. COVID-19 patients with hypertension were enrolled to evaluate the effect of RAS inhibitors.

The researchers observed that patients receiving ACEI or ARB therapy had a lower rate of severe disease and a trend toward lower IL-6 levels in peripheral blood. In addition, compared with other antihypertensive drugs, ACEI or ARB therapy increased CD3 and CD8 T cell counts in peripheral blood and decreased the peak viral load.

Dataset 3

Study global gene expression profiles of sorted regulatory T cells and T follicular regulatory cells to understand their differentiation

Year of Publication: 2017

Total Samples: 6

Experiment type: Transcriptomics

Organism: Mus musculus

Reference link: NCBI - Publication  


Interleukin 2 (IL-2) promotes Foxp3+ regulatory T (Treg) cell responses but inhibits T follicular helper (TFH) cell development. However, it is not clear how IL-2 affects T follicular regulatory (TFR) cells, a cell type with properties of both Treg and TFH cells. The researchers therefore conducted RNA-seq analysis on sorted conventional Treg cells (FoxP3+CD69hiPD1loCXCR5loCD25hi) and TFR cells (FoxP3+CD69hiPD1hiCXCR5hiCD25lo) obtained from the mediastinal lymph node (mLN) of B6.FoxP3-DTR/GFP mice 30 days after infection.

Using this model, they found that high IL-2 concentrations at the peak of the infection prevented TFR cell development through a Blimp-1-dependent mechanism. However, once the immune response resolved, some Treg cells downregulated CD25, upregulated Bcl-6 and differentiated into TFR cells, which then migrated into the B cell follicles to prevent the expansion of self-reactive B cell clones. Thus, unlike its effects on conventional Treg cells, IL-2 inhibits TFR cell responses.

PCA plot showing heterogeneity between Tfr and Treg cells
Gene Set Enrichment from differential expression between Treg and Tfr cells

Dataset 4

Explore how a prospective signature of tuberculosis risk was derived from publicly available RNA-seq data and, after adaptation to qRT-PCR, used to predict tuberculosis disease in independent cohorts.

Visual summary of tuberculosis datasets on Polly

Year of Publication: 2016

Organism: Homo sapiens

Reference link: NCBI - Publication


In this prospective cohort study, the researchers followed up, for 2 years, healthy South African adolescents aged 12–18 years from the Adolescent Cohort Study (ACS) who were infected with M. tuberculosis. They collected blood samples from study participants every 6 months and monitored the adolescents for progression to tuberculosis disease. A prospective signature of risk was derived from whole-blood RNA sequencing data by comparing participants who developed active tuberculosis disease (progressors) with those who remained healthy (matched controls). After adaptation to multiplex qRT-PCR, the signature was used to predict tuberculosis disease in untouched adolescent samples and in samples from independent cohorts of South African and Gambian adult progressors and controls. Participants in the independent cohorts were household contacts of adults with active pulmonary tuberculosis disease.

A 16-gene signature of risk was identified. The signature predicted tuberculosis progression with a sensitivity of 66·1% (95% CI 63·2–68·9) and a specificity of 80·6% (79·2–82·0) in the 12 months preceding tuberculosis diagnosis. The risk signature was validated in an untouched group of adolescents (p=0·018 for RNA sequencing and p=0·0095 for qRT-PCR) and in the independent South African and Gambian cohorts (p values <0·0001 by qRT-PCR), with a sensitivity of 53·7% (42·6–64·3) and a specificity of 82·8% (76·7–86) in the 12 months preceding tuberculosis.
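Sensitivity here is the fraction of true progressors that the signature flags, and specificity is the fraction of non-progressors it correctly clears. The short Python sketch below shows how both numbers fall out of a set of binary calls. The risk scores, the 0.5 threshold and the toy labels are hypothetical illustrations, not values from the study, which used its own scoring and cut-off.

```python
from typing import List, Sequence, Tuple


def classify(scores: Sequence[float], threshold: float) -> List[int]:
    """Call a sample a predicted progressor (1) when its risk score meets a
    hypothetical threshold, otherwise a predicted non-progressor (0)."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = progressor, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # progressors flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # progressors missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 3 progressors followed by 4 controls
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.1, 0.3, 0.8]
preds = classify(scores, threshold=0.5)
sens, spec = sensitivity_specificity(labels, preds)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")  # sensitivity=0.667 specificity=0.750
```

Raising the threshold generally trades sensitivity for specificity, which is why a study reports both numbers at a fixed cut-off and time window.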

Dataset 5

Use a systems biology approach combining flow cytometry and microarray data to gain insights into cell populations and gene expression profiles in healthy children receiving the 2011-12 influenza vaccine

Dataset ID: SDY364, SDY368, SDY387, SDY522, GSE52005_GPL10558

Year of Publication: 2014

Total Samples: 124

Experiment type: Flow cytometry, microarray

Organism: Homo sapiens

Reference link: NCBI - Publication


The treatment of pediatric immune system dysfunctions depends on a basic understanding of the immune system's molecular and cellular components, as well as the inherent relationships between them. Specifically, such knowledge requires an appreciation of B lymphocytes, T lymphocytes, natural killer cells and dendritic cells. The authors conducted a prospective study in children immunized with trivalent inactivated influenza vaccine (TIV) or live attenuated influenza vaccine (LAIV) to characterize differences in (1) B-cell populations by flow cytometry, (2) serum antibody titers by hemagglutination inhibition (HAI) assay and virus neutralization assay (VNA), and (3) whole-blood transcriptional profiles, to determine whether early changes in the expression of certain immune-related genes correlated with antibody responses. Samples were collected longitudinally at baseline (day 0), 24 hours, day 7 and day 30, and 136 samples were analyzed in total. The cohort consisted of previously healthy children aged 6 months to 14 years, enrolled between October 2011 and February 2012.

Cell populations measured by flow cytometry in 4 patients in the study

Exploratory analysis of expression profiles after TIV or LAIV vaccination shows that any downstream analysis will need to be corrected for the kw_curated_gender covariate
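One common way to correct for a covariate such as kw_curated_gender is to regress its linear effect out of each gene before downstream analysis. The Python/NumPy sketch below illustrates that idea on a toy matrix. It assumes the gender labels have already been encoded as 0/1 and that expression is stored as genes x samples; the function name regress_out_covariate is hypothetical, and this is not a Polly function.

```python
import numpy as np


def regress_out_covariate(expr: np.ndarray, covariate: np.ndarray) -> np.ndarray:
    """Remove the linear effect of one sample-level covariate from every gene.

    expr:      expression matrix, shape (n_genes, n_samples)
    covariate: numeric value per sample, shape (n_samples,), e.g. gender
               encoded as 0/1 (the encoding is an assumption)
    Returns an adjusted matrix of the same shape with each gene's mean restored.
    """
    expr = np.asarray(expr, dtype=float)
    cov = np.asarray(covariate, dtype=float)
    # Design matrix: an intercept column plus the covariate column
    design = np.column_stack([np.ones_like(cov), cov])
    # Fit all genes at once by ordinary least squares: design @ beta ~ expr.T
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    # Keep only the residuals, i.e. variation not explained by the covariate
    residuals = expr.T - design @ beta
    # Add back each gene's mean so values stay on the original scale
    return (residuals + expr.mean(axis=1)).T


# Toy example (hypothetical values): 2 genes x 4 samples, gender coded 0 or 1
expr = np.array([[1.0, 2.0, 5.0, 6.0],
                 [10.0, 10.0, 20.0, 20.0]])
gender = np.array([0, 0, 1, 1])
print(regress_out_covariate(expr, gender))
```

In a real analysis the vaccine group (TIV or LAIV) and time point would usually be kept in the same model, so that removing the gender effect does not also remove the signal of interest; limma's removeBatchEffect, for example, accepts a design matrix for exactly this purpose.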
