Noteworthy Datasets on Non-Small Cell Lung Cancer (NSCLC)

Shraddha Dumawat, Deepthi Das
October 6, 2022

Non-small cell lung cancer (NSCLC) accounts for about 80-85% of all lung cancers. A growing number of studies have combined machine learning with multi-omics analysis to improve lung cancer prognosis. In one recent study, the authors trained a deep convolutional neural network (Inception v3) on data obtained from The Cancer Genome Atlas to diagnose two types of lung cancer with 97% accuracy. The model could also detect cancer-related genetic mutations. Although the validation cohorts in many related studies are relatively small, these findings indicate that machine-learning-based multi-omics analysis has great potential to improve lung cancer prognosis. Here, we have curated a few important datasets, each of which has contributed to the understanding of NSCLC.

Explore these datasets to learn more about various aspects of NSCLC: the expression profiles of long noncoding RNAs (lncRNAs) in early-stage lung squamous cell carcinoma (SCC); plasma lncRNA and mRNA profiles in patients with NSCLC; reprogramming of tumor-infiltrating immune cells in early-stage NSCLC; gene expression data from African Americans and European Americans with NSCLC; and clinically relevant somatic mutations, novel noncoding alterations, and mutational signatures shared by common and rare tumor types. You can find more highly curated datasets on NSCLC (see figure below) and many other diseases, drawn from different repositories, that can be visualized and analyzed using our DataOps platform, Polly.

Non-small cell lung carcinoma dataset

Dataset 1

Identification of lncRNAs for the detection of early-stage lung squamous cell carcinoma by microarray analysis.

Dataset ID: GSE88862_GPL16956
Year of Publication: 2016
Total Samples: 6
Experiment type: Transcriptomics
Organism: Homo sapiens
Reference link:  Publication, Raw data

Summary:

Aberrant expression of lncRNAs has been reported in numerous cancers and can facilitate cancer diagnosis. However, the expression profile of lncRNAs in early-stage lung SCC has not been well characterized. The present study aimed to examine the expression profile of lncRNAs in early-stage lung SCC and to identify lncRNA biomarkers for diagnosis. Using a high-throughput lncRNA microarray, the authors screened thousands of aberrantly expressed lncRNAs and mRNAs in early-stage lung SCC tissues compared with their corresponding adjacent nontumorous tissues. Bioinformatics analyses were used to investigate the functions of aberrantly expressed mRNAs and their associated lncRNAs. They employed the Arraystar Human LncRNA Microarray V3.0 as a discovery platform to identify lncRNAs that are differentially expressed in early-stage lung SCC. Three pairs of tumor tissues and adjacent normal tissues from early-stage lung SCC patients were used for microarray analysis.
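
The paired tumor-versus-adjacent-normal design described above can be illustrated with a minimal sketch that averages per-patient log2 fold changes. This is not the authors' pipeline: the transcript names, the intensity values, and the two-fold cutoff are all assumptions made for illustration, and a real microarray analysis would also involve normalization and statistical testing.

```python
import math
from statistics import mean

# Hypothetical intensities for illustration only (not values from GSE88862).
# Each transcript maps to (tumor, adjacent_normal) pairs, one pair per patient.
paired_intensities = {
    "lncRNA_A": [(800.0, 100.0), (640.0, 160.0), (512.0, 128.0)],
    "lncRNA_B": [(50.0, 200.0), (75.0, 300.0), (100.0, 400.0)],
    "lncRNA_C": [(120.0, 110.0), (95.0, 100.0), (105.0, 105.0)],
}


def mean_paired_log2fc(pairs):
    """Average the per-patient log2(tumor / normal) ratios."""
    return mean(math.log2(tumor / normal) for tumor, normal in pairs)


def call_direction(log2fc, threshold=1.0):
    """Label a transcript using an assumed cutoff of |log2FC| >= 1 (two-fold)."""
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "unchanged"


results = {name: mean_paired_log2fc(pairs) for name, pairs in paired_intensities.items()}
calls = {name: call_direction(fc) for name, fc in results.items()}

for name, fc in results.items():
    print(f"{name}: mean log2FC = {fc:.2f} ({calls[name]})")
```

Computing the ratio within each patient before averaging keeps the comparison paired, which matters when only three tumor/normal pairs are available.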

Gene set enrichment plot showing the difference in pathways between the two sources of the tissue.
Complement and coagulation pathways described in relation to the diseases.

Dataset 2

LncRNA BRCAT54 inhibits the tumorigenesis of non-small cell lung cancer by binding to RPS9 to transcriptionally regulate JAK-STAT and calcium pathway genes.

Dataset ID: GSE99870_GPL21827
Year of Publication: 2018
Total Samples: 16
Experiment type: Transcriptomics
Organism: Homo sapiens
Reference link:  Publication, Raw data

Summary:

Increasing evidence suggests that lncRNAs play critical roles in cancers. However, the expression patterns and underlying mechanisms of lncRNAs in NSCLC remain incompletely understood. The authors identified a novel lncRNA (BRCAT54), which was significantly upregulated in preoperative plasma, NSCLC tissues, and NSCLC cells, and whose higher expression was associated with better prognosis in patients with NSCLC. Overexpression of BRCAT54 inhibited proliferation and migration and activated apoptosis in NSCLC cells, and knockdown of BRCAT54 reversed these suppressive effects. Moreover, overexpression of BRCAT54 repressed NSCLC cell growth in vivo. Mechanistically, BRCAT54 bound directly to RPS9. Knockdown of RPS9 substantially reversed the promoting effects of si-BRCAT54 on cell proliferation and enhanced the inhibitory effect of si-BRCAT54 on BRCAT54 expression. In addition, silencing of RPS9 activated the JAK-STAT pathway and suppressed the expression of calcium signaling pathway genes. This study identified BRCAT54 as a tumor suppressor in NSCLC. Targeting the BRCAT54 and RPS9 feedback loop might be a novel therapeutic strategy for NSCLC.

Down-regulation of pathways in cancer and calcium signaling pathway post-operation.

Dataset 3

Characterizing the metabolic and immune landscape of NSCLC reveals prognostic biomarkers through omics data integration.

Dataset ID: GSE117570_GPL18573
Year of Publication: 2019
Total Cells: 11061
Experiment type: Single cell RNA-seq
Organism: Homo sapiens
Reference link:  Publication, Raw data

Summary:

In this study, the authors analyzed and validated single-cell RNA-seq data by integrating multi-level omics data to identify key metabolic features and prognostic biomarkers in NSCLC. High-throughput single-cell RNA-seq data, including 4887 cellular gene expression profiles from NSCLC tissues, were analyzed. After pre-processing, the cells were grouped into 12 clusters using the t-SNE clustering algorithm, and the cell types were defined according to marker genes. Malignant epithelial cells exhibited individual differences in molecular features and intra-tissue metabolic heterogeneity. The authors found that oxidative phosphorylation (OXPHOS) and glycolytic pathway activity are major contributors to the intra-tissue metabolic heterogeneity of malignant epithelial cells and T cells. Furthermore, they constructed T-cell differentiation trajectories and identified several key genes that regulate the cellular phenotype. By screening for genes associated with T-cell differentiation using the Lasso algorithm and Cox risk regression, they identified four prognostic marker genes for NSCLC. In summary, their study revealed metabolic features and prognostic markers of NSCLC at single-cell resolution, providing novel findings on molecular biomarkers and signatures of cancers.
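
To make the idea of a Cox-based prognostic signature concrete, the sketch below computes a linear risk score from four genes and splits patients at the median score. It only illustrates the general technique: the gene names, coefficients, and expression values are hypothetical placeholders, since the study's four marker genes and their weights are not listed here.

```python
from statistics import median

# Hypothetical Cox coefficients; placeholders, not the study's fitted values.
cox_coefficients = {"GENE_A": 0.40, "GENE_B": -0.25, "GENE_C": 0.10, "GENE_D": 0.30}

# Hypothetical normalized expression values per patient.
patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0, "GENE_D": 1.0},
    "P2": {"GENE_A": 0.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 0.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 2.0, "GENE_D": 2.0},
    "P4": {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 0.0, "GENE_D": 1.0},
}


def risk_score(expression, coefficients):
    """Linear predictor of a Cox model: sum of coefficient * expression."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


scores = {pid: risk_score(expr, cox_coefficients) for pid, expr in patients.items()}

# Assumed convention: patients above the median score form the high-risk group.
cutoff = median(scores.values())
groups = {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}

for pid in sorted(scores):
    print(f"{pid}: score = {scores[pid]:.2f}, group = {groups[pid]}")
```

In practice the coefficients would come from a penalized Cox model fitted on survival data, and the high- and low-risk groups would then be compared with survival curves.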

UMAP of the cell types generated by CellxGene of Polly.

Dataset 4

Comparative transcriptome profiling reveals coding and noncoding RNA differences in NSCLC from African Americans and European Americans.

Dataset ID: GSE101929_GPL570
Year of Publication: 2017
Total Samples: 66
Experiment type: Transcriptomics
Organism: Homo sapiens
Reference link:  Publication, Raw data

Summary:

This study aimed to determine whether racial differences in gene and miRNA expression translate to clinically relevant differences in lung tumor biology between African Americans (AAs) and European Americans (EAs). AA-enriched differential gene expression was characterized by stem cell and invasion pathways, whereas differential gene expression in lung tumors from EAs was primarily characterized by cell proliferation pathways. Population-specific gene expression was partly driven by population-specific miRNA expression profiles. Drug susceptibility predictions revealed a strong inverse correlation between AA resistance and EA sensitivity to the same panel of drugs. Comparative transcriptomic profiling revealed clear differences in lung tumor biology between AAs and EAs. Increased participation by AAs in lung cancer clinical trials is needed to integrate and leverage transcriptomic differences with other clinical information to maximize therapeutic benefits for AAs and EAs.

X2K analysis showing the transcription factors upregulated in the AA cohort compared with the EA cohort.

Dataset 5

The mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients.

Dataset ID: MSK_IMPACT_2017_Mutation*
Year of Publication: 2017
Total Samples: 40 (represented here)
Experiment type: Mutation
Organism: Homo sapiens
Reference link:  Publication, Raw data

Summary:

Tumor molecular profiling is a fundamental component of precision oncology, enabling the identification of genomic alterations in genes and pathways that can be targeted therapeutically. The existence of recurrent targetable alterations across distinct histologically defined tumor types, coupled with an expanding portfolio of molecularly targeted therapies, demands flexible and comprehensive approaches to profile clinically relevant genes across the full spectrum of cancers. This study established a large-scale, prospective clinical sequencing initiative using a comprehensive assay, MSK-IMPACT, through which the authors have compiled tumor and matched normal sequence data from a unique cohort of more than 10,000 patients with advanced cancer and available pathological and clinical annotations. With this data, they identified clinically relevant somatic mutations, novel noncoding alterations, and mutational signatures shared by common and rare tumor types. Patients were enrolled in genomically matched clinical trials at a rate of 11%.

Here, we present a cohort of 40 NSCLC samples drawn from the dataset above to illustrate the landscape of NSCLC. The plot below summarizes the sample distribution of the cohort generated for the project.

Sample distribution of the datasets representing the cancer stage, disease, and vital status
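
A sample distribution like the one plotted above can be tallied with a few lines of Python. The sketch below is only an illustration: the field names and the handful of records are hypothetical and do not reflect the cohort's actual metadata schema or values.

```python
from collections import Counter

# Hypothetical sample annotations; field names and values are illustrative only.
samples = [
    {"stage": "I", "disease": "lung adenocarcinoma", "vital_status": "alive"},
    {"stage": "I", "disease": "lung squamous cell carcinoma", "vital_status": "alive"},
    {"stage": "III", "disease": "lung adenocarcinoma", "vital_status": "deceased"},
    {"stage": "IV", "disease": "lung adenocarcinoma", "vital_status": "deceased"},
]


def distribution(records, field):
    """Count how many samples fall into each category of one metadata field."""
    return Counter(record[field] for record in records)


for field in ("stage", "disease", "vital_status"):
    print(field, dict(distribution(samples, field)))
```

The resulting counts can be passed to any plotting library to draw one bar chart per metadata field.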

Polly’s OmixAtlases provide FAIR biomolecular data on the Polly platform, enabling researchers to carry out robust data analysis and make effective use of omics data. Reach out to us at info@elucidata.io for more details.
