In 1990, an ambitious project was launched in the USA to sequence human DNA and to identify, both functionally and structurally, the genes that make up the whole genome. It was called the Human Genome Project (HGP). About fifteen years later, another ambitious project was launched to study abnormal human DNA: The Cancer Genome Atlas (TCGA). Both projects had a common history and reason behind their inception: human curiosity and cancer. Before we understood the molecular machinery behind cancer, we dealt with it in every way we could, from surgery to unrestrained chemotherapy; cancer was, and still is, the definition of uncharted territory. The nature of the data, literally billions of letters, demands crisp representation and visualization so that people across different fields can make sense of it. cBioPortal is one such attempt to represent the information contained in TCGA. Most of the molecular machinery is still hidden from us, and resources such as cBioPortal enable researchers worldwide to unravel interesting and unknown mechanisms.
TCGA's purpose is to catalog the genetic mutations responsible for cancer; it hosts data from different technologies and patients. But why focus only on genetic mutations? The answer lies in biology's central dogma, often stated as "DNA makes RNA and RNA makes protein". If an alteration happens at the DNA level, it can propagate to the protein as well. Proteins perform most functions within a cell, such as catalyzing metabolism, replicating DNA and transporting molecules. They can execute these functions mainly because of their structure. For example, the primary energy carrier in our cells is ATP (adenosine triphosphate). To store energy, ADP is converted to ATP with the help of a protein (ATP synthase) that looks like a motor with a shaft; the rotation of the motor adds a phosphate to ADP, producing ATP. The motor is driven by a proton gradient. The whole system looks like a dam, with the protons as the water and the protein as the turbine.
Now imagine a hypothetical scenario in which the shaft of the protein is broken. That would be devastating for the cell, and the most probable cause would be a mutation in the DNA that encodes the protein. It is almost impossible to say that the shaft is broken from the DNA data alone; one must look at the protein sequence and structure, and in some cases even at metabolites, to confirm the break in the shaft and the change in its function. In other words, we must look at other omics data as well to explain the phenotype. Generally, when such a mutation happens in a cell, our body signals the cell to kill itself, but in some cases such cells survive and divide. The DNA inside these cells is replicated too, and this propagation of abnormal DNA sometimes results in cancer.
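To make the idea of a DNA change propagating to the protein concrete, here is a minimal Python sketch of the central dogma. It uses the standard genetic code, but the 12-letter sequence and the function names (`transcribe`, `translate`, `point_mutation`) are made up for illustration and do not come from any real gene or library. It also ignores real-world details such as introns, strand orientation and reading-frame detection.

```python
from itertools import product

# Standard genetic code, built compactly: codons are enumerated in U, C, A, G
# order and paired with the one-letter amino acid codes ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA makes RNA: copy the coding strand, replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA makes protein: read codons in frame until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return a copy of the DNA with one letter replaced (0-based position)."""
    return dna[:position] + new_base + dna[position + 1:]


# Toy sequence, invented for this example: ATG GAG AAG TAA
normal = "ATGGAGAAGTAA"
missense = point_mutation(normal, 4, "T")   # GAG -> GTG, one amino acid changes
nonsense = point_mutation(normal, 6, "T")   # AAG -> TAG, an early stop codon

for label, dna in [("normal", normal), ("missense", missense), ("nonsense", nonsense)]:
    print(label, dna, translate(transcribe(dna)))
```

A single changed letter either swaps one amino acid (`MEK` becomes `MVK`) or cuts the protein short (`MEK` becomes `ME`). The second case is the closest analogue to the broken shaft, although whether a real protein stops working still has to be confirmed at the level of structure and function.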
The diversity of cancer makes it very difficult to study all cancer types without a common database. All cancers are united by their tendency to divide uncontrollably, but this can happen in different types of tissue and with different types of mutation, even within the same cancer type. Labs all over the world have to be involved in collecting data, since each lab has expertise in different tissues. This calls for a common platform that can process data from different labs, different tissues and different technologies.
cBioPortal is a platform for visualizing the information that TCGA contains. Its landing page is simple and elegant: it needs only a gene name to search TCGA. Many resources explain how to use cBioPortal to extract data from TCGA. However, not much is available on using cBioPortal as a hypothesis generation tool. The journey from raw data to hypothesis can be perilous and confusing, with many necessary iterations. With so much data being generated in this field, it is important to generate more hypotheses and test them with real experiments, which will ultimately lead to more insights. cBioPortal can be used to generate such hypotheses, and we will see an example in later posts.
Story of a gene:
Each gene has a story of its own, especially an evolutionary one. Each gene encodes information that translates to life. Every time a cell divides, the gene is copied and performs the same function in the new cell (cell division happens all the time in our body). Occasionally an error occurs while copying the gene, either because of external factors or simply at random, and this error is then copied into the daughter cells. If the error is not critical to the survival of the cell, it is allowed to propagate; otherwise the cell is doomed to death. In fact, these errors are not that uncommon. The Human Genome Project captures a lot of them, and they are sometimes called single nucleotide polymorphisms (SNPs). It is believed that as organisms evolve, DNA is mutated, and such mutations happen all the time. How, then, can we label a particular sequence of letters (DNA) as normal?
Well, if a particular mutation is present in a large part of the population, then it is most likely an innocent mistake, as with SNPs. It takes a lot of effort and computation to tell harmful mutations from innocent ones. cBioPortal concentrates on the mutations found in tumor samples, that is, the potentially harmful ones. The objective is to look only at these potentially malicious mutations and to look for patterns among genes. One can also look at the expression of a particular gene alongside a corresponding mutation.
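The "common in the population, so probably innocent" rule of thumb can be sketched as a simple frequency filter. Everything here is hypothetical: the `Variant` class, the gene names, the frequencies and the 1% cutoff are assumptions chosen for illustration, not values taken from TCGA or cBioPortal. Real pipelines combine population frequency with much more evidence before calling a mutation harmful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str                    # hypothetical gene label
    change: str                  # hypothetical protein-level change
    population_frequency: float  # fraction of the population carrying it


# Assumption: variants seen in at least 1% of people are treated as common.
COMMON_THRESHOLD = 0.01


def split_by_frequency(variants, threshold=COMMON_THRESHOLD):
    """Separate likely innocent (common) variants from rare ones worth a closer look."""
    common = [v for v in variants if v.population_frequency >= threshold]
    rare = [v for v in variants if v.population_frequency < threshold]
    return common, rare


# Invented example data, not real measurements.
variants = [
    Variant("GENE_A", "E2V", 0.12),
    Variant("GENE_B", "K3*", 0.00002),
    Variant("GENE_C", "R10H", 0.03),
    Variant("GENE_D", "G7D", 0.0004),
]

common, rare = split_by_frequency(variants)
print("likely innocent:", [v.gene for v in common])
print("worth a closer look:", [v.gene for v in rare])
```

A rare variant is only a candidate, not a verdict. Deciding whether it actually matters is where the other omics data discussed earlier come in.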
cBioPortal brings together information from different tissues and different types of cancer, collected in different parts of the world. Never before in history has such a volume of data been put together for research. It is a great tool for looking at individual genes and the aberrations they have undergone in past cancer patients, which can give us leads into the mechanisms of cancer.