Our goal is to create a portal for quick and easy evaluation of publicly available cancer transcriptome sequencing data, to help identify cancer biomarkers and therapeutic targets. UALCAN also lets users visualize the data in a simple, downloadable format.
We used the available information to generate subgroups. Not all samples had this information at the time of data collection, so the sample numbers vary between subgroups. We will try to update the portal periodically as more information becomes available.
We created the ‘scan by gene class’ option because many cancers share molecular alterations; it gives a quick overview of common gene expression alterations across cancers. This feature may therefore encourage collaboration between organ-specific cancer researchers who have reagents for particular targets. Please note that we present the lists as related genes for each gene class, because some genes may not fall exactly into the stated class even though we compiled the lists from different sources. The associated GeneCards links can be used to confirm a gene's class or function (if known).
The output page of ‘scan by gene class’ highlights a gene's up- or down-regulated status only if its median TPM value is greater than 1 in either normal or cancer samples.
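The expression threshold above can be expressed as a small check. This is a minimal sketch, not UALCAN's code: the function name `passes_expression_filter` and its inputs (per-sample TPM values for one gene, split into normal and tumor groups) are hypothetical. It only decides whether a gene is eligible to be highlighted; the up/down direction is decided by separate criteria.

```python
from statistics import median
from typing import Sequence


def passes_expression_filter(normal_tpm: Sequence[float],
                             tumor_tpm: Sequence[float],
                             min_median_tpm: float = 1.0) -> bool:
    """Return True if the median TPM is greater than `min_median_tpm`
    in either the normal or the tumor group.

    Hypothetical helper: empty groups (e.g. a cancer type with no
    sequenced normal samples) are skipped rather than raising an error.
    """
    medians = [median(group) for group in (normal_tpm, tumor_tpm) if len(group) > 0]
    return any(m > min_median_tpm for m in medians)
```

For example, a gene with normal TPMs `[0.2, 0.5, 0.4]` and tumor TPMs `[1.5, 2.0, 3.0]` passes (tumor median 2.0), while one whose medians are 0.35 and 1.0 does not, because the rule requires a median strictly greater than 1.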
UALCAN is built using level 3 TCGA data. Users need to download the processed RNA-seq data themselves, using TCGA-Assembler (http://www.compgenome.org/TCGA-Assembler/) or the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/). However, the graphics and figures from UALCAN (box plots, Kaplan-Meier survival plots and heatmaps) can be downloaded in PDF and SVG formats. Box plots can also be downloaded in JPEG and PNG formats.
Triple-negative breast cancers (TNBC) are further sub-classified using the TNBC subtyping methods described by Lehmann BD et al. (Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J Clin Invest. 2011 Jul;121(7):2750-67) and Chen X et al. (TNBCtype: A Subtyping Tool for Triple-Negative Breast Cancer. Cancer Inform. 2012;11:147-56).
For prostate cancer, tumor samples were categorized by ERG (ETS transcription factor), ETV1/4 (ETS variant 1/4) and FLI1 (Fli-1 proto-oncogene, ETS transcription factor) gene fusions, and by SPOP (speckle type BTB/POZ protein), FOXA1 (forkhead box A1) and IDH1 (isocitrate dehydrogenase (NADP(+)) 1, cytosolic) mutations, following Cancer Genome Atlas Research Network. The Molecular Taxonomy of Primary Prostate Cancer. Cell. 2015;163(4):1011-25.
The purpose of the heatmap feature is to list the top differentially expressed genes based on median TPM values in normal and cancer samples. This feature is therefore not available for cancer types for which normal samples were not sequenced. It is also not available for cancer types with fewer than 10 normal samples (such as SKCM, GBM, PAAD and CESC), because the statistical power is too low to identify differentially expressed genes reliably.
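The availability rule and the median-based ranking can be sketched as follows. This is an illustration under stated assumptions, not UALCAN's actual statistic: the input layout (`ExprTable`, mapping each gene to its normal and tumor TPM values), the function name `top_upregulated`, the log2 ratio of medians and the pseudocount of 1 are all hypothetical choices. Only the cut-off of 10 normal samples comes from the answer above.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple

# From the FAQ: no heatmap for cancer types with fewer than 10 normal samples.
MIN_NORMAL_SAMPLES = 10

# Hypothetical input: gene -> (normal TPM values, tumor TPM values),
# with the same samples in the same order for every gene.
ExprTable = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def top_upregulated(expr: ExprTable, top_n: int = 25) -> List[Tuple[str, float]]:
    """Rank genes by log2((median tumor TPM + 1) / (median normal TPM + 1)).

    Returns an empty list when the cancer type has too few normal samples.
    The log2 ratio and the pseudocount of 1 are assumptions for illustration.
    """
    if not expr:
        return []
    n_normal = len(next(iter(expr.values()))[0])
    if n_normal < MIN_NORMAL_SAMPLES:
        return []  # heatmap not offered for this cancer type
    scores = []
    for gene, (normal, tumor) in expr.items():
        lfc = math.log2((median(tumor) + 1) / (median(normal) + 1))
        scores.append((gene, lfc))
    # Largest tumor-over-normal ratio first; reverse the sort for down-regulated genes.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]
```

The pseudocount keeps genes with a median TPM of 0 in one group from producing a division by zero or an infinite ratio.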
Yes, we will update the website periodically with additional data as it becomes available. We will also add new features.
There is no exact cut-off value, but we suggest independent validation when using graphs, including survival curves, for genes with low expression (median TPM below 1 in both normal and tumor samples).
A gene is marked as up-regulated in a specific cancer type based on the following criteria:
No. Outlier values are not depicted in boxplots.
Yes. As of March 2018, we have integrated processed PRAD (prostate cancer) RNA-seq data from the MET500 dataset [Robinson DR et al. Integrative clinical genomics of metastatic cancer. Nature. 2017 Aug 17;548(7667):297-303]. The MET500 dataset, from the University of Michigan, comprises RNA-seq data generated on either the poly(A)+ or the exome-capture transcriptome platform. For consistency, we used only data from the exome-capture platform. Note that gene expression values for this dataset are in RPKM, not the TPM used elsewhere in the portal.
ERG gene fusion and AR amplification are common molecular features of metastatic prostate cancer, and we used these molecular signatures to categorize the PRAD samples. The list of PRAD samples with ERG gene fusions was obtained from Supplementary Table 6 of Robinson DR et al., 2017. Similarly, the list of PRAD samples with AR amplification was obtained from Supplementary Table 3 of the same manuscript.
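If you want to put MET500 RPKM values on a TPM-like scale, one common approach is to rescale each sample so its values sum to one million: TPM_i = RPKM_i / sum_j(RPKM_j) × 10^6. The sketch below assumes the input holds one sample's RPKM values for the full gene set; the function name `rpkm_to_tpm` is hypothetical. This only rescales within a sample and does not remove platform differences between MET500 exome-capture data and TCGA data.

```python
from typing import Dict


def rpkm_to_tpm(rpkm: Dict[str, float]) -> Dict[str, float]:
    """Convert one sample's RPKM values to TPM.

    TPM_i = RPKM_i / sum(RPKM) * 1e6, so the TPM values of the sample sum to 1e6.
    Assumes `rpkm` covers all genes quantified in the sample.
    """
    total = sum(rpkm.values())
    if total <= 0:
        raise ValueError("sample has no positive expression values")
    return {gene: value / total * 1e6 for gene, value in rpkm.items()}
```

For example, a sample with RPKM values `{"A": 10.0, "B": 30.0}` becomes `{"A": 250000.0, "B": 750000.0}`.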
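The categorization above amounts to checking each sample against the two supplementary-table lists. A minimal sketch, assuming the lists have already been loaded as sets of sample IDs; the function name `categorize_met500_prad` and the sample IDs in the usage example are hypothetical. Because the answer does not say whether the groups are mutually exclusive, this sketch lets a sample appear in both.

```python
from typing import Dict, Iterable, List, Set


def categorize_met500_prad(samples: Iterable[str],
                           erg_fusion: Set[str],
                           ar_amplified: Set[str]) -> Dict[str, List[str]]:
    """Group PRAD sample IDs by ERG fusion and AR amplification status.

    `erg_fusion` and `ar_amplified` stand in for the sample lists from
    Supplementary Tables 6 and 3 of Robinson DR et al., 2017.
    Samples in neither list are not assigned to a group.
    """
    groups: Dict[str, List[str]] = {"ERG fusion": [], "AR amplification": []}
    for sample in samples:
        if sample in erg_fusion:
            groups["ERG fusion"].append(sample)
        if sample in ar_amplified:
            groups["AR amplification"].append(sample)
    return groups
```

With samples `["S1", "S2", "S3"]`, ERG fusions in `{"S1", "S3"}` and AR amplification in `{"S3"}`, sample S3 is listed in both groups and S2 in neither.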
Only samples with histology type "Seminoma; NOS" are classified as seminoma. Samples with histology type "Non-Seminoma; Teratoma (Mature)", "Non-Seminoma; Teratoma (Immature)", "Non-Seminoma; Yolk Sac Tumor", "Non-Seminoma; Embryonal Carcinoma" or "Non-Seminoma; Choriocarcinoma" are classified as non-seminoma. Samples with mixed histology are also classified as non-seminoma (as described at https://cancergenome.nih.gov/cancersselected/TesticularGermCellCancer).
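The histology rule above can be written as a lookup. This sketch assumes, hypothetically, that each sample is represented by the list of histology types recorded for it, and that more than one distinct entry means mixed histology; the function name `classify_tgct` is also hypothetical. Samples that match none of the listed types are left unclassified.

```python
from typing import Iterable, Optional

SEMINOMA = "Seminoma; NOS"
NON_SEMINOMA = {
    "Non-Seminoma; Teratoma (Mature)",
    "Non-Seminoma; Teratoma (Immature)",
    "Non-Seminoma; Yolk Sac Tumor",
    "Non-Seminoma; Embryonal Carcinoma",
    "Non-Seminoma; Choriocarcinoma",
}


def classify_tgct(histologies: Iterable[str]) -> Optional[str]:
    """Return "Seminoma", "Non-seminoma", or None if the sample is unclassifiable.

    `histologies` lists every histology type recorded for one sample
    (hypothetical representation); more than one distinct type is
    treated as mixed histology, which counts as non-seminoma.
    """
    types = set(histologies)
    if len(types) > 1:
        return "Non-seminoma"  # mixed histology
    if types == {SEMINOMA}:
        return "Seminoma"
    if types and types <= NON_SEMINOMA:
        return "Non-seminoma"
    return None  # no recognized histology type
```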