1. Why was the UALCAN web portal developed?

Our goal is to provide a portal for quick and easy evaluation of publicly available cancer transcriptome sequencing data to identify cancer biomarkers and therapeutic targets. Additionally, UALCAN helps visualize the data in simple, downloadable formats.

2. Why does the total number of samples often differ from the sum of the subgroup samples?

We used the available information to generate subgroups. Not all samples had this information at the time of data collection, and hence the numbers vary. We will try to update the portal periodically as more information becomes available.

3. What is the use of ‘scan by gene classes’?

We created the ‘scan by gene class’ option with the idea that many cancers show common molecular alterations; this option provides a quick glance at common gene expression alterations across cancers. Thus, this feature may enhance collaboration between organ-specific cancer researchers who have specific reagents for the targets. Please note that we describe the genes in each class as related genes, since some of them may not fall exactly into the class mentioned, even though we collected these lists from different sources. Using the associated GeneCards feature, one can confirm the gene class/function (if known).

4. Why does UALCAN's ‘scan by gene classes’ sometimes not highlight a gene even though the difference in expression between normal and cancer is large?

The output page of ‘scan by gene classes’ in UALCAN highlights up-/down-regulated status only if the median TPM value of a gene is greater than one in either normal or cancer samples.

5. How can one download data from UALCAN?

UALCAN is built using level 3 TCGA data. Users need to download the processed RNA-seq data using TCGA-Assembler (http://www.compgenome.org/TCGA-Assembler/) or the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/). However, the graphics and figures (box plot, KM survival plot and heatmap) from UALCAN can be downloaded in PDF and SVG formats. In addition, box plots can be downloaded in JPEG and PNG formats.

6. How were TNBC breast cancers classified into subtypes?

TNBC breast cancers are further sub-classified using the TNBC subtyping method described by Lehmann BD et al. (Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J Clin Invest. 2011 Jul;121(7):2750-67.) and Chen X et al. (TNBCtype: A Subtyping Tool for Triple-Negative Breast Cancer. Cancer Inform. 2012;11:147-56.).

7. How were primary prostate tumor samples categorized?

In the case of prostate cancer, tumor samples were categorized by ERG (ETS transcription factor), ETV1/4 (ETS variant 1/4) and FLI1 (Fli-1 proto-oncogene, ETS transcription factor) gene fusions and by SPOP (speckle type BTB/POZ protein), FOXA1 (forkhead box A1) and IDH1 (isocitrate dehydrogenase (NADP(+)) 1, cytosolic) mutations, based on Cancer Genome Atlas Research Network (The Molecular Taxonomy of Primary Prostate Cancer. Cell. 2015;163(4):1011-25.).

8. Why is the heatmap feature available for only certain cancer types?

The purpose of the heatmap feature is to list the top differentially expressed genes based on median TPM values in normal and cancer samples. This feature is therefore not available for cancer types for which normal samples were not sequenced. It is also not available for cancer types with fewer than 10 normal samples (such as SKCM, GBM, PAAD, CESC), since the statistical power is not sufficient to identify differentially expressed genes reliably.
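The availability rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this answer, not UALCAN's actual code; the function name `heatmap_status` and its parameter names are hypothetical.

```python
def heatmap_status(n_normal_samples, min_normal_samples=10):
    """Report whether a heatmap would be offered for a cancer type.

    Hypothetical helper: it only encodes the rule stated in the FAQ
    (no heatmap without normal samples, or with fewer than 10 of them).
    """
    if n_normal_samples == 0:
        # No normal samples were sequenced, so there is nothing to compare.
        return "unavailable: no normal samples sequenced"
    if n_normal_samples < min_normal_samples:
        # Too few normal samples for a reliable differential expression call.
        return f"unavailable: fewer than {min_normal_samples} normal samples"
    return "available"
```

For example, `heatmap_status(9)` reports the feature as unavailable, while `heatmap_status(10)` returns `"available"`.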

9. Will you be adding additional datasets and analyses?

Yes, we will update the website periodically with additional data as they become available. We will also add additional features.

10. For genes with low expression levels (e.g. median TPM < 1), how reliable are the gene expression predictions and survival analyses?

While there is no exact cut-off value, we suggest independent validation when using the expression graphs and survival curves generated for genes with low TPMs (median TPM less than 1 in both normal and tumor samples).

11. In the output of ‘scan by gene class’, on what basis are genes marked as ‘up-regulated’?

A gene is marked as up-regulated in a specific cancer type based on the following criteria:

  • Median TPM in tumor samples is greater than median TPM in normal samples
  • Median TPM value is greater than 1
  • Statistical significance of the expression difference (p-value) is less than 0.05
  • Ratio of median tumor TPM to median normal TPM is greater than 1.5
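The four criteria above can be combined into a single check, sketched below in Python. This is a minimal illustration under stated assumptions, not UALCAN's implementation: the names `GeneCall` and `call_up_regulated` are hypothetical, the p-value is taken as an input because the statistical test is not specified here, and the "greater than 1" criterion is applied to the tumor median (when the tumor median exceeds the normal median, "greater than 1 in either group", as in question 4, reduces to this). The ratio is tested by multiplication rather than division so that a normal median of zero does not raise an error, which is also a choice made for this sketch.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class GeneCall:
    tumor_median: float
    normal_median: float
    up_regulated: bool


def call_up_regulated(tumor_tpm, normal_tpm, p_value,
                      min_median_tpm=1.0, min_ratio=1.5, alpha=0.05):
    """Apply the four up-regulation criteria listed in the FAQ.

    tumor_tpm / normal_tpm: TPM values of one gene across samples.
    p_value: significance of the expression difference, computed elsewhere.
    """
    tumor_median = median(tumor_tpm)
    normal_median = median(normal_tpm)
    up = (
        tumor_median > normal_median                 # criterion 1
        and tumor_median > min_median_tpm            # criterion 2 (assumed: tumor median)
        and p_value < alpha                          # criterion 3
        and tumor_median > min_ratio * normal_median  # criterion 4, without dividing
    )
    return GeneCall(tumor_median, normal_median, up)
```

For instance, `call_up_regulated([4.0, 6.0, 8.0], [1.0, 2.0, 3.0], p_value=0.01)` gives medians of 6.0 and 2.0 and satisfies all four criteria, whereas the same data with `p_value=0.2` does not.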

12. Are all expression values represented in the boxplot?

No. Outlier values are not depicted in boxplots.
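To illustrate what leaving out outliers can mean in practice, the sketch below separates values into those a boxplot would show and those it would hide. The definition of an outlier is not given here, so the sketch assumes the common Tukey convention (values more than 1.5 times the interquartile range beyond the quartiles); UALCAN may use a different rule. The function name `split_outliers` is hypothetical.

```python
from statistics import quantiles


def split_outliers(values, whisker=1.5):
    """Split values into (shown, hidden) using the 1.5 x IQR rule.

    Assumption: outliers are defined by Tukey's fences; the FAQ does not
    state which definition the portal uses.
    """
    # Quartiles with linear interpolation over the observed range.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    shown = [v for v in values if low <= v <= high]
    hidden = [v for v in values if v < low or v > high]
    return shown, hidden
```

With TPM values `[1, 2, 3, 4, 5, 6, 7, 8, 100]`, the value 100 falls outside the upper fence and would be the only one hidden under this assumption.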