Gene Expression - Data

Gene expression Level 3 data has been downloaded from the publicly accessible TCGA portal. The platform codes currently used to produce the COSMIC gene expression values are: IlluminaHiSeq_RNASeqV2, IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeq, and IlluminaGA_RNASeq.

Please note that as of COSMIC v71 we no longer show results from the array platforms AgilentG4502A_07_2 and AgilentG4502A_07_3. Using only RNASeq data allows us to show more results, because disagreement between the array and RNASeq data was quite common and resulted in the exclusion of data (see 'Qualitative merging of results' below).

For the RNASeq platforms we used the .trimmed.annotated.gene.quantification.txt files, which contain Level 3 expression data. These files use RPKM (reads per kilobase of transcript per million mapped reads), a method of quantifying gene expression from RNA sequencing data by normalizing for transcript length and the total number of sequencing reads. [https://wiki.nci.nih.gov/display/TCGA/RNASeq]
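As an illustration of the RPKM normalization, the sketch below applies the standard definition of RPKM to a single gene. It is not the TCGA pipeline itself; the function name and parameter names are hypothetical and chosen only for this example.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    All three parameter names are illustrative assumptions; they do not
    correspond to column names in the TCGA files.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, 500 reads mapped to a 2,000 bp gene in a library of 10 million mapped reads gives an RPKM of 25.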

For the RNASeqV2 platforms, the files used were rsem.genes.normalized_results, which contain Level 3 expression data produced using MapSplice to do the alignment and RSEM to perform the quantitation. [https://wiki.nci.nih.gov/display/TCGA/RNASeq+Version+2]

Analysis

The mean and sample standard deviation (STDEV) of the gene expression values have been calculated from the tumour samples that are diploid for each corresponding gene, platform, and study. Based on these mean and STDEV values we have calculated the standard scores (z-scores) for gene expression for each corresponding gene, platform, and study.
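The calculation described above can be sketched as follows for one gene, platform, and study. This is a minimal illustration, not the COSMIC implementation: the function name, the dictionary inputs, and the choice to score every sample (diploid or not) against the diploid reference are assumptions.

```python
from statistics import mean, stdev


def z_scores(expression, is_diploid):
    """Standard scores for one gene, platform, and study.

    expression: dict mapping sample id -> expression value
    is_diploid: dict mapping sample id -> True if the sample is diploid
                for this gene (hypothetical input format)
    """
    # Reference distribution: diploid tumour samples only.
    reference = [v for s, v in expression.items() if is_diploid.get(s, False)]
    if len(reference) < 2:
        return {}  # sample standard deviation needs at least two values
    mu = mean(reference)
    sigma = stdev(reference)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        return {}  # z-score is undefined when the reference has no spread
    # Assumption: every sample is scored against the diploid reference.
    return {s: (v - mu) / sigma for s, v in expression.items()}
```

With diploid reference values 1, 2, and 3 (mean 2, STDEV 1), a non-diploid sample with expression 6 would receive a z-score of 4.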

Qualitative merging of results

Results are merged qualitatively per study (project_code) across analysis platforms. To indicate whether a gene is over- or under-expressed, a threshold of plus or minus 2 STDEV was selected. Where a sample has been analysed on more than one platform for a given study and gene, we display 'over' or 'under' only if the scores from all platforms are above or below the threshold, respectively. If the platforms do not agree, we do not display a result. The z_score displayed across the website, which serves as an indicative score of expression level, is taken from one platform in the following order of preference: IlluminaHiSeq_RNASeqV2, IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeq, IlluminaGA_RNASeq.
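The merging rule can be sketched as below for one sample, gene, and study. This is an illustrative sketch rather than the COSMIC code: the function names, the strict inequality at the threshold, and the "normal" label for scores within the threshold are assumptions that the text above does not specify.

```python
# Order of preference for the displayed z-score, as listed above.
PLATFORM_PREFERENCE = [
    "IlluminaHiSeq_RNASeqV2",
    "IlluminaGA_RNASeqV2",
    "IlluminaHiSeq_RNASeq",
    "IlluminaGA_RNASeq",
]
THRESHOLD = 2.0  # in units of STDEV


def classify(z):
    """Label a single z-score; 'normal' is an assumed third category."""
    if z > THRESHOLD:
        return "over"
    if z < -THRESHOLD:
        return "under"
    return "normal"


def merge_call(z_by_platform):
    """Return the shared label if all platforms agree, otherwise None."""
    calls = {classify(z) for z in z_by_platform.values()}
    return calls.pop() if len(calls) == 1 else None


def displayed_z_score(z_by_platform):
    """Pick the z-score from the most preferred platform that has data."""
    for platform in PLATFORM_PREFERENCE:
        if platform in z_by_platform:
            return z_by_platform[platform]
    return None
```

For instance, a sample scoring 2.5 on IlluminaHiSeq_RNASeqV2 and 3.1 on IlluminaGA_RNASeq would be called "over", with 2.5 as the displayed score. If the second score were 1.0 instead, the platforms would disagree and no call would be returned.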
