All cancers arise as a result of the acquisition of a series of fixed DNA sequence abnormalities (mutations), many of which ultimately confer a growth advantage on the cells in which they occur. A vast amount of information about these changes is available in the published scientific literature. COSMIC is designed to store and display somatic mutation information and related details for human cancers.
Types of data
There are two types of data in COSMIC: expert manual curation data and systematic (genome-wide) screen data. It is useful to understand the differences between these data types and to use them appropriately.
Expert curation data
- Manually input from peer reviewed publications by COSMIC expert curators
- Consists of comprehensive literature curation of selected Census genes at release, followed by subsequent updates (Cancer Gene Census)
- Includes additional data points relevant to each disease and publication
- Provides accurate frequency data because mutation-negative samples are specified
- Also called non-systematic or targeted screen data
Genome-wide screen data
- Uploaded from publications reporting large-scale genome screening data, or imported from other databases such as TCGA and ICGC
- Provides unbiased molecular profiling of diseases while covering the whole genome
- Provides objective frequency data by interpreting non-mutant genes across each genome
- Facilitates finding novel driver genes in cancer
Selecting papers from the literature
To identify papers reporting somatic mutations, PubMed is broadly searched for papers containing relevant mutation data (example search: (ras OR genes, ras) AND human AND mutation). Papers whose abstracts indicate that they include somatic mutation information relating to cancer or pre-cancerous conditions are then selected for curation. After examination of the information in the full text of the paper, the sample and mutation data are extracted. Any papers containing incomplete data (e.g. mutations that are reported but not fully described) or data of insufficient quality (e.g. errors identified in the data) are not fully curated but are added to a list of "additional references containing somatic mutation information".
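As an illustration only, the example search above could be issued programmatically through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and makes no network call. The helper name `pubmed_search_url` and the `retmax` default are assumptions made for this example, and nothing here describes how the COSMIC curators actually run their searches.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, retmax=20):
    """Build an esearch URL for a PubMed query (hypothetical helper)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query string, URL-encoded below
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = pubmed_search_url("(ras OR genes, ras) AND human AND mutation")
print(url)
```

The abstracts of the returned records would still need to be read by a person to decide whether a paper qualifies for curation.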
We attempt to map every mutation to a single version of a gene, but where this is not possible we map to an alternative transcript. The gene sequences are held in COSMIC and are available in the Download section.
A central aim of COSMIC is to provide somatic mutation frequencies. These are available in the main display windows. However, it is important to understand how they are calculated and possible limitations of the data.
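The sketch below shows one simple way a per-gene frequency can be computed when mutation-negative samples are recorded alongside mutated ones. It assumes the frequency is the fraction of screened samples carrying at least one mutation in the gene. The `ScreenResult` record, the function name and the sample data are all hypothetical and do not reflect COSMIC's internal schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenResult:
    sample: str    # hypothetical sample name
    gene: str
    mutated: bool  # False records a mutation-negative sample


def mutation_frequency(results, gene):
    """Fraction of screened samples mutated in `gene`; None if none were screened."""
    screened = {r.sample for r in results if r.gene == gene}
    mutated = {r.sample for r in results if r.gene == gene and r.mutated}
    if not screened:
        return None
    return len(mutated) / len(screened)


# Made-up data: four samples screened for BRAF, two of them mutated.
results = [
    ScreenResult("S1", "BRAF", True),
    ScreenResult("S2", "BRAF", False),
    ScreenResult("S3", "BRAF", False),
    ScreenResult("S4", "BRAF", True),
]
print(mutation_frequency(results, "BRAF"))  # 0.5
```

The denominator depends on knowing which samples were screened and found negative. Without that information, only a count of mutations can be given, not a frequency.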
What mutation detection method was employed?
Mutation screening methods differ in their sensitivity, and the sensitivity of a particular method can vary from laboratory to laboratory. Some methods identify all classes of small intragenic mutation (base substitutions and small insertions/deletions); others do not. For example, the protein truncation test will not detect mutations that cause missense amino acid substitutions.
Was the whole gene screened?
Some genes, for example BRAF, RAS and TP53, are characterised by mutation hot spots. These genes are often screened for somatic mutations only in the region most likely to contain mutations. This strategy will miss mutations located elsewhere in the gene and hence will give a distorted view of the distribution of mutations in the gene and may underestimate the frequency of mutations.
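A toy calculation makes the underestimate concrete. The sample names, codon positions and screened range below are invented for illustration, and the sketch assumes at most one mutation per sample.

```python
def frequency(codons_by_sample, screened=None):
    """Fraction of samples with a detected mutation.

    codons_by_sample maps sample name -> mutated codon, or None if unmutated.
    screened is an inclusive (first, last) codon range, or None for the whole gene.
    """
    def seen(codon):
        if codon is None:
            return False
        return screened is None or screened[0] <= codon <= screened[1]

    hits = sum(1 for codon in codons_by_sample.values() if seen(codon))
    return hits / len(codons_by_sample)


# Hypothetical data: five of eight tumours are mutated, one outside the hot spot.
samples = {
    "T1": 600, "T2": 600, "T3": 469, "T4": None,
    "T5": 601, "T6": None, "T7": 594, "T8": None,
}
print(frequency(samples))              # whole gene screened: 0.625
print(frequency(samples, (590, 610)))  # hot spot only: 0.5
```

In this made-up case the hot-spot screen misses the mutation in T3, so the observed frequency drops from 0.625 to 0.5 and every detected mutation appears to fall inside the hot spot.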
Has the sample been screened before?
There are examples where the same data are reported twice, perhaps in a follow-up study with reference to further data, or as a positive control (for example, using cell lines with known mutations). Where possible we have noted sample names and have removed any redundancy within papers. However, between papers it is not possible to confirm that two samples with the same name are indeed the same sample. We have therefore included both samples and both results in COSMIC. If you want to review this information, the sample name, mutation and paper reference are displayed in the Mutation Details view.
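A user who has collected sample name, mutation and paper reference for a gene could flag possible repeats with a check like the one below. This is a minimal sketch: the tuple layout, the function name and the records are hypothetical, and a shared name is only a hint that two entries may be the same sample, not proof.

```python
from collections import defaultdict


def names_shared_between_papers(records):
    """Map each sample name reported in more than one paper to those papers."""
    papers_by_sample = defaultdict(set)
    for sample_name, _mutation, paper in records:
        papers_by_sample[sample_name].add(paper)
    return {
        name: sorted(papers)
        for name, papers in papers_by_sample.items()
        if len(papers) > 1
    }


# Hypothetical (sample name, mutation, paper reference) records.
records = [
    ("CELL-1", "mutation_a", "paper_A"),
    ("CELL-1", "mutation_a", "paper_B"),
    ("TUMOUR-7", "mutation_b", "paper_A"),
]
print(names_shared_between_papers(records))  # {'CELL-1': ['paper_A', 'paper_B']}
```

Flagged entries can then be reviewed by hand before deciding whether to count them once or twice in a frequency estimate.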
Are all the mutations real?
For many putative somatic mutations that have been reported in the published literature, definitive evidence that they are somatically acquired (through demonstration of their absence in normal DNA from the same individual as the tumour) is not available. Therefore, occasional germline variants may have inadvertently been represented in publications as somatic mutations and entered in the database. In addition, simple laboratory errors that result in an incorrect normal DNA sample (i.e. one from a different individual) being analysed as a control for a particular tumour sample may provide apparently persuasive, but misleading, evidence of somatic origin. Finally, DNA amplification methods have an intrinsic error rate, and these errors may subsequently be interpreted as somatic mutations. There is some evidence that this may be a particular problem in analyses of archival formalin-fixed, paraffin-embedded material.
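The matched-normal comparison described above can be sketched as a simple set check. The function name, labels and variant identifiers are hypothetical, and real variant calling involves far more than this, so the sketch only illustrates the logic of the comparison.

```python
def classify_variants(tumour_variants, matched_normal_variants=None):
    """Label each tumour variant.

    matched_normal_variants=None means no matched normal DNA was analysed,
    so somatic origin cannot be confirmed.
    """
    labels = {}
    for variant in sorted(tumour_variants):
        if matched_normal_variants is None:
            labels[variant] = "unconfirmed"
        elif variant in matched_normal_variants:
            labels[variant] = "germline"   # also present in normal DNA
        else:
            labels[variant] = "somatic"    # absent from the matched normal
    return labels


tumour = {"var_1", "var_2", "var_3"}
normal = {"var_2"}
print(classify_variants(tumour, normal))
# {'var_1': 'somatic', 'var_2': 'germline', 'var_3': 'somatic'}
print(classify_variants(tumour))
# {'var_1': 'unconfirmed', 'var_2': 'unconfirmed', 'var_3': 'unconfirmed'}
```

The check is only as good as its inputs. If the normal sample comes from a different individual, germline variants will be labelled somatic, and amplification errors in the tumour DNA will pass as somatic as well.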