Frequently Asked Questions (FAQ)

Can I download the full COSMIC dataset?

The export sheets are updated with each COSMIC release. The file can be found here: cancer.sanger.ac.uk/files/cosmic/current_release/CosmicCompleteExport.tsv.gz. This file contains all the samples analysed for every gene in COSMIC, whether or not a mutation was found.

Where can I download all the mutations from COSMIC?

The export sheets are updated with each COSMIC release. The file can be found here: cancer.sanger.ac.uk/files/cosmic/current_release/CosmicMutantExport.tsv.gz. This file contains all the samples analysed for every gene in COSMIC, whether or not a mutation was found.
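
Once an export file has been downloaded, it can be read as a gzip-compressed, tab-separated table. The following is a minimal Python sketch, assuming the file is already saved locally, is UTF-8 encoded, and has a header row. The function name `iter_export_rows` and the local path in the usage comment are illustrative assumptions, not part of COSMIC.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_export_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a gzip-compressed TSV file.

    Assumes the first line is a header row. Rows are streamed, so the
    whole file never has to fit in memory.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters inside fields as plain text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


# Usage (hypothetical local copy of an export file):
# for row in iter_export_rows("CosmicCompleteExport.tsv.gz"):
#     print(row)
#     break
```

Streaming the rows rather than loading the whole table is a deliberate choice here, since the export files can be large.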

How can I find the latest COSMIC release version?

The latest release version number is shown in the search panel on the COSMIC home page, and also in the news section in the right-hand panel of the home page.

Which version of the human reference sequence does COSMIC use?

Currently we are using GRCh37. However, we are now in the process of remapping to the latest build.

How are samples counted in COSMIC?

Each sample has its own name and ID. Multiple instances of the same sample name can exist as separate entries, indicating that it was unclear during curation that these samples were identical, apart from their name. To account for the duplication of probably identical samples during curation, we attempt to combine samples with identical names and disease descriptions. Please see the help documentation for more details.

What does 'NS' mean in tissue/histology classifications?

NS means 'not specified'.

Can I download data from older versions of COSMIC?

We make data files available to download for the current release and for the 3 previous releases only, i.e. data files released more than 1 year ago will not be available.

I am preparing a manuscript for publication and I am including some COSMIC data. How should I cite COSMIC?

We are very happy for you to use the data, and any tabulations or graphic screenshots which support your work. Please cite the website address (cancer.sanger.ac.uk) and the paper COSMIC: exploring the world's knowledge of somatic mutations in human cancer (Forbes et al. 2014). Thank you.

How often is COSMIC updated?

COSMIC is updated once every three months; see the news item for what's in the new version.

What is the difference between cell line data in COSMIC and the Cell Line Project?

The Cell Line Project is an in-depth analysis of over a thousand commonly used cancer cell lines, as defined here. However, many more cell lines have been examined in the literature for somatic mutations, and they are all recorded in the standard COSMIC database.

What are Census genes and where is the updated version of the census?

The Cancer Gene Census is a list of genes known to be involved in cancer. They are listed here: http://cancer.sanger.ac.uk/cosmic/census/tables?name=symbol&

Can I submit data to COSMIC?

All mutation data in COSMIC is currently entered by our curators. If you would like to submit data for one of your publications, or even pre-publication, please contact "cosmic@sanger.ac.uk" and one of our curators will be happy to help.

How do I calculate gene mutation frequencies across multiple transcript variants?

Gene mutation frequency counts are currently calculated on a per-transcript basis, rather than across gene loci. We have strived to maximise the coding annotations derived from large-scale studies providing genomic mutations, and this has resulted in the same genomic mutation being annotated to different splice variants of the same gene. An additional splice variant is only used in COSMIC if a mutation arises which maps to the coding domain of a gene, but not to any of the existing COSMIC transcripts. The prevalence counts of the splice variants therefore cannot simply be added together, as this could count the same genomic mutation more than once.
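
The double-counting problem can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the record layout (gene, transcript, sample ID, genomic change), the sample IDs and the genomic changes are all invented, and the two function names are hypothetical. The idea is to count each (sample, genomic change) pair once per gene, instead of summing per-transcript counts.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# (gene, transcript, sample_id, genomic_change); all values are invented.
Record = Tuple[str, str, str, str]

records = [
    ("NOTCH1", "NOTCH1", "S1", "g.100C>T"),
    ("NOTCH1", "NOTCH1_ENST00000277541", "S1", "g.100C>T"),  # same mutation, other transcript
    ("NOTCH1", "NOTCH1", "S2", "g.250G>A"),
    ("NOTCH1", "NOTCH1_ENST00000277541", "S3", "g.900delA"),
]


def summed_transcript_count(rows: Iterable[Record], gene: str) -> int:
    """Add up per-transcript counts; this over-counts shared genomic mutations."""
    per_transcript = defaultdict(set)
    for g, transcript, sample, change in rows:
        if g == gene:
            per_transcript[transcript].add((sample, change))
    return sum(len(pairs) for pairs in per_transcript.values())


def deduplicated_count(rows: Iterable[Record], gene: str) -> int:
    """Count each (sample, genomic change) pair once, whatever the transcript."""
    return len({(sample, change) for g, _t, sample, change in rows if g == gene})


print(summed_transcript_count(records, "NOTCH1"))  # 4
print(deduplicated_count(records, "NOTCH1"))       # 3
```

In this toy data the mutation in sample S1 is annotated on both transcripts, so summing gives 4 while only 3 distinct mutation observations exist.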

What is the difference between census and classic genes?

A census gene is one that is known to be involved in cancer. The list of these genes is used to prioritise the literature curation for the COSMIC database. Once the literature for a census gene has been completely curated, it is released and sometimes termed a 'COSMIC classic' gene.

Some mutations do not appear in the full text of the paper. Are these extracted from the supplementary material?

Yes. We utilize supplementary material for curation when it contains additional information.

What are the rules for mutation syntax in COSMIC?

Mutations are annotated using syntax derived from HGVS nomenclature recommendations [http://www.hgvs.org/mutnomen/].

How do I examine a histology or cancer type?

COSMIC may use an alternative histology terminology, for example small cell carcinoma instead of neuroendocrine carcinoma (or, for some sites, neuroendocrine carcinoma instead of small cell carcinoma). More information about our classification system can be found at the URL below, and all COSMIC tumour site and histology translations are available to view as an Excel spreadsheet or tab-delimited text file in the Classification documents found here.
Note: You may also use our search to find out the matching disease classification for the alternative terminologies.

How do I examine colon cancer?

COSMIC uses Large Intestine for this site. More information about our classification system can be found at the URL below, and all COSMIC tumour site and histology translations are available to view as an Excel spreadsheet or tab-delimited text file in the Classification documents found here.

How do I examine a tumour site?

COSMIC may use an alternative site name, e.g. Large Intestine rather than Colon. More information about our classification system can be found at the URL below, and all COSMIC tumour site and histology translations are available to view as an Excel spreadsheet or tab-delimited text file in the Classification documents found here.

Why is my search bringing back fewer records than expected?

Check that a filter (displayed on the right-hand side of the screen) has not been applied unexpectedly, limiting the gene region, tumour type, site, etc. Some genes have data in COSMIC from systematic screens but have not yet been manually curated, so COSMIC will hold less data for them than has been published. This may be because they are in the list of genes waiting to be manually curated, or because they are not yet included in the Cancer Gene Census. The Cancer Gene Census can be found here.

Where can I find patient age information?

If a paper gives the precise age then this is entered and displayed in years to 2 decimal places on the sample overview page, for example http://cancer.sanger.ac.uk/cosmic/sample/overview?id=1735169. Less precise age information is added as a remark and displayed on the sample overview page as, for example, Age=Adult; Age=Child; Age=Elderly; Age=young adult; Age=more than 65 years; Age=Adult 20-60 years. For an example see http://cancer.sanger.ac.uk/cosmic/sample/overview?id=1757821. If the paper uses the term “paediatric”, this is added as the remark Age=Child. In the past, some papers reporting paediatric or adult leukaemias had this information included in the Tumour Remark section. This information is now included in the Individual Remark section as Age=Child or Age=Adult, as described above.

Has the whole gene been screened?

Not necessarily. Sometimes the entire coding sequence and the intron-exon boundaries of a gene will have been screened, but at other times only specific exons, codons or a specific single nucleotide change in 1 codon will have been analysed. This information is not visible in COSMIC but can be obtained from the original publication from which the data was extracted.

How can I tell what part of a gene has been screened?

Sometimes the entire coding sequence and the intron-exon boundaries of a gene will have been screened, but at other times only specific exons, codons or a specific single nucleotide change in 1 codon will have been analysed. This information is not visible in COSMIC but can be obtained from the original publication from which the data was extracted.

Is my gene fully curated in COSMIC? How are genes selected for manual curation in COSMIC?

As new cancer genes are identified from the literature, these are added to the Cancer Gene Census list. A gene which is not currently in the manually curated Classic Gene List may be awaiting completion of the initial curation process, in which case the data will not yet have been released; it may not have been confirmed as a true cancer gene according to our selection criteria and be awaiting more evidence; alternatively, we may have missed the gene in question. We welcome suggestions for missing genes by email to the COSMIC Team (cosmic@sanger.ac.uk).

Why can’t I find a particular publication in COSMIC?

Publications are identified for manual curation of genes from the Classic Gene List by using weekly PubMed and PubCrawler searches. If data from a specific publication is missing it may have been missed from these searches or the paper may be awaiting curation, especially for some of the older well known cancer genes. Alternatively, the publication may be recorded in COSMIC but as a reference only if, for instance, the data was unclear or not presented in a format which was compatible with the COSMIC data entry system. We welcome suggestions concerning missing publications via the COSMIC Contact link.

Is mutation data from cell lines included in COSMIC?

Cell lines are included in COSMIC if they have been screened for mutations. See also the COSMIC Cell Line Project, where the genetics and genomics of large numbers of cancer cell lines have been systematically characterized.

Are mutations analysed by immunohistochemistry included in COSMIC?

Mutations analysed solely by immunohistochemistry using mutation specific antibodies are not currently included in COSMIC.

Why does COSMIC contain data on overgrowth syndromes as they are not really cancer?

Somatic mutations detected in tissues associated with overgrowth syndromes such as Proteus and CLOVES syndromes are included in COSMIC. Not all somatic mutations give a growth advantage to the cells, but the mutations that have been identified in the context of these syndromes clearly do. Including these mutations in COSMIC will help us further define and understand cancer.

What does Inferred Breakpoint mean?

This is the genomic breakpoint for a gene fusion. For many fusions this is not reported in detail so it is necessary to infer the position based on the reported mRNA transcripts in a given sample. To do this, it is assumed that each sample's breakpoint lies between the most 3' expressed exon of the 5' gene and the most 5' exon of the 3' gene, from the mRNAs reported in that sample. However, if the genomic breakpoint position is reported in detail for the sample then this is input as the Inferred Breakpoint.
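
The inference rule above can be sketched in Python. This is a simplified illustration and not COSMIC's implementation: it assumes each observed transcript is summarised by two exon numbers (a hypothetical representation) and that exons are numbered 5' to 3' along each gene. The names `ObservedTranscript` and `infer_breakpoint_bounds` are invented for the example.

```python
from typing import Iterable, NamedTuple, Tuple


class ObservedTranscript(NamedTuple):
    last_exon_5p_gene: int   # most 3' exon of the 5' gene present in this mRNA
    first_exon_3p_gene: int  # most 5' exon of the 3' gene present in this mRNA


def infer_breakpoint_bounds(
    transcripts: Iterable[ObservedTranscript],
) -> Tuple[int, int]:
    """Return (exon_5p, exon_3p) bounding the inferred breakpoint for one sample.

    The breakpoint is assumed to lie after exon_5p of the 5' gene and
    before exon_3p of the 3' gene.
    """
    observed = list(transcripts)
    if not observed:
        raise ValueError("at least one observed transcript is required")
    exon_5p = max(t.last_exon_5p_gene for t in observed)   # most 3' exon seen
    exon_3p = min(t.first_exon_3p_gene for t in observed)  # most 5' exon seen
    return exon_5p, exon_3p


# Three alternatively spliced products reported in one sample (invented values).
sample = [ObservedTranscript(7, 2), ObservedTranscript(6, 3), ObservedTranscript(7, 4)]
print(infer_breakpoint_bounds(sample))  # (7, 2)
```

As the answer notes, when a paper reports the genomic breakpoint in detail, that reported position is used instead of an inferred one.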

What does Observed mRNA transcript mean?

Many papers determine fusions between genes using expression technologies such as RT-PCR. A number of these studies have identified more than one transcript per sample, some finding over four different products between the same gene pair in one tumour. This implies significant alternative splicing of the mRNAs expressed from the fused gene pair. These alternative transcripts are input as Observed mRNA transcripts.

What are Related Breakpoints?

These are either all the Inferred Breakpoints for a selected mRNA transcript mutation, or all the Observed mRNA transcripts for a selected inferred breakpoint mutation.

What is a Translocation Name?

This is the syntax format describing the portions of mRNA present (in HGVS "r." format) from each gene in a fusion.

How is inverted sequence annotated in a fusion?

An "o" before a gene name is used to indicate an inverted sequence, e.g. FUS{NM_004960.2}:r.1_597_oCREB3L2{NM_194071.2}:r.979-18_991_CREB3L2{NM_194071.2}:r.1049_7455
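
A Translocation Name like the one above can be split into its per-gene portions with a regular expression. The sketch below is an assumption-laden illustration rather than an official parser: the pattern is inferred from this single example, gene names are assumed to start with an uppercase letter (so a leading lowercase "o" can be read as the inversion marker), and the names `FusionSegment` and `parse_translocation_name` are hypothetical.

```python
import re
from typing import List, NamedTuple


class FusionSegment(NamedTuple):
    gene: str
    accession: str
    rna_range: str  # text after "r.", e.g. "979-18_991"
    inverted: bool  # True when the gene name is prefixed with "o"


_SEGMENT = re.compile(
    r"(?:^|_)"                        # start of name or segment separator
    r"(o?)"                           # optional inversion marker
    r"([A-Z][A-Za-z0-9-]*)"           # gene name (assumed to start uppercase)
    r"\{([^}]+)\}"                    # transcript accession in braces
    r":r\.(.+?)"                      # mRNA range, matched lazily
    r"(?=_o?[A-Z][A-Za-z0-9-]*\{|$)"  # stop before the next segment or at the end
)


def parse_translocation_name(name: str) -> List[FusionSegment]:
    """Split a fusion name into gene segments, in 5' to 3' order as written."""
    return [
        FusionSegment(
            gene=m.group(2),
            accession=m.group(3),
            rna_range=m.group(4),
            inverted=(m.group(1) == "o"),
        )
        for m in _SEGMENT.finditer(name)
    ]


example = (
    "FUS{NM_004960.2}:r.1_597_oCREB3L2{NM_194071.2}:r.979-18_991"
    "_CREB3L2{NM_194071.2}:r.1049_7455"
)
for segment in parse_translocation_name(example):
    print(segment)
```

For the example name this yields three segments, of which only the middle CREB3L2 portion is flagged as inverted.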

Why can't I find any information in COSMIC on a particular gene fusion pair?

The curation of fusion data is ongoing, and the list of fusions currently curated in COSMIC can be found here: http://cancer.sanger.ac.uk/cancergenome/projects/classic/. Sometimes an alternative transcript needs to be used to annotate a fusion, so it may be necessary to search all transcripts for a gene to find any that have been curated for fusions, e.g. NOTCH1 and NOTCH1_ENST00000277541.