Data Portal

The Catalogue Of Somatic Mutations In Cancer (COSMIC) is a comprehensive database of somatic mutations. The dataset can be examined in the following ways:

COSMIC Search

COSMIC's home page provides options to search the website through multiple entry points. Searching leads to web pages where the dataset can be examined through various graphical and tabular views. The home page also provides information about the latest updates in COSMIC, with current statistics and links to additional CGP (Cancer Genome Project) resources.

Search

COSMIC can be searched in several ways, for example by:

  1. Gene name or HUGO synonym (e.g. BRAF or B-raf)
  2. Tissue or cancer type, such as 'lung' or 'colon' (classified in COSMIC as 'large intestine')
  3. Mutation description, e.g. the common KRAS mutation "c.35G>A" (CDS syntax) or "p.G12D" (amino acid syntax)
  4. Combined gene and mutation description, e.g. "KRAS p.G12D"
  5. Sample name, such as 'COLO-829', or a COSMIC sample ID, e.g. '687448'
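
The two mutation syntaxes in the list above can be told apart mechanically. The following is a minimal sketch, not COSMIC's own search logic: the function name classify_query and both regular expressions are assumptions made for illustration, and they only cover simple substitutions like the examples given.

```python
import re

# Hypothetical patterns for the two syntaxes mentioned above. Real mutation
# nomenclature covers many more cases (insertions, deletions, frameshifts).
CDS_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
AA_SUBSTITUTION = re.compile(r"^p\.([A-Z*])(\d+)([A-Z*])$")


def classify_query(query):
    """Return (gene, mutation, syntax) for a search string.

    gene is None unless the query has exactly two whitespace-separated
    parts; syntax is None when the last part matches neither pattern.
    """
    parts = query.split()
    gene = parts[0] if len(parts) == 2 else None
    mutation = parts[-1] if parts else ""
    if CDS_SUBSTITUTION.match(mutation):
        syntax = "CDS"
    elif AA_SUBSTITUTION.match(mutation):
        syntax = "amino acid"
    else:
        syntax = None
    return gene, mutation, syntax


print(classify_query("KRAS p.G12D"))  # ('KRAS', 'p.G12D', 'amino acid')
print(classify_query("c.35G>A"))      # (None, 'c.35G>A', 'CDS')
```

A query that matches neither pattern, such as a bare gene or sample name, comes back with syntax set to None and would need to be handled by a different search route.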

After searching, the results are listed, sorted by the number of unique mutations found (highest at the top). For example, searching for "COLO-829" will display all the sample IDs with the sample name COLO-829. The search results can be narrowed down with facets such as "Gene", "Mutation", or "Sample" in the right-hand panel.

Search By "Gene"

The gene search finds matching gene names or transcript names (even if only partially known). There are two options, "Exact Match" and "All Matches"; "Exact Match" is selected by default. If the gene name "TP53" is entered with the "Exact Match" option, the user is taken directly to the "TP53" gene overview page.


Alternatively, if the "All Matches" option is selected and "TP53" is searched, all TP53 matches are displayed on an intermediate gene page.

This page has 3 tabs:

  1. Census - gene names in the Cancer Gene Census (a list of known cancer genes).
  2. Mutations - gene names outside the Cancer Gene Census, with mutations.
  3. No Mutations - gene names outside the Cancer Gene Census, with no mutations.
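
The rule behind the three tabs can be written down in a few lines. This is an illustrative sketch only: the function gene_tab and the gene names GENE_A, GENE_B and GENE_C are hypothetical placeholders, not COSMIC data or code.

```python
def gene_tab(in_census, mutation_count):
    """Pick the tab for a matched gene, following the three rules above."""
    if in_census:
        return "Census"
    return "Mutations" if mutation_count > 0 else "No Mutations"


# Hypothetical matches: (gene name, in Cancer Gene Census?, mutation count)
matches = [
    ("GENE_A", True, 12),
    ("GENE_B", False, 3),
    ("GENE_C", False, 0),
]

tabs = {}
for name, in_census, mutation_count in matches:
    tabs.setdefault(gene_tab(in_census, mutation_count), []).append(name)

print(tabs)
```

Census membership is checked first, so in this sketch a census gene lands on the Census tab whatever its mutation count.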

Search By "Sample"

The sample search finds matching sample names. There are three sample search options:

  1. All Samples - finds samples which are either tumour or cell line
  2. Tumour sample - finds tumour samples only
  3. Cell lines - finds cell lines only

For example, "PC-3" has multiple entries in COSMIC, some being tumours and others cell lines. To view only the cell lines named PC-3, select the 'Cell Lines' option before searching.

Searching will list all the samples in COSMIC that share the same sample name but have different IDs, each linking to the sample overview page for more details.

Note: for more details on samples, please follow this link.

Search By "Tissue"

Follow this link to use the tissue browser, where a primary tissue type can be selected from a list to view tissue- and disease-specific mutation frequencies, with links to gene, mutation and sample details.

Sample Counts in COSMIC

A sample is a cell line or a single piece of tumour in which one or more genes have been examined for mutations. These experiments can be carried out in a number of ways, but usually involve sequencing. The name of the sample is defined by the data source. Cell lines usually have recognised names (which we capture), such as 'HCC38' or 'PC-3'. Names of primary tumours are often more abstract, sometimes numeric ('1', '2', ...), and often completely absent, in which case they are assigned a 6- or 7-digit name reflecting their database ID. Multiple instances of the same sample name can exist as separate entries, indicating that it was unclear during curation whether these samples were identical beyond sharing a name. This is especially acute for cell lines, where the same sample name can indicate very different biological material; for instance, the name 'PC-3' (http://cancer.sanger.ac.uk/cosmic/sample/overview?name=PC-3) is used for cell lines from 3 different tissues.

A number of tumours can be examined from a single cancer patient, and a number of samples can be examined from each of these tumours. Each sample has its own name and ID. Their identical ancestry is indicated how?

Sample counting

To account for the duplication of probably identical samples during curation, we attempt to combine samples with identical names and disease descriptions. For instance, these two PC-3 entries will be counted as one (in mutation frequency calculations), since they are likely the same sample, just curated from different papers:

    
        Sample id     Name  Primary site(s)
        COSS1028650   PC-3  prostate
        COSS1028702   PC-3  prostate
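
The combining step can be sketched as counting distinct (name, primary site) pairs instead of distinct sample IDs. This is an assumption-laden illustration, not the COSMIC implementation: the function count_distinct_samples is hypothetical, and primary site stands in here for the full disease description.

```python
def count_distinct_samples(records):
    """Count samples, treating entries with the same name and primary
    site as one sample.

    records: iterable of (sample_id, name, primary_site) tuples.
    """
    distinct = {(name, site) for _sample_id, name, site in records}
    return len(distinct)


# The two entries from the table above.
records = [
    ("COSS1028650", "PC-3", "prostate"),
    ("COSS1028702", "PC-3", "prostate"),
]

print(count_distinct_samples(records))  # 1
```

An entry with the same name but a different primary site would not be merged, which matches the observation that one name such as 'PC-3' can refer to different biological material.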
    

Mutation Frequency

The mutation frequency of a gene or tissue on the COSMIC web pages is simply the number of samples with observed mutations divided by the number of samples examined, from our curations. There are two different contexts for this data: the published literature and the Cancer Genome Consortium data. The Cancer Genome Consortium data can be considered fully objective, since every gene has been fully sequenced in every sample. However, for the genes with full literature curation (http://cancer.sanger.ac.uk/cancergenome/projects/classic/), the % frequencies reflect the samples and mutations as they are published. Since it is more difficult to publish studies which find no mutations, these frequencies are likely to be less accurate, and simply represent the best current knowledge.
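
As a worked illustration of the division described above, the sketch below computes a percentage frequency. The function name mutation_frequency and the example counts are invented for illustration and are not COSMIC figures.

```python
def mutation_frequency(samples_mutated, samples_examined):
    """Percentage of examined samples in which a mutation was observed.

    Returns None when no samples were examined, since the frequency is
    undefined in that case.
    """
    if samples_examined == 0:
        return None
    return 100.0 * samples_mutated / samples_examined


# Invented counts: 25 mutated samples out of 200 examined.
print(mutation_frequency(25, 200))  # 12.5
```

Because of the publication bias noted above, the denominator for literature-curated genes may under-represent samples in which no mutation was found, which would push the computed frequency upwards.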