What is COSMIC?
COSMIC – the Catalogue of Somatic Mutations in Cancer – is the world's largest source of expert, manually curated somatic mutation information relating to human cancers. Here we outline that data in terms of structure, content and scope, making it easier for you to evaluate what you will find in COSMIC and how best to access it to meet your research needs.
COSMIC comprises the COSMIC database and the Cell Lines Project, two separate but related resources. This page discusses COSMIC; please see the About Cell Lines page for more information on the Cell Lines Project.
The COSMIC database combines two main types of data:
High Precision Data, Manually Curated by Experts:
- Targeted gene-screening panels
- Over 27,000 peer-reviewed papers
- Metadata (environmental factors and patient history)
- Focused on known and suspected cancer genes and mutations
- Objective frequency data, because mutation-negative samples are also recorded
- Full details of the curation process and data captured
Genome-wide Screen Data:
- Over 37,000 genomes, consisting of:
- Provides unbiased, genome-level profiling of diseases
- Objective frequency data, by interpreting non-mutant genes across each genome
- Can be used to discover novel driver genes
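Both lists mention objective frequency data. What makes a frequency objective is its denominator: every sample screened for a gene, including samples in which no mutation was found, rather than only the samples where a mutation was reported. A minimal sketch, assuming a hypothetical `SampleScreen` record for each sample screened for a gene (this is not a COSMIC file format):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleScreen:
    """Hypothetical record: one sample screened for one gene."""
    sample_id: str
    gene: str
    mutated: bool  # False means screened, but no mutation found (a negative sample)


def mutation_frequency(screens: list[SampleScreen], gene: str) -> Optional[float]:
    """Fraction of distinct samples screened for `gene` that carry a mutation in it."""
    screened = {s.sample_id for s in screens if s.gene == gene}
    if not screened:
        return None  # gene never screened, so the frequency is undefined
    mutated = {s.sample_id for s in screens if s.gene == gene and s.mutated}
    return len(mutated) / len(screened)


screens = [
    SampleScreen("S1", "BRAF", True),
    SampleScreen("S2", "BRAF", False),
    SampleScreen("S3", "BRAF", False),
    SampleScreen("S4", "BRAF", False),
]
print(mutation_frequency(screens, "BRAF"))  # 0.25
```

Dropping the three negative samples would give 1/1 = 1.0 for the same gene, which overstates the frequency; this is why recording mutation-negative samples matters.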
Together, this compilation of data provides extensive coverage of the cancer genomic landscape from a somatic perspective. New and potentially significant data are continually captured and made available through four major COSMIC releases each year.
For more information on COSMIC, read more about our curation processes and the analyses that we run on mutation data, or see our answers to frequently asked questions about curation, histology, and mutation syntax.
Website Access and Tools
Exploring the COSMIC website provides a good introduction to the core data available.
Key aspects of the website include:
- Free access for all users
- Always displays the most recent COSMIC release (updated quarterly)
Dedicated tools help you explore the data:
The genome browser provides a genome-wide perspective on cancer genomics. Individual variant tracks can be turned on or off as needed.
The gene pages summarise all the data for a specific gene. This is a good starting place if you have a particular gene of interest, such as BRAF.
The cancer browser lets you explore mutations by tissue type and histology, giving a disease-specific perspective.
Fusion genes are manually curated from the scientific literature. The overview page lets you browse or search the fusion genes available in COSMIC.
Drug resistance data
COSMIC curates mutations that confer drug resistance. This includes data on both acquired resistance and intrinsic resistance. From the overview page you can browse or search the drug-gene pairs available in COSMIC.
Hallmarks of Cancer
The Hallmarks of Cancer bring together manually curated information on the function of proteins encoded by cancer genes. They present a condensed overview of the most relevant facts, with quick access to the literature sources.
COSMIC-3D provides an interactive view of cancer mutations in the context of 3D protein structures. It features a heatmap of recurrent mutations alongside known and predicted small molecule binding sites.
Cancer Gene Census
The Cancer Gene Census (CGC) is an ongoing effort to catalogue those genes that contain mutations causally implicated in cancer. The CGC page details the criteria used to classify a gene as part of the CGC and lets you browse and search the current CGC genes.
Cancer Mutation Census
The Cancer Mutation Census (CMC) project is an undertaking to classify coding mutations in COSMIC and identify variants driving different types of cancer. The CMC integrates all coding somatic mutations collected by COSMIC with biological and biochemical information from multiple sources, combining data obtained from manual curation and computational analyses. Metrics like ClinVar significance, dN/dS ratios, and variant frequencies in normal populations (gnomAD) have been integrated into this resource.
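The dN/dS ratio compares the rate of nonsynonymous (protein-altering) mutations with the rate of synonymous ones, each normalised by the number of sites where such a change is possible; values well above 1 suggest positive selection, as expected for driver genes. The function below is a deliberately naive illustration with hypothetical counts, not the statistical model behind the dN/dS values integrated into the CMC:

```python
def naive_dnds(nonsyn_obs: int, syn_obs: int,
               nonsyn_sites: float, syn_sites: float) -> float:
    """Naive dN/dS: nonsynonymous changes per nonsynonymous site divided by
    synonymous changes per synonymous site (no mutation-rate model)."""
    if syn_obs <= 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("synonymous count and both site totals must be positive")
    dn = nonsyn_obs / nonsyn_sites  # nonsynonymous rate
    ds = syn_obs / syn_sites        # synonymous rate
    return dn / ds


# Hypothetical gene: 30 nonsynonymous and 5 synonymous mutations observed,
# with 3,000 possible nonsynonymous and 1,000 possible synonymous sites.
print(f"dN/dS = {naive_dnds(30, 5, 3000, 1000):.1f}")  # dN/dS = 2.0
```

A real estimate also has to account for how likely each substitution is in its sequence context, so a simple ratio like this one is only a way to read the metric, not to reproduce it.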
Mutational Signatures
Different mutational processes (such as slight intrinsic infidelity of the DNA replication machinery, exogenous or endogenous mutagen exposures, enzymatic modification of DNA and defective DNA repair) generate unique combinations of mutation types, termed mutational signatures. The 30 mutational signatures in COSMIC are based on an analysis of 10,952 exomes and 1,048 whole genomes across 40 distinct types of human cancer.
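Single base substitution signatures such as these are usually described over 96 mutation types: six substitution classes, written from the pyrimidine (C or T) reference base, times 16 combinations of the 5' and 3' flanking bases. A minimal sketch of how one substitution maps to its channel, assuming the caller already has the reference trinucleotide around the mutated base; `sbs96_channel` is a hypothetical helper, not part of any COSMIC tool:

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Map a single base substitution to its 96-type channel, e.g. 'T[C>T]A'.

    context: reference trinucleotide with the mutated base in the middle.
    alt: the substituted base at that middle position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or any(b not in "ACGT" for b in context + alt):
        raise ValueError("expected a 3-base ACGT context and a single ACGT alt base")
    if context[1] == alt:
        raise ValueError("reference and alternative bases are identical")
    # Purine reference (A or G): report the change on the opposite strand,
    # so every channel is written from a C or T reference base.
    if context[1] in "AG":
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


print(sbs96_channel("TCA", "T"))  # T[C>T]A
print(sbs96_channel("TGA", "A"))  # T[C>T]A (same change, other strand)

# Every possible single base substitution collapses onto exactly 96 channels.
all_channels = {
    sbs96_channel(five + ref + three, alt)
    for five, ref, three in product("ACGT", repeat=3)
    for alt in "ACGT"
    if alt != ref
}
print(len(all_channels))  # 96
```

Counting these channels across a tumour's substitutions gives the 96-element mutation profile that signature analyses decompose.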
Additional features of the website include:
- A genome browser
- All aspects available for both the GRCh37 and GRCh38 genome assemblies
- SNPs flagged for clarity
- Video tutorials explaining new features of the site
Access via the download files
COSMIC is a vast resource and we try to make it as easy as possible for you to find new and innovative ways of taking full advantage of it. All COSMIC data is available as download files, ready for you to integrate into your current pipelines and tools.
Key aspects of the downloadable files include:
- Free registration for academic use; commercial use requires a licence.
- Complete, one-click file downloads and filtered files are available
- Choose from 25 files, dividing the COSMIC data into logical categories, such as ‘Complete Mutation Data’, ‘Non Coding Variants’ and ‘Methylation Data’
- Filter files by gene, tissue or sample of interest
- Available for both GRCh37 and GRCh38
- Updated 3 times per year with each COSMIC release
- Access the six most recent COSMIC data releases
- All files are available with a description
- SNPs are included for full coverage
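If you would rather filter a complete download locally than use the pre-filtered files, a streaming reader keeps memory use low. A minimal sketch, assuming a tab-separated file (optionally gzip-compressed) with a header row; `filter_tsv` is a hypothetical helper, and the file name and the column name `Gene name` in the usage comment are assumptions, so check them against the header and file description of the release you download:

```python
import csv
import gzip
from typing import Iterator


def filter_tsv(path: str, column: str, wanted: set[str]) -> Iterator[dict[str, str]]:
    """Yield rows of a tab-separated file whose `column` value is in `wanted`.

    Reads the file row by row, so large downloads never sit fully in memory.
    Files ending in .gz are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="", encoding="utf-8") as fh:
        # QUOTE_NONE: treat quote characters as ordinary text in TSV fields.
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        for row in reader:
            if row[column] in wanted:
                yield row


# Hypothetical usage; the file name and column name are assumptions:
# for row in filter_tsv("CosmicMutantExport.tsv.gz", "Gene name", {"BRAF"}):
#     print(row)
```

Because the function is a generator, the column check runs when you start iterating, not when you call it.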
Download a sample of COSMIC data
We have made the first 100 lines of each of the download files freely available so you can try out the data. You can download the data sample on the "About" page. Full descriptions of what is in the complete download files are also available.
Download GRCh37 data sample (tar file) (zip file)
Download GRCh38 data sample (tar file) (zip file)
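A quick way to see what each sample file contains is to read only its header line straight from the archive, without extracting anything to disk. A minimal sketch using Python's standard `tarfile` module, assuming the tar file holds plain-text files; `peek_headers` and the archive name in the usage comment are hypothetical:

```python
import tarfile


def peek_headers(tar_path: str) -> dict[str, str]:
    """Return the first line (usually the column header) of every regular file in a tar archive."""
    headers = {}
    # Mode "r" auto-detects gzip/bz2/xz compression of the archive itself.
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-file entries
            fh = tar.extractfile(member)
            if fh is None:
                continue
            first = fh.readline().decode("utf-8", errors="replace").rstrip("\r\n")
            headers[member.name] = first
    return headers


# Hypothetical usage; the archive name is an assumption:
# for name, header in peek_headers("cosmic_sample_GRCh38.tar").items():
#     print(name, header.split("\t"))
```

For the zip version of the sample, the standard `zipfile` module offers the same pattern through `ZipFile.open`.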