What is COSMIC?

COSMIC – the Catalogue of Somatic Mutations in Cancer – is the world's largest source of expert, manually curated somatic mutation information relating to human cancers. Here we outline that data in terms of structure, content and scope, making it easier for you to evaluate what you will find in COSMIC and how best to access it to fulfil your research needs.

Overview

COSMIC comprises the COSMIC database and the Cell Lines Project, two separate but related resources. This page discusses COSMIC; please see the About Cell Lines page for more information on the Cell Lines Project.

The COSMIC database combines two main types of data:

High Precision Data, Manually Curated by Experts:
  • Targeted gene-screening panels
  • Over 25,000 peer-reviewed papers
  • Metadata (environmental factors and patient history)
  • Focused on known and suspected cancer genes and mutations
  • Objective frequency data, made possible by recording mutation-negative samples
  • Full details of the curation process and data captured

Genome-wide Screen Data:
  • Over 32,000 genomes, consisting of:
    • peer-reviewed, large-scale genome screening data
    • other databases such as TCGA and ICGC
  • Provides unbiased, genome-level profiling of diseases
  • Objective frequency data, derived by interpreting non-mutant genes across each genome
  • Can be used to discover novel driver genes

Together, this compilation of data provides extensive coverage of the cancer genomic landscape from a somatic perspective. New and potentially significant data are continually captured and made available through four significant updates to COSMIC each year.
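
As a rough illustration of why mutation-negative samples matter for the frequency data described above: a gene's mutation frequency is the number of samples found mutated divided by all samples screened for that gene, including those found to be mutation negative. The sketch below is a minimal Python example under that assumption. The function name and the counts are hypothetical; they are not COSMIC figures and do not represent COSMIC's own calculation.

```python
def mutation_frequency(mutated_samples: int, screened_samples: int) -> float:
    """Return the fraction of screened samples that carry a mutation.

    screened_samples counts every sample tested for the gene, including
    mutation-negative ones; without them the denominator is unknown.
    """
    if screened_samples <= 0:
        raise ValueError("screened_samples must be positive")
    if not 0 <= mutated_samples <= screened_samples:
        raise ValueError("mutated_samples must be between 0 and screened_samples")
    return mutated_samples / screened_samples


# Hypothetical counts, for illustration only (not taken from COSMIC).
frequency = mutation_frequency(mutated_samples=45, screened_samples=300)
print(f"{frequency:.1%}")  # 45 of 300 screened samples -> 15.0%
```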

For more information on COSMIC, read about our curation processes and the analyses that we run on mutation data, or see our answers to frequently asked questions about curation, histology and mutation syntax.

Website Access and Tools

Exploring the COSMIC website provides a good introduction to the core data available.

Key aspects of the website include:

  • Free access for all users
  • Always displays the most recent COSMIC release (updated quarterly)
  • Dedicated tools help you explore the data:
    • Genome Browser
      The genome browser provides a genome-wide perspective on cancer genomics. Individual variant tracks can be turned on or off at the user's discretion.
    • Gene pages
      The gene pages summarise all the data for a specific gene. This is a good starting place if you have a particular gene of interest, such as BRAF.
    • Cancer Browser
      The cancer browser lets you explore mutations by tissue type and histology, giving a disease-specific perspective.
    • Fusion genes
      Fusion genes are manually curated from the scientific literature. The overview page lets you browse or search the fusion genes available in COSMIC.
    • Drug resistance data
      COSMIC curates mutations that confer drug resistance. This includes data on both acquired resistance and intrinsic resistance. From the overview page you can browse or search the drug-gene pairs available in COSMIC.
    • Hallmarks of Cancer
      The Hallmarks of Cancer bring together manually curated information on the function of proteins that are coded for by cancer genes. They present a condensed overview of the most relevant facts, with quick access to the literature source.
    • COSMIC-3D
      COSMIC-3D provides an interactive view of cancer mutations in the context of 3D protein structures. It features a heatmap of recurrent mutations alongside known and predicted small molecule binding sites.
    • Cancer Gene Census
      The Cancer Gene Census (CGC) is an ongoing effort to catalogue those genes which contain mutations that have been causally implicated in cancer. This page provides a detailed overview of the criteria used when categorising a gene as part of the CGC, and lets you browse and search the current CGC genes.
    • Mutational signatures
      Different mutational processes (such as the slight intrinsic infidelity of the DNA replication machinery, exogenous or endogenous mutagen exposures, enzymatic modification of DNA and defective DNA repair) generate unique combinations of mutation types, termed mutational signatures. The 30 mutational signatures in COSMIC are based on an analysis of 10,952 exomes and 1,048 whole genomes across 40 distinct types of human cancer.
    • CONAN
      CONAN (the copy number analysis tool) searches for loss of heterozygosity, homozygous deletions and amplifications across the COSMIC dataset. All samples have been analysed with PICNIC or ASCAT.
  • All aspects are available for both GRCh37 and GRCh38
  • SNPs are filtered out for clarity
  • Video tutorials explaining new features on the site

Access via the download files

COSMIC is a vast resource and we try to make it as easy as possible for you to find new and innovative ways of taking full advantage of it. All COSMIC data is available as download files, ready for you to integrate into your current pipelines and tools.

Key aspects of the downloadable files include:

  • Free registration for academic use; commercial use requires a licence.
  • Complete, one-click file downloads and filtered files are available:
    • Choose from 19 files, dividing the COSMIC data into logical categories, such as ‘Complete Mutation Data’, ‘Non Coding Variants’ and ‘Methylation Data’
    • Filter files by gene, tissue or sample of interest
    • Complete Oracle database
  • Available for both GRCh37 and GRCh38
  • Updated quarterly with each COSMIC release
  • Access the six most recent COSMIC data releases
  • All files are available with a description
  • SNPs are included for full coverage
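
As one way of integrating a download file into a pipeline, the sketch below filters a tab-separated file down to the rows for a single gene of interest. It is a minimal Python example, not an official COSMIC tool. It assumes the file is tab-separated with a header row; the column names ("Gene name", "Sample name", "Primary site") and the sample rows are hypothetical placeholders, so check the description supplied with each file for its actual layout.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def filter_by_gene(
    lines: Iterable[str], gene: str, gene_column: str = "Gene name"
) -> Iterator[Dict[str, str]]:
    """Yield rows of a tab-separated file whose gene column equals `gene`.

    `gene_column` is an assumed header name; adjust it to match the file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or gene_column not in reader.fieldnames:
        raise KeyError(f"column {gene_column!r} not found in header")
    for row in reader:
        if row[gene_column] == gene:
            yield row


# Made-up rows standing in for a download file (placeholders, not COSMIC data).
SAMPLE = (
    "Gene name\tSample name\tPrimary site\n"
    "BRAF\tsample_1\tskin\n"
    "KRAS\tsample_2\tlarge_intestine\n"
    "BRAF\tsample_3\tthyroid\n"
)

matches = list(filter_by_gene(io.StringIO(SAMPLE), "BRAF"))
for row in matches:
    print(row["Sample name"], row["Primary site"])
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object, for example `with open(path, newline="") as handle: ...`. Because the function is a generator, rows are processed one at a time rather than loaded into memory together.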

Download a sample of COSMIC data

We have made the first 100 lines of each of the download files freely available so you can try out the data. You can download the data sample from the "About" page. Full descriptions of what is in the complete download files are also available.