News Index

COSMIC v87 - 13th November 2018

COSMIC v87 (November 2018) includes 4 new fully curated genes, substantial curation updates for KRAS and Mesothelioma, 1 newly curated fusion pair, and mutation data from 9 new systematic screen papers. We have also added 4 new genes to the Cancer Gene Census and 8 to the Hallmarks of Cancer.

Data Updates

  1. New fully curated cancer genes (4);
    • ARID1B 13,605 samples, 456 mutations, 187 papers
    • RBM10 12,638 samples, 364 mutations, 109 papers
    • B2M 13,190 samples, 241 mutations, 126 papers
    • BCORL1 6,822 samples, 238 mutations, 151 papers
  2. Substantial curation update to Mesothelioma;
  3. Substantial curation update to KRAS +5,973 samples, +476 mutations, +75 papers
  4. Curated Gene Fusions (1);
  5. Cancer Gene Census;
  6. Resistance Mutations Updates;
  7. Whole Genome data
    • 9 Systematic Screen Papers

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.

File Download Service

Files for v87 will not be available on our SFTP service, which is now closed. For more information about how to download COSMIC data and the files available please visit the following pages -

Curated Genes

ARID1B

ARID1B (AT-rich interaction domain 1B), like ARID1A, encodes a protein which is a component of the SWI/SNF chromatin remodelling complex and may play a role in cell-cycle activation. Somatic mutations in both genes are very frequent in gynaecological and several other solid tumours. ARID1B mutations are found in approximately 20% of ovarian clear cell carcinomas and are also detected in dedifferentiated ovarian and endometrial carcinomas, where concurrent ARID1A and ARID1B inactivating mutations result in loss of protein expression in 25% of the tumours. Recurrent mutations in both ARID1A and ARID1B have also been identified in acute promyelocytic leukaemia. The majority of these mutations are loss-of-function alterations, similar to the truncating mutations seen in solid tumours. Additionally, ARID1B is a potential biomarker for neuroblastoma patients with poor prognosis, and ARID1B mutations have been found in gastrointestinal stromal tumours lacking alterations in the canonical KIT/PDGFRA/RAS pathways. Mutations are also detected in breast carcinoma, hepatocellular carcinoma, mesonephric carcinoma, schwannomas, microsatellite unstable colorectal cancer and diffuse large B cell lymphoma. ARID1B may be targetable with FDA-approved HDAC inhibitors, including vorinostat and panobinostat.

RBM10

RBM10 encodes a spliceosomal RNA binding motif protein involved in the regulation of gene expression, predominantly through the regulation of alternative splicing. The gene is located on the X chromosome (Xp11.3) and has been shown to be mutated in a variety of cancer types, including breast, colon, thyroid, ovary, pancreas, prostate and lung. These genetic changes are mainly missense single nucleotide variants, although frameshift insertions predicted to generate truncated proteins and nonsense mutations have also been found. RBM10 acts as a tumour suppressor gene and these loss-of-function mutations affect the mechanism of repression of Notch signalling and cell proliferation through the regulation of NUMB alternative splicing. As RBM10 regulates the alternative splicing of hundreds of target genes, the expression of RBM10 itself needs to be tightly regulated, which occurs through auto-regulatory processes. RBM10 negatively regulates its own mRNA and protein expression by exon skipping and the promotion of alternative splicing-coupled nonsense-mediated mRNA decay. In lung adenocarcinoma samples, mutations that affect the splice sites of the exons skipped (6 or 12) have been shown to lead to reduced RBM10 expression, consistent with the tumour suppressive role of RBM10. Mutations in RBM10 have also been implicated in the drug resistance mechanism of thyroid carcinoma harbouring BRAF V600E.

KRAS (update)

KRAS (Kirsten rat sarcoma viral oncogene homolog) was one of the first 4 genes curated for COSMIC when the database was first released to the public 14 years ago. Over the last decade, KRAS has become one of the most clinically important and most frequently sequenced oncogenes in cancer. Accordingly, the related scientific literature in PubMed has grown to a level that is impossible to curate manually and exhaustively. With the help of PubTator and LitVar, powered by machine learning (Wei, 2013; Allot, 2018), we have scanned the literature from the last 5 years and curated 21 new mutations for this COSMIC release. These consisted of 15 new substitution mutations, 1 nonsense mutation and 5 new deletions/insertions from 17 publications. In total, 70 new KRAS mutations have been added to COSMIC during 2018. Most of the mutations are found outside the well-known oncogenic hotspots of exon 1 codons 12 and 13 and exon 2 codon 61, expanding the number of potentially relevant mutations in oncology.

Wei CH, Kao HY, Lu Z. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res. 2013;41:W518-22.

Allot A, Peng Y, Wei CH, Lee K, Phan L, Lu Z. LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Res. 2018;46(W1):W530-W536.

B2M

B2M (beta-2-microglobulin) encodes the constant light chain of the classical major histocompatibility complex (MHC) class I molecule. Specific recognition of this trimeric complex by the T-cell receptor triggers cytotoxic T lymphocyte activity, which is currently exploited in cancer immunotherapies, the efficacy of which is dependent on the MHC class I expression level in the tumour. Somatic mutations in B2M, which include substitutions, deletions and LOH of chromosome 15q21, inhibit transcription of B2M or affect translation of the mRNA, resulting in lowered expression of the protein or in synthesis of a non-functional protein. Mutations are spread across B2M; however, a mutation at position M1 appears to be recurrent in lymphomas, as does a two-nucleotide deletion in a CT-repeat in colorectal carcinomas. Decreased expression of B2M has been reported to be associated with worse prognosis in non-Hodgkin lymphoma patients and a favourable prognosis in Hodgkin lymphoma patients. Truncating B2M mutations may be associated with acquired resistance to PD-1 blockade in metastatic melanoma.

BCORL1

BCORL1 (BCL6 corepressor like 1, Xq25-26.1) is homologous to the tumour suppressor gene BCOR and ubiquitously expressed in human tissues. A transcriptional co-repressor, BCORL1 encodes a protein which is tethered to promoter regions by DNA binding proteins, interacting with histone deacetylases, CtBP and PCGF1, and represses E-cadherin expression via interaction with CtBP. Somatic mutations have been found in a variety of myeloid tumours including acute myeloid leukaemia, myelodysplastic syndrome and chronic myelomonocytic leukaemia. These mutations include missense mutations but are most often frameshift, splice site or nonsense mutations across the gene, predicted to result in severely truncated proteins lacking the LXXLL nuclear receptor recruitment motif and the C-terminus. Similar somatic mutations have also been reported in solid tumours such as MSI-H gastric adenocarcinoma, melanoma, Wilms tumour, intracranial germ cell tumours, gliomas and head and neck squamous cell carcinoma.

 

Mesothelioma

As part of release v87 we have focused on updating the expert-curated mutation data for mesothelioma. Approximately 30 additional publications that include mutation screening data in this disease are included in the release.

Malignant mesothelioma is a rare and aggressive tumour, mostly occurring in the pleural mesothelial cells, but also arising in the peritoneal or pericardial lining and tunica vaginalis. Histologically, malignant pleural mesothelioma (MPM) is classified into 3 main subtypes: epithelial, mesenchymal/sarcomatous, and mixed/biphasic. The disease is associated with occupational and, more rarely, environmental or domestic exposure to asbestos and other mineral fibres. Since asbestos exposure is most common in industries with a male work force, MPM is seen predominantly in men. A cancer syndrome with germline BAP1 mutations also predisposes carriers to mesothelioma. There is a long latency period in MPM, with up to 50 years between exposure and tumour development, and patients have a poor prognosis, rarely responding to conventional cytotoxic drugs. Although surgery combined with radio-chemotherapy can be beneficial in patients who present with early-stage disease, most patients are in an advanced stage at diagnosis. A greater understanding of the underlying genetics of MPM and the development of novel targeted therapies are needed to improve the outcome for MPM patients. The disease will remain a global health issue while asbestos continues to be mined and used, especially in developing countries.

The genomic landscape of MPM includes recurrent somatic mutations in some tumour suppressor genes: CDKN2A, NF2 and BAP1. TP53 mutations are also found, at a lower frequency, as well as hotspot TERT promoter mutations. This mesothelioma update includes a paper by Ugurluer et al. (COSP45544), who perform exome-based next-generation sequencing on pleural and peritoneal mesotheliomas. They find tumour-related mutations in 73% of their mesothelioma patients and confirm BAP1, CDKN2A/B and NF2 as the most frequently mutated genes. In a publication by Kang et al. (COSP45546), SETDB1 is identified as a frequently mutated gene in MPM. Tranchant et al. (COSP45541) find an MPM molecular subgroup characterised by co-occurring mutations in the LATS2 and NF2 genes. Furthermore, by defining the specific deregulated signal pathways, they identify PF-04691502, an inhibitor of the mTOR/PI3K/AKT pathway already in use in clinical trials for other cancer types, as potentially useful for this MPM subgroup.

Lai et al. (COSP45543) report oncogene targeted deep sequencing of a case of malignant peritoneal mesothelioma, identifying a novel somatic BAP1 insert frameshift mutation and suggesting the resultant tumour-specific neo-antigen as a diagnostic marker. Monch et al. (COSP45532) identify a group of MPM characterised by overexpression of ALK and present preclinical data showing that a combination of crizotinib and rapamycin may be suitable targeted therapy in MPM.

Unlike malignant mesothelioma, well differentiated papillary mesothelioma (WDPM) of the peritoneum shows indolent behaviour and is not associated with asbestos exposure. Stevers et al. (COSP45755) perform genomic profiling on WDPM, finding them defined by mutually exclusive mutations in TRAF7 and CDC42, and lacking the genetic alterations common to malignant mesothelioma.

 

Curated Gene Fusions

ETV6-PDGFRB

The rare fusion ETV6 (TEL)-PDGFRB, the molecular consequence of the t(5;12)(q33;p13) translocation, is found in some patients with chronic myelomonocytic leukaemia and other myeloproliferative disorders with eosinophilia. ETV6 encodes an ETS family transcription factor containing a Helix-Loop-Helix (HLH) and an ETS DNA binding domain. PDGFRB encodes a cell surface tyrosine kinase receptor for members of the platelet-derived growth factor family. In the fusion, the N-terminal HLH domain of ETV6 is fused to the transmembrane and the tyrosine kinase domains of PDGFRB. The breakpoints detected in ETV6-PDGFRB transcripts are consistently at exon 4 in ETV6 and at exon 11 in PDGFRB. Imatinib is effective therapy in patients with ETV6-PDGFRB-positive chronic myeloproliferative diseases.

 

Cancer Gene Census (CGC)

New Census genes (tier 1):

LATS1 LATS2

New Census genes (tier 2):

MACC1 SETDB1

Hallmarks of Cancer

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed.

New Hallmark Genes in v87

CYLD LATS1 LATS2 MEN1 MYD88 NOTCH1 NOTCH2 RUNX1T1

Systematic Screen Papers

Follow links below to the 9 papers which are new in v87, or view the full table of papers here.

COSP41831 COSP43799 COSP44936 COSP45225 COSP45380 COSP45515 COSP45604 COSP45606 COSP45874

COSMIC Statistics:

1,403,267
Samples (+11,895)
5,992,260
Coding mutations (+14,283)
26,494
Papers (+243)
19,574
Fusions (+206)
35,490
Whole genomes (+10)
1,179,545
Copy number variants (+0)
9,147,833
Gene expression variants (+0)
7,879,142
Differentially methylated CpGs (+0)
19,721,615
Non Coding Variants (+596)
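
Each bracketed figure above is the change since the previous release, so a v87 total should equal the corresponding v86 total (listed in the v86 statistics further down this page) plus the bracketed change. The following is a minimal sketch of that consistency check in Python; the dictionary keys and the function name are illustrative choices, not part of any COSMIC file format or API.

```python
# Totals and changes copied from the v86 and v87 statistics on this page.
# The key names are illustrative labels only.
V86_TOTALS = {
    "samples": 1_391_372,
    "coding_mutations": 5_977_977,
    "papers": 26_251,
    "fusions": 19_368,
    "whole_genomes": 35_480,
    "non_coding_variants": 19_721_019,
}
V87_CHANGES = {
    "samples": 11_895,
    "coding_mutations": 14_283,
    "papers": 243,
    "fusions": 206,
    "whole_genomes": 10,
    "non_coding_variants": 596,
}
V87_TOTALS = {
    "samples": 1_403_267,
    "coding_mutations": 5_992_260,
    "papers": 26_494,
    "fusions": 19_574,
    "whole_genomes": 35_490,
    "non_coding_variants": 19_721_615,
}


def find_mismatches(previous, changes, current):
    """Return {category: (expected, reported)} where previous + change != current."""
    mismatches = {}
    for category, reported in current.items():
        expected = previous[category] + changes[category]
        if expected != reported:
            mismatches[category] = (expected, reported)
    return mismatches


mismatches = find_mismatches(V86_TOTALS, V87_CHANGES, V87_TOTALS)
print(mismatches if mismatches else "all v87 totals equal v86 total + change")
```

The same function can be reused for any pair of consecutive releases by swapping in their totals and changes.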

COSMIC v86 - 14th August 2018

COSMIC v86 (August 2018) includes 3 new fully curated genes, substantial curation update for Glioblastoma, 1 newly curated fusion pair, and 476 whole genome screened samples from 8 new systematic screen papers. We have also integrated ICGC release 27 and added 7 new genes to the Cancer Gene Census Hallmarks of Cancer.

Data Updates

  1. New fully curated cancer genes (3);
    • CHD4 - 1,783 samples, 38 mutations, 118 papers
    • IRS4 - 346 samples, 11 mutations, 91 papers
    • CTCF - 14,520 samples, 380 mutations, 136 papers
  2. Substantial curation update to Glioblastoma;
  3. Curated Gene Fusions (1);
    • MN1-ETV6 - 13 samples, 10 mutations, 7 papers
  4. Cancer Gene Census;
  5. Resistance Mutations Updates;
  6. Whole Genome data
    • 8 Systematic Screen Papers; 476 new whole genome screen samples
    • ICGC release 27; 1,550 new whole genome screen samples

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.

File Download Service

Files for v86 are available on our SFTP site but this is now a legacy service which is no longer supported. For more information about the files available, how to download from the command line, and help with automating the download process please visit the following help pages -

Curated Genes

CHD4

CHD4 (chromodomain helicase DNA-binding protein 4) encodes a protein belonging to the SNF2/RAD54 helicase family and which is the main component of the nucleosome remodelling and deacetylase complex, with an important role in epigenetic transcriptional repression. A high frequency of mutations in CHD4 (17%) has been found in serous endometrial carcinoma where most are missense mutations and half affect the ATPase/helicase and helicase domains. CHD4 is also mutated in clear cell, endometrioid and mixed-histology endometrial tumours, and in uterine carcinosarcoma.

IRS4

IRS4 (insulin receptor substrate 4) is a cytoplasmic scaffold protein that is phosphorylated by the insulin receptor tyrosine kinase in response to receptor stimulation by insulin. Tyrosine phosphorylated IRS4 protein has been shown to associate with cytoplasmic signalling molecules that contain SH2 domains, leading to the activation of the PI3K/AKT and MAPK/ERK signalling pathways. The gene for IRS4 is located on chromosome Xq22.3 and somatic mutations, including point mutations and deletions, have been found in a variety of tumour types including metastatic melanoma, multiple meningioma, paediatric T-cell acute lymphoblastic leukaemia and other haematological malignancies. An insertional mutagenesis study in mouse has shown that IRS4 is a driver in mammary oncogenesis, working synergistically with ERBB2/HER2. Furthermore, analysis of expression of IRS4 in human breast carcinomas has shown it to be a putative biomarker for HER2-targeted therapy resistance.

CTCF

CTCF (CCCTC-binding factor) encodes a transcriptional regulator affecting chromatin structure and organisation by binding different DNA target sequences and proteins, thus playing a vital role in transcription by controlling promoter-enhancer interactions. Mutations have been observed in a variety of cancers including endometrial carcinoma (and the potentially precancerous lesion endometriosis), bladder cancer, Wilms tumour, several myeloid tumours (including acute megakaryoblastic leukaemia (AMKL), transient abnormal myelopoiesis preceding AMKL in Down syndrome patients, and acute lymphoblastic leukaemia), head and neck cancers and some breast cancers. CTCF functions as a tumour suppressor in cancer. Most mutations are frameshift, nonsense or splice mutations resulting in haploinsufficiency following nonsense mediated decay, or are loss-of-function missense mutations often in the zinc finger domains. However, some missense mutations alter DNA binding residues and act as gain-of-function mutations enhancing cell survival, for example p.K365T in endometrial carcinoma.

 

Glioblastoma

We have focused on updating the expert-curated mutation data for the brain cancer Glioblastoma multiforme (GBM). Approximately 70 new publications which include mutation screening data in this disease are included in this release. GBM, a WHO grade IV astrocytoma, is the most common malignant primary tumour of the adult brain and has a poor prognosis. It may occur de novo or develop as a secondary tumour from diffuse astrocytoma WHO grade II or anaplastic astrocytoma WHO grade III. Primary and secondary GBMs have different genetic profiles, with IDH1/2 mutations being evident in secondary GBM. Similarly TP53 mutations are more common in secondary than in primary GBM. The most common mutations in primary GBM are TERT promoter mutations (especially C228T, C250T), occurring in 70-80% of tumours, and these mutations are indicative of poorer outcome. Alterations in EGFR are also frequent in primary GBM, with EGFR amplification present in approximately 40% of cases and about half of these also carrying the EGFR vIII variant, an inframe deletion of exons 2-7. Also common in primary GBM are CDKN2A deletions and PTEN mutations. By contrast, paediatric GBM has a different genomic landscape to that of adults, with infrequent changes in CDKN2A, PTEN and EGFR but frequent mutations at hotspot positions in H3F3A and H3.1. GBM has some rare variants, including gliosarcoma, giant cell GBM and epithelioid GBM; the latter often harbouring BRAF V600E mutations.

 

Curated Gene Fusions

MN1-ETV6

The MN1-ETV6 fusion, resulting from t(12;22)(p13;q12), is a recurrent but infrequent abnormality in haematological malignancies. MN1 encodes a transcription co-factor while ETV6 encodes a protein of the ETS transcription factor family. Two fusion transcript types have been reported and in both most of MN1, including the glutamine/proline rich domain, is fused to the DNA binding domain of ETV6. In type I, exon 1 of MN1 is fused to exon 3 of ETV6 and in type II the same MN1 exon is fused to ETV6 exon 4. MN1-ETV6 fusions are found in myeloid leukaemia and myelodysplastic syndromes.

 

Cancer Gene Census (CGC)

Hallmarks of Cancer

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed.

New Hallmark Genes in v86

DICER1 IDH1 IDH2 KIT NF1 NF2 PIK3R1

ICGC release 27, April 30th 2018

1 New ICGC Study (SSM)

COSU657 WT-US

4 Updated CNV Studies

COSU340 CLLE-ES
COSU382 PACA-CA
COSU537 PRAD-CA
COSU538 PRAD-UK

CNVs not reported in 1 Study

COSU645 BOCA-FR

There has been a cleanup of duplicated records which had arisen for two reasons:

  1. For two studies (LIAD-FR and BOCA-FR) we had included data from old publications which were subsequently submitted to ICGC using different sample names.
  2. Two studies (LICA-CN and BRCA-KR) had been re-submitted to ICGC using different sample names.

Both of these issues caused duplicated samples and mutations in the COSMIC database and these were removed in v86.

Systematic Screen Papers

Follow links below to the 8 papers which are new in v86, or view the full table of papers here.

COSP33196 COSP43017 COSP43572 COSP43892 COSP44274 COSP44557 COSP44783 COSP44991

COSMIC Statistics:

1,391,372
Samples (+8,010)
5,977,977
Coding mutations (+356,850)
26,251
Papers (+213)
19,368
Fusions (+20)
35,480
Whole genomes (+2,026)
1,179,545
Copy number variants (-16,011; no CNVs in BOCA-FR in ICGC 27)
9,147,833
Gene expression variants (+45)
7,879,142
Differentially methylated CpGs (+0)
19,721,019
Non Coding Variants (+1,661,437)

COSMIC v85 - 8th May 2018

COSMIC v85 (May 2018) includes 3 new fully curated genes, substantial curation updates for TERT, 1 newly curated fusion pair, and 162 whole genome screened samples from 8 new systematic screen papers. We have also added 25 new genes to the Cancer Gene Census Hallmarks of Cancer and added a new track to the main Genome Browser which displays mutation recurrence. The new download service has been extended making earlier versions of COSMIC available and also adding functionality to support users who automate downloads.

Data Updates

  1. New fully curated cancer genes (3);
    • ERBB3 - 18,953 samples, 460 mutations, 197 papers
    • LRP1B - 2,143 samples, 271 mutations, 240 papers
    • POLD1 - 12,196 samples, 157 mutations, 104 papers
  2. Substantial curation update (TERT);
    • TERT - 51,526 samples (+18,222), 10,767 mutations (+1,967), 455 papers (+87)
  3. Curated Gene Fusions (1);
  4. Cancer Gene Census;
  5. Resistance Mutations Updates;
  6. Whole Genome data
    • 8 Systematic Screen Papers; 162 whole genome screen samples

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.

File Download Service

We have extended the new '1-click' download service released in February to include previous versions of COSMIC (from v81) and added functionality to support users who use command line tools and automate downloads. The SFTP site will cease to be supported from our next release (v86 scheduled for August). For more information about the files available, how to download from the command line, and help with automating the download process please visit the following help pages -

Genome Browser

A new 'Mutation Recurrence' track has been added to the main Genome Browser. This is a colour density track (pale yellow=low score, red=high score) across the whole reference sequence. Mouseover any nucleotide position on the track to see the score, which is the number of whole genome screened sample IDs with a coding or non-coding mutation at that position.
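
The score described above can be sketched in a few lines of Python: for each genomic position, count the distinct whole genome screened sample IDs that carry a mutation there. This is only an illustration of the definition; the `(sample_id, chromosome, position)` record layout and the function name are assumptions made for the example and do not correspond to a COSMIC download file or to the Genome Browser's own implementation.

```python
from collections import defaultdict


def recurrence_scores(mutations):
    """Count distinct sample IDs per (chromosome, position).

    `mutations` is an iterable of (sample_id, chromosome, position) tuples,
    a hypothetical layout chosen for this sketch.
    """
    samples_at_site = defaultdict(set)
    for sample_id, chromosome, position in mutations:
        # A set ensures a sample is counted once per position,
        # even if it is listed more than once there.
        samples_at_site[(chromosome, position)].add(sample_id)
    return {site: len(ids) for site, ids in samples_at_site.items()}


# Made-up records for illustration only.
example = [
    ("sample_A", "7", 1000),
    ("sample_B", "7", 1000),
    ("sample_A", "7", 1000),   # duplicate record for sample_A at this site
    ("sample_C", "12", 2500),
]
scores = recurrence_scores(example)
print(scores)
```

In this toy input, position 7:1000 scores 2 because two distinct samples are mutated there, while 12:2500 scores 1.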

Website Translator

We have added the option to automatically translate the COSMIC website using the Google Translate plugin. This option is available from every page footer, where a language can be selected from the drop down menu.

Gene Expression Data

There has been no new gene expression data added in v85 but due to the removal of some duplicates the overall number of variants has decreased from 9,176,464 to 9,147,788 (-0.31%).
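
As a quick check of the figures above, the relative change can be recomputed from the two totals. The snippet below is a minimal sketch; the function name is an illustrative choice.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


before, after = 9_176_464, 9_147_788   # totals quoted in the paragraph above
removed = before - after                # number of duplicate variants removed
change = percent_change(before, after)
print(f"{removed:,} variants removed ({change:.2f}%)")
```

The difference between the two totals is 28,676 variants, which agrees with the change reported in the v85 statistics.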

GDPR Compliance

As a result of the new EU GDPR (General Data Protection Regulation) legislation, which comes into effect on 25th May, we will be making some changes to our terms and conditions and privacy policy for the COSMIC website. In the near future we will also be sending registered users an email with instructions for managing mailing preferences and the steps needed to keep their accounts active.

Curated Genes

ERBB3

ERBB3 (erb-b2 receptor tyrosine kinase 3), which encodes HER3 (Human Epidermal growth factor Receptor 3), is a member of the epidermal growth factor receptor family, consisting of four closely related type 1 transmembrane receptors. ERBB3 is the last of the four genes in this family to be expert curated by COSMIC. Unlike other members of the family, HER3 has impaired tyrosine kinase activity, but can function via ligand binding and heterodimerization with other members of the family to influence cell proliferation. Increased expression of HER3 has been observed in a number of cancers, where it has been linked with therapeutic resistance. Somatic mutations in ERBB3 are found across a wide range of cancer types, including colon and gastric cancer, with recurrent hotspots seen in the extracellular domain.

POLD1

DNA polymerase delta 1 (POLD1) mutations from targeted screening studies have now been curated in COSMIC. Polymerase delta plays an essential role in the replication and repair of chromosomal DNA. Recent studies have shown that germline mutations in the proofreading domain of POLD1 predispose to cancer. They are present in 0.5-2% of patients in intestinal polyposis and CRC cohorts enriched for familial disease. Also low levels of somatic POLD1 mutations occur in multiple sporadic tumours, such as colorectal, gastric and endometrial carcinomas, melanomas and childhood brain tumours, where they often underlie an ultramutated phenotype and potentially a favourable prognosis. POLD1 is a large gene, and is likely to acquire somatic mutations secondary to other causes of increased mutation burden, such as MMR-deficiency; therefore it is important to differentiate pathogenic variants from passenger mutations that are of no functional consequence.

LRP1B

LRP1B (Low-density lipoprotein (LDL) receptor-related protein 1B) is a member of the LDL receptor family of lipoprotein receptors, which have many functions in the human body including cholesterol metabolism and atherosclerotic lesion formation. The gene encoding LRP1B is very large (1.9Mb, 91 exons) and situated on the long arm of chromosome 2 (2q21.2), in the FRA2F fragile site. The LRP1B gene was first discovered during the study of cancer cell lines harbouring homozygous deletions in this region; alterations of the gene in small cell lung cancer cell lines were suggestive of a tumour suppressor role. Further studies of the gene in different cancers have shown a high frequency of genetic changes including homozygous deletions (glioblastoma multiforme, and cancers of the oesophagus, nasopharynx, bladder and lung), point mutations (chronic lymphocytic leukaemia and cancers of the lung, nasopharynx, oesophagus, ovary and stomach) and promoter methylation resulting in transcription silencing. In later experiments, overexpression of a recombinant gene in lung cancer cell lines with little or no endogenous LRP1B expression resulted in significantly reduced cellular proliferation, confirming the postulated growth-suppressing function of the protein and its role as a tumour suppressor.

TERT (update)

The curated data for TERT (telomerase reverse transcriptase) have been updated. More than 60 publications which include screening of TERT, sometimes alongside other genes, are included in this release. TERT encodes the reverse transcriptase component of telomerase which adds telomere repeats to chromosome ends enabling cell replication. Maintenance of telomere length is a key process in malignant progression. As well as the 2 common hot spot mutations in the TERT core promoter at positions c.1-124 (C228T) and c.1-146 (C250T) many other promoter variants have also been identified. TERT promoter mutations are frequent in melanoma and glioma, particularly glioblastoma. They are also found in numerous other solid tumours including hepatocellular carcinoma, urothelial bladder carcinoma and papillary thyroid carcinoma, as well as malignant phyllodes tumour of the breast and in higher grade meningioma.

 

Curated Gene Fusions

RUNX1-RUNX1T1

The RUNX1-RUNX1T1 (AML1-ETO) fusion is now represented in COSMIC. A proportion of the literature reporting on this fusion pair has been curated. The RUNX1-RUNX1T1 fusion results from the translocation t(8;21)(q22;q22), one of the most common cytogenetic abnormalities in acute myeloid leukaemia (AML). Along with inv(16) AML, these comprise the core binding factor leukaemias, both characterised by the disruption and transcriptional deregulation of genes encoding the subunits of the core binding factor, a transcription factor that functions as an essential regulator of normal haematopoiesis. RUNX1-RUNX1T1 is found in approximately 5-10% of all AML cases and is most common in AML with maturation (FAB M2). The fusion is consistent, with the amino terminal portion of RUNX1, including the runt homology domain, joined to almost the entire RUNX1T1 gene. The genomic breakpoints occur in RUNX1 intron 5 and RUNX1T1 intron 1. Evidence suggests that the fusion alone is insufficient to induce leukaemogenesis and that additional cooperating mutations are required, such as point mutations in KIT or NRAS. The prognosis is generally favourable for patients with RUNX1-RUNX1T1 AML; complete remission can be achieved with relatively long disease-free survival when patients are treated with high dose chemotherapy, but additional activating mutations can confer a poorer prognosis.

 

Cancer Gene Census (CGC)

Hallmarks of Cancer

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed.

New Hallmark Genes in v85

ACVR1 AKT1 AMER1 ARID1A ARID2 ATF1 BAP1 BCOR CALR CASP8 CBL CHEK2 CIITA CRLF2 EZR GNA11 GNAQ GNAS H3F3A HNF1A KDM6A MED12 PAX3 PIK3CB RECQL4

Systematic Screen Papers

Follow links below to the 8 papers which are new in v85, or view the full table of papers here.

COSP41629 COSP43278 COSP43218 COSP44684* COSP44887 COSP43417 COSP44236 COSP45027

COSP44684*

Zehir A, et al. (Nat Med. 2017 Jun;23(6):703-713, PMID:28481359) describe the compiled tumour and matched normal sequence data from a unique cohort of more than 10,000 patients with advanced cancer. The MSK-IMPACT study from the Memorial Sloan Kettering Cancer Center, New York, New York, USA, is a clinical sequencing initiative that has identified 78,240 clinically relevant somatic mutations from various tumour types.

COSMIC Statistics:

1,383,362
Samples (+25,074)
5,621,127
Coding mutations (+172,277)
26,038
Papers (+231)
19,348
Fusions (+422)
33,454
Whole genomes (+163)
1,195,556
Copy number variants (+0)
9,147,788
Gene expression variants (-28,676 duplicates removed)
7,879,142
Differentially methylated CpGs (+0)
18,059,582
Non Coding Variants (+2,928)

COSMIC v84 - 13th February 2018

COSMIC v84 (February 2018) includes 8 new fully curated genes, substantial curation updates for POLE and PIK3CA, 1 new fusion pair, 337 genomes from 11 new systematic screen papers and updates from ICGC release 26. We have also added 20 new genes to the Cancer Gene Census (2 added to tier 1 and 18 to tier 2). In this release we launch a new download service which allows users to download complete data files directly from the website. We have also substantially updated the 'About' pages to better describe the COSMIC project.

Data Updates

  1. New fully curated cancer genes (8);
    • ZFHX3 - 1,554 samples, 177 mutations, 153 papers
    • DGCR8 - 825 samples, 55 mutations, 60 papers
    • SIX1 - 792 samples, 49 mutations, 35 papers
    • SIX2 - 765 samples, 22 mutations, 38 papers
    • NCOA2 - 1,164 samples, 22 mutations, 89 papers
    • TP63 - 1,209 samples, 36 mutations, 119 papers
    • ACVR2A - 2,144 samples, 240 mutations, 112 papers
    • LZTR1 - 352 samples, 6 mutations, 85 papers
  2. Substantial curation updates (2);
    • POLE - 8,171 samples (+826), 504 mutations (+110), 204 papers (+33)
    • PIK3CA - 81,939 samples (+5,413), 10,013 mutations (+764), 2,377 papers (+140)
  3. Curated Gene Fusions (1);
  4. Cancer Gene Census;
  5. Resistance Mutations Updates;
  6. Whole Genome data
    • 11 Systematic Screen Papers; 337 genomes
    • Updates from ICGC release 26; 3 new studies, 3 studies updated

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.

New File Download Service

We have launched a new download service which allows users to download all complete data files from the website, avoiding the need to connect to our SFTP server. To use this service you will need to log in and visit the download page. For each downloadable file there are now two or three download buttons: 'Download Whole File', 'Download Filtered File' and 'Access via SFTP Server'. Every available file can be downloaded from the website or SFTP server, but the option to download filtered data is not available for all files.

About COSMIC

The new About page describes the COSMIC project, core data and resources. It is a useful source of information for new users and those wishing to explore how COSMIC can support their research.

Curated Genes

ZFHX3

The ZFHX3 (Zinc Finger Homeobox protein-3) gene encodes the transcription factor ATBF1 (AT-motif binding factor 1). The gene is situated on the long arm of chromosome 16 (16q22), a region which frequently exhibits loss of heterozygosity in solid tumours. Functionally, ATBF1 inhibits cell proliferation through transcriptional negative regulation of c-Myb and transactivation of the cell-cycle inhibitor cyclin-dependent kinase inhibitor 1A (CDKN1A). It also downregulates the alpha-fetoprotein (AFP) oncoprotein, a plasma protein that is not usually present in normal adult organs but can be found in some adult cancer cells (such as hepatocellular, yolk sac and gastric). Consequently, ZFHX3 acts as a tumour suppressor gene and mutations have been reported in several cancer types - firstly in prostate cancer, and then in breast, colorectal, endometrial, gastric, lung and salivary gland tumours, and neuroblastoma. Mutation types cover the spectrum of changes - missense, small insertions and deletions (both frameshift and non-frameshift), nonsense mutations causing truncation of the encoded protein and intronic changes affecting splice mechanisms. Mutations in ZFHX3 are also a risk factor for atrial fibrillation, a cardiac arrhythmia strongly correlated with cancer incidence.

DGCR8, SIX1 and SIX2

The miRNA processing gene DiGeorge Syndrome Critical Region 8 (DGCR8) and the renal development genes SIX homeobox 1 and SIX homeobox 2 (SIX1 and SIX2) are somatically mutated in the embryonal kidney neoplasia Wilms tumour (WT). DGCR8 is part of the DROSHA microprocessor complex, which recognises and cleaves a pri-miRNA to release a pre-miRNA. Several DGCR8 mutations have been reported in WT and are often associated with chr22 loss of heterozygosity. The recurrent p.E518K mutation, located at the first dsRNA binding domain, has been shown to cause reduction in critical mature miRNAs in tumours. The highly homologous SIX1 and SIX2 genes are essential for progenitor renewal and early renal development. Loss of SIX2 has been shown to result in epithelial differentiation and loss of nephron progenitors. A recurrent mutation located in the homeodomain, p.Q177R, is found in both SIX1 and SIX2 in WTs and is thought to act dominantly, altering the DNA binding properties and thus upregulating cell cycle genes involved in kidney development. SIX1, SIX2 and DGCR8 mutants can be seen early in tumour development or appear at later stages and show evidence of association with poor outcome and disease progression, often being observed in chemotherapy resistant tumours and/or at recurrence. SIX1/2 mutants observed in combination with DGCR8 or other miRNA processing gene mutations in a single tumour show evidence of RAS activation and a higher rate of relapse and death.

NCOA2

The nuclear receptor coactivator 2 (NCOA2) gene encodes a transcriptional coactivator (SRC-2) that modulates gene expression by hormone receptors. In prostate cancer, NCOA2 is found to be both amplified and mutated. The genomic and functional data suggest that NCOA2 functions as a driver oncogene in primary tumours by increasing AR signalling, which is known to play a critical role in early and late stage prostate cancer. However, NCOA2 has many additional targets, including genes involved in cell-cycle regulation, signal transduction, apoptosis, immunity, and transport, which also may contribute to tumorigenesis. In liver cancer, NCOA2 has been proposed to act as a tumour suppressor. Deletion of NCOA2 in mice promotes diethylnitrosamine (DEN)-induced liver tumorigenesis. Low levels of NCOA2 and its target glucose-6-phosphatase (G6pc) in HCC patients are associated with poor survival. NCOA2 may promote liver tumorigenesis in cooperation with Myc. NCOA2 mutations have also been reported in melanoma and lung cancer, where they cluster in two highly conserved regions of the gene, and in several other cancers.

TP63

Large-scale exome sequencing studies have identified mutations in genes involved in the differentiation programme of squamous epithelium and the Notch/p63 axis, including TP63, as drivers of squamous cell carcinoma of the head and neck. Recurrent missense and nonsense mutations in TP63 have been found.

ACVR2A

ACVR2A, activin A receptor type 2A, encodes a transmembrane serine-threonine kinase receptor that mediates the functions of activins, members of the transforming growth factor-beta superfamily. ACVR2A acts as a tumour suppressor gene with a hotspot at an 8-base pair polyadenine tract in exon 10 where truncating frameshift mutations occur in gastrointestinal cancers with microsatellite instability.

LZTR1

LZTR1 (Leucine Zipper Like Transcription Regulator 1) encodes a BTB-Kelch protein that localises to the Golgi and acts as a tumour suppressor. Somatic mutations in LZTR1 have been observed across a number of different cancer types, including endometrial, skin and colorectal cancers. They are also seen in glioblastoma, where they have been demonstrated to co-occur with copy number loss. Predisposing germline mutations and loss of heterozygosity are frequently seen in schwannomatosis.

PIK3CA (update)

PIK3CA encodes a key component of the PI3K pathway, which plays a central role in many different cancers and is a recognised drug target. Somatic mutations in PIK3CA occur with high frequency, in particular in colorectal, breast and endometrial cancers. In the current release we have updated PIK3CA, focusing on adding novel mutations and papers describing cancers in which it is less well described, for example salivary duct carcinoma, vulval carcinoma and overgrowth syndromes.

POLE (update)

Over the last few COSMIC releases we have significantly updated the database with the latest POLE (DNA polymerase epsilon) related literature. Hotspot mutations, such as p.P286R, in the POLE exonuclease domain are associated with an ultramutated tumour phenotype which often includes elevated levels of other driver gene mutations. The mutational signature can be used to subclassify endometrial and colorectal cancers, guide treatment and act as a prognostic marker. POLE ultramutated tumours are likely to be sensitive to immune checkpoint inhibitors and there are several ongoing trials investigating these agents alone or in combination with chemotherapy or other biological agents.

Curated Gene Fusions

TBL1XR1-TP63

TBL1XR1-TP63 has been identified as a recurrent fusion in diffuse large B cell lymphoma where it is exclusive to the germinal centre B cell-like subtype. TP63 encodes a member of the p53 family of transcription factors with functional domains including an N-terminal transactivation domain, a central DNA-binding domain and an oligomerization domain. TBL1XR1, transducin beta-like 1 X-linked receptor 1, is a member of the WD40 repeat-containing gene family and encodes a component of both nuclear receptor corepressor and histone deacetylase 3 complexes. In all fusion transcripts the TP63 breakpoint is consistent at exon 4, losing the N-terminal domain and conserving the distal reading frame. TBL1XR1-TP63 has also been found in peripheral T cell lymphoma, where this fusion and ALK rearrangements were mutually exclusive.

Cancer Gene Census (CGC)

New CGC genes (Tier 1)

SIX1, BAX

New CGC genes (Tier 2)

Please note that RAD17 is only available on the GRCh37 genome version

SIX2, ARHGEF10L, ARHGEF10, BAZ1A, CASP3, CASP9, CPEB3, GPC5, IGF2BP2, LEPROTL1, MGMT, N4BP2, PTPRD, RAD17, RFWD3, SETD1B, SOX21, USP44

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is available for 227 census genes and this will continue to be expanded.

All CGC genes have been re-evaluated and classified with regard to their function in cancer, as oncogenes or tumour suppressive genes, as well as genes participating in fusions, where applicable.

To be able to provide high-confidence and comprehensive data, the CGC has been divided into two tiers.

To classify into Tier 1 of the CGC, a gene must possess a documented activity that may drive or suppress cancer, and there must be evidence of mutations in this gene, detected in cancer, and changing the activity of the protein in a way that promotes the oncogenic transformation. We also take into account the existence of somatic mutation patterns in cancer samples, typical for tumour suppressor genes (broad range of inactivating mutations) or oncogenes (well defined hotspots of missense mutations).

Tier 2 of the Cancer Gene Census consists of genes with strong indications of roles in cancer but with less expansive available evidence, compared to Tier 1. It currently contains 127 genes and is being expanded.

The complete CGC list (Tier 1 and 2) is available here, but please note that any reference to the CGC (or 'Census') across the website which doesn't specify tier, refers to the Tier 1 list.

Systematic Screen Papers

Follow links below to the 11 papers which are new in v84, or view the full table of papers here.

COSP35171, COSP40352, COSP41135, COSP41330, COSP42475, COSP43194, COSP43748, COSP44132, COSP44142, COSP44360, COSP44579

COSMIC Statistics:

1,358,288 Samples (+15,074)
5,448,850 Coding mutations (+82,577)
25,807 Papers (+306)
18,926 Fusions (+81)
33,291 Whole genomes (+777)
1,195,556 Copy number variants (+14,767)
9,176,464 Gene expression variants (+0)
7,879,142 Differentially methylated CpGs (+0)
18,056,654 Non Coding Variants (+1,095,049)

COSMIC v83 - 7th November 2017

COSMIC v83 (November 2017) includes 3 new fully curated genes, a substantial curation update for VHL, 1 new fusion pair, 1,138 genomes from 13 new systematic screen papers, updates from ICGC release 25, and updated resistance mutation data; 8 new samples and 11 new resistance mutations. We have also added 5 new genes to the Cancer Gene Census (Tier 1) and expanded the census table functionality to display both tiers. In this release we have retired the legacy website but in response to user feedback we have added full support for the GRCh37 coordinate system across the main website.

Data Updates

  1. New fully curated cancer genes (3);
    • TGFBR2 - 6,164 samples, 857 mutations, 183 papers
    • ERBB4 - 7,967 samples, 232 mutations, 296 papers
    • BCL9L - 271 samples, 8 mutations, 83 papers
  2. Substantial curation update (1);
    • VHL - 15,790 samples (+1,140), 2,487 mutations (+322), 569 papers (+80)
  3. Curated Gene Fusions (1);
    • ETV6-ABL1 - 595 samples, 32 mutations, 24 papers
  4. Cancer Gene Census;
  5. Resistance Mutations Update;
  6. Whole Genome data
    • 13 Systematic Screen Papers; 1,138 genomes
    • Updates from ICGC release 25

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.

Support for GRCh37 and 38 Reference Genomes

We have added a new menu to the main navigation bar on the COSMIC website called 'Genome Version'. The default is set to GRCh38 but GRCh37 can be selected from this menu. When set to GRCh37 the 'GRCh37 Archive' logo will appear at the top of each web page. Please note that you will need to enable cookies in order for this to work.

Curated Genes

TGFBR2

TGFBR2, transforming growth factor beta receptor 2, encodes a transmembrane member of the Ser/Thr protein kinase family which forms a heterodimeric complex with TGF-beta receptor type-1 and binds TGF-beta. TGFBR2 acts as a tumour suppressor gene in colorectal cancer, where its mutational inactivation is the most common genetic event affecting the TGF-beta signalling pathway, occurring in approximately 30% of these cancers. A 10-base pair polyadenine tract in the extracellular domain is a hotspot, where insertions and deletions result in a frameshift and a non-functioning protein lacking the receptor's transmembrane domain and intracellular kinase domain. These mutations are common in cancers displaying microsatellite instability, with unique clinicopathological features, including an increased incidence in the proximal colon, presentation at an early stage and better prognosis than microsatellite stable (MSS) colon cancer. The frameshift mutations also occur in gastric cancer and missense mutations are found in the kinase domain in MSS colon cancer. Genome studies have identified TGFBR2 as a significantly mutated gene in cervical cancer, and head and neck squamous cell carcinoma.

ERBB4

ERBB4 is a member of the Epidermal Growth Factor Receptor (EGFR) subfamily of receptor tyrosine kinases, along with EGFR, ERBB2 and ERBB3. Ligands include neuregulins and several EGF family members. Activated ERBB4 can function as both a homodimer and a heterodimer with other EGFR family members, resulting in a range of cellular responses. ERBB4 is a comparatively less well understood member of the EGFR family. Somatic mutations in ERBB4 are seen across various cancers (including breast, lung and melanoma) and in various regions of the gene. No hotspot mutations have been identified. It has been proposed to act as both an oncogene and a tumour suppressor and is being investigated as a potential drug target.

BCL9L

BCL9L (B-cell CLL/lymphoma 9 like) is a co-activator of Wnt/beta-catenin signalling. It increases the expression of a subset of Wnt target genes but also regulates genes that are required for early stages of intestinal tumour progression. Somatic loss-of-function alterations in BCL9L are frequent in aneuploid colorectal carcinoma but are also found in other tumour types at lower frequency. BCL9L has been proposed to function as an oncogene or as a tumour suppressor depending on the cellular context.

VHL (updated)

VHL is a tumour suppressor gene that plays a role in a rare inherited disorder called Von Hippel-Lindau syndrome but also in sporadic forms of cancer. The current update in COSMIC brings together the historic collection and the latest published data on the somatic mutations in the VHL gene, including novel mutations and VHL mutations in new histological entities and ethnic groups. Early inactivation of VHL is commonly seen in clear cell renal cell carcinoma (ccRCC), the most common form of renal cancer. A recent publication by Corrò et al. (PMID: 28214514) explores the feasibility of using circulating tumour DNA as a biomarker in this disease. Cho et al. (PMID: 27994516) sequenced Taiwanese pancreatic neuroendocrine tumours (pNETs) for a large customised panel of genes. They observed that Asian patients with pNETs more frequently carried mutations in the mTOR and angiogenesis (including VHL) pathways when compared to Caucasian patients, which could partially explain the better outcome observed for targeted therapy in Asian patients with pNETs. Other reports analysed VHL mutations in tumour types such as parotid mucoepidermoid carcinoma, glioblastoma, breast cancer, colorectal cancer, and clear cell microcystic adenoma.

Curated Gene Fusions

ETV6-ABL1

ETV6-ABL1, resulting from t(9;12)(q34;p13) or a complex rearrangement, is a rare but recurrent fusion in a wide range of haematological malignancies including myelodysplastic neoplasm, acute lymphoblastic leukaemia, acute myeloid leukaemia and Philadelphia chromosome-negative chronic myeloid leukaemia. ETV6 encodes an ETS family transcription factor which contains two functional domains, an N-terminal pointed domain that is involved in protein-protein interactions with itself and other proteins, and a C-terminal DNA-binding domain. Two types of ETV6-ABL1 transcript are detected: type A has an ETV6 breakpoint at exon 4 and type B at exon 5. The ABL1 breakpoint is consistent at exon 2. Both types result in constitutive tyrosine kinase activity similar to that seen with the BCR-ABL1 fusion. Eosinophilia is a common characteristic of patients with ETV6-ABL1 fusion.

Cancer Gene Census (CGC)

New CGC genes (Tier 1)

BARD1, IRS4, PIK3CB, POLD1, POLQ

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is available for 227 census genes and this will continue to be expanded.

All CGC genes have been re-evaluated and classified with regard to their function in cancer, as oncogenes or tumour suppressive genes, as well as genes participating in fusions, where applicable.

To be able to provide high-confidence and comprehensive data, the CGC has been divided into two tiers.

To classify into Tier 1 of the CGC, a gene must possess a documented activity that may drive or suppress cancer, and there must be evidence of mutations in this gene, detected in cancer, and changing the activity of the protein in a way that promotes the oncogenic transformation. We also take into account the existence of somatic mutation patterns in cancer samples, typical for tumour suppressor genes (broad range of inactivating mutations) or oncogenes (well defined hotspots of missense mutations).

Tier 2 of the Cancer Gene Census consists of genes with strong indications of roles in cancer but with less expansive available evidence, compared to Tier 1. It currently contains 127 genes and is being expanded.

The complete CGC list (Tier 1 and 2) is available here, but please note that any reference to the CGC (or 'Census') across the website which doesn't specify tier, refers to the Tier 1 list.

Systematic Screen Papers

Follow links below to the 13 papers which are new in v83, or view the full table of papers here.

COSP37488, COSP39873, COSP41061, COSP41673, COSP42002, COSP42117, COSP42498, COSP42745, COSP42774, COSP42802, COSP42924, COSP43750, COSP44021

COSMIC Statistics:

1,343,214 Samples (+16,867)
5,366,273 Coding mutations (+530,287)
25,501 Papers (+331)
18,845 Fusions (+36)
32,514 Whole genomes (+1,238)
1,180,789 Copy number variants (+0)
9,176,464 Gene expression variants (+428)
7,879,142 Differentially methylated CpGs (+0)

COSMIC v82 - 3rd August 2017

COSMIC v82 (August 2017) includes 4 new fully curated genes, a substantial curation update for SMAD4, 1 new fusion pair, 342 genomes from 11 new systematic screen papers, updates from ICGC release 24, and updated resistance mutation data; 1 new drug and 4 updated. We also launch the new COSMIC website featuring new styles and layout as well as an enhanced version of the Cancer Gene Census and additional website download options.

New COSMIC Website

The new COSMIC website has now been launched. We welcome your feedback; please email cosmic@sanger.ac.uk with any issues or suggestions for improvement.

The old websites have been updated to v82 and will continue as the legacy website and GRCh37 (archive) legacy website. These will be available until the next release in November 2017, but we do not plan to maintain them beyond that date. However, we will continue to provide our download files as both GRCh38 and GRCh37 versions for the foreseeable future.

New features include -

  • Updated styles and page layouts
  • New Cancer Gene Census pages (Hallmarks of cancer)
  • New 'Targeted Screen' filter on the gene page and cancer browser
  • Option to download filtered datasets from download page directly, avoiding SFTP (requires login)

Oracle download

For users who download the COSMIC Oracle database dumps, please note that we now only support Oracle 12c. This is because Oracle 11.2 is no longer supported by Oracle.

Data Updates

  1. New fully curated cancer genes (4);
    • KEAP1 - 2,665 samples, 149 mutations, 94 papers
    • DROSHA - 1,774 samples, 121 mutations, 83 papers
    • BTK - 1,810 samples, 55 mutations, 83 papers
    • EPAS1 - 964 samples, 72 mutations, 96 papers
  2. Substantial curation update (1);
    • SMAD4 - 13,198 samples, 710 mutations, 443 papers
  3. Curated Gene Fusions (1);
  4. Cancer Gene Census;
    • 49 genes removed. See section below for details.
    • New web pages integrate functional descriptions focused on Hallmarks of Cancer.
  5. Resistance Mutations Update;
  6. Whole Genome data
    • 11 Systematic Screen Papers; 342 genomes
    • ICGC release 24; 5 new studies, 1 updated

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.

Curated Genes

KEAP1

Kelch-like ECH-associated protein 1 (KEAP1) is a component of the Cullin 3-based E3 ubiquitin ligase complex and controls the stability and accumulation of NRF2 protein. When cells are exposed to oxidative damage, KEAP1 releases NRF2 which translocates into the nucleus where it specifically recognises an enhancer sequence known as Antioxidant Response Element (ARE) resulting in the activation of redox balancing genes. Several studies have reported somatic mutations of the interacting domain between KEAP1 and NRF2 leading to a permanent NRF2 activation. Somatic mutations of the KEAP1 gene are found in non-small cell lung cancer, hepatocellular carcinoma, endometrial cancer, melanoma and many other cancer types and have been associated with a poor outcome and resistance to chemotherapy. The mutations are generally widely distributed in the KEAP1 gene and the frequency of mutations depends on the cancer type and origin.

DROSHA

MicroRNAs (miRNAs) are vital regulators of gene expression. Together with its co-factor DGCR8, the miRNA processing gene DROSHA (drosha ribonuclease III) is involved in the early stages of miRNA processing and is essential for the biogenesis of most miRNAs. Low DROSHA expression levels are observed in several cancer types, including neuroblastoma, endometrial and ovarian cancer, and are associated with advanced stages of several cancer types. In contrast, copy number increases (seen in advanced cervical squamous cell carcinoma) and over-expression are observed in other cancer types, including serous ovarian carcinoma, gastric and non-small cell lung cancers, often associated with prognosis or progression. DROSHA is frequently mutated in Wilms tumour, with the majority of mutations found in the RNase IIIb domain, at p.E1147. The recurrent mutation p.E1147K affects miRNA processing via a dominant-negative mechanism resulting in downregulation of miRNAs.

BTK

BTK encodes Bruton tyrosine kinase, a TEC family cytoplasmic tyrosine kinase required for the development, activation and differentiation of B cells, and an early component of the B-cell receptor signalling pathway. Recurrent mutations at BTK C481 have been identified in patients with chronic lymphocytic leukaemia (CLL) who have progressed after an initial response to ibrutinib treatment. Ibrutinib is a highly specific BTK inhibitor, inactivating by irreversible binding to C481 within the ATP-binding domain of BTK. While C481 mutations are most common among CLL patients who progress on ibrutinib, mutations at the non-kinase SH2 domain at T316 have also been reported. Progression of mantle cell lymphoma after a durable response to ibrutinib may also be due to C481 BTK mutation. This same mutation has also been detected in Waldenstrom macroglobulinaemia patients progressing on ibrutinib.

EPAS1

Hypoxia-inducible factors (HIFs) are transcription factors that respond to changes in tissue oxygen concentration. One of these, Hypoxia-inducible factor 2-alpha (HIF-2-alpha), is encoded by EPAS1. Somatic mutations in EPAS1 occur recurrently in sporadic pheochromocytomas and paragangliomas, as well as in somatostatinomas as part of Pacak-Zhuang syndrome (multiple paragangliomas and somatostatinomas associated with polycythaemia). In some patients with multiple tumours, these somatic EPAS1 mutations are mosaic, having arisen post-zygotically. The majority of somatic EPAS1 mutations are found in exon 12, and gain of function mutations in this region have been shown to cause stabilisation of the HIF2A protein, resulting in transcription of genes involved in the hypoxia response and promotion of angiogenesis and proliferation.

SMAD4 (updated)

The expertly curated data for SMAD4 have been updated. Over 40 publications which include screening of SMAD4, often alongside other genes, are included in this release. SMAD4 encodes a member of the Smad family of signal transduction proteins which plays a pivotal role in signal transduction of the transforming growth factor beta superfamily cytokines by mediating transcriptional activation of target genes. SMAD4, a tumour suppressor gene, is one of the major driver genes in pancreatic cancer. A lack of SMAD4 mutations in high-grade pancreatic intraepithelial neoplasia, the major precursor of pancreatic ductal adenocarcinoma, indicates these are late genetic alterations in pancreatic carcinoma. SMAD4 mutations are also found in colorectal carcinoma (CRC), where they have a prognostic role in metastatic CRC cases, and less frequently in other tumours, including lung cancer.

Curated Gene Fusions

SET-NUP214

The SET-NUP214 fusion results from a recurrent genetic abnormality at 9q34 and is found predominantly in T-cell acute lymphoblastic leukaemia (T-ALL), with a reported frequency of up to 10%. The fusion is rarely detected in acute myeloid leukaemia, acute undifferentiated leukaemia and B-cell acute lymphoblastic leukaemia. In T-ALL, the SET-NUP214 fusion is associated with elevated expression of HOXA cluster genes and with corticosteroid/chemotherapy resistance. SET encodes a protein with a critical role in chromatin binding and remodelling, while NUP214 encodes an FG-repeat-containing nucleoporin involved in the cell cycle and transportation of material between the nucleus and cytoplasm. Most commonly the breakpoints in the SET-NUP214 transcript are at exon 7 of SET and exon 18 of NUP214.

Cancer Gene Census

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the Cancer Gene Census. New Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is available for 226 census genes and will be expanded on a regular basis.

All Cancer Gene Census (CGC) genes have been re-evaluated and classified with regard to their function in cancer, as oncogenes or tumour suppressive genes, as well as genes participating in fusions, where it was applicable.

To be able to provide high-confidence and comprehensive data, we have divided the CGC into two tiers. Currently, only the Tier 1 genes are shown on the website and in the download files.

To classify into Tier 1 of the CGC, a gene must possess a documented activity that may drive or suppress cancer, and there must be evidence of mutations in this gene, detected in cancer, and changing the activity of the protein in a way that promotes the oncogenic transformation. We also take into account the existence of somatic mutation patterns in cancer samples, typical for tumour suppressor genes (broad range of inactivating mutations) or oncogenes (well defined hotspots of missense mutations).

Tier 2 of the Cancer Gene Census consists of genes with strong indications of roles in cancer but with less expansive available evidence, compared to Tier 1. It currently contains 41 genes from the previous release of the Cancer Gene Census and is being expanded, with a planned initial release of about 200 genes in November 2017, along with COSMIC v83.

The complete census list (Tier 1) is available here.

CGC Genes moved to Tier 2 of the Cancer Gene Census

1. PMS1 - PMS1, a component of DDR, only one recurrent frameshift mutation K164fs*6 in four samples, newer papers about MMR genes in cancer don't mention this gene, mice deficient in PMS1 do not develop tumours, no evidence for significant activity in MMR in vitro [PMID: 10542278]

2. Fusion genes with only one case (or rare partners of potent oncogenes known to be fused to multiple partners and able to drive the transformation on their own):

  • CGC Gene    Fusion partner

    C15orf65    CIITA
    COX6C       HMGA2
    FNBP1       KMT2A
    GMPS        KMT2A
    KIAA1598    ROS1
    NCKIPSD     KMT2A
    OMD         USP6
    PWWP2A      ROS1
    TFRC        BCL6
    THRAP3      USP6
    ZCCHC8      ROS1

3. Fusion genes transcribed with a shifted reading frame or untranscribed upon fusion, for which there is no sufficient evidence for tumour suppressing activity:

  •    ACSL6
  •    ALDH2
  •    HMGN2P46 (also a pseudogene)
  •    LHFP
  •    MDS2
  •    RALGDS
  •    RUNDC2A
  •    TFPT

4. Non-coding genes and pseudogenes do not fit to the current schema of Tier 1 of the Cancer Gene Census. We are working on better characterisation of the role of such genes in cancer. Temporarily they are classified as Tier 2 CGC genes:

  •    DUX4L1
  •    MALAT1

5. Genes known to be involved in cancer only through fusions, where the oncogenic mechanisms depend on disruption of the structure of their fusion partner and there is no evidence of their other cancer-promoting activity so far:

  • CGC Gene    Fusion partner

    AKAP9       BRAF
    CEP89       BRAF
    ELN         PAX5
    FAM131B     BRAF
    KIAA1549    BRAF
    LSM14A      BRAF
    SRGAP3      BRAF

6. Genes known to be involved in cancer only through fusions, for which there is not enough data describing their participation in oncogenic transformation

  • CGC Gene    Fusion partner    Comment
       
    CHIC2 ETV6 CHIC2 is also deleted in process of FIP1L1 to PDGFRA fusion generation,
        but no function of CHIC2 suggesting tumour suppressive, or other cancer-related properties, has been described
       
    CLP1 MLLT10 This fusion is generated through an insertion and coexists with MLLT10-KMT2A
       
    JAZF1 SUZ12 fusion protein is antiapoptotic, but identical to the fusion product arising from physiologically regulated trans-splicing,
        it is possible that the presence of fusion transcript in cancer cells is a result rather than a cause of transformation
       
    LCP1 BCL6 LCP1 is involved in invasion and metastasis and is a biomarker of renal cancer. Lai et al.
        classify the LCP1-BCL6 fusion as a secondary structural change that increases invasive capacity
        of the developed NHL [PMID: 9614913]. More evidence is needed before classifying this gene as a Tier 1 CGC gene
    MNX1 ETV6 Beverloo et al. suggest that the transformation mechanism may be fully dependent on disruption of ETV6 structure
        and function [PMID: 11454678]; the involvement of MNX1 in oncogenesis remains to be confirmed
       
    NACA BCL6 There is evidence of antiapoptotic activity of NACA and its involvement in regulation of haematopoiesis;
        it is a rare fusion partner of BCL6 in NHL. More evidence is needed before classifying this gene as a Tier 1 CGC gene
       
    SEPT5, SEPT6, SEPT9 KMT2A Septins form a distinct class of MLL fusion partners, but these fusions are rare;
        there is no data on septins function in cancer and they have no recurrent somatic mutations in cancers
       
    SPECC1 PDGFRB in fusion with PDGFRB in 1 sample, has a recurrent somatic N303fs mutation in colon adenocarcinoma
        and in other cancers, may be a TSG, but the functional data is still missing
       
    VTI1A TCF7L2 The mechanism of oncogenesis in the case of the VTI1A-TCF7L2 fusion is TCF7L2-dependent
        and the role of VTI1A is not determined [PMID: 21892161]

Genes removed from the Cancer Gene Census (Tier 1 and 2)

  • Gene  Reason for removal
      
    BCL5 obsolete name of BCL6, currently the name of a phenotype, all data in COSMIC refers to BCL6
      
    C12orf9 withdrawn from all gene databases translocation non-coding RNA sequence,
      partner of LPP in the LPP-LRFT fusion transcript. It has been shown that truncated LPP is oncogenic on its own
      
    CDKN2A(p14) CDKN2A and CDKN2A(p14) have been merged, as it is a single gene coding both p14 and p16 TSGs
      
    PCSK7 PAFAH1B2 (a CGC gene) and not PCSK7, located in the same unstable genomic region,
      is the gene disrupted by the translocation [PMID:10362256], only one paper describes the fusion between IGH and 3'UTR of PCSK7
      
    RANBP17 only in one ALL case TRD was fused to exon 24 of RANBP17 [PMID:12399963], however TSS of TLX3,
      a CGC gene implicated in ALL, is located just 10kb downstream of RANBP17
      
    RNF217-AS1 oncogenic properties may not arise from a fusion protein but rather from the disruption
      or the transcriptional deregulation of the STL/RNF217 locus
      
    TCL6 in fusion with TRA, lncRNA located in a breakpoint region;
      closest coding genes are TCL1B and TCL1A - in CGC - oncogenes involved in T-cell leukemias
      
    TTL it is an obsolete name of LINC00598 non-coding RNA, involved in fusion with ETV6 in ALL,
      TTL is a current name of a non-cancer gene

In total, the following 49 genes have been removed from Tier 1 of the CGC:

NCKIPSD, AKAP9, ALDH2, BCL5, C12orf9, HMGN2P46, CDKN2A(p14), CHIC2, COX6C, DUX4L1, ELN, ACSL6, C15orf65, FNBP1, GMPS, SPECC1, CLP1, MNX1, JAZF1, KIAA1549, LCP1, LHFP, MDS2, SEPT9, NACA, OMD, PCSK7, PMS1, SEPT5, RALGDS, RANBP17, RUNDC2A, SEPT6, SRGAP3, RNF217-AS1, TCL6, TFPT, TFRC, THRAP3, TTL, VTI1A, ECT2L, MALAT1, PWWP2A, ZCCHC8, KIAA1598, CEP89, LSM14A, FAM131B

ICGC release 24

New ICGC Studies:

COSU647 LIAD-FR, COSU672 GBM-CN, COSU675 PRAD-FR, COSU676 THCA-CN, COSU677 UTCA-FR

New Copy number data:

COSU535 ESAD-UK

Systematic Screen Papers

Follow links below to the 11 papers which are new in v82, or view the full table of papers here.

COSP39924, COSP40964, COSP41366, COSP42109, COSP42647, COSP42872, COSP42908, COSP42955, COSP43068, COSP43148, COSP43326

COSMIC Statistics:

1,326,347
Samples (+24,554)
4,835,986
Coding mutations (+359,460)
25,170
Papers (+327)
18,809
Fusions (+46)
31,276
Whole genomes (+1,471)
1,180,789
Copy number variants (+12,732)
9,176,036
Gene expression variants (+574)
7,879,142
Differentially methylated CpGs (+0)

COSMIC v81 - 9th May 2017

COSMIC v81 (May 2017) includes 6 new fully curated genes, a substantial curation update for TET2, 1 new fusion pair, 220 genomes from 9 new systematic screen papers and updated resistance mutation data; 1 new drug and 5 updated. We also announce the launch of a new COSMIC beta site featuring new styles and layout as well as an enhanced version of the Cancer Gene Census and additional website download options.


COSMIC Beta Site

The new COSMIC Beta site http://cancer-beta.sanger.ac.uk has now been launched. This site will be updated regularly over the next 3 months. We welcome your feedback; please email cosmic@sanger.ac.uk with any issues or suggestions for improvement.

New features include -

  • Updated styles and page layouts
  • New Cancer Gene Census pages (Hallmarks of cancer)
  • New 'Targeted Screen' filter on the gene page and cancer browser
  • Option to download filtered datasets from download page directly, avoiding SFTP (requires login)

Oracle download

For users who download the COSMIC Oracle database dumps, please note that from v82 we will only support Oracle 12c. This is because Oracle 11.2 is no longer supported by Oracle.


Data Updates

  1. New fully curated cancer genes (6);
    • DDR2 - 3,879 samples, 278 mutations, 95 papers
    • SMAD2 - 3,531 samples, 217 mutations, 82 papers
    • SMAD3 - 2,836 samples, 212 mutations, 85 papers
    • PREX2 - 2,909 samples, 919 mutations, 138 papers
    • NCOR1 - 2,172 samples, 595 mutations, 129 papers
    • PPM1D - 912 samples, 156 mutations, 56 papers
  2. Substantial curation update (1);
    • TET2 - 17,144 samples (+2,027), 2,946 mutations (+277), 606 papers (+46)
  3. Curated Gene Fusions (1);
    • SET-NUP214 - 473 samples, 41 mutations, 14 papers
  4. Cancer Gene Census;
    • New web pages integrate functional descriptions focused on Hallmarks of Cancer.
  5. Resistance Mutations Update;
    • Imatinib: 20 new samples, 3 new unique resistant mutations
    • Tyrosine kinase inhibitor-NS: 30 new samples
    • Gefitinib: 1 new sample
    • Crizotinib: 1 new gene (MET), 3 new samples, 5 new unique resistant mutations
    • Nilotinib: 1 new gene (KIT), 1 new sample, 1 new unique resistant mutation
    • New drug:
      • Savolitinib: 1 gene (MET), 1 sample, 1 unique resistant mutation
    • Change of drug name:
      • AZD9291: changed to Osimertinib
  6. Merging of duplicate mutations
    • Copy Number Variations have been merged where there were multiple data sources for the same sample ID. This has resulted in a slight drop in the overall number of CNV segments in v81.
  7. Genome Data
    • Re-loaded ICGC study COSU417 LUAD-US (307 samples with 130,235 novel mutations).
    • Cell Lines Project, copy number download; 4 samples were missing GRCh38 coordinates, these have now been included.
    • Systematic Screens; 9 new papers (220 new genomes)
    • SNVs and indels loaded from a Colorectal Cancer Organoids study from the suppresSTEM consortium.

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.

Curated Genes

DDR2

Oncogenic gain-of-function mutations in DDR2 have been identified in squamous cell carcinoma (SqCC) of the lung. DDR2 encodes the discoidin domain receptor 2, a collagen-stimulated receptor tyrosine kinase. These kinases are involved in the regulation of cell differentiation, cell migration and cell proliferation. DDR2 mutations are present in 4% of lung SqCC where they are associated with sensitivity to dasatinib. Low frequency DDR2 mutations have been found in other cancer types such as endometrial, kidney, brain, breast and colorectal, and in recurrent/metastatic head and neck SqCC.

SMAD2 and SMAD3

Mutations in SMAD2 and SMAD3 occur at very low frequency in various cancer types. SMAD2 mutations have been found in cervical and colorectal cancer, hepatocellular carcinoma and non-small cell lung cancer. SMAD3 mutations have been detected in colorectal cancer and in oral squamous cell carcinoma. Most of the mutations observed are missense mutations. Both SMAD2 and SMAD3 encode proteins which are major signalling molecules acting downstream of the serine/threonine kinase receptors.

NCOR1

NCOR1 (nuclear receptor corepressor 1) plays a part in maintenance of genomic integrity. It has been reported among the most frequently mutated drivers in breast cancer. Downregulation of NCOR1 expression abrogates HDAC3 function and results in genomic instability. Breast cancer patients with high NCOR1 expression levels have been found to have a better prognosis than those with low expression (Zhang et al., 2005). NCOR1 mutations also play a role in skin cancer, colorectal carcinoma and many other cancer types. Predicted damaging and somatic mutations in epigenetic regulators were detected in one third of high hyperdiploid acute lymphoblastic leukaemia (HD-ALL) patients (de Smith AJ 2016).

PPM1D

Protein phosphatase, Mg2+/Mn2+-dependent, 1D (PPM1D) encodes WIP1, a member of the PP2C family of serine/threonine protein phosphatases. PPM1D dephosphorylates DNA damage response mediators such as CHEK2 and p53, antagonising their function and promoting reentry into the cell cycle. Recurrent PPM1D mutations have been observed in brainstem gliomas, with many of these resulting in truncation of the C-terminal regulatory domain and leaving the phosphatase domain intact.

PREX2

Mutations across the PREX2 gene, including numerous truncating mutations, have been found in metastatic melanoma, including desmoplastic melanoma, and also in other cancers such as basal cell carcinoma, pancreatic ductal and lung adenocarcinomas, and Merkel cell carcinoma. PREX2 has been recognised as playing a role in melanoma for some years, although the precise nature of all the mechanisms of its involvement remains uncertain. Some in vivo and mouse studies have demonstrated that cancer-associated PREX2 mutations promote the growth of human melanoma cells. PREX2 is a GTP/GDP exchange factor, and both mutated and wild type PREX2 inhibit the tumour suppressor PTEN. However, PTEN can no longer inhibit mutated PREX2, so mutual inhibition is disrupted, promoting tumour growth via activation of the PI3K signalling pathway. Increased RAC-dependent invasiveness is also associated with mutated PREX2.

TET2 (updated)

TET2 (ten-eleven-translocation gene) is an epigenetic regulator responsible for converting DNA cytosine methylation to hydroxymethylation, a process disrupted by mutations which are known to be associated with myeloproliferative neoplasms (MPN), leukaemias and mastocytosis. An update of 46 publications which included screening of TET2, often alongside other genes or gene panels, has been carried out. Overall 2,027 new samples were curated, identifying 277 new mutations of all types located across the gene. Publications included reports of many haematopoietic and lymphoid disorders, as well as 2 where solid cancers progressed following hormone or tyrosine kinase therapy. One of these publications reported TET2 mutations associated with metastatic prostate cancer after hormone therapy and the second reported 12% TET2 mutated samples in non-small cell lung cancer progressions following tyrosine kinase therapy. MPN publications curated include those where TET2 was found associated with progression; chronic myelomonocytic leukaemia, where mutated TET2 was predictive of inferior prognosis when co-occurring with ASXL1 mutation; and myelodysplastic syndrome (MDS) and chronic eosinophilic leukaemia (CEL), including a report where mutated TET2 could help distinguish MDS/CEL from reactive disorders and hypereosinophilic syndrome respectively. Leukaemia publications include HTLV-1 associated adult T cell leukaemia/lymphoma (with TET2 as the most commonly mutated gene); angioimmunoblastic T cell leukaemia and peripheral T cell leukaemia, where TET2 mutations are associated with shorter PFS; and somatic TET2 mutation associated with AML in a family with familial platelet disorder.

Curated Gene Fusions

SET-NUP214

The SET-NUP214 fusion results from a recurrent genetic abnormality at 9q34 and is found predominantly in T-cell acute lymphoblastic leukaemia (T-ALL), with a reported frequency of up to 10%. The fusion is rarely detected in acute myeloid leukaemia, acute undifferentiated leukaemia and B-cell acute lymphoblastic leukaemia. In T-ALL, the SET-NUP214 fusion is associated with elevated expression of HOXA cluster genes and with corticosteroid/chemotherapy resistance. SET encodes a protein with a critical role in chromatin binding and remodelling, while NUP214 encodes an FG-repeat-containing nucleoporin involved in the cell cycle and transportation of material between the nucleus and cytoplasm. Most commonly the breakpoints in the SET-NUP214 transcript are at exon 7 of SET and exon 18 of NUP214.

Cancer Gene Census

Complete census list available here

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the Cancer Gene Census. New Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is initially available for 116 census genes and will be expanded on a regular basis.

Systematic Screen Papers

Follow links below to the 9 papers which are new in v81, or view the full table of papers here.

COSP43009, COSP42418, COSP41127, COSP42048, COSP42046, COSP42047, COSP42049, COSP42050, COSP41741

SNVs and indels have also been uploaded from a Colorectal Cancer Organoids study from the suppresSTEM consortium: COSU670


COSMIC Statistics:

1,301,793
Samples
4,476,526
Coding mutations
24,843
Papers
18,763
Fusions
29,805
Whole genomes
1,168,057
Copy number variants
9,175,462
Gene expression variants
7,879,142
Differentially methylated CpGs

COSMIC v80 - 13th February 2017

COSMIC v80 (Feb 2017) includes a major new tool "COSMIC-3D" supporting target characterisation and pharmaceutical design alongside significant updates to our cancer genome and key cancer gene curations.


COSMIC-3D - expanding COSMIC's support for pharmaceutical design

We have a new interface to explore cancer mutations on 3D protein structures, "COSMIC-3D", now available for public evaluation. Produced in partnership with Astex Pharmaceuticals (Cambridge, UK), it shows interactive 3D visualisations of over 8000 human proteins (using PDB structures), with COSMIC mutations mapped, and options to see frequency and effect. Putative small-molecule drug pockets are identified, and can be explored alongside cancer mutations to identify, characterise and design molecules against new targets across oncology. All the information is correct, but as this is a beta-evaluation release we would value your feedback on the web interface, so we can make it as useful as possible.

New Curations in v80

In our traditional way, full and exhaustive literature curations are now provided across cancer genes USP8, FAT1, FAT4, CXCR4 and fusion pair PML-RARA; substantial curation updates are made to AR and CTNNB1 and the Cancer Gene Census describes 7 new genes. Genome-wide molecular profiles have been curated from the ICGC (release 23, Oct 7th 2016) and 421 new genomes have been added by curation of 18 systematic screen publications. For full details of the new content in v80 please see the Datasheet.

From our Pipeline

We use recommendations from the HGVS for syntax when annotating the data within COSMIC. As part of our ongoing commitment to data quality we are currently in the process of ensuring all our mutation data are described in the most modern ways, including the latest HGVS nomenclature and gene structures. Over the last 6 months we have been working on a new system to continually annotate COSMIC data to the latest standards. Of course, to ensure the new annotations are exactly correct, we are including expert manual oversight, so it takes a little time to completely validate our huge dataset. Once we have verified the precision of our system, it will be deployed in forthcoming releases.

Newsletter

For more information about release v80 and other news please see the first issue of our Newsletter. We will be using this to communicate with you more frequently about the project and the exciting developments we have in the pipeline. This issue includes details about the COSMIC Workshop on March 6th and the beta release of COSMIC-3D.


COSMIC v79 - 14th November 2016

COSMIC v79 (Nov 2016) includes substantial updates to our cancer genome and key cancer gene curations. Full literature curations are now provided across cancer genes PRKACA and AR, and fusion pair CBFA2T3-GLIS2; substantial curation updates are made, especially to GNAS, GNAQ, and GNA11, and the Cancer Gene Census describes 7 new genes. Genome-wide molecular profiles have been curated from the ICGC (release 22, Aug 2016) and 265 new genomes have been added by curation of 9 systematic screen publications. A new drug, Vismodegib, has been added to our Genetics of Drug Resistance, describing 19 therapy-resistance variants in the gene SMO.


Data Updates in brief (for full details of this latest release, please see the v79 Datasheet).

  1. New fully curated cancer genes;
    • PRKACA - 2,145 samples, 286 mutations, 55 papers
    • AR - 3,226 samples, 598 mutations, 141 papers
  2. Substantial updates to GNAS, GNAQ, and GNA11
  3. Curated Gene Fusions;
  4. Cancer Gene Census;
    • 7 new genes added
  5. Drug Resistance; 1 new gene (SMO) and 1 new drug (Vismodegib), 19 new unique resistance mutations curated.
  6. Genome Data
    • ICGC release 22; August 23rd 2016
      • Mutations; 2 new studies
      • Copy Number Variants; 2 new studies, 3 studies updated
      • Structural Variants; 1 new study
    • Systematic Screens; 9 new papers (265 new genomes)

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


Genetics of Drug Resistance

We now include drug resistance data for the gene SMO (Vismodegib) as well as updates for EGFR (Gefitinib, Erlotinib and Afatinib), ESR1 (Endocrine therapy) and ALK (Alectinib).

All drug resistance data is detailed here, describing our curations across 11 genes and 21 pharmaceuticals. Links are provided to explore this information in detail, with charts showing the landscape of resistance to drugs targeting mutations in the gene of interest.


Cancer Gene Census

7 genes have been added to the Cancer Gene Census:
EPAS1, PTPRT, PPM1D, BTK, PREX2, TP63, QKI

The complete list is available in the census table, which describes the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 244 census genes. This content, as well as additional functional annotation is being substantially expanded for future releases.


COSMIC and St Jude Children's Research Hospital data-sharing agreement

COSMIC data have been combined with the ProteinPaint data mining and visualization system at St. Jude Children's Research Hospital in Memphis TN, to support the discovery and understanding of genetic mutations in paediatric cancers.


COSMIC Workshop March 2017

On Monday 6th March 2017 we are holding a workshop titled 'An introduction to COSMIC' at the Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

The course will begin with a presentation overviewing the COSMIC project, followed by a hands-on tutorial introducing the COSMIC website and strategies for exploring cancer variation data and investigating the genetic causes of human cancers. In addition, there will be short presentations describing exciting new developments scheduled for future release, and opportunities to engage the team in a group Q&A session and informal discussions about the COSMIC website and future plans.

Registration will open in January, but please email cosmic@sanger.ac.uk if you would like more information or wish to express an interest in attending.

If you would be interested in hosting a COSMIC workshop at your workplace, we would be very pleased to hear from you. Please contact the COSMIC helpdesk (cosmic@sanger.ac.uk)


Website Developments

We are planning to merge the functionality of the COSMIC and Whole Genomes websites in February 2017 (v80). We will be introducing a new 'whole genomes' filter on the gene and cancer browser pages, and as a consequence the Whole Genomes site will become redundant and will be retired.

An API and new web interfaces for downloading COSMIC data will also be developed and rolled out in 2017. As part of these developments, and due to incompatibility between BioMart (0.7) and the latest version of our Oracle databases, we are discontinuing support for the COSMICMart in this release.

If you have any questions about these changes please email the COSMIC helpdesk (cosmic@sanger.ac.uk).


COSMIC Statistics:

1,257,487
Samples
4,175,878
Coding Mutations
23,870
Papers
18,165
Fusions
29,112
Whole Genomes
2,113,866
Copy Number Variants
9,175,462
Gene Expression Variants
7,879,142
Differentially Methylated CpGs

COSMIC v78 - 5th September 2016

COSMIC has been updated significantly in v78 (Sept 2016). This major data release includes new full literature curations of cancer genes HIF1A, MTOR and PTPN13, drug resistance profiles across Sorafenib & Quizartinib, and a complete update of genome-wide analysis from the ICGC (release 21, May 2016). We have also added 9 new genes to the Cancer Gene Census, and fully re-analysed the copy number data across all TCGA samples using the ASCAT2 algorithm.


Data Updates in brief (for full details of this latest release, please see the v78 Datasheet).

  1. New fully curated cancer genes;
    • HIF1A - 1,782 samples, 196 mutations, 56 papers
    • MTOR - 3,239 samples, 634 mutations, 132 papers
    • PTPN13 - 1,761 samples, 429 mutations, 85 papers
  2. Curated Fusion;
    • ETV6-RUNX1 - 2,276 samples, 357 mutations, 37 papers
  3. Mouse insertional mutagenesis
    • The latest update adds mouse data for an additional 5,600 genes
  4. Genetics of Drug Resistance
    • Resistance data for 1 new therapeutic target gene and 2 additional drugs
  5. Genome Data
    • Copy Number Variation; Re-analysis of all TCGA studies with ASCAT v2
    • ICGC release 21; May 16th 2016
      • Gene Expression; 5 studies updated
      • Methylation; 3 studies updated
      • Copy Number Variants; 10 new studies, 28 studies updated
      • Structural Variants; 2 new studies
    • Systematic Screens; 31 new papers (1,165 new samples)

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


Genetics of Drug Resistance

New in v78: FLT3 with the drugs Quizartinib and Sorafenib, detailing a total of 76 new unique resistance mutations.

All drug resistance data is now detailed here, describing our curations across 10 genes and 20 pharmaceuticals. Links are provided to explore this information in detail, with charts showing the landscape of resistance to drugs targeting mutations in the gene of interest.


Cancer Gene Census

9 genes have been added to the Cancer Gene Census:
DDR2, MAPK1, BCORL1, KEAP1, LRP1B, DROSHA, B2M, DDX3X, APOBEC3B

The complete list is available in the census table, which describes the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 237 census genes. This content, as well as additional functional annotation is being substantially expanded for future releases.


Cell Lines Project

Over time we have added filters aimed at selecting those variants within the cell lines that are more likely to contribute to carcinogenesis. These have included the ability to select variants in genes known to contribute to cancer (Cancer Gene Census), as well as an estimation of the mutation impact on the protein as determined by FATHMM. We have now extended this list to include a filter that identifies variants within the cell lines that are similar to variants seen recurrently in whole genome screened tumour samples. The criteria for calling a variant recurrent differ based on mutation type. For further details please see the Genome Annotation page.


COSMIC Expansion

We welcome two new starters to the COSMIC team, Dr. John Tate and Ms. Bhavana Harsha. John is our new web design and visualisation specialist who will be driving new developments and improving the design of the website. Bhavana, our new bioinformatic specialist, is developing a new annotation system to handle the ever increasing volume and complexity of genomic variation data.

Thank you for your continued support.


COSMIC Workshop

On Monday 26th September 2016 we are holding a workshop at the University of Cambridge, UK, titled 'COSMIC: Exploring cancer genetics at high resolution'.

During the course we will use the live COSMIC website and genome browser to show you how to access and explore cancer variation data, seeking to identify genetic causes and targets in all human cancers.

For more details please see the course timetable.

If you wish to attend the workshop, please visit the registration page.

If you would be interested in hosting a COSMIC workshop at your workplace, we would be very pleased to hear from you. Please contact cosmic@sanger.ac.uk


Oracle database downloads

We are considering changing the compatibility of the Oracle data pump export files from supporting Oracle 10g to 11g (11.2). If this change will cause problems for you, please let us know by emailing the COSMIC helpdesk (cosmic@sanger.ac.uk).


COSMIC Statistics:

1,235,846
Samples
4,067,689
Coding Mutations
23,489
Papers
18,029
Fusions
28,366
Whole Genomes
1,271,436
Copy Number Variants
9,175,462
Gene Expression Variants
7,879,142
Differentially Methylated CpGs

COSMIC v77 - 16th May 2016

COSMIC now encompasses the Genetics of Drug Resistance across 9 therapeutic target genes and 18 drugs (release v77). This release also adds full mutation profiles across ATR, TBX3 and NFKBIE, the STIL-TAL1 and DNAJB1-PRKACA gene fusions, and over 700 new cancer genomes.


Data Updates in brief (for full details of this latest release, please see the v77 Datasheet).

  1. New fully curated cancer genes;
    • TBX3 - 55 papers, 2,724 samples, 241 mutations.
    • NFKBIE - 34 papers, 1,710 samples, 176 mutations.
    • ATR - 106 papers, 3,379 samples, 646 mutations.
  2. Curated Fusions;
  3. Genome Data
    • 16 New Systematic Screen papers (741 genome-wide tumour analyses).
  4. Genetics of Drug Resistance
    • New Content: Drug resistance data for 9 therapeutic target genes and 18 drugs
  5. Cancer Gene Census
    • 23 new cancer-causing genes annotated, with evidence.

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


Genetics of Drug Resistance

In this COSMIC release, we now encompass the genetics of drug resistance, somatic mutations that allow a tumour to continue growing despite targeted therapeutics. Initial curations cover 9 genes and 18 pharmaceutical therapies (listed below), detailing 226 resistance-driving mutations.

Genes: ABL1, ALK, BRAF, EGFR, ESR1, KIT, MAP2K1, MAP2K2, PDGFRA
Drugs: Vemurafenib, AZD9291, Ceritinib, Erlotinib, Gefitinib, Imatinib, Nilotinib, Tyrosine kinase inhibitor - NS, Afatinib, Endocrine therapy, Alectinib, PD0325901, Dasatinib, Crizotinib, Selumetinib, Sunitinib, Dabrafenib, Bosutinib

This information is available in the 'Drug Resistance' tab of the gene analysis pages; where a table and charts show the landscape of resistance to drugs targeting mutations in the gene of interest. For example, please look at the Tyrosine Kinase Inhibitors associated with EGFR.


Cancer Gene Census

23 genes have been added to the Cancer Gene Census:
AR, CHD4, CTCF, CXCR4, ERBB4, FAT1, FAT4, HIF1A, LEF1, LZTR1, MTOR, NCOR2, PRKACA, PTK6, PTPN13, RBM10, SDHA, SMAD2, SMAD3, TGFBR2, USP8, ZFHX3

New information is added to the census table, describing the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 156 census genes. This content, as well as additional functional annotation is being substantially expanded for future releases.


COSMIC Expansion

This spring, we welcome three new additions to the COSMIC team. Dr Laura Ponting and Dr Raymund Stefancsik join us from Cambridge University (UK) as curator scientists. They are now enhancing our team of expert manual curators, aiming to comprehensively describe the range of cancer-causing mutations across all cancer genes (driven by the Cancer Gene Census, describing 595 genes).

In addition, Charalambos (Harry) Boutselakis joins us from London's Farr Institute, bringing substantial informatic expertise across databases and data analytics. He will be expanding the ways in which COSMIC can be used while ensuring its immediate responsiveness as the database increases in size and scope.

Thank you for continuing to support us.


Registration and email announcements

Please ensure you are registered (here) for data downloads, and to ensure you receive future communications.


COSMIC Statistics:

1,209,567
Samples
4,118,156
Coding Mutations
23,084
Papers
17,628
Fusions
25,875
Whole Genomes
1,064,039
Copy Number Variants
9,479,893
Gene Expression Variants
7,879,142
Differentially Methylated CpGs

COSMIC v76 - 16th February 2016

COSMIC v76 includes full curations across cancer genes PPP6C and SPOP, genomic content from 17 systematic screen publications, and a complete update from ICGC release 20. We welcome two new scientists to the COSMIC team who will be focused on identifying targets and biomarkers across the expanding COSMIC dataset. The streamlining of our website also continues, improving the layout and design of many large-data webpages, and we have improved our Download files to simplify frequency calculations across COSMIC datasets.


Data Updates

For full details of this latest release, please see the v76 Datasheet; in brief:

  1. Curated Genes;
    • PPP6C - 33 papers, 1023 samples, 164 mutations.
    • SPOP - 74 papers, 2230 samples, 224 mutations.
  2. Genome Data
    • ICGC release 20; November 27th 2015
      • 2 new studies; AML-US and BTCA-JP
      • Gene Expression; 17 studies updated (187 new samples, 227,023 new variants)
      • Structural Mutations; 2 studies updated (243 new samples, 18,369 new variants)
      • Copy Number Variants; 18 studies updated (1,188 new samples, 44,689 new variants)
    • 17 New Systematic Screen papers (238 new samples).

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


COSMIC Expansion

We welcome two new scientists who will be investigating the curated database and annotating the most interesting target and biomarker opportunities across this enormous database.

Dr. Sam Thompson is a medical statistician with expertise in clinical trials. In collaboration with Bayer Pharmaceuticals, she will be exploring correlations across the different types of variant annotation in COSMIC, aiming to systematically identify novel markers for disease.

Dr. Harry Jubb brings a proteomic perspective to COSMIC. Working together with Astex Pharmaceuticals, Harry will spend the next three years enhancing our visualisation of coding mutations, and investigating which mutated peptide domains are tractable for pharmaceutical design.

Thank you for your support, allowing us to enhance the utility of the curations in COSMIC.


Website Updates

We have extended the layout and design used on the Gene page to the Cancer Browser, Sample, Study, and Mutations pages. Tabulations showing variant annotations from multiple datatypes have been combined into a 'Variants' tab on these pages.

On the Overview tab of the Gene page, icons indicate whether the selected gene is part of a significant dataset. The census, classic and mouse icons indicate, respectively, a cancer census gene, an expert curated gene, and a gene with a significant role in oncogenesis as evidenced by mouse insertional mutagenesis experiments.

Substantial changes are made on the Genome Browser home page with a new smart search feature with the option to select any of the specific datasets; COSMIC, Whole Genomes or the Cell Lines Project.


Download Files

We have updated the structure of the mutation files in our Download site to simplify the calculation of mutation frequencies. Data has been separated according to the type of screening method used; targeted gene screen and whole genome screen. We have also enhanced the information available from the sample details file so that whole genome samples can be extracted for use in whole genome screen mutation frequency calculations. Please see our FAQ for details.
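As a rough illustration of why separating the data by screening method helps, the sketch below computes a mutation frequency for one gene within one screen type. It is only a minimal example under stated assumptions: the record layout and the field names (`sample_id`, `screen_type`, `gene`) are hypothetical and do not reflect the actual column names of the COSMIC download files, and for targeted screens a real calculation would also need to restrict the denominator to samples in which the gene was actually screened.

```python
def mutation_frequency(samples, mutations, gene, screen_type):
    """Fraction of samples of one screen type carrying a mutation in `gene`.

    `samples` and `mutations` are lists of dicts with hypothetical field
    names; they are NOT the real COSMIC download file columns.
    Returns None when no samples of the requested screen type exist.
    """
    # Denominator: every sample screened with the requested method.
    screened = {s["sample_id"] for s in samples if s["screen_type"] == screen_type}
    if not screened:
        return None
    # Numerator: distinct screened samples with at least one mutation in the
    # gene (a sample with several mutations is counted once).
    mutated = {
        m["sample_id"]
        for m in mutations
        if m["gene"] == gene and m["sample_id"] in screened
    }
    return len(mutated) / len(screened)


# Toy data, invented purely for illustration.
samples = [
    {"sample_id": "S1", "screen_type": "genome"},
    {"sample_id": "S2", "screen_type": "genome"},
    {"sample_id": "S3", "screen_type": "targeted"},
    {"sample_id": "S4", "screen_type": "genome"},
]
mutations = [
    {"sample_id": "S1", "gene": "TET2"},
    {"sample_id": "S1", "gene": "TET2"},  # second mutation, same sample
    {"sample_id": "S3", "gene": "TET2"},
    {"sample_id": "S4", "gene": "KRAS"},
]

genome_freq = mutation_frequency(samples, mutations, "TET2", "genome")
targeted_freq = mutation_frequency(samples, mutations, "TET2", "targeted")
```

Keeping the two screen types apart avoids mixing denominators: a whole genome sample counts towards every gene, whereas a targeted sample should only count towards the genes it was screened for.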


Registration and email announcements

We are changing the way we communicate release updates to COSMIC users. Please register to ensure you receive future communications.


COSMIC Statistics:

1,192,776
Samples
3,942,175
Coding Mutations
22,844
Papers
17,245
Fusions
25,133
Whole Genomes
1,064,039
Copy Number Variants
9,479,893
Gene Expression Variants
7,879,142
Differentially Methylated CpGs

COSMIC v75 - 24th November 2015

COSMIC v75 includes curations across GRIN2A, fusion pair TCF3-PBX1, and genomic data from 17 systematic screen publications. We are also beginning a reannotation of TCGA exome datasets using Sanger's Cancer Genome Project analysis pipeline to ensure consistency; four studies are included in this release, to be expanded across the next few releases. The Cancer Gene Census now has a dedicated curator, Dr. Zbyslaw Sondka, who will be focused on expanding the Census, enhancing the evidence underpinning it, and developing improved expert-curated detail describing each gene's impact in cancer. Finally, as we begin to streamline our ever-growing website, we have combined all information for each gene onto one page and simplified the layout and design to improve navigation.


Data Updates

For full details of this latest release, please see the v75 Datasheet; in brief:

  1. Curated Genes;
    • GRIN2A - 93 papers, 2004 samples, 667 mutations.
  2. Curated Fusions;
    • TCF3-PBX1 (E2A-PBX1) - 48 papers, 3416 samples, 296 mutations
  3. Cancer Gene Census; 4 names have been updated.
  4. Genome Data
    • 4 TCGA studies reanalysed by the WTSI Cancer Genome Project.
    • 17 new Systematic Screen papers (474 new samples).

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


The Cancer Gene Census

We welcome Dr. Zbyslaw Sondka to the COSMIC team. Working in collaboration with The Centre for Therapeutic Target Validation (CTTV) he will be curating the Cancer Gene Census; building the evidence behind existing genes as well as extending the census list.


Website Updates

Gene Pages upgrade and redesign

Overview information has been merged into the Gene Analysis page. This page also has a full-featured Genome Browser which responds to filters. The page layout has also been redesigned, with tabulations organised under a single 'Data' tab and studies and publications combined under the 'References' tab.

COSMIC Beacon

The GA4GH (Global Alliance for Genomics and Health) Beacon Project encourages international sites to share genetic data in the simplest of all technical contexts. The service is designed merely to accept a query of the form "Do you have any genomes with an 'A' at position 100,735 on chromosome 3?" (or similar data) and to respond with either "Yes" or "No".

The Beacon Network lists all the known beacons, including the newly released COSMIC Beacon.
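The question-and-answer shape of a beacon can be sketched in a few lines. This is only a toy model of the idea, assuming an in-memory set of (chromosome, position, allele) records; the function name `beacon_query` and the data are hypothetical, and it does not represent the GA4GH Beacon API or the COSMIC Beacon implementation.

```python
def beacon_query(known_alleles, chromosome, position, allele):
    """Answer "Yes" or "No" to: is this allele seen at this position?"""
    # The response deliberately reveals nothing beyond presence or absence.
    key = (str(chromosome), int(position), allele.upper())
    return "Yes" if key in known_alleles else "No"


# Hypothetical records, using the example position from the text above.
known_alleles = {("3", 100735, "A")}

print(beacon_query(known_alleles, 3, 100735, "A"))
print(beacon_query(known_alleles, 3, 100735, "C"))
```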

Cancer Genome Browser

A new miRNA track has been added across all browsers, with the data sourced from miRBase.


Registration and email announcements

We are changing the way we communicate website updates to COSMIC users. As from this release all our registered users will receive email notification of updates to the website. We would encourage all those who have subscribed to the mailing list cosmic-announce@sanger.ac.uk to register as communication via this list is being phased out. If you are registered but prefer not to receive emails you can opt out by logging in and going to the Account Settings page.

We have also introduced a new 'non-affiliated' category to allow users who do not belong to a recognised academic or corporate organisation to register for email updates using their personal email address.


COSMIC Statistics:

1,177,397
Samples
3,702,312
Coding Mutations
22,621
Papers
17,209
Fusions
23,223
Whole Genomes
1,019,350
Copy Number Variants
9,252,870
Gene Expression Variants
7,879,142
Differentially Methylated CpGs

COSMIC v74 - 8th September 2015

COSMIC v74 brings a new focus on curating blood cancer fusion genes, starting with BCR/ABL and KMT2A (MLL) fusions. We are also beginning to capture much greater clinical details on the samples we curate, now available for download. More traditionally, somatic mutations are curated from three new cancer genes, POLE, AXIN2 and KDM6A. Substantial new genomic data are included from 17 systematic screen publications, and a full update to the latest ICGC release (v19).


Data Updates

For full details of this latest release, please see the v74 Datasheet; in brief:

  1. Curated Genes; 3 new fully curated genes (POLE, AXIN2 and KDM6A).
  2. Curated Fusions; Representative curations of blood cancer fusions BCR/ABL and KMT2A (MLL).
  3. Cancer Gene Census; FLT4 has been added to the census.
  4. Genome Data Imports
    • Simple Somatic Mutations (SSM);
      • ICGC release 19 (2 new studies, 49 studies updated).
      • 17 new Systematic Screen papers.
    • Structural Variants; 2 new and 9 updated studies (ICGC release 19)
    • Copy Number Variants; 3 new and 13 updated studies (ICGC release 19)
    • Gene Expression Variants; 15 updated studies (ICGC release 19)

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


Website Updates

Non-coding variants

"Mutation Impact" scores (via FATHMM-MKL) are now available for non-coding variants. These values can be viewed on the NCV, Study and Sample overview pages, and the COSMIC Genome Browser (functionally significant variants are coloured blue). They are also included in the download files on the SFTP site. There are 422,212 functionally significant variants (scores ≥ 0.7). Please see the Mutation Impact section of Cancer Genome Annotation for help interpreting the scores.
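For readers working with the download files, the threshold quoted above can be applied as a simple filter. The sketch below assumes each non-coding variant is available as an (identifier, score) pair; that layout and the identifiers are hypothetical, and only the 0.7 cut-off comes from the text.

```python
SIGNIFICANCE_THRESHOLD = 0.7  # scores at or above this are treated as significant


def is_functionally_significant(score, threshold=SIGNIFICANCE_THRESHOLD):
    """True when a Mutation Impact score meets the significance cut-off."""
    return score >= threshold


def significant_variants(scored_variants, threshold=SIGNIFICANCE_THRESHOLD):
    """Keep only the (variant_id, score) pairs that meet the cut-off."""
    return [
        (variant_id, score)
        for variant_id, score in scored_variants
        if is_functionally_significant(score, threshold)
    ]


# Hypothetical identifiers and scores.
scored = [("ncv_1", 0.95), ("ncv_2", 0.70), ("ncv_3", 0.69), ("ncv_4", 0.10)]
print(significant_variants(scored))
```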

COSMIC Sample Features

We are now capturing substantially more clinical feature annotations on the samples we curate. Across 24 new columns we are capturing, where available, annotations such as therapeutic regimes and responses, mutation allele specification, tumour stage/grade/cytogenetics, patient age/ethnicity/gender. This full information is available via COSMIC Downloads, and is also displayed on the website on each individual Sample Overview page. For full details of these rich expanded clinical annotations, please see the 'Cosmic sample features' section (describing the CosmicSample.tsv.gz file) here.


COSMIC Statistics:

1,144,255
Samples
3,480,051
Coding Mutations
22,276
Papers
16,648
Fusions
22,690
Whole Genomes
1,018,171
Copy Number Variants
9,252,792
Gene Expression Variants
7,879,142
Differentially Methylated CpGs

COSMIC v73 - 24th June 2015

COSMIC v73 contains full expert curation across 9 cancer genes, 26 systematic screen publications and ICGC release 18. 'Mutation impact' filters across the website now estimate pathogenic functional consequences, based on the new FATHMM-MKL algorithm. Substantial new information is now present in the COSMIC Genome Browser: regulatory features from ENCODE are now available, particularly enhancing the utility of the differential methylation and non-coding variant data; human SNPs are now shown alongside COSMIC somatic mutations, and genome browsing is now navigable via our Cancer Browser.


Data Updates

Below is a summary of new data in v73; please see the v73 Datasheet for further description.

Expert Curations
  1. Curated Genes; 9 new fully curated genes (SPEN, IKBKB, MYOD1, KDM5C, ACVR1, ESR1, CDKN2C, COL2A1, and CACNA1D).
  2. Cancer Gene Census; 67 gene symbols have been updated to correspond with the approved symbols described in the HGNC database.
  3. Genome Data Imports
    • Simple Somatic Mutations (SSM);
      • ICGC release 18 (3 new studies).
      • 26 new Systematic Screen papers (805 new samples).
  4. Cell Lines Project; tissue/histology classifications updated in 267 cell lines.

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


Website Updates

Mutation Impact

We have upgraded our 'Mutation Impact' filters to use scores generated by a new version of FATHMM (FATHMM-MKL). See the v73 Datasheet for more information.

Genome Browser
  • 3 new tracks have been added:
    • ENCODE Regulatory Features.
    • dbSNP (build 142).
    • SNPs flagged by our 'noise reduction' filtering system which are excluded from the COSMIC website.
  • A new version of the Genome Browser (JBrowse) has been added to the Cancer Browser page to view data by disease.

COSMIC Statistics:

Samples
1,121,509
Coding Mutations
3,430,789
Papers
21,631
Fusions
10,894
Whole Genomes
21,076
Copy Number Variants
863,308
Gene Expression Variants
8,610,091
Differentially Methylated CpGs
7,882,461

COSMIC v72 - 31st March 2015

COSMIC v72 is our largest release ever, containing new annotations across 5466 cancer genomes and full literature curation across 22 new cancer genes and 28 fusion pairs; 26 genes have been added to the Cancer Gene Census. We provide our first integration of differential methylation data and many additional mutations, copy number aberrations and expression variants from recent ICGC & TCGA releases. All genomic events in COSMIC have been upgraded to GRCh38 (with a GRCh37 archive available). Finally, we present a new curated resource, to be regularly updated, describing the characterisation of 30 mutation signatures across human cancer.


Licensing

COSMIC is adopting a new licensing strategy for v72, to grow the scope of our literature curations, enhance the analytics available across our data, and support the capacity to sustain this ever-growing database into the future. Key changes are -

  1. Access to the COSMIC website will stay free for all users.
  2. For-profit organisations will be required to pay a fee to download COSMIC datasets.
  3. Download by academic and non-profit organisations will remain free.

All licensing payments are used to grow COSMIC, its coverage and analytic usefulness for oncology insight. We will also be inviting licensees to tell us which priorities we might best pursue, to ensure the direction of COSMIC best supports these industries' commercial oncology research. Please see our Licensing page for more details.


Data Updates

This v72 release is too large to describe here in detail. Here is a summary; please see the v72 Datasheet for further description.

Expert Curations
  • 22 new cancer genes with full literature curation.
  • 28 new fully curated fusion pairs.
  • 26 new genes added to Cancer Gene Census.

Our curations are generated by expert postdoctoral scientists, described here.

Cancer Genomes
  • Upgrade to genome version GRCh38 (GRCh37 archive website available at http://grch37-cancer.sanger.ac.uk)
  • Differential Methylation; now integrated across 12 new TCGA studies (4,377 samples, from ICGC release 18)
  • Copy Number Variants (CNV); ICGC release 18 (1 new study, BOCA-UK). Data added for 16 new samples in Cell Lines Project
  • Gene Expression Variants; added for all samples in Cell Lines Project
  • Simple Somatic Mutations (SSM);
    • ICGC release 17
    • Paediatric Cancer Genome Project; 7 studies (142 new samples).
    • 69 new Systematic Screen papers (841 new tumour samples).

COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.


Website Updates

Genome Version GRCh38 Upgrade

We have updated the genomic coordinates in COSMIC to GRCh38. However, we are also hosting a parallel website to display the data on the GRCh37 reference. This GRCh37 site will be maintained and updated throughout 2015 with any new source data where the original coordinates are on GRCh37. However, it will not be updated with any new data where the original coordinates are on GRCh38.

Mutation Signatures

Different mutational processes generate unique combinations of mutation types, termed "Mutational Signatures". Based on an analysis of 10,952 exomes and 1,048 whole-genomes across 40 distinct types of human cancer we have added a Mutation Signatures page on the website; a curated census of signatures providing the profiles of, and additional information about, known mutational signatures.

Integration of Differential Methylation data

We have integrated methylation data across the COSMIC website. The Gene Analysis page has been extended to show a methylation track on the mutation histogram and differential methylation counts in the tissue tab, and a new 'Methylation' tab has been added to display a table of variants. The Cancer Browser, Study and Sample Overview pages have also been updated to integrate methylation data. As the majority of methylation annotations are outside gene footprints, COSMIC's Genome Browser is the best way to explore this information.

Genome Browser

The COSMIC Genome Browser is a valuable tool for exploring COSMIC data in its genomic context. This browser can be used to explore the data in COSMIC, COSMIC genomes (WGS) and the COSMIC Cell Lines Project, on either the GRCh37 or GRCh38 reference sequence. It can also be used to view the data for an individual sample if selected. Please see the COSMIC Genome Browser homepage for more details.


COSMIC Statistics:

Samples
1,103,964
Coding Mutations
3,158,657
Papers
21,086
Fusions
10,890
Genomic Rearrangements
61,232
Whole Genomes
19,672
Copy Number Variants
842,651
Gene Expression Variants
8,228,797
Methylation Variants
7,652,950

COSMIC V71 - 4th Nov 2014

COSMIC v71 includes full literature curation of PTPRB, PLCG1, POT1 and STAG2, the addition of 25 new census genes and an update of gene expression and copy number data from ICGC release 17 (Sept 2014).

Census Genes

The Cancer Gene Census has been updated with 25 new genes, bringing the total of known cancer genes substantiated by the scientific literature to 547. The new genes are:

CUX1 COL2A1 PTPRB PLCG1 NAB2 STAT6 FOXO4 NFATC2 DCTN1 RSPO2 RSPO3 EIF3E PTPRK NRG1 HLA-A MYO5A PPFIBP1 ERC1 PWWP2A CLIP1 ZCCHC8 KIAA1598 LMNA CEP89 LSM14A


Cell Lines Project Update

We have added an additional 16 cell lines to the Cell Lines Project. The lines are:

SW403 MCAS UDSCC2 KYSE-30 MC-1010 CRO-AP3 BL-70 GEO LIM1215 Sarc9371 OUMS-23 HUH-6-clone5 NCI-H820 MCC13 MCC26 Set2


Website Updates

Mouse Data

We have included an initial integration of mouse insertional mutagenesis data for 851 COSMIC genes from the CCGD (Candidate Cancer Gene Database), adding supporting evidence for cancer driver genes. These data are integrated in the Gene Overview page; more details can be found here.

Mutation Matrix (Study/Publication Page)

A mutation matrix plot has been added to the Study Overview page, enabling the relationship between genes, point mutations, copy number gains/losses, over/under gene expression and samples to be investigated for a specific study or publication.

Genome Browser (Sample Overview Page)

For whole genome analysed samples the Sample Overview page now includes a Genome Browser (JBrowse), allowing all mutation types for a sample (including coding and non-coding mutations, and aberrant copy number and gene expression) to be viewed in genomic context with COSMIC and Ensembl gene annotations (GRCh37).

Tutorials

There is a new tutorial section in the help pages including 4 new tutorials demonstrating the Sample, Gene, Fusion and Cancer Browser pages.

Copy Number Variation (CNV) Data

We have added 7,148 new copy number variants from 8 new TCGA studies (source ICGC release 17, re-analysed with ASCAT).

Gene Expression

We have nearly doubled the gene expression data in COSMIC by adding data from 10 new studies from TCGA (source ICGC release 17). The platforms supported are: IlluminaHiSeq_RNASeqV2, IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeq, and IlluminaGA_RNASeq. Please note that as from this release we no longer show results from the array platforms AgilentG4502A_07_2 and AgilentG4502A_07_3. For more information please visit the gene expression help page.


Literature Curation

PTPRB and PLCG1

PTPRB, encoding a tyrosine phosphatase specific to the vascular endothelium that inhibits angiogenesis, has been identified as a tumour suppressor gene in angiosarcoma. Mutations were found in secondary tumours or those with MYC amplification, a biomarker of radiation-associated secondary angiosarcoma. PLCG1, encoding a tyrosine kinase signal transducer in the phosphoinositide pathway, also has recurrent, likely activating mutations in angiosarcoma. PLCG1 gain of function mutations have previously been identified in cutaneous T-cell lymphoma.

POT1

POT1 encodes a single-stranded telomere-binding component of the shelterin complex. It is the only shelterin that contains 2 N-terminal oligonucleotide/oligosaccharide-binding (OB) domains. Recurrent mutations in POT1 have been found in chronic lymphocytic leukaemia where they occur in the clinically aggressive subtype with wild-type IGHV@. The POT1 mutations are most often found in gene regions encoding the 2 OB folds.

STAG2

Stromal antigen 2 (STAG2) is a subunit of the cohesin complex and has a role in chromatid separation during cell division. Genetic disruption of this process can lead to aneuploidy in cancer. A number of tumour types have been found to harbour somatic mutations in STAG2, including bladder cancer, myeloid neoplasms and glioblastoma. The gene maps to the X-chromosome (Xq25) and is present as a single copy in males; in females the other X-chromosome is inactivated. Hence, complete genetic inactivation of STAG2 requires only a single mutational event. STAG2 has also been suggested to act as a tumour suppressor via other mechanisms distinct from its role in cohesion.

Systematic screens

We have added mutation data for 841 tumour samples from publications where genome-wide analyses have been used. More details can be found here.


COSMIC Statistics:

Genes
28,977
Samples
1,058,292
Coding Mutations
2,710,449
Papers
20,247
Unique Variants
2,139,424
Fusions
10,567
Genomic Rearrangements
61,232
Whole Genomes
15,047
Copy Number Variation
702,652
Gene Expression
118,886,698

IE Support

As from this release we no longer support Internet Explorer version 8. This allows us to facilitate and develop tools for the latest browsers and provide a richer user experience. We apologise for the inconvenience caused to IE 8 users.

COSMIC V70 - 14th Aug 2014

COSMIC v70 includes an initial integration of gene expression data from TCGA, full literature curation of CALR, CD79A and CD79B, 12 whole-genome sequencing publications, and extensive updates to point mutation and structural variant data from ICGC (release 16, May 2014) and TCGA.

Website Updates

Gene Expression

Gene expression level 3 data has been integrated into COSMIC from 10 publicly accessible TCGA studies. The platform codes currently used to produce the COSMIC gene expression values are: IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeqV2, AgilentG4502A_07_2, AgilentG4502A_07_3. COSMIC now includes gene expression alongside coding mutations and copy number aberrations on the cancer browser, sample overview, gene analysis and study/paper overview pages. We have also added a gene expression track to the histogram on the gene analysis page and the circos diagram on the sample overview page; more details can be found here.

Mutation Matrix

A mutation matrix has been added to the cancer browser, enabling the relationship between genes, point mutations, copy number gains/losses, over/under gene expression and samples to be investigated for a specific cancer.

The mutation matrix chart shows 20 x 175 boxes, with each box representing a gene-sample combination. Genes are ranked by the number of samples with variations (depending on the selected data type) and the samples are sorted using a clustering algorithm to group them in relation to the ranked genes; more details can be found here.
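The layout of such a matrix can be approximated as follows. This is a minimal sketch, not the website's implementation: genes are ranked by the number of affected samples as described above, but the clustering step is replaced by a simple stand-in (samples are ordered by their presence/absence pattern across the ranked genes), and the input format of (gene, sample) pairs is an assumption.

```python
from collections import defaultdict


def build_mutation_matrix(events, max_genes=20, max_samples=175):
    """Return (genes, samples, matrix) for a gene-by-sample presence grid."""
    samples_by_gene = defaultdict(set)
    for gene, sample in events:
        samples_by_gene[gene].add(sample)

    # Rank genes by number of affected samples; ties are broken alphabetically.
    genes = sorted(
        samples_by_gene, key=lambda g: (-len(samples_by_gene[g]), g)
    )[:max_genes]

    # Stand-in for clustering: samples hit in the top-ranked genes come first.
    def pattern(sample):
        return tuple(sample not in samples_by_gene[g] for g in genes)

    all_samples = {s for g in genes for s in samples_by_gene[g]}
    samples = sorted(all_samples, key=lambda s: (pattern(s), s))[:max_samples]

    # One row per gene, one box per sample: True where the gene is altered.
    matrix = [[s in samples_by_gene[g] for s in samples] for g in genes]
    return genes, samples, matrix


# Hypothetical events for illustration.
events = [
    ("TP53", "s1"), ("TP53", "s2"), ("TP53", "s3"),
    ("KRAS", "s2"), ("KRAS", "s4"),
    ("EGFR", "s4"),
]
genes, samples, matrix = build_mutation_matrix(events)
for gene, row in zip(genes, matrix):
    print(gene, "".join("#" if hit else "." for hit in row))
```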


Data Filtering

To improve the value of COSMIC data we have tried to identify the most significant high-value data within cancer genomes using the following filtering strategies -

Mutations

We have excluded data from any sample with over 15,000 mutations. In addition, we have flagged all known SNPs as defined by the 1000 genomes project, dbSNP and a panel of 378 normal (non-cancer) samples from Sanger CGP sequencing. Using this approach 812,136 mutations have been flagged. Although all data are included in our download files, we have excluded flagged mutations from the website.
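The two steps described above (dropping hypermutated samples, then flagging known SNPs) can be sketched as below. The field names, the `snp_flag` key and the representation of known SNPs as a set of variant identifiers are assumptions made for illustration; only the 15,000-mutation limit and the flag-then-hide behaviour come from the text.

```python
from collections import Counter

MAX_MUTATIONS_PER_SAMPLE = 15_000  # samples with more mutations are excluded


def filter_mutations(mutations, known_snps, max_per_sample=MAX_MUTATIONS_PER_SAMPLE):
    """Drop mutations from over-limit samples and flag known SNPs in the rest."""
    per_sample = Counter(m["sample_id"] for m in mutations)
    kept = []
    for m in mutations:
        if per_sample[m["sample_id"]] > max_per_sample:
            continue  # the whole sample is excluded
        # Flagged mutations stay in the data but are marked as known SNPs.
        kept.append({**m, "snp_flag": m["variant"] in known_snps})
    return kept


def website_view(filtered):
    """Mutations shown on the website: everything that is not flagged."""
    return [m for m in filtered if not m["snp_flag"]]


# Toy example with a limit of 2 so the exclusion is visible.
mutations = [
    {"sample_id": "A", "variant": "v1"},
    {"sample_id": "A", "variant": "v2"},
    {"sample_id": "A", "variant": "v3"},
    {"sample_id": "B", "variant": "v4"},
    {"sample_id": "B", "variant": "v5"},
]
filtered = filter_mutations(mutations, known_snps={"v5"}, max_per_sample=2)
print(len(filtered), len(website_view(filtered)))
```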

Copy Number Variation (CNV) Data

Although no CNV data has been excluded from the website, we have applied filtering so that by default only the most significant variants are shown. For these CNVs the minor allele and total copy number values are known and gain/loss has been defined using stringent criteria [ see the Copy Number Variants section in the help pages ]. However, at the head of every table showing CNVs there is an option to switch off the filter and view all the data.

Sample Overview Page

In order to make it easier to examine each sample, analysis filters have been introduced on the sample overview page. These filters allow you to specify that the mutations viewed should be likely pathogenic (as defined by FATHMM analysis), in the cancer census genes, or of a particular mutation type. In future releases, we will be developing further filters across these data to enhance their analysis.


Tutorials

We have started to upgrade our help pages and have introduced two new tutorials to help users navigate the COSMIC website. These first tutorials cover the components of the website [ Site Tour ] and searching COSMIC [ Search ].


Literature Curation

CALR

The recently identified oncogene calreticulin (CALR) is a multi-functional Ca2+-binding protein chaperone localised in the endoplasmic reticulum. CALR somatic mutations are now the second most prevalent mutation seen in patients with myeloproliferative neoplasms; mutations have been found in the majority of JAK2/MPL mutation-negative essential thrombocythaemia (ET) and primary myelofibrosis (PMF) patients, in addition to a small number of myelodysplastic patients (RARS, RARS-T, CMML and aCML). Almost all the reported mutations are insertion, deletion or complex mutations generating a +1 bp frameshift and an extended novel CALR C-terminal domain. CALR mutations appear to be associated with a more benign clinical course, younger age and male sex.

CD79A and CD79B

The Ig-alpha and Ig-beta proteins encoded by CD79A and CD79B are necessary for expression and function of the B-cell antigen receptor. Recurrent activating mutations in CD79A and CD79B have been identified in diffuse large B cell lymphoma where they occur more frequently in the activated B-cell-like subtype. The ITAM (immunoreceptor tyrosine-based activation motif) domain is targeted, with a hot spot at Y196 in CD79B. Mutations in both genes have also been found in Waldenström's macroglobulinaemia.

Systematic screens

In this release, 12 systematic screen publications have been curated in COSMIC; more details can be found here.


COSMIC Statistics:

Genes
28,735
Samples
1,029,547
Coding Mutations
2,002,811
Papers
19,703
Unique Variants
1,564,699
Fusions
10,435
Genomic Rearrangements
61,299
Whole Genomes
12,542
Copy Number Variation
695,504
Gene Expression
60,119,787

IE Support

We have decided to drop support for Internet Explorer version 8 from November 2014. This allows us to develop tools for the latest browsers and provide a richer user experience. We apologise in advance for the inconvenience caused to IE 8 users.