Release Notes

v98 - 23rd May 2023

Summary

v98 (May 2023): a focus on rare skin tumours, with 2 new census genes and updated hallmark annotations for 8 cancer genes. This release of COSMIC adds 410,000 new genomic variants, 585,000 new coding mutations, 290,000 non-coding mutations, 4,300 new samples and 1,358 new whole genomes. We have also curated 19 new systematic screen papers.

Our other products Cancer Mutation Census (CMC) and COSMIC-3D are also updated with the latest datasets.

 

Key Updates

  1. New focus curation on rare skin tumours.
  2. Gene-focused curation on the MUC6 gene.
  3. Cancer Gene Census: 1 new Tier 1 gene (NTRK2) and 1 new Tier 2 gene (ASPM) have been added.
  4. We have created cancer hallmark annotations for a further 8 Cancer Gene Census Tier 1 genes (SETBP1, SETD2, SH2B3, SMARCA4, SMARCB1, SMO, SOCS1, SPEN), so hallmarks of cancer annotations are now available for 346 Census Tier 1 cancer genes.
  5. 19 new systematic screen papers.
  6. Cancer Mutation Census data is also updated to align with the latest COSMIC data (v98).
  7. COSMIC-3D is also updated to align with the latest COSMIC data (v98), with 187 new genes mapped to PDB structures, 1496 more mapped PDB structures and 9 new census gene structures.
  8. New beta download files and webpage: the COSMIC download files have been revamped to increase interoperability and to reduce redundancy of the same data across different files. They use COSMIC identifiers consistently, follow new file naming conventions, ship with a readme for each file, and support 4 COSMIC release versions and several checksum types. Alongside this, we are supporting the legacy COSMIC downloads for a whole year (until May 2024).

 

 

Website Changes

In this release we have launched our beta downloads for COSMIC; they are available on this page: New downloads.

This is a beta version of the new COSMIC Downloads page. The new page and the download files available here have been redesigned to improve usability and accessibility. It is now possible to browse by project and download complete datasets for all available products and genome versions for the current and 3 previous releases – COSMIC Core, Cell Lines Project (CLP), Actionability, and Cancer Mutation Census (CMC).

A detailed technical document listing all the changes in the new download files, along with the ERD (Entity Relationship Diagram) explaining the links between different products and a list of all the COSMIC identifiers, is available in the change log.

To help transition to the new files we are supporting the legacy downloads for a year (i.e. for v98 and v99), but thereafter these downloads will be discontinued. Until then access to the legacy downloads is available from the “Archive Downloads” page.

We value your feedback on the new Download page and download files. Please help us as we work to improve the usability and accessibility of COSMIC data by sending your thoughts to cosmic@sanger.ac.uk

The current beta version only supports whole-file downloads. In future releases, we will extend the download functionality to support scripted and filtered downloads.

 

Download File Updates

The new download files are available at the new beta website: New downloads. The development work aimed to revamp the current COSMIC download files.

The key changes and benefits of the beta download files:

  • To address interoperability by making the files more interconnected through internal and external identifiers.
  • To reduce the redundancy of data across files: we have reduced the number of identical columns repeated across multiple files, such as the tissue classification, and replaced them with a central COSMIC phenotype identifier (COSO). This identifier can be linked to a classification file that contains more detailed information.
  • Some columns are renamed to better match the description of the content.
  • Consistent use of COSMIC identifiers: we have 10 COSMIC identifiers, namely COSMIC Phenotype Id (COSO), COSMIC Gene Id (COSG), COSMIC Sample Id (COSS), COSMIC Structural Id (COST), COSMIC CNV Id (COSCNV), COSMIC Fusion Id (COSF), Legacy Mutation Id (COSM/COSN), COSMIC Paper Id (COSP), COSMIC Study Id (COSU) and COSMIC Genomic Mutation Id (COSV). All these identifiers are linked in the files where applicable data is listed.
  • A tar file (tarball) has been created for each product. It contains a data file with all the contents related to the product and a readme file describing each of the columns in the data file. Each tarball has a standard naming convention.
    • Tar file naming (project name, product, format of the data, release version, assembly): [Project]_[Filename]_[format]_v[Release]_GRCh[assembly].tar.gz
    • Data file naming (project name, product, release version, assembly): [Project]_[Filename]_v[Release]_GRCh[assembly].[format].gz
    • Readme file naming (product, release version, assembly): README_[Project]_[Filename]_v[Release]_GRCh[assembly].txt
  • The newly formatted download files are available for the current release (v98) and also for v97; going forward this will be extended to the typical 4 release versions.
  • Support for different checksums - md5sum, sha1sum, sha256sum
  • File changes:
    • CosmicMutantExport – deprecated and now replaced by Cosmic_GenomeScreensMutant and Cosmic_CompleteTargetedScreensMutant (excluding negative data)
    • CosmicCodingMuts.vcf has been split into two files Cosmic_GenomeScreensMutant_v97_GRCh37.vcf and Cosmic_CompleteTargetedScreensMutant_v97_GRCh37.vcf.
    • CosmicHGNC has been replaced with Cosmic_Gene
  • A few products and projects have yet to be adapted to the new format: the CMC and Actionability projects and the sample features for COSMIC are still to be changed. All these products and projects are available via both the Beta download page and the legacy download page.
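
As a hedged illustration of the naming convention listed above, the sketch below fills in the three templates from their parts. The function name cosmic_file_names and its parameters are hypothetical and not part of any COSMIC tooling, and the capitalisation of the [format] field in the tarball name is an assumption (it is passed through unchanged here).

```python
def cosmic_file_names(project, filename, fmt, release, assembly):
    """Return the tarball, data-file and readme names for one product.

    Hypothetical helper: it only fills in the templates quoted in the
    release notes and does not check that the files exist.
    """
    stem = f"{project}_{filename}"
    version = f"v{release}_GRCh{assembly}"
    return {
        # [Project]_[Filename]_[format]_v[Release]_GRCh[assembly].tar.gz
        "tar": f"{stem}_{fmt}_{version}.tar.gz",
        # [Project]_[Filename]_v[Release]_GRCh[assembly].[format].gz
        "data": f"{stem}_{version}.{fmt}.gz",
        # README_[Project]_[Filename]_v[Release]_GRCh[assembly].txt
        "readme": f"README_{stem}_{version}.txt",
    }


names = cosmic_file_names("Cosmic", "GenomeScreensMutant", "vcf", 97, 37)
print(names["data"])  # Cosmic_GenomeScreensMutant_v97_GRCh37.vcf.gz
```

The example arguments are taken from the v97 GRCh37 VCF file name mentioned in the file changes above; the readme shipped with each tarball remains the authoritative description of its contents.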

A complete list of changes in all the files is available on the Beta download page.
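
The beta downloads support md5sum, sha1sum and sha256sum checksums. A minimal sketch of checking a downloaded file against a published checksum is shown below, using Python's standard hashlib module. The function names are hypothetical, and how the expected checksum is obtained (for example, copied from the download page) is an assumption left to the reader.

```python
import hashlib

# The three checksum types named in the release notes.
SUPPORTED_ALGORITHMS = ("md5", "sha1", "sha256")


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with an expected hex string."""
    return file_checksum(path, algorithm) == expected_hex.strip().lower()
```

A call such as verify_checksum("some_download.tar.gz", expected, "sha256") returns True only when the digest matches; the file name here is a placeholder.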

 

Curated Genes

MUC6

Mucin 6, oligomeric mucus/gel-forming (MUC6) is a member of the mucin family of high molecular weight glycoproteins produced by many epithelial tissues. The protein encoded by the MUC6 gene is secreted and forms an insoluble mucous barrier that protects the gut lumen. MUC6 was identified as defining a subset of head and neck (H&N) cancers, specifically the Schneiderian low grade papillary sinonasal carcinoma. MUC6 has been assessed for inclusion in the Cancer Gene Census and assigned a Tier-2 status (v99). The variant data for the gene has been curated from the literature over the last 3 releases. In addition to targeted sequencing, v98 releases 26 samples with exome sequencing data. MUC6 somatic mutations were detected in 9/15 triple negative breast cancers and 4/5 Wilms tumours. The mechanism of this gene in cancer is not clear.

 

Rare skin tumour focus

Rare skin tumour

Common skin cancers, such as basal cell carcinoma, squamous cell carcinoma and melanoma, are relatively overrepresented in the scientific literature. This reflects their frequency in the general population. For the v98 release of COSMIC, we sourced data about the rarer skin cancers to have them fairly represented in our database. We searched for publications about adnexal tumours, Merkel cell carcinoma, Kaposi sarcoma, dermatofibrosarcoma protuberans and extramammary Paget's disease. This is a mixed group of skin manifestations and not a comprehensive list of all tumour types that were included in these publications. For example, adnexal tumours (tumours of the sweat glands, hair follicles and sebaceous glands) alone have 50 different tumour types in our histology classification system. This number includes many non-cancerous conditions, such as sebaceous adenoma or cylindroma. Some of them have the potential to develop into malignant tumours, or need to be differentiated from malignant tumours for treatment purposes. All the tumour types in COSMIC derive from samples that have been found to have somatic mutations in them.

In v98 of COSMIC, 776 new samples were curated from the publications and 25,236 new variants were found in the newly included rare skin tumours. 17 new skin tumour types or subtypes were added to the histology classification system.

You can explore the variant data and the sample metadata using the COSMIC Cancer Browser on the website. Most skin tumours can be found under Tissue Type = skin. Dermatofibrosarcoma protuberans is classified under soft_tissue > fibrous_tissue_and_uncertain_origin and Kaposi_sarcoma under soft_tissue > blood_vessel. The full spectrum of data linked to all sub tissues and sub histologies can be found in our download files.

 

Cancer Gene Census (CGC)


New Census Genes (Tier 1):

NTRK2 (Neurotrophic Receptor Tyrosine Kinase 2)

A neurotrophin receptor tyrosine kinase involved in peripheral and CNS development and maturation.

Somatic alterations: 3’-end (including tyrosine kinase domain) in-frame fusions to multiple genes in glioma, HNSCC and lung adenocarcinoma.

Functional evidence: QKI-NTRK2 (found in astrocytoma) expression in CDKN2A-deficient mouse astrocytes leads to glioma formation following intracranial transplantation into mice. SPECC1L-NTRK2 (occurs in anaplastic astrocytoma) expression enables IL-3 independent Ba/F3 cell growth.


New Census Genes (Tier 2):

ASPM (Assembly Factor For Spindle Microtubules)

An assembly factor for spindle microtubules, involved in cell spindle regulation.

Somatic alterations: Recurrent amplification, and associated increased expression, in LN metastasis-positive, triple negative invasive ductal breast carcinomas, and cutaneous melanoma metastases.

Germline alterations: A rare, predicted pathogenic germline variant segregates with disease in a nevoid basal cell carcinoma syndrome family.

Functional evidence: Overexpression in a weakly invasive acral lentiginous melanoma cell line increases cell migration in vitro. Knockdown in glioblastoma cells and fibroblasts decreases the NHEJ repair of X-ray-induced DNA double-strand breaks, and increases chromosomal aberrations, respectively. Knockdown reduces HR-mediated DNA double-strand break repair (osteosarcoma), and increases X-ray-induced chromosomal aberrations (cervical carcinoma).


Hallmarks of Cancer

Hallmarks of cancer annotations summarise the effect of Cancer Gene Census Tier 1 genes on processes that are relevant to cancer development and progression.

COSMIC v98 includes Hallmark gene pages for a further 8 genes (SETBP1, SETD2, SH2B3, SMARCA4, SMARCB1, SMO, SOCS1, SPEN), and so hallmarks of cancer annotations are now available for 346 Census Tier 1 cancer genes.

In addition to hallmarks of cancer annotations, each Hallmark gene page summarises the role of a gene in cancer, how it is affected by somatic and germline alteration in cancer, and how it affects biological processes relevant to cancer. Likely pathogenic germline mutations are described for 4 of the 8 genes with a new Hallmark gene page.

 

Systematic Screen Papers

Follow links below to the 19 papers which are new in v98, or view the full table of papers here.

COSP51008, COSP47088, COSP50829, COSP49033, COSP50776, COSP49928, COSP50704, COSP50803, COSP50775, COSP50069, COSP46651, COSP50677, COSP44696, COSP50715, COSP50702, COSP45527, COSP50548, COSP41209, COSP46024

 

COSMIC Statistics

23,854,105
Total genomic variants (COSV) (+410,264)
16,304,701
Genomic non-coding variants (+289,190)
5,078,567
Genomic mutations within exons (coding variants) (+40,586)
8,878,333
Genomic mutations within intronic and other intragenic regions (+160,598)
1,520,321
Samples (+4,356)
29,024
Papers (+144)
19,428
Fusions (+0)
42,519
Whole genome screen samples (+1,358)
1,207,190
Copy number variants (+0)
9,215,470
Gene expression variants (+0)
7,930,489
Differentially methylated CpGs (+0)

 

Actionability v9 - May 2023

COSMIC Actionability v9 includes 5 additional fully-curated genes:
ESR1, CCND1, CCND2, CCND3 and RB1

This means we have a total of 99 fully curated genes:
ABL1, AKT1, AKT2, AKT3, ALK, AR, ASXL1, ATM, ATR, BCL2, BCR, BRAF, BRCA1, BRCA2, BTK, CCND1, CCND2, CCND3, CD274(PD-L1), CD33, CDK12, CDK4, CDK6, CDKN2A, CEBPA, CHEK2, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, ESR1, ETV6, EZH2, FBXW7, FGFR1, FGFR2, FGFR3, FGFR4, FLT3, FOXL2, GNA11, GNAS, GNAQ, HRAS, IDH1, IDH2, JAK1, JAK2, JAK3, KIT, KMT2A, KRAS, MAP2K1(MEK1), MAP2K2, MDM2, MDM4, MET, MLH, MPL, MSH2, MSH6, MTOR, MYD88, NF1, NF2, NOTCH1, NPM1, NRAS, NTRK1, NTRK2, NTRK3, PALB2, PDGFRA, PDGFRB, PIK3CA, PIK3CB, PMS2, PTCH1, PTEN, PTPN11, RAF1, RB1, RET, ROS1, RUNX1, SF3B1, SMAD4, SMO, STK11, SYK, TERT, TET2, TP53, TSC1, VHL, WT1

To view the full list of curated genes visit the About page on the Actionability website. All previously-recorded clinical trials have been checked for new or updated results.

 

Statistics

99
Genes fully curated (+5)
372
Genes included (+15)
1835
Drugs (+100)
4796
Treatment combinations (+566)
4610
Trials with results (+221)
5746
Trials with no results (+315)
10356
Total trials (+948)
7364
Evidence from clinical trials databases (+643)
3177
Evidence from PubMed and other (+322)
154
Point mutations (+0)
876
Total variants (+72)

 

COSMIC-3D

COSMIC-3D data has been updated for v98 release. These are the key updates:

  • COSMIC-3D has been updated to align the mutations with COSMIC release 98.
  • 187 new genes map to a PDB structure, bringing the total number of genes with structures to 7925.
  • 1496 new PDB structures are also added, increasing the total number of mapped protein structures (PDB ids) from 51,816 to 53,312
  • 9 Cancer Gene Census genes now have mapped structures (ASXL1, BAP1, EXT1, EXT2, GNAS, GPC3, NTRK2, RBM15, TFEB)

 


v97 - 29th November 2022

Summary

v97 (Nov 2022): a focus on blood cancer, with 4 census Tier 2 genes and updated hallmark annotations for 10 cancer genes, along with updated resistance data. This release of COSMIC adds 44,000 new genomic variants, 127,000 new coding mutations, 27,000 non-coding mutations, 6,000 new samples and 1,435 new whole genomes. We have also curated 20 new systematic screen papers.

 

Key Updates

  • New focus curation on Blood cancers
  • Cancer Gene Census: 3 new Tier 2 genes (GOLPH3, FADD, SUB1) have been added and, following a recent review, 1 gene (SMARCD1) has been moved from Tier 1 to Tier 2
  • We have created cancer hallmark annotations for 10 Cancer Gene Census Tier 1 genes (NFKBIE, NTRK3, PHF6, POLD1, POLE, PPP2R1A, PRDM1, PTCH1, RPL5, SALL4). By so doing, we are adding functional annotations for 10 genes causally associated with cancer, thereby providing an overview of how the genes contribute to tumour development with regard to the hallmarks of cancer.
  • 20 new systematic screen papers
  • Data for drug resistance is updated
  • Cancer Mutation Census data is also updated to align with the latest COSMIC data (v97), along with ClinVar and gnomAD datasets
  • COSMIC-3D is also updated to align with the latest COSMIC data (v97), with an assembly update, newly mapped census gene structures and around 1,100 more mapped protein structures.
  • Actionability data has been fully updated. Many new trials have been added, the number of trials with results available has substantially increased, several new mutations are represented and 11 new fully curated genes have been added.
  • Actionability and CMC downloads are free for non-commercial use; files are available on the download page.
  • New download file to map missing significant variants in the Non-Coding region

 

Download File Updates

Actionability and CMC downloads

Actionability and CMC downloads are free for non-commercial use; files are available on the COSMIC download page. Please refer to our licensing page here to understand whether you are a Non-Commercial or Commercial user and how to obtain a license.

New download file NCV CDS syntax mapping

Since the annotation system upgrade in v90, VEP is used to standardise and normalise all variant annotations: https://www.ensembl.org/info/docs/tools/vep/index.html

One unintended consequence of using VEP is that it outputs genomic-level (g.) annotations for many non-coding mutations in the 5' UTRs of genes, as well as for all mutations in intergenic regions. Sometimes these mutations are associated with a named gene and are known or predicted to be functionally significant, having well-known CDS (c.) annotations reported in the scientific literature (e.g. TERT promoter mutations). Previously, these CDS (c.) annotations were shown in COSMIC, but since the v90 upgrade these are overwritten by the standardised VEP genomic annotations and any link to the gene is lost in the case of promoter mutations.

In order to maintain a standardised dataset, we will continue to show the VEP genomic annotations for all mutations, but we have now produced a mapping file to allow the non-coding variant (NCV) genomic annotations to be linked back to the CDS syntaxes.

The new mapping file NCV_CDS_syntax_mapping.tsv released in v97 can be cross referenced with the CosmicNonCodingVariants.vcf.gz or CosmicNCV.tsv.gz download files to link CDS syntaxes with LEGACY_ID or COSV identifiers.
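
The cross-reference described above amounts to a join on a shared identifier. The sketch below shows one way this might be done with Python's standard csv module. The column names GENOMIC_MUTATION_ID and CDS_SYNTAX are assumptions made for illustration, as are the placeholder rows; the actual headers should be checked in the downloaded files.

```python
import csv
import io

# Hypothetical column names; check the headers of the real files.
ID_COLUMN = "GENOMIC_MUTATION_ID"
CDS_COLUMN = "CDS_SYNTAX"


def load_cds_mapping(handle):
    """Read the mapping file into a dict of identifier -> CDS syntax."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[ID_COLUMN]: row[CDS_COLUMN] for row in reader}


def add_cds_syntax(ncv_handle, mapping):
    """Yield each non-coding variant row with a CDS syntax column added."""
    for row in csv.DictReader(ncv_handle, delimiter="\t"):
        # Variants without a mapped CDS syntax get an empty string.
        row[CDS_COLUMN] = mapping.get(row[ID_COLUMN], "")
        yield row


# Placeholder rows for illustration only; these are not real COSMIC records.
mapping_tsv = "GENOMIC_MUTATION_ID\tCDS_SYNTAX\nCOSV_EXAMPLE_1\tc.-124C>T\n"
ncv_tsv = (
    "GENOMIC_MUTATION_ID\tHGVSG\n"
    "COSV_EXAMPLE_1\tg.example1\n"
    "COSV_EXAMPLE_2\tg.example2\n"
)

mapping = load_cds_mapping(io.StringIO(mapping_tsv))
annotated = list(add_cds_syntax(io.StringIO(ncv_tsv), mapping))
```

For the gzipped download files, a text handle from gzip.open(path, "rt") could be passed in place of the in-memory strings used here.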

Generally, on the website we focus on coding mutations, but non-coding variants are displayed on the Genome Browser and can also be viewed directly by searching for the COSN identifier, e.g. https://cancer.sanger.ac.uk/cosmic/search?q=COSN32285790

In v97, the new mapping file contains only TERT promoter mutations, but we plan to include non-coding mutation mapping for other genes in future releases.

This new file is available on the COSMIC download page.

 

Drug Resistance

NT5C2 - purine

Unique samples - 57, Unique Mutation - 81

FGFR2 - BGJ398

Unique samples - 9, Unique Mutation - 29

FGFR2 - pemigatinib

Unique samples - 8, Unique Mutation - 6

 

Blood Cancer

As part of release v97 we have focused on updating the expert-curated mutation data for blood tumours. Blood tumours in COSMIC are classified under haematopoietic and lymphoid tissue as haematopoietic neoplasms or lymphoid neoplasms, which include cancer types such as leukaemias, lymphomas and myelomas as well as myeloproliferative neoplasms. Seventy-six additional publications with mutation screening data in these tumour types are included in this release. The types of data range from whole genome studies and studies utilising large next generation sequencing panels to case reports with more unusual clinical details and novel treatments. Over 2,600 samples were curated and 24,356 new variants added from these samples. Release v97 also incorporates 9 new blood tumour types into COSMIC.

 

Cancer Gene Census (CGC)


New Census Genes (Tier 2):
GOLPH3, FADD, SUB1, SMARCD1

Hallmarks of Cancer

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed.

NFKBIE, NTRK3, PHF6, POLD1, POLE, PPP2R1A, PRDM1, PTCH1, RPL5, SALL4

 

Systematic Screen Papers

Follow links below to the 20 papers which are new in v97, or view the full table of papers here.

COSP50547, COSP42471, COSP40854, COSP45619, COSP50467, COSP41068, COSP50471, COSP50438, COSP43701, COSP40675, COSP38483, COSP41453, COSP35787, COSP50319, COSP40730, COSP41722, COSP45615, COSP43650, COSP49463, COSP50070

 

COSMIC Statistics

23,443,841
Total genomic variants (COSV) (+44,671)
16,015,511
Genomic non-coding variants (+26,839)
5,037,981
Genomic mutations within exons (coding variants) (+35,377)
1,515,965
Samples (+6,287)
28,880
Papers (+186)
41,161
Whole genome screen samples (+1,435)
321,804
Genomic Rearrangements (+3,788)
19,428
Fusions
1,207,190
Copy number variants
9,215,470
Gene expression variants
7,930,489
Differentially methylated CpGs

 

Actionability Release v7

COSMIC Actionability v7 includes 11 additional fully-curated genes: CD274 (PD-L1), HRAS, MAP2K1 (MEK1), AR, GNA11, GNAQ, SMAD4, TSC1, DDR2, ETV6, FOXL2

This means we have a total of 72 fully curated genes: ABL1, AKT1, AKT2, AKT3, ALK, ASXL1, ATM, BCR, BRAF, BRCA1, BRCA2, BTK, CDK12, CDK4, CDK6, CEBPA, CTNNB1, DNMT3A, EGFR, ERBB2, ERBB3, EZH2, FGFR1, FGFR2, FGFR3, FGFR4, FLT3, IDH1, IDH2, JAK1, JAK2, JAK3, KIT, KMT2A (MLL), KRAS, MDM2, MDM4, MET, MLH, MPL, MSH2, MSH6, NF1, NF2, NPM1, NRAS, PDGFRA, PDGFRB, PIK3CA, PMS2, PTCH1, PTEN, RET, ROS1, RUNX1, SF3B1, SMO, STK11, TET2, TP53, WT1, CD274 (PD-L1), HRAS, MAP2K1 (MEK1), AR, GNA11, GNAQ, SMAD4, TSC1, DDR2, ETV6, FOXL2

To view the full list of curated genes visit the About page on the Actionability website.

All previously-recorded clinical trials have been checked for new or updated results.

Expressed/not category added to Patient Pre-screening: from v7 onwards the download file contains a new category, 'Expressed/not'. This is used for trials that compare patients who express a protein with those who don't, or that compare patients with high expression with those with low expression. In practice, there is usually a threshold expression level and the comparison is between patients above and below it. If our curator is able to find out the measure and threshold level that was used, it appears as part of the trial name. This new value is represented by the term Patient Pre-Screening in the column mutation_selected_dict.

  • positive/above threshold expression - patient number and results value recorded in treatment values column
  • negative expression/below threshold values - value will appear in the fields used for control values, as is the case for trials that compare a treatment in patients with/without a mutation.

There are several trials using this new category in v7.

Addition of Australian/New Zealand Clinical Trials Registry

Actionability v7 includes the addition of a new datasource: the Australian New Zealand Clinical Trials Registry (ANZCTR). This can be seen in the Source_Type column as a value of 9.
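
As a rough sketch of how the new data source might be picked out of the Actionability download file, the example below keeps only rows whose Source_Type is 9. The Source_Type column and the value 9 come from the note above; the tab-delimited layout, the Trial_ID column and the placeholder rows are assumptions made for illustration.

```python
import csv
import io

# Source_Type value given in the release notes for ANZCTR.
ANZCTR_SOURCE_TYPE = "9"


def anzctr_rows(handle):
    """Return the rows whose Source_Type marks them as ANZCTR records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row
        for row in reader
        if (row.get("Source_Type") or "").strip() == ANZCTR_SOURCE_TYPE
    ]


# Placeholder rows for illustration only; Trial_ID is a hypothetical column.
sample = "Trial_ID\tSource_Type\nEXAMPLE_A\t9\nEXAMPLE_B\t1\n"
selected = anzctr_rows(io.StringIO(sample))
```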

The number prefixed with '+' after each statistic denotes the increase since the last release.

 

Actionability Statistics

72
Genes fully curated (+11)
311
Genes included (+25)
1,520
Drugs (+112)
3,943
Drug combinations (+362)
6,609
Evidence from trials databases (+481)
2,685
Evidence from PubMed and other sources (+447)
154
Point mutations (+2)
734
Total mutations (+35)
3,756
Trials with results (+361)
9,021
Total trials (+655)

 

Cancer Mutation Census (CMC)

Cancer Mutation Census data has been updated for v97 release. These are the key updates:

  • Cancer Mutation Census has been updated to align the mutations with COSMIC release 97
  • The ClinVar dataset has been updated to release 2022-08
  • gnomAD exome frequencies are from release v2.1.1 and contain data from 125,748 exome samples
  • gnomAD genome frequencies have been updated to release v3.1 containing 76,156 genome samples. This release also includes a new population - Middle Eastern (MID)
  • CMC data are free for non-commercial use, downloads are available on the COSMIC download page.

 

COSMIC-3D

COSMIC-3D data has been updated for v97 release. These are the key updates:

  • COSMIC-3D has been updated to align the mutations with COSMIC release 97
  • Switch from GRCh38 to GRCh37 human genome assemblies in line with the CMC data
  • 7 census genes now have mapped structures (ABI1, ARID2, ATP1A1, FOXA1, FOXL2, SS18, TLR5, TRRAP)
  • Increased total number of mapped protein structures (PDB ids) from 50,735 to 51,816

 

Mutational Signatures

COSMIC Mutational Signatures is a resource curated in partnership with COSMIC and Cancer Grand Challenges, and in close association with our collaborators at Wellcome Sanger Institute, the Pillay lab at University College London and the Alexandrov lab at University of California.

New for COSMIC Mutational Signatures release v3.3

We have added a novel collection of reference signatures to describe copy number variations; in total we have 24 CN signatures. Copy number signatures are defined using the 48-channel copy number classification scheme. The scheme incorporates loss-of-heterozygosity status, total copy number state, and segment length to categorise segments from allele-specific copy number profiles (as major copy number and minor copy number respectively, i.e. non-phased profiles). The signatures displayed here were identified from 9,873 tumour copy number profiles obtained from The Cancer Genome Atlas (TCGA) SNP6 array data spanning 33 cancer types.

The SBS and DBS signatures have been enriched with more topographical data and graphs, across 7 new features. These are:

  • replication timing
  • nucleosome occupancy
  • CTCF occupancy
  • histone modifications
  • replication strand asymmetry
  • and differences in genic and intergenic regions

In adding these new topographical features we overhauled the existing transcriptional strand asymmetry feature and made it possible to view a feature's respective graph in a tissue specific as well as an aggregated manner.

Other changes include:

  • reprocessed signature data files to better handle situations where the percentage for a channel was over-zealously rounded to zero
  • addition of SBS95, a sequencing artefact signature
  • improvements to the interface to better compare graphs without reloading the page

 


v96 - 31st May 2022

Key Updates

Focused curation on rare head and neck cancers:

  • 56 papers
  • 2526 new variants
  • 129 new site-histology pairs with sequencing data added

Gene focused curation:

Updates to Cancer Gene Census

Updates to Hallmark Genes

Whole Genome data

  • 9 new papers containing whole genome/exome or RNA sequencing data

 

 

Author submission

COSMIC welcomes author contributions of data as they are invaluable in supporting us to identify new genes and trends in cancer research. We actively collaborate with authors who have their publications at a submission stage. Correctly formatted variant data ensures faster inclusion of the paper in COSMIC and dissemination of the data into the research community to further empower new research. An example of such collaboration was an author submission that highlighted POLR2A. As a result of this submission, POLR2A status as a cancer gene was reviewed and it was added to the Cancer Gene Census as a Tier 2 gene and the literature reporting POLR2A mutations across all cancers was comprehensively curated for this release.
Whether our submissions report previously undescribed cancer mutations or cancer genes or well known variants in new cancer types, all papers are triaged and prioritised. Some journals require a proof of submission to COSMIC as a pre-publication requisite. However, papers are released in COSMIC only after peer-review and publication to guarantee high quality and open access status of data.

Instructions on how to submit data to COSMIC can be found here: https://cancer.sanger.ac.uk/cosmic/submissions

 

Curated Genes

POLR2A

POLR2A, the gene encoding RNA polymerase II catalytic subunit A, is a key player in meningiomas (COSP 41827).

A subset of WHO grade I meningiomas are defined by the somatic hotspot mutation p.Q403K and by p.L438_H439 deletions. Germline mutations in POLR2A are associated with heterogeneous multi-system disorders and p.L438_H439del is associated with the most severe phenotype. POLR2A status as a cancer gene was reviewed and it was added to the Cancer Gene Census as a Tier 2 gene for its role as a potential oncogene in meningioma. The gene seems to be commonly deleted in cancer where recurrent mutations cause widespread changes in gene expression, although no definitive evidence was found that they cause cancer. Differential expression is enriched for genes involved in the cell cycle, apoptosis and cancer-associated signalling pathways. The literature reporting POLR2A mutations across all cancers was comprehensively curated.

PRKD1

The Protein kinase D1 (14q12) gene has been added to the Cancer Gene Census as a Tier 2 gene for its role in fusions found in cribriform adenocarcinomas of the salivary gland. It is a serine/threonine protein kinase involved in several signalling pathways and many cellular processes including cell migration and differentiation, cell survival and regulation of cell shape and adhesion. Our H&N curation focus included several papers reporting recurrent p.E710D mutations in the majority of polymorphous low grade adenocarcinomas (PAC), the second most common malignant tumour of minor salivary glands (COSP 46408, 49877, 49498, 49500, 49502). The p.E710D mutation is also found in a minority of cribriform adenocarcinomas of the salivary gland (CASG) (COSP 49877 & PMID 31492931), but not in more aggressive head and neck adenoid cystic carcinomas or pleomorphic adenomas (COSP 49498), nor in other solid tumours and leukaemias. A minority of PACs and a majority of CASGs carry fusions involving either PRKD1, PRKD2 or PRKD3. Other PRKD1 mutations are found at lower frequencies in a variety of other tumours including breast, leukaemia, lymphoma and gastric cancers.

 

Rare head and neck cancer focus

Head and neck (H&N) cancer is a relatively uncommon type of cancer. Around 12,400 new cases are diagnosed in the UK each year (NHS) and H&N cancer accounts for 3% and 4% of the total cancer incidence in the US and Europe respectively. v96 contains data from focused curation on less common H&N cancers, for example the ones that develop in the salivary glands, sinuses, or muscle and bone in the head and neck. Variant and patient data were curated from 56 publications. The focus of the papers ranged widely, including defining mutational profiles for the tumours, their aetiology, histopathology or treatment options, and finding actionable mutations for each tumour type. From this curation, 129 new site-histology pairs with sequencing data were added to COSMIC, and a new NCI Thesaurus code has been created for Sinonasal low-grade Schneiderian papillary carcinoma in collaboration with the NCIT.

17 further publications were evaluated and are listed on the COSMIC website but data from these could not be curated for quality reasons or because they were review type publications that don't report novel variant data.

 

Cancer Gene Census (CGC)


New Census Genes (Tier 1):

ACVR1B: Tumour suppressor gene

  1. Recurrent homozygous deletions in pancreatic adenocarcinomas
  2. Knockdown in pancreatic cancer cell lines increases cell proliferation in vitro and the growth of mouse xenograft tumours

New Census Genes (Tier 2):

CTNNA1: Tumour suppressor gene, fusion gene

  1. Links transmembrane cadherins to the cell actin cytoskeleton
  2. Hemizygous deletion in myelodysplastic syndrome and acute myeloid leukaemia, and fusion with RAF1 in cutaneous melanoma
  3. Germline truncating mutations in Hereditary Diffuse Gastric Cancer Syndrome
  4. Overexpression in a deletion-bearing acute myeloid leukaemia cell line leads to G0/G1 cell cycle arrest and apoptosis

POLR2A:

  1. Largest subunit of RNA polymerase II
  2. Recurrent missense mutation and in-frame deletion in WHO grade 1 meningiomas
  3. Knockdown in an acute myeloid leukaemia cell line increases apoptosis, and alters the expression of genes involved in the cell cycle, apoptosis and cancer-associated pathways

PRKD1: fusion gene

  1. Serine/threonine protein kinase
  2. Recurrent p.E710D in polymorphous adenocarcinoma of the palatal minor salivary glands, and fusions with PRKAR2A and SNX9 in cribriform adenocarcinoma of the minor salivary glands
  3. p.E710D increases kinase activity in a cell-free assay, whilst overexpression of the wild-type and p.E710D mutant genes in breast epithelial cells increases cell proliferation

Hallmarks of Cancer

Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the CGC. The Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed.

MAP2K4, MAX, MLH1, MPL, MSH2, MSH6, MYOD1, NCOA2, NCOR1, NTRK1

 

Systematic Screen Papers

Follow links below to the 9 papers which are new in v96, or view the full table of papers here.

COSP41122, COSP45420, COSP41827, COSP48664, COSP49502, COSP47849, COSP46408, COSP45485, COSP49477

 

COSMIC Statistics

23,399,170
Total genomic variants (COSV) (+5982)
15,988,672
Genomic non-coding variants (+3693)
5,002,604
Genomic mutations within exons (coding variants) (+4336)
8,704,304
Genomic mutations within intronic and other intragenic regions (+2365)
1,509,678
Samples (+3686)
28,694
Papers (+143)
19,422
Fusions
318,016
Genomic Rearrangements
1,207,190
Copy number variants
9,215,470
Gene expression variants
7,930,489
Differentially methylated CpGs

 


v95 - 24th November 2021

Summary

As part of the V95 release we have focused on updating the expert-curated mutation data for rare cancers of the female genital tract and breast. This release has approximately 100 additional publications with mutation screening data in these diseases, including ovarian germ cell tumours, uterine Mullerian tumours and breast adenomyoepithelioma. We have also updated the classification of mucosal melanomas, including those associated with the female genital tract, to give details for the specific mucosal tissue.

V95 includes information about the resistance mutations in the FGFR2 and NT5C2 genes. We also have two new expert-curated genes, SDHA and TENT5C, which are associated with gastrointestinal stromal tumours and multiple myelomas respectively. Finally, we have focused on in-depth curation and updates of mutation data for the chromatin remodelling genes ARID1A, ARID1B, ARID2, and SMARCD1. More than 80 additional publications with mutation screening data in these genes are included in this release.

 

Key Updates

COSMIC statistics definitions

For transparency, we have recently changed our data definitions and created sub-categories to be clearer as to what the different mutation statistics mean for our users. 

  1. Genomic variants (COSV)
    Total number of unique variants recorded at the DNA/genomic level. Each COSV can be mapped to transcripts as a specific change. Multiple samples/papers with the same reported genetic change will all be reported under the same COSV. Because a gene can have multiple transcripts, a single COSV may fall into more than one sub-category, so the sum of non-coding variants, mutations within exons, and intronic/intragenic mutations will not equal the total number of COSVs.
    1. Non-coding variants
      Variants in the non-coding regions of DNA, including intergenic regions, regulatory regions and non-coding transcripts (pseudogenes, lncRNAs etc). 
    2. Mutations within exons (coding variants)
      Mutations that lie entirely or partly within the protein-coding regions (exons) of transcripts and give rise to a change in the amino acid sequence. This includes splice-site mutations.
    3. Intronic + other intragenic mutations
      Mutations within genes but located in introns, 5'UTR and 3'UTR regions; these mutations don't lead to known/predicted AA changes.   
  2. Samples 
    Where the mutations have come from. Can be patient samples, tissue samples, or cell line samples.
  3. Papers
    Number of papers our curators have studied in depth, including figures and supplementary data, to derive COSMIC's data.
  4. Whole genome screen samples
    Some WGS and WES data, and some gene-specific data from WGS.
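
The point made under Genomic variants (COSV), that the sub-category counts need not add up to the COSV total, can be illustrated with a minimal Python sketch. The variant identifiers and their category assignments below are invented for illustration and are not COSMIC data; the category names are shorthand for the three sub-categories defined above.

```python
from collections import Counter

# Invented identifiers and annotations, for illustration only.
# Each variant is annotated against several transcripts, so it can
# belong to more than one sub-category at once.
annotations = {
    "COSV_A": {"coding"},                  # exonic in every transcript
    "COSV_B": {"coding", "intronic"},      # exonic in one transcript, intronic in another
    "COSV_C": {"non_coding"},              # intergenic
    "COSV_D": {"intronic", "non_coding"},  # intronic in a gene, also within a lncRNA
}

# Each unique variant is counted once in the total.
total_cosv = len(annotations)

# Each variant is counted once per sub-category it falls into.
per_category = Counter(cat for cats in annotations.values() for cat in cats)
sum_of_categories = sum(per_category.values())

print("Unique COSVs:", total_cosv)
print("Sum of sub-category counts:", sum_of_categories)
```

In this toy example the sub-category counts sum to more than the number of unique variants, because two of the four variants are counted in two sub-categories each.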

 

 

Updates to T&Cs

We've updated our Terms & Conditions for Non-Commercial use of COSMIC Core data (including the Cell Lines Project, COSMIC-3D, and Mutational Signatures). Whether you're thinking of registering or a current user, it's vital you read these thoroughly.

Unless stated, these apply to all releases of COSMIC, including previous versions that you may have downloaded. 

As part of this change, the following statement has been removed: 'If I now need a licence for my use of COSMIC data, instead of licensing I can use/continue to use an old unsupported version of COSMIC'. This means that you are not permitted to use old and unsupported versions of COSMIC. 

We don't have capacity to support older versions of COSMIC. Our database is designed as a 'living tool' that is constantly evolving in line with the latest research and information. It's also important to note that old versions of COSMIC aren't kept up to date. As a result, many of the links are broken and the data isn't accurate. With this in mind, we hope you will understand the necessity for this change to our T&Cs. 

Read the full T&Cs here

 

In-depth curation for chromatin remodelling genes

SWitch/Sucrose NonFermentable (SWI/SNF) is a chromatin remodelling complex which uses the energy of ATP hydrolysis to reposition nucleosomes, thereby regulating access to the DNA and modulating transcription and DNA replication/repair. Mutations involving subunits of the SWI/SNF complex are common in a wide range of cancers, occurring in approximately 20% of cancers, with ARID1A the most frequently mutated subunit. The cancers with the highest SWI/SNF mutation rates are ovarian clear cell carcinoma, clear cell renal cell carcinoma, hepatocellular carcinoma, gastric cancer, melanoma and pancreatic cancer.

Atypical teratoid/rhabdoid tumour (AT/RT), a rare and highly aggressive malignancy of the central nervous system (CNS), is usually diagnosed in infancy or childhood and is most often characterised by loss of expression of the SMARCB1 gene product (INI1). However, an unusual case with retained expression of INI1 and without mutations identified in SMARCB1 is reported by Bookhout et al. (COSP49144) in an infant with thalamic AT/RT. 

Johan et al. (COSP41712) report a series of cribriform neuroepithelial tumours (CRINET), a rare non-rhabdoid brain tumour showing a cribriform growth pattern and SMARCB1 loss. They conclude that CRINET represents a SMARCB1-deficient non-rhabdoid tumour which shares molecular similarities with the AT/RT-TYR subgroup but has distinct histopathological features and a favourable long-term outcome. 

In renal medullary carcinoma, a highly aggressive type of renal cancer occurring in patients with sickle cell trait, loss of SMARCB1 expression has emerged as a key diagnostic feature and Jia et al. (COSP49139) demonstrate biallelic inactivation of SMARCB1 in the majority of their 20 cases. 

In breast implant-associated anaplastic large cell lymphoma, a distinct entity which arises in the capsule surrounding textured saline or silicone breast implants, Quesada et al. (COSP49321) report a novel STAT3-JAK2 fusion as well as mutations or gene losses in several genes including SMARCB1.

Rooper et al. (COSP49118) find recurrent loss of SMARCA4 in sinonasal teratocarcinosarcoma (TCS), a rare and aggressive tumour with mixed teratomatous, carcinomatous and sarcomatous components. They suggest SMARCA4 inactivation may be the dominant genetic event in TCS and that this lesion is on a diagnostic spectrum with SMARCA4-deficient sinonasal carcinoma.

ARID1A is a key non-catalytic component in the SWI/SNF complex. It acts primarily as a tumour suppressor and is emerging as a potential therapeutic target. Hung et al. (COSP49190) study the spectrum of ARID1A genetic alterations in non-small cell lung carcinoma and assess the clinicopathological significance of these mutations and expression loss in these tumours. 

Wu et al. (COSP49189) perform comprehensive genomic profiling in ovarian seromucinous borderline tumours, an uncommon ovarian epithelial neoplasm characterised by association with endometriosis, and find frequent somatic mutations in KRAS, PIK3CA and ARID1A.

The mutation profile at hotspots of ARID2 in oral squamous cell carcinoma patients from South India is examined by Das et al. (COSP49067), and Bala et al. (COSP48911) identify ARID2 as a novel tumour suppressor in early-onset sporadic rectal cancer. Both ARID1A and ARID2 are among the genes identified by Varaljai et al. (COSP49256) as drivers in intracranial metastases in malignant melanoma and, as such, are therapeutic targets.

 

Curated Genes

TENT5C

Terminal Nucleotidyltransferase 5C (TENT5C), previously known as FAM46C, encodes a non-canonical poly(A) RNA polymerase. It is thought to enhance mRNA stability and gene expression; its main targets are mRNAs encoding ER-targeted proteins. TENT5C is commonly mutated in multiple myeloma, and evidence suggests that it is a B-cell lineage-specific growth suppressor. Somatic mutations in multiple myeloma samples are recorded across the gene; most of these are substitutions.

SDHA

SDHA (Succinate dehydrogenase complex flavoprotein subunit A) encodes a catalytic subunit of succinate-ubiquinone oxidoreductase, a complex of the mitochondrial respiratory chain. Germline mutations associated with loss of heterozygosity in the tumour drive several cancer types. However, rarer second-hit somatic mutations, and occasionally double somatic mutations, are also reported, notably in SDHA expression-negative 'wild type' gastrointestinal stromal tumours (GISTs) lacking KIT or PDGFRA mutations. Somatic mutations in other tumours, such as pituitary adenomas and paragangliomas, are also seen. 

 

Drug Resistance

FGFR2

Extensive research has shown that targeting FGFRs with small molecule inhibitors halts receptor activation and downstream signalling, and results in tumour shrinkage. However, the efficacy of these inhibitors can be limited by acquired mechanisms of drug resistance, which impede treatment and lead to tumour relapse. 

v95 includes patient mutation data in which resistance to drug treatment is caused by point mutations in the FGFR2 gene. Cancers studied include intrahepatic cholangiocarcinoma (iCCA), breast cancer, lung cancer and gastric cancer.

Multiple alternatively spliced isoforms of FGFR2 are known to exist, and mutations detailed here refer to amino acid numbering in the FGFR2-IIIb isoform, the FGFR2 transcript shown as canonical on the COSMIC website (ENST00000457416.6).

Genomic analysis shows alterations in targetable oncogenes in almost 50% of cholangiocarcinoma patients, with recurrent alterations in IDH1 and FGFR2. These occur almost exclusively in patients with iCCA rather than extrahepatic cholangiocarcinoma. 

FGFR2 genomic alterations, including activating point mutations, fusions, and rearrangements, are known oncogenic drivers and provide a molecular signature to identify patients who may benefit from inhibition of FGFR2 tyrosine kinase activity.  

Whilst second-generation selective (ATP-competitive) FGFR inhibitors such as BGJ398/infigratinib, Debio 1347, and pemigatinib/INCB054828 have been shown to increase the disease control rate, the rapid emergence of acquired drug resistance has been frequently observed. Goyal et al. (COSP42875) first described genetic mechanisms of clinical acquired resistance to FGFR inhibition in patients with FGFR2 fusion-positive iCCA. Through the analysis of pre- and post-progression ctDNA and tumour biopsies in three patients with FGFR2 fusion-positive iCCA treated with BGJ398, this study revealed the emergence of FGFR2 kinase domain mutations, including an FGFR2 V565F gatekeeper mutation, in all three patients. Goyal et al. (COSP46683) followed this initial study with six FGFR2 fusion-positive iCCA patients treated with the FGFR kinase inhibitors BGJ398 and Debio 1347, and again found mutations in kinase domain residues (K660M, V565F/H, N550K/H/T and L618V), plus M372I in the transmembrane domain. Consistent with these findings, four other studies identified the emergence of similar FGFR2 kinase domain mutations in patients with FGFR2 fusion-positive cholangiocarcinoma who had initially responded to pemigatinib (Silverman et al., COSP49195 and Krook et al., COSP49205), BGJ398 (Krook et al., COSP47638) or an unspecified FGFR inhibitor (Kasi et al., COSP49199).

Mutations observed in these studies include M539L, N550K/H/T, V565F/H, E566A, L618V, K660M and K642R which result in increased receptor kinase activity. Structural modelling has suggested two ways in which these mutations confer resistance:

  1. Disruption of the "molecular brake" formed by the triad of residues N550, E566 and K642, thus stabilizing the active form of the kinase, or pushing the kinase into an active form. 
  2. The gatekeeper mutation induces a steric clash, preventing drugs from entering the ATP-binding pocket.
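
The mutation list above uses a compact shorthand in which alternative substitutions at the same residue are separated by slashes, so N550K/H/T stands for N550K, N550H and N550T. As a minimal sketch for readers who want to match these against per-mutation records, the shorthand could be expanded as follows. The function name `expand_mutations` is hypothetical and not part of any COSMIC tool, and the pattern assumes simple single-residue substitutions only.

```python
import re

def expand_mutations(shorthand: str) -> list[str]:
    """Expand e.g. 'N550K/H/T' into ['N550K', 'N550H', 'N550T'].

    Hypothetical helper: handles only single-letter substitutions of the
    form <ref residue><position><alt>[/<alt>...].
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z](?:/[A-Z])*)", shorthand)
    if match is None:
        raise ValueError(f"unrecognised mutation shorthand: {shorthand!r}")
    ref, position, alts = match.groups()
    return [f"{ref}{position}{alt}" for alt in alts.split("/")]

# The FGFR2 mutations as written in the text above.
listed = ["M539L", "N550K/H/T", "V565F/H", "E566A", "L618V", "K660M", "K642R"]
expanded = [m for item in listed for m in expand_mutations(item)]
print(expanded)
```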

Similar kinase domain gain of function FGFR2 activating mutations (and FGFR amplifications) were shown to be apparent in post-resistance samples of ER+ metastatic breast cancer after treatment with ER-directed therapy (Mao et al, COSP48455) and ER therapy with CDK4/6 inhibitors (palbociclib) (Formisano et al, COSP46556).

Apart from the emergence of secondary FGFR alterations, intra-tumoural and temporal heterogeneity is another major obstacle to the effectiveness of FGFR-targeted therapies, as shown in patients with liver cancers by Goyal et al. (COSP42875) and Kasi et al. (COSP49199).

Bypass mechanisms also contribute to the development of drug resistance. Min Lau et al. (COSP44344) demonstrated the emergence of a PKC-dependent re-wiring mechanism that confers resistance to AZD4547 (a second-generation FGFR inhibitor) in FGFR2-amplified diffuse gastric cancer. The FGFR2 V565F gatekeeper mutation also emerged in a PDX model of the gastric cancer, and its overexpression during ex vivo culture with AZD4547 caused resistance to AZD4547 and cross-resistance to infigratinib.

Next-generation covalent (irreversible) inhibitors, such as futibatinib (TAS-120), are studied by Goyal et al. (COSP46683) as a possible means to overcome or suppress resistance mutations. They describe four patients with FGFR2 fusion-positive cholangiocarcinoma who developed acquired resistance to infigratinib or Debio 1347 and subsequently responded to TAS-120, although gatekeeper resistance mutations were later found. A similar response to TAS-120 was shown using in vitro assays by Krook et al. (COSP47638).

NT5C2

NT5C2 (5'-nucleotidase, cytosolic II) encodes a hydrolase that plays an important role in cellular purine metabolism by acting primarily on inosine 5'-monophosphate and other purine nucleotides. Gain-of-function mutations in NT5C2 result in altered activating and autoregulatory switch-off mechanisms and a protein with increased nucleotidase activity. This drives resistance to thiopurine chemotherapy, such as 6-mercaptopurine, in relapsed acute lymphoblastic leukaemia (ALL). NT5C2 point mutations commonly occur at R39, R238, R367, and D407, and are frequently recurrent, with R367Q the most common relapse-associated NT5C2 mutation, accounting for 90% of mutant cases.

Please note that due to technical difficulties, resistance data for FGFR2 and NT5C2 are not showing on the website currently. All resistance mutations are available in the download files. 

 

Disease Focus

Rare female genital tract cancers

Female adnexal tumours of probable Wolffian origin (FATWO) are very rare gynaecological tumours of low malignant potential thought to derive from the mesonephric (Wolffian) remnants in the upper female genital tract. Most frequently they occur in the paraovarian region and occasionally within the ovary, fallopian tube or retroperitoneum. Mirkovic et al. (COSP45731) examine the molecular changes in FATWO to determine whether they are molecularly similar to mesonephric carcinoma. They find FATWO lacking mutations of KRAS/NRAS, which are characteristic of mesonephric carcinoma. Bennett et al. (COSP47213) also perform a molecular analysis of FATWO, finding few pathogenic mutations and suggesting this could be useful in the differential diagnosis of difficult FATWO cases showing similarity to more common ovarian and broad ligament lesions.

Wang et al. (COSP47834) report mutations in primary vaginal malignant melanoma, an extremely rare mucosal melanoma. In their cohort of 36 patients, NRAS mutations and PD-L1 expression are most prevalent, whereas the detection rate of KIT and TERT mutations is low. Patients with NRAS mutations have a poorer survival outcome compared with those with wild-type NRAS. For invasive melanomas arising from different anatomical sites in the lower female genital tract, Zarei et al. (COSP47239) observe the most common genetic alterations in KIT, TP53 and NF1.

Jung et al. (COSP48959) present whole exome sequencing results for gestational choriocarcinoma, a unique cancer of pregnant tissues. Hodroj et al. (COSP49095) report the molecular characterisation of ovarian yolk sac tumour, a rare malignant germ cell tumour, with mutations in KRAS, KIT and ARID1A which may be used as therapeutic targets. Frumovitz et al. (COSP48770) investigate mutational hotspots in cancer-related genes in small cell neuroendocrine cervical cancer. Dundr et al. (COSP48465) highlight a case of ovarian mesonephric-like adenocarcinoma arising in serous borderline tumour.

Rare breast cancers

Rare cancers of the breast include metaplastic breast cancer (MpBC), a predominantly triple negative breast cancer (TNBC) representing a histologically heterogeneous group of invasive carcinomas. MpBC is defined by differentiation of the neoplastic epithelium to a non-glandular component, such as squamous or mesenchymal e.g. spindle cell, osseous or chondroid. It is an aggressive form of breast cancer, with patients presenting at an advanced stage and it is often more resistant to conventional chemotherapy than other TNBC. 

TP53 is the most frequently mutated gene in MpBC followed by PIK3CA, as shown by Afkhami et al. (COSP48792) who also report a PIK3CA-mutated case of MpBC with exceptional response to everolimus therapy. Vranic et al. (COSP48788) perform molecular profiling of spindle cell breast cancers and show they are characterised by targetable molecular alterations in the majority of cases. Reed et al. (COSP48795) report results of whole exome sequencing for MpBC, confirming previous reports of high frequency of TP53 mutations and presenting evidence for a significant enrichment of co-occurring mutations in PTEN, PIK3CA and TP53.

Breast adenomyoepithelioma is an uncommon, biphasic tumour ranging from benign, to atypical in situ, to malignant, with the latter associated with carcinoma which can arise in the epithelial or myoepithelial component. Using whole exome and targeted massively parallel sequencing analysis, Geyer et al. (COSP48780) demonstrate that oestrogen receptor-positive adenomyoepitheliomas display mutually exclusive PIK3CA or AKT1 activating mutations, while oestrogen receptor-negative tumours harbour highly recurrent codon Q61 HRAS hotspot mutations, which co-occur with PIK3CA or PIK3R1 mutations. This update also includes case reports of adenomyoepithelioma from Watanabe et al. (COSP48777) and Han et al. (COSP48782).

 

COSMIC Statistics

23,393,188
Total genomic variants (COSV) (+111,368)
15,984,979
Genomic non-coding variants (+68,362)
4,998,268
Genomic mutations within exons (coding variants)
8,701,939
Genomic mutations within intronic and other intragenic regions
1,505,992
Samples (+14,903)
28,551
Papers (+376)
19,422
Fusions
39,563
Whole genome screen samples
1,207,190
Copy number variants
9,215,470
Gene expression variants
7,930,489
Differentially methylated CpGs

 


For reference, release notes for earlier versions are available on the Release Notes Archive page. However, please note that these versions are no longer available to download, are not supported, and the release notes may link or refer to pages which are now obsolete.