v91 (April 2020) includes 4 new fully curated genes and a substantial curation update to the APC gene. We have also focused on testicular and other male cancers, as well as breast implant-associated lymphoma. There are over 5 million new coding mutations, about 2 million new genomic mutations, about 2 million new non-coding mutations and about 2,900 new whole genomes. We have curated 48 new systematic screen papers and updated our ICGC dataset to v28, which includes 2 new studies, ICGC (BPLL-FR): B-Cell Prolymphocytic Leukemia - FR and ICGC (GACA-JP): Gastric Cancer - JP, with a complete re-annotation using the Ensembl Variant Effect Predictor (VEP). We are including new data download files to track the mutations, and new normalised VCF files for COSMIC and the Cell Line Project.
Data Updates
- New fully curated cancer genes (4):
- KMT2A 27,638 samples, 790 mutations, 515 papers
- CDKN1B 21,181 samples, 249 mutations, 146 papers
- MYC 20,202 samples, 249 mutations, 231 papers
- TRRAP 2,983 samples, 110 mutations, 199 papers
- Substantial curation update to APC: 44,957 samples, 7,101 mutations, 1,446 papers
- New curation on Testicular and other male cancers and Breast implant-associated anaplastic large cell lymphoma (BIA-ALCL)
- Whole Genome data
- 48 Systematic Screen Papers
COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.
New download files
CosmicMutationTracking.tsv.gz / CellLinesMutationTracking.tsv.gz - We are providing a new mutation tracking file for COSMIC and the Cell Line Project that maps the legacy COSM/COSN IDs to the new genomic ID (COSV), along with the gene name, accession number and a new unique mutation identifier. The file also indicates whether each mutation is coding or non-coding, and has a field indicating whether the annotation is on the canonical transcript.
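As a rough illustration of how the tracking file might be used, the sketch below builds a lookup from legacy IDs to genomic (COSV) IDs. The column names LEGACY_MUTATION_ID and GENOMIC_MUTATION_ID are hypothetical placeholders, as are the IDs in the sample data; check the header of the downloaded file and adjust accordingly. A legacy ID is mapped to a set of COSV IDs on the assumption that the same ID can appear on more than one row.

```python
import csv
import gzip
import io

# Hypothetical column names: replace them with the headers found in the real file.
LEGACY_COL = "LEGACY_MUTATION_ID"
GENOMIC_COL = "GENOMIC_MUTATION_ID"


def build_legacy_to_cosv(handle, legacy_col=LEGACY_COL, genomic_col=GENOMIC_COL):
    """Read tab-separated rows and map each legacy ID to a set of COSV IDs."""
    reader = csv.DictReader(handle, delimiter="\t")
    mapping = {}
    for row in reader:
        legacy, cosv = row[legacy_col], row[genomic_col]
        if legacy and cosv:  # skip rows where either ID is empty
            mapping.setdefault(legacy, set()).add(cosv)
    return mapping


def load_tracking_file(path, **kwargs):
    """Open a gzip-compressed tracking file in text mode and build the mapping."""
    with gzip.open(path, "rt", newline="") as handle:
        return build_legacy_to_cosv(handle, **kwargs)


# Small in-memory example with made-up IDs (not real COSMIC records).
SAMPLE = (
    "LEGACY_MUTATION_ID\tGENOMIC_MUTATION_ID\tGENE_NAME\n"
    "COSM1\tCOSV100\tGENE_A\n"
    "COSM2\tCOSV200\tGENE_B\n"
    "COSM1\tCOSV100\tGENE_A_ALT\n"
)
demo_mapping = build_legacy_to_cosv(io.StringIO(SAMPLE))
```

With a real download, `load_tracking_file("CosmicMutationTracking.tsv.gz")` would return the same kind of dictionary, provided the column names have been set to match the file.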
CosmicCodingMuts.normal.vcf.gz / CosmicNonCodingMuts.normal.vcf.gz / CellLinesCodingMuts.normal.vcf.gz / CellLinesNonCodingMuts.normal.vcf.gz - We have also improved our VCF files to include HGVS syntaxes at three levels: genomic (HGVSG), CDS (HGVSC), which includes the versioned transcript accession number, and peptide (HGVSP), which uses Ensembl's peptide accession.
We have added normalised versions of our VCF files, denoted with the suffix .normal, in which each variant is 5' shifted while the HGVS-compliant (3' shifted) syntaxes are maintained in the INFO section. The INFO section therefore reflects the non-normalised version where the two differ. Following user feedback, we have also compressed these files with bgzip.
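To make the difference between 5' and 3' shifting concrete, here is a minimal sketch for a deletion inside a repeat. It is not the pipeline used to produce the normalised files: it works on a plain reference string with 0-based coordinates, handles deletions only, and ignores the anchor base that a VCF record carries. The function names and the toy sequence are invented for illustration.

```python
def left_shift_deletion(ref, start, length):
    """Move a deletion as far towards the 5' end as possible (VCF-style normalisation).

    The deletion covers ref[start:start + length]. Shifting one base to the left
    gives the same result whenever the base entering the deleted window on the
    left equals the base leaving it on the right.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def right_shift_deletion(ref, start, length):
    """Move a deletion as far towards the 3' end as possible (HGVS-style placement)."""
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start


def apply_deletion(ref, start, length):
    """Return the sequence that results from removing ref[start:start + length]."""
    return ref[:start] + ref[start + length:]


# Toy reference with a CA repeat; the deletion of two bases is first placed at index 3.
REF = "GCACACAT"
leftmost = left_shift_deletion(REF, 3, 2)    # most 5' equivalent position
rightmost = right_shift_deletion(REF, 3, 2)  # most 3' equivalent position
```

All three placements (the original, the leftmost and the rightmost) produce the same deleted sequence, which is why a file can carry a 5' shifted position alongside a 3' shifted HGVS description of the same variant.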
All our files containing mutation syntaxes now have the additional columns HGVSG, HGVSC and HGVSP. For more information, please see the download page, here.
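The HGVS annotations can be pulled out of a VCF record with a few lines of code. The sketch below assumes that the INFO keys are spelled HGVSG, HGVSC and HGVSP, as the column names above suggest; the example record, its identifiers and the other INFO keys are made up. Each INFO item is split on its first '=' only, so a value that itself contains '=' is kept whole.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; keys without a value become True."""
    parsed = {}
    for item in info_field.split(";"):
        if not item:
            continue
        # Split on the first '=' only, so values containing '=' stay intact.
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def hgvs_from_vcf_line(line):
    """Return the HGVSG, HGVSC and HGVSP values of one tab-separated VCF data line."""
    columns = line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the eighth VCF column
    return {key: info.get(key) for key in ("HGVSG", "HGVSC", "HGVSP")}


# Made-up record for illustration only (not taken from a COSMIC file).
EXAMPLE_LINE = (
    "1\t100\tCOSV0000001\tA\tT\t.\t.\t"
    "GENE=GENE_A;HGVSG=1:g.100A>T;HGVSC=ENST00000000001.1:c.10A>T;"
    "HGVSP=ENSP00000000001.1:p.Lys4Ter;SNP"
)
hgvs = hgvs_from_vcf_line(EXAMPLE_LINE)
```

Because bgzip output is gzip-compatible, lines from the compressed files can usually be read with Python's `gzip.open(path, "rt")`, skipping header lines that start with '#', and passed to `hgvs_from_vcf_line`.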
Curated Genes
KMT2A (lysine methyltransferase 2A, formerly MLL) encodes a protein containing multiple conserved functional domains including the SET domain, which is responsible for histone H3 lysine 4 (H3K4) methyltransferase activity, mediating chromatin modifications associated with epigenetic transcriptional activation. Recurrent KMT2A mutations, mostly nonsense, frameshift and missense, have been found in peripheral T-cell lymphoma-not otherwise specified. Additionally, components of the histone methyltransferase complex, including KMT2A, show a high frequency of alterations in pancreatic ductal adenocarcinoma and in oesophageal sarcomatoid carcinoma. KMT2A point mutations are relatively rare and mutations are often part of the "long tail" of possible driver mutations seen in many cancer types e.g. breast cancer and colorectal cancer.
CDKN1B (cyclin dependent kinase inhibitor 1B) belongs to a family of CDK inhibitor genes that includes CDKN1A (encoding p21/WAF1) and CDKN1C (encoding p57/KIP2). CDKN1B encodes protein p27/KIP1, which binds to and prevents the activation of cyclin E-CDK2 or cyclin D-CDK4 complexes, controlling cell cycle progression at G1. Evidence suggests CDKN1B is a haploinsufficient tumour suppressor gene. In large sequencing studies, driver mutations in CDKN1B have been detected in selected cancer types, where truncating frameshift or nonsense mutations are largely predominant. Recurrent somatic mutations and deletions in CDKN1B are also found in small intestinal neuroendocrine tumours, affecting approximately 8% of tumours, and in primary luminal breast cancer there are recurrent CDKN1B mutations, mostly truncating mutations occurring in the C-terminus. Novel driver CDKN1B mutations have been identified in classical hairy cell leukaemia (HCL), where the gene is mutated in up to 16% of patients, suggesting a role for cell cycle deregulation in the pathogenesis of HCL. Additionally, studies have shown CDKN1B to be significantly mutated in prostate cancer, commonly with deletions.
MYC (MYC proto-oncogene) protein is a transcription factor that activates transcription of growth-related genes. Recent cancer genome sequencing efforts affirm that MYC is one of the most frequently amplified genes across many cancer types. In Burkitt's lymphoma (BL), MYC translocations are the hallmark of the cancer. While MYC is generally considered an under-mutated gene, the majority of BLs also acquire somatic MYC mutations that can have increased oncogenic potency. MYC gene translocation into one of the immunoglobulin loci may drive the hypermutation phenotype often observed in BL. Most BL cells express only the translocated allele, whereas the normal allele is transcriptionally silent. Clustered somatic mutations located in the transcriptional activation domain are found in aggressive lymphomas arising in acquired immunodeficiency syndrome (AIDS), and the presence of mutations is correlated with rearrangement of the oncogene. Mutations were also found in other de novo non-AIDS, non-Burkitt's aggressive lymphomas with MYC rearrangements.
The transformation/transcription domain-associated protein gene (TRRAP, located at 7q22.1) encodes a large multidomain protein of the phosphoinositide 3-kinase-related kinases (PIKK) family that functions as part of a multiprotein coactivator complex. It has histone acetyltransferase activity and is involved in chromatin remodelling as well as Wnt signalling; it acts as a positive regulator of both wild-type and mutant TP53 transcription levels and is central to MYC transcription activation. TRRAP appears to act as an oncogene. Missense mutations have been observed in a variety of cancers, notably a recurrent mutation at p.S722F observed in melanomas as well as other cancer types. Codon p.S722 is highly conserved evolutionarily and knock-down studies have suggested that mutant p.S722F TRRAP is necessary for melanoma cell survival. Other missense mutations have been seen in cancers including Waldenström macroglobulinaemia, sebaceous carcinoma, appendiceal goblet cell carcinoid, bladder cancer, lymphomas, urinary tract and colorectal cancer, as well as in high-risk ulcerative colitis.
As part of the v91 release we have focused on updating the expert-curated mutation data for the gene encoding APC, Adenomatous Polyposis Coli. Over 202 additional publications screening the APC gene (amongst others) have been surveyed with the addition of 736 new APC mutations and over 5000 samples.
APC is a large protein (2843 amino acids) encoded by a gene on chromosome 5q21-22. Being a multi-domain protein, APC serves multiple functions through different binding partners. It is involved in cellular processes relating to cell migration, cell adhesion, proliferation, differentiation and chromosome segregation. The gene for APC is a tumour suppressor, dysregulated at both the germline and somatic level. Germline mutations result in Familial Adenomatous Polyposis, the major hereditary predisposition event leading to the development of colorectal carcinoma (CRC). Somatic APC mutations are found in approximately 80% of all sporadic non-hypermutated CRC patients. More than 90% of APC mutations generate premature stop codons, resulting in stable truncated gene products, most (~60%) of which occur within a region referred to as the mutation cluster region (MCR). C-terminal truncated proteins present in CRC lack the domains that are required for binding to microtubules, end-binding protein 1 (EB1) and β-catenin, potentially leading to the induction of chromosome instability, activation of proliferation and inhibition of differentiation. Hence, as APC is a tumour suppressor, loss of APC function caused by bi-allelic mutations and/or LOH leads to constitutive activation of the Wnt/β-catenin pathway, which is considered one of the driving forces of the initiation and development of colorectal tumours. Additionally, APC-driven CRC tumorigenesis occurs independently of Wnt signalling through the loss of effect of the protein on chromosome segregation; cellular polarity and migration; and DNA replication.
The papers in this update have followed the evolution of the study of the involvement of this gene in sporadic human cancers, particularly colorectal carcinoma (CRC). From early Single Strand Conformation Polymorphism (SSCP) papers on a single gene to Next Generation Sequencing (NGS) studies of whole exomes and targeted panels, researchers have looked at the incidence of APC mutations in a huge array of cancer types, including those that show a high rate of mutation (e.g. subtypes of biliary tract carcinoma, reviewed by Roos et al. in COSP46736), those with a low frequency of mutations (e.g. primary multiple melanoma, COSP46996) and cancers with no observed APC mutations (e.g. cemento-ossifying fibromas, COSP44725). Others have studied the incidence of APC mutations in a variety of populations, e.g. Chinese (COSP45556), Romanians (COSP46499) and Thai (COSP46949).
After the initial findings of mutation incidence in CRC, many researchers have looked at the role of APC (and TP53) in tumour development, as part of the adenoma-adenocarcinoma-carcinoma pathway (e.g. COSP46469, COSP46949), and in the occurrence of metastases (e.g. COSP43720). Mutations in APC also have a role in the development of CRC through the inflammation pathway, as evidenced in patients with ulcerative colitis (COSP44212) and Crohn's disease (COSP45588). Additionally, mutations in APC are thought to be a universal initiating event in gastric carcinogenesis (COSP42555).
Other recent advances in the use of plasma and serum to detect APC mutations in cell-free DNA could have a role in screening and early diagnosis of CRC (e.g. COSP46679, COSP45523). Likewise, stool DNA has been used as a source for screening for both CRC (e.g. COSP46463) and gastric carcinoma (COSP44856).
Finally, with regard to treatment of CRC, APC mutations have been studied in a variety of contexts. For example, one study showed that they are involved in resistance to preoperative chemoradiotherapy (COSP47277). With more targeted drugs, such as the tankyrase inhibitor (TI) G007-LK, it was demonstrated (COSP434130) that TI-responsive cells harbour the short-form APC mutation.
As part of release v91 we have focused on updating the expert-curated mutation data for testicular and other male cancers. Over 40 additional publications with mutation screening data in these diseases are included in the release.
Testicular germ cell tumours (TGCT) are the most common testicular cancers and, although relatively rare, are the most frequent cancer type in younger men aged 15-49. They progress from precursor lesions, germ cell neoplasia in situ, and show a heterogeneous clinical and pathological range. Broadly classified as seminomatous and non-seminomatous, the latter is further characterized by different histological subtypes, such as embryonal carcinoma, yolk sac tumour, teratoma and choriocarcinoma. These tumours can be pure or comprised of more than one histological component. Recurrent somatic mutations in KIT, KRAS, BRAF and NRAS have been reported in TGCT, but generally point mutations are uncommon, and TP53, frequently mutated in many cancer types, is rarely mutated. Now advances in next generation sequencing have enabled the genomic landscape of TGCT to be better studied. This update includes a paper by Boublikova et al. (COSP46711) who confirm the frequency of RAS/BRAF mutations and identify WT1 as a novel factor involved in TGCT pathogenesis, with potential as a prognostic marker. Outcome for TGCT is often good, with many patients responding to combination cisplatin- and etoposide-based therapies, but approximately 20% will progress or relapse after first-line chemotherapy. Necchi et al. (COSP46693) study a chemorefractory subset with TGCT and find different alterations in seminomas and non-seminomas. They suggest targeted therapy for KRAS alterations and immunotherapy for a subset of nonseminomas.
Testicular sex cord-stromal tumours (TSCST) are uncommon tumours, also with diverse histology, e.g. Leydig cell tumours, Sertoli cell tumours and granulosa cell tumours. While the majority of these are clinically benign, 5-10% are malignant and present with metastatic lesions or relapse with metastases. Systemic treatment of patients with malignant disease is not standardized. Necchi et al. (COSP46698) perform comprehensive genomic profiling of malignant TSCST to identify potential therapy targets. They find targetable alterations uncommon in all types of malignant TSCST, although some tumours show potential for mTOR inhibitors (PTEN-mutated) and hedgehog inhibitors (PTCH1-mutated). Tatsi et al. (COSP46876) report a rare case of testicular large cell calcifying Sertoli cell tumour with a somatic PRKAR1A mutation and no association with Carney complex, a hereditary disorder characterized by multiple benign tumours and often by germline inactivating PRKAR1A mutations.
Male breast cancer (MBC) is very rare and accounts for less than 1% of all breast neoplasms. Moelans et al. (COSP46952) study the landscape of MBC by targeting all exons of 1943 cancer-related genes in more than 135 cases. They find recurrent PIK3CA and GATA3 mutations, with results mirroring those in female breast cancer to some extent, but TP53 mutations are significantly less frequent in MBC, whereas mutations in genes regulating chromatin function, such as PBRM1 and KMT2C, are more prevalent. These differences provide additional evidence that MBC is its own entity, requiring a different clinical approach.
Penile squamous cell carcinoma (PSCC) is also a rare malignancy in the developed world, and advanced PSCC is associated with poor survival, with many cases showing chemo-/radio-resistance. Huang et al. (COSP46841) evaluate salvage therapy with the anti-EGFR monoclonal antibody nimotuzumab in chemorefractory advanced PSCC with mutations in TP53, CDKN2A and PIK3CA, while Trafalis et al. (COSP46719) report successful treatment with the human programmed death receptor-1 (PD-1) blocking antibody nivolumab in a case of radio- and chemorefractory advanced PSCC with a CDKN2A mutation.
Additionally, Frick et al. (COSP46705) screen diffuse large B cell lymphomas and find primary testicular lymphomas to be significantly associated with mutations in CD79B and MYD88, and Michalova et al. (COSP46703) report a pancreatic analogue, solid pseudopapillary neoplasm (SPN) of the testis. A comparison of mutational profiles of both testicular and pancreatic SPN showed oncogenic mutations in exon 3 of CTNNB1 in both.
Reports of lymphomas associated with cosmetic and reconstructive breast implants appeared over 20 years ago and the 2017 WHO classification update of lymphoid neoplasia recognised these ALK-ve CD30 +ve tumours as a distinct subtype of non-Hodgkin T-cell lymphoma: Breast implant-associated anaplastic large cell lymphoma (BIA-ALCL). The morphological features of these rare tumours are similar to other ALK-ve ALCL tumours, but the location adjacent to implants, the molecular landscape and generally favourable outcome are distinct (reviewed in Oishi et al. COSP46322).
BIA-ALCLs usually present as a periprosthetic effusion some years after a textured implant, having arisen in the seroma cavity surrounding the implant (in situ lymphoma). Without invasion of surrounding tissues, surgical removal of the implant and total capsulectomy are associated with an excellent outlook. However, BIA-ALCLs can present with lymph node involvement or as a mass, and these are adverse prognostic factors.
Molecular analysis of BIA-ALCLs in eight publications demonstrates frequent mutations in JAK/STAT pathway genes, in particular recurrent gain-of-function activating mutations in the JAK1 kinase domain (JAK1 p.G1097D/C/V) and the STAT3 SH2 domain (p.S614R, p.Y640F, p.D661Y, p.G618R). Mutations are also present in other genes involved in the JAK/STAT pathway, such as STAT5A/5B, SOCS1, SOCS3 and PTPN1. In addition, mutations occur in TP53 and several epigenetic genes such as KMT2C, KMT2D, CHD1 and CREBBP. Whilst the JAK/STAT pathway mutations are usually activating, those observed in epigenetic regulators are often potentially inactivating nonsense or frameshift mutations. In one study (COSP47421) over 70% of tumours had mutations in an epigenetic regulator or histone modifier, compared to 59% in JAK/STAT pathway genes, suggesting an important role for epigenetics. Fusion genes found in other ALCL subtypes (ALK, DUSP22, TP63) are absent from BIA-ALCL.
The single most frequent mutation reported, STAT3 p.S614R, results in increased transcriptional activity of STAT3 whilst JAK1 p.G1097 mutations constitutively activate STAT3, and it is a feature of BIA-ALCL that STAT3 is activated, regardless of the mutation status of JAK/STAT pathway genes (COSP47077). Over 50% of BIA-ALCL cases are mutated in JAK/STAT cascade genes and it is thought likely that chronic inflammation and interference in cytokine receptor signalling play a role in BIA-ALCL.
Co-occurrence of two or more somatic mutations in genes involved in the JAK/STAT pathway or its regulation, or in combination with an epigenetic regulator/histone modifier gene or TP53, is not unusual (COSP42682, COSP45295, COSP46938, COSP47421), and chromosomal deletions in regions containing these genes are also reported. Several publications report the presence of additional germline mutations in JAK/STAT pathway genes and TP53 (COSP42530, COSP46172), and it has been suggested that double hits enhance JAK/STAT signalling in BIA-ALCL, acting together to facilitate growth.
Whilst most BIA-ALCL cases are limited to the effusion in the seroma cavity, some present as masses, and current data suggest differences between the two in mutation patterns and rates. Letourneau et al. (COSP45295) reported a solid tumour positive for recurrent STAT3/JAK1 mutations, a second JAK1 nonsense mutation and a TRG-TRB rearrangement. An in situ tumour identified during subsequent implant-related surgery lacked the second JAK1 nonsense mutation. Furthermore, the 15 solid and 19 in situ tumours studied by Laurent et al. (COSP47421) showed that 80% and 42% respectively presented with mutations in the JAK/STAT pathway; the solid tumours had a significantly higher STAT3 mutation rate and were more likely to contain mutations in cell cycle controlling genes.
Mukhtar et al. (COSP46938) reported a case of synchronous breast tumours 18 years after an implant; the stage 2 BIA-ALCL was positive for a recurrent STAT3 mutation as well as several others, but the second tumour, a breast invasive carcinoma, shared no common mutations.
Chen et al. (COSP43377) looked at the sensitivity of ALK-ve ALCL cell lines (including two BIA-ALCL cell lines) to JAK inhibitors and found them to be sensitive in all cases, regardless of the presence of STAT3 and/or JAK1 mutations.
Systematic Screen Papers
Follow links below to the 48 papers which are new in v91, or view the full table of papers here.
COSP44549, COSP47139, COSP47421, COSP43982, COSP46568, COSP45556, COSP45498, COSP47277, COSP47224, COSP39957, COSP47104, COSP42128, COSP46181, COSP44881, COSP46970, COSP43252, COSP46949, COSP47107, COSP47161, COSP47075, COSP46968, COSP43281, COSP43720, COSP45187, COSP44800, COSP43830, COSP39315, COSP40528, COSP46756, COSP46552, COSP33607, COSP46701, COSP45470, COSP45825, COSP46696, COSP44727, COSP39585, COSP40773, COSP40827, COSP45984, COSP39390, COSP45540, COSP45457, COSP33421, COSP45959, COSP42555, COSP42149, COSP44543
ICGC studies
Follow links below to the 2 studies which are new in v91, or view the full table of studies here.
- COSU683 GACA-JP
- COSU693 BPLL-FR
COSMIC Statistics:
A number prefixed with '+' after a statistic denotes the increase since the last release.
- 1,443,198 Samples (+30,732)
- 34,657,730 Coding mutations (Mutation Id) (+5,137,810)
- 11,453,569 Coding mutations (Legacy COSM) (+1,720,144)
- 21,901,440 Genomic mutations (COSV) (+1,999,299)
- 27,496 Papers (+667)
- 19,396 Fusions
- 37,221 Whole genomes (+2,901)
- 1,207,190 Copy number variants
- 9,197,630 Gene expression variants
- 7,930,489 Differentially methylated CpGs (+657)
- 15,156,086 Non Coding Variants (+2,056,985)