Summary
v99 (November 2023): This release focuses on 7 expertly curated genes; 6 census genes and 8 cancer hallmark genes are updated, along with a new resistance gene-drug pair. In this release of COSMIC, we have 438,063 new genomic variants, over a million new coding mutations, 274,853 non-coding mutations, 6,810 new samples, and 1,303 new whole genomes. We have also curated 32 new systematic screen papers.
Our other products, Cancer Mutation Census (CMC) and COSMIC-3D, have also been updated with the latest datasets. Actionability v10 was released in October 2023; more details can be found here.
Mutational Signatures has been updated with a new release v3.4.
Key Updates
- Gene-focused curation of 7 new genes: IKZF3, ARHGAP35, BAX, ASPM, GSK3B, NTRK2, SOX2
- New resistance gene-drug pair IDH1-Ivosidenib: 14 new samples, 9 new resistance mutations
- Cancer Gene Census: 3 new Tier 1 genes (HGF, RAD50, RRAS2) and 3 new Tier 2 genes (GSK3B, MUC6, RAP1B) have been added
- We have created cancer hallmark annotations for each of 8 Cancer Gene Census Tier 1 genes (SRC, SRSF2, STAT3, STAT5B, STK11, SUFU, TBX3, TNFRSF14)
- 32 new systematic screen papers.
- Cancer Mutation Census data is also updated to align with the latest COSMIC data (v99).
- COSMIC-3D is also updated to align with the latest COSMIC data (v99), with 279 new genes mapped to PDB structures, 1,698 more mapped PDB structures and 9 new census gene structures.
- Updates in the new download files: 20 missing sample features (e.g. age, grade, drug response) have been added to the COSMIC_samples and Cell_lines sample files. The mutation somatic status column has also been added to the COSMIC and Cell Lines download files. As a reminder, we are supporting the legacy COSMIC downloads for a whole year (until May 2024).
- New features on the download web page: Scripted and Filtered download features (similar to those for the archive downloads) have been added for the new download files, with an enhanced user experience.
Website Changes
We have added the newly styled Scripted and Filtered downloads onto the Download pages.
Scripted download - This feature allows the products to be downloaded programmatically using the command line or scripts. Once you select the "Scripted download" option in the pop-up window, detailed help with examples is provided on the web page to guide you through the process.
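As a rough illustration of what a scripted download can look like from Python, the sketch below prepares an authenticated request without sending it. It assumes the download endpoint accepts an HTTP Basic-style Authorization header built from your COSMIC email and password; that scheme, the function names and the placeholder URL are all assumptions made for illustration, and the help text shown in the "Scripted download" pop-up remains the authoritative description.

```python
import base64
import urllib.request


def basic_auth_token(email: str, password: str) -> str:
    """Base64-encode 'email:password' for an HTTP Basic Authorization header."""
    raw = f"{email}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_download_request(url: str, email: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated request for a download URL.

    The URL is supplied by the caller; copy the real one from the
    "Scripted download" help on the Download page.
    """
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {basic_auth_token(email, password)}")
    return request


# Hypothetical usage (placeholder URL and credentials, nothing is sent here):
# request = build_download_request("https://example.org/placeholder", "me@example.org", "secret")
# with urllib.request.urlopen(request) as response:
#     payload = response.read()
```

Keeping the request construction separate from sending it makes the credentials handling easy to check before any data is transferred.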
Filtered download - This feature allows a small subset of a product's data to be downloaded. A product can be filtered on Gene symbol, Sample name, and Primary Site (Cancer). The available filters vary by product, depending on which of these fields the product file contains. For example, the Genome Screen Mutants product can be filtered on all 3 categories, whilst the Cancer Gene Census product can only be filtered on Gene symbol, as this product file doesn't include Sample Name or Primary Site. More details and help are provided on the Download pages.
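The filtering behaviour described above can be mimicked locally on a tab-separated file, which may help when checking what a filtered download should contain. This is a minimal sketch rather than the website's implementation: the column names GENE_SYMBOL, SAMPLE_NAME and PRIMARY_SITE are assumed for illustration and should be checked against the header of the actual file, and the rows are toy data, not COSMIC records.

```python
import csv
import io

# Toy data only; sample names and sites are invented for the example.
MUTANTS_TOY = (
    "GENE_SYMBOL\tSAMPLE_NAME\tPRIMARY_SITE\n"
    "IDH1\tS1\tbrain\n"
    "IDH1\tS2\tbone\n"
    "BAX\tS3\tstomach\n"
)
CENSUS_TOY = "GENE_SYMBOL\nIDH1\nBAX\n"


def filter_rows(tsv_text, gene_symbol=None, sample_name=None, primary_site=None):
    """Return rows matching every filter that was supplied.

    Raises ValueError if a requested filter has no matching column,
    mirroring the rule that a product can only be filtered on fields
    its file actually contains.
    """
    wanted = {
        "GENE_SYMBOL": gene_symbol,
        "SAMPLE_NAME": sample_name,
        "PRIMARY_SITE": primary_site,
    }
    active = {col: val for col, val in wanted.items() if val is not None}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [col for col in active if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"file has no column(s): {', '.join(missing)}")
    return [row for row in reader if all(row[col] == val for col, val in active.items())]


idh1_rows = filter_rows(MUTANTS_TOY, gene_symbol="IDH1")
idh1_bone = filter_rows(MUTANTS_TOY, gene_symbol="IDH1", primary_site="bone")
```

With the toy census file, `filter_rows(CENSUS_TOY, gene_symbol="BAX")` succeeds, while asking for a sample name raises `ValueError` because that column is absent.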
In future releases, we are aiming to expand the options for filtered categories.
To help with the transition to the new files, we are supporting the legacy downloads for v99 until May 2024; thereafter these downloads will be discontinued. Until then, access to the legacy downloads is available from the “Archive Downloads” page.
We value your feedback on the new Download page and download files. Please help us as we work to improve the usability and accessibility of COSMIC data by sending your thoughts to cosmic@sanger.ac.uk
Download File Updates
We have added missing data fields in the new download files.
- 20 missing sample features have been added to the Cosmic_Sample and CellLinesProject_Sample files: Age, Therapy Relationship, Sample Differentiator, Mutation Allele specification, MSI, Average ploidy, Sample Remark, Drug response, Grade, Age at tumour recurrence, Stage, Cytogenetics, Metastatic site, Tumour remark, Ethnicity, Environmental variables, Germline mutation, Therapy, Family, Individual remark.
- The Mutation somatic status column has been added to Cosmic_CompleteTargetedScreensMutant, Cosmic_GenomeScreensMutant, Cosmic_MutantCensus, Cosmic_NonCodingVariants, Cosmic_ResistanceMutations, CellLinesProject_GenomeScreensMutant and CellLinesProject_NonCodingVariants.
A complete list of changes in all the files is available on the Download page.
Curated Genes
IKZF3 (IKAROS family zinc finger 3), a haemopoietic zinc finger DNA-binding protein, is a central regulator of lymphoid differentiation and is implicated in leukaemogenesis. IKZF3 was identified as a CLL driver gene, with recurrent L162R substitutions (11, 2.0%) targeting a highly conserved amino acid (COSP40730). Moreover, the same hotspot mutation has been identified in diffuse large B cell lymphoma (DLBCL) and mantle cell lymphoma, suggesting a critical role in the malignant transformation of B cells. Adult low-hypodiploid acute B-lymphoblastic leukaemia with IKZF3 deletion has also been reported. IKZF3 gene fusions have been identified in colorectal (PMID:29955133) and breast cancer (PMID:21247443).
ARHGAP35 codes for a Rho GTPase-activating protein and it ranks among the top ∼30 most significantly mutated genes in human cancers. ARHGAP35 is frequently mutated in epithelial tumours, and the high proportion of inactivating mutations coupled with functional evidence indicates that it is a tumour suppressor gene in endometrial carcinoma. We have 2,262 new samples tested for ARHGAP35, and 142 of these had mutations, mostly missense substitutions, followed by synonymous and nonsense.
The BAX gene encodes the BCL2 Associated X, apoptosis regulator protein and is a member of the BCL2 protein family. It forms a heterodimer with the BCL2 protein, with the ratio of BAX to BCL2 determining the death or survival of a cell following an apoptotic stimulus. BAX mutations have been implicated in many cancers but have a particular prevalence in colorectal cancer, endometrial cancer and haematopoietic and lymphoid neoplasms. Many mutations occur within a poly(G)8 tract within exon 3 and are associated with microsatellite instability (MSI): around 90% of the new mutations curated for BAX were insertions or deletions in this region, with the majority being involved in cancers of the stomach and intestines. However, missense mutations in other parts of the BAX gene have also been curated in a broader spectrum of cancers including the haematopoietic or lymphoid cancers, cancers of the skin, especially malignant melanomas, and liver and breast cancers.
NTRK2 encodes a receptor tyrosine kinase that is involved in neural cell differentiation, survival, and proliferation. A common mechanism of NTRK2 oncogenic activation involves fusion of the tyrosine kinase domain to an N-terminal portion donated by various partner genes, leading to the production of a chimaeric, constitutively activated receptor. NTRK2 gene fusions have been reported in glioma, HNSCC and lung adenocarcinoma. Some somatic NTRK2 SNVs have been detected in a variety of tumour types (COSP51285, 8958, 50832).
SOX2 is a member of the Sox family of transcription factors that are essential for many aspects of mammalian development. Normally, SOX2 maintains the pluripotency of embryonic stem cells. In cancer, it functions as an oncogene, such that amplifications of SOX2 are known to induce hyperplasia leading to neoplasms. As the main driver mechanism for SOX2 is amplification rather than mutation, we have limited somatic mutation data for this gene; nevertheless, it is now manually curated as a Tier 1 gene.
ASPM encodes a large microtubule-binding protein that plays an important role in neurogenesis and cell proliferation. The gene is frequently affected by somatic mutation (predominantly missense) in lung adenocarcinoma (PMID:25079552) and endometrioid carcinoma (PMID:23636398). ASPM is also overexpressed in many types of cancer, where it correlates with tumour progression and poor clinical prognosis.
GSK3B encodes a serine-threonine kinase that is a negative regulator of many physiological processes, including glycogen metabolism, neuronal function, and microtubule dynamics. We have 1,959 new tested samples for GSK3B; only 18 had mutations. Increased protein-level expression is frequently observed in several cancer types, including NSCLC (PMID:24618715), and decreased protein expression in cutaneous squamous cell carcinoma and basal cell carcinoma (PMID:17699780). Functional evidence suggests that the gene's role in cancer is cell type-specific. However, it has been implicated in cancers that are resistant to chemo-, radio-, and targeted therapy (PMID:21881296).
Drug Resistance
IDH1-Ivosidenib; IDH2-Enasidenib
Mutations in IDH1 and IDH2 can drive several types of cancer, including leukaemias, lymphomas, gliomas and some bone cancers, amongst others. The most common driver mutations are IDH1 R132H and IDH2 R140Q/R172K. The small molecule inhibitor drugs ivosidenib and enasidenib target these mutations in IDH1 and IDH2, respectively. While these drugs tend to deliver better outcomes, it is inevitable that their use also drives the development of secondary resistance mutations. These predominantly include secondary IDH mutations and isoform switching between mutant IDH1 and IDH2. We have compiled data from multiple publications capturing patient response to treatment and the subsequent development of resistance, including several new resistance mutations.
Cancer Gene Census (CGC)
The Cancer Gene Census (CGC) has been compiled over a 19-year period and is subject to periodic review. This ensures that gene assignment to the Census reflects the latest evidence indicative of the strength of a causal association between a gene and one, or more, cancer types, and consistency in the application of the COSMIC inclusion criteria for CGC Tier 1 and Tier 2 assignment.
Following a recent review, TSHR has been re-assigned from Tier 1 of the Census to Tier 2, and its previous designation as an oncogene rescinded. TSHR is notably affected by recurrent missense mutation (in particular p.T632I and p.M453T) in ~33% of toxic thyroid adenoma, a benign tumour type which progresses to carcinoma in 1 - 10% of cases. Although 39% of the missense mutations occur in a mutation hotspot encompassing codons 630 - 633, there is a paucity of experimental evidence demonstrating functionally how TSHR may contribute to oncogenic transformation. In particular, ‘avoiding immune destruction’ is the only Hallmark of Cancer that wild type TSHR has been shown to promote.
New Census Genes (Tier 1)
HGF (Hepatocyte growth factor)
A growth factor for a broad spectrum of tissues and cell types, and a ligand for MET Proto-Oncogene, Receptor Tyrosine Kinase.
Somatic alterations: Amplified in 10.5%, and affected by missense mutation in 4.4%, of lung adenocarcinomas. Gained/amplified in 25.4% of breast tumours in pre-menopausal patients, and in 33.7% of breast tumours in post-menopausal patients. Promoter activity increase-associated truncating mutations occur in a promoter 30b poly(dA) transcriptional repressor sequence in 51.4% of African-American and 15.1% of European patient breast tumours. HGF-CACNA2D1 fusion in multiple myeloma.
Germline alterations: Germline promoter poly(dA) tract ≥3b truncating mutations affect 50% of bladder cancer patients and 24.2% of healthy controls.
Functional evidence: NSCLC and breast carcinoma cells cultured in the presence of HGF display increased migration and invasion. Mammary epithelium-specific expression in transgenic mice (during pregnancy and lactation) leads to the development of mammary carcinomas (89.1% of mice), and lung metastases (21.8% of mice).
RAD50 (RAD50 double strand break repair protein)
Part of the MRN complex, involved in DNA double-strand break repair, recombination and telomere maintenance.
Somatic alterations: Encompassed in 5q11-35 deletions, associated with decreased mRNA expression, in 50% - 60% of basal-like subtype breast cancers.
Germline alterations: Predicted pathogenic germline variants (including nonsense and frameshift-indels) associated with breast and ovarian cancer susceptibility.
Functional evidence: Deletion in BRCA1/2 +-type ovarian cancers correlates with increased genome instability. Knockdown in an ovarian serous adenocarcinoma cell line causes irregular mitotic chromosome segregation and increases aneuploidy.
RRAS2 (RAS related 2)
A member of the R-Ras subfamily of Ras-like small GTPases. Involved in signal transduction within the MAPK signalling pathway.
Somatic alterations: Missense mutations (p.G23A/D/S/V, p.G24C/D) in 12.5 - 13.6% of intracranial germ cell tumour subtypes.
Functional evidence: Expression of p.G23 (A, C, S, V) and p.G24 (C, D, V) mutant genes transforms mouse NIH 3T3 embryonic fibroblasts. Systemic expression of RRAS2-p.Q72L (occurs in several cancer types, most frequently endometrioid carcinoma) leads to the development of multiple cancer types in knock-in mice.
New Census Genes (Tier 2)
GSK3B (Glycogen Synthase Kinase 3 Beta)
A serine-threonine kinase (constitutively active in the basal state, but inactivated by p.S9-phosphorylation by kinases in various signalling pathways) that is a negative regulator of many physiological processes, including glycogen metabolism.
Somatic alterations: Mutated in 3.6% of endometrial cancers. Increased level of protein expression and of active GSK3B-pY216 in colorectal cancer and pancreatic cancer. Decreased protein expression in cutaneous squamous cell carcinoma and basal cell carcinoma.
Functional evidence: Knockdown in pancreatic cancer leads to increased apoptosis, and decreases both mouse xenograft tumour growth and angiogenesis. Knockdown increases the proliferation of cholangiocarcinoma cells, and melanoma cells.
MUC6 (Mucin 6, oligomeric mucus/gel-forming)
A secreted 2,439 amino acid-glycoprotein that forms an insoluble mucous barrier to protect epithelial surfaces, including the gut lumen.
Somatic alterations: Mutated (frameshift and in-frame indels, missense) in 6.0 - 9.8% of non-hypermutated gastric cancers.
Germline alterations: Minisatellite MS5 allelic variants are associated with gastric and rectal cancer.
Functional evidence: Knockdown in foetal gastric epithelial cell line GES-1 increases cell migration and invasion. Expression in pancreatic ductal adenocarcinoma cell line MIA PaCa-2 decreases cell proliferation, migration and invasion.
RAP1B (RAP1B, member of RAS oncogene family)
A GTPase that (1) stimulates BRAF to activate MAPK signalling, (2) modulates adhesion and signalling functions of integrins and cadherins, and (3) positively regulates angiogenesis during development.
Somatic alterations: Amplified (and overexpressed) in a subset of high grade gliomas.
Functional evidence: Knockdown in glioma cells decreases cell proliferation and invasion, and increases apoptosis.
Hallmarks of Cancer
Hallmarks of cancer annotations summarise the effect of Cancer Gene Census Tier 1 genes on the phenotypic traits shared by cancers. COSMIC v99 includes Hallmark gene pages for an additional 8 genes (SRC, SRSF2, STAT3, STAT5B, STK11, SUFU, TBX3, TNFRSF14).
In addition to hallmarks of cancer annotations, each Hallmark gene page summarises the role of a gene in cancer, how it is affected by somatic and germline alteration in cancer, and how it affects other biological processes relevant to cancer.
SRC, SRSF2, STAT3, STAT5B, STK11, SUFU, TBX3, TNFRSF14
Systematic Screen Papers
Follow the links below to the 32 papers that are new in v99, or view the full table of papers here.
COSP40589, COSP40973, COSP41798, COSP42376, COSP43057, COSP43418, COSP43792, COSP44286, COSP45330, COSP46529, COSP47258, COSP49332, COSP49700, COSP49702, COSP49708, COSP49809, COSP50112, COSP50657, COSP50778, COSP50881, COSP50950, COSP50952, COSP51034, COSP51081, COSP51164, COSP51214, COSP51234, COSP51302, COSP51433, COSP51448, COSP51450, COSP51532
COSMIC Statistics
- Total genomic variants (COSV): 24,292,168 (+438,063)
- Genomic non-coding variants: 16,579,554 (+274,853)
- Genomic mutations within exons (coding variants): 5,286,735 (+208,168)
- Genomic mutations within intronic and other intragenic regions: 9,069,262 (+160,598)
- Samples: 1,527,131 (+6,810)
- Papers: 29,230 (+206)
- Fusions: 19,428 (+0)
- Whole genome screen samples: 43,822 (+1,303)
- Copy number variants: 1,207,190 (+0)
- Gene expression variants: 9,215,470 (+0)
- Differentially methylated CpGs: 7,930,489 (+0)
COSMIC-3D
COSMIC-3D data has been updated for the v99 release. These are the key updates:
- 9 Cancer Gene Census genes now have mapped structures (FAT4, HGF, MPL, NUTM1, RAD50, RAD51B, RRAS2, SDHC, SDHD).
- 279 new genes map to a PDB structure, bringing the total number of genes with structures to 8,214.
- 1,698 new PDB structures have also been added, increasing the total number of mapped protein structures (PDB ids) from 53,312 to 55,010.
Mutational Signatures
We are also thrilled to announce the release of COSMIC Mutational Signatures, version 3.4. In this release, we introduce the curation of mutational signatures from two new variant classes: structural variants (SV1-10) and RNA-SBS variants (RNA-SBS1 through RNA-SBS5). The former describes large genomic changes resulting from chromosome rearrangements, while the latter enables precise inference of the patterns of nucleotide changes due to RNA editing.
Additionally, we have expanded our existing catalogue with newly extracted signatures, including SBS signatures (SBS96-99), DBS signatures (DBS12-20), ID signatures (ID19-23), and CN signatures (CN25). Finally, we have refined our reference set of mutational signatures by splitting SBS22 into SBS22a and SBS22b, and SBS40 into SBS40a, SBS40b, and SBS40c. Consequently, SBS22 and SBS40 are deprecated in version 3.4.
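For analyses that still refer to the older signature names, a small lookup can flag which entries are affected by the splits described above. The sketch below is illustrative only: the function and variable names are hypothetical, and it reports the v3.4 replacements without reassigning anything, since deciding between, say, SBS40a, SBS40b and SBS40c requires refitting against the v3.4 reference set.

```python
# Signatures split (and therefore deprecated) in Mutational Signatures v3.4,
# as listed in the release notes above.
SPLIT_IN_V3_4 = {
    "SBS22": ("SBS22a", "SBS22b"),
    "SBS40": ("SBS40a", "SBS40b", "SBS40c"),
}


def deprecated_signatures(names):
    """Map each deprecated name found in `names` to its v3.4 replacements."""
    return {name: SPLIT_IN_V3_4[name] for name in names if name in SPLIT_IN_V3_4}


# Hypothetical list of signature names taken from an older analysis.
flagged = deprecated_signatures(["SBS1", "SBS40", "SBS22a"])
```

Here only SBS40 is flagged; SBS22a is already a v3.4 name and SBS1 was not split.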
For more details please see the signatures website.