COSMIC v82 (August 2017) includes 4 new fully curated genes, a substantial curation update for SMAD4, 1 new fusion pair, 342 genomes from 11 new systematic screen papers, updates from ICGC release 24, and updated resistance mutation data; 1 new drug and 4 updated. We also launch the new COSMIC website featuring new styles and layout as well as an enhanced version of the Cancer Gene Census and additional website download options.
The new COSMIC website has now been launched. We welcome your feedback; please email firstname.lastname@example.org with any issues or suggestions for improvement.
The old websites have been updated to v82 and will continue as the legacy website and GRCh37 (archive) legacy website. These will be available until the next release in November 2017, but we do not plan to maintain them beyond that date. However, we will continue to provide our download files as both GRCh38 and GRCh37 versions for the foreseeable future.
New features include -
For users who download the COSMIC Oracle database dumps, please note that we now only support Oracle 12c. This is because Oracle 11.2 is no longer supported by Oracle.
COSMIC's cancer genome data is interpreted into standardised annotations from a variety of sources, described here.
Kelch-like ECH-associated protein 1 (KEAP1) is a component of the Cullin 3-based E3 ubiquitin ligase complex and controls the stability and accumulation of NRF2 protein. When cells are exposed to oxidative damage, KEAP1 releases NRF2 which translocates into the nucleus where it specifically recognises an enhancer sequence known as Antioxidant Response Element (ARE) resulting in the activation of redox balancing genes. Several studies have reported somatic mutations of the interacting domain between KEAP1 and NRF2 leading to a permanent NRF2 activation. Somatic mutations of the KEAP1 gene are found in non-small cell lung cancer, hepatocellular carcinoma, endometrial cancer, melanoma and many other cancer types and have been associated with a poor outcome and resistance to chemotherapy. The mutations are generally widely distributed in the KEAP1 gene and the frequency of mutations depends on the cancer type and origin.
microRNAs (miRNA) are vital regulators of gene expression. Together with its co-factor DGCR8, the miRNA processing gene DROSHA (drosha ribonuclease III) is involved in the early stages of miRNA processing and is essential for the biogenesis of most miRNAs. Low DROSHA expression levels are observed in several cancer types, including neuroblastoma, endometrial and ovarian cancer, and are associated with advanced stages of several cancer types. In contrast, copy number increases (seen in advanced cervical squamous cell carcinoma) and over-expression are observed in other cancer types, including serous ovarian carcinoma, gastric and non-small cell lung cancers, often associated with prognosis or progression. DROSHA is frequently mutated in Wilms tumour, with the majority of mutations found in the RNase IIIb domain, at p.E1147. The recurrent mutation p.E1147K affects miRNA processing via a dominant-negative mechanism, resulting in downregulation of miRNAs.
BTK encodes Bruton tyrosine kinase, a TEC family cytoplasmic tyrosine kinase required for the development, activation and differentiation of B cells, and an early component of the B-cell receptor signalling pathway. Recurrent mutations at BTK C481 have been identified in patients with chronic lymphocytic leukaemia (CLL) who have progressed after an initial response to ibrutinib treatment. Ibrutinib is a highly specific BTK inhibitor, inactivating by irreversible binding to C481 within the ATP-binding domain of BTK. While C481 mutations are most common among CLL patients who progress on ibrutinib, mutations at the non-kinase SH2 domain at T316 have also been reported. Progression of mantle cell lymphoma after a durable response to ibrutinib may also be due to C481 BTK mutation. This same mutation has also been detected in Waldenstrom macroglobulinaemia patients progressing on ibrutinib.
Hypoxia-inducible factors (HIFs) are transcription factors that respond to changes in tissue oxygen concentration. One of these, Hypoxia-inducible factor 2-alpha (HIF-2-alpha), is encoded by EPAS1. Somatic mutations in EPAS1 occur recurrently in sporadic pheochromocytomas and paragangliomas, as well as in somatostatinomas as part of Pacak-Zhuang syndrome (multiple paragangliomas and somatostatinomas associated with polycythaemia). In some patients with multiple tumours, these somatic EPAS1 mutations are mosaic, having arisen post-zygotically. The majority of somatic EPAS1 mutations are found in exon 12, and gain of function mutations in this region have been shown to cause stabilisation of the HIF2A protein, resulting in transcription of genes involved in the hypoxia response and promotion of angiogenesis and proliferation.
The expertly curated data for SMAD4 have been updated. Over 40 publications which include screening of SMAD4, often alongside other genes, are included in this release. SMAD4 encodes a member of the Smad family of signal transduction proteins which plays a pivotal role in signal transduction of the transforming growth factor beta superfamily cytokines by mediating transcriptional activation of target genes. SMAD4, a tumour suppressor gene, is one of the major driver genes in pancreatic cancer. A lack of SMAD4 mutations in high-grade pancreatic intraepithelial neoplasia, the major precursor of pancreatic ductal adenocarcinoma, indicates these are late genetic alterations in pancreatic carcinoma. SMAD4 mutations are also found in colorectal carcinoma (CRC), where they have a prognostic role in metastatic CRC cases, and less frequently in other tumours, including lung cancer.
The SET-NUP214 fusion results from a recurrent genetic abnormality at 9q34 and is found predominantly in T-cell acute lymphoblastic leukaemia (T-ALL), with a reported frequency of up to 10%. The fusion is rarely detected in acute myeloid leukaemia, acute undifferentiated leukaemia and B-cell acute lymphoblastic leukaemia. In T-ALL, the SET-NUP214 fusion is associated with elevated expression of HOXA cluster genes and with corticosteroid/chemotherapy resistance. SET encodes a protein with a critical role in chromatin binding and remodelling, while NUP214 encodes an FG-repeat-containing nucleoporin involved in the cell cycle and transportation of material between the nucleus and cytoplasm. Most commonly the breakpoints in the SET-NUP214 transcript are at exon 7 of SET and exon 18 of NUP214.
Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the Cancer Gene Census. New Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is available for 226 census genes and will be expanded on a regular basis.
All Cancer Gene Census (CGC) genes have been re-evaluated and classified with regard to their function in cancer, as oncogenes or tumour suppressor genes, as well as genes participating in fusions, where applicable.
To be able to provide high-confidence and comprehensive data, we have divided the CGC into two tiers. Currently, only the Tier 1 genes are shown on the website and in the download files.
To classify into Tier 1 of the CGC, a gene must possess a documented activity that may drive or suppress cancer, and there must be evidence of mutations in this gene, detected in cancer, and changing the activity of the protein in a way that promotes the oncogenic transformation. We also take into account the existence of somatic mutation patterns in cancer samples, typical for tumour suppressor genes (broad range of inactivating mutations) or oncogenes (well defined hotspots of missense mutations).
Tier 2 of the Cancer Gene Census consists of genes with strong indications of roles in cancer but with less expansive available evidence, compared to Tier 1. It currently contains 41 genes from the previous release of the Cancer Gene Census and is being expanded, with a planned initial release of about 200 genes in November 2017, along with COSMIC v83.
Complete census list (Tier 1) is available here
CGC Genes moved to Tier 2 of the Cancer Gene Census
1. PMS1 - PMS1, a component of DDR, only one recurrent frameshift mutation K164fs*6 in four samples, newer papers about MMR genes in cancer don't mention this gene, mice deficient in PMS1 do not develop tumours, no evidence for significant activity in MMR in vitro [PMID: 10542278]
2. Fusion genes with only one case (or rare partners of potent oncogenes known to be fused to multiple partners and able to drive the transformation on their own):
3. Fusion genes transcribed with a shifted reading frame or untranscribed upon fusion, for which there is no sufficient evidence for tumour suppressing activity:
4. Non-coding genes and pseudogenes do not fit the current schema of Tier 1 of the Cancer Gene Census. We are working on better characterisation of the role of such genes in cancer. Temporarily they are classified as Tier 2 CGC genes:
5. Genes known to be involved in cancer only through fusions, where the oncogenic mechanisms depend on disruption of the structure of their fusion partner and there is no evidence of their other cancer-promoting activity so far:
6. Genes known to be involved in cancer only through fusions, for which there is not enough data describing their participation in oncogenic transformation
Genes removed from the Cancer Gene Census (Tier 1 and 2)
In total, the following 49 genes have been removed from Tier 1 of the CGC:
New ICGC Studies:
New Copy number data:
Follow links below to the 11 papers which are new in v82, or view the full table of papers here.
COSMIC v81 (May 2017) includes 6 new fully curated genes, a substantial curation update for TET2, 1 new fusion pair, 220 genomes from 9 new systematic screen papers and updated resistance mutation data; 1 new drug and 5 updated. We also announce the launch of a new COSMIC beta site featuring new styles and layout as well as an enhanced version of the Cancer Gene Census and additional website download options.
The new COSMIC Beta site http://cancer-beta.sanger.ac.uk has now been launched. This site will be updated regularly over the next 3 months. We welcome your feedback; please email email@example.com with any issues or suggestions for improvement.
For users who download the COSMIC Oracle database dumps, please note that from v82 we will only support Oracle 12c. This is because Oracle 11.2 is no longer supported by Oracle.
Oncogenic gain-of-function mutations in DDR2 have been identified in squamous cell carcinoma (SqCC) of the lung. DDR2 encodes the discoidin domain receptor 2, a collagen-stimulated receptor tyrosine kinase. These kinases are involved in the regulation of cell differentiation, cell migration and cell proliferation. DDR2 mutations are present in 4% of lung SqCC where they are associated with sensitivity to dasatinib. Low frequency DDR2 mutations have been found in other cancer types such as endometrial, kidney, brain, breast and colorectal, and in recurrent/metastatic head-neck SqCC.
Mutations in SMAD2 and SMAD3 occur at very low frequency in various cancer types. SMAD2 mutations have been found in cervical and colorectal cancer, hepatocellular carcinoma and non-small cell lung cancer. SMAD3 mutations have been detected in colorectal cancer and in oral squamous cell carcinoma. Most of the mutations observed are missense mutations. Both SMAD2 and SMAD3 encode proteins which are major signalling molecules acting downstream of the serine/threonine kinase receptors.
NCOR1 (nuclear receptor corepressor 1) plays a part in maintenance of genomic integrity. It has been reported among the most frequently mutated drivers in breast cancer. Downregulation of NCOR1 expression abrogates HDAC3 function and results in genomic instability. Breast cancer patients with high NCOR1 expression levels have been found to have a better prognosis than those with low expression (Zhang et al., 2005). NCOR1 mutations also play a role in skin cancer, colorectal carcinoma and many other cancer types. Predicted damaging and somatic mutations in epigenetic regulators were detected in one third of high hyperdiploid acute lymphoblastic leukaemia (HD-ALL) patients (de Smith AJ 2016).
Protein phosphatase, Mg2+/Mn2+-dependent, 1D (PPM1D) encodes WIP1, a member of the PP2C family of serine/threonine protein phosphatases. PPM1D dephosphorylates DNA damage response mediators such as CHEK2 and p53, antagonising their function and promoting reentry into the cell cycle. Recurrent PPM1D mutations have been observed in brainstem gliomas, with many of these resulting in truncation of the C-terminal regulatory domain and leaving the phosphatase domain intact.
Mutations across the PREX2 gene, including numerous truncating mutations, have been found in metastatic melanoma, including in desmoplastic melanoma, and also in other cancers such as basal cell carcinoma, pancreatic ductal and lung adenocarcinomas, and Merkel cell carcinoma. PREX2 has been recognised as playing a role in melanoma for some years, although the precise nature of all the mechanisms of its involvement remains uncertain. Some in vivo and mouse studies have demonstrated that cancer-associated PREX2 mutations promote the growth of human melanoma cells. It is a GTP/GDP exchange factor and both mutated and wild type PREX2 inhibit the tumour suppressor PTEN, but PTEN can no longer inhibit mutated PREX2, hence mutual inhibition is disrupted, promoting tumour growth via activation of the PI3K signalling pathway. Increased RAC-dependent invasiveness is also associated with mutated PREX2.
TET2 (ten-eleven-translocation gene) is an epigenetic regulator responsible for converting DNA cytosine methylation to hydroxymethylation, a process disrupted by mutations which are known to be associated with myeloproliferative neoplasms (MPN), leukaemias and mastocytosis. An update of 46 publications which included screening of TET2, often alongside other genes or gene panels, has been carried out. Overall 2,027 new samples were curated, which identified 277 new mutations of all types, located across the gene. Publications included reports of many haematopoietic and lymphoid disorders, as well as 2 where solid cancers progressed following hormone or tyrosine kinase therapy. One of these publications reported TET2 mutations associated with metastatic prostate cancer after hormone therapy and the second publication reported 12% TET2 mutated samples in non-small cell lung cancer progressions following tyrosine kinase therapy. MPN publications curated include those where TET2 was found associated with progression, and chronic myelomonocytic leukaemia, where mutated TET2 was predictive of inferior prognosis when co-occurring with ASXL1 mutation; myelodysplastic syndrome (MDS) and chronic eosinophilic leukaemia (CEL), including a report where mutated TET2 could help distinguish MDS/CEL from reactive disorders and hypereosinophilic syndrome respectively. Leukaemia publications include HTLV-1 associated adult T cell leukaemia/lymphoma (with TET2 as the most commonly mutated gene); angioimmunoblastic T cell leukaemia and peripheral T cell leukaemia, where TET2 mutations are associated with shorter PFS; and somatic TET2 mutation associated with AML in a family with familial platelet disorder.
Complete census list available here
Based on the concept defined by D. Hanahan and R. A. Weinberg, COSMIC, in collaboration with Open Targets, integrates functional descriptions focused on Hallmarks of Cancer into the Cancer Gene Census. New Hallmark pages visually explain the role of a gene in cancer by highlighting which of the classic behaviours are displayed by the gene and whether they are promoted or suppressed. A concise overview with associated references is initially available for 116 census genes and will be expanded on a regular basis.
Follow links below to the 9 papers which are new in v81, or view the full table of papers here.
SNVs and indels have also been uploaded from a Colorectal Cancer Organoids study from the suppresSTEM consortium: COSU670
COSMIC v80 (Feb 2017) includes a major new tool "COSMIC-3D" supporting target characterisation and pharmaceutical design alongside significant updates to our cancer genome and key cancer gene curations.
We have a new interface to explore cancer mutations on 3D protein structures, "COSMIC-3D", now available for public evaluation. Produced in partnership with Astex Pharmaceuticals (Cambridge, UK), it shows interactive 3D visualisations of over 8000 human proteins (using PDB structures), with COSMIC mutations mapped, and options to see frequency and effect. Putative small-molecule drug pockets are identified, and can be explored alongside cancer mutations to identify, characterise and design molecules against new targets across oncology. All the information is correct, but as a beta-evaluation release we would value your feedback on the web interface, so we can make it as useful as possible.
In our traditional way, full and exhaustive literature curations are now provided across cancer genes USP8, FAT1, FAT4, CXCR4 and fusion pair PML-RARA; substantial curation updates are made to AR and CTNNB1 and the Cancer Gene Census describes 7 new genes. Genome-wide molecular profiles have been curated from the ICGC (release 23, Oct 7th 2016) and 421 new genomes have been added by curation of 18 systematic screen publications. For full details of the new content in v80 please see the Datasheet.
We use recommendations from the HGVS for syntax when annotating the data within COSMIC. As part of our ongoing commitment to data quality we are currently in the process of ensuring all our mutation data are described in the most modern ways, including the latest HGVS nomenclature and gene structures. Over the last 6 months we have been working on a new system to continually annotate COSMIC data to the latest standards. Of course, to ensure the new annotations are exactly correct, we are including expert manual oversight, so it takes a little time to completely validate our huge dataset. Once we have verified the precision of our system, it will be deployed in forthcoming releases.
For more information about release v80 and other news please see the first issue of our Newsletter. We will be using this to communicate with you more frequently about the project and the exciting developments we have in the pipeline. This issue includes details about the COSMIC Workshop on March 6th and the beta release of COSMIC-3D.
COSMIC v79 (Nov 2016) includes substantial updates to our cancer genome and key cancer gene curations. Full literature curations are now provided across cancer genes PRKACA and AR, and fusion pair CBFA2T3-GLIS2; substantial curation updates are made, especially to GNAS, GNAQ, and GNA11, and the Cancer Gene Census describes 7 new genes. Genome-wide molecular profiles have been curated from the ICGC (release 22, Aug 2016) and 265 new genomes have been added by curation of 9 systematic screen publications. A new drug, Vismodegib, has been added to our Genetics of Drug Resistance, describing 19 therapy-resistance variants in the gene SMO.
Data Updates in brief (for full details of this latest release, please see the v79 Datasheet).
We now include drug resistance data for the gene SMO (Vismodegib) as well as updates for EGFR (Gefitinib, Erlotinib and Afatinib), ESR1 (Endocrine therapy) and ALK (Alectinib).
All drug resistance data is detailed here, describing our curations across 11 genes and 21 pharmaceuticals. Links are provided to explore this information in detail, with charts showing the landscape of resistance to drugs targeting mutations in the gene of interest.
7 genes have been added to the Cancer Gene Census: EPAS1, PTPRT, PPM1D, BTK, PREX2, TP63, QKI
The complete list is available in the census table, which describes the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 244 census genes. This content, as well as additional functional annotation, is being substantially expanded for future releases.
COSMIC data have been combined with the ProteinPaint data mining and visualization system at St. Jude Children's Research Hospital in Memphis TN, to support the discovery and understanding of genetic mutations in paediatric cancers.
On Monday 6th March 2017 we are holding a workshop titled 'An introduction to COSMIC' at the Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
The course will begin with a presentation overviewing the COSMIC project, followed by a hands-on tutorial introducing the COSMIC website and strategies for exploring cancer variation data and investigating the genetic causes of human cancers. In addition, there will be short presentations describing exciting new developments scheduled for future release, and opportunities to engage the team in a group Q&A session and informal discussions about the COSMIC website and future plans.
Registration will open in January, but please email firstname.lastname@example.org if you would like more information or wish to express an interest in attending.
If you would be interested in hosting a COSMIC workshop at your workplace, we would be very pleased to hear from you. Please contact the COSMIC helpdesk (email@example.com)
We are planning to merge the functionality of the COSMIC and Whole Genomes websites in February 2017 (v80). We will be introducing a new 'whole genomes' filter on the gene and cancer browser pages, and as a consequence the Whole Genomes site will become redundant and will be retired.
An API and new web interfaces for downloading COSMIC data will also be developed and rolled out in 2017. As part of these developments, and due to incompatibility between BioMart (0.7) and the latest version of our Oracle databases, we are discontinuing support for the COSMICMart in this release.
If you have any questions about these changes please email the COSMIC helpdesk (firstname.lastname@example.org).
COSMIC has been updated significantly in v78 (Sept 2016). This major data release includes new full literature curations of cancer genes HIF1A, MTOR and PTPN13, drug resistance profiles across Sorafenib & Quizartinib, and a complete update of genome-wide analysis from the ICGC (release 21, May 2016). We have also added 9 new genes to the Cancer Gene Census, and fully re-analysed the copy number data across all TCGA samples using the ASCAT2 algorithm.
Data Updates in brief (for full details of this latest release, please see the v78 Datasheet).
New in v78: FLT3 with the drugs Quizartinib and Sorafenib, detailing a total of 76 new unique resistance mutations.
All drug resistance data is now detailed here, describing our curations across 10 genes and 20 pharmaceuticals. Links are provided to explore this information in detail, with charts showing the landscape of resistance to drugs targeting mutations in the gene of interest.
9 genes have been added to the Cancer Gene Census: DDR2, MAPK1, BCORL1, KEAP1, LRP1B, DROSHA, B2M, DDX3X, APOBEC3B
The complete list is available in the census table, which describes the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 237 census genes. This content, as well as additional functional annotation, is being substantially expanded for future releases.
Over time we have added filters aimed at selecting those variants within the cell lines that are more likely to contribute to carcinogenesis. These have included the ability to select variants in genes known to contribute to cancer (Cancer Gene Census), as well as an estimation of the mutation impact on the protein as determined by FATHMM. We have now extended this list to include a filter that identifies variants within the cell lines that are similar to variants seen recurrently in whole genome screened tumour samples. The criteria for calling a variant recurrent differ based on mutation type. For further details please see the Genome Annotation page.
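As a rough illustration of how such a recurrence filter could work, here is a minimal Python sketch. It is not the COSMIC implementation: the thresholds in MIN_SAMPLES_BY_TYPE, the record layout and the function names are all assumptions made for the example, and "similar" is simplified to an exact match on a variant key. The actual criteria are those described on the Genome Annotation page.

```python
from collections import defaultdict

# Hypothetical thresholds: minimum number of distinct tumour samples
# in which a variant must be seen to count as recurrent, per mutation type.
MIN_SAMPLES_BY_TYPE = {"substitution": 3, "indel": 2}


def recurrent_variants(tumour_calls):
    """Return the set of variant keys seen recurrently in tumour samples.

    tumour_calls: iterable of (sample_id, variant_key, mutation_type).
    Repeated calls of the same variant in one sample are counted once.
    """
    samples_seen = defaultdict(set)
    mutation_type = {}
    for sample_id, variant_key, mtype in tumour_calls:
        samples_seen[variant_key].add(sample_id)
        mutation_type[variant_key] = mtype
    return {
        key
        for key, samples in samples_seen.items()
        if len(samples) >= MIN_SAMPLES_BY_TYPE[mutation_type[key]]
    }


def filter_cell_line(cell_line_variants, tumour_calls):
    """Keep only cell line variants that match a recurrent tumour variant."""
    recurrent = recurrent_variants(tumour_calls)
    return [v for v in cell_line_variants if v in recurrent]
```

Counting distinct samples rather than raw calls keeps a variant reported twice in the same tumour from appearing recurrent.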
We welcome two new starters to the COSMIC team, Dr. John Tate and Ms. Bhavana Harsha. John is our new web design and visualisation specialist who will be driving new developments and improving the design of the website. Bhavana, our new bioinformatic specialist, is developing a new annotation system to handle the ever increasing volume and complexity of genomic variation data.
Thank you for your continued support.
On Monday 26th September 2016 we are holding a workshop at the University of Cambridge, UK, titled 'COSMIC: Exploring cancer genetics at high resolution'.
During the course we will use the live COSMIC website and genome browser to show you how to access and explore cancer variation data, seeking to identify genetic causes and targets in all human cancers.
For more details please see the course timetable.
If you wish to attend the workshop, please visit the registration page.
If you would be interested in hosting a COSMIC workshop at your workplace, we would be very pleased to hear from you. Please contact email@example.com
We are considering changing the compatibility of the Oracle data pump export files from supporting Oracle 10g to 11g (11.2). If this change will cause problems for you, please let us know by emailing the COSMIC helpdesk (firstname.lastname@example.org).
COSMIC now encompasses the Genetics of Drug Resistance across 9 therapeutic target genes and 18 drugs (release v77). Also, full mutation profiles across ATR, TBX3 & NFKBIE, STIL-TAL1 & DNAJB1-PRKACA gene fusions, and over 700 new cancer genomes.
Data Updates in brief (for full details of this latest release, please see the v77 Datasheet).
In this COSMIC release, we now encompass the genetics of drug resistance, somatic mutations that allow a tumour to continue growing despite targeted therapeutics. Initial curations cover 9 genes and 18 pharmaceutical therapies (listed below), detailing 226 resistance-driving mutations.
Genes: ABL1, ALK, BRAF, EGFR, ESR1, KIT, MAP2K1, MAP2K2, PDGFRA
Drugs: Vemurafenib, AZD9291, Ceritinib, Erlotinib, Gefitinib, Imatinib, Nilotinib, Tyrosine kinase inhibitor - NS, Afatinib, Endocrine therapy, Alectinib, PD0325901, Dasatinib, Crizotinib, Selumetinib, Sunitinib, Dabrafenib, Bosutinib
This information is available in the 'Drug Resistance' tab of the gene analysis pages, where a table and charts show the landscape of resistance to drugs targeting mutations in the gene of interest. For example, please look at the Tyrosine Kinase Inhibitors associated with EGFR.
23 genes have been added to the Cancer Gene Census: AR, CHD4, CTCF, CXCR4, ERBB4, FAT1, FAT4, HIF1A, LEF1, LZTR1, MTOR, NCOR2, PRKACA, PTK6, PTPN13, RBM10, SDHA, SMAD2, SMAD3, TGFBR2, USP8, ZFHX3
New information is added to the census table, describing the role of each gene in cancer progression (tumour suppressor or oncogene). Currently this information is available for 156 census genes. This content, as well as additional functional annotation, is being substantially expanded for future releases.
This spring, we welcome three new additions to the COSMIC team. Dr Laura Ponting and Dr Raymund Stefancsik join us from Cambridge University (UK) as curator scientists. They are now enhancing our team of expert manual curators, aiming to comprehensively describe the range of cancer-causing mutations across all cancer genes (driven by the Cancer Gene Census, describing 595 genes).
In addition, Charalambos (Harry) Boutselakis joins us from London's Farr Institute, bringing substantial informatic expertise across databases and data analytics. He will be expanding the ways in which COSMIC can be used while ensuring its immediate responsiveness as the database increases in size and scope.
Thank you for continuing to support us.
Please ensure you are registered (here) for data downloads, and to ensure you receive future communications.
COSMIC v76 includes full curations across cancer genes PPP6C and SPOP, genomic content from 17 systematic screen publications, and a complete update from ICGC release 20. We welcome two new scientists to the COSMIC team who will be focused on identifying targets and biomarkers across the expanding COSMIC dataset. The streamlining of our website also continues, improving the layout and design of many large-data webpages, and we have improved our Download files to simplify frequency calculations across COSMIC datasets.
For full details of this latest release, please see the v76 Datasheet; in brief:
We welcome two new scientists who will be investigating the curated database and annotating the most interesting target and biomarker opportunities across this enormous database.
Dr. Sam Thompson is a medical statistician with expertise in clinical trials. In collaboration with Bayer Pharmaceuticals, she will be exploring correlations across the different types of variant annotation in COSMIC, aiming to systematically identify novel markers for disease.
Dr. Harry Jubb brings a proteomic perspective to COSMIC. Working together with Astex Pharmaceuticals, Harry will spend the next three years enhancing our visualisation of coding mutations, and investigating which mutated peptide domains are tractable for pharmaceutical design.
Thank you for your support, allowing us to enhance the utility of the curations in COSMIC.
We have extended the layout and design used on the Gene page to the Cancer Browser, Sample, Study, and Mutations pages. Tabulations showing variant annotations from multiple datatypes have been combined into a 'Variants' tab on these pages.
On the Overview tab of the Gene page, icons indicate whether the selected gene is part of a significant dataset: a cancer census gene, an expert curated gene, or a gene with a significant role in oncogenesis as evidenced by mouse insertional mutagenesis experiments.
Substantial changes have been made to the Genome Browser home page, which has a new smart search feature with the option to select any of the specific datasets: COSMIC, Whole Genomes or the Cell Lines Project.
We have updated the structure of the mutation files in our Download site to simplify the calculation of mutation frequencies. Data has been separated according to the type of screening method used: targeted gene screen or whole genome screen. We have also enhanced the information available from the sample details file so that whole genome samples can be extracted for use in whole genome screen mutation frequency calculations. Please see our FAQ for details.
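To illustrate the kind of frequency calculation this separation is meant to simplify, here is a minimal Python sketch. The record layouts, the "genome" and "targeted" labels and the function name are assumptions for the example and do not reflect the actual column names in the download files; please see the FAQ for those. The sketch assumes that every whole genome screened sample counts towards the denominator for every gene.

```python
from collections import defaultdict


def genome_screen_frequency(samples, mutations):
    """Per-gene mutation frequency across whole genome screened samples.

    samples:   iterable of (sample_id, screen_type), where screen_type is a
               hypothetical label, either "genome" or "targeted".
    mutations: iterable of (sample_id, gene).
    Returns {gene: mutated genome samples / all genome samples}.
    """
    genome_samples = {sid for sid, screen in samples if screen == "genome"}
    if not genome_samples:
        return {}

    mutated = defaultdict(set)
    for sid, gene in mutations:
        # Ignore mutations from targeted screens: they need a different
        # denominator (only the samples in which that gene was screened).
        if sid in genome_samples:
            mutated[gene].add(sid)

    total = len(genome_samples)
    return {gene: len(sids) / total for gene, sids in mutated.items()}
```

Using sets of sample identifiers means a sample carrying several mutations in the same gene is counted once in the numerator.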
We are changing the way we communicate release updates to COSMIC users. Please register to ensure you receive future communications.
COSMIC v75 includes curations across GRIN2A, fusion pair TCF3-PBX1, and genomic data from 17 systematic screen publications. We are also beginning a reannotation of TCGA exome datasets using Sanger's Cancer Genome Project analysis pipeline to ensure consistency; four studies are included in this release, to be expanded across the next few releases. The Cancer Gene Census now has a dedicated curator, Dr. Zbyslaw Sondka, who will be focused on expanding the Census, enhancing the evidence underpinning it, and developing improved expert-curated detail describing each gene's impact in cancer. Finally, as we begin to streamline our ever-growing website, we have combined all information for each gene onto one page and simplified the layout and design to improve navigation.
For full details of this latest release, please see the v75 Datasheet; in brief:
We welcome Dr. Zbyslaw Sondka to the COSMIC team. Working in collaboration with The Centre for Therapeutic Target Validation (CTTV), he will be curating the Cancer Gene Census: building the evidence behind existing genes as well as extending the census list.
Overview information has been merged into the Gene Analysis page. This page also has a full-featured Genome Browser which responds to filters. The page layout has also been redesigned, with tabulations organised under a single 'Data' tab and studies and publications combined under the 'References' tab.
The GA4GH (Global Alliance for Genomics and Health) Beacon Project encourages international sites to share genetic data in the simplest of all technical contexts. The service is designed merely to accept a query of the form "Do you have any genomes with an 'A' at position 100,735 on chromosome 3?" (or similar data) and to respond with either "Yes" or "No".
The Beacon Network lists all the known beacons, including the newly released COSMIC Beacon.
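The yes/no nature of a beacon query can be sketched in a few lines of Python. This is a toy illustration only: the in-memory lookup, the function name and the data are all invented for the example, and it does not reproduce the real Beacon API or the COSMIC Beacon implementation.

```python
# Toy in-memory "beacon": a set of (chromosome, position, allele) tuples.
# The entries are invented for illustration and are not COSMIC data.
KNOWN_ALLELES = {
    ("3", 100735, "A"),
    ("7", 5000, "T"),
}


def beacon_query(chromosome, position, allele, known=KNOWN_ALLELES):
    """Answer "Yes" or "No" without revealing anything else about the data."""
    key = (str(chromosome), int(position), allele.upper())
    return "Yes" if key in known else "No"


# "Do you have any genomes with an 'A' at position 100,735 on chromosome 3?"
answer_hit = beacon_query("3", 100735, "A")
# The same position with a different allele is not in the toy data set.
answer_miss = beacon_query("3", 100735, "G")
```

Returning only "Yes" or "No" is the point of the design: a site can advertise that it holds relevant data without exposing sample-level information.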
A new miRNA track has been added across all browsers, with the data sourced from miRBase.
We are changing the way we communicate website updates to COSMIC users. From this release, all our registered users will receive email notification of updates to the website. We encourage all those who have subscribed to the mailing list email@example.com to register, as communication via this list is being phased out. If you are registered but prefer not to receive emails, you can opt out by logging in and going to the Account Settings page.
We have also introduced a new 'non-affiliated' category to allow users who do not belong to a recognised academic or corporate organisation to register for email updates using their personal email address.
COSMIC v74 brings a new focus on curating blood cancer fusion genes, starting with BCR/ABL and KMT2A (MLL) fusions. We are also beginning to capture much greater clinical detail on the samples we curate, now available for download. More traditionally, somatic mutations are curated from three new cancer genes, POLE, AXIN2 and KDM6A. Substantial new genomic data are included from 17 systematic screen publications, and a full update to the latest ICGC release (v19).
For full details of this latest release, please see the v74 Datasheet; in brief:
"Mutation Impact" scores (via FATHMM-MKL) are now available for non-coding variants. These values can be viewed on the NCV, Study and Sample overview pages, and the COSMIC Genome Browser (functionally significant variants are coloured blue). They are also included in the download files on the SFTP site. There are 422,212 functionally significant variants (scores ≥ 0.7). Please see the Mutation Impact section of Cancer Genome Annotation for help interpreting the scores.
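For users working with the download files, the cut-off above can be applied with a simple filter. The Python sketch below assumes each variant has already been parsed into a record with a numeric score; the field names, variant identifiers and scores are invented for illustration, and only the 0.7 threshold comes from the text above.

```python
SIGNIFICANCE_THRESHOLD = 0.7  # cut-off quoted in the text above


def is_functionally_significant(score, threshold=SIGNIFICANCE_THRESHOLD):
    """Return True when a FATHMM-MKL score meets or exceeds the threshold."""
    return score >= threshold


# Hypothetical parsed records; identifiers and scores are made up.
variants = [
    {"id": "ncv_a", "score": 0.95},
    {"id": "ncv_b", "score": 0.70},  # exactly on the threshold, so included
    {"id": "ncv_c", "score": 0.42},
]

significant_ids = [
    v["id"] for v in variants if is_functionally_significant(v["score"])
]
```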
We are now capturing substantially more clinical feature annotations on the samples we curate. Across 24 new columns we are capturing, where available, annotations such as therapeutic regimes and responses, mutation allele specification, tumour stage/grade/cytogenetics, patient age/ethnicity/gender. This full information is available via COSMIC Downloads, and is also displayed on the website on each individual Sample Overview page. For full details of these rich expanded clinical annotations, please see the 'Cosmic sample features' section (describing the CosmicSample.tsv.gz file) here.
COSMIC v73 contains full expert curation across 9 cancer genes, 26 systematic screen publications and ICGC release 18. 'Mutation impact' filters across the website now estimate pathogenic functional consequences, based on the new FATHMM-MKL algorithm. Substantial new information is now present in the COSMIC Genome Browser: regulatory features from ENCODE are now available, particularly enhancing the utility of the differential methylation and non-coding variant data; human SNPs are now shown alongside COSMIC somatic mutations, and genome browsing is now navigable via our Cancer Browser.
Below is a summary of new data in v73; please see the v73 Datasheet for further description.
We have upgraded our 'Mutation Impact' filters to use scores generated by a new version of FATHMM (FATHMM-MKL). See the v73 Datasheet for more information.
COSMIC v72 is our largest release ever, containing new annotations across 5466 cancer genomes and full literature curation across 22 new cancer genes, 28 fusion pairs; 26 genes have been added to the Cancer Gene Census. We provide our first integration of differential methylation data and many additional mutations, copy number aberrations and expression variants from recent ICGC & TCGA releases. All genomic events in COSMIC have been upgraded to GRCh38 (with a GRCh37 archive available). Finally, we present a new curated resource, to be regularly updated, describing the characterisation of 30 mutation signatures across human cancer.
COSMIC is adopting a new licensing strategy for v72, to grow the scope of our literature curations, enhance the analytics available across our data, and support the capacity to sustain this ever-growing database into the future. Key changes are -
All licensing payments are used to grow COSMIC, its coverage and analytic usefulness for oncology insight. We will also be inviting licensees to tell us which priorities we might best pursue, to ensure the direction of COSMIC best supports these industries' commercial oncology research. Please see our Licensing page for more details.
This v72 release is too large to describe here in detail. Here is a summary; please see the v72 Datasheet for further description.
Our curations are generated by expert postdoctoral scientists, described here.
We have updated the genomic coordinates in COSMIC to GRCh38. However, we are also hosting a parallel website to display the data on the GRCh37 reference. This GRCh37 site will be maintained and updated throughout 2015 with any new source data where the original coordinates are on GRCh37. However, it will not be updated with any new data where the original coordinates are on GRCh38.
Different mutational processes generate unique combinations of mutation types, termed "Mutational Signatures". Based on an analysis of 10,952 exomes and 1,048 whole-genomes across 40 distinct types of human cancer we have added a Mutation Signatures page on the website; a curated census of signatures providing the profiles of, and additional information about, known mutational signatures.
We have integrated methylation data across the COSMIC website. The Gene Analysis page has been extended to show a methylation track on the mutation histogram and differential methylation counts in the tissue tab, and a new 'Methylation' tab has been added to display a table of variants. The Cancer Browser, Study and Sample Overview pages have also been updated to integrate methylation data. Because the majority of methylation annotations are outside gene footprints, COSMIC's Genome Browser is the best way to explore this information.
The COSMIC Genome Browser is a valuable tool for exploring COSMIC data in its genomic context. This browser can be used to explore the data in COSMIC, COSMIC genomes (WGS) and the COSMIC Cell Lines Project, on either the GRCh37 or GRCh38 reference sequence. It can also be used to view the data for an individual sample if selected. Please see the COSMIC Genome Browser homepage for more details.
COSMIC v71 includes full literature curation of PTPRB, PLCG1, POT1 and STAG2, the addition of 25 new census genes and an update of gene expression and copy number data from ICGC release 17 (Sept 2014).
The Cancer Gene Census has been updated with 25 new genes, bringing the total of known cancer genes substantiated by the scientific literature to 547. The new genes are:
We have added an additional 16 cell lines to the Cell Lines Project. The lines are:
We have included an initial integration of mouse insertional mutagenesis data for 851 COSMIC genes from the CCGD (Candidate Cancer Gene Database), adding supporting evidence for cancer driver genes. These data are integrated in the Gene Overview page; more details can be found here.
A mutation matrix plot has been added to the Study Overview page, enabling the relationship between genes, point mutations, copy number gains/losses, over/under gene expression and samples to be investigated for a specific study or publication.
For whole genome analysed samples the Sample Overview page now includes a Genome Browser (JBrowse), allowing all mutation types for a sample (including coding and non-coding mutations, and aberrant copy number and gene expression) to be viewed in genomic context with COSMIC and Ensembl gene annotations (GRCh37).
There is a new tutorial section in the help pages including 4 new tutorials demonstrating the Sample, Gene, Fusion and Cancer Browser pages.
We have added 7,148 new copy number variants from 8 new TCGA studies (source ICGC release 17, re-analysed with ASCAT).
We have nearly doubled the gene expression data in COSMIC by adding data from 10 new studies from TCGA (source ICGC release 17). The platforms supported are: IlluminaHiSeq_RNASeqV2, IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeq, and IlluminaGA_RNASeq. Please note that as from this release we no longer show results from the array platforms AgilentG4502A_07_2 and AgilentG4502A_07_3. For more information please visit the gene expression help page.
PTPRB, encoding a tyrosine phosphatase specific to the vascular endothelium that inhibits angiogenesis, has been identified as a tumour suppressor gene in angiosarcoma. Mutations were found in secondary tumours or those with MYC amplification, a biomarker of radiation-associated secondary angiosarcoma. PLCG1, encoding a tyrosine kinase signal transducer in the phosphoinositide pathway, also has recurrent, likely activating mutations in angiosarcoma. PLCG1 gain of function mutations have previously been identified in cutaneous T-cell lymphoma.
POT1 encodes a single-stranded telomere-binding component of the shelterin complex. It is the only shelterin that contains 2 N-terminal oligonucleotide/oligosaccharide-binding (OB) domains. Recurrent mutations in POT1 have been found in chronic lymphocytic leukaemia where they occur in the clinically aggressive subtype with wild-type IGHV@. The POT1 mutations are most often found in gene regions encoding the 2 OB folds.
Stromal antigen 2 (STAG2) is a subunit of the cohesin complex and has a role in chromatid separation during cell division. Genetic disruption of this process can lead to aneuploidy in cancer. A number of tumour types have been found to harbour somatic mutations in STAG2, including bladder cancer, myeloid neoplasms and glioblastoma. The gene maps to the X-chromosome (Xq25) and is present as a single copy in males; in females the other X-chromosome is inactivated. Hence, complete genetic inactivation of STAG2 requires only a single mutational event. STAG2 has also been suggested to act as a tumour suppressor via other mechanisms distinct from its role in cohesion.
We have added mutation data for 841 tumour samples from publications where genome wide analyses have been used. More details can be found here.
As from this release we no longer support Internet Explorer version 8. This allows us to facilitate and develop tools for the latest browsers and provide a richer user experience. We apologise for the inconvenience caused to IE 8 users.
COSMIC v70 includes an initial integration of gene expression data from TCGA, full literature curation of CALR, CD79A and CD79B, 12 whole-genome sequencing publications, and extensive updates to point mutation and structural variant data from ICGC (release 16, May 2014) and TCGA.
Gene expression level 3 data has been integrated into COSMIC from 10 publicly accessible TCGA studies. The platform codes currently used to produce the COSMIC gene expression values are: IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeqV2, AgilentG4502A_07_2 and AgilentG4502A_07_3. COSMIC now includes gene expression alongside coding mutations and copy number aberrations on the cancer browser, sample overview, gene analysis and study/paper overview pages. We have also added a gene expression track to the histogram on the gene analysis page and the circos diagram on the sample overview page; more details can be found here.
A mutation matrix has been added to the cancer browser, enabling the relationship between genes, point mutations, copy number gains/losses, over/under gene expression and samples to be investigated for a specific cancer.
The mutation matrix chart shows 20 x 175 boxes, with each box representing a gene-sample combination. Genes are ranked by the number of samples with variations (depending on the selected data type) and the samples are sorted using a clustering algorithm to group them in relation to the ranked genes; more details can be found here.
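The layout of such a matrix can be approximated with the Python sketch below. It ranks genes by the number of distinct samples with a variation, as described above. The clustering algorithm used on the website is not described here, so the sketch substitutes a plain sort of samples by their hit pattern across the ranked genes; that ordering, the function name and the example data are assumptions made for illustration.

```python
from collections import defaultdict

MAX_GENES, MAX_SAMPLES = 20, 175  # chart dimensions quoted above


def build_matrix(events, max_genes=MAX_GENES, max_samples=MAX_SAMPLES):
    """Build a gene x sample presence grid from (gene, sample) pairs."""
    samples_by_gene = defaultdict(set)
    for gene, sample in events:
        samples_by_gene[gene].add(sample)

    # Rank genes by number of distinct samples with a variation (ties by name).
    genes = sorted(
        samples_by_gene, key=lambda g: (-len(samples_by_gene[g]), g)
    )[:max_genes]

    # Stand-in for the clustering step: order samples by their pattern of hits
    # across the ranked genes, so samples hitting top genes come first.
    def pattern(sample):
        return tuple(sample in samples_by_gene[g] for g in genes)

    candidates = sorted({s for g in genes for s in samples_by_gene[g]})
    samples = sorted(candidates, key=pattern, reverse=True)[:max_samples]

    rows = [[s in samples_by_gene[g] for s in samples] for g in genes]
    return genes, samples, rows


# Invented example events: one (gene, sample) pair per observed variation.
events = [
    ("TP53", "S1"), ("TP53", "S2"), ("TP53", "S3"),
    ("KRAS", "S2"), ("KRAS", "S3"),
    ("EGFR", "S3"),
]
genes, samples, rows = build_matrix(events)
```

Each entry of `rows` corresponds to one box in the chart: `rows[i][j]` is true when gene `genes[i]` has a variation in sample `samples[j]`.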
To improve the value of COSMIC data we have tried to identify the most significant high-value data within cancer genomes using the following filtering strategies -
We have excluded data from any sample with over 15,000 mutations. In addition, we have flagged all known SNPs as defined by the 1000 genomes project, dbSNP and a panel of 378 normal (non-cancer) samples from Sanger CGP sequencing. Using this approach 812,136 mutations have been flagged. Although all data are included in our download files, we have excluded flagged mutations from the website.
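The two steps described above (excluding hypermutated samples, then flagging known SNPs) might be sketched in Python as follows. The record layout, variant notation and example data are hypothetical, and the demonstration uses a tiny limit of 2 mutations per sample so that it runs on a handful of records; only the 15,000 default comes from the text.

```python
from collections import Counter

MAX_MUTATIONS_PER_SAMPLE = 15_000  # limit quoted in the text above


def filter_and_flag(mutations, known_snps, max_per_sample=MAX_MUTATIONS_PER_SAMPLE):
    """Drop samples over the mutation limit and flag known SNPs in the rest.

    `mutations` is a list of dicts with 'sample' and 'variant' keys; this
    layout is a hypothetical simplification for the example.
    """
    per_sample = Counter(m["sample"] for m in mutations)
    kept = []
    for m in mutations:
        if per_sample[m["sample"]] > max_per_sample:
            continue  # the whole sample is excluded
        kept.append({**m, "snp_flag": m["variant"] in known_snps})
    return kept


# Invented data: S2 exceeds the demonstration limit of 2 mutations.
mutations = [
    {"sample": "S1", "variant": "1:100:A>G"},
    {"sample": "S1", "variant": "2:200:C>T"},
    {"sample": "S2", "variant": "1:100:A>G"},
    {"sample": "S2", "variant": "3:300:G>A"},
    {"sample": "S2", "variant": "4:400:T>C"},
]
known_snps = {"1:100:A>G"}

download_view = filter_and_flag(mutations, known_snps, max_per_sample=2)
# Flagged mutations stay in the download data but are hidden on the website.
website_view = [m for m in download_view if not m["snp_flag"]]
```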
Although no CNV data has been excluded from the website, we have applied filtering so that by default only the most significant variants are shown. For these CNVs the minor allele and total copy number values are known and gain/loss has been defined using stringent criteria [ see the Copy Number Variants section in the help pages ]. However, at the head of every table showing CNVs there is an option to switch off the filter and view all the data.
In order to make it easier to examine each sample, analysis filters have been introduced on the sample overview page. These filters allow you to specify that the mutations viewed should be likely pathogenic (as defined by FATHMM analysis), in the cancer census genes, or of a particular mutation type. In future releases, we will be developing further filters across these data to enhance their analysis.
We have started to upgrade our help pages and have introduced two new tutorials to help users navigate the COSMIC website. These first tutorials cover the components of the website [ Site Tour ] and searching COSMIC [ Search ].
The recently identified oncogene calreticulin (CALR) is a multi-functional Ca2+-binding protein chaperone localised in the endoplasmic reticulum. CALR somatic mutations are now the second most prevalent mutation seen in patients with myeloproliferative neoplasms. Mutations have been found in the majority of JAK2/MPL mutation-negative essential thrombocythaemia (ET) and primary myelofibrosis (PMF) patients, in addition to a small number of myelodysplastic patients (RARS, RARS-T, CMML and aCML). Almost all the reported mutations are insertion, deletion or complex mutations generating a +1 bp frameshift and an extended novel CALR C-terminal domain. CALR mutations appear to be associated with a more benign clinical course, younger age and male sex.
The Ig-alpha and Ig-beta proteins encoded by CD79A and CD79B are necessary for expression and function of the B-cell antigen receptor. Recurrent activating mutations in CD79A and CD79B have been identified in diffuse large B cell lymphoma where they occur more frequently in the activated B-cell-like subtype. The ITAM (immunoreceptor tyrosine-based activation motif) domain is targeted, with a hot spot at Y196 in CD79B. Mutations in both genes have also been found in Waldenström's macroglobulinaemia.
In this release, 12 systematic screen publications have been curated in COSMIC; more details can be found here.
We have decided to drop support for Internet Explorer version 8 from November 2014. This allows us to develop tools for the latest browsers and provide a richer user experience. We apologise in advance for the inconvenience caused to IE 8 users.