Variant Updates

The COSMIC database is undergoing an extensive update and reannotation. The first stage of this is improved HGVS syntax compliance. Future changes will include new, up-to-date transcripts and genes from Ensembl, parallel GRCh37 and GRCh38 assemblies, gene fusion rebuilds, and updated cross-reference links between COSMIC genes and other widely used databases such as HGNC, RefSeq, UniProt and CCDS.

There will be significant changes in the upcoming releases as we work through this process. This page will provide a detailed explanation of the changes that we are making in COSMIC, and how these updates will be reflected in the data and on the website.

HGVS syntax update (COSMIC release 88)

In this release (v88) we have updated the HGVS nomenclature for many of the manually curated mutations that were published without CDS/genomic information (c.? mutations). Details of how these updated syntaxes are reflected in the data are given below.

Use of X in place of N to indicate unknown amino acid

In line with HGVS standards, we now use X rather than N to indicate an unknown amino acid. Many of the more recently curated mutations still retain the NNN... notation. These will be updated to the XXX... notation in a future release.

Frameshift mutations

Most manually curated frameshift mutations with unknown CDS change (c.?) now include the first mutant amino acid in the syntax. For example:

c.? / p.C1396Lfs*5
c.? / p.V1833Afs*?
c.? / p.S1303Xfs*58
c.? / p.P463Xfs*?

Frameshift mutations with known genomic/CDS details have not yet been updated and therefore retain the original syntax, for example c.355_356insATGG / p.E121fs*5.
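As an illustration only, the two frameshift styles above could be parsed with a single regular expression. This is a minimal sketch, not COSMIC code: the pattern, the function name parse_frameshift and the returned field names are assumptions made for this example, and the pattern assumes one-letter amino acid codes as in the examples shown.

```python
import re

# Hypothetical pattern: reference residue, position, an optional first mutant
# residue (absent in the original style), then "fs*" and a stop offset or "?".
FRAMESHIFT_RE = re.compile(
    r"^p\.(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])?fs\*(?P<stop>\d+|\?)$"
)


def parse_frameshift(aa_syntax):
    """Return the parts of a frameshift AA syntax, or None if it does not match."""
    m = FRAMESHIFT_RE.match(aa_syntax)
    if m is None:
        return None
    stop = m.group("stop")
    return {
        "ref": m.group("ref"),
        "pos": int(m.group("pos")),
        # None for the original style; "X" when the residue is unknown.
        "first_mutant": m.group("alt"),
        # None when the position of the new stop codon is unknown ("?").
        "stop_offset": None if stop == "?" else int(stop),
    }
```

With this sketch, a first_mutant of None identifies a mutation that still uses the original syntax, such as p.E121fs*5.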

Unknown substitutions and insertions

Most missense substitution mutations with no reported CDS change now have the syntax style p.H1904X.

Unknown mutations remain p.H1904?.

Most unknown insertion mutations now have the syntax p.F12_S13insXXX.
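The three syntax styles in this section can be told apart by pattern. The following is a rough sketch under stated assumptions: the category labels and the function name classify_unknown are invented for this example and are not COSMIC field values, and the patterns only cover the one-letter forms shown above.

```python
import re

# Hypothetical labels paired with patterns for the styles described above.
UNKNOWN_PATTERNS = [
    # Substitution to an unknown amino acid, e.g. p.H1904X
    ("unknown_substitution", re.compile(r"^p\.[A-Z]\d+X$")),
    # Unknown mutation at a known residue, e.g. p.H1904?
    ("unknown_mutation", re.compile(r"^p\.[A-Z]\d+\?$")),
    # Insertion of unknown amino acids, e.g. p.F12_S13insXXX
    ("unknown_insertion", re.compile(r"^p\.[A-Z]\d+_[A-Z]\d+insX+$")),
]


def classify_unknown(aa_syntax):
    """Return the first matching label, or "other" if no pattern matches."""
    for label, pattern in UNKNOWN_PATTERNS:
        if pattern.match(aa_syntax):
            return label
    return "other"
```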

Whole gene deletions

Many manually curated whole gene deletions have been updated to the syntax c.1_*del / p.0. A handful remain in the old style c.1_3267del / p.0?.
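Because both styles currently coexist, anyone filtering for whole gene deletions may need to accept either form. A minimal sketch, assuming the CDS and AA syntaxes are available as two separate strings; the function name and the "new"/"old" return values are hypothetical. Note that the old-style check only tests the shape of the syntax and cannot confirm that the end coordinate really is the last base of the coding sequence.

```python
import re

# Old style: c.1_<end>del paired with p.0?  (for example c.1_3267del / p.0?)
OLD_CDS_RE = re.compile(r"^c\.1_\d+del$")


def whole_gene_deletion_style(cds_syntax, aa_syntax):
    """Return "new", "old", or None for a CDS/AA syntax pair."""
    if cds_syntax == "c.1_*del" and aa_syntax == "p.0":
        return "new"
    if OLD_CDS_RE.match(cds_syntax) and aa_syntax == "p.0?":
        return "old"
    return None
```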

fs*1 / nonsense mutations

Most manually curated mutations that had an fs*1 syntax (for example p.S123fs*1) have been updated to a nonsense substitution syntax (e.g. p.S123*) and the corresponding AA mutation type, in keeping with HGVS recommendations. This also applies to a few with known CDS information. However, most with genomic information have not yet been updated and retain the old syntax style, e.g. c.368_369insT / p.N124fs*1.
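The rewrite described here amounts to dropping the fs and the offset when the offset is exactly 1. The sketch below illustrates that rule on the AA syntax alone; the function name is an assumption for this example, and it does not update the AA mutation type, which would need to change alongside the syntax.

```python
import re

# Matches only an offset of exactly 1, so p.S123fs*10 is left alone.
FS1_RE = re.compile(r"^(p\.[A-Z]\d+)fs\*1$")


def fs1_to_nonsense(aa_syntax):
    """Rewrite p.<ref><pos>fs*1 as p.<ref><pos>*; return other syntaxes unchanged."""
    m = FS1_RE.match(aa_syntax)
    if m is None:
        return aa_syntax
    return m.group(1) + "*"
```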