The COSMIC database is undergoing an extensive update and reannotation. The first stage of this is improved HGVS syntax compliance. Future changes will include new, up-to-date transcripts and genes from Ensembl, parallel GRCh37 and GRCh38 assemblies, gene fusion rebuilds, and updated cross-reference links between COSMIC genes and other widely used databases such as HGNC, RefSeq, UniProt and CCDS.
There will be significant changes in the upcoming releases as we work through this process. This page will provide a detailed explanation of the changes that we are making in COSMIC, and how these updates will be reflected in the data and on the website.
HGVS syntax update (COSMIC release 88)
In this release (v88) we have updated the HGVS nomenclature for many of the
manually curated mutations that were published without CDS/genomic details
(c.? mutations). Details of how these updated syntaxes are reflected in the
data are given below.
Use of X in place of N to indicate unknown amino acid
We are now using X instead of N to indicate an unknown amino acid, as per
HGVS standards. Many of the more recently curated mutations retain the
NNN... notation. These will be updated to the XXX... notation in a future
release.
Most manually curated frameshift mutations with unknown CDS change (c.?) now
include the first mutant amino acid in the syntax, for example:
c.? / p.C1396Lfs*5
c.? / p.V1833Afs*?
c.? / p.S1303Xfs*58
c.? / p.P463Xfs*?
Frameshift mutations with known genomic/CDS details have not yet been
updated and therefore retain the original syntax, for example
c.355_356insATGG / p.E121fs*5.
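Because both frameshift styles currently coexist in the data, downstream parsers may need to accept either. The following is a minimal Python sketch, not part of COSMIC itself, that tells the two apart. The regular expression and the function name `parse_frameshift` are assumptions inferred only from the example strings shown above; real data may contain variants this pattern does not cover.

```python
import re

# Assumed shape of the protein-level frameshift strings shown above:
#   p.<reference AA><position><optional first mutant AA>fs*<length or ?>
FRAMESHIFT_RE = re.compile(
    r"^p\.(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])?fs\*(?P<length>\d+|\?)$"
)


def parse_frameshift(aa_syntax):
    """Return the parsed fields as a dict, or None if the string does not
    match the assumed frameshift pattern."""
    match = FRAMESHIFT_RE.match(aa_syntax)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pos"] = int(fields["pos"])
    # Updated strings name the first mutant amino acid (which may be X for
    # unknown); strings that have not been updated omit it.
    fields["style"] = "new" if fields["alt"] is not None else "old"
    return fields


if __name__ == "__main__":
    for example in ("p.C1396Lfs*5", "p.P463Xfs*?", "p.E121fs*5"):
        print(example, parse_frameshift(example))
```

The length after `fs*` is kept as a string so that an unknown length (`?`) and a numeric length can share one field.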
Unknown substitutions and insertions
Most missense substitution mutations with no reported CDS change now have
the syntax style
Unknown mutations remain
Most unknown insertion mutations now have the syntax
Whole gene deletions
Many manually curated whole gene deletions have been updated to the syntax
c.1_*del / p.0. A handful remain in the old style, for example
c.1_3267del / p.0?.
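Code that consumes these records may want to recognise both forms of whole gene deletion. The sketch below is illustrative only. It assumes the old style always has the shape `c.1_<number>del` paired with `p.0?`, which is inferred from the single example above, and the function name `whole_gene_deletion_style` is hypothetical.

```python
import re

# New style: c.1_*del / p.0
NEW_WHOLE_GENE_DEL = re.compile(r"^c\.1_\*del$")
# Old style (assumed): c.1_<last CDS position>del / p.0?
OLD_WHOLE_GENE_DEL = re.compile(r"^c\.1_\d+del$")


def whole_gene_deletion_style(cds_syntax, aa_syntax):
    """Return "new", "old", or None if the pair is not recognised as a
    whole gene deletion under the assumptions above."""
    if NEW_WHOLE_GENE_DEL.match(cds_syntax) and aa_syntax == "p.0":
        return "new"
    if OLD_WHOLE_GENE_DEL.match(cds_syntax) and aa_syntax == "p.0?":
        return "old"
    return None


if __name__ == "__main__":
    print(whole_gene_deletion_style("c.1_*del", "p.0"))
    print(whole_gene_deletion_style("c.1_3267del", "p.0?"))
```

Note that the old-style pattern alone cannot distinguish a whole gene deletion from a partial deletion starting at position 1, which is why the sketch also checks the protein-level string.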
fs*1 / nonsense mutations
Most manually curated mutations which had an fs*1 syntax (e.g.
p.S123fs*1) have been updated to a
substitution nonsense syntax (e.g.
p.S123*) and AA mutation
type, in keeping with HGVS recommendations. This also applies to a few
with known CDS information. However, most with genomic information have not
yet been updated and retain the old syntax style, e.g.