I am an author. How should I describe mutation data in my publications?
The guidelines below are intended to help authors speed up curation of the growing volume of literature relevant to COSMIC and to keep our curation accurate. By following them, authors will contribute to the quick and efficient dissemination of their research results via COSMIC.
- Data is curated in COSMIC on a per-sample basis, so mutation or clinical data can only be entered in detail if the author has provided it on this basis. The minimum requirement for a sample to be included is that it has full mutation details provided at either nucleotide or protein level (see example below).
- If some samples in a paper have already been screened for some or all of the reported genes in an earlier publication, we exclude them from the curation of the new paper in order to avoid duplication. To do this, it is helpful if the author has highlighted the duplicate samples in some way.
- For papers reporting samples with more than one mutation, from one or more genes, we need to know which specific mutations occur together in any given sample.
- It is much easier and quicker for us to map reported mutations to COSMIC reference sequences if the author has stated which reference sequence and version was used to describe the reported mutations (e.g. NM_006015.3 or ENST00000215919). For some genes this information is essential for mutation curation.
- COSMIC mutation syntax is based on the Human Genome Variation Society (HGVS) recommendations, so it is useful if authors also use this nomenclature.
- Ideally, mutations would be described at both the nucleotide and amino acid levels. This is less important for well-characterised mutations (e.g. BRAF c.1799T>A, p.V600E) but is important for novel mutations, so that we can confirm the mutation position on our reference sequence.
- For insertion mutations, it is very helpful if they are described as e.g. c.1118_1119insA rather than c.1118insA, which can be ambiguous, and if the protein result (e.g. p.N373fs*6) is also provided so that the position can be confirmed.
- For frameshift mutations, it is helpful for curation if they can at least be identified as either insertions or deletions, even if no nucleotide details can be provided.
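As an illustration of the insertion guideline above, authors who keep their mutation lists in a spreadsheet or script could run a quick check before submission. The following is only a minimal sketch: the function name `check_cds_mutation` and the three regular expressions are hypothetical, they cover only simple substitutions and insertions, and they are not a full HGVS parser or anything used by COSMIC itself.

```python
import re

# Simplified patterns, assumed for illustration only.
# Real HGVS syntax covers many more mutation types than these.
SUBSTITUTION = re.compile(r"^c\.\d+[ACGT]>[ACGT]$")
FLANKED_INSERTION = re.compile(r"^c\.(\d+)_(\d+)ins[ACGT]+$")
SINGLE_POSITION_INSERTION = re.compile(r"^c\.\d+ins[ACGT]+$")


def check_cds_mutation(description):
    """Return a short status string for a CDS mutation description.

    Hypothetical helper: it only recognises simple substitutions and
    insertions, and reports everything else as unrecognised.
    """
    if SUBSTITUTION.match(description):
        return "ok: substitution"

    flanked = FLANKED_INSERTION.match(description)
    if flanked:
        start, end = int(flanked.group(1)), int(flanked.group(2))
        # An insertion sits between two neighbouring nucleotides.
        if end == start + 1:
            return "ok: insertion between adjacent positions"
        return "check: insertion positions are not adjacent"

    if SINGLE_POSITION_INSERTION.match(description):
        # e.g. c.1118insA: unclear whether the base goes before or after 1118.
        return "ambiguous: give both flanking positions"

    return "unrecognised by this simplified check"


if __name__ == "__main__":
    for text in ["c.1799T>A", "c.1118_1119insA", "c.1118insA"]:
        print(text, "->", check_cds_mutation(text))
```

A description that this sketch reports as unrecognised is not necessarily wrong; it may simply be a mutation type (deletion, complex substitution, etc.) that the simplified patterns do not cover.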
Author guidelines - suggested presentation of results
| Sample (or Patient) ID* | Patient age* | Patient gender* | Primary tissue | Primary subsite | Tumour source* | Primary histology | Subhistology | Stage* | CDS mutation | AA mutation |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 45 | F | colon | left | primary | adenoma | villous | | c.34G>T | p.G12C |
| 2 | 67 | F | rectum | | primary | adenoma | tubular | | c.183A>T | p.Q61H |
| 3 | 51 | M | colon | sigmoid | metastasis (rectum) | carcinoma | | IIA | c.38_39GC>AT | p.G13D |
Reference sequence: KRAS NM_004985.3
*These additional clinical details can be added if data are available.
Additional columns could be added for further information e.g. smoking status, drug response, etc.
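For authors who prepare supplementary files with a script, the per-sample layout can also be written out as a tab-separated file. This is a sketch under stated assumptions: the snake_case column names, the `to_tsv` helper and the leading comment line for the reference sequence are illustrative choices, not a format required by COSMIC. The rows are the three KRAS examples from the suggested presentation.

```python
import csv
import io

# Hypothetical column names mirroring the suggested presentation of results.
COLUMNS = [
    "sample_id", "patient_age", "patient_gender", "primary_tissue",
    "primary_subsite", "tumour_source", "primary_histology",
    "subhistology", "stage", "cds_mutation", "aa_mutation",
]

# One row per sample and mutation; unknown details are left empty.
# A sample with several mutations would repeat its sample_id on each row.
ROWS = [
    ["1", "45", "F", "colon", "left", "primary", "adenoma", "villous", "", "c.34G>T", "p.G12C"],
    ["2", "67", "F", "rectum", "", "primary", "adenoma", "tubular", "", "c.183A>T", "p.Q61H"],
    ["3", "51", "M", "colon", "sigmoid", "metastasis (rectum)", "carcinoma", "", "IIA", "c.38_39GC>AT", "p.G13D"],
]


def to_tsv(rows, reference="KRAS NM_004985.3"):
    """Return the rows as tab-separated text, preceded by the reference sequence."""
    buffer = io.StringIO()
    # State the reference sequence and version once, at the top of the file.
    buffer.write(f"# Reference sequence: {reference}\n")
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tsv(ROWS), end="")
```

Keeping one row per sample and mutation makes it clear which mutations occur together in a sample, and further columns (smoking status, drug response, etc.) can be appended to `COLUMNS` without changing the rest of the sketch.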
We hope you find these guidelines useful and if you have further questions please contact us at firstname.lastname@example.org.
The COSMIC literature curators