Data Submission

I am an author. How should I describe mutation data in my publications?

Author Guidelines

Below are some guidelines for authors. They help us curate the growing volume of literature relevant to COSMIC more quickly and keep our curation accurate. By following them, authors help their research results reach COSMIC users quickly and efficiently.


  • Data is curated in COSMIC on a per-sample basis, so mutation or clinical data can only be entered in detail if the author has provided it per sample. The minimum requirement for including a sample is full mutation details at either the nucleotide or the protein level (see the example below).
  • If some samples in a paper have already been screened for some or all of the reported genes in an earlier publication, we exclude them when curating the new paper to avoid duplication. It helps if authors clearly mark these duplicate samples.
  • For papers reporting samples with more than one mutation, in one or more genes, we need to know which specific mutations occur together in each sample.
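As a rough illustration of per-sample reporting, the Python sketch below groups mutation rows by sample so that co-occurring mutations stay together, sets aside rows that lack both nucleotide- and protein-level detail, and skips samples flagged as previously published. The `MutationRow` fields, the `group_by_sample` function and the example rows are all hypothetical; this is not a COSMIC submission format.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class MutationRow(NamedTuple):
    """One reported mutation (hypothetical field names, not a COSMIC schema)."""
    sample_id: str
    gene: str
    cds_mutation: Optional[str]  # nucleotide level, e.g. "c.34G>T"
    aa_mutation: Optional[str]   # protein level, e.g. "p.G12C"


def group_by_sample(rows, previously_published=frozenset()):
    """Group mutations by sample so co-occurring mutations stay together.

    Rows with neither nucleotide- nor protein-level detail are returned
    separately, and samples already reported elsewhere are skipped.
    """
    samples = defaultdict(list)
    incomplete = []
    for row in rows:
        if row.sample_id in previously_published:
            continue  # duplicate of an earlier publication
        if row.cds_mutation or row.aa_mutation:
            samples[row.sample_id].append(row)
        else:
            incomplete.append(row)  # does not meet the minimum requirement
    return dict(samples), incomplete


# Hypothetical example rows, for illustration only.
rows = [
    MutationRow("1", "KRAS", "c.34G>T", "p.G12C"),
    MutationRow("1", "PIK3CA", "c.3140A>G", "p.H1047R"),
    MutationRow("2", "KRAS", None, None),
    MutationRow("3", "KRAS", "c.38_39GC>AT", "p.G13D"),
]
samples, incomplete = group_by_sample(rows, previously_published={"3"})
for sample_id, mutations in samples.items():
    print(sample_id, [f"{m.gene} {m.aa_mutation}" for m in mutations])
```

Keeping one row per mutation, with the sample ID repeated, is one simple way to make it unambiguous which mutations occur together.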

Reference sequences

  • It is much easier and quicker for us to map reported mutations to COSMIC reference sequences if the author states which reference sequence and version were used to describe the reported mutations (e.g. NM_006015.3 or ENST00000215919). For some genes this information is essential for curation.


  • COSMIC mutation syntax is based on the Human Genome Variation Society (HGVS) recommendations, so it is useful if authors also use this nomenclature.
  • Ideally, mutations should be described at both the nucleotide and the amino acid level. This matters less for well-characterised mutations (e.g. BRAF c.1799T>A, p.V600E) but is important for novel mutations, so that we can confirm the mutation position on our reference sequence.
  • For insertion mutations, it is very helpful if they are described as e.g. c.1118_1119insA rather than c.1118insA, which can be ambiguous, and if the protein consequence (e.g. p.N373fs*6) is also provided so that the position can be confirmed.
  • For frameshift mutations, it helps curation if they are at least identified as insertions or deletions, even when no nucleotide details can be provided.
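A small Python sketch of why the flanked form is preferred: it accepts `c.1118_1119insA`, flags the single-position `c.1118insA` as ambiguous, and reports whether the number of inserted bases would shift the reading frame. The function name `check_insertion` and its messages are hypothetical, and only the plain `ins` form is handled; deletions, duplications and delins changes would need their own patterns.

```python
import re

# c.1118_1119insA: the inserted bases sit between two named, adjacent positions.
FLANKED_INSERTION = re.compile(r"^c\.(\d+)_(\d+)ins([ACGT]+)$")
# c.1118insA: only one position is given, so the insertion site is unclear.
SINGLE_POSITION_INSERTION = re.compile(r"^c\.(\d+)ins([ACGT]+)$")


def check_insertion(cds_change: str) -> tuple[str, bool]:
    """Return (verdict, is_frameshift) for a CDS insertion description."""
    m = FLANKED_INSERTION.match(cds_change)
    if m:
        start, end, inserted = int(m.group(1)), int(m.group(2)), m.group(3)
        if end != start + 1:
            return "invalid: flanking positions must be adjacent", False
        # Inserting a number of bases that is not a multiple of 3 shifts the frame.
        return "ok", len(inserted) % 3 != 0
    m = SINGLE_POSITION_INSERTION.match(cds_change)
    if m:
        return "ambiguous: give both flanking positions", len(m.group(2)) % 3 != 0
    raise ValueError(f"not a simple insertion: {cds_change}")


for change in ["c.1118_1119insA", "c.1118insA", "c.1118_1119insAAA"]:
    print(change, check_insertion(change))
```

The frameshift flag only reflects the inserted length; the protein consequence (such as p.N373fs*6) still has to be worked out against the reference sequence.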

Author guidelines - suggested presentation of results

Sample (or Patient) ID*  Patient age*  Patient gender*  Primary tissue  Primary subsite  Tumour source*       Primary histology  Subhistology  Stage*  CDS mutation  AA mutation
1                        45            F                colon           left             primary              adenoma            villous               c.34G>T       p.G12C
2                        67            F                rectum                           primary              adenoma            tubular               c.183A>T      p.Q61H
3                        51            M                colon           sigmoid          metastasis (rectum)  carcinoma                        IIA     c.38_39GC>AT  p.G13D

Reference sequence: KRAS NM_004985.3

*These additional clinical details can be added if data are available.

Additional columns can be added for further information, e.g. smoking status or drug response.
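Giving both the CDS and AA columns lets a curator cross-check them. The Python sketch below applies a same-length CDS substitution (including the multi-base `c.38_39GC>AT` style used in the table) to a reference coding sequence and translates the affected codon with the standard genetic code. The 39-base string is assumed to be the first 13 codons of the KRAS coding sequence and is included only for illustration; check it against NM_004985.3 before relying on it. The name `predict_protein_change` is hypothetical. Row 2 (codon 61) lies outside this fragment, and changes spanning more than one codon are rejected rather than handled.

```python
import re

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed: first 13 codons of the KRAS CDS (verify against NM_004985.3).
KRAS_CDS_START = "ATGACTGAATATAAACTTGTGGTAGTTGGAGCTGGTGGC"

# Matches c.34G>T and the multi-base form c.38_39GC>AT.
SUBSTITUTION = re.compile(r"^c\.(\d+)(?:_(\d+))?([ACGT]+)>([ACGT]+)$")


def predict_protein_change(cds: str, cds_change: str) -> str:
    """Predict the p. description for a same-length substitution in one codon."""
    m = SUBSTITUTION.match(cds_change)
    if m is None:
        raise ValueError(f"unsupported CDS syntax: {cds_change}")
    start = int(m.group(1))
    end = int(m.group(2) or m.group(1))
    ref, alt = m.group(3), m.group(4)
    if len(ref) != end - start + 1 or len(alt) != len(ref):
        raise ValueError(f"length mismatch in {cds_change}")
    # c. positions are 1-based; Python slices are 0-based.
    if cds[start - 1:end] != ref:
        raise ValueError(f"reference mismatch: sequence has {cds[start - 1:end]!r}")
    mutated = cds[:start - 1] + alt + cds[end:]
    codon_index = (start - 1) // 3
    if (end - 1) // 3 != codon_index:
        raise ValueError("changes spanning several codons are not handled here")
    codon = slice(3 * codon_index, 3 * codon_index + 3)
    ref_aa, alt_aa = CODON_TABLE[cds[codon]], CODON_TABLE[mutated[codon]]
    return f"p.{ref_aa}{codon_index + 1}{alt_aa}"


for cds_change, reported in [("c.34G>T", "p.G12C"), ("c.38_39GC>AT", "p.G13D")]:
    predicted = predict_protein_change(KRAS_CDS_START, cds_change)
    print(cds_change, predicted, "matches" if predicted == reported else "MISMATCH")
```

A reference mismatch raises an error instead of guessing, which mirrors why stating the reference sequence and version matters.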

We hope you find these guidelines useful. If you have further questions, please contact us at


The COSMIC literature curators

Can I submit data to COSMIC before publication?

Yes, although your data will not be visible on the COSMIC website until after publication. Some journals now require data to be submitted to COSMIC before publication. If you have data to submit, please email us and we will contact you about data formats and the submission process.