Expert Curation of Genes

Expert manual curation captures a high level of detail across mutation positions, disease descriptions, and other patient and population data. It also provides better quality control than systematic approaches: experienced curators can identify inconsistencies or errors in publications and reject untrustworthy, incomplete or unspecific data sources.

Key points about expert curated data

  1. Manually interpreted from peer-reviewed publications by COSMIC's team of postdoctoral-level scientist curators.
  2. Consists of comprehensive literature curation of selected Cancer Gene Census genes at release, followed by subsequent updates.
  3. Includes additional data points relevant to each disease and publication.
  4. Provides accurate frequency data, because mutation-negative samples are recorded alongside mutated ones.
  5. Also called non-systematic or targeted screen data.
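
Point 4 can be illustrated with a small calculation. The sketch below is hypothetical: the function name and the sample counts are invented for illustration and are not COSMIC data. It only shows that a mutation frequency needs the number of screened samples as its denominator, which is known only when mutation-negative samples are recorded as well.

```python
def mutation_frequency(mutated: int, negative: int) -> float:
    """Return the fraction of screened samples that carry a mutation.

    `mutated` and `negative` are counts of mutation-positive and
    mutation-negative samples (hypothetical inputs for illustration).
    """
    screened = mutated + negative
    if screened == 0:
        raise ValueError("no samples were screened")
    return mutated / screened


# Invented counts: 12 mutated samples and 48 mutation-negative samples.
freq = mutation_frequency(mutated=12, negative=48)
print(f"{freq:.1%}")  # 20.0%
```

Without the 48 negative samples, the 12 mutated samples alone would give no indication of how common the mutation is in the screened population.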

Gene selection

We have assembled a list of genes that are somatically mutated and causally implicated in human cancer (Futreal et al., 2004). We call this list the Cancer Gene Census, and it is updated periodically with new genes. From this list we select genes for COSMIC expert curation, with an emphasis on genes for which there are no existing databases. A list of expert curated genes (also called COSMIC classic genes) can be found at the bottom of this page. The list of expert curated genes grows at each release, and newly released genes can be found in the release notice alert.

Selecting papers from the literature

To identify papers reporting somatic mutations, PubMed is searched broadly for relevant mutation data (example search: (ras OR genes, ras) AND human AND mutation). Papers whose abstracts indicate that they include somatic mutation information relating to cancer or pre-cancerous conditions are then selected for curation. After examination of the full text of the paper, the sample and mutation data are extracted. Any papers containing incomplete data (e.g. mutations that are reported but not fully described) or data of insufficient quality (e.g. errors identified in the data) are not fully curated but are added to a list of "additional references containing somatic mutation information".
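
The example search above could also be expressed as a programmatic query. The following is a minimal sketch that assumes the NCBI E-utilities ESearch endpoint is used. The text does not say how the PubMed search is actually run, so this is an illustration and not a description of COSMIC's workflow. The function name and the `retmax` default are hypothetical. The code only builds the request URL and does not contact the network.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint (assumed here for illustration).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries PubMed for `term`."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query string, URL-encoded below
        "retmode": "json",   # ask for a JSON response
        "retmax": retmax,    # maximum number of PubMed IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The example search quoted in the text.
url = build_pubmed_search_url("(ras OR genes, ras) AND human AND mutation")
print(url)
```

Fetching this URL would return PubMed identifiers only. Selecting papers from their abstracts and full text remains a manual step, as described above.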

What kind of data is curated?

  1. Up to 45 different data points are curated per sample at 4 levels: Individual, Tumour/Tissue, Sample and Mutation.
  2. Individual features include: age, sex, ethnicity, environmental variables (e.g. current smoker, human papillomavirus-negative), family information, prior therapy and disease history.
  3. Tumour/Tissue features include: tumour source (e.g. primary, metastasis), metastatic site, stage, grade, drug response and cytogenetic data.
  4. Sample features include: sample source (e.g. surgery-fixed, autopsy-ns, cell line), therapy relationship (e.g. sample analysed after 12 months of dasatinib therapy), sample differentiator (e.g. sample from sarcomatous component), mutation allele specification (where more than one mutation occurs in the same gene in the same sample) and microsatellite status (MSS/MSI).
  5. Mutation features include: LOH, mutation detail, zygosity, somatic status and if normal tissue was tested.
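
The four levels listed above can be pictured as a nested record. The sketch below is a hypothetical data model, not COSMIC's actual schema: the class names, field names and example values are invented, and only a few of the listed features are included for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mutation:
    detail: str                                  # mutation detail
    zygosity: Optional[str] = None
    somatic_status: Optional[str] = None
    loh: Optional[bool] = None                   # loss of heterozygosity
    normal_tissue_tested: Optional[bool] = None


@dataclass
class Sample:
    source: Optional[str] = None                 # e.g. "cell line"
    therapy_relationship: Optional[str] = None
    microsatellite_status: Optional[str] = None  # "MSS" or "MSI"
    mutations: List[Mutation] = field(default_factory=list)


@dataclass
class Tumour:
    source: Optional[str] = None                 # e.g. "primary", "metastasis"
    stage: Optional[str] = None
    grade: Optional[str] = None
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Individual:
    age: Optional[int] = None
    sex: Optional[str] = None
    tumours: List[Tumour] = field(default_factory=list)


# Invented example: one individual, one tumour, one sample, one mutation.
record = Individual(
    age=61,
    sex="female",
    tumours=[Tumour(
        source="primary",
        samples=[Sample(
            source="cell line",
            mutations=[Mutation(detail="c.35G>A", zygosity="heterozygous")],
        )],
    )],
)

# Count mutations across every sample of every tumour.
n_mutations = sum(len(s.mutations) for t in record.tumours for s in t.samples)
print(n_mutations)  # 1
```

In this layout, a sample with an empty `mutations` list would stand for a mutation-negative sample, which is one way the negative results mentioned earlier could be kept alongside positive ones.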