Data Downloads (release v99, 28th November 2023)

We are supporting the legacy downloads for one year (i.e. for v98 and v99). Thereafter, these downloads will be phased out and all downloads will be available in the new format on this page: New Downloads

Commercial users: please access COSMIC data downloads from the Qiagen website.

This page allows you to download the various COSMIC data files. It also has descriptions of the data contained in each file.

You will need to log in to download the files. As part of COSMIC's growth and development plan, we have implemented a licensing strategy. Everyone is required to register in order to download data. More information can be found on our licensing page.

Whole File Downloads

To download a complete file, simply click on the dark blue 'Download Whole File' button for the file that you require and your download will begin.

Filtered File Downloads

Some files can be filtered by any combination of gene, sample or cancer type:

  • click on the blue 'Download Filtered File' button to show the filter fields
  • fill in the filters that you require
  • as you type, look in the drop-down list for the gene, sample or cancer type that you need
  • the field will turn green if the filter matches something in the COSMIC database or red otherwise
  • click 'Download' to retrieve the filtered data

Scripted Downloads

You can download files programmatically. Click the purple 'Scripted download' button next to each file for information on how to retrieve that file via the command line or a script. All files for the current and past 3 versions of COSMIC are available for download. Check out our help pages for more information on downloading, and for an explanation of how to find a manifest for all available files.
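As a rough illustration of what a scripted retrieval might look like, the sketch below builds an HTTP Basic `Authorization` header from the registered email address and password and asks the server for a download link. The function names are hypothetical, and both the use of Basic authentication and a JSON reply containing a `url` field are assumptions; the exact URL and procedure for each file are the ones shown by its 'Scripted download' button.

```python
import base64
import json
import urllib.request


def basic_auth_header(email: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (assumed authentication scheme)."""
    token = base64.b64encode(f"{email}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def request_download_link(file_url: str, email: str, password: str) -> str:
    """Request a download link for one file.

    Assumption: the server replies with JSON that has a 'url' key.
    file_url must be copied from the 'Scripted download' information.
    """
    req = urllib.request.Request(file_url, headers=basic_auth_header(email, password))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["url"]
```

Nothing is contacted until `request_download_link` is called, so the header helper can be checked offline first.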

Download a sample of COSMIC data

We have made the first 100 lines of each of the download files freely available so you can try out the data. More information can be found on our about page.

Actionability Data v10 (Jan 2024)


A tab separated table of Actionability information and a read me file. Please note: this file is based on the GRCh37 assembly.

Actionability.tar

Cancer Mutation Census Data


A tab separated table of Cancer Mutation Census information and a read me file. Please note: this file is based on the GRCh37 assembly.

CMC.tar

Classification Information


A comma separated table of COSMIC cancer classification information.

classification.csv

File Description

[column number:label] Heading

[1:A] Cosmic_Phenotype_id - Unique COSMIC identifier for the classification.

[2:B] Site_Primary - Primary tissue specified in the publication.

[3:C] Site_Subtype1 - Sub tissue specified in the publication.

[4:D] Site_Subtype2 - Sub tissue specified in the publication.

[5:E] Site_Subtype3 - Sub tissue specified in the publication.

[6:F] Histology - Primary histology specified in the publication.

[7:G] Hist_Subtype1 - Sub histology specified in the publication.

[8:H] Hist_Subtype2 - Sub histology specified in the publication.

[9:I] Hist_Subtype3 - Sub histology specified in the publication.

[10:J] Site_Primary_COSMIC - Primary tissue specified in COSMIC.

[11:K] Site_Subtype1_COSMIC - Sub tissue specified in COSMIC.

[12:L] Site_Subtype2_COSMIC - Sub tissue specified in COSMIC.

[13:M] Site_Subtype3_COSMIC - Sub tissue specified in COSMIC.

[14:N] Histology_COSMIC - Primary histology specified in COSMIC.

[15:O] Hist_Subtype1_COSMIC - Sub histology specified in COSMIC.

[16:P] Hist_Subtype2_COSMIC - Sub histology specified in COSMIC.

[17:Q] Hist_Subtype3_COSMIC - Sub histology specified in COSMIC.

[18:R] NCI code - NCI thesaurus code for tumour histological classification. For details see here

[19:S] EFO code - Experimental Factor Ontology (EFO), for details see here
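The classification table can be used as a lookup from the COSMIC phenotype identifier to the COSMIC tissue and histology terms. The sketch below assumes that the file has a header row whose names match the headings listed above; the function name and the example row (including the identifier `PH1`) are invented for illustration.

```python
import csv
import io


def load_classification(handle):
    """Map Cosmic_Phenotype_id -> (Site_Primary_COSMIC, Histology_COSMIC).

    Assumes the header row uses the heading names from the file description.
    """
    reader = csv.DictReader(handle)
    return {
        row["Cosmic_Phenotype_id"]: (row["Site_Primary_COSMIC"], row["Histology_COSMIC"])
        for row in reader
    }


# Invented two-line example containing only the columns used here.
sample = io.StringIO(
    "Cosmic_Phenotype_id,Site_Primary_COSMIC,Histology_COSMIC\n"
    "PH1,lung,carcinoma\n"
)
lookup = load_classification(sample)

# With the real file:
# with open("classification.csv", newline="") as fh:
#     lookup = load_classification(fh)
```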

COSMIC Complete Mutation Data (Targeted Screens)


A tab separated table of the complete curated COSMIC dataset (targeted screens) from the current release. It includes all coding point mutations, and the negative data set.

CosmicCompleteTargetedScreensMutantExport.tsv.gz

File Description

[column number:label] Heading

[1:A] Gene name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC symbol.

[2:B] Accession Number - The transcript identifier of the gene.

[3:C] Gene CDS length - Length of the gene's coding sequence (CDS) in base pairs.

[4:D] HGNC id - Unique HGNC identifier, if the gene is in HGNC.

[5:E] Sample name,Sample id,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[8:H] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and subtypes because they vary a lot between different papers.

[9:I] Site Subtype 1 - Further sub classification (level 1) of the sample's tissue of origin.

[10:J] Site Subtype 2 - Further sub classification (level 2) of the sample's tissue of origin.

[11:K] Site Subtype 3 - Further sub classification (level 3) of the sample's tissue of origin.

[12:L] Primary Histology - The histological classification of the sample.

[13:M] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[14:N] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[15:O] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[16:P] Genome-wide screen - Indicates whether the entire genome/exome was sequenced.

[17:Q] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[18:R] LEGACY_MUTATION_ID - Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.

[19:S] MUTATION_ID - An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.

[20:T] Mutation CDS - The change that has occurred in the nucleotide sequence. Formatting is identical to the method used for the peptide sequence.

[21:U] Mutation AA - The change that has occurred in the peptide sequence. Formatting is based on the recommendations made by the Human Genome Variation Society. The description of each type can be found by following the link to Mutation Overview page.

[22:V] Mutation Description - Type of mutation at the amino acid level (substitution, deletion, insertion, complex, fusion, unknown etc.)

[23:W] Mutation zygosity - Information on whether the mutation was reported to be homozygous, heterozygous or unknown within the sample.

[24:X] LOH - Information on whether the gene was reported to have loss of heterozygosity (LOH) in the sample: yes, no or unknown.

[25:Y] GRCh - The coordinate system used -

    37 = GRCh37/Hg19
    38 = GRCh38/Hg38

[26:Z] Mutation genome position - The genomic coordinates of the mutation.

[27:AA] Mutation strand - Positive or negative.

[28:AB] Resistance Mutation - The mutation confers drug resistance (see CosmicResistanceMutations.tsv.gz for gene/drug details).

[29:AC] Mutation somatic status - Information on whether the sample was reported to be Confirmed Somatic, Previously Reported or Variant of unknown origin -

    variant of unknown origin = when the mutation is known to be somatic but the tumour was sequenced without a matched normal.
    Confirmed Somatic = if the mutation has been confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient.
    Previously observed = when the mutation has been reported as somatic previously but not in current paper.

[30:AD] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in, linking to pubmed to provide more details of the publication.

[31:AE] Id Study - Lists the unique Ids of studies that have involved this sample.

[32:AF] Sample Type,Tumour origin - Describes where the sample has originated from including the tumour type.

[34:AH] Age - Age of the individual (if this information is provided with the publications).

[35:AI] HGVSP - Human Genome Variation Society peptide syntax.

[36:AJ] HGVSC - Human Genome Variation Society coding DNA sequence (CDS) syntax.

[37:AK] HGVSG - Human Genome Variation Society genomic syntax (3' shifted).
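Each entry above gives a column number and the matching spreadsheet letter, and an entry with several comma-separated headings (such as "Sample name,Sample id,Id tumour") appears to cover several consecutive columns, which is why the numbering jumps from 5 to 8. The sketch below converts a column number to its letter and expands a grouped entry; the function names are hypothetical, and the consecutive-column reading is an inference from the numbering rather than something the page states.

```python
def column_letter(number: int) -> str:
    """Convert a 1-based column number to its spreadsheet letter (1 -> A, 27 -> AA)."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while number > 0:
        # Bijective base-26: shift by one so that 26 maps to 'Z', not 'A0'.
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def expand_entry(first_column: int, headings: str):
    """Split a grouped entry into one (number, letter, heading) tuple per column.

    Assumption: the headings occupy consecutive columns in the order listed.
    """
    names = [h.strip() for h in headings.split(",") if h.strip()]
    return [
        (first_column + i, column_letter(first_column + i), name)
        for i, name in enumerate(names)
    ]
```

For example, `expand_entry(5, "Sample name,Sample id,Id tumour")` yields columns 5 (E), 6 (F) and 7 (G), so the next documented column is 8 (H).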

COSMIC Mutation Data (Genome Screens)


A tab separated table of coding point mutations from genome wide screens (including whole exome sequencing).

CosmicGenomeScreensMutantExport.tsv.gz

File Description

[column number:label] Heading

[1:A] Gene name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC symbol.

[2:B] Accession Number - The transcript identifier of the gene.

[3:C] Gene CDS length - Length of the gene's coding sequence (CDS) in base pairs.

[4:D] HGNC id - Unique HGNC identifier, if the gene is in HGNC.

[5:E] Sample name,Sample id,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[8:H] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and subtypes because they vary a lot between different papers.

[9:I] Site Subtype 1 - Further sub classification (level 1) of the sample's tissue of origin.

[10:J] Site Subtype 2 - Further sub classification (level 2) of the sample's tissue of origin.

[11:K] Site Subtype 3 - Further sub classification (level 3) of the sample's tissue of origin.

[12:L] Primary Histology - The histological classification of the sample.

[13:M] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[14:N] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[15:O] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[16:P] Genome-wide screen - Indicates whether the entire genome/exome was sequenced.

[17:Q] MUTATION_ID - An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.

[18:R] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[19:S] LEGACY_MUTATION_ID - Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.

[20:T] Mutation CDS - The change that has occurred in the nucleotide sequence. Formatting is identical to the method used for the peptide sequence.

[21:U] Mutation AA - The change that has occurred in the peptide sequence. Formatting is based on the recommendations made by the Human Genome Variation Society. The description of each type can be found by following the link to Mutation Overview page.

[22:V] Mutation Description - Type of mutation at the amino acid level (substitution, deletion, insertion, complex, fusion, unknown etc.)

[23:W] Mutation zygosity - Information on whether the mutation was reported to be homozygous, heterozygous or unknown within the sample.

[24:X] LOH - Information on whether the gene was reported to have loss of heterozygosity (LOH) in the sample: yes, no or unknown.

[25:Y] GRCh - The coordinate system used -

    37 = GRCh37/Hg19
    38 = GRCh38/Hg38

[26:Z] Mutation genome position - The genomic coordinates of the mutation.

[27:AA] Mutation strand - positive or negative.

[28:AB] Mutation somatic status - Information on whether the sample was reported to be Confirmed Somatic, Previously Reported or Variant of unknown origin -

    variant of unknown origin = when the mutation is known to be somatic but the tumour was sequenced without a matched normal.
    Confirmed Somatic = if the mutation has been confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient.
    Previously observed = when the mutation has been reported as somatic previously but not in current paper.

[29:AC] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in, linking to pubmed to provide more details of the publication.

[30:AD] Id Study - Lists the unique Ids of studies that have involved this sample.

[31:AE] Sample Type,Tumour origin - Describes where the sample has originated from including the tumour type.

[33:AG] Age - Age of the individual (if this information is provided with the publications).

[34:AH] HGVSP - Human Genome Variation Society peptide syntax.

[35:AI] HGVSC - Human Genome Variation Society coding DNA sequence (CDS) syntax.

[36:AJ] HGVSG - Human Genome Variation Society genomic syntax (3' shifted).
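Column positions are not identical across the export files: here MUTATION_ID comes before GENOMIC_MUTATION_ID and LEGACY_MUTATION_ID, whereas the targeted screens file lists it after them, and there is no Resistance Mutation column. Reading columns by heading rather than by position avoids depending on the order. The sketch below assumes each file starts with a header row matching the headings above and is UTF-8 encoded; `iter_rows` and `make_gz` are hypothetical helpers and the data rows are invented.

```python
import csv
import gzip
import io


def iter_rows(path_or_file):
    """Yield each data row of a gzipped, tab-separated export as a dict keyed by heading."""
    with gzip.open(path_or_file, mode="rt", newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters in fields as ordinary text.
        yield from csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def make_gz(text):
    """Build an in-memory gzipped file for the demonstration."""
    return io.BytesIO(gzip.compress(text.encode("utf-8")))


# The same invented record written in the two documented column orders.
targeted_order = "GENOMIC_MUTATION_ID\tLEGACY_MUTATION_ID\tMUTATION_ID\nCOSV1\tCOSM1\t101\n"
genome_order = "MUTATION_ID\tGENOMIC_MUTATION_ID\tLEGACY_MUTATION_ID\n101\tCOSV1\tCOSM1\n"

ids_a = [row["GENOMIC_MUTATION_ID"] for row in iter_rows(make_gz(targeted_order))]
ids_b = [row["GENOMIC_MUTATION_ID"] for row in iter_rows(make_gz(genome_order))]
```

With a downloaded file, the same generator can be called as `iter_rows("CosmicGenomeScreensMutantExport.tsv.gz")`; rows are streamed, so the whole file is never held in memory.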

COSMIC Mutation Data


A tab separated table of all COSMIC coding point mutations from targeted and genome wide screens from the current release.

CosmicMutantExport.tsv.gz

File Description

[column number:label] Heading

[1:A] Gene name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC symbol.

[2:B] Accession Number - The transcript identifier of the gene.

[3:C] Gene CDS length - Length of the gene's coding sequence (CDS) in base pairs.

[4:D] HGNC id - Unique HGNC identifier, if the gene is in HGNC.

[5:E] Sample name,Sample id,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[8:H] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and subtypes because they vary a lot between different papers.

[9:I] Site Subtype 1 - Further sub classification (level 1) of the sample's tissue of origin.

[10:J] Site Subtype 2 - Further sub classification (level 2) of the sample's tissue of origin.

[11:K] Site Subtype 3 - Further sub classification (level 3) of the sample's tissue of origin.

[12:L] Primary Histology - The histological classification of the sample.

[13:M] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[14:N] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[15:O] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[16:P] Genome-wide screen - Indicates whether the entire genome/exome was sequenced.

[17:Q] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[18:R] LEGACY_MUTATION_ID - Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.

[19:S] MUTATION_ID - An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.

[20:T] Mutation CDS - The change that has occurred in the nucleotide sequence. Formatting is identical to the method used for the peptide sequence.

[21:U] Mutation AA - The change that has occurred in the peptide sequence. Formatting is based on the recommendations made by the Human Genome Variation Society. The description of each type can be found by following the link to Mutation Overview page.

[22:V] Mutation Description - Type of mutation at the amino acid level (substitution, deletion, insertion, complex, fusion, unknown etc.)

[23:W] Mutation zygosity - Information on whether the mutation was reported to be homozygous, heterozygous or unknown within the sample.

[24:X] LOH - Information on whether the gene was reported to have loss of heterozygosity (LOH) in the sample: yes, no or unknown.

[25:Y] GRCh - The coordinate system used -

    38 = GRCh38/Hg38
    37 = GRCh37/Hg19

[26:Z] Mutation genome position - The genomic coordinates of the mutation.

[27:AA] Mutation strand - Positive or negative.

[28:AB] Resistance Mutation - The mutation confers drug resistance (see CosmicResistanceMutations.tsv.gz for gene/drug details).

[29:AC] Mutation somatic status - Information on whether the sample was reported to be Confirmed Somatic, Previously Reported or Variant of unknown origin -

    Previously observed = when the mutation has been reported as somatic previously but not in current paper.
    Confirmed Somatic = if the mutation has been confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient.
    Variant of unknown origin = when the mutation is known to be somatic but the tumour was sequenced without a matched normal.

[30:AD] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in, linking to pubmed to provide more details of the publication.

[31:AE] Id Study - Lists the unique Ids of studies that have involved this sample.

[32:AF] Sample Type,Tumour origin - Describes where the sample has originated from including the tumour type.

[34:AH] Age - Age of the individual (if this information is provided with the publications).

[35:AI] HGVSP - Human Genome Variation Society peptide syntax.

[36:AJ] HGVSC - Human Genome Variation Society coding DNA sequence (CDS) syntax.

[37:AK] HGVSG - Human Genome Variation Society genomic syntax (3' shifted).
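As an example of combining the columns described above, the sketch below counts the "Mutation somatic status" values for one gene on one assembly. It works on any iterable of rows keyed by heading (for instance, rows read with `csv.DictReader`) and assumes that the header names match the headings listed here and that GRCh is stored as the text "37" or "38". The function name, gene names and status strings in the example are invented.

```python
from collections import Counter


def somatic_status_counts(rows, gene, grch="38"):
    """Count 'Mutation somatic status' values for one gene on one assembly."""
    counts = Counter()
    for row in rows:
        if row["Gene name"] == gene and row["GRCh"] == grch:
            counts[row["Mutation somatic status"]] += 1
    return counts


# Invented rows containing only the columns used here.
example_rows = [
    {"Gene name": "GENE1", "GRCh": "38", "Mutation somatic status": "Confirmed Somatic"},
    {"Gene name": "GENE1", "GRCh": "38", "Mutation somatic status": "Confirmed Somatic"},
    {"Gene name": "GENE1", "GRCh": "37", "Mutation somatic status": "Confirmed Somatic"},
    {"Gene name": "GENE2", "GRCh": "38", "Mutation somatic status": "Variant of unknown origin"},
]
counts = somatic_status_counts(example_rows, "GENE1")
```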

Structural Genomic Rearrangements


All structural variants from the current release in a tab separated table.

CosmicStructExport.tsv.gz

File Description

[column number:label] Heading

[1:A] Sample name,Sample id,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[4:D] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and subtypes because they vary a lot between different papers.

[5:E] Site Subtype 1 - Further sub classification (level 1) of the sample's tissue of origin.

[6:F] Site Subtype 2 - Further sub classification (level 2) of the sample's tissue of origin.

[7:G] Site Subtype 3 - Further sub classification (level 3) of the sample's tissue of origin.

[8:H] Primary Histology - The histological classification of the sample.

[9:I] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[10:J] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[11:K] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[12:L] Mutation Id - Unique mutation identifier.

[13:M] Mutation Type - Type of mutation: Intra/Inter (chromosomal), tandem duplication, deletion, inversion, complex substitutions, complex amplicons.

[14:N] GRCh - The coordinate system used -

    38 = GRCh38/Hg38
    37 = GRCh37/Hg19

[15:O] Description - A syntax which describes the structural variant, based on HGVS recommendations.

[16:P] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in.

[17:Q] ID_STUDY - Lists the unique Ids of studies that have involved this structural mutation.


All breakpoint data from the current release in a tab separated table.

CosmicBreakpointsExport.tsv.gz

File Description

[column number:label] Heading

[1:A] Sample name,Sample id,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[4:D] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and subtypes because they vary a lot between different papers.

[5:E] Site Subtype 1 - Further sub classification (level 1) of the sample's tissue of origin.

[6:F] Site Subtype 2 - Further sub classification (level 2) of the sample's tissue of origin.

[7:G] Site Subtype 3 - Further sub classification (level 3) of the sample's tissue of origin.

[8:H] Primary Histology - The histological classification of the sample.

[9:I] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[10:J] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[11:K] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[12:L] Mutation Type - Type of mutation: Intra/Inter (chromosomal), tandem duplication, deletion, inversion, complex substitutions, complex amplicons.

[13:M] Mutation Id - Unique mutation identifier.

[14:N] Breakpoint Order - For variants involving multiple breakpoints, the predicted order along chromosome(s). Otherwise '0'.

[15:O] GRCh - The coordinate system used -

    37 = GRCh37/Hg19
    38 = GRCh38/Hg38

[16:P] Chrom From - The chromosome where the first variant/breakpoint occurs.

[17:Q] Location From min - The first position in breakpoint range.

[18:R] Location From max - The last position in breakpoint range.

[19:S] Strand From - positive or negative.

[20:T] Chrom To - The chromosome where the last variant/breakpoint occurs.

[21:U] Location To min - The first position in breakpoint range.

[22:V] Location To max - The last position in breakpoint range.

[23:W] Strand To - positive or negative.

[24:X] Non-templated ins seq - Non Templated Sequence (if any) which is inserted at the breakpoint. The sequence is not encoded.

[25:Y] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in.

[26:Z] Id Study - Lists the unique Ids of studies that have involved this structural mutation.
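Each breakpoint row describes two ends ("From" and "To"), each with a chromosome, a position range and a strand. The sketch below turns one row into two small records and reports how wide each position range is. It assumes the header names match the headings above and that the min and max positions are inclusive integers; the class and function names are hypothetical and the example row is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakpointEnd:
    chrom: str
    location_min: int
    location_max: int
    strand: str

    @property
    def width(self) -> int:
        """Number of candidate positions in the range (1 means an exact position)."""
        return self.location_max - self.location_min + 1


def parse_ends(row: dict):
    """Return the (from_end, to_end) pair described by one breakpoint row."""
    ends = []
    for side in ("From", "To"):
        ends.append(
            BreakpointEnd(
                chrom=row[f"Chrom {side}"],
                location_min=int(row[f"Location {side} min"]),
                location_max=int(row[f"Location {side} max"]),
                strand=row[f"Strand {side}"],
            )
        )
    return ends[0], ends[1]


# Invented example row containing only the columns used here.
example = {
    "Chrom From": "1", "Location From min": "100", "Location From max": "105", "Strand From": "+",
    "Chrom To": "2", "Location To min": "200", "Location To max": "200", "Strand To": "-",
}
from_end, to_end = parse_ends(example)
same_chromosome = from_end.chrom == to_end.chrom
```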

Complete Fusion Export


All gene fusion mutation data from the current release in a tab separated table.

CosmicFusionExport.tsv.gz

File Description

[column number:label] Heading

[1:A] Sample id,Sample name - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[3:C] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and subtypes because they vary a lot between different papers.

[4:D] Site Subtype 1 - Further sub classification (level 1) of the sample's tissue of origin.

[5:E] Site Subtype 2 - Further sub classification (level 2) of the sample's tissue of origin.

[6:F] Site Subtype 3 - Further sub classification (level 3) of the sample's tissue of origin.

[7:G] Primary Histology - The histological classification of the sample.

[8:H] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[9:I] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[10:J] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[11:K] Fusion Id - Unique fusion mutation identifier.

[12:L] Translocation Name - Syntax describing the portions of mRNA present (in HGVS 'r.' format) from each gene (allows representation of UTR sequences).

[13:M] 5'_CHROMOSOME - Chromosome of 5' gene.

[14:N] 5'_STRAND - The orientation of the 5' gene (+/-).

[15:O] 5'_GENE_ID - The transcript identifier of the 5' gene.

[16:P] 5'_GENE_NAME - Gene symbol for the 5' gene fusion partner for which the data has been curated in COSMIC. In most cases this is the accepted HGNC symbol.

[17:Q] 5'_LAST_OBSERVED_EXON - Last observed exon number of the 5' gene fusion partner.

[18:R] 5'_GENOME_START_FROM - The genomic coordinate of the start (+ strand)/breakpoint (- strand) of the 5' fusion gene as described in the Translocation Name.

[19:S] 5'_GENOME_START_TO - The range of genomic coordinates of the start (+ strand)/breakpoint (- strand) of the 5' fusion gene if it is an unknown base position.

[20:T] 5'_GENOME_STOP_FROM - The genomic coordinate of the breakpoint (+ strand)/start (- strand) of the 5' fusion gene as described in the Translocation Name.

[21:U] 5'_GENOME_STOP_TO - The range of genomic coordinates of the breakpoint (+ strand)/start (- strand) of the 5' fusion gene if it is an unknown base position.

[22:V] 3'_CHROMOSOME - Chromosome of 3' gene.

[23:W] 3'_STRAND - The orientation of the 3' gene (+/-).

[24:X] 3'_GENE_ID - The transcript identifier of the 3' gene.

[25:Y] 3'_GENE_NAME - Gene symbol for the 3' gene fusion partner for which the data has been curated in COSMIC. In most cases this is the accepted HGNC symbol.

[26:Z] 3'_FIRST_OBSERVED_EXON - First observed exon number of the 3' gene fusion partner.

[27:AA] 3'_GENOME_START_FROM - The genomic coordinate of the breakpoint (+ strand)/stop (- strand) of the 3' fusion gene as described in the Translocation Name.

[28:AB] 3'_GENOME_START_TO - The range of genomic coordinates of the breakpoint (+ strand)/stop (- strand) of the 3' fusion gene if it is an unknown base position.

[29:AC] 3'_GENOME_STOP_FROM - The genomic coordinate of the stop (+ strand)/breakpoint (- strand) of the 3' fusion gene as described in the Translocation Name.

[30:AD] 3'_GENOME_STOP_TO - The range of genomic coordinates of the stop (+ strand)/breakpoint (- strand) of the 3' fusion gene if it is an unknown base position.

[31:AE] Fusion type - Type of mutation.

[32:AF] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in.
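Which coordinate column holds the fusion breakpoint depends on the gene's position in the fusion and on its strand. Following the descriptions above, the 5' gene's breakpoint is in 5'_GENOME_STOP_FROM on the + strand and in 5'_GENOME_START_FROM on the - strand, while the 3' gene's breakpoint is in 3'_GENOME_START_FROM on the + strand and in 3'_GENOME_STOP_FROM on the - strand. The sketch below encodes that lookup; the names `BREAKPOINT_COLUMN` and `fusion_breakpoints` are hypothetical, the example row is invented, and the strand values are assumed to be the literal characters "+" and "-".

```python
# (side, strand) -> column holding the breakpoint, per the file description.
BREAKPOINT_COLUMN = {
    ("5'", "+"): "5'_GENOME_STOP_FROM",
    ("5'", "-"): "5'_GENOME_START_FROM",
    ("3'", "+"): "3'_GENOME_START_FROM",
    ("3'", "-"): "3'_GENOME_STOP_FROM",
}


def fusion_breakpoints(row: dict):
    """Return (5' breakpoint, 3' breakpoint) genomic coordinates for one fusion row."""
    five = int(row[BREAKPOINT_COLUMN[("5'", row["5'_STRAND"])]])
    three = int(row[BREAKPOINT_COLUMN[("3'", row["3'_STRAND"])]])
    return five, three


# Invented row: 5' gene on the + strand, 3' gene on the - strand.
example = {
    "5'_STRAND": "+", "5'_GENOME_START_FROM": "1000", "5'_GENOME_STOP_FROM": "1500",
    "3'_STRAND": "-", "3'_GENOME_START_FROM": "7000", "3'_GENOME_STOP_FROM": "7400",
}
breakpoints = fusion_breakpoints(example)
```

This only reads the `_FROM` columns; when a position is uncertain, the matching `_TO` column gives the other end of the range.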

All Mutations in Census Genes


All coding mutations in genes listed in the Cancer Gene Census (http://cancer.sanger.ac.uk/census) in a tab separated table.

CosmicMutantExportCensus.tsv.gz

File Description

[column number:label] Heading

[1:A] Gene name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC symbol.

[2:B] Accession Number - The transcript identifier of the gene.

[3:C] Gene CDS length - Length of the gene's coding sequence (CDS) in base pairs.

[4:D] HGNC id - Unique HGNC identifier, if the gene is in HGNC.

[5:E] Sample name,Sample id,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[8:H] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and subtypes because they vary a lot between different papers.

[9:I] Site Subtype 1 - Further sub classification (level 1) of the sample's tissue of origin.

[10:J] Site Subtype 2 - Further sub classification (level 2) of the sample's tissue of origin.

[11:K] Site Subtype 3 - Further sub classification (level 3) of the sample's tissue of origin.

[12:L] Primary Histology - The histological classification of the sample.

[13:M] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[14:N] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[15:O] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[16:P] Genome-wide screen - Indicates whether the entire genome/exome was sequenced.

[17:Q] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[18:R] LEGACY_MUTATION_ID - Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.

[19:S] MUTATION_ID - An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.

[20:T] Mutation CDS - The change that has occurred in the nucleotide sequence. Formatting is identical to the method used for the peptide sequence.

[21:U] Mutation AA - The change that has occurred in the peptide sequence. Formatting is based on the recommendations made by the Human Genome Variation Society. The description of each type can be found by following the link to Mutation Overview page.

[22:V] Mutation Description - Type of mutation (substitution, deletion, insertion, complex, fusion etc.)

[23:W] Mutation zygosity - Information on whether the mutation was reported to be homozygous, heterozygous or unknown within the sample.

[24:X] LOH - Information on whether the gene was reported to have loss of heterozygosity (LOH) in the sample: yes, no or unknown.

[25:Y] GRCh - The coordinate system used -

    37 = GRCh37/Hg19
    38 = GRCh38/Hg38

[26:Z] Mutation genome position - The genomic coordinates of the mutation.

[27:AA] Mutation strand - positive or negative.

[28:AB] Resistance Mutation - mutation confers drug resistance (see CosmicResistanceMutations.tsv.gz for gene/drug details).

[29:AC] Mutation somatic status - Information on whether the sample was reported to be Confirmed Somatic, Previously Reported or Variant of unknown origin -

    Previously observed = when the mutation has been reported as somatic previously but not in current paper.
    Variant of unknown origin = when the mutation is known to be somatic but the tumour was sequenced without a matched normal.
    Confirmed Somatic = if the mutation has been confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient.

[30:AD] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in, linking to pubmed to provide more details of the publication.

[31:AE] Id Study - Lists the unique Ids of studies that have involved this sample.

[32:AF] Sample Type,Tumour origin - Describes where the sample has originated from including the tumour type.

[34:AH] Age - Age of the sample (if this information is provided with the publications).

[35:AI] Tier - 1 or 2 [see here for details of Tier 1 and 2]

[36:AJ] HGVSP - Human Genome Variation Society peptide syntax.

[37:AK] HGVSC - Human Genome Variation Society coding DNA sequence syntax (CDS).

[38:AL] HGVSG - Human Genome Variation Society genomic syntax (3' shifted).
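
Each entry in these file descriptions pairs a 1-based column number with a spreadsheet-style letter label (for example 27 with AA and 38 with AL). The following is a minimal Python sketch of that conversion, useful when a script needs to move between the two. The function names are illustrative only and are not part of any COSMIC tooling.

```python
def column_label(number: int) -> str:
    """Convert a 1-based column number to a spreadsheet-style label (27 -> AA)."""
    if number < 1:
        raise ValueError("column numbers start at 1")
    label = ""
    while number > 0:
        # Shift to 0-based before dividing so that 26 maps to Z rather than rolling over.
        number, remainder = divmod(number - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def column_number(label: str) -> int:
    """Convert a spreadsheet-style label back to a 1-based column number (AL -> 38)."""
    number = 0
    for char in label.upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"unexpected character in label: {char!r}")
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number
```

Because Python lists are 0-indexed, a field documented as column N sits at index N - 1 of a row that has been split on tabs.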

Non coding variants


A tab separated table of all non-coding mutations from the current release.

CosmicNCV.tsv.gz

File Description

[column number:label] Heading

[1:A] Sample name,Sample id,Tumour id - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[4:D] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and sub types because they vary a lot between different papers.

[5:E] Site Subtype 1 - Further sub-classification (level 1) of the sample's tissue of origin.

[6:F] Site Subtype 2 - Further sub-classification (level 2) of the sample's tissue of origin.

[7:G] Site Subtype 3 - Further sub-classification (level 3) of the sample's tissue of origin.

[8:H] Primary Histology - The histological classification of the sample.

[9:I] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[10:J] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[11:K] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[12:L] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[13:M] LEGACY_MUTATION_ID - Legacy mutation identifier (COSN) that will represent existing COSN mutation identifiers.

[14:N] Zygosity - Information on whether the mutation was reported to be homozygous, heterozygous or unknown within the sample.

[15:O] GRCh - The coordinate system used -

    38 = GRCh38/Hg38
    37 = GRCh37/Hg19

[16:P] Genome position - The genomic coordinate of the mutation.

[17:Q] Mutation somatic status - Information on whether the sample was reported to be Confirmed Somatic, Previously Reported or Variant of unknown origin -

    Previously observed = when the mutation has been reported as somatic previously but not in current paper.
    Variant of unknown origin = when the mutation is known to be somatic but the tumour was sequenced without a matched normal.
    Confirmed Somatic = if the mutation has been confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient.

[18:R] WT SEQ - wild type sequence.

[19:S] MUT SEQ - Mutated sequence.

[20:T] Whole Genome Reseq - if the entire genome is sequenced.

[21:U] Whole_Exome - if the entire exome is sequenced.

[22:V] Id Study - Lists the unique Ids of studies that have involved this non coding mutation.

[23:W] Pubmed_PMID - The PUBMED ID for the paper that the sample was noted in.

[24:X] HGVSG - Human Genome Variation Society genomic syntax (3' shifted).
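
The column numbers above can be used to pull individual fields out of the file. This is a minimal sketch using only the Python standard library. It assumes that the first row of the file is a header row, that the file is UTF-8 text, and that the path given to `open_tsv_gz` is wherever you saved the download; both helper names are hypothetical.

```python
import csv
import gzip


def open_tsv_gz(path):
    """Open a gzip-compressed TSV file as text (UTF-8 is an assumption)."""
    return gzip.open(path, "rt", encoding="utf-8", newline="")


def select_columns(lines, column_numbers, has_header=True):
    """Yield tuples holding the requested 1-based columns from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: the first row holds the column headings
    for row in reader:
        yield tuple(row[n - 1] for n in column_numbers)


# Example use (not run here): columns 12, 15 and 16 are documented above as
# GENOMIC_MUTATION_ID, GRCh and Genome position.
#
# with open_tsv_gz("CosmicNCV.tsv.gz") as handle:
#     for cosv, grch, position in select_columns(handle, [12, 15, 16]):
#         print(cosv, grch, position)
```

Reading the file as a stream keeps memory use flat, which matters because the complete files can be large.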

Copy Number Variants


All copy number aberrations from the current release in a tab separated table. For more information on copy number data, please see http://cancer.sanger.ac.uk/cosmic/help/cnv/overview.

CosmicCompleteCNA.tsv.gz

File Description

[column number:label] Heading

[1:A] CNV_ID - The unique identifier for the variant (not stable, differs between releases).

[2:B] Id gene,Gene name - The ID and symbol of the gene which overlaps the copy number segment (or '-' where there is no overlapping gene).

[4:D] Sample id,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers. These samples are from the ICGC and TCGA.

[6:F] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and sub types because they vary a lot between different papers.

[7:G] Site Subtype 1 - Further sub-classification (level 1) of the sample's tissue of origin.

[8:H] Site Subtype 2 - Further sub-classification (level 2) of the sample's tissue of origin.

[9:I] Site Subtype 3 - Further sub-classification (level 3) of the sample's tissue of origin.

[10:J] Primary Histology - The histological classification of the sample.

[11:K] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[12:L] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[13:M] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[14:N] Sample Name - The name of the sample.

[15:O] Total_CN - The sum of the major and minor allele counts, e.g. if ABB, total copy number = 3.

[16:P] Minor Allele - The number of copies of the least frequent allele, e.g. if ABB, minor allele = A (1 copy) and major allele = B (2 copies).

[17:Q] Mut Type - Defined as Gain or Loss. For ICGC samples, this is as defined in the original data. For TCGA samples reanalysed with ASCAT -

    LOSS = average genome ploidy <= 2.7 AND total copy number = 0 OR average genome ploidy > 2.7 AND total copy number < ( average genome ploidy - 2.7 )
    GAIN = average genome ploidy <= 2.7 AND total copy number >= 5 OR average genome ploidy > 2.7 AND total copy number >= 9
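
The two ASCAT rules above can be written as a small function. This is only a sketch: it assumes that each AND binds more tightly than the OR in the rules as written, and the function name and the `None` return value for segments that are neither a gain nor a loss are illustrative choices, not COSMIC's own implementation.

```python
PLOIDY_THRESHOLD = 2.7  # the average genome ploidy cut-off quoted in the rules


def classify_copy_number(average_ploidy, total_cn):
    """Return "LOSS", "GAIN" or None for one segment.

    Assumes the rules read as (A AND B) OR (C AND D).
    """
    if average_ploidy <= PLOIDY_THRESHOLD:
        if total_cn == 0:
            return "LOSS"
        if total_cn >= 5:
            return "GAIN"
    else:
        if total_cn < average_ploidy - PLOIDY_THRESHOLD:
            return "LOSS"
        if total_cn >= 9:
            return "GAIN"
    return None  # neither rule applies
```

For example, a segment with total copy number 5 would count as a gain in a sample with average ploidy 2.0 but not in a sample with average ploidy 3.5.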

[18:R] Id Study - Lists the unique Ids of studies that have involved this copy number variation.

[19:S] GRCh - The coordinate system used -

    37 = GRCh37/Hg19
    38 = GRCh38/Hg38

[20:T] Chromosome:G_Start..G_Stop - The genomic coordinates of the variation.
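
The coordinate column packs three values into one string of the form Chromosome:G_Start..G_Stop. A minimal sketch of splitting it apart in Python is shown here; the function name is hypothetical and the coordinate used in the usage note is made up for illustration.

```python
import re

# chromosome, then ':', then start, then '..', then stop
_COORDINATE_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)\.\.(?P<stop>\d+)$")


def parse_coordinates(text):
    """Split 'Chromosome:G_Start..G_Stop' into (chromosome, start, stop)."""
    match = _COORDINATE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in Chromosome:G_Start..G_Stop form: {text!r}")
    return match["chrom"], int(match["start"]), int(match["stop"])
```

With this sketch, `parse_coordinates("17:7571720..7590868")` returns the chromosome as a string and the two positions as integers. The chromosome is kept as a string because values such as X and Y are not numeric.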

Gene Expression


All gene expression level 3 data from the TCGA portal for the current release in a tab separated table. Please note: the platform codes currently used to produce the COSMIC gene expression values are: IlluminaGA_RNASeqV2, IlluminaHiSeq_RNASeqV2, AgilentG4502A_07_2, AgilentG4502A_07_3. For more information on the gene expression data, please see http://cancer.sanger.ac.uk/cosmic/analyses.

CosmicCompleteGeneExpression.tsv.gz

File Description

[column number:label] Heading

[1:A] Sample id,Sample name - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers. These samples are from the ICGC and TCGA.

[3:C] Gene name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC identifier.

[4:D] Regulation - Over or under, depending on whether the scores from the different platforms are above or below the threshold.

[5:E] Z-score - An indicative score taken from the gene expression data of the different platforms, in order of preference: IlluminaHiSeq_RNASeqV2, IlluminaGA_RNASeqV2, AgilentG4502A_07_3.

[6:F] Id Study - Lists the unique Ids of studies that have involved this gene expression data.
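
The Z-score description names an order of preference between platforms. One way to picture that selection is sketched here in Python. The input, a dictionary of per-platform scores for a single gene and sample, is a hypothetical structure invented for illustration; it is not how the downloaded file is laid out, since the file already contains the selected score.

```python
# Order of preference as listed in the Z-score column description.
PLATFORM_PREFERENCE = (
    "IlluminaHiSeq_RNASeqV2",
    "IlluminaGA_RNASeqV2",
    "AgilentG4502A_07_3",
)


def preferred_z_score(scores_by_platform):
    """Return (platform, z_score) from the most preferred platform that has a score."""
    for platform in PLATFORM_PREFERENCE:
        score = scores_by_platform.get(platform)
        if score is not None:
            return platform, score
    return None  # no listed platform supplied a score
```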

Methylation


TCGA Level 3 methylation data from the ICGC portal for the current release in a tab separated table. More information on the methylation data is available from http://cancer.sanger.ac.uk/cosmic/analyses.

CosmicCompleteDifferentialMethylation.tsv.gz

File Description

[column number:label] Heading

[1:A] Study_ID - The study Id for these data.

[2:B] Id Sample,Sample name,Id tumour - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers. These samples are from the TCGA.

[5:E] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and sub types because they vary a lot between different papers.

[6:F] Site Subtype 1 - Further sub-classification (level 1) of the sample's tissue of origin.

[7:G] Site Subtype 2 - Further sub-classification (level 2) of the sample's tissue of origin.

[8:H] Site Subtype 3 - Further sub-classification (level 3) of the sample's tissue of origin.

[9:I] Primary Histology - The histological classification of the sample.

[10:J] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[11:K] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[12:L] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[13:M] Fragment Id - The unique probe Id for a specific CpG.

[14:N] Genome Version - The coordinate system used -

    37 = GRCh37/Hg19
    38 = GRCh38/Hg38

[15:O] Chromosome - The chromosome location of the probe (1-22, X or Y).

[16:P] Position - The genome location of the CpG targeted by the probe (1-based coordinates).

[17:Q] Strand - Positive or negative.

[18:R] Gene Name - The gene name (if the probe falls within the coding region of a COSMIC gene) or the probe annotation as described by Illumina.

[19:S] Methylation - The methylation level; H (High, beta-value >0.8) or L (Low, beta-value < 0.2).

[20:T] Avg Beta Value Normal - The average beta-value across the normal population. The beta-value of the tumour must differ from this value by >0.5 to be considered a variant.

[21:U] Beta Value - The beta-value for the probe in the tumour sample. Only values >0.8 (High) or <0.2 (Low) are included.

[22:V] Two Sided P-Value - The two sided p-value.
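
Columns 19 to 21 describe two numeric filters: the beta-value must be above 0.8 or below 0.2, and it must differ from the normal average by more than 0.5. A minimal Python sketch of those checks follows. The function names are hypothetical, and treating the "differ by" condition as an absolute difference is an assumption.

```python
HIGH_CUTOFF = 0.8     # beta-values above this are reported as H
LOW_CUTOFF = 0.2      # beta-values below this are reported as L
MIN_DIFFERENCE = 0.5  # required distance from the normal average


def methylation_level(beta_value):
    """Return "H", "L" or None (None means the value falls between the cut-offs)."""
    if beta_value > HIGH_CUTOFF:
        return "H"
    if beta_value < LOW_CUTOFF:
        return "L"
    return None


def passes_filters(beta_value, avg_beta_normal):
    """True when the value is High or Low and far enough from the normal average."""
    return (
        methylation_level(beta_value) is not None
        and abs(beta_value - avg_beta_normal) > MIN_DIFFERENCE
    )
```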

Cancer Gene Census


A list of all cancer census genes from the current release in a comma separated table. The census table is exported from http://cancer.sanger.ac.uk/census and the format is the same.

cancer_gene_census.csv

COSMIC Sample Features


All the features related to a sample from the current release in a tab separated file.

CosmicSample.tsv.gz

File Description

[column number:label] Heading

[1:A] Sample id,Sample name,Id tumour,Id Individual - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[5:E] Primary Site - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and sub types because they vary a lot between different papers.

[6:F] Site Subtype 1 - Further sub-classification (level 1) of the sample's tissue of origin.

[7:G] Site Subtype 2 - Further sub-classification (level 2) of the sample's tissue of origin.

[8:H] Site Subtype 3 - Further sub-classification (level 3) of the sample's tissue of origin.

[9:I] Primary Histology - The histological classification of the sample.

[10:J] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[11:K] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[12:L] Histology Subtype 3 - Further histological classification (level 3) of the sample.

[13:M] Therapy Relationship - Relates the time-point of tissue sampling to the drug therapy used to treat the tumour.

[14:N] Sample Differentiator - Gives additional information if more than one sample (e.g. carcinomatous and sarcomatous components) from a tumour has been screened for mutations or if samples from a tumour were taken at different time points.

[15:O] Mutation Allele Specification - Where a publication has information on more than one mutation for one gene in a sample and reports whether or not the mutations occurred on the same or different chromosomes.

[16:P] Msi - If microsatellite instability data is given in the publication per sample then High, Low, Stable/Low, MSI or Stable is reported in COSMIC. Unknown is the default.

[17:Q] Average Ploidy - The average ploidy of the sample, calculated from copy number data (where available).

[18:R] Whole Genome Screen - 'y' if the sample was whole genome screened.

[19:S] Whole Exome Screen - 'y' if the sample was whole exome sequenced.

[20:T] Sample Remark - Any additional sample information e.g. % mutant allele burden.

[21:U] Drug Response - Clinical and in vitro responses to drugs (particularly targeted drugs). Phrasing based on RECIST guidelines. Note that in COSMIC, SD (stable disease) and PD (progressive disease) = clinical primary non response.

[22:V] Grade - Grade of tumour. The phrase 'Some Grade data are given in publication' is used when the publication reports grade data or when data haven't been given per sample. More detailed data follow commonly used grading systems in tumours.

[23:W] Age at tumour recurrence - Where both primary and recurrent tumour samples from an individual have been screened for mutations and the age (in years) of the patient at the time of the recurrence is different to that at diagnosis.

[24:X] Stage - Stage of tumour. The phrase 'Some Stage data are given in publication' is used when the publication reports stage data or when data haven't been given per sample. More detailed data follow commonly used staging systems in tumours.

[25:Y] Cytogenetics - Karyotype of the tumour.

[26:Z] Metastatic Site - Tissue site of any metastases identified in an individual.

[27:AA] Tumour Source - Source of tumour tissue sample e.g. primary, metastasis.

[28:AB] Tumour Remark - Any additional tumour information e.g. metachronous tumour.

[29:AC] Age - Age (in years) of individual at diagnosis.

[30:AD] Ethnicity - Ethnicity (e.g. Caucasian) of individual.

[31:AE] Environmental Variables - Environmental variables to which an individual has been exposed (e.g. viral exposure, smoking status).

[32:AF] Germline Mutation - Gene name/mutation if a germline mutation as well as a somatic mutation has been detected in the same gene in the same tumour sample.

[33:AG] Therapy - Any significant treatment an individual has received prior to mutation screening.

[34:AH] Family - Any familial cancer history for an individual or familial relationships of individuals screened for mutations in the same publication.

[35:AI] Normal tissue tested - If normal tissue from the same individual has been screened for mutations.

[36:AJ] Gender - Sex of individual.

[37:AK] Individual Remark - Any additional individual information (e.g. age group, hereditary syndromes).

[38:AL] NCI code - NCI thesaurus code for tumour histological classification.

[39:AM] SAMPLE_TYPE - Describes where the sample originated from.

[40:AN] COSMIC_PHENOTYPE_ID - This is an ID to uniquely identify a sample based on primary tissue and primary histology.

COSMIC HGNC


A tab separated table showing the relationship between the Cancer Gene Census, COSMIC ID, Gene Name, HGNC ID and Entrez ID.

CosmicHGNC.tsv.gz

File Description

[column number:label] Heading

[1:A] COSMIC_ID - COSMIC Gene ID (COSG*).

[2:B] COSMIC_GENE_NAME - Gene name used in COSMIC.

[3:C] Entrez_id - Entrez ID mapping.

[4:D] HGNC_ID - HGNC mapping.

[5:E] Mutated? - Does the gene have coding mutations y/n.

[6:F] Cancer_census? - Is the gene in the Cancer gene census y/n.

[7:G] Expert Curated? - Has the gene been manually curated by the team of expert curators y/n.

COSMIC Resistance Mutations


A tab separated table listing the details of all mutations in COSMIC which are known to confer drug resistance.

CosmicResistanceMutations.tsv.gz

File Description

[column number:label] Heading

[1:A] Sample name,Sample id - A sample is an instance of a portion of a tumour being examined for mutations. The sample name can be derived from a number of sources. In many cases it originates from the cell line name. Other sources include names assigned by the annotators, or an incremented number assigned during an anonymisation process. A number of samples can be taken from a single tumour and a number of tumours can be obtained from one individual. A sample id is used to identify a sample within the COSMIC database. There can be multiple ids, if the same sample has been entered into the database multiple times from different papers.

[3:C] Gene Name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC identifier.

[4:D] Transcript - The transcript identifier (accession number) of the gene.

[5:E] Census Gene - Is the gene in the Cancer Gene Census (Yes, or No).

[6:F] Drug Name - The name of the drug which the mutation confers resistance to.

[7:G] MUTATION_ID - An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.

[8:H] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[9:I] LEGACY_MUTATION_ID - Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.

[10:J] AA Mutation - The change that has occurred in the peptide sequence. Formatting is based on the recommendations made by the Human Genome Variation Society.

[11:K] CDS Mutation - The change that has occurred in the nucleotide sequence. Formatting is identical to the method used for the peptide sequence.

[12:L] Primary Tissue - The primary tissue/cancer from which the sample originated. More details on the tissue classification are available from here. In COSMIC we have a standard classification system for tissue types and sub types because they vary a lot between different papers.

[13:M] Tissue Subtype 1 - Further sub-classification (level 1) of the sample's tissue of origin.

[14:N] Tissue Subtype 2 - Further sub-classification (level 2) of the sample's tissue of origin.

[15:O] Histology - The histological classification of the sample.

[16:P] Histology Subtype 1 - Further histological classification (level 1) of the sample.

[17:Q] Histology Subtype 2 - Further histological classification (level 2) of the sample.

[18:R] Pubmed ID - The PUBMED ID for the paper that the sample was noted in, linking to pubmed to provide more details of the publication.

[19:S] CGP Study - Lists the unique Ids of studies that have involved this sample.

[20:T] Somatic Status - Information on whether the sample was reported to be Confirmed Somatic, Previously Reported or Variant of unknown origin -

    Confirmed Somatic = if the mutation has been confirmed to be somatic in the experiment by sequencing both the tumour and a matched normal from the same patient.
    Variant of unknown origin = when the mutation is known to be somatic but the tumour was sequenced without a matched normal.
    Previously observed = when the mutation has been reported as somatic previously but not in current paper.

[21:U] Sample Type - Describes where the sample has originated from including the tumour type.

[22:V] Zygosity - Information on whether the mutation was reported to be homozygous, heterozygous or unknown within the sample.

[23:W] Genome Coordinates (GRCh37/38) - The genome location of the mutation (chr:start..end), on the specified genome version.

[24:X] Tier - 1 or 2 [see here for details of Tier 1 and 2]

[25:Y] HGVSP - Human Genome Variation Society peptide syntax.

[26:Z] HGVSC - Human Genome Variation Society coding DNA sequence syntax (CDS).

[27:AA] HGVSG - Human Genome Variation Society genomic syntax (3' shifted).

COSMIC Mutation Tracking


A tab separated table mapping all of COSMIC's legacy mutation identifiers (COSMs) to the new genomic identifiers (COSVs). This file also helps to identify the transcripts and accession numbers on which the current mutation is annotated, along with the mutation type.

CosmicMutationTracking.tsv.gz

File Description

[column number:label] Heading

[1:A] Is_canonical - Identifies the transcript type: the column value is yes if it is a canonical transcript, otherwise no.

[2:B] Mutation_type - Type of mutation (coding, non-coding etc.)

[3:C] GRCH - Genome version of the mutation.

[4:D] MUTATION_ID - An internal mutation identifier to uniquely represent each mutation on a specific transcript on a given assembly build.

[5:E] LEGACY_MUTATION_ID - Legacy mutation identifier (COSM) that will represent existing COSM mutation identifiers.

[6:F] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[7:G] Accession Number - The transcript identifier of the gene.

[8:H] Gene Name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC identifier.
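
Since this file exists to map legacy identifiers to genomic ones, a lookup table is a natural thing to build from it. The following is a minimal Python sketch that reads columns 5 and 6 as documented above. It assumes the first row is a header row, the function name is hypothetical, and the identifiers used to exercise it should be treated as placeholders rather than real COSMIC records.

```python
import csv
from collections import defaultdict

LEGACY_COLUMN = 5   # LEGACY_MUTATION_ID (COSM), 1-based as documented above
GENOMIC_COLUMN = 6  # GENOMIC_MUTATION_ID (COSV), 1-based as documented above


def legacy_to_genomic(lines, has_header=True):
    """Map each legacy identifier to the set of genomic identifiers listed with it."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: the first row holds the column headings
    mapping = defaultdict(set)
    for row in reader:
        legacy_id = row[LEGACY_COLUMN - 1]
        genomic_id = row[GENOMIC_COLUMN - 1]
        if legacy_id and genomic_id:
            mapping[legacy_id].add(genomic_id)
    return dict(mapping)
```

Values are collected into sets because the same pair of identifiers may appear on more than one row, for instance once per transcript.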

Census Hallmarks


A tab separated table listing the hallmarks of cancer for a subset of cancer census genes.

Cancer_Gene_Census_Hallmarks_Of_Cancer.tsv.gz

File Description

[column number:label] Heading

[1:A] GENE_NAME - The gene name for which the data have been curated in CGC. In most cases this is the accepted HGNC gene symbol.

[2:B] CELL_TYPE - Tissue or cancer for which the Hallmark is described.

[3:C] PUBMED_PMID - The PUBMED ID for the paper that the Hallmark was noted in.

[4:D] HALLMARK - Name of the biological process that when dysregulated, may promote cancer or other data category describing the role of a gene in cancer.

[5:E] IMPACT - Describes how the gene activity impacts the hallmarks of cancer i.e. promotes/suppresses or characterises the role of a gene in carcinogenesis i.e. Oncogene/Tumour suppressor Gene/Fusion.

[6:F] DESCRIPTION - A brief functional summary of how a gene's activity impacts a hallmark of cancer.

[7:G] CELL_LINE - For evidence obtained from experiments on cell lines, the names of the cell lines are provided here.

ASCAT Ploidy and Purity Estimates


A tab separated table listing the ploidy and aberrant cell fraction (purity estimate), for TCGA samples re-analysed using ASCAT.

ascat_acf_ploidy.tsv

File Description

[column number:label] Heading

[1:A] Cancer_Type_Code - The disease code (decode available from here).

[2:B] Sample - The name of the sample.

[3:C] Aberrant_Cell_Fraction(Purity) - The aberrant cell fraction (purity estimate).

[4:D] Ploidy - The ploidy of the genome.

VCF Files (coding and non-coding mutations)


VCF file of all coding mutations in the current release.

VCF/CosmicCodingMuts.vcf.gz


VCF file of all coding mutations (normalised) in the current release. The file has the variants 5' shifted as per the VCF standard, and the INFO field contains the 3' shifted syntaxes for CDS and genome, along with the unshifted variants in the OLD_VARIANT field.

VCF/CosmicCodingMuts.normal.vcf.gz


VCF file of all non coding mutations in the current release.

VCF/CosmicNonCodingVariants.vcf.gz


VCF file of all non-coding variants (normalised) in the current release. The file has the variants 5' shifted as per the VCF standard, and the INFO field contains the 3' shifted syntaxes for CDS and genome, along with the unshifted variants in the OLD_VARIANT field.

VCF/CosmicNonCodingVariants.normal.vcf.gz
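
The VCF files can be read line by line without a dedicated library. This minimal Python sketch relies only on the standard VCF layout: lines starting with '#' are meta-information or the header, and data lines carry CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO as their first eight tab-separated fields. The function names are hypothetical, and no particular set of INFO keys is assumed beyond what the file itself contains.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries without '=' map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        if not entry:
            continue
        key, separator, value = entry.partition("=")
        fields[key] = value if separator else True
    return fields


def iter_vcf_records(lines):
    """Yield one dict per data line, skipping '#' lines and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\r\n").split("\t")
        chrom, pos, variant_id, ref, alt, _qual, _filter, info = columns[:8]
        yield {
            "CHROM": chrom,
            "POS": int(pos),
            "ID": variant_id,
            "REF": ref,
            "ALT": alt,
            "INFO": parse_info(info),
        }
```

To read a compressed file, the lines can come from `gzip.open(path, "rt")` in the Python standard library, where `path` is wherever the download was saved.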

Fasta File (genes)


COSMIC Transcripts


A tab separated table listing the gene name and transcript accession for each gene ID.

CosmicTranscripts.tsv.gz

File Description

[column number:label] Heading

[1:A] Gene ID - The unique ID of the gene.

[2:B] Gene_NAME - The name of the gene.

[3:C] Transcript ID - The accession of the transcript.

[4:D] Strand - Positive or negative.

NCV CDS Syntax Mapping


A tab separated table with ID mapping and CDS syntax information for significant variants in non-coding regions.

NCV_CDS_syntax_mapping.tsv

File Description

[column number:label] Heading

[1:A] GENE - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC symbol.

[2:B] GENOMIC_MUTATION_ID - Genomic mutation identifier (COSV) to indicate the definitive position of the variant on the genome. This identifier is trackable and stable between different versions of the release.

[3:C] LEGACY_MUTATION_ID - Legacy mutation identifier (COSM or in some cases COSN) that will represent existing mutation identifiers for transcript level annotations.

[4:D] ALT_ID - Alternative mutation identifier describing the variant (in some cases used previously but now obsolete).

[5:E] CHR - The chromosome identifier for the genome position.

[6:F] POS_GRCh37 - Genome Position GRCh37/hg19.

[7:G] POS_GRCh38 - Genome Position GRCh38/hg38.

[8:H] WT_ALLELE - The wild type allele (positive strand).

[9:I] MUT_ALLELE - The mutated allele (positive strand).

[10:J] TRANSCRIPT_ID - The accession/version of the transcript used as reference for the curated CDS syntax.

[11:K] CURATED_CDS_SYNTAX - The curated CDS (c.) syntax of the mutation.