Gene Analysis - Gene View

The gene view histogram is a graphical view of mutations across the gene. By default, these mutations are displayed at the amino acid level across the full length of the gene. This default peptide view shows a histogram of single base substitutions, colour coded by residue according to the colour scheme used in Ensembl. Below the histogram are the amino acid sequence and the Pfam protein structures, followed by complex mutations, insertions (red triangles) and deletions (inverted blue triangles). The graphical view can be switched to cDNA coordinates by choosing from the "Sequence Type" options in the "Filters" panel on the right hand side of the page.

Zooming

  • There are 3 methods for zooming in (listed in order of speed and usability):
    • Click and drag across a region of interest on the histogram graphic. A yellow box will be drawn highlighting the region to be zoomed.
    • Move the sliders along the top/side of the histogram until they show the desired region.
    • Manually enter the coordinates in the start and end boxes on the filters panel.
  • To zoom out, click on the round blue minus icon at the top left of the histogram.

Exploring mutation data

  1. The horizontal bar along the top of the graphic is the scale bar showing the nucleotide or amino acid position. The horizontal bar across the centre of the graphic shows the nucleotide or amino acid sequence (when zoomed in).
  2. Below the sequence bar is the Pfam domain track. Mouse over a domain for more detailed information.
  3. Mutations are grouped according to type, with substitutions at the top followed by complex and indels below the sequence bar. Each mutation type is shown in a histogram along the sequence, with the height determined by the mutation frequency. Substitutions (including complex mutations) are shown as rectangles, insertions as red triangles, and deletions as blue inverted triangles.
  4. Substitutions and complex mutations are colour coded by amino acid.
  5. Mouse over a mutation to view the syntax and count of mutated samples.
  6. Click a mutation to link to the Mutation Overview page where more information is available.

Applying Filters

  1. To expand the filter options, click the plus icon for any of the filters listed in the panel on the right hand side of the page. Select any filter options and then click the 'Apply' button to filter the data. The filters are applied to the data in all the tabs.
  2. The 'Reset' button will restore the default settings, i.e. the sequence type will be set to amino acid, the coordinate range to the full length of the gene, and all filters will be de-selected.
  3. The Screen Type filter can be applied to show data from all screens (All), or only from whole genome sequencing or whole exome sequencing (Whole Genome Screen).
  4. The Mutation Impact (FATHMM Prediction) filters can also be applied to show mutations by FATHMM descriptor: Cancer or Damaging are shown as 'Pathogenic', and Passenger or Tolerated as 'Neutral'. More information about FATHMM is available here.
All filters applied in the histogram change the content of all subsequent tabs.
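
The grouping used by the Mutation Impact filter can be sketched as a simple lookup. This is only an illustration of the mapping described above, not COSMIC's own code; the record layout and the `fathmm` key are assumptions made for the example.

```python
# Grouping of FATHMM descriptors as described in the filter text above.
FATHMM_GROUPS = {
    "Cancer": "Pathogenic",
    "Damaging": "Pathogenic",
    "Passenger": "Neutral",
    "Tolerated": "Neutral",
}


def fathmm_group(descriptor):
    """Return 'Pathogenic' or 'Neutral', or None for an unrecognised descriptor."""
    return FATHMM_GROUPS.get(descriptor.strip().capitalize())


def filter_by_impact(mutations, wanted):
    """Keep records whose descriptor falls in the wanted group.

    Each record is a dict with a hypothetical 'fathmm' key holding the descriptor.
    """
    return [m for m in mutations if fathmm_group(m["fathmm"]) == wanted]
```

For example, `filter_by_impact(records, "Pathogenic")` would keep only records labelled Cancer or Damaging.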

Gene Analysis - Overview

The Overview tab on the main panel shows a general overview of the information available for the gene.

  1. Gene Name - The gene name for which the data has been curated in COSMIC. In most cases this is the accepted HGNC identifier.
  2. Gene Id - Unique identifier for this gene in COSMIC.
  3. Synonyms - All the HUGO synonyms related to this gene.
  4. Drug Sensitivity Data - A list of drugs associated with mutations in this gene.
  5. Mouse Mutagenesis Data - Links to a table of mouse insertional mutagenesis data from the CCGD (Candidate Cancer Gene Database).
    
             Genomic regions are listed where clusters of insertional events (Common Insertion Sites, CIS)
             have driven the progression of cancer in mouse models (Ranzani et al., 2013).
    
             The table columns are -
    
             1. Insertional Cluster (CIS) -
                The genome location of the transposon common insertion site (CIS); chromosome:start-end
             2. Effect - description of the effect, or 'not determined'
             3. Tissue - the tissue/disease classification
             4. Pubmed - Reference ID, links to the Pubmed website
             
  6. No. of samples - Total number of unique samples screened and samples with mutations.
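
If the mouse mutagenesis table is processed outside the website, the `chromosome:start-end` location format described above can be split into its parts. The following is a minimal sketch that assumes the location is a plain string with integer coordinates; the function name and the example value are made up for illustration.

```python
import re

# Matches locations written as chromosome:start-end.
CIS_PATTERN = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_cis_location(text):
    """Split a 'chromosome:start-end' string into (chromosome, start, end)."""
    match = CIS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError("not in chromosome:start-end form: %r" % text)
    return match.group("chrom"), int(match.group("start")), int(match.group("end"))


# Hypothetical location, not taken from the database.
chrom, start, end = parse_cis_location("7:1000-2000")
```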

Gene Analysis - Genome Browser

The COSMIC Genome Browser is shown in a panel below the main tabs (visible from all). The browser is embedded - for a full screen view with easier navigation click the 'Full-screen view' link in the top right corner.

Gene Analysis - Tissues

This tab shows a table of primary tissues for the selected gene, with the percentage of mutated samples for both point mutations and copy number variation. The mutation percentages are represented as histograms, with a tooltip showing the total samples analysed, total samples mutated and the percentage value. The table also takes into account the filters selected in the right hand panel (though some filters, such as Mutation category, might not affect the copy number variation data).

  1. Primary Tissue - Lists all the primary tissues present in the database.
  2. Point Mutations - Category representing the percentage of mutated samples & total number of samples analysed.
    1. % Mutated - Histogram showing the mutation percentage
    2. Tested - Total number of samples analysed
  3. Copy Number Variation - Category representing the CNV data for both loss and gain in the histogram and the total number of samples analysed.
    1. Variant % - Histogram showing mutation percentage of Gain (Pink) and Loss (Blue) respectively. Sorting works only on Gain values.
    2. Tested - Total number of samples analysed
  4. Gene Expression - Category representing the gene expression data for both 'under' and 'over' in the histogram and the total number of samples analysed.
    1. % Regulated - Histogram showing percentage of Over (Red) and Under (Green) respectively. Sorting works only on Over values.
    2. Tested - Total number of samples analysed
  5. Methylation - Category representing the methylation data for both 'high' and 'low' in the histogram and the total number of samples analysed.
    1. % Diff. Methylated - Histogram showing percentage of High (Red) and Low (Green) respectively. Sorting works only on High values.
    2. Tested - Total number of samples analysed

Gene Analysis - Distribution

This tab shows mutation distribution pie charts and histograms for the selected gene and filters.

Distribution Overview chart

This interactive pie chart displays an overview of the mutation spectrum for the selected gene and filters; move your cursor over the pie chart or table columns to highlight regions of interest. Mutation data can be exported from this table by clicking the 'Mutation Type' links.

  1. Colour - The colours in the pie chart are matched to the colours in the table.
  2. Mutation Type - All mutations are classified as substitution, deletion, insertion or complex mutations. The 'other' category contains mutations that fall outside the defined categories or for which there is no information on nucleotide changes. Each mutation type links to a page giving additional information on the relevant samples and mutations.
  3. Mutant Samples - Number of unique mutated samples for the selected disease. Each sample is counted once for every mutation type it carries, but only once in the Total. For instance, a sample with both missense and nonsense mutations contributes to both of those counts but only once to the Total count. As a result, the Total will often be slightly lower than the sum of the individual mutation counts.
  4. Percentage - Observed frequency of this mutation type.

Note that samples may have multiple mutations, which can fall into different categories. For example, one sample may have a nonsense mutation and a synonymous mutation, and when thinking about the percentages in the table, that sample is "double counted" in a sense.

For each type of mutation, we calculate the percentage as num_samples / total_samples * 100. For example, for nonsense mutations in the gene TET2, we get 731 / 2461 * 100 = 29.7%. For synonymous mutations, we get 62 / 2461 * 100 = 2.52%. However, those two values don't take into account the possibility of the same sample appearing in both calculations. If we simply add the percentages for all mutation types together, we get an inflated total that can exceed 100%.
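
The counting rule can be illustrated with a short sketch. It assumes the data is available as a mapping from sample identifier to the set of mutation types seen in that sample; the function names and the four-sample data set are invented for the example, while the TET2 figures are the ones quoted above.

```python
def percentage(num_samples, total_samples):
    """Percentage of samples carrying a given mutation type."""
    return num_samples / total_samples * 100


def mutation_type_summary(sample_mutations):
    """Count samples per mutation type; each sample counts once in the total.

    sample_mutations maps a sample id to the mutation types found in it.
    """
    total = len(sample_mutations)
    counts = {}
    for types in sample_mutations.values():
        for mutation_type in set(types):
            counts[mutation_type] = counts.get(mutation_type, 0) + 1
    percentages = {t: percentage(n, total) for t, n in counts.items()}
    return total, counts, percentages


# TET2 figures quoted in the text.
nonsense_pct = round(percentage(731, 2461), 1)    # 29.7
synonymous_pct = round(percentage(62, 2461), 2)   # 2.52

# Invented data: sample S1 carries two mutation types.
samples = {
    "S1": {"nonsense", "synonymous"},
    "S2": {"nonsense"},
    "S3": {"missense"},
    "S4": {"missense"},
}
total, counts, percentages = mutation_type_summary(samples)
```

Here the total is 4 samples, but the per-type counts sum to 5 and the percentages sum to 125%, because S1 is counted under both nonsense and synonymous.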

Substitutions (coding strand)

This pie chart gives a detailed overview of substitution mutations for the base pair changes on the coding strand. It captures all the changes from A,C,G,T to A,C,G,T. If there is no nucleotide information available then it will be reported separately at the bottom of the graph.

Substitutions (both strands)

This bar graph shows the number of deletions in the selected disease on the X axis with the nucleotide length of deletions on the Y axis. Click on the blue bar to see additional details. If there is no nucleotide information available then it will be reported separately at the bottom of the graph.

Deletions

This bar graph shows the number of deletions in the selected tissues/histologies on the X axis, and the length of the deletions on the Y axis (base pairs). Click on the blue bar to see the mutated sample details export page. If there is no nucleotide information available then it will be reported separately at the bottom of the graph.

Insertions

This bar graph shows the number of insertions in the selected tissues/histologies on the X axis, and the length of the insertions on the Y axis (base pairs). Click on the blue bar to see the mutated sample details export page. If there is no nucleotide information available then it will be reported separately at the bottom of the graph.

Gene Analysis - Drug Resistance/Genes

This tab displays a table and pie chart relating to drug-resistant mutation distribution.

The listed Drugs are targeted treatment for tumours containing mutations in the selected gene. The Genes table contains a list of genes which have a resistant mutation annotated for the selected drug(s). The resistant mutation can confer acquired resistance (after treatment) or intrinsic resistance (before treatment).

Samples with acquired resistance are annotated with a Drug Response of 'resistant recurrence' (see Sample Overview). A recurrent tumour or a metastatic site has been screened for mutations following relapse after an initial drug response. Only those secondary mutations reported as proven to be associated with resistance or presumed by authors to be associated with resistance, e.g. based on their gene location, are annotated as acquired resistance mutations and not incidental passenger mutations detected in a recurrent tumour.

Samples with intrinsic resistance are annotated with a Drug Response of 'primary non response' (see Sample Overview). Only those mutations reported as associated with resistance are annotated as primary resistant mutations. Please see Curation/Drug Resistance for more extensive help.

All drugs are selected by default. Use the 'Update Drugs' functionality to change the drug selection. The other Genes listed also have a resistant mutation to one or more of the selected drugs. These other genes may have more resistant mutations to other drugs which are not listed here. For a full list of curated targeted drug resistance information for these genes, enter the gene of interest in the filter on the right hand side panel and select Apply.

Unique Resistant Samples gives the total number of samples which have a resistant mutation annotated for a particular gene for any of the selected drugs. The distribution of these samples per gene is shown in the pie chart. The sample number links to a page detailing additional information on the relevant samples and mutations. Browsing tip: You may want to right click the number to open it in a new window.

Unique Resistant Mutations shows the number of unique resistant mutations associated with one or more of the selected drug(s).

Browsing tip: When you have clicked the main Drug Resistance tab, click 'Update Drugs' once and you’ll be able to navigate the tabs on this page more fluently.

Gene Analysis - Drug Resistance/Mutations

The graph shows the number of samples with a particular resistant mutation for all the selected genes and selected drugs.

Only data for the selected drugs are displayed. Use the 'Update Genes' functionality to change the gene selection.

Hover over a blue sample bar to display the sample number, gene and drug(s), or click on the gene mutation to bring up a full table of details for those samples tagged with the particular resistant mutation.

Gene Analysis - Variants/Mutations

This tab contains a table describing all the mutations observed on the gene, grouped by number of times observed with links to the corresponding mutation overview pages. Similar mutations may arise a number of times with different counts, for example in COSMIC v63 p.D816V is observed 760 times for one instance and 404 times in another. This is because the peptide annotation is the same while the nucleotide annotation is different. This table also provides the ability to search, sort and export the table in csv and tsv format.

  1. Position - The position of the mutation. This can be an amino acid or CDS position, depending on the 'Sequence Type' filter selection.
  2. Mutation (CDS) - This details the change that has occurred in the nucleotide sequence as a result of the mutation. Formatting is based on the recommendations made by the Human Genome Variation Society. The description of each type can be found by following the link to the Mutation Overview page.
  3. Mutation (Amino Acid) - This section details the change that has occurred in the peptide sequence as a result of the mutation. Formatting is identical to the method used for the nucleotide sequence.
  4. Legacy Mutation Id (COSM) - Legacy mutation identifier (COSM) represents existing COSM mutation identifiers. This identifier remains the same between different assemblies (GRCh37 and GRCh38). All the COSM ids at the same genomic location have been collapsed into one representative COSM id. These ids are maintained to help track existing mutations.
  5. Count - The number of unique mutated samples for this mutation (taking into account any filters applied).
  6. Mutation Type - The type of mutation (substitution, deletion, insertion, complex, fusion etc.)
Mutation Types:


    Nonsense :      A substitution mutation resulting in a termination codon,
                    foreshortening the translated peptide.

    Missense :      A substitution mutation resulting in an alternate codon,
                    altering the amino acid at this position only.

    Coding silent : A synonymous substitution mutation which encodes the same
                    amino acid as the wild type codon.

    Intronic :      A substitution mutation outside the coding domains. No interpretation is
                    made as to its effect on splice sites or nearby regulatory regions.

    Complex :       A compound mutation which may involve multiple insertions, deletions
                    and substitutions.

    Unknown :       A mutation with no detailed information available.
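
Because the same peptide change can appear on several rows with different nucleotide changes, it can be useful to combine rows when working with an exported table. The sketch below is illustrative only: the `aa`, `cds` and `count` keys are assumed names rather than the export's real column headers, and the CDS values are placeholders. The counts are the two p.D816V figures quoted above.

```python
from collections import defaultdict


def samples_per_peptide_change(rows):
    """Sum the Count column over rows sharing the same amino acid annotation."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["aa"]] += row["count"]
    return dict(totals)


# Two rows with the same peptide annotation but different (placeholder)
# nucleotide annotations.
rows = [
    {"aa": "p.D816V", "cds": "cds_variant_1", "count": 760},
    {"aa": "p.D816V", "cds": "cds_variant_2", "count": 404},
]
combined = samples_per_peptide_change(rows)
```

The combined figure is best read as an upper bound on distinct samples, since this sketch cannot tell whether any sample appears under both nucleotide changes.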

Gene Analysis - Variants/Fusions

This tab contains a table describing all the fusions containing this gene, grouped by number of times observed with links to the corresponding fusion summary pages. The table is in the same format as the previous Mutations tab and provides the ability to search, sort and export the table in csv and tsv format.

  1. Position - Not shown
  2. Mutation (CDS) - Not shown
  3. Mutation (Amino Acid) - This shows the syntax describing the fusion. Formatting is based on the recommendations made by the Human Genome Variation Society.
  4. Mutation Id (COSF) - Unique Fusion Identifier.
  5. Count - The number of unique mutated samples for this gene fusion (taking into account any filters applied).
  6. Mutation Type - Always defined as Fusion in this table

Gene Analysis - Variants/CNV & Expression

This tab shows a table of gene expression and copy number variation (CNV) data for the selected gene, with links to the Sample, Study and CNV pages and icon links to the ChromoView page (to view CNVs across the whole chromosome), the COSMIC Genome Browser and Ensembl. The table contains the following columns -

  1. Sample - Sample identifier which links to the sample overview page.
  2. Expression - classified as 'Under', 'Over', 'Normal' or '-' where there is no data.
  3. Expr Level - The Z-score value for gene expression. Normal range -2 to 2. Over > 2. Under < -2.
  4. CN Type - Type of CNV, either Loss or Gain.
  5. Minor Allele - Minor allele count for the specified CNV
  6. Copy Number - Total Copy Number for the specified CNV (major allele + minor allele counts)
  7. CN Segment Posn. - Genomic position of the CNV and icon links to the ChromoView page (to view CNVs across the whole chromosome), the COSMIC Genome Browser and Ensembl.
  8. Average Ploidy - average ploidy across the whole genome of the sample
  9. Study - Unique Study identifier which links to the study page.
  10. CNV - Unique CNV Identifier which links to the CNV overview page.
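
The thresholds given for the Expression and Expr Level columns, and the definition of the Copy Number column, can be written as a small sketch. The function names are illustrative, and using `None` to stand for a missing Z-score is an assumption of this example.

```python
def classify_expression(z_score):
    """Classify a gene expression Z-score using the ranges listed above.

    Returns '-' when there is no data (represented here as None).
    """
    if z_score is None:
        return "-"
    if z_score > 2:
        return "Over"
    if z_score < -2:
        return "Under"
    return "Normal"  # -2 to 2 inclusive


def total_copy_number(major_allele, minor_allele):
    """Total copy number is the sum of the major and minor allele counts."""
    return major_allele + minor_allele
```

Note that values of exactly 2 or -2 fall in the Normal range under this reading of the thresholds.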

Gene Analysis - Variants/Methylation

This tab shows a table of differential methylation for the selected gene. The table contains the following Columns -

  1. Sample Name - The name of the sample
  2. Sample Id - The unique COSMIC identifier for the sample (COSS).
  3. Probe Id - The unique identifier for the probe which targets a specific CpG.
  4. Probe Posn. - The genomic position of the CpG targeted by the probe with links to the COSMIC Cancer Genome Browser and Ensembl.
  5. Type - The type of methylation variant; High (beta-value >0.8), Low (beta-value <0.2)
  6. Level (Beta-Value) - The beta-value for the probe/sample
  7. Normal Average - average beta-value across the normal samples
  8. Study - Unique Study identifier which links to the study page.
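
The High and Low thresholds in the Type column can be expressed as a short function. This is a sketch of the stated cut-offs only; the function name is illustrative, and returning `None` for values between the two thresholds is an assumption, since the table only lists High and Low variants.

```python
def classify_methylation(beta_value):
    """Classify a probe beta-value as 'High' (>0.8) or 'Low' (<0.2).

    Beta-values lie between 0 and 1. Values between the two thresholds
    are not a listed variant type, so None is returned for them.
    """
    if not 0.0 <= beta_value <= 1.0:
        raise ValueError("beta-value must be between 0 and 1")
    if beta_value > 0.8:
        return "High"
    if beta_value < 0.2:
        return "Low"
    return None
```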

Gene Analysis - References

This table lists all curated publications and unpublished studies (e.g. downloaded consortium data) reporting this gene. More details of the paper can be found by following the ‘COSMIC link’ or ‘Pubmed Link’. This table also provides the ability to search, sort and export the table in csv and tsv format.


  1. Reference Title - The title of the article. Mouse over the title to see the full article title.
  2. Author - First author(s) of the publication. To see all authors of the publication, click the COSMIC or Pubmed link.
  3. Year - The publication year of the journal from which the article was taken.
  4. Journal - An abbreviated title of the journal the article was sourced from, followed by volume and page number.
  5. Status - There are currently three statuses:
    • Curated : The reference has been fully curated into COSMIC.
    • Listed : The reference has been read, but the data has not been entered into COSMIC for one of many reasons.
    • Reviews : These references are reviews of mutation data from other references. They are not entered into COSMIC as the data has generally been entered via the original papers.
  6. COSMIC - Links to the more detailed page of the publication or study, listing all details including samples and genes analysed, with or without mutations.
  7. Pubmed - Links to Pubmed for more details on a publication.
