ID1 · GRCh37 · COSMIC v96

Mutational profile

The height of each bar in the mutational profile represents the proportion of one small insertion and deletion (ID) mutation type among all ID mutation types in the signature. There is no single intuitive and naturally constrained set of ID mutation types, as there arguably is for SBSs and DBSs, so an 83-subclass categorisation of ID mutations was designed.
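As a minimal sketch of how bar heights relate to the underlying weights, assume a signature is stored as a mapping from ID mutation type to a non-negative weight. Dividing each weight by the total gives proportions that sum to 1 across all types. The labels and values below are hypothetical toy numbers, not real ID1 data.

```python
def to_proportions(weights: dict[str, float]) -> dict[str, float]:
    """Normalise per-type weights so that all bar heights sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("signature must contain at least one positive weight")
    return {label: value / total for label, value in weights.items()}


# Hypothetical toy weights for three ID types (not real ID1 values)
toy = {"1:Del:C:0": 6.0, "1:Ins:C:0": 3.0, "2:Del:R:0": 1.0}
print(to_proportions(toy))  # {'1:Del:C:0': 0.6, '1:Ins:C:0': 0.3, '2:Del:R:0': 0.1}
```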

The 83-type ID classification incorporates prior knowledge that: IDs commonly have sizes of 1-10 bp; both insertions and deletions exist; IDs of C and T occur at different rates; IDs preferentially occur at repetitive elements; the length of the repeat unit and the number of repeat units in a repeat stretch may each influence the likelihood of an ID occurring; IDs are in some instances fostered by overlapping sequence microhomologies at the ID boundaries; and different mutational processes may, in principle, be influenced differently by these features. We therefore designed an 83-subclass categorisation of IDs that allows some exploration of all of these possibilities, while constraining the number of categories to accommodate the relatively small numbers of IDs (compared with substitutions) found in most genomes.

The classification categorises IDs by length, from 1bp to >5bp. 1bp IDs are further classified as C or T and by the number of single-base repeats they occur in, from 0 to >5. For longer IDs, repeat-unit lengths are categorised from 2bp to >5bp, the number of repeats from 1 to >5, and the size of microhomology from 0bp to >5bp. We recognise that others may prefer different classifications of IDs. The ID mutation types are enumerated in the accompanying Excel document.
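The category counts can be checked by enumerating them. The sketch below assumes SigProfiler-style labels of the form `size:type:subtype:count` (for example `1:Del:C:0` or `4:Del:M:2`), where the top bin `5` stands for "5 or more"; the label format and the helper name `id83_channels` are assumptions for illustration. Under this scheme there are 24 single-base classes, 48 repeat classes for IDs of 2bp and longer, and 11 microhomology deletion classes, since a deletion of size s can share at most s-1 bases of microhomology.

```python
def id83_channels() -> list[str]:
    """Enumerate the 83 ID classes using hypothetical SigProfiler-style labels."""
    channels = []
    # 1bp deletions/insertions of C or T, binned by homopolymer count 0..5 (5 = "5+")
    for kind in ("Del", "Ins"):
        for base in ("C", "T"):
            for n in range(6):
                channels.append(f"1:{kind}:{base}:{n}")
    # IDs of 2, 3, 4 and 5+ bp at repeats, binned by repeat count 0..5 (5 = "5+")
    for kind in ("Del", "Ins"):
        for size in range(2, 6):
            for n in range(6):
                channels.append(f"{size}:{kind}:R:{n}")
    # Deletions with microhomology: at most size-1 bases of microhomology, capped at 5+
    for size in range(2, 6):
        max_mh = size - 1 if size < 5 else 5
        for mh in range(1, max_mh + 1):
            channels.append(f"{size}:Del:M:{mh}")
    return channels


print(len(id83_channels()))  # 83
```

Enumerating the labels programmatically makes it easy to confirm that the three groups (24 + 48 + 11) add up to 83 and that no label is duplicated.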

Proposed aetiology

Slippage of the nascent DNA strand during DNA replication. This signature is found in almost all samples; however, substantial numbers of its mutations are found in cancers with DNA mismatch repair deficiency.

Comments

ID1 is found in most cancer samples. In non-hypermutated samples, the ID1 mutation burden correlates with age at cancer diagnosis, and this clock-like behaviour suggests that ID1 mutations accumulate in normal cells. This mutational signature also tends to be highly elevated in cancer samples with defective DNA mismatch repair and microsatellite instability (MSI). These MSI samples generally exhibit one or more of SBS6, SBS14, SBS15, SBS20, SBS21, SBS26 and SBS44.
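A clock-like association of this kind is typically quantified with a correlation coefficient between per-sample burden and age. The page does not state which statistic was used, so the sketch below uses the Pearson coefficient purely as an illustration; the function name is hypothetical, and the inputs are assumed to be already restricted to non-hypermutated samples.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between, e.g., age at diagnosis and ID1 burden."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A coefficient near +1 would indicate that burden rises steadily with age; hypermutated samples are excluded beforehand because their very high burdens would dominate the fit.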

Acceptance criteria

Supporting evidence for mutational signature validity

Validated evidence for real signature
Unclear evidence for real signature
Evidence for artefact signature
Background · Identification study: Alexandrov et al. 2020, Nature · First included in COSMIC: v3
Identification · NGS technique: WES & WGS · Different variant callers: Yes · Multiple sequencing centres: Yes
Technical validation · Validated in orthogonal techniques: Yes · Replicated in additional studies: Yes · Extended context enrichment: -
Proposed aetiology · Mutational process: Slippage of the nascent strand during DNA replication · Support: Statistical association
Experimental validation · Experimental study: - · Species: -

Summary of the technical and experimental evidence available in the scientific literature regarding the validation of the mutational signature.

Tissue distribution

v3.2_ID1_TISSUE.jpg

Numbers of mutations per megabase attributed to the mutational signature across the cancer types in which the signature was found. Each dot represents an individual sample, and only samples in which the signature is found are shown. The number of mutations per megabase was calculated by assuming that an average whole exome has 30 Mb with sufficient coverage, whereas an average whole genome has 2,800 Mb with sufficient coverage.
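The conversion described above is a single division by the assumed covered territory. The sketch uses the 30 Mb and 2,800 Mb figures from the caption; the helper name and the example counts are hypothetical.

```python
# Covered territory assumed in the caption above, in megabases
CALLABLE_MB = {"WES": 30.0, "WGS": 2800.0}


def mutations_per_mb(n_mutations: int, assay: str) -> float:
    """Convert a count of attributed mutations into mutations per megabase."""
    try:
        territory = CALLABLE_MB[assay]
    except KeyError:
        raise ValueError(f"unknown assay {assay!r}; expected 'WES' or 'WGS'") from None
    return n_mutations / territory


print(mutations_per_mb(5600, "WGS"))  # 2.0
print(mutations_per_mb(60, "WES"))    # 2.0
```

Because the exome territory is so much smaller, the same absolute count corresponds to a density roughly 93 times higher in an exome than in a genome, which is why counts must be normalised before exome and genome samples are plotted together.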

The numbers below the dots for each cancer type indicate the number of high-confidence tumours in which the signature was attributed (above the blue horizontal line) and the total number of high-confidence tumours analysed (below the blue horizontal line). Only high-confidence data are displayed, that is, samples with a reconstruction accuracy >0.90.
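The two numbers shown under each cancer type can be sketched as follows. The `Sample` record and `tissue_counts` helper are hypothetical; reconstruction accuracy is assumed here to be a similarity score between 0 and 1 for each sample, and a sample counts as carrying the signature when at least one mutation is attributed to it.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cancer_type: str
    reconstruction_accuracy: float  # assumed similarity score in [0, 1]
    id1_mutations: int              # mutations attributed to ID1 in this sample


def tissue_counts(samples: list[Sample], min_accuracy: float = 0.90) -> dict[str, tuple[int, int]]:
    """Return {cancer_type: (n_with_signature, n_high_confidence_analysed)}."""
    counts: dict[str, tuple[int, int]] = {}
    for s in samples:
        if s.reconstruction_accuracy <= min_accuracy:
            continue  # keep only high-confidence samples (accuracy > 0.90)
        attributed, total = counts.get(s.cancer_type, (0, 0))
        has_signature = 1 if s.id1_mutations > 0 else 0
        counts[s.cancer_type] = (attributed + has_signature, total + 1)
    return counts
```

The first element of each pair corresponds to the number above the blue line, and the second to the number below it.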

Associated signatures

Associated with SBS1 in non-hypermutated samples.

Replication timing

Tissue: aggregated across 24 tissues

Tissue: Lymph-BNHL (B-Cell Non-Hodgkin Lymphoma)

Tissue: Bladder-TCC (Bladder Urothelial Carcinoma)

Tissue: Bone-Osteosarc (Bone Osteosarcoma)

Tissue: Breast-Cancer (Breast Carcinoma)

Tissue: Cervix-Cancer (Cervical Carcinoma)

Tissue: Biliary-AdenoCA (Cholangiocarcinoma)

Tissue: Lymph-CLL (Chronic Lymphocytic Leukemia)

Tissue: ColoRect-AdenoCA (Colorectal Adenocarcinoma)

Tissue: Skin-Melanoma (Cutaneous Melanoma)

Tissue: Uterus-AdenoCA (Endometrial Adenocarcinoma)

Tissue: Eso-AdenoCA (Esophageal Adenocarcinoma)

Tissue: ESCC (Esophageal Squamous Cell Carcinoma)

Tissue: Stomach-AdenoCA (Gastric Adenocarcinoma)

Tissue: CNS-GBM (Glioblastoma)

Tissue: Head-SCC (Head and Neck Squamous Cell Carcinoma)

Tissue: Liver-HCC (Hepatocellular Carcinoma)

Tissue: Lung-AdenoCA (Lung Adenocarcinoma)

Tissue: Lung-SCC (Lung Squamous Cell Carcinoma)

Tissue: CNS-Medullo (Medulloblastoma)

Tissue: Ovary-AdenoCA (Ovarian Adenocarcinoma)

Tissue: Panc-AdenoCA (Pancreatic Adenocarcinoma)

Tissue: Panc-Endocrine (Pancreatic Neuroendocrine tumour)

Tissue: Prost-AdenoCA (Prostate Adenocarcinoma)

Tissue: Kidney-RCC (Renal Cell Carcinoma)

Normalised mutational densities from early- to late-replicating regions in the human genome are shown for real somatic mutations and simulated mutations. The dashed line reflects the behaviour of simulated mutations, whereas the bars represent the behaviour of real somatic mutations.
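A minimal sketch of such a profile, assuming the genome is split into replication-timing bins (for example deciles, 0 = earliest) and that density means mutations per base in each bin. The exact normalisation used for these plots is not stated here; dividing by the sum across bins is an assumption, as are the function names.

```python
def bin_densities(bins: list[int], bin_lengths: list[int]) -> list[float]:
    """Mutations per base in each replication-timing bin, normalised to sum to 1.

    bins: bin index of each mutation (0 = earliest replicating).
    bin_lengths: number of genomic bases in each bin.
    """
    counts = [0] * len(bin_lengths)
    for b in bins:
        counts[b] += 1
    raw = [c / length for c, length in zip(counts, bin_lengths)]
    total = sum(raw)
    return [r / total for r in raw] if total else raw


def mean_simulated(sim_runs: list[list[int]], bin_lengths: list[int]) -> list[float]:
    """Average the normalised profile over several simulated mutation sets."""
    profiles = [bin_densities(run, bin_lengths) for run in sim_runs]
    return [sum(col) / len(profiles) for col in zip(*profiles)]
```

Dividing by bin length matters because bins of unequal size would otherwise collect mutations in proportion to their size alone; the real profile (bars) is then compared with the averaged simulated profile (dashed line).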

Nucleosome occupancy

Tissue: aggregated across 24 tissues

Tissue: Lymph-BNHL (B-Cell Non-Hodgkin Lymphoma)

Tissue: Bladder-TCC (Bladder Urothelial Carcinoma)

Tissue: Bone-Osteosarc (Bone Osteosarcoma)

Tissue: Breast-Cancer (Breast Carcinoma)

Tissue: Cervix-Cancer (Cervical Carcinoma)

Tissue: Biliary-AdenoCA (Cholangiocarcinoma)

Tissue: Lymph-CLL (Chronic Lymphocytic Leukemia)

Tissue: ColoRect-AdenoCA (Colorectal Adenocarcinoma)

Tissue: Skin-Melanoma (Cutaneous Melanoma)

Tissue: Uterus-AdenoCA (Endometrial Adenocarcinoma)

Tissue: Eso-AdenoCA (Esophageal Adenocarcinoma)

Tissue: ESCC (Esophageal Squamous Cell Carcinoma)

Tissue: Stomach-AdenoCA (Gastric Adenocarcinoma)

Tissue: CNS-GBM (Glioblastoma)

Tissue: Head-SCC (Head and Neck Squamous Cell Carcinoma)

Tissue: Liver-HCC (Hepatocellular Carcinoma)

Tissue: Lung-AdenoCA (Lung Adenocarcinoma)

Tissue: Lung-SCC (Lung Squamous Cell Carcinoma)

Tissue: CNS-Medullo (Medulloblastoma)

Tissue: Ovary-AdenoCA (Ovarian Adenocarcinoma)

Tissue: Panc-AdenoCA (Pancreatic Adenocarcinoma)

Tissue: Panc-Endocrine (Pancreatic Neuroendocrine tumour)

Tissue: Prost-AdenoCA (Prostate Adenocarcinoma)

Tissue: Kidney-RCC (Renal Cell Carcinoma)

Average nucleosome signal along a 2-kilobase window centred at the somatic mutation (dashed vertical line). The solid blue line shows the average nucleosome signal for real mutations, whereas the dashed line shows the average nucleosome signal for simulated somatic mutations. A higher signal reflects a higher propensity for nucleosome occupancy.
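The averaged profile can be sketched as a position-wise mean of a per-base signal track over windows centred on each mutation. The track representation (one value per base for a single chromosome), the ±1,000 bp half-window and the function name are assumptions; computing the profile once for real and once for simulated positions gives the two curves.

```python
def average_profile(signal: list[float], positions: list[int], half_window: int = 1000) -> list[float]:
    """Position-wise mean of `signal` in windows centred on each mutation.

    signal: per-base values for one chromosome (e.g. nucleosome occupancy).
    positions: 0-based mutation coordinates on that chromosome.
    Returns 2 * half_window + 1 values, for offsets -half_window..+half_window.
    """
    sums = [0.0] * (2 * half_window + 1)
    n = 0
    for pos in positions:
        start, end = pos - half_window, pos + half_window + 1
        if start < 0 or end > len(signal):
            continue  # skip mutations whose window runs past a chromosome end
        for i, value in enumerate(signal[start:end]):
            sums[i] += value
        n += 1
    if n == 0:
        raise ValueError("no mutation had a complete window")
    return [s / n for s in sums]
```

The same routine applies unchanged to any other per-base track, such as CTCF binding signal.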

CTCF occupancy

Tissue: aggregated across 9 tissues

Tissue: Lymph-BNHL (B-Cell Non-Hodgkin Lymphoma)

Tissue: ColoRect-AdenoCA (Colorectal Adenocarcinoma)

Tissue: Uterus-AdenoCA (Endometrial Adenocarcinoma)

Tissue: Eso-AdenoCA (Esophageal Adenocarcinoma)

Tissue: ESCC (Esophageal Squamous Cell Carcinoma)

Tissue: Stomach-AdenoCA (Gastric Adenocarcinoma)

Tissue: Liver-HCC (Hepatocellular Carcinoma)

Tissue: Panc-AdenoCA (Pancreatic Adenocarcinoma)

Tissue: Prost-AdenoCA (Prostate Adenocarcinoma)

CCCTC-binding factor (CTCF) is a multi-functional, sequence-specific transcription factor encoded by the CTCF gene. It can function as a transcriptional activator, a repressor, or an insulator protein by blocking the communication between enhancers and promoters.

This plot shows the average CTCF signal along a 2-kilobase window centred at the somatic mutation (dashed vertical line). The solid blue line shows the average CTCF signal for real mutations, whereas the dashed line shows the average CTCF signal for simulated mutations. A higher signal reflects a higher propensity for CTCF binding.

Histone modifications

v3.2_ID1_HISTONE_MODS.jpg

This plot shows the associations between mutational signatures and histone marks. Pie charts display the number of cancer types that are enriched, depleted, or show no statistical effect for a given mutational signature and a specific histone mark. Differential changes are calculated by statistically comparing the average signals of real and simulated mutations in a 100-base window centred at the somatic mutation. Enrichment reflects a statistically significant increase of at least 5% in the real signal compared with the simulated signal; depletion reflects a statistically significant decrease of at least 5%. Statistical significance is defined as a false-discovery-rate-corrected p-value below 0.05.
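The classification rule can be sketched as below. The statistical test that produces the raw p-values is not specified here, so they are taken as inputs. Benjamini-Hochberg is assumed as the false-discovery-rate correction, and "at least 5%" is read as a relative change of the real mean against the simulated mean. Both helpers are hypothetical names.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(real_means: list[float], sim_means: list[float], pvalues: list[float],
             min_change: float = 0.05, alpha: float = 0.05) -> list[str]:
    """Label each cancer type 'enriched', 'depleted' or 'no effect' for one mark."""
    labels = []
    for real, sim, q in zip(real_means, sim_means, benjamini_hochberg(pvalues)):
        change = (real - sim) / sim  # relative change; assumes a positive simulated mean
        if q < alpha and change >= min_change:
            labels.append("enriched")
        elif q < alpha and change <= -min_change:
            labels.append("depleted")
        else:
            labels.append("no effect")
    return labels
```

Counting the three labels across cancer types (for example with `collections.Counter`) gives the slices of one pie chart.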

The marks analysed are:
(i) H2AFZ, a replication-independent member of the histone H2A family that renders chromatin accessible at enhancers and promoters and regulates transcriptional activation and repression;
(ii) H3K4me1, a histone mark often associated with enhancer activity;
(iii) H3K4me2, a histone post-translational modification enriched in cis-regulatory regions, including both enhancers and promoters;
(iv) H3K4me3, a post-translational modification enriched in active promoters near transcription start sites;
(v) H3K9ac, associated with active gene promoters and active transcription;
(vi) H3K9me3, a typical mark of constitutive heterochromatin;
(vii) H3K27ac, a histone modification generally found at nucleosomes flanking enhancers;
(viii) H3K27me3, a repressive mark associated with silent genes;
(ix) H3K36me3, associated with transcribed regions and playing a role in regulating DNA damage repair;
(x) H3K79me2, detected in the transcribed regions of active genes; and
(xi) H4K20me1, found in gene promoters and associated with transcriptional elongation and activation.

Transcriptional strand asymmetry

Topography analysis could not be performed for transcriptional strand asymmetry, either because the number of mutations satisfying our constraints was insufficient or because this signature has not yet been analysed.

Genic and intergenic regions

v3.2_ID1_GENIC_ASYM.jpg

Mutational signatures can leave their mark in the form of differential mutation frequencies between two types of DNA region: genic regions and intergenic regions.

The upper bar plot represents the percentage of real mutations in genic and intergenic regions, averaged across the human genome and across all examined samples, in 96 mutational contexts.

In the lower circle plot, a circle is filled with the colour of the significant region when the odds ratio is at least 1.1 and statistically significant. The first row of this plot displays genic versus intergenic region asymmetry across all cancer types, while the remaining rows present it for each cancer type.

Each mutation attributed to the mutational signature is annotated as lying either in a genic region (on the transcribed or untranscribed strand) or in an intergenic region (non-transcribed).

The odds ratio is defined as the real mutations ratio divided by the simulated mutations ratio, where each ratio is calculated from the numbers of mutations in genic and intergenic regions. In addition, the numbers of real mutations in genic and intergenic regions must differ significantly from the average numbers of simulated mutations in those regions.

The region with the higher number of real mutations defines the numerator of both the real mutations ratio and the simulated mutations ratio.

The region with the lower number of real mutations defines the denominator of both the real mutations ratio and the simulated mutations ratio.
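The rule above can be sketched as a small function over two categories. The significance test is not specified on this page, so it enters only as a boolean input; the function names are hypothetical, and simulated counts are assumed to be averages over simulations. The same calculation applies to the lagging versus leading strand comparison in the replicational strand asymmetry section.

```python
def asymmetry_odds_ratio(real: dict[str, float], simulated: dict[str, float]) -> tuple[str, float]:
    """Odds ratio for two categories, e.g. {'genic': n1, 'intergenic': n2}.

    The category with more real mutations is the numerator of both ratios, so
    the odds ratio exceeds 1 when real mutations favour that category more
    strongly than simulated mutations do.
    """
    a, b = list(real)
    top, bottom = (a, b) if real[a] >= real[b] else (b, a)
    if real[bottom] == 0 or simulated[bottom] == 0 or simulated[top] == 0:
        raise ValueError("odds ratio is undefined when a count is zero")
    real_ratio = real[top] / real[bottom]
    sim_ratio = simulated[top] / simulated[bottom]
    return top, real_ratio / sim_ratio


def circle_is_filled(odds_ratio: float, significant: bool, threshold: float = 1.1) -> bool:
    """A circle is coloured only for a significant odds ratio of at least 1.1."""
    return significant and odds_ratio >= threshold


region, odds = asymmetry_odds_ratio({"genic": 600, "intergenic": 400},
                                    {"genic": 500, "intergenic": 500})
print(region, odds)  # genic 1.5
```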

Replicational strand asymmetry

v3.2_ID1_REPLIC_ASYM.jpg

Mutational signatures can exhibit an asymmetric number of mutations between the two DNA strands, because either one strand is preferentially repaired or one strand has a higher propensity for being damaged. One common example is replication-strand asymmetry, in which the DNA replication process may result in preferential mutagenesis of one of the strands.

The upper bar plot represents the percentage of real mutations on lagging and leading strands, averaged across the human genome and across all examined samples, in 96 mutational contexts.

In the lower circle plot, a circle is filled with the colour of the significant strand when the odds ratio is at least 1.1 and statistically significant. The first row of this plot displays replicational strand asymmetry across all cancer types, while the remaining rows present it for each cancer type.

Each mutation attributed to the mutational signature is annotated as either on the lagging strand or the leading strand.

The odds ratio is defined as the real mutations ratio divided by the simulated mutations ratio, where each ratio is calculated from the numbers of mutations on lagging and leading strands. In addition, the numbers of real mutations on lagging and leading strands must differ significantly from the average numbers of simulated mutations on those strands.

The strand with the higher number of real mutations defines the numerator of both the real mutations ratio and the simulated mutations ratio.

The strand with the lower number of real mutations defines the denominator of both the real mutations ratio and the simulated mutations ratio.