Mutational Signatures (v3 - May 2019)

Small Insertion and Deletion (ID) Signatures

Each vignette has two figures.

The first figure (e.g. ID1 in the first vignette) shows the signature profile, where the height of each bar represents the proportion of one ID mutation type among all ID mutation types in the signature. The ID mutation types are enumerated in PCAWG7_indel_classification_2017_12_08.xlsx. There is no single intuitive and naturally constrained set of ID mutation types (as there arguably is for single base substitutions and doublet base substitutions).
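As a minimal sketch of how bar heights in such a profile relate to raw counts, the Python below normalises per-type counts so that they sum to 1. The category names and counts are made up for illustration and are not taken from any signature; `to_proportions` is a hypothetical helper, not part of any published tool.

```python
def to_proportions(counts):
    """Convert per-type ID counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalise an empty profile")
    return {id_type: n / total for id_type, n in counts.items()}


# Illustrative counts only (hypothetical category names).
example_counts = {
    "1bp_del_T_long_homopolymer": 60,
    "1bp_ins_T_long_homopolymer": 30,
    "del_with_microhomology": 10,
}

profile = to_proportions(example_counts)
for id_type, proportion in profile.items():
    print(f"{id_type}: {proportion:.2f}")
```

Each printed value corresponds to the height of one bar; a real profile would have one bar per ID mutation type in the classification.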

The classification here incorporates prior knowledge about IDs: that they commonly have sizes of 1-10 bp; that both insertions and deletions exist; that IDs of C and T occur at different rates; that IDs preferentially occur at repetitive elements; that the length of the repeat unit and the number of repeat units in a repeat stretch may each influence the likelihood of an ID occurring; that IDs are in some instances also fostered by overlapping sequence microhomologies at the ID boundaries; and that different mutational processes may, in principle, be differently influenced by these features. We therefore designed an 83-subclass categorisation of IDs that allows some exploration of all the above possibilities, while constraining the number of categories in order to accommodate the relatively small numbers of IDs (compared to substitutions) found in most genomes. This classification categorises IDs by length, from 1 bp to >5 bp. For 1 bp IDs, it records whether the base is T or C and the number of single base repeats in which the ID occurs, from 0 to >5. For longer IDs, it records the length of the non-single base repeat unit, from 2 bp to >5 bp, the number of repeats, from 1 to >5, and the size of microhomology, from 0 bp to >5 bp. We recognise that different classifications of IDs may be preferred by others.
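One way to see how these features combine into 83 subclasses is to enumerate them. The sketch below is an assumption about the layout, not a copy of the classification file: it assumes six bins per repeat context with an open-ended last bin, repeat bins starting at 1 for deletions and 0 for insertions, and microhomology counted only for deletions of 2 bp or more, with the microhomology shorter than the deletion. The label strings are hypothetical.

```python
SIZES = ("2", "3", "4", "5+")  # lengths of IDs longer than 1 bp


def open_ended(start, n=6):
    """Return n bin labels starting at `start`; the last bin is open-ended."""
    labels = [str(start + i) for i in range(n - 1)]
    labels.append(f"{start + n - 1}+")
    return labels


def enumerate_id_categories():
    cats = []
    # 1 bp IDs: deletion or insertion, base C or T, homopolymer bin.
    for event, start in (("del", 1), ("ins", 0)):
        for base in ("C", "T"):
            for b in open_ended(start):
                cats.append(f"{event}:1bp:{base}:homopolymer={b}")
    # Longer IDs at repeats: event, ID length, number of repeat units.
    for event, start in (("del", 1), ("ins", 0)):
        for size in SIZES:
            for b in open_ended(start):
                cats.append(f"{event}:{size}bp:repeat_units={b}")
    # Deletions with microhomology shorter than the deleted sequence.
    for size, max_mh in (("2", 1), ("3", 2), ("4", 3)):
        for mh in range(1, max_mh + 1):
            cats.append(f"del:{size}bp:microhomology={mh}")
    for b in ("1", "2", "3", "4", "5+"):
        cats.append(f"del:5+bp:microhomology={b}")
    return cats


categories = enumerate_id_categories()
print(len(categories))
```

Under these assumptions the count is 24 (1 bp IDs) + 48 (longer IDs at repeats) + 11 (microhomology deletions) = 83, which matches the number of subclasses stated above. The exact bin boundaries should be checked against the classification file itself.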

The second figure, "Cancer types in which the signature is found," shows the number of mutations per megabase attributed to the signature in each sample in which it is found. Only those cancer types with tumors in which signature activity is attributed are shown. The numbers below the dots for each cancer type indicate the number of tumors in which the signature was attributed (above the blue horizontal bar) and the total number of tumors analyzed (below the blue bar).
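The quantities plotted in this figure can be sketched for a single cancer type as follows. This is an illustration only: the genome size used as the denominator is an assumption (the text does not state it), the counts are made up, and `summarise_cancer_type` is a hypothetical helper.

```python
GENOME_SIZE_MB = 3000.0  # assumed denominator; not stated in the text


def summarise_cancer_type(attributed_counts, genome_mb=GENOME_SIZE_MB):
    """Summarise one signature in one cancer type.

    attributed_counts: number of ID mutations attributed to the signature
    in each analyzed tumor (0 means the signature was not attributed).
    """
    with_signature = [c for c in attributed_counts if c > 0]
    return {
        # One dot per tumor in which the signature was attributed.
        "mutations_per_mb": [c / genome_mb for c in with_signature],
        # Number shown above the blue bar.
        "n_with_signature": len(with_signature),
        # Number shown below the blue bar.
        "n_total": len(attributed_counts),
    }


# Illustrative counts for five tumors of one cancer type.
summary = summarise_cancer_type([0, 300, 1500, 0, 60])
print(summary["n_with_signature"], "/", summary["n_total"])
print(summary["mutations_per_mb"])
```

A cancer type whose tumors all have a count of zero would have no dots and, as described above, would not appear in the figure.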

The signatures are available in numerical form from Synapse, ID syn12009743.