Figure S4 A-I: Initiation site properties and evolutionary changes

[Panels A and B: bar plots of initiation site usage, broken down by level of TSS CAGE support. Panel A: G correction not used. Panel B: G correction used. X axis: dinucleotide (AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT). Y axis: fraction of total counts. Series: 1 tag, 2 tags, 3 tags, 4 tags, 5 tags, 6 tags, 7 tags, 8 tags, 9 tags, >9 tags, and expected fraction.]

Fig. S4A-B. Dinucleotide distribution analysis of CTSS with varying CAGE tag support. We analyzed the usage of different [-1, +1] dinucleotides relative to each CTSS in the data set (note that the -1 nucleotide is not part of the sequenced tag). We subdivided the cases by the number of tags each CTSS contained, giving 10 classes (1, 2, 3 and so on up to 9 tags, and more than 9 tags). As an additional reference class, we collected 0.000 randomly selected start points in the genome (non-overlapping and not part of repetitive regions). This reference distribution corresponds to the distribution expected if start sites were random (noise). The frequency of all possible dinucleotides for each class is shown as a bar plot, without G correction (panel A) or with G correction (panel B). The dinucleotide distribution is dramatically different from random selection, even with single CAGE tag support. We also note that there is a higher preference for INR-like CA dinucleotides when the transcript has a higher expression (i.e. more tag counts), while AG and GG dinucleotides are more favored in rarely expressed transcripts. Part of the GG dinucleotides corresponds to the GGG motif (before G correction) we found for the novel 3'UTR transcripts. These observations hold regardless of whether the CTSSs are subjected to G correction or not.
The difference in dinucleotide use when the tag count is 5 is a rounding artifact of the G correction algorithm (which was designed for correcting larger tag counts). Regardless of this, the overall frequency pattern as a function of the number of supporting tags indicates a very low level of noise in the CAGE dataset: otherwise the dinucleotide preference of TSSs supported by one tag (singletons) would be much closer to that expected by chance, and different from the preference of TSSs supported by two or three tags.
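The tabulation described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input layout (a dictionary of chromosome sequences and a list of `(chrom, pos, tag_count)` tuples with `pos` as the 0-based index of the +1 nucleotide), and the restriction to plus-strand sites are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def support_class(tag_count):
    """Map a CTSS tag count to a class label: '1' to '9', or '>9'."""
    return str(tag_count) if tag_count <= 9 else ">9"


def dinucleotide_fractions(genome, ctss):
    """Tabulate [-1, +1] dinucleotide fractions per tag-support class.

    genome: dict mapping chromosome name -> plus-strand sequence (assumed layout)
    ctss:   iterable of (chrom, pos, tag_count); pos is the 0-based index of
            the +1 nucleotide, plus strand only (a simplifying assumption)
    """
    counts = defaultdict(Counter)
    for chrom, pos, tags in ctss:
        if pos < 1:
            continue  # no -1 nucleotide available at the sequence start
        dinuc = genome[chrom][pos - 1:pos + 1].upper()
        # Skip truncated or ambiguous (e.g. N-containing) dinucleotides.
        if len(dinuc) == 2 and set(dinuc) <= set("ACGT"):
            counts[support_class(tags)][dinuc] += 1
    return {
        cls: {d: n / sum(c.values()) for d, n in c.items()}
        for cls, c in counts.items()
    }


# Toy example with a made-up sequence and made-up CTSS positions.
genome = {"chr1": "TTCAGGCATT"}
ctss = [("chr1", 3, 1), ("chr1", 7, 1), ("chr1", 4, 1), ("chr1", 5, 12)]
fractions = dinucleotide_fractions(genome, ctss)
```

Running the same function on randomly chosen genomic positions (each with a dummy tag count) would give the reference distribution against which the per-class fractions are compared.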
Fig. S4C-D. Examples of pyrimidine-purine dinucleotide substitutions and their effects. Gallery of bar plots of orthologous mouse and human TCs illustrating dinucleotide substitutions and their effect on start site usage. The Y axis indicates the number of CAGE tags starting at a given genomic position (X axis). Green arrows indicate the transition from a pyrimidine-purine start site to any other base combination.

C: Ccm gene, tag cluster T05F0003AFA6
D: Wasf2 gene, tag cluster T04F07D7XFEE
E: Pfdn2 gene, tag cluster T0F04A379D63
F: Jaridb gene, tag cluster t0f08038b70
G: DBwg363 gene, tag cluster T0R048684BF
H: Grim9 gene, tag cluster T08R04BDDDA
[Panel I: four boxplot sections titled "Mutation of a purine-purine dinucleotide to...", "Mutation of a purine-pyrimidine dinucleotide to...", "Mutation of a pyrimidine-pyrimidine dinucleotide to..." and "Mutation of a pyrimidine-purine dinucleotide to...". Within each section, boxes are labeled by substitution class (for example pu.pu>pu.pu, pu.pu>pu.py, pu.pu>py.pu, pu.pu>py.py) and annotated with the number of cases and their percentage.]

Fig. S4I. Substitution effects on dinucleotides in core promoters. Boxplots show the effects of substitutions on initiation sites for all possible base combinations. Mutations are annotated relative to mouse (i.e. mouse to human). Boxplot generation and the Y axis score are described in Methods. The four sections correspond to four different reference dinucleotides (Pu-Pu, Pu-Py, Py-Pu, Py-Py).
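The substitution labels used in panel I (such as py.pu>pu.pu) can be derived mechanically from a mouse dinucleotide and its human ortholog. The following is a small sketch of that labeling, not the procedure from Methods: the function names are hypothetical, and the choice to tally only pairs whose sequence actually differs is an assumption.

```python
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CT")


def base_class(base):
    """Return 'pu' for a purine (A, G) or 'py' for a pyrimidine (C, T)."""
    base = base.upper()
    if base in PURINES:
        return "pu"
    if base in PYRIMIDINES:
        return "py"
    raise ValueError(f"not an unambiguous DNA base: {base!r}")


def dinuc_class(dinuc):
    """Label a [-1, +1] dinucleotide, e.g. 'CA' -> 'py.pu'."""
    if len(dinuc) != 2:
        raise ValueError("expected a dinucleotide")
    return f"{base_class(dinuc[0])}.{base_class(dinuc[1])}"


def substitution_class(mouse, human):
    """Label a substitution relative to mouse, e.g. ('CA', 'GA') -> 'py.pu>pu.pu'."""
    return f"{dinuc_class(mouse)}>{dinuc_class(human)}"


def tally_substitutions(pairs):
    """Count substitution classes over (mouse, human) dinucleotide pairs.

    Assumption: identical pairs are skipped, so only real substitutions count.
    """
    return Counter(
        substitution_class(m, h) for m, h in pairs if m.upper() != h.upper()
    )


# Toy example with made-up orthologous dinucleotide pairs.
pairs = [("CA", "CA"), ("CA", "TG"), ("AG", "GG"), ("CA", "GA")]
tally = tally_substitutions(pairs)
```

Note that a class such as pu.pu>pu.pu still contains real substitutions (for example AG to GG), because the label records only the purine or pyrimidine class of each position.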