Figure S4 A-H : Initiation site properties and evolutionary changes


A: G-correction not used. B: G-correction used.
[Barplots, panels A and B: "Initiation site usage, broken down by level of TSS CAGE support". Y-axis: fraction of total counts. X-axis: the 16 dinucleotides AA to TT. Series: 1 tag, 2 tags, 3 tags, 4 tags, 5 tags, 6 tags, 7 tags, 8 tags, 9 tags, >9 tags, and the expected fraction.]

Figure S4 A-B. Dinucleotide distribution analysis of CTSSs with varying CAGE tag support. We analyzed the usage of the different [-1, +1] dinucleotides relative to each CTSS in the data set (note that the -1 nucleotide is not part of the sequenced tag). We subdivided the CTSSs by the number of tags they contained into 10 classes (1, 2, 3 and so on up to 9 tags, and 10 or more tags). As an additional reference class, we collected 10,000 randomly selected start points in the genome (non-overlapping and not part of repetitive regions). This distribution corresponds to the distribution expected if start sites were random (noise). The frequency of all possible dinucleotides for each class is shown as a barplot, with (panel B) or without (panel A) G correction. The dinucleotide distribution is dramatically different from random selection, even with single CAGE tag support. We also note a stronger preference for INR-like CA dinucleotides when the transcript is more highly expressed (i.e. has more tag counts), while AG and GG dinucleotides are more favored in rarely expressed transcripts. Part of the GG dinucleotides corresponds to the GGG motif (before G correction) that we found for the novel 3'UTR transcripts. This is true regardless of whether the CTSSs are subjected to G correction or not.
The difference in dinucleotide use when the tag count is 5 is a rounding artifact in the G correction algorithm (which was designed for correcting larger tag counts). Regardless of this, the overall frequency pattern as a function of the number of supporting tags indicates a very low level of noise in the CAGE dataset: otherwise the dinucleotide preference of TSSs supported by one tag (singletons) would be much closer to that expected by chance, and different from the preference of TSSs supported by two or three tags.
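The per-class tabulation described in this caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names and the input format (one `(dinucleotide, tag_count)` pair per CTSS) are hypothetical, and it assumes each CTSS is counted once within its class rather than being weighted by its tag count.

```python
from collections import Counter, defaultdict
from itertools import product

# All 16 dinucleotides, AA to TT, in the order used on the barplot x-axis.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def support_class(tag_count):
    """Map a CTSS tag count to one of 10 classes: '1' to '9', or '>9'."""
    if tag_count < 1:
        raise ValueError("a CTSS needs at least one supporting tag")
    return str(tag_count) if tag_count <= 9 else ">9"


def dinucleotide_fractions(ctss_records):
    """Tabulate [-1, +1] dinucleotide usage per tag-support class.

    ctss_records: iterable of (dinucleotide, tag_count) pairs (hypothetical
    input format), with the dinucleotide read on the transcribed strand.
    Returns {class: {dinucleotide: fraction of CTSSs in that class}}.
    """
    counts = defaultdict(Counter)
    for dinuc, tags in ctss_records:
        dinuc = dinuc.upper()
        if dinuc not in DINUCLEOTIDES:
            continue  # skip ambiguous bases such as N
        counts[support_class(tags)][dinuc] += 1

    fractions = {}
    for cls, counter in counts.items():
        total = sum(counter.values())
        fractions[cls] = {d: counter[d] / total for d in DINUCLEOTIDES}
    return fractions
```

Running the same function on the dinucleotides at randomly selected genomic positions (all placed in a single class) would give the expected-fraction reference series to compare against.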

Fig. S4C-D. Examples of pyrimidine-purine dinucleotide substitutions and their effects. Gallery of barplots of mouse and human orthologous TCs illustrating dinucleotide substitutions and their effect on start site usage. The y-axis indicates the number of CAGE tags starting at a given genomic position (x-axis). Green arrows indicate the transition from a pyrimidine-purine start site to any other base combination.
C: Ccm gene, tag cluster T05F0003AFA6
D: Wasf2 gene, tag cluster T04F07D7XFEE

E: Pfdn2 gene, tag cluster T0F04A379D63
F: Jaridb gene, tag cluster t0f08038b70

G: DBwg363 gene, tag cluster T0R048684BF
H: Grim9 gene, tag cluster T08R04BDDDA

[Boxplots, four panels, with the number of cases and percentage shown for each box.]
Mutation of a purine-purine dinucleotide to...: 3410 cases (67.2%); pu.pu>pu.pu 640 cases (12.6%); pu.pu>pu.py 156 cases (3.1%); pu.pu>py.pu 828 cases (16.3%); pu.pu>py.py 40 cases (0.8%)
Mutation of a purine-pyrimidine dinucleotide to...: 419 cases (51.3%); pu.py>pu.pu 155 cases (19%); pu.py>pu.py 90 cases (11%); pu.py>py.pu 80 cases (9.8%); pu.py>py.py 73 cases (8.9%)
Mutation of a pyrimidine-pyrimidine dinucleotide to...: 1228 cases (53%); py.py>pu.pu 42 cases (1.8%); py.py>pu.py 78 cases (3.4%); py.py>py.pu 695 cases (30%); py.py>py.py 275 cases (11.9%)
Mutation of a pyrimidine-purine dinucleotide to...: 9270 cases (67.8%); py.pu>pu.pu 1048 cases (7.7%); py.pu>pu.py 136 cases (1%); py.pu>py.pu 2362 cases (17.3%); py.pu>py.py 865 cases (6.3%)

Fig. S4I. Substitution effects on dinucleotides in core promoters. Boxplots show the effects of substitutions on initiation sites for all possible base combinations. Mutations are annotated relative to mouse (i.e. mouse to human). Boxplot generation and the Y-axis score are described in Methods. The four sections correspond to the four different reference dinucleotides (Pu-Pu, Pu-Py, Py-Pu, Py-Py).
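The class labels used in these panels (for example py.pu>pu.pu) can be derived from a pair of orthologous dinucleotides as in the sketch below. It is only an illustration of the labelling scheme, annotated mouse to human as in the caption. The function names are hypothetical, and returning `None` for an unchanged dinucleotide is an assumption made here, not something stated in the source.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def base_class(base):
    """Return 'pu' for a purine (A, G) or 'py' for a pyrimidine (C, T)."""
    base = base.upper()
    if base in PURINES:
        return "pu"
    if base in PYRIMIDINES:
        return "py"
    raise ValueError(f"unexpected base: {base!r}")


def dinucleotide_class(dinuc):
    """Return the class of a dinucleotide, e.g. 'CA' -> 'py.pu'."""
    if len(dinuc) != 2:
        raise ValueError("expected exactly two bases")
    return ".".join(base_class(b) for b in dinuc)


def substitution_label(mouse_dinuc, human_dinuc):
    """Label a mouse-to-human dinucleotide change, e.g. 'py.pu>pu.pu'.

    Returns None when the two dinucleotides are identical (an assumption
    of this sketch about how unchanged sites are handled).
    """
    if mouse_dinuc.upper() == human_dinuc.upper():
        return None
    return f"{dinucleotide_class(mouse_dinuc)}>{dinucleotide_class(human_dinuc)}"
```

Note that a label such as py.pu>py.pu still denotes a real substitution (for instance CA to TG), because the purine/pyrimidine class can be preserved while the bases change.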