Transcription factor binding site prediction in vivo using DNA sequence and shape features


1 Transcription factor binding site prediction in vivo using DNA sequence and shape features
Anthony Mathelier, Lin Yang, Tsu-Pei Chiu, Remo Rohs, and Wyeth Wasserman
REGSYSGEN 2015, Nov. 17th
Centre for Molecular Medicine and Therapeutics

2 Transcriptional regulation of gene expression
[Figure: chromatin landscape showing DNA, nucleosomes (histone octamers), enhancers, promoters, the TSS, TFs and other regulatory proteins, cohesin, RNA PolII, and RNA transcripts. Source: A. Mathelier, W. Shi, and W.W. Wasserman, Trends in Genetics.]
- Transcription of genes is turned on or off by transcription factors (TFs).
- TFs bind DNA at transcription factor binding sites (TFBSs).

3 Modeling TFBSs using position frequency matrices (PFMs)
Known binding sites:
GTAACAAT GTAAACAT GTAAACAA GTAAACAA GTAAACAT
GTAAACAA GTAAACAC GTCAACAG GTAAACAT GTAAACAA
GTAAACAT TTAAGTAA ATAAACAA CTAAACAG GTAAACAT
GTAAACAA GTAAACAT GTAAACAC GTAAACAT GTAAACAG
Position frequency matrix (counts per position over the 20 sites above):
A [  1  0 19 20 18  1 20  7 ]
C [  1  0  1  0  1 18  0  2 ]
G [ 17  0  0  0  1  0  0  3 ]
T [  1 20  0  0  0  1  0  8 ]
PFMs → PWMs
Classically, position weight matrices (PWMs) are derived from PFMs to model TFBSs, assuming that nucleotides within a TFBS are independent of one another.
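The PFM-to-PWM step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the pseudocount of 0.25, and the uniform background of 0.25 per nucleotide are assumptions made for the example. Only the list of binding sites comes from the slide.

```python
import math

# Known binding sites, as listed on the slide.
SITES = (
    "GTAACAAT GTAAACAT GTAAACAA GTAAACAA GTAAACAT "
    "GTAAACAA GTAAACAC GTCAACAG GTAAACAT GTAAACAA "
    "GTAAACAT TTAAGTAA ATAAACAA CTAAACAG GTAAACAT "
    "GTAAACAA GTAAACAT GTAAACAC GTAAACAT GTAAACAG"
).split()

BASES = "ACGT"


def build_pfm(sites):
    """Count each nucleotide at each position of equally long sites."""
    length = len(sites[0])
    pfm = {base: [0] * length for base in BASES}
    for site in sites:
        for pos, base in enumerate(site):
            pfm[base][pos] += 1
    return pfm


def build_pwm(pfm, pseudocount=0.25, background=0.25):
    """Turn counts into log2-odds weights (pseudocount and background are assumed)."""
    length = len(pfm["A"])
    pwm = {base: [0.0] * length for base in BASES}
    for pos in range(length):
        column_total = sum(pfm[base][pos] for base in BASES) + 4 * pseudocount
        for base in BASES:
            prob = (pfm[base][pos] + pseudocount) / column_total
            pwm[base][pos] = math.log2(prob / background)
    return pwm


def score(pwm, site):
    """Sum per-position weights; positions contribute independently."""
    return sum(pwm[base][pos] for pos, base in enumerate(site))


PFM = build_pfm(SITES)
PWM = build_pwm(PFM)
```

Because `score` simply adds one weight per position, it encodes the nucleotide-independence assumption stated on the slide: the weight given to a base at one position never depends on its neighbours.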

4 Modeling TFBSs using Transcription Factor Flexible Models (TFFMs)
[Figure: ChIP-seq sequences (HNF4A examples such as ...AGTTCAAAGTTCA...) are used to train a TFFM with a background state (BG) and states for positions 1 to n, each with dinucleotide emission tables (AA, AC, ..., TT); the trained models are summarized as sequence logos (bits). Source: A. Mathelier and W.W. Wasserman, PLoS Computational Biology.]
TFFMs model the sequence properties of TFBSs from ChIP-seq data by capturing dependencies between successive nucleotides (dinucleotide dependencies).
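To make "successive dinucleotide dependencies" concrete, here is a deliberately simplified sketch: a position-specific first-order model in which the probability of a base depends on the base just before it. It is not the TFFM implementation (which the slide depicts as a state-based model trained on ChIP-seq data). The function names, the pseudocount, the uniform background, and the two-letter toy sites are all hypothetical.

```python
import math

BASES = "ACGT"


def train_first_order(sites, pseudocount=0.1):
    """Estimate P(base at position 0) and P(base at i | base at i-1) from aligned sites."""
    length = len(sites[0])
    start_counts = {b: pseudocount for b in BASES}
    for site in sites:
        start_counts[site[0]] += 1
    start_total = sum(start_counts.values())
    start = {b: c / start_total for b, c in start_counts.items()}

    transitions = []  # transitions[i - 1][prev][cur] for positions i = 1 .. length - 1
    for i in range(1, length):
        counts = {p: {c: pseudocount for c in BASES} for p in BASES}
        for site in sites:
            counts[site[i - 1]][site[i]] += 1
        transitions.append(
            {
                p: {c: counts[p][c] / sum(counts[p].values()) for c in BASES}
                for p in BASES
            }
        )
    return start, transitions


def log_odds(model, seq, background=0.25):
    """Log2-odds of seq under the first-order model versus a uniform background."""
    start, transitions = model
    total = math.log2(start[seq[0]] / background)
    for i in range(1, len(seq)):
        total += math.log2(transitions[i - 1][seq[i - 1]][seq[i]] / background)
    return total


# Hypothetical toy sites: A is always followed by C, and G by T.
TOY_SITES = ["AC", "AC", "GT", "GT"]
MODEL = train_first_order(TOY_SITES)
```

In the toy data, C and T are equally frequent at the second position, so a model with independent positions cannot tell `AC` from `AT`. The first-order model scores `AC` higher because it has seen C only after A.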

5 DNA shape features
The DNAshape tool predicts the DNA shape features of a DNA sequence.
Genome-wide DNA shape features available on GBshape are:
- Minor Groove Width (MGW)
- Roll
- Propeller Twist (ProT)
- Helix Twist (HelT)
T. Zhou et al., Nucl. Acids Res.; T.P. Chiu et al., Nucl. Acids Res.

7 Using DNA shape to model TFBSs
Studies have shown the importance of DNA shape for modeling TFBSs, using data from:
- SELEX-seq experiments.
- Protein-binding microarray experiments.
- BunDLE-seq experiments.
N. Abe et al., Cell; T. Zhou et al., PNAS; M. Levo et al., Genome Res.
Aims of our study:
- Construct computational models from large-scale in vivo data (ChIP-seq) by combining DNA sequence and shape features.
- Show TFBS prediction improvements on in vivo data.
- Analyze whether the improvements brought by DNA shape are specific to certain TF families.
- Analyze the position-specific importance of DNA shape at TFBSs.

8 Combining TFFMs and DNA shapes at TFBSs
[Figure: feature vector for a TFBS hit, made of the hit score followed by the MGW, ProT, Roll, and HelT values.]
We used an ensemble machine learning approach to combine DNA sequence and shape features.
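The feature vector on this slide can be pictured as a simple concatenation. The sketch below is illustrative only: the function name, the per-position layout, and every numeric value are hypothetical, and the slide does not say how the shape values are normalized or which ensemble classifier consumes the vector.

```python
# Order of the shape features as shown on the slide.
SHAPE_ORDER = ("MGW", "ProT", "Roll", "HelT")


def build_feature_vector(hit_score, shape_values):
    """Concatenate the sequence-model hit score with per-position shape values.

    shape_values maps each feature name to a list of values along the hit;
    the lists may differ in length from one feature to another.
    """
    vector = [float(hit_score)]
    for name in SHAPE_ORDER:
        vector.extend(float(v) for v in shape_values[name])
    return vector


# Hypothetical values for a 4-bp hit (Roll and HelT given per base-pair step).
example_shape = {
    "MGW": [5.1, 4.8, 4.2, 5.0],
    "ProT": [-6.3, -9.1, -12.4, -7.0],
    "Roll": [-1.2, 3.4, 0.5],
    "HelT": [34.1, 32.7, 35.9],
}
example_vector = build_feature_vector(0.87, example_shape)
```

Each ChIP-seq hit and each background sequence would yield one such vector, and the resulting labelled vectors are what a classifier is trained on.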

9 DNA shape features improve TFBS prediction in vivo
[Figure: panels A and B.]
Results on 400 human ENCODE ChIP-seq data sets:
- Combining TFFM scores and DNA shape features improves the discriminative power.
- AUROC difference > 0.05 in 107 cases.
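The AUROC comparison reported here can be illustrated with a small, dependency-free sketch. AUROC equals the probability that a randomly chosen bound (positive) sequence scores higher than a randomly chosen background (negative) sequence, with ties counted as one half. The function name and all scores below are hypothetical and are not results from the study.

```python
def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier outputs for one data set.
negatives = [0.5, 0.3, 0.2]
sequence_only_positives = [0.9, 0.6, 0.4]
sequence_plus_shape_positives = [0.9, 0.7, 0.6]

auroc_sequence = auroc(sequence_only_positives, negatives)
auroc_combined = auroc(sequence_plus_shape_positives, negatives)
auroc_difference = auroc_combined - auroc_sequence
```

A data set would count toward the "AUROC difference > 0.05" tally when `auroc_difference` exceeds 0.05, as it does for these toy scores.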

10 DNA shape features are important for specific TF families
[Figure: panels B and C.]
Data sets from the E2F and MADS-domain TF families are enriched for strong improvements when DNA shape features are considered.

11 Validation on independent plant MADS-domain TFs
Incorporating DNA shape features significantly improves TFBS prediction for plant MADS-domain TFs.

12 ProT position-specific importance for MADS-domain TFs
[Figure: panel A, AGL15 sequence logo (bits); panel B.]
ProT is of critical importance for predicting TFBSs associated with plant MADS-domain TFs in a position-specific manner.

13 Conclusions
- Our analyses of ChIP-seq data represent the in vivo counterpart of the published in vitro studies.
- We can construct computational models combining DNA sequence and shape features from ChIP-seq data to improve TFBS prediction in vivo.
- Incorporating DNA shape information is most beneficial when applied to the E2F and MADS-domain TF families.
- ProT is critical for MADS-domain TF binding specificity in a position-specific manner.

14 Acknowledgements
Wyeth Wasserman, Remo Rohs, Lin Yang, Tsu-Pei Chiu, François Parcy, Oriol Fornes, Chih-Yu Chen
Centre for Molecular Medicine and Therapeutics

15 Thank you


More information

Effective Placement of. Raymond J. Peterson, Ph.D.

Effective Placement of. Raymond J. Peterson, Ph.D. Effective Placement of LNA into Q-PCR Q Probes Raymond J. Peterson, Ph.D. Why Use LNA in Q-PCR? Q Proper design enables stable & specific hybridization for primers & probes Goals: Ensure sufficient signal

More information

Gene and DNA structure. Dr Saeb Aliwaini

Gene and DNA structure. Dr Saeb Aliwaini Gene and DNA structure Dr Saeb Aliwaini 2016 DNA during cell cycle Cell cycle for different cell types Molecular Biology - "Study of the synthesis, structure, and function of macromolecules (DNA, RNA,

More information

Learning Methods for DNA Binding in Computational Biology

Learning Methods for DNA Binding in Computational Biology Learning Methods for DNA Binding in Computational Biology Mark Kon Dustin Holloway Yue Fan Chaitanya Sai Charles DeLisi Boston University IJCNN Orlando August 16, 2007 Outline Background on Transcription

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru

More information

File S1. Program overview and features

File S1. Program overview and features File S1 Program overview and features Query list filtering. Further filtering may be applied through user selected query lists (Figure. 2B, Table S3) that restrict the results and/or report specifically

More information

Title: Genome-Wide Predictions of Transcription Factor Binding Events using Multi- Dimensional Genomic and Epigenomic Features Background

Title: Genome-Wide Predictions of Transcription Factor Binding Events using Multi- Dimensional Genomic and Epigenomic Features Background Title: Genome-Wide Predictions of Transcription Factor Binding Events using Multi- Dimensional Genomic and Epigenomic Features Team members: David Moskowitz and Emily Tsang Background Transcription factors

More information

Ensembl Funcgen: A Database and API for Epigenomics and Gene Regulation Data.

Ensembl Funcgen: A Database and API for Epigenomics and Gene Regulation Data. Ensembl Funcgen: A Database and API for Epigenomics and Gene Regulation Data. Nathan Johnson Ensembl Regulation EBI is an Outstation of the European Molecular Biology Laboratory.! Workshop Overview http://www.ebi.ac.uk/~njohnson/courses/23.05.2013-

More information

ChIP-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland

ChIP-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland ChIP-seq data analysis with Chipster Eija Korpelainen CSC IT Center for Science, Finland chipster@csc.fi What will I learn? Short introduction to ChIP-seq Analyzing ChIP-seq data Central concepts Analysis

More information

Charles Girardot, Furlong Lab. MACS, CisGenome, SISSRs and other peak calling algorithms: differences and practical use

Charles Girardot, Furlong Lab. MACS, CisGenome, SISSRs and other peak calling algorithms: differences and practical use Charles Girardot, Furlong Lab MACS, CisGenome, SISSRs and other peak calling algorithms: differences and practical use ChIP-Seq signal properties Only 5 ends of ChIPed fragments are sequenced Shifted read

More information

RNA-Seq Now What? BIS180L Professor Maloof May 24, 2018

RNA-Seq Now What? BIS180L Professor Maloof May 24, 2018 RNA-Seq Now What? BIS180L Professor Maloof May 24, 2018 We have differentially expressed genes, what do we want to know about them? We have differentially expressed genes, what do we want to know about

More information

Computational Genomics

Computational Genomics Computational Genomics http://www.cs.cmu.edu/~02710 Ziv Bar-Joseph zivbj@cs.cmu.edu GHC 8006 Chakra Chennubhotla chakracs@pitt.edu Suite 3064, BST3 Topics Introduction (1 Week) Sequence analysis(4 weeks)

More information

CSC 2427: Algorithms in Molecular Biology Lecture #14

CSC 2427: Algorithms in Molecular Biology Lecture #14 CSC 2427: Algorithms in Molecular Biology Lecture #14 Lecturer: Michael Brudno Scribe Note: Hyonho Lee Department of Computer Science University of Toronto 03 March 2006 Microarrays Revisited In the last

More information

Quantifying family-wise specificity of intramolecular flanking region flexibility and

Quantifying family-wise specificity of intramolecular flanking region flexibility and COMP 680-Final Project Quantifying family-wise specificity of intramolecular flanking region flexibility and structural motif interactions as features for transcription factor binding site classification

More information

Supplementary Fig. S1. Building a training set of cardiac enhancers. (A-E) Empirical validation of candidate enhancers containing matches to Twi and

Supplementary Fig. S1. Building a training set of cardiac enhancers. (A-E) Empirical validation of candidate enhancers containing matches to Twi and Supplementary Fig. S1. Building a training set of cardiac enhancers. (A-E) Empirical validation of candidate enhancers containing matches to Twi and Tin TFBS motifs and located in the flanking or intronic

More information

Genes - DNA - Chromosome. Chutima Talabnin Ph.D. School of Biochemistry,Institute of Science, Suranaree University of Technology

Genes - DNA - Chromosome. Chutima Talabnin Ph.D. School of Biochemistry,Institute of Science, Suranaree University of Technology Genes - DNA - Chromosome Chutima Talabnin Ph.D. School of Biochemistry,Institute of Science, Suranaree University of Technology DNA Cellular DNA contains genes and intragenic regions both of which may

More information

Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. Predicting tissue specific

More information

Chapter 24: Promoters and Enhancers

Chapter 24: Promoters and Enhancers Chapter 24: Promoters and Enhancers A typical gene transcribed by RNA polymerase II has a promoter that usually extends upstream from the site where transcription is initiated the (#1) of transcription

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec5: Interpreting your MSA Using Logos Using Logos - Logos are a terrific way to generate

More information

ChIP-seq and RNA-seq. Farhat Habib

ChIP-seq and RNA-seq. Farhat Habib ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions

More information

Nucleic acids. How DNA works. DNA RNA Protein. DNA (deoxyribonucleic acid) RNA (ribonucleic acid) Central Dogma of Molecular Biology

Nucleic acids. How DNA works. DNA RNA Protein. DNA (deoxyribonucleic acid) RNA (ribonucleic acid) Central Dogma of Molecular Biology Nucleic acid chemistry and basic molecular theory Nucleic acids DNA (deoxyribonucleic acid) RNA (ribonucleic acid) Central Dogma of Molecular Biology Cell cycle DNA RNA Protein Transcription Translation

More information