Deep learning frameworks for regulatory genomics and epigenomics

Size: px
Start display at page:

Download "Deep learning frameworks for regulatory genomics and epigenomics"

Transcription

1 Deep learning frameworks for regulatory genomics and epigenomics Chuan Sheng Foo Avanti Shrikumar Nicholas Sinnott- Armstrong ANSHUL KUNDAJE Genetics, Computer science Stanford University Johnny Israeli 1

2 Local chromatin architecture of histone marks regulatory elements TFs nucleosomes sequence motifs Adapted from Shlyueva et al. (2014) Nature Reviews Genetics. 2

3 Chromatin states Combinatorial chromatin states define broad classes of elements Promoter Transcribed Enhancer CTCF Heterochromatin Poised Repressed Quiescent Thurman et al. (2012) Nature

4 ATAC-seq: genome-wide chromatin accessibility from low input material Paired-end sequencing ATAC-seq peaks identify chromatin accessible regulatory elements

5 ATAC-seq reveals chromatin architecture in genome-wide fragment length distributions Buenrostro et al. (2013) Nature Methods.

6 Chromatin architecture reflects chromatin state Fragment lengths Promoter CTCF Enhancer Bivalent Transcribed Heterochromatin Represesed Quiescent

7 Fraglen Position-aware 2D fragment length distributions (V-plots) 400 bp Aggregate plot for CTCF state Positioned nucleosome 50 bp -1000bp Accessible binding site Peak summit (Midpoint of fragment) +1000bp Plot at single CTCF site sparse and noisy V-plots were first introduced by Henikoff et al. 2011, PNAS

8 Can we predict chromatin states/histone marks at ATAC-peaks? Which of 8 chromatin states? 2kb around ATAC-peak Image classification task! Which histone mark is present?

9 Deep neural networks (DNNs) for image classification Output label: Trump? Or Sanders? Input: image pixel values Lee et. al. (2009), ICML 9

10 Deep neural networks (DNNs) for V- plot classification Output label: Chromatin state/histone mark Input: V-plots Chromatin state/histone mark Nucleosome arrays, Large regions of open chromatin +1 / -1 nucleosome, Segments of open chromatin Nucleosomes, Open chromatin 10

11 An artificial neuron Sigmoid ReLu (Rectified Linear Unit)

12 Convolutional filters Input image Conv. Filter Feature map

13 Convolutional filters Input image Feature map

14 Convolutional layer: multiple filters learn distinct features

15 Pooling layers: locally smooth signal

16 How does a deep conv. neural network transform the raw V-plot input at each layer -1Kb 500 Pure CTCF Promoter Enhancer Kb Chromatin State Class Probabilities 2nd Fully Connected Layer 1st Fully Connected Layer 3rd Smoothing 2nd set of Convolutional Maps 2nd Smoothing 1st set of Convolutional Maps Initial Smoothing V-Plot Input (300 x 2001)

17 After initial pooling (smoothing) Pure CTCF Chromatin State Class Probabilities 2nd Fully Connected Layer 1st Fully Connected Layer Promoter 3rd Smoothing 2nd set of Convolutional Maps 2nd Smoothing Enhancer 1st set of Convolutional Maps Initial Smoothing V-Plot Input (300 x 2001)

18 Second set of convolutional maps Pure CTCF Chromatin State Class Probabilities 2nd Fully Connected Layer Promoter 1st Fully Connected Layer 3rd Smoothing 2nd set of Convolutional Maps 2nd Smoothing Enhancer 1st set of Convolutional Maps Initial Smoothing V-Plot Input (300 x 2001)

19 Learning from multiple 1D functional data (e.g. DNase, MNase) Chromatin State Class Probabilities 2 nd FC Layer 1 st FC Layer 3 rd Convolution Layer 2 nd Convolution Layer 1 st Convolution Layer 1D MNase signal (1 x 2001) 3 rd Convolution Layer 2 nd Convolution Layer 1 st Convolution Layer 1D DNase signal (1 x 2001) Scan DNase profile using filter 19

20 Learning from raw DNA sequence Class Probabilities Higher layers learn motif combinations Score sequence using filters Convolutional layers learn motif (PWM) like filters

21 The Chromputer Integrating multiple inputs (1D, 2D signals, sequence) to simulatenously predict multiple outputs H3K9me 3 H2A.Z H3K36me3 H3K27me3 H3K4me1 Class Probabilities 2 nd FC Layer 1 st FC Layer H3K4me3 TF Binding Chromatin State Multi-task learning 2nd Combined FC Layer 1st Combined FC Layer 3rd Smoothing 2nd set of Convolutional Maps 2nd Smoothing 1st set of Convolutional Maps 2nd Combined FC Layer 1st Combined FC Layer 3rd Smoothing 2nd set of Convolutional Maps 2nd Smoothing 1st set of Convolutional Maps 2nd Combined FC Layer 1st Combined FC Layer 3rd Smoothing 2nd set of Convolutional Maps 2nd Smoothing 1st set of Convolutional Maps Initial Smoothing Initial Smoothing Initial Smoothing V-Plot Input (300 x 2001)

22 Chromatin architecture can predict chromatin state in held out chromosome (same cell type) Model + Input data types Majority class (baseline) 42% Gene proximity 59% Random Forest: ATAC-seq (150M reads) 61% Chromputer: DNase (60M reads) 68.1% Chromputer: Mnase (1.5B reads) 69.3% Chromputer: ATAC-seq (150M reads) 75.9% Chromputer: DNase + MNase 81.6% Chromputer: ATAC-seq + sequence 83.5% Chromputer: DNase + MNase + sequence 86.2% Label accuracy across replicates (upper bound) 88% 8-class chromatin state accuracy (%)

23 High cross cell-type chromatin state prediction Learn model on DNase and MNase only Learn on GM12878, predict on K562 (and vice versa) Requires local normalization to make signal comparable 8 class chromatin state accuracy Train / Test GM12878 K562 GM K

24 Predicting individual histone marks from ATAC/DNase/MNase/Sequence Area under Precision recall curve CTCF H3K27ac H3K4me3 H3K4me1 H3K9ac H2Az H3K36me3 H3K27me3 H3K9me3

25 Chromputer trained on TF ChIP-seq predicts cross cell-type in-vivo TF binding with high accuracy Area under Precision recall curve Chromputer Inputs: Seq + DNA shape + DNase profile Positives: Reproducible ChIP-seq peaks Negatives: All other DNase peaks + flanks + matched random sites c-myc YY1 CTCF Test sets: Held out chromosomes in held out cell types 25

26 DeepLIFT: Scoring predictive power of features in Deep Neural Networks LIFT: Linear Importance Feature Tracker, or LIFTing the top off the black box. Provides a predictive importance score for any raw input feature (e.g. pixels in V-Plot images, each nucleotide in sequence) intermediate learned features (e.g. convolutional filters) Linear breakdown of contribution of each input to immediate outputs Recursively apply to get contribution of any input to any output Can be computed efficiently with a single backpropagation (unlike insilico mutagenesis) Less susceptible to buffering effects than in-silico mutagenesis Technical details: ReLU networks: equivalent to Taylor approximation of change in softmax/sigmoid logit if input eliminated. i.e. gradient (w.r.t logit) * input

27 What architecture properties of the ATAC-seq V- plots predict different chromatin states? CTCF state: centered binding, symmetric phased nucleosomes Enhancer state: localized signal, heterogeneity Promoter state: broad regions of accessible chromatin what is the change in classification probability relative to an unbiased classifier if we *only* consider the contributions from each pixel

28 Architectural heterogeneity of accessible elements in different chromatin states Low dimensional t-sne embedding (like PCA) of accessible sites (colored by chromatin state) Distinct classes of enhancers 28

29 Top scoring MNase filters and activating input patterns for CTCF state Top scoring MNase filters Maximally activating input MNase profiles Well phased arrays of nucleosomes 29

30 Top scoring MNase filters and activating input patterns for promoter state Top scoring MNase filters Maximally activating input MNase profiles Asymmetric positioning with well positioned +1/-1 nucleosomes 30

31 What useful patterns can we extract from raw DNA sequence models? Gata1 Motif discovery! What PWM like filters are predictive? G A T A A G C A T T A C C G A T A A Which nucleotides in input sequence are contributing to binding!

32 Top sequence filters for CTCF state Canonical motif 32

33 High resolution point binding events and sequence grammars at CTCF peaks MNase-seq DNase-seq CTCF ChIP-seq Nuc. level importance (height of letter) shows coordination of multiple point binding events 33

34 Context-specific reuse of regulatory sequence in chromatin accessibility changes during hematopoiesis Ryan Corces-Zimmerman Jason Buenrostro Will Greenleaf Howard Chang Ravi Majeti ATAC-seq

35 Deep learning sequence determinants of chromatin accessibility Output: Accessible (+1) vs. not accessible (0) HSC B-cells Erythroid Peyton Greenside G C A T T A C C G A T A A Input: Raw DNA sequence

36 Context-specific re-use of regulatory sequence in HSC, B-cells and Erythroblasts Gata (Rc) Position along sequence Gata Gata (Rc) SPI1 Peyton Greenside

37 Importance in HSC s Context-specific re-use of regulatory sequence in HSC, B-cells and Erythroblasts ATAC-seq SPI1 ChIP-seq GATA1 ChIP-seq?? Position along sequence SPI1 Peyton Greenside

38 Importance in B-cells Context-specific re-use of regulatory sequence in HSC, B-cells and Erythroblasts ATAC-seq SPI1 ChIP-seq GATA1 ChIP-seq No peak No peak Not expressed Position along sequence Peyton Greenside

39 Importance in Erythroid Context-specific re-use of regulatory sequence in HSC, B-cells and Erythroblasts ATAC-seq SPI1 ChIP-seq GATA1 ChIP-seq Gata (Rc) Position along sequence Gata Gata (Rc) SPI1 Peyton Greenside

40 YY1 & GATA and much, much more GATA, SPI1, RUNX2 STAT1 & GATA TEAD4 & GATA AP1 in B-cells only ETS & GATA

41 Summary and ongoing work New predictive deep learning framework (Chromputer) for integrative genomics New interpretation engine for deep learning models. We can extract predictive features (motifs, grammars, footprints, architecture features) from the deep neural networks Local chromatin architecture is predictive of chromatin state and histone marks within and across cell types We can predict in-vivo binding profiles of TFs in new cell types from sequence + shape + DNase/ATAC-seq with high accuracy Context-specific reuse of sequence grammars in accessible sites Extensions: From binary to continuous signal prediction Extensions: Functional variant (QTL, GWAS, rare variant) prediction from raw sequence models 41 bioarxiv paper and code coming soon!

42 Acknowledgements Kundaje Lab members Chuan Sheng Foo Funding Avanti Shrikumar Nicholas Sinnott- Armstrong U01HG (GGR) U41-HG S1 R01ES Johnny Israeli Conflict of Interest: Deep Genomics (SAB), Epinomics (SAB) Rahul Mohan Nathan Boley Peyton Greenside Will Greenleaf 42

43 Guess the element from the V-plot AI vs. human What is this regulatory element? Pure CTCF, Promoter, or Enhancer?

44 Its an enhancer! 1st Fully Connected Layer 2nd Fully Connected Layer Enhancer (correct) 44

Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks

Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks Discovery of Transcription Factor Binding Sites with Deep Convolutional Neural Networks Reesab Pathak Dept. of Computer Science Stanford University rpathak@stanford.edu Abstract Transcription factors are

More information

Supplementary Table 1: Oligo designs. A list of ATAC-seq oligos used for PCR.

Supplementary Table 1: Oligo designs. A list of ATAC-seq oligos used for PCR. Ad1_noMX: Ad2.1_TAAGGCGA Ad2.2_CGTACTAG Ad2.3_AGGCAGAA Ad2.4_TCCTGAGC Ad2.5_GGACTCCT Ad2.6_TAGGCATG Ad2.7_CTCTCTAC Ad2.8_CAGAGAGG Ad2.9_GCTACGCT Ad2.10_CGAGGCTG Ad2.11_AAGAGGCA Ad2.12_GTAGAGGA Ad2.13_GTCGTGAT

More information

The ENCODE Encyclopedia. & Variant Annotation Using RegulomeDB and HaploReg

The ENCODE Encyclopedia. & Variant Annotation Using RegulomeDB and HaploReg The ENCODE Encyclopedia & Variant Annotation Using RegulomeDB and HaploReg Jill E. Moore Weng Lab University of Massachusetts Medical School October 10, 2015 Where s the Encyclopedia? ENCODE: Encyclopedia

More information

Decoding Chromatin States with Epigenome Data Advanced Topics in Computa8onal Genomics

Decoding Chromatin States with Epigenome Data Advanced Topics in Computa8onal Genomics Decoding Chromatin States with Epigenome Data 02-715 Advanced Topics in Computa8onal Genomics HMMs for Decoding Chromatin States Epigene8c modifica8ons of the genome have been associated with Establishing

More information

The ChIP-Seq project. Giovanna Ambrosini, Philipp Bucher. April 19, 2010 Lausanne. EPFL-SV Bucher Group

The ChIP-Seq project. Giovanna Ambrosini, Philipp Bucher. April 19, 2010 Lausanne. EPFL-SV Bucher Group The ChIP-Seq project Giovanna Ambrosini, Philipp Bucher EPFL-SV Bucher Group April 19, 2010 Lausanne Overview Focus on technical aspects Description of applications (C programs) Where to find binaries,

More information

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015 ChIP-Seq Data Analysis J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015 What s the Question? Where do Transcription Factors (TFs) bind genomic DNA 1? (Where do other things bind DNA

More information

Ensembl Funcgen: A Database and API for Epigenomics and Gene Regulation Data.

Ensembl Funcgen: A Database and API for Epigenomics and Gene Regulation Data. Ensembl Funcgen: A Database and API for Epigenomics and Gene Regulation Data. Nathan Johnson Ensembl Regulation EBI is an Outstation of the European Molecular Biology Laboratory.! Workshop Overview http://www.ebi.ac.uk/~njohnson/courses/23.05.2013-

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist Whole Transcriptome Analysis of Illumina RNA- Seq Data Ryan Peters Field Application Specialist Partek GS in your NGS Pipeline Your Start-to-Finish Solution for Analysis of Next Generation Sequencing Data

More information

ChIP-seq/Functional Genomics/Epigenomics. CBSU/3CPG/CVG Next-Gen Sequencing Workshop. Josh Waterfall. March 31, 2010

ChIP-seq/Functional Genomics/Epigenomics. CBSU/3CPG/CVG Next-Gen Sequencing Workshop. Josh Waterfall. March 31, 2010 ChIP-seq/Functional Genomics/Epigenomics CBSU/3CPG/CVG Next-Gen Sequencing Workshop Josh Waterfall March 31, 2010 Outline Introduction to ChIP-seq Control data sets Peak/enriched region identification

More information

Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior- Enhanced Read Mapping

Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior- Enhanced Read Mapping RESEARCH ARTICLE Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior- Enhanced Read Mapping Xin Zeng 1,BoLi 2, Rene Welch 1, Constanza

More information

Green Center Computational Core ChIP- Seq Pipeline, Just a Click Away

Green Center Computational Core ChIP- Seq Pipeline, Just a Click Away Green Center Computational Core ChIP- Seq Pipeline, Just a Click Away Venkat Malladi Computational Biologist Computational Core Cecil H. and Ida Green Center for Reproductive Biology Science Introduc

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1

Nature Biotechnology: doi: /nbt Supplementary Figure 1 Supplementary Figure 1 An extended version of Figure 2a, depicting multi-model training and reverse-complement mode To use the GPU s full computational power, we train several independent models in parallel

More information

ChampionChIP Quick, High Throughput Chromatin Immunoprecipitation Assay System

ChampionChIP Quick, High Throughput Chromatin Immunoprecipitation Assay System ChampionChIP Quick, High Throughput Chromatin Immunoprecipitation Assay System Liyan Pang, Ph.D. Application Scientist 1 Topics to be Covered Introduction What is ChIP-qPCR? Challenges Facing Biological

More information

Introduction to genome biology

Introduction to genome biology Introduction to genome biology Lisa Stubbs We ve found most genes; but what about the rest of the genome? Genome size* 12 Mb 95 Mb 170 Mb 1500 Mb 2700 Mb 3200 Mb #coding genes ~7000 ~20000 ~14000 ~26000

More information

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H Introduction to ChIP Seq data analyses Acknowledgement: slides taken from Dr. H Wu @Emory ChIP seq: Chromatin ImmunoPrecipitation it ti + sequencing Same biological motivation as ChIP chip: measure specific

More information

PIP-seq. Cells. Permanganate ChIP-Seq

PIP-seq. Cells. Permanganate ChIP-Seq PIP-seq ells Formaldehyde Permanganate 5 Harvest Lyse Sonicate First dapter Ligation 3 3 5 hip Elute Reverse rosslinks Piperidine cleavage 5 3 3 5 Primer Extension Second dapter Ligation 5 3 3 5 Deep Sequencing

More information

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene

More information

Identifying and mitigating bias in next-generation sequencing methods for chromatin biology

Identifying and mitigating bias in next-generation sequencing methods for chromatin biology STUDY DESIGNS Identifying and mitigating bias in next-generation sequencing methods for chromatin biology Clifford A. Meyer and X. Shirley Liu Abstract Next-generation sequencing (NGS) technologies have

More information

Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility

Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility Dogan et al. Epigenetics & Chromatin (2015) 8:16 DOI 10.1186/s13072-015-0009-5 RESEARCH Open Access Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone

More information

Bioinformatics of Transcriptional Regulation

Bioinformatics of Transcriptional Regulation Bioinformatics of Transcriptional Regulation Carl Herrmann IPMB & DKFZ c.herrmann@dkfz.de Wechselwirkung von Maßnahmen und Auswirkungen Einflussmöglichkeiten in einem Dialog From genes to active compounds

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Origin use and efficiency are similar among WT, rrm3, pif1-m2, and pif1-m2; rrm3 strains. A. Analysis of fork progression around confirmed and likely origins (from cerevisiae.oridb.org).

More information

WORKSHOP. Transcriptional circuitry and the regulatory conformation of the genome. Ofir Hakim Faculty of Life Sciences

WORKSHOP. Transcriptional circuitry and the regulatory conformation of the genome. Ofir Hakim Faculty of Life Sciences WORKSHOP Transcriptional circuitry and the regulatory conformation of the genome Ofir Hakim Faculty of Life Sciences Chromosome conformation capture (3C) Most GR Binding Sites Are Distant From Regulated

More information

SNPs - GWAS - eqtls. Sebastian Schmeier

SNPs - GWAS - eqtls. Sebastian Schmeier SNPs - GWAS - eqtls s.schmeier@gmail.com http://sschmeier.github.io/bioinf-workshop/ 17.08.2015 Overview Single nucleotide polymorphism (refresh) SNPs effect on genes (refresh) Genome-wide association

More information

BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations

BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations Published online 21 July 2015 Nucleic Acids Research, 2015, Vol. 43, No. 21 e147 doi: 10.1093/nar/gkv733 BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations Junbai

More information

WHY STUDY GENOMICS OF GENE REGULATION? Genomics of Gene Regulation 1. Motifs, Conservation, and Epigenetic Features

WHY STUDY GENOMICS OF GENE REGULATION? Genomics of Gene Regulation 1. Motifs, Conservation, and Epigenetic Features Genomics of Gene Regulation 1. Motifs, Conservation, and Epigenetic Features CSHL Course in Computational and Comparative Genomics 2016 Ross Hardison 10/29/16 1 WHY STUDY GENOMICS OF GENE REGULATION? 10/29/16

More information

Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions

Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions The MIT Faculty has made this article openly available. Please share how this access benefits you. Your

More information

Universiteit Leiden Opleiding Informatica

Universiteit Leiden Opleiding Informatica Universiteit Leiden Opleiding Informatica Convolutional Neural Networks for Regulatory Genomics Name: Niels ten Dijke Date: 17/06/2017 1st supervisor: dr. Wojtek Kowalczyk 2nd supervisor: dr. Gerard van

More information

ChromaSig: A Probabilistic Approach to Finding Common Chromatin Signatures in the Human Genome

ChromaSig: A Probabilistic Approach to Finding Common Chromatin Signatures in the Human Genome : A Probabilistic Approach to Finding Common Chromatin Signatures in the Human Genome Gary Hon 1,2, Bing Ren 1,2,3 *, Wei Wang 1,4 * 1 Bioinformatics Program, University of California San Diego, La Jolla,

More information

Mentor: Dr. Bino John

Mentor: Dr. Bino John MAT Shiksha Mantri Mentor: Dr. Bino John Model-based Analysis y of Tiling-arrays g y for ChIP-chip X. Shirley Liu et al., PNAS (2006) vol. 103 no. 33 12457 12462 Tiling Arrays Subtype of microarray chips

More information

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University Neural Networks and Applications in Bioinformatics Yuzhen Ye School of Informatics and Computing, Indiana University Contents Biological problem: promoter modeling Basics of neural networks Perceptrons

More information

Many transcription factors! recognize DNA shape

Many transcription factors! recognize DNA shape Many transcription factors! recognize DN shape Katie Pollard! Gladstone Institutes USF Division of Biostatistics, Institute for Human Genetics, and Institute for omputational Health Sciences ENODE Users

More information

The SVM2CRM data User s Guide

The SVM2CRM data User s Guide The SVM2CRM data User s Guide Guidantonio Malagoli Tagliazucchi, Silvio Bicciato Center for genome research University of Modena and Reggio Emilia, Modena October 31, 2017 October 31, 2017 Contents 1 Overview

More information

Lecture 21: Epigenetics Nurture or Nature? Chromatin DNA methylation Histone Code Twin study X-chromosome inactivation Environemnt and epigenetics

Lecture 21: Epigenetics Nurture or Nature? Chromatin DNA methylation Histone Code Twin study X-chromosome inactivation Environemnt and epigenetics Lecture 21: Epigenetics Nurture or Nature? Chromatin DNA methylation Histone Code Twin study X-chromosome inactivation Environemnt and epigenetics Epigenetics represents the science for the studying heritable

More information

Supplemental Information. Antagonistic Activities of Sox2. and Brachyury Control the Fate Choice. of Neuro-Mesodermal Progenitors

Supplemental Information. Antagonistic Activities of Sox2. and Brachyury Control the Fate Choice. of Neuro-Mesodermal Progenitors Developmental Cell, Volume 42 Supplemental Information Antagonistic Activities of Sox2 and Brachyury Control the Fate Choice of Neuro-Mesodermal Progenitors Frederic Koch, Manuela Scholze, Lars Wittler,

More information

Nucleosome Positioning and Organization Advanced Topics in Computa8onal Genomics

Nucleosome Positioning and Organization Advanced Topics in Computa8onal Genomics Nucleosome Positioning and Organization 02-715 Advanced Topics in Computa8onal Genomics Nucleosome Core Nucleosome Core and Linker 147 bp DNA wrapping around nucleosome core Varying lengths of linkers

More information

Wednesday, November 22, 17. Exons and Introns

Wednesday, November 22, 17. Exons and Introns Exons and Introns Introns and Exons Exons: coded regions of DNA that get transcribed and translated into proteins make up 5% of the genome Introns and Exons Introns: non-coded regions of DNA Must be removed

More information

Genomics xxx (2011) xxx xxx. Contents lists available at SciVerse ScienceDirect. Genomics. journal homepage:

Genomics xxx (2011) xxx xxx. Contents lists available at SciVerse ScienceDirect. Genomics. journal homepage: YGENO-08338; No. of pages: 11; 4C: 4, 5, 7 Genomics xxx (2011) xxx xxx Contents lists available at SciVerse ScienceDirect Genomics journal homepage: www.elsevier.com/locate/ygeno Distant cis-regulatory

More information

Microarray Gene Expression Analysis at CNIO

Microarray Gene Expression Analysis at CNIO Microarray Gene Expression Analysis at CNIO Orlando Domínguez Genomics Unit Biotechnology Program, CNIO 8 May 2013 Workflow, from samples to Gene Expression data Experimental design user/gu/ubio Samples

More information

1 Najafabadi, H. S. et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotechnol doi: /nbt.3128 (2015).

1 Najafabadi, H. S. et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotechnol doi: /nbt.3128 (2015). F op-scoring motif Optimized motifs E Input sequences entral 1 bp region Dinucleotideshuffled seqs B D ll B1H-R predicted motifs Enriched B1H- R predicted motifs L!=!7! L!=!6! L!=5! L!=!4! L!=!3! L!=!2!

More information

Differential Gene Expression

Differential Gene Expression Biology 4361 Developmental Biology Differential Gene Expression September 28, 2006 Chromatin Structure ~140 bp ~60 bp Transcriptional Regulation: 1. Packing prevents access CH 3 2. Acetylation ( C O )

More information

Assay Standards Working Group Nov 2012 Assay Standards Working Group Recommendations, November 2012

Assay Standards Working Group Nov 2012 Assay Standards Working Group Recommendations, November 2012 Assay Standards Working Group Recommendations, November 2012 Contents Assay Standards Working Group Recommendations, August 2012... 1 Contents... 1 Introduction... 2 1: Reference Epigenome Criteria...

More information

Quality Control Assessment in Genotyping Console

Quality Control Assessment in Genotyping Console Quality Control Assessment in Genotyping Console Introduction Prior to the release of Genotyping Console (GTC) 2.1, quality control (QC) assessment of the SNP Array 6.0 assay was performed using the Dynamic

More information

Introductory Next Gen Workshop

Introductory Next Gen Workshop Introductory Next Gen Workshop http://www.illumina.ucr.edu/ http://www.genomics.ucr.edu/ Workshop Objectives Workshop aimed at those who are new to Illumina sequencing and will provide: - a basic overview

More information

Biology 201 (Genetics) Exam #3 120 points 20 November Read the question carefully before answering. Think before you write.

Biology 201 (Genetics) Exam #3 120 points 20 November Read the question carefully before answering. Think before you write. Name KEY Section Biology 201 (Genetics) Exam #3 120 points 20 November 2006 Read the question carefully before answering. Think before you write. You will have up to 50 minutes to take this exam. After

More information

Differential Gene Expression

Differential Gene Expression Biology 4361 Developmental Biology Differential Gene Expression June 19, 2008 Differential Gene Expression Overview Chromatin structure Gene anatomy RNA processing and protein production Initiating transcription:

More information

Structure of nucleic acids II Biochemistry 302. January 20, 2006

Structure of nucleic acids II Biochemistry 302. January 20, 2006 Structure of nucleic acids II Biochemistry 302 January 20, 2006 Intrinsic structural flexibility of RNA antiparallel A-form Fig. 4.19 High Temp Denaturants In vivo conditions Base stacking w/o base pairing/h-bonds

More information

Introduction to Microarray Analysis

Introduction to Microarray Analysis Introduction to Microarray Analysis Methods Course: Gene Expression Data Analysis -Day One Rainer Spang Microarrays Highly parallel measurement devices for gene expression levels 1. How does the microarray

More information

Regulation of eukaryotic transcription:

Regulation of eukaryotic transcription: Promoter definition by mass genome annotation data: in silico primer extension EMBNET course Bioinformatics of transcriptional regulation Jan 28 2008 Christoph Schmid Regulation of eukaryotic transcription:

More information

What is Evolutionary Computation? Genetic Algorithms. Components of Evolutionary Computing. The Argument. When changes occur...

What is Evolutionary Computation? Genetic Algorithms. Components of Evolutionary Computing. The Argument. When changes occur... What is Evolutionary Computation? Genetic Algorithms Russell & Norvig, Cha. 4.3 An abstraction from the theory of biological evolution that is used to create optimization procedures or methodologies, usually

More information

ChIP-seq analysis of protein DNA interactions on the Ion Proton System

ChIP-seq analysis of protein DNA interactions on the Ion Proton System APPLICATION NOTE Ion Proton System ChIP-seq analysis of protein DNA interactions on the Ion Proton System Key findings Optimized ChIP-seq protocol for limited cell populations The protocol combines the

More information

Bayesian Variable Selection and Data Integration for Biological Regulatory Networks

Bayesian Variable Selection and Data Integration for Biological Regulatory Networks Bayesian Variable Selection and Data Integration for Biological Regulatory Networks Shane T. Jensen Department of Statistics The Wharton School, University of Pennsylvania stjensen@wharton.upenn.edu Gary

More information

Learning a Weighted Sequence Model of the Nucleosome Core and Linker Yields More Accurate Predictions in Saccharomyces cerevisiae and Homo sapiens

Learning a Weighted Sequence Model of the Nucleosome Core and Linker Yields More Accurate Predictions in Saccharomyces cerevisiae and Homo sapiens Learning a Weighted Sequence Model of the Nucleosome Core and Linker Yields More Accurate Predictions in Saccharomyces cerevisiae and Homo sapiens Sheila M. Reynolds 1, Jeff A. Bilmes 1,2, William Stafford

More information

B&DA Committee Bioinformatics and Data Analysis. PAG January 2016

B&DA Committee Bioinformatics and Data Analysis. PAG January 2016 B&DA Committee Bioinformatics and Data Analysis PAG January 2016 B&DA progress so far Aim: to define standard bfx pipelines for FAANG data Group has skyped many times We have a Wiki: hosted by EBI Sub-groups

More information

Chapter 5 DNA and Chromosomes

Chapter 5 DNA and Chromosomes Chapter 5 DNA and Chromosomes DNA as the genetic material Heat-killed bacteria can transform living cells S Smooth R Rough Fred Griffith, 1920 DNA is the genetic material Oswald Avery Colin MacLeod Maclyn

More information

Introduction to the UCSC genome browser

Introduction to the UCSC genome browser Introduction to the UCSC genome browser Dominik Beck NHMRC Peter Doherty and CINSW ECR Fellow, Senior Lecturer Lowy Cancer Research Centre, UNSW and Centre for Health Technology, UTS SYDNEY NSW AUSTRALIA

More information

About Strand NGS. Strand Genomics, Inc All rights reserved.

About Strand NGS. Strand Genomics, Inc All rights reserved. About Strand NGS Strand NGS-formerly known as Avadis NGS, is an integrated platform that provides analysis, management and visualization tools for next-generation sequencing data. It supports extensive

More information

Bioinformatics Advice on Experimental Design

Bioinformatics Advice on Experimental Design Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics

More information

Package DBChIP. May 12, 2018

Package DBChIP. May 12, 2018 Type Package Package May 12, 2018 Title Differential Binding of Transcription Factor with ChIP-seq Version 1.24.0 Date 2012-06-21 Author Kun Liang Maintainer Kun Liang Depends R

More information

Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility

Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility Published online 15 March 2017 Nucleic Acids Research, 2017, Vol. 45, No. 8 4315 4329 doi: 10.1093/nar/gkx174 Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility

More information

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping Sept 2. Structure and Organization of Genomes Today: Genetic and Physical Mapping Assignments: Gibson & Muse, pp.4-10 Brown, pp. 126-160 Olson et al., Science 245: 1434 New homework:due, before class,

More information

Chapter 1 Analysis of ChIP-Seq Data with Partek Genomics Suite 6.6

Chapter 1 Analysis of ChIP-Seq Data with Partek Genomics Suite 6.6 Chapter 1 Analysis of ChIP-Seq Data with Partek Genomics Suite 6.6 Overview ChIP-Sequencing technology (ChIP-Seq) uses high-throughput DNA sequencing to map protein-dna interactions across the entire genome.

More information

Lecture 2: Biology Basics Continued

Lecture 2: Biology Basics Continued Lecture 2: Biology Basics Continued Central Dogma DNA: The Code of Life The structure and the four genomic letters code for all living organisms Adenine, Guanine, Thymine, and Cytosine which pair A-T and

More information

Supplementary Data for DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.

Supplementary Data for DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding. Supplementary Data for DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding. Wenxiu Ma 1, Lin Yang 2, Remo Rohs 2, and William Stafford Noble 3 1 Department of Statistics,

More information

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017 Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017 Agenda What is Functional Genomics? RNA Transcription/Gene Expression Measuring Gene

More information

Chapter 24: Promoters and Enhancers

Chapter 24: Promoters and Enhancers Chapter 24: Promoters and Enhancers A typical gene transcribed by RNA polymerase II has a promoter that usually extends upstream from the site where transcription is initiated the (#1) of transcription

More information

Chromatin. Structure and modification of chromatin. Chromatin domains

Chromatin. Structure and modification of chromatin. Chromatin domains Chromatin Structure and modification of chromatin Chromatin domains 2 DNA consensus 5 3 3 DNA DNA 4 RNA 5 ss RNA forms secondary structures with ds hairpins ds forms 6 of nucleic acids Form coiling bp/turn

More information

Convolutional Kitchen Sinks for Transcription Factor Binding Site Prediction

Convolutional Kitchen Sinks for Transcription Factor Binding Site Prediction Convolutional Kitchen Sinks for Transcription Factor Binding Site Prediction Alyssa Morrow*, Vaishaal Shankar* Anthony Joseph, Benjamin Recht, Nir Yosef Transcription Factor A protein that binds to DNA

More information

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls Joel Rozowsky, Ghia Euskirchen 2, Raymond K Auerbach 3, Zhengdong D Zhang, Theodore Gibson, Robert Bjornson 4, Nicholas Carriero

More information

Gene Expression Technology

Gene Expression Technology Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene

More information

Genetics and Genomics in Medicine Chapter 3. Questions & Answers

Genetics and Genomics in Medicine Chapter 3. Questions & Answers Genetics and Genomics in Medicine Chapter 3 Multiple Choice Questions Questions & Answers Question 3.1 Which of the following statements, if any, is false? a) Amplifying DNA means making many identical

More information

Molecular Markers CRITFC Genetics Workshop December 9, 2014

Molecular Markers CRITFC Genetics Workshop December 9, 2014 Molecular Markers CRITFC Genetics Workshop December 9, 2014 Molecular Markers Tools that allow us to collect information about an individual, a population, or a species Application in fisheries mating

More information

Intro. ANN & Fuzzy Systems. Lecture 36 GENETIC ALGORITHM (1)

Intro. ANN & Fuzzy Systems. Lecture 36 GENETIC ALGORITHM (1) Lecture 36 GENETIC ALGORITHM (1) Outline What is a Genetic Algorithm? An Example Components of a Genetic Algorithm Representation of gene Selection Criteria Reproduction Rules Cross-over Mutation Potential

More information

Einführung in die Genetik

Einführung in die Genetik Einführung in die Genetik Prof. Dr. Kay Schneitz (EBio Pflanzen) http://plantdev.bio.wzw.tum.de schneitz@wzw.tum.de Twitter: @PlantDevTUM, #genetiktum FB: Plant Development TUM Prof. Dr. Claus Schwechheimer

More information

Introduction to Bioinformatics and Gene Expression Technology

Introduction to Bioinformatics and Gene Expression Technology Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA

More information

PrimePCR Assay Validation Report

PrimePCR Assay Validation Report Gene Information Gene Name SRY (sex determining region Y)-box 6 Gene Symbol Organism Gene Summary Gene Aliases RefSeq Accession No. UniGene ID Ensembl Gene ID SOX6 Human This gene encodes a member of the

More information

Epigenetic CHaracterization and Observation (ECHO) Proposers Day

Epigenetic CHaracterization and Observation (ECHO) Proposers Day Epigenetic CHaracterization and Observation (ECHO) Proposers Day Eric Van Gieson, Ph.D. Program Manager Biological Technologies Office (BTO) February 23, 2018 ECHO Program Overview Determining Exposure

More information

ChIPnorm: A Statistical Method for Normalizing and Identifying Differential Regions in Histone Modification ChIP-seq Libraries

ChIPnorm: A Statistical Method for Normalizing and Identifying Differential Regions in Histone Modification ChIP-seq Libraries ChIPnorm: A Statistical Method for Normalizing and Identifying Differential Regions in Histone Modification ChIP-seq Libraries Nishanth Ulhas Nair., Avinash Das Sahu 2., Philipp Bucher 3,4 *, Bernard M.

More information

Complete Sample to Analysis Solutions for DNA Methylation Discovery using Next Generation Sequencing

Complete Sample to Analysis Solutions for DNA Methylation Discovery using Next Generation Sequencing Complete Sample to Analysis Solutions for DNA Methylation Discovery using Next Generation Sequencing SureSelect Human/Mouse Methyl-Seq Kyeong Jeong PhD February 5, 2013 CAG EMEAI DGG/GSD/GFO Agilent Restricted

More information

Interpreting RNA-seq data (Browser Exercise II)

Interpreting RNA-seq data (Browser Exercise II) Interpreting RNA-seq data (Browser Exercise II) In previous exercises, you spent some time learning about gene pages and examining genes in the context of the GBrowse genome browser. It is important to

More information

Multi-omics in biology: integration of omics techniques

Multi-omics in biology: integration of omics techniques 31/07/17 Летняя школа по биоинформатике 2017 Multi-omics in biology: integration of omics techniques Konstantin Okonechnikov Division of Pediatric Neurooncology German Cancer Research Center (DKFZ) 2 Short

More information

ChroMoS Guide (version 1.2)

ChroMoS Guide (version 1.2) ChroMoS Guide (version 1.2) Background Genome-wide association studies (GWAS) reveal increasing number of disease-associated SNPs. Since majority of these SNPs are located in intergenic and intronic regions

More information

Analysis of Biological Sequences SPH

Analysis of Biological Sequences SPH Analysis of Biological Sequences SPH 140.638 swheelan@jhmi.edu nuts and bolts meet Tuesdays & Thursdays, 3:30-4:50 no exam; grade derived from 3-4 homework assignments plus a final project (open book,

More information

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Traditional QTL approach Uses standard bi-parental mapping populations o F2 or RI These have a limited number of

More information

arxiv: v1 [cs.lg] 7 Jul 2016

arxiv: v1 [cs.lg] 7 Jul 2016 Bioinformatics doi.10.1093/bioinformatics/xxxxxx Advance Access Publication Date: Day Month Year Manuscript Category Subject Section DeepChrome: Deep-learning for predicting gene expression from histone

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

In silico prediction of novel therapeutic targets using gene disease association data

In silico prediction of novel therapeutic targets using gene disease association data In silico prediction of novel therapeutic targets using gene disease association data, PhD, Associate GSK Fellow Scientific Leader, Computational Biology and Stats, Target Sciences GSK Big Data in Medicine

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Number and length distributions of the inferred fosmids.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Number and length distributions of the inferred fosmids. Supplementary Figure 1 Number and length distributions of the inferred fosmids. Fosmid were inferred by mapping each pool s sequence reads to hg19. We retained only those reads that mapped to within a

More information

MATH 5610, Computational Biology

MATH 5610, Computational Biology MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class

More information

Leveraging biological replicates to improve analysis in ChIP-seq experiments

Leveraging biological replicates to improve analysis in ChIP-seq experiments , http://dx.doi.org/10.5936/csbj.201401002 CSBJ Leveraging biological replicates to improve analysis in ChIP-seq experiments Yajie Yang a,b, Justin Fear a,b, Jianhong Hu c, Irina Haecker d, Lei Zhou a,

More information

Enhancing motif finding models using multiple sources of genome-wide data

Enhancing motif finding models using multiple sources of genome-wide data Enhancing motif finding models using multiple sources of genome-wide data Heejung Shim 1, Oliver Bembom 2, Sündüz Keleş 3,4 1 Department of Human Genetics, University of Chicago, 2 Division of Biostatistics,

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

What Are the Chemical Structures and Functions of Nucleic Acids?

What Are the Chemical Structures and Functions of Nucleic Acids? THE NUCLEIC ACIDS What Are the Chemical Structures and Functions of Nucleic Acids? Nucleic acids are polymers specialized for the storage, transmission, and use of genetic information. DNA = deoxyribonucleic

More information

FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY

FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY K. L. Knudtson 1, C. Griffin 2, A. I. Brooks 3, D. A. Iacobas 4, K. Johnson 5, G. Khitrov 6,

More information

Chromatin Structure. a basic discussion of protein-nucleic acid binding

Chromatin Structure. a basic discussion of protein-nucleic acid binding Chromatin Structure 1 Chromatin DNA packaging g First a basic discussion of protein-nucleic acid binding Questions to answer: How do proteins bind DNA / RNA? How do proteins recognize a specific nucleic

More information

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary Executive Summary Helix is a personal genomics platform company with a simple but powerful mission: to empower every person to improve their life through DNA. Our platform includes saliva sample collection,

More information

Reconstruction of histone modification network from next-generation sequencing data

Reconstruction of histone modification network from next-generation sequencing data Reconstruction of histone modification network from next-generation sequencing data Ngoc Tu Le Japan Advanced Institute of Science and Technology 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan email: ngoctule@jaist.ac.jp

More information

DNA. bioinformatics. epigenetics methylation structural variation. custom. assembly. gene. tumor-normal. mendelian. BS-seq. prediction.

DNA. bioinformatics. epigenetics methylation structural variation. custom. assembly. gene. tumor-normal. mendelian. BS-seq. prediction. Epigenomics T TM activation SNP target ncrna validation metagenomics genetics private RRBS-seq de novo trio RIP-seq exome mendelian comparative genomics DNA NGS ChIP-seq bioinformatics assembly tumor-normal

More information

The Human Protein PRR14 Tethers Heterochromatin to the Nuclear Lamina During Interphase and Mitotic Exit

The Human Protein PRR14 Tethers Heterochromatin to the Nuclear Lamina During Interphase and Mitotic Exit Cell Reports, Volume 5 Supplemental Information The Human Protein PRR14 Tethers Heterochromatin to the Nuclear Lamina During Interphase and Mitotic Exit Andrey Poleshko, Katelyn M. Mansfield, Caroline

More information