Modern Epigenomics. Histone Code

Modern Epigenomics Histone Code Ting Wang Department of Genetics Center for Genome Sciences and Systems Biology Washington University Dragon Star 2012 Changchun, China July 2, 2012

DNA methylation + Histone modification Chromatin

Chromatin: DNA plus protein in cells with nuclei. Nucleosome: 146 bp of DNA plus 2 each of histones H2A, H2B, H3 and H4

The Nucleosome core particle Nucleosome H3 H4

Post-translational Histone Modifications http://www.nature.com/nsmb/journal/v14/n11/images/nsmb1337-F1.gif

Post-translational Histone Modifications H3 tail Modifications: =Acetylation =Methylation Active HATs HDACs KMTases Repressive

Li et al. (2007) Cell 128, 707

Histone Modifications in Relation to Gene Transcription. Li et al. (2007) Cell 128, 707

DNA methylation mediated repression

Repression independent of DNA methylation H3K9 methylation condensed chromatin

H3K27 methylation mediated repression 1. H3K27 methylation 2. DNA methylation

Mechanisms of Epigenetic Crosstalk

Epigenetic cancer therapy

DNA-methylation and HDAC inhibitors in clinical trials

Summary
  Dnmt1, Dnmt3a, Dnmt3b: the mammalian DNMTs
  Chromatin structure is influenced by covalent modification of histone tails
  Multiple chromatin modification pathways are involved in silencing of genes and may show crosstalk with DNA methylation

Technologies for Interrogating Chromatin States Histone Modifications ChIP-chip Antibody specific to one type of histone modification ChIP-seq Deep sequencing

Chromatin-IP Sequencing K4me1 K4me2 K4me3 K27me3 active repressive

Histone methylation and transcriptional state
  Transcribed gene: K4me3 (FoxP1)
  Silent developmental gene: K27me3 (Olig1)
  Poised developmental gene: K4me3 + K27me3 (Olig1)
  Constitutive heterochromatin: K9me3, K20me3
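The mark combinations on this slide can be read as a small lookup rule. The sketch below is only an illustration: the function name `chromatin_state` and the order in which the rules are checked are assumptions, not something stated in the lecture.

```python
def chromatin_state(marks):
    """Map a collection of histone marks to the states named on the slide.

    The rule order is an assumption: the K4me3 + K27me3 combination is
    tested first so that it is not mistaken for either single mark.
    """
    marks = set(marks)
    if {"K4me3", "K27me3"} <= marks:
        return "poised developmental gene"
    if "K4me3" in marks:
        return "transcribed gene"
    if "K27me3" in marks:
        return "silent developmental gene"
    if marks & {"K9me3", "K20me3"}:
        return "constitutive heterochromatin"
    return "unclassified"


if __name__ == "__main__":
    for example in (["K4me3"], ["K27me3"], ["K4me3", "K27me3"], ["K9me3"]):
        print(example, "->", chromatin_state(example))
```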

Predicting non-coding RNA? From sequence? Not clear which properties can be exploited Sequence features such as promoters are too weak Histone modifications + conservation worked

Nucleosome Positioning from Histone ChIP-seq (Barski et al, Cell 2007)
  Nucleosome resolution ChIP-seq of 21 histone marks in CD4+ T-cells
  Total 185.7 M 25 nt tags sequenced
  Analysis not at nucleosome resolution to map nucleosomes at specific regions
  MNase digest
  Antibody for

Combine Tags From All ChIP-Seq

Extend Tags 3′ to 150 nt. Check Tag Count Across Genome

Take the middle 75 nt
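The three steps above (extend each tag to 150 nt, keep the middle 75 nt, count tags across the genome) can be sketched as follows. This is a minimal illustration, not the code used by Barski et al. The 25 nt tag length comes from the earlier slide. The coordinate convention (0-based leftmost position, half-open intervals), the function names, and the reading of "extend" as extension in the 3′ direction of the tag's strand are assumptions.

```python
from collections import Counter

FRAGMENT_LEN = 150  # extended tag length, from the slides
CORE_LEN = 75       # middle portion that is kept, from the slides


def core_interval(start, strand, tag_len=25):
    """Return (begin, end) of the middle 75 nt of the extended tag.

    start is the leftmost reference position of the tag (assumption).
    A '+' tag is extended to the right, a '-' tag to the left.
    """
    if strand == "+":
        frag_begin = start
    else:
        frag_begin = start + tag_len - FRAGMENT_LEN
    offset = (FRAGMENT_LEN - CORE_LEN) // 2
    begin = frag_begin + offset
    return begin, begin + CORE_LEN


def tag_counts(tags, tag_len=25):
    """Count how many tag cores cover each position; tags from all ChIP-seq runs can be pooled."""
    counts = Counter()
    for start, strand in tags:
        begin, end = core_interval(start, strand, tag_len)
        for pos in range(max(begin, 0), end):
            counts[pos] += 1
    return counts


if __name__ == "__main__":
    # Two tags from opposite strands that flank the same nucleosome.
    profile = tag_counts([(1000, "+"), (1125, "-")])
    print(profile[1037], profile[1111], profile[1112])
```

Positions with high counts in `profile` are candidate nucleosome centres under these assumptions.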

Digital DNaseI profiling Precise delineation of the accessible regulatory DNA compartment Accessible Inaccessible Inaccessible

Digital DNaseI profiling: direct access to regulatory sequences

ChromHMM
  Example regions along the DNA: Enhancer, Transcription Start Site, Transcribed Region
  Observed chromatin marks in 200 bp intervals, called based on a Poisson distribution (marks shown: K4me1, K4me3, K27ac, K36me3)
  Most likely hidden state per interval: 1 2 3 4 6 6 6 6 6 5 5 5
  [Figure: high-probability chromatin marks in each of states 1-6 (K4me1, K4me3, K27ac, K36me3; probabilities 0.7-0.9)]
  All probabilities are learned from the data
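The first step on this slide, calling a mark as present or absent in each 200 bp interval based on a Poisson distribution, can be sketched as below. This is a simplified illustration and not ChromHMM's own code: the function names, the use of the genome-wide mean count as the Poisson rate, and the `p_threshold` value are assumptions made for the example.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def call_marks(bin_counts, p_threshold=1e-4):
    """Binarise read counts per 200 bp interval for one chromatin mark.

    The background rate is the mean count over all intervals (assumption).
    An interval is called 1 when its count is unlikely under that background.
    """
    lam = sum(bin_counts) / len(bin_counts)
    return [1 if poisson_sf(c, lam) < p_threshold else 0 for c in bin_counts]


if __name__ == "__main__":
    counts = [1, 0, 2, 1, 30, 1, 0, 1, 2, 2]  # made-up counts for ten intervals
    print(call_marks(counts))
```

The resulting 0/1 vectors, one per mark, are the kind of observations a hidden Markov model would then be trained on.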

ChromHMM

Application of ChromHMM to 41 chromatin marks in CD4+ T-cells (Barski 07, Wang 08). States: Repetitive, Repressed, Active intergenic, Transcribed, Promoter. Chromatin marks from (Barski et al, Cell 2007; Wang et al, Nature Genetics, 2008); DNaseI hypersensitivity from (Boyle et al, Cell 2008); Expression data from (Su et al, PNAS 2004); Lamina data from (Guelen et al, Nature 2008)

Next-gen Sequencing Technology

Forward Genetics Genotype Phenotype Hypothesis Test Hypothesis By Genetic Manipulation

Forward Genetics Two groups: 1. Develop Colorectal cancer At Young Age 2. Do not Phenotype Mutation in APC Gene Genotype Hypothesis APC is a Tumor Suppressor Gene Test Hypothesis By Genetic Manipulation Delete APC in Mouse Control: Isogenic APC+

The Cycle of Forward Genetics Observation Phenotype?Sequencing? Genotype In 2005 $9 million/genome Not feasible Thinking Hypothesis Test Hypothesis By Genetic Manipulation Gene Deletion/Replacement Recombinant Technology

The Problem with Forward Genetics Sequencing Phenotype Sequencing Genotype Currently $40,000* /genome Cost is rapidly dropping Thinking Hypothesis Test Hypothesis By Genetic Manipulation Gene Deletion/Replacement Recombinant Technology

0th and 1st generation sequencing
  Pre-1992, the old-fashioned way: S35 ddNTPs; gels; manual loading; manual base calling
  1992-1999, ABI 373/377: fluorescent ddNTPs*; gels; manual loading; automated base calling*
  1999, ABI 3700: fluorescent ddNTPs; capillaries*; robotic loading*; automated base calling; breaks down frequently
  2003, ABI 3730XL: fluorescent ddNTPs; capillaries; robotic loading; automated base calling; reliable*

Next or 2nd-generation sequencing: 454/Roche GS-20/FLX (Oct 2005), ABI SOLiD (Oct 2007), Illumina/Solexa 1G Genetic Analyser (Feb 2007)

A simple comparison of seq. tech.
  Technology            Reads/run                          Ave read length   bp per run    Data output
  3730XL (ABI)          96                                 900-1200 bp       ~100,000      1-2 MB
  454 (Roche)           400,000                            250-310 bp        70 million    20 GB
  Illumina 1G (Solexa)  40 million                         36 bp             1 billion     1.5 TB
  SOLiD (ABI)           88-132 million (44-66 per slide)   35 bp             1 billion     1.5-3.0 TB

They can be applied to different areas
  ABI 3730XL: routine sequencing; verify SNPs from next gen; 1X scaffold for novel genomes
  Next Gen short read instrument (Solexa), when quantity matters but length doesn't: expression tags; ChIP-Seq; re-sequencing
  Next Gen long read instrument (454), when length matters: novel genomes; metagenomics

Illumina Genome Analyzer

IGA Sequencing Pipeline
  1. Sample prep (1-5 days): ligate adapters
  2. Cluster generation on flow cell (1.5 days): clonal single-molecule array
  3. Sequencing and imaging (2-3 days)
  4. Data analysis (days-months)

8 channels (lanes) Cluster generation

Attach DNA to flow cell

Attach DNA to flow cell, then bridge amplification. Can we amplify epigenetic marks?

Cluster generation: clonal single-molecule array

Clonal single-molecule array (scale bar 100 µm): random array of clusters; ~1000 molecules per ~1 µm cluster; ~20-30,000 clusters per tile; ~40 M clusters per flow cell

Sequencing by synthesis
  Cycle 1: add sequencing reagents; first base incorporated; remove unincorporated bases; detect signal
  Cycles 2-n: add sequencing reagents and repeat

Base calling from images
  Example cluster read over cycles 1-9: T G C T A C G A T
  The identity of each base of a cluster is read off from sequential images
  Reversible terminator chemistry solves the homopolymer problem
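Reading a base off sequential images can be illustrated with a toy caller that picks, for each cycle, the channel with the strongest signal. This is a deliberately naive sketch. The function name, the channel order A, C, G, T and the intensity values are assumptions, and a real base caller also corrects for effects that are ignored here.

```python
BASES = "ACGT"  # assumed channel order for the four images taken per cycle


def call_bases(intensities):
    """Call one base per cycle for a single cluster.

    intensities holds one (A, C, G, T) tuple of signal values per cycle.
    The base whose channel is brightest in that cycle is reported.
    """
    read = []
    for cycle in intensities:
        best = max(range(4), key=lambda channel: cycle[channel])
        read.append(BASES[best])
    return "".join(read)


if __name__ == "__main__":
    # Made-up intensities for three cycles of one cluster.
    cluster = [(0.1, 0.2, 0.1, 0.9), (0.0, 0.1, 0.8, 0.1), (0.1, 0.9, 0.0, 0.2)]
    print(call_bases(cluster))
```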

IGA without cover

Flow cell imaging

A flow cell
  A flow cell contains eight lanes (Lane 1, Lane 2, ... Lane 8)
  Each lane/channel contains three columns of tiles (Column 1, Column 2, Column 3)
  Each column contains 100 tiles
  Each tile is 350 x 350 µm and holds 20K-30K clusters
  Each tile is imaged four times per cycle, one image per base
  345,600 images for a 36-cycle run
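The image count quoted on this slide follows directly from the flow cell layout. The short calculation below reproduces it. The constant and function names are chosen for this example only, while the numbers come from the slide.

```python
LANES = 8                      # lanes per flow cell
COLUMNS_PER_LANE = 3           # columns of tiles per lane
TILES_PER_COLUMN = 100         # tiles per column
IMAGES_PER_TILE_PER_CYCLE = 4  # one image per base


def images_per_run(cycles):
    """Number of images produced by a run with the given number of cycles."""
    tiles = LANES * COLUMNS_PER_LANE * TILES_PER_COLUMN
    return tiles * IMAGES_PER_TILE_PER_CYCLE * cycles


if __name__ == "__main__":
    print(images_per_run(36))
```

For a 36-cycle run this gives 8 x 3 x 100 x 4 x 36 = 345,600 images, matching the figure on the slide.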

Data analysis pipeline: tiff image files (345,600) -> Firecrest -> intensity files -> Bustard -> sequence files -> Eland (alignment to genome) -> additional data analysis

Applications Whole Genome Re-sequencing Gene Expression Targeted Re-sequencing ChIP Sequencing Other Applications MicroRNA discovery

Read Length is Not As Important For Resequencing

Applications
  Genomes
  Re-sequencing human exons (microarray capture/amplification)
  Small (including miRNA) and long RNA profiling (including splicing)
  ChIP-Seq: transcription factors, histone modifications, effector proteins
  DNA methylation
  Polysomal RNA
  Origins of replication/replicating DNA
  Whole genome association (rare, high impact SNPs)
  Copy number/structural variation in DNA
  ChIA-PET: transcription factor looping interactions
  ???

Functional Genomics Data Analysis
  Map reads to the genome
    Available tools: MAQ, SOAP, MOSAIK, BWA, BOWTIE
    Determine the target genome sequence (i.e., repeat classes)
    Mapping options:
      Number of allowed mismatches (as function of position)
      Number of mapped loci (e.g., 1 = unique read sequence)
  Generate consensus sequence and identify SNPs
  Generate read enrichment profile (e.g., Wald Lab tool)
  Develop null model and calculate significantly enriched sites
  High level analysis: compare to annotations, other data sets, etc.
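The two mapping options named on this slide, the number of allowed mismatches and the number of mapped loci, can be made concrete with a brute-force scan. This sketch is for illustration only and is not how MAQ, SOAP, MOSAIK, BWA or Bowtie work internally. The function names and the toy reference are assumptions, and only the forward strand is searched.

```python
def map_read(reference, read, max_mismatches=2):
    """Return (position, mismatches) for every locus where the read fits.

    Every window of the reference is compared with the read base by base.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


def is_unique(hits):
    """A read is unique when it maps to exactly one locus."""
    return len(hits) == 1


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"  # toy reference sequence
    print(map_read(reference, "TTGTCC", max_mismatches=1))
    print(is_unique(map_read(reference, "ACGT", max_mismatches=0)))
```

Raising `max_mismatches` recovers more reads but also increases the number of loci a read can match, which is the trade-off behind the uniqueness filter.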

Limitations of short read technology
  Need a genome: de novo assembly is difficult
  Can't sequence through repeats: 80% of the human genome is sequenceable
  Need high coverage
    15-20X to detect polymorphisms; missed SNPs are likely due to low coverage
    300X for a 1 in 20 event (1 heterozygous in 10 samples)
  Error rate increases past the first 30-50 bases
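The coverage figures on this slide can be explored with a short calculation. The sketch below is an illustration under stated assumptions: a genome size of 3 billion bases, a simple binomial sampling model without sequencing errors, and a requirement of at least 3 variant reads to call an allele. None of these values come from the lecture, and the function names are made up for this example.

```python
from math import comb


def coverage(n_reads, read_len, genome_size):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


def prob_at_least(k, depth, allele_fraction):
    """Probability of seeing the variant in at least k of depth reads (binomial model)."""
    p_fewer = sum(
        comb(depth, i) * allele_fraction ** i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1 - p_fewer


if __name__ == "__main__":
    # One run of 40 million 36 bp reads on an assumed 3 Gb genome.
    print(coverage(40_000_000, 36, 3_000_000_000))
    # A 1 in 20 allele at 300X versus 20X, requiring 3 supporting reads (assumption).
    print(prob_at_least(3, 300, 0.05))
    print(prob_at_least(3, 20, 0.05))
```

At 300X a 1 in 20 allele is expected in about 15 reads, so it is almost always seen at least 3 times, whereas at 20X it is expected in only about one read.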

Paired End Reads are Important! Known Distance Read 1 Read 2 Repetitive DNA Unique DNA Paired read maps uniquely Single read maps to multiple positions
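The idea on this slide, that a read falling in repetitive DNA becomes placeable once its mate lands in unique DNA at a known distance, can be shown with a toy example. The reference string, the read sequences and the function names below are made up for illustration. Both reads are given in forward orientation for simplicity, which is an assumption that real paired-end data does not satisfy.

```python
def find_all(reference, seq):
    """Return every start position at which seq occurs exactly in the reference."""
    hits = []
    pos = reference.find(seq)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(seq, pos + 1)
    return hits


def place_pair(reference, read1, read2, insert_size, tolerance=0):
    """Keep only placements where the pair spans the known distance.

    The span is measured from the start of read 1 to the end of read 2.
    """
    placements = []
    for p1 in find_all(reference, read1):
        for p2 in find_all(reference, read2):
            span = p2 + len(read2) - p1
            if abs(span - insert_size) <= tolerance:
                placements.append((p1, p2))
    return placements


if __name__ == "__main__":
    # "GATTACA" is the repeat; "CTCGAG" occurs once.
    reference = "TTTT" + "GATTACA" + "CCGGCCGG" + "GATTACA" + "AAGGAAGG" + "CTCTCGAG"
    print(find_all(reference, "GATTACA"))
    print(place_pair(reference, "GATTACA", "CTCGAG", insert_size=23))
```

The single read matches two positions, while the pair with a known span of 23 bases has only one consistent placement.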

Paired Ends are Important Part 2 Deletion Insertion Inversion Shendure et al 2005

Paired end mapping reveals structural variations
  [Figure: donor versus reference, panels a-i]
  a Basic insertion
  b Basic deletion
  c Basic inversion
  d Linking
  e Linked insertion
  f Everted duplication
  g Anchored split mapping (deletion)
  h Anchored split mapping (insertion)
  i Hanging insertion
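The basic signatures (insertion, deletion, inversion, everted duplication) can be expressed as a rule on the mapped distance and orientation of a read pair. The sketch below is a simplified illustration, not the method of any particular tool. It assumes a library in which a normal pair maps as a forward read followed by a reverse read, and the function name, coordinate convention and thresholds are assumptions.

```python
def classify_pair(pos1, strand1, pos2, strand2, expected, tolerance):
    """Classify one read pair by where its two reads map on the reference.

    pos1 and pos2 are leftmost reference coordinates of the two reads.
    expected is the normal distance between them, tolerance the allowed deviation.
    """
    if strand1 == strand2:
        return "inversion"
    left, right = sorted([(pos1, strand1), (pos2, strand2)])
    if left[1] == "-":
        # Reads point away from each other instead of towards each other.
        return "everted duplication"
    span = right[0] - left[0]
    if span > expected + tolerance:
        # Donor is missing sequence, so reads map too far apart on the reference.
        return "deletion"
    if span < expected - tolerance:
        # Donor carries extra sequence, so reads map too close on the reference.
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    print(classify_pair(100, "+", 300, "-", expected=200, tolerance=30))
    print(classify_pair(100, "+", 900, "-", expected=200, tolerance=30))
    print(classify_pair(100, "+", 150, "-", expected=200, tolerance=30))
```

In practice a call would be based on many pairs that support the same event, not on a single pair.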

We need more genomes! Complete genomics ($5000) ABI ($10,000) Illumina ($10,000) Intelligent Biosystems (<$1000)

3rd generation sequencing: Ion Torrent, PacBio, Nanopore

Ion Torrent Sensor, well and chip architecture. Wafer, die and chip packaging. JM Rothberg et al. Nature 475, 348-352 (2011) doi:10.1038/nature10242

Pros and Cons: Fast (4 hour sequencing). Cheap per run, but not per base*. Homopolymers? (* Yet)

Single-molecule, real-time (SMRT) sequencing PacBio

Nanopore sequencing