DNA:CHROMATIN INTERACTIONS

1 DNA:CHROMATIN INTERACTIONS: Exploring transcription factor binding and the epigenomic landscape. Chris Seward

2 Introductions. Cell and Developmental Biology PhD candidate in Dr. Lisa Stubbs' laboratory. Currently looking for a postdoc! Molecular Roots of the Social Brain Project: Dr. Lisa Stubbs (mouse), Dr. Gene Robinson (honeybee), Dr. Alison Bell (stickleback), Dr. Saurabh Sinha (computer science), Dr. Dave Zhao (statistics). The project seeks to understand the genomic and molecular response to social stimulus across social species.

3 Genomics in Social Research. [Figure: Social Stimulus (Animal Behavior), Molecular Response (Transcription Factor Activation), Genetic Response (Differential Gene Expression), Development (Developmental Regulators)]. Research methods, in the same order: behavioral testing and scoring; epigenetics and motif analysis; RNA-seq and qPCR; epigenetics and motif analysis.

4 Why Epigenomics? RNA-seq can show you changes in gene expression across the genome, but how do you know what caused those changes? If RNA-seq shows that a transcription factor changes expression, what is that factor doing? Motif analysis of promoter regions works for identifying probable known regulators, but what if the regulator is unknown? What if the regulators bind far away from the promoter? What about more advanced regulatory mechanisms? Solution: epigenomics.

5 Outline: Epigenomics background; ChIP experimental design; ChIP bioinformatics; Other chromatin assays; Higher order chromatin assays

6 Eukaryotic genomes are complex structures composed of modified and unmodified DNA, RNA, and many types of interacting proteins. Most DNA is wrapped around a histone core to form nucleosomes. The classical histone protein complexes bind very tightly to DNA and prevent association with other proteins. Modifications of the classical histones, or their replacement with unusual histone types under certain conditions, can loosen the interaction with DNA, allowing access to transcription factors, RNA polymerase, and other proteins.

7 Histone Modifications. All four core histones in the nucleosome octamer have tails that can be modified in various ways, but the most consequential modifications, with respect to transcriptional activity, appear to involve methylation or acetylation of lysines (K) in histone H3.

8 Histone H3 modifications, especially methylation and acetylation, mark open or closed DNA. CLOSED (histones bound more tightly to DNA): H3K27me3, H3K9me3. OPEN (histones can be displaced by TFs, RNA polymerase, and other proteins): H3K27ac, H3K4me1, H3K4me3. Histone marks, together with other assays of open chromatin, are presently the only reliable indicators of the locations and activities of regulatory elements.

9 Many types of regulatory elements. Promoters (gene transcribed): binding factors include TFs, TATA-binding factors, and other site-specific binders, which recruit additional proteins such as co-factors, RNA polymerase, and others. Enhancers: tissue-specific activators of transcription; binding sites for proteins that interact with the promoter to enhance transcription. Silencers: also prevalent, but more difficult to detect and assay; many transcription factors repress, rather than enhance, gene expression. Enhancers and silencers are not mutually exclusive! Most regulatory elements can serve either function, depending on the proteins bound at a particular time. Insulators: boundary elements that shield genes from enhancers or heterochromatin proteins in neighboring gene territories; involved in establishing loop structures that isolate genes.

10 How to find them? Chromatin ImmunoPrecipitation (ChIP): an antibody to a DNA-binding protein is used to fish out DNA bound to the protein in a cell. DNA and protein are crosslinked in the cell using brief treatment with a low concentration of high-quality formaldehyde. Crosslinked chromatin is sheared, usually by sonication, to yield short fragments of DNA+protein complexes. An antibody to a TF or other binding protein is used to fish out fragments containing that DNA-binding protein. DNA is then released and can be analyzed by various methods: PCR, microarray, sequencing. This creates a pool of sequences highly enriched in binding sites for a particular protein.

11 ChIP can be used to map DNA:protein interactions of virtually any type: histone modifications; RNA polymerase and elongation factors, to find promoters and active sites of transcription; proteins involved in DNA recombination, repair, and replication; DNA methylation proteins.

12 ChIP can be used to map DNA:protein interactions of virtually any type, including secondary interactions (no direct linkage to DNA): histone-modifying proteins, such as SWI/SNF, histone deacetylases, and histone methylases; cofactors that bind to TFs at particular sites and that stabilize chromatin loops; proteins that link chromatin to the nuclear matrix or envelope. All of these methods require highly specific and efficient antibodies (which are rare!). [Figure labels: loop structures, nuclear matrix, nuclear envelope]

13 Outline: Epigenomics background; ChIP experimental design; ChIP bioinformatics; Other chromatin assays; Higher order chromatin assays

14 ChIP Antibodies. ENCODE maintains lists of ChIP-validated antibodies, a good starting point. Otherwise, validate yourself: IHC showing nuclear protein; Western blot showing a single clear band; ChIP-PCR for a known binding site sequence; ChIP-seq followed by motif analysis. Try to find higher-concentration antibodies.

15 Best Practices for ChIP Experimental Design. Requires 1-2 million cells per IP for common histone modifications. Requires 5-10 million cells for most transcription factors. Other methods like ATAC and ChIPmentation can reduce cell number requirements (more later). Biological replicates are great! (But technical replicate IPs alone are generally accepted.) Frozen tissue may give significantly lower yields than tissue freshly collected, fixed, and reduced to nuclei. Fixed, washed nuclei can be stored for years at -80 °C.

16 ChIP Sequencing Criteria. Many ChIP-seq samples can be sequenced per Illumina lane. Typically 50 bp+ unpaired sequencing is enough. Depth needed depends on genome size, quality, and sequencer:
Species            Genome size   # reads / IP   HiSeq 2500   HiSeq 4000
Human              3.2 Gb        20m+           ~10          ~20
Mouse              2.8 Gb        15m+           ~13          ~24
Stickleback fish   500 Mb        10m+           ~18          ~30
Honeybee           250 Mb        7m+            ~24          ~40
Always sequence technical replicate IP samples. Always sequence an input background sample without antibody for each tissue. This means 3 sequenced items per sample.
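
As a rough planning aid, the "3 sequenced items per sample" rule can be turned into a lane estimate. This is only a sketch: it reads the two HiSeq columns of the table as approximate libraries per lane (an assumption, since the slide does not label them), and the function name and example numbers are illustrative.

```python
import math

# Approximate libraries per lane, copied from the table above.
# Interpreting these columns as "libraries per lane" is an assumption.
LIBRARIES_PER_LANE = {
    "human": {"HiSeq 2500": 10, "HiSeq 4000": 20},
    "mouse": {"HiSeq 2500": 13, "HiSeq 4000": 24},
    "stickleback": {"HiSeq 2500": 18, "HiSeq 4000": 30},
    "honeybee": {"HiSeq 2500": 24, "HiSeq 4000": 40},
}


def lanes_needed(n_samples, species, sequencer, libraries_per_sample=3):
    """Return (total libraries, lanes) for two replicate IPs plus one input per sample."""
    total_libraries = n_samples * libraries_per_sample
    per_lane = LIBRARIES_PER_LANE[species][sequencer]
    return total_libraries, math.ceil(total_libraries / per_lane)


# Eight mouse samples: 24 libraries, which fit on one HiSeq 4000 lane.
print(lanes_needed(8, "mouse", "HiSeq 4000"))
```

Rounding up with `math.ceil` is deliberate: a partly filled lane still has to be purchased or shared.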

17 Outline: Epigenomics background; ChIP experimental design; ChIP bioinformatics; Other chromatin assays; Higher order chromatin assays

18 ChIP Bioinformatics Pipeline 1. Align reads to genome 2. Look at your data! 3. Call peaks 4. Look at your data again! 5. Annotate peaks / Gene Ontology 6. Identify differential peaks? 7. Identify Co-binding? 8. Motif analysis?

19 ChIP Bioinformatics. First step is to map reads: Bowtie, Novoalign, BWA, or other. ChIP-seq reads surround but may not contain the DNA binding site: sequence is generated from the ends of randomly sheared fragments, which overlap at the protein binding site. This gives rise to two adjacent sets of read peaks separated by ~2X fragment length (~500 bp), and defines a shift distance between read peaks at which you will find the true ChIP peak summit (~200 bp). Programs like MACS and HOMER automatically use your control (genomic input) to correct the sample reads for background when defining a final set of peaks. [Figure labels: binding site, sonicated fragment, ends sequenced]
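
The idea of a strand shift can be illustrated with a toy cross-correlation: slide the reverse-strand read starts against the forward-strand read starts and keep the offset where they agree best. This is a simplified sketch, not the model MACS or HOMER actually fit; the function names and the toy coordinates are made up for illustration.

```python
def strand_counts(positions, length):
    """Count read 5' ends at each coordinate of a toy region."""
    counts = [0] * length
    for p in positions:
        counts[p] += 1
    return counts


def estimate_strand_offset(forward_starts, reverse_starts, length, max_shift=400):
    """Return the offset (bp) that best aligns forward and reverse read pileups."""
    fwd = strand_counts(forward_starts, length)
    rev = strand_counts(reverse_starts, length)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        # Cross-correlation score at this offset.
        score = sum(fwd[i] * rev[i + shift] for i in range(length - shift))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy data: forward reads pile up near 1000, reverse reads near 1200.
forward = [995, 1000, 1000, 1005]
reverse = [1195, 1200, 1200, 1205]
offset = estimate_strand_offset(forward, reverse, length=2000)
print(offset)                # 200
print(1000 + offset // 2)    # 1100, midway between the two strand pileups
```

Moving each read by half the estimated offset toward the other strand places both pileups on the same point, which is where the binding site is expected to sit.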

20 ChIP Analytical challenges. Genomic background: shear efficiency is not really random; some genomic regions are fragile and sensitive; some regions are protected from shear or degradation. Other artifacts: centromeres (repeat sequences that are not all represented in the genome sequence build); polymorphic regions, such as regions modified in cell line DNA; repeats (most programs cannot manage sequence reads that are not mapped uniquely). Peak width: transcription factors typically give sharp ~200 bp peaks, while chromatin marks are more diffuse; if planning to call differential peaks, peak width should be locked between samples.

21 Traditional methods fail with broad, flat peaks. Most tools are designed for TF proteins: isolated, sharp peaks. Certain chromatin proteins, and modified histones in certain regions, bind continuously to large regions of chromatin and do not yield peaks. MACS in default mode will carve the mesa into many peaks, or not detect it at all. New settings in MACS2 can be set to overcome this problem. HOMER has a wide variety of settings ideal for data of different types.
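
To make the "carved mesa" problem concrete, one crude workaround is to stitch narrow peaks that lie close together back into a single domain. This is a stand-in heuristic for illustration only, not what MACS2's broad-peak settings or HOMER do internally; the function name and the `max_gap` default are assumptions.

```python
def merge_nearby_peaks(peaks, max_gap=1000):
    """Merge (chrom, start, end) peaks on the same chromosome separated by <= max_gap bp."""
    merged = []
    for chrom, start, end in sorted(peaks):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Close enough to the previous peak: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(p) for p in merged]


fragments = [("chr1", 100, 300), ("chr1", 500, 900), ("chr1", 5000, 5200), ("chr2", 100, 200)]
print(merge_nearby_peaks(fragments))
```

In the example the first two chr1 fragments are 200 bp apart and are joined, while the fragment at 5000 stays separate.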

22 Look At Your Data! Once you have aligned reads and putative peaks called, it is important to actually look at your data and see if it looks believable. Peak calling software will often still call peaks from failed experiments! IGV, UCSC Genome Browser, Galaxy Track Browser are great tools for experimental validation

23 Looking at your Data. Are your peaks enriched over the background? Do your technical replicates look similar? Are your peaks associated with genes? Are your peaks similar to existing data sets? UCSC Genome Browser: an amazing resource of epigenomics data and genome annotations. Visualize your ChIP data on UCSC to compare to existing tracks. You can set up a public/private track hub for publication of your data. [Figure: browser tracks for annotation, RNA, H3K27ac, H3K27me3, H3K4me3, H3K4me1, and called peaks, with one call flagged as a possible false peak]
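
The first question, whether a peak is enriched over background, can be checked numerically as well as by eye. The sketch below compares depth-normalized read counts in the IP and in the input; the function name, the pseudocount, and the example counts are assumptions chosen for illustration, and real peak callers use more careful statistics.

```python
def fold_enrichment(ip_reads, input_reads, ip_total, input_total, pseudocount=1.0):
    """Ratio of reads-per-million in the IP to reads-per-million in the input for one region."""
    ip_rpm = (ip_reads + pseudocount) / ip_total * 1e6
    input_rpm = (input_reads + pseudocount) / input_total * 1e6
    return ip_rpm / input_rpm


# A region with 199 IP reads (20 million total) and 9 input reads (10 million total).
print(round(fold_enrichment(199, 9, 20_000_000, 10_000_000), 2))
```

Normalizing by library size matters because the IP and the input are rarely sequenced to the same depth; the pseudocount avoids division by zero in regions with no input reads.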

24 Differential Peaks. Identifying when peak sets have changed between conditions can be tricky. For comparisons with lots of (+/-) peak changes, just subtract the peak sets to find new or missing peaks in one sample. To spot changes in peak magnitude (+ / ++), more advanced methods are required. Most involve re-calling peaks using the experimental sample as the IP and another sample's IP as the input. HOMER has more advanced differential peak finding mechanisms and can utilize biological replicates. Linking differential regulatory peaks to differential genes interrogates each step of a biological process.
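
Subtracting peak sets for the simple (+/-) case can be sketched in a few lines. This is a minimal illustration with made-up coordinates, assuming peaks are (chrom, start, end) tuples with half-open coordinates; dedicated interval tools handle large files far more efficiently.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def unique_peaks(peaks_a, peaks_b):
    """Peaks in peaks_a that overlap nothing in peaks_b."""
    return [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]


treated = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
control = [("chr1", 150, 350), ("chr2", 400, 600)]

print(unique_peaks(treated, control))  # peaks gained in the treated sample
print(unique_peaks(control, treated))  # peaks lost in the treated sample
```

Running the subtraction in both directions gives the gained and the lost peaks; it says nothing about peaks that are present in both samples but differ in height.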

25 Co-binding factor finding. Some experiments may want to identify positions where two factors both bind at one location. Example: TCF4 is a repressor when bound alone, or an activator when bound together with beta-catenin. Example: an H3K4me3 + H3K27ac dual peak indicates an active promoter. The Galaxy intersect tool, bedtools intersect, or HOMER mergePeaks can identify these co-bound sites.
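
The logic behind an interval intersect is small enough to show directly. The sketch below returns the shared portion of every overlapping pair of peaks; it is an illustration of the concept with invented coordinates, not a reimplementation of any of the tools named above.

```python
def intersect_peaks(peaks_a, peaks_b, min_overlap=1):
    """Return the overlapping segment for each pair of peaks sharing >= min_overlap bp."""
    shared = []
    for chrom_a, start_a, end_a in peaks_a:
        for chrom_b, start_b, end_b in peaks_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if end - start >= min_overlap:
                shared.append((chrom_a, start, end))
    return shared


h3k4me3 = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]
h3k27ac = [("chr1", 1500, 2500), ("chr2", 5000, 6000)]
print(intersect_peaks(h3k4me3, h3k27ac))  # [('chr1', 1500, 2000)]
```

Raising `min_overlap` is one way to ignore peaks that merely touch at their edges.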

26 Peak Annotation. Now that we have peaks, what do they mean? Many peaks intersect promoters directly, but some may be 100 kb+ from the nearest gene. Different interpretations may lead to different conclusions. Typical promoter region (mouse): -5 kb/+2 kb. Typical regulatory domain: +/- 100 kb. The GREAT genome tool is a good place to start for human, mouse, and zebrafish: it identifies nearest genes and performs Gene Ontology analysis. HOMER has advanced peak annotation scripts: annotatePeaks.pl
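
The promoter and regulatory-domain windows above can be applied with a simple nearest-TSS lookup. This is a sketch under stated assumptions: the gene names and coordinates are hypothetical, the peak center stands in for the summit, and the labels are illustrative rather than the categories GREAT or HOMER report.

```python
def annotate_peak(peak, genes, upstream=5000, downstream=2000, domain=100_000):
    """Assign a peak to its nearest TSS and label it by signed distance.

    genes: list of (name, chrom, tss, strand); all entries here are hypothetical.
    """
    chrom, start, end = peak
    center = (start + end) // 2
    best = None
    for name, g_chrom, tss, strand in genes:
        if g_chrom != chrom:
            continue
        # Negative offset = upstream of the TSS in the gene's own orientation.
        offset = (center - tss) if strand == "+" else (tss - center)
        if best is None or abs(offset) < abs(best[1]):
            best = (name, offset)
    if best is None:
        return (None, None, "unassigned")
    name, offset = best
    if -upstream <= offset <= downstream:
        label = "promoter"
    elif abs(offset) <= domain:
        label = "regulatory domain"
    else:
        label = "distal"
    return (name, offset, label)


genes = [("GeneA", "chr1", 10_000, "+"), ("GeneB", "chr1", 500_000, "-")]
print(annotate_peak(("chr1", 7_000, 7_400), genes))      # ('GeneA', -2800, 'promoter')
print(annotate_peak(("chr1", 459_900, 460_100), genes))  # ('GeneB', 40000, 'regulatory domain')
```

Changing `upstream`, `downstream`, or `domain` changes which gene set a peak list maps to, which is one reason different interpretations lead to different conclusions.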

27 Motif-finding. Differential peak sets can be submitted for motif analysis to find enriched motifs. TF ChIP: motif scanning can reveal novel binding sites or validate ChIP results with known binding sites. Histone ChIP: motif scanning in differential peaks can identify the active regulatory proteins responsible. HOMER and MEME-ChIP are great ChIP motif finders. Covered in detail this afternoon.

28 Outline: Epigenomics background; ChIP experimental design; ChIP bioinformatics; Other chromatin assays; Higher order chromatin assays

29 DNase sensitivity assays are antibody-free. The first approach, from Crawford, 2006 (Francis Collins laboratory): 1. Digest with DNase I to erase all the hypersensitive regions. 2. Polish and ligate the remaining double-strand ends. 3. Ligate 5'-biotinylated linkers to the DS ends. 4. Shear (sonicate) or restriction-digest DNA into smaller fragments. 5. Purify end sequences on a streptavidin column. 6. Release sequences, add new linkers, and sequence. Does not allow footprinting, because TF binding sites inside the HS regions have been digested away.

30 Latest (and better) approach: sequences DNase-sensitive regions per se and permits transcription factor footprinting. The easiest method uses low concentrations of DNase I to generate short fragments at sensitive (open) sites. Released fragments can be blunt-ended, ligated to linkers, and sequenced directly. Permits DNase footprinting: very deep sequencing can reveal short regions that are absent from the released DNA and appear as protected valleys inside the DNase-sensitive peaks, protected from DNase I because they are occupied by TF proteins.
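
A "protected valley" can be pictured as a short run of near-zero cut counts flanked by cuts on both sides. The toy scan below finds such runs in a per-base cut profile; it is a threshold heuristic for illustration only, with assumed parameter values, and is much simpler than published footprinting methods.

```python
def find_footprints(cut_counts, low=1, min_len=6, max_len=40):
    """Return (start, end) runs where cuts per base stay <= low, flanked by higher signal."""
    footprints = []
    run_start = None
    # A sentinel value above the threshold closes any run that reaches the end.
    for i, count in enumerate(list(cut_counts) + [low + 1]):
        if count <= low:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            flanked = run_start > 0 and i < len(cut_counts)
            if min_len <= length <= max_len and flanked:
                footprints.append((run_start, i))
            run_start = None
    return footprints


# Toy hypersensitive site: a 10 bp protected stretch and a 2 bp dip that is too short to count.
profile = [8, 9, 7, 10] + [0] * 10 + [9, 8, 11] + [0, 0] + [7, 9]
print(find_footprints(profile))  # [(4, 14)]
```

The length limits reflect the slide's point that footprints are short; runs touching the edge of the profile are skipped because protection cannot be distinguished from the end of the open region.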

31 Related methods and twists on the theme (see Furey et al., 2012 for review). Exo-ChIP: follows sonication with an exonuclease step, to pare back all but the protein-protected region in ChIP. Nano-ChIP methods: ChIP normally requires ~10^7 cells as input, which is hard to achieve for many cell types. The problem is library preparation, which needs a minimal amount of input for success. Nano-ChIP can be carried out in several ways: 1. With carrier DNA: not the best for sequence analysis, but can be done. 2. Amplification after ChIP: very tricky because it can cause serious biases and artifacts, but can be done with care; linear amplification is the best strategy. 3. ATAC and tagmentation: a new method that creates libraries directly by transposon insertion.

32 ATAC-seq and Tagmentation. Uses a transposase that has been modified to insert Illumina sequencing primers. On untreated DNA, the transposase prefers to insert in open chromatin; this assay is known as ATAC-seq (Greenleaf, 2013). Needs to be done on freshly collected tissue.

33 ChIP Tagmentation. Tagmentation can also be used to insert sequencing tags into immunoprecipitated DNA after or during ChIP (Schmidl, 2015). This allows you to make ChIP libraries from very small numbers of cells: 50,000 or fewer!

34 Tagmentation Bioinformatics. The bioinformatics pipeline for tagmented samples is similar to ChIP, but may require read trimming to remove transposon/index contamination. Due to increased PCR amplification and bias in tagmentation vs traditional ChIP, it may be difficult to compare peak magnitudes. Biological replicates are strongly recommended.

35 DNA Methylation Assays. Bisulphite sequencing (review: Li, 2011): treatment of DNA with bisulphite followed by PCR amplification allows detection of methylated regions; requires very deep sequencing ($$$). Methyl-DIP (Weber, 2005): enriches for methylcytosine using antibodies, like ChIP; analysis is identical to standard ChIP. Methyl-Binding-Domain capture (review: Nair, 2011): uses beads coated with MBD protein that bind methylated DNA directly; analysis is also identical to standard ChIP.
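
Why bisulphite treatment reveals methylation is easiest to see in a toy simulation: unmethylated cytosines are converted and read as T after PCR, while methylated cytosines still read as C. The sketch below handles only one strand of a perfectly aligned read, and the sequence and function names are invented for illustration.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulphite treatment plus PCR: unmethylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference, read):
    """Compare an aligned read to the reference at every reference C."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            if read_base == "C":
                calls[i] = "methylated"
            elif read_base == "T":
                calls[i] = "unmethylated"
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1, 5})
print(read)                               # ACGTTCGA
print(call_methylation(reference, read))  # positions 1 and 5 methylated, 4 unmethylated
```

Because each call comes from a single read at a single base, many reads per cytosine are needed to estimate a methylation level, which is consistent with the need for very deep sequencing.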

36 Outline: Epigenomics background; ChIP experimental design; ChIP bioinformatics; Other chromatin assays; Higher order chromatin assays

37 Back to the nucleus: distant regulatory elements interact with promoters (and each other) through long-range chromatin loops. Regulatory elements are essentially docking sites for specific types of DNA-binding proteins: transcription factors, TATA-binding factors, and others. These proteins serve to attract co-factors, which then mediate protein:protein interactions across chromatin loops. Very long range interactions are common in vertebrates, less so in invertebrate species with higher coding:noncoding ratios. A common issue in analysis of ChIP: after chromatin is sheared (by sonication or restriction enzyme), ChIP with an antibody to a TF bound at enhancer (E) DNA will bring down promoter (P) DNA as well. Proteins are crosslinked very efficiently to each other, as well as to DNA, by formaldehyde treatment. When crosslinking is reversed the complex falls apart, and both DNA fragments are released independently. Only one sequence binds to the TF!

38 Chromatin conformation capture methods can identify these loop-linked sequences (figure from Wikipedia). Ends of the co-captured DNA fragments are ligated while still captured on the antibody bead with the protein complex. DNA is released and can be: queried by PCR for enrichment of suspected candidate interactors; circularized and PCR amplified using a primer from a bait region (4C); or directly sequenced for all-by-all interactions (5C, Hi-C, ChIA-PET). Issues include random co-ligation between fragments that are not truly connected in the cell, and over-crosslinking, which may incorrectly join sequences that are merely nearby. Provides a view of 3-D chromatin architecture, especially important for mammalian cells.
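
Sequenced ligation products are usually summarized as counts of contacts between genomic bins. The sketch below shows that bookkeeping for read pairs on a single chromosome; the bin size, the coordinates, and the function name are assumptions for illustration, and real pipelines also filter and normalize the counts.

```python
from collections import Counter


def contact_counts(pairs, bin_size=10_000):
    """Count ligation pairs per unordered pair of genomic bins (single chromosome)."""
    counts = Counter()
    for pos_a, pos_b in pairs:
        bin_a, bin_b = pos_a // bin_size, pos_b // bin_size
        # Sort so that (0, 9) and (9, 0) are counted as the same contact.
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


# Toy read pairs: three link the region near 0 kb to the region near 95 kb.
pairs = [(1_200, 95_000), (1_800, 97_000), (96_000, 3_000), (5_000, 7_000)]
print(contact_counts(pairs))
```

Pairs that fall in the same bin, like the last one, mostly reflect nearby sequence rather than looping, so they are commonly treated separately from long-range contacts.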

39 Genomics Methods Summary. RNA: RNA-seq, GRO-seq. DNA: variant calling. Chromatin state: DNase + ATAC. DNA methylation: bisulphite, MBD/Methyl-DIP. Histone modifications: histone ChIP. Transcription factors: TF ChIP. Chromatin structure: 3C + 4C + Hi-C.

40 Applied Genomics. [Figure: Social Stimulus (Animal Behavior), Molecular Response (Transcription Factor Activation), Genetic Response (Differential Gene Expression), Development (Developmental Regulators)]. Research methods, in the same order: behavioral testing and scoring; epigenetics and motif analysis; RNA-seq and qPCR; epigenetics and motif analysis. Research findings, in the same order: animals are stressed!; nuclear receptor TFs enter the nucleus and bind DNA; neurotransmitters, hormonal signaling; dormant developmental pathways activated, social learning. Saul*, Seward* et al., Genome Research, 2017.

41 QUESTIONS?
