EpiGnome Methyl-Seq - DNA Methylation Analysis Using Whole Genome Bisulfite Sequencing
Victor Ruotti, Bioinformatics Scientist, Epicentre (An Illumina company)
© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Bisulfite Sequencing
Integrative Approach on Methylation
Xie W. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell. 2013 May 23;153(5):1134-48. Epub 2013 May 9.
Library Preparation

Prior Methods
  Workflow: Purify DNA -> Fragment (Covaris) -> End-Repair -> Ligate Adaptors -> BS Conversion -> PCR
  1-5 µg input
  Covaris + loss due to DNA damage
  Multiple cleanups
  Bisulfite conversion before PCR
  Methylated adaptors
  Specialized proofreading polymerase
  High PCR cycles
  Lengthy protocol

EpiGnome
  Workflow: Purify DNA -> BS Conversion -> Random Prime -> Tagging -> PCR
  50 ng input
  No Covaris
  Post BS conversion library prep
  No methylated adaptors
  No special polymerases
  Only 10 cycles of PCR
  5-hour protocol

Sequencing: 2x75 bp reads
Analysis: bioinformatics analysis pipeline
Agenda - Analysis: Bioinformatics Workflow
Bioinformatics Analysis
  STEP 1 Quality Control
  STEP 2 Generation of bisulfite converted genome
  STEP 3 Genome Alignment
  STEP 4 Methylation calls
  STEP 5 Generation of genome wide tracks
Krueger F & Andrews SR (2011) Bioinformatics 27(11): 1571-72
STEP 1 - Quality Control
HiSeq 2x75bp, EpiGnome (50 ng input)
  Filtering reads based on quality
  Assessing bisulfite conversion rate
[Figure: bisulfite conversion rate (distribution of bases). Panels: control (no bisulfite conversion); bisulfite-converted EpiGnome library]
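The base-distribution check on this slide can be sketched as below. This is a minimal illustration and not the tool used for the slide's plots; the function name `base_fractions` and the toy reads are hypothetical. It assumes reads that come from the bisulfite-converted strand, where most unmethylated C is read as T, so a converted library should show a much lower C fraction than an unconverted control.

```python
from collections import Counter

def base_fractions(reads):
    """Return the fraction of each base (A, C, G, T, N) across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(read.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts.get(base, 0) / total for base in "ACGTN"}

# Hypothetical toy reads for illustration only (not data from the slides).
control_reads = ["ACGTACGTCC", "CCGGAATTCG"]
converted_reads = ["ATGTATGTTT", "TTGGAATTCG"]

control_c = base_fractions(control_reads)["C"]
converted_c = base_fractions(converted_reads)["C"]
print(f"C fraction, control:   {control_c:.2f}")
print(f"C fraction, converted: {converted_c:.2f}")
```

In practice the same tally would be run per read position over a FASTQ file, which is what a per-base sequence content plot shows.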
STEP 1 - Quality Control
Trimming of Adapters
  TrimGalore: helps remove adapters from reads
  TrimGalore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
  Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic)
Krueger F & Andrews SR (2011) Bioinformatics 27(11): 1571-72
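To make the idea of 3' adapter trimming concrete, here is a simplified sketch. It is not the algorithm used by TrimGalore or Trimmomatic (both tolerate mismatches and also trim by quality); it only handles exact matches. The function name `trim_adapter` and the `min_overlap` parameter are hypothetical, and the default adapter sequence is an assumption (the common Illumina adapter prefix), not a value taken from the slides.

```python
def trim_adapter(read, adapter="AGATCGGAAGAGC", min_overlap=3):
    """Remove an exact-match 3' adapter, full or partial, from a read.

    Assumption: the adapter default is the common Illumina adapter prefix;
    the adapter actually present in a given library may differ.
    """
    # Case 1: the whole adapter occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adapter (read ran into it).
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # Case 3: no adapter detected.
    return read

print(trim_adapter("ACGTTTGACAGATCGGAAGAGCTT"))  # full adapter inside read
print(trim_adapter("ACGTTTGACAGATC"))            # partial adapter at 3' end
print(trim_adapter("ACGTTTGAC"))                 # nothing to trim
```

All three calls return the same insert sequence, `ACGTTTGAC`.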
STEP 2 - Generation of Bisulfite Converted Genome

Genomic DNA (mC and hmC labels on the slide mark methylated cytosines):
  Top strand      >>CCGGCATGTTTAAACGCT>>
  Bottom strand   <<GGCCGTACAAATTTGCGA<<

After bisulfite conversion:
  >>UCGGUATGTTTAAACGUT>>
  <<GGUCGTACAAATTTGCGA<<

After PCR amplification:
  OT     >>TCGGTATGTTTAAACGTT>>
  CTOT   <<AGCCATACAAATTTGCAA<<
  CTOB   >>CCAGCATGTTTAAACGCT>>
  OB     <<GGTCGTACAAATTTGCGA<<

  OT     original top strand
  CTOT   complementary to original top strand
  OB     original bottom strand
  CTOB   complementary to original bottom strand

Genome conversions: G -> A, C -> T
Bismark -> Bowtie -> Create Indexes
Krueger F & Andrews SR (2011) Bioinformatics 27(11): 1571-72
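The two in silico genome conversions named on this slide (C -> T and G -> A) can be sketched in a few lines. This is only an illustration of the conversion step; the function names are hypothetical, and building the Bowtie indexes from the converted sequences is left to Bismark and Bowtie. The example sequence is the top strand shown on the slide.

```python
def c_to_t(seq):
    """Fully convert every C to T (forward-strand bisulfite genome)."""
    return seq.upper().replace("C", "T")

def g_to_a(seq):
    """Fully convert every G to A (equals C -> T on the reverse strand)."""
    return seq.upper().replace("G", "A")

top_strand = "CCGGCATGTTTAAACGCT"  # top strand from the slide
print(c_to_t(top_strand))  # TTGGTATGTTTAAATGTT
print(g_to_a(top_strand))  # CCAACATATTTAAACACT
```

Both converted genomes are indexed separately, so each read can later be aligned against either version.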
STEP 3 - Genome Alignment

sequence of interest:
  TTGGCATGTTTAAACGTT

bisulfite convert read (treat sequence as both forward and reverse strand):
  5' TTGGTATGTTTAAATGTT 3'
  5' TTAACATATTTAAACATT 3'

align to bisulfite converted genomes, alignments (1), (2), (3), (4):
  forward strand C -> T converted genome:
    TTGGTATGTTTAAATGTT
    AACCATACAAATTTACAA
  forward strand G -> A converted genome (equals reverse strand C -> T conversion):
    CCAACATATTTAAACACT
    GGTTGTATAAATTTGTGA

read all 4 alignment outputs, (1) to (4), and extract the unmodified genomic sequence if the sequence could be mapped uniquely:
  5' CCGGCATGTTTAAACGCT 3'

methylation call:
  read sequence      TTGGCATGTTTAAACGTT
  genomic sequence   CCGGCATGTTTAAACGCT
  methylation call   cc..c...z.c.

  c   unmethylated C
  C   methylated C
  z   unmethylated C in CpG context
  Z   methylated C in CpG context

Krueger F & Andrews SR (2011) Bioinformatics 27(11): 1571-72
Slide courtesy of Krueger
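The final comparison on this slide, read sequence against unmodified genomic sequence, can be sketched as follows. This is a simplified illustration rather than Bismark's implementation: the function name `methylation_call` is hypothetical, it assumes a forward-strand read already aligned without gaps, and it uses only the four-letter legend from the slide (c, C, z, Z). The call string it prints is computed from the slide's two sequences.

```python
def methylation_call(read, genome):
    """Return a per-base call string for a gapless forward-strand alignment.

    '.' = not a genomic C; c/C = unmethylated/methylated C outside CpG;
    z/Z = unmethylated/methylated C in CpG context.
    """
    if len(read) != len(genome):
        raise ValueError("read and genome must have the same length")
    calls = []
    for i, (r, g) in enumerate(zip(read.upper(), genome.upper())):
        if g != "C":
            calls.append(".")
            continue
        in_cpg = i + 1 < len(genome) and genome[i + 1].upper() == "G"
        if r == "C":        # C survived conversion: methylated
            calls.append("Z" if in_cpg else "C")
        elif r == "T":      # C was converted to T: unmethylated
            calls.append("z" if in_cpg else "c")
        else:               # mismatch or N: no call
            calls.append(".")
    return "".join(calls)

read = "TTGGCATGTTTAAACGTT"
genome = "CCGGCATGTTTAAACGCT"
print(methylation_call(read, genome))  # cz..C.........Z.c.
```

Reads from the other strands would need the mirrored G -> A logic, which is omitted here to keep the sketch short.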
STEP 3 - Genome Alignment (Bowtie2)
Data from 4 lanes of HiSeq 2x75bp, EpiGnome, Coriell DNA GM18507

Library Alignment Statistics
  Reads PF                                       736,320,254
  Paired-end alignments with unique best hit     537,880,874
  Read 1/Read 2 (% >= Q30; mean from 4 lanes)    92.9%/90.2%
  Total number of C's analyzed                   14,936,878,537
  C methylated in CpG context                    531,996,413 (54%)
  C methylated in CHG context                    71,539,202 (1.9%)
  C methylated in CHH context                    90,071,498 (0.9%)
  % Aligned                                      75.5
  % Unique                                       91.9

Krueger F & Andrews SR (2011) Bioinformatics 27(11): 1571-72
STEP 4 - Methylation Calls
Strand-specific methylation output files
The methylation extractor output looks like this (tab separated):
  (1) seq-ID
  (2) methylation state
  (3) chromosome
  (4) start position (= end position)
  (5) methylation call

Examples for cytosines in CpG context:
  HWUSI-EAS611_0006:3:1:1058:15806#0/1   -   1   91793279    z
  HWUSI-EAS611_0006:3:1:1058:17564#0/1   +   2   122855484   Z
Examples for cytosines in CHG context:
  HWUSI-EAS611_0006:3:1:1054:1405#0/1    -   1   89920171    x
  HWUSI-EAS611_0006:3:1:1054:1405#0/1    +   2   89920172    X
Examples for cytosines in CHH context:
  HWUSI-EAS611_0006:3:1:1054:1405#0/1    -   1   89920184    h

Krueger F & Andrews SR (2011) Bioinformatics 27(11): 1571-72
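A minimal sketch of reading the five-column extractor output described on this slide and summarizing it. The helper names `tally_calls` and `percent_methylated` are hypothetical, and the example rows assume that the second column holds `+` or `-` and the third the chromosome. Lines that do not fit that layout are skipped.

```python
from collections import Counter

def tally_calls(lines):
    """Count methylation call letters (z/Z, x/X, h/H) in extractor output."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Expect: seq-ID, +/-, chromosome, position, call letter.
        if len(fields) != 5 or fields[1] not in ("+", "-"):
            continue
        counts[fields[4]] += 1
    return counts

def percent_methylated(counts, meth="Z", unmeth="z"):
    """Percent methylation for one context, or None if no calls were seen."""
    total = counts[meth] + counts[unmeth]
    return 100.0 * counts[meth] / total if total else None

example = [
    "HWUSI-EAS611_0006:3:1:1058:15806#0/1\t-\t1\t91793279\tz",
    "HWUSI-EAS611_0006:3:1:1058:17564#0/1\t+\t2\t122855484\tZ",
    "HWUSI-EAS611_0006:3:1:1054:1405#0/1\t-\t1\t89920171\tx",
]
counts = tally_calls(example)
print(counts)
print(percent_methylated(counts))            # CpG context
print(percent_methylated(counts, "X", "x"))  # CHG context
```

Uppercase letters count as methylated and lowercase as unmethylated, so each context's percentage is methylated calls over all calls in that context.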
STEP 4 - Methylation Calls
Data from 4 lanes of HiSeq 2x75bp (EpiGnome)
STEP 5 - Generation of Genome Wide Tracks
The bedGraph output (optional from methylation extraction)
Tab-delimited (0-based start, 1-based end coords)
CpG file:
  track type=bedGraph (header line)
  <chromosome> <start position> <end position> <methylation percentage>
[Figure: Integrative Genomics Viewer (IGV) tracks: Depth of coverage; CpG % Methylation; CpG High % Methylation; CpG low. Color scale runs from low methylation to high methylation]
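The bedGraph layout on this slide can be produced from per-cytosine counts roughly as follows. This is a sketch under stated assumptions, not the extractor's own code: the function name `bedgraph_lines` and the input tuple layout (chromosome, 1-based position, methylated count, unmethylated count) are hypothetical.

```python
def bedgraph_lines(sites):
    """Yield bedGraph lines from (chrom, pos_1based, methylated, unmethylated).

    Coordinates follow the slide: 0-based start, 1-based end.
    """
    yield "track type=bedGraph"
    for chrom, pos, meth, unmeth in sorted(sites):
        total = meth + unmeth
        if total == 0:
            continue  # no coverage, no percentage to report
        pct = 100.0 * meth / total
        yield f"{chrom}\t{pos - 1}\t{pos}\t{pct:.1f}"

# Hypothetical counts for illustration only.
sites = [("chr1", 100, 3, 1), ("chr1", 250, 0, 4)]
for line in bedgraph_lines(sites):
    print(line)
```

The resulting file can be loaded into a genome browser such as IGV as a quantitative track.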
http://www.epibio.com/epignome?documents
EpiGnome Methyl-Seq Kit
Whole Genome Bisulfite Sequencing from 50 ng gDNA
For more information, please contact: Fraz Syed, Sr. Product Manager, fsyed@illumina.com
Website: EpiGnome Methyl-Seq, www.epibio.com/epignome
Sales enquiries: 800-284-8474, sales@epicentre.com
THANK YOU!
Illumina Survey at Booth
And one more thing
Targeted bisulfite amplicon sequencing on Nextera XT
  Leverages upstream bisulfite conversion, Nextera XT workflow
  Ideal for omic follow-up studies
  Fine mapping
  Orthogonal validation