Sequencing technologies


1 Sequencing technologies; part of High-Throughput Analyses of Genome Sequences; Computational EvoDevo, University of Leipzig; Leipzig, WS 2014/15

2 Sanger Sequencing (Chain Termination Method): sequencing of one DNA sequence (max 1000nt); 4 runs with either ddATP, ddCTP, ddGTP or ddTTP (optional: with fluorescent dye) in addition to regular dNTPs; chain termination at a specific nucleotide type but at random positions; gel electrophoresis
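
The chain termination idea can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the template is written 3' to 5' so that the synthesized strand reads left to right, every possible termination product is assumed to be present, and the function names `termination_lengths` and `read_gel` are made up for this example.

```python
# Toy model of Sanger chain termination (hypothetical helper names).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def termination_lengths(template):
    """For each of the four ddNTP runs, list the fragment lengths it can produce.

    A ddNTP of type X can stop the chain at every position where X is
    incorporated into the strand synthesized against the template.
    """
    synthesized = "".join(COMPLEMENT[base] for base in template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the four lanes from shortest to longest fragment."""
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))

lanes = termination_lengths("TACGGT")
print(lanes)            # fragment lengths per ddNTP run
print(read_gel(lanes))  # synthesized strand recovered from the gel
```

Each fragment length occurs in exactly one lane, which is why sorting the fragments by length (the gel) recovers the sequence.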

3 (High Throughput) Sequencing Workflow: conceptualize the experimental idea; get material (in sufficient amount); DNA template preparation; DNA fragmentation; adapter ligation; single-end, paired-end, mate-pairs; library preparation (in vivo, in vitro); amplification; sequencing and imaging; base/color calling; quality control; data analysis

4 Sequencing Techniques: High Throughput Sequencing (HTS), e.g. high-throughput shotgun Sanger sequencing; Next/Second Generation Sequencing (NGS, SGS): sequencing by synthesis, e.g. Roche 454, Illumina; sequencing by oligonucleotide ligation and detection, e.g. SOLiD; Single molecule sequencing, e.g. single molecule real time sequencing (SMRTS), Helicos BioSciences, PacBio; nanopore sequencing

5 Template fixation and amplification (454, SOLiD): beads with single primer type; single bead with single template in a microreactor of the emulsion; emulsion PCR (amplification); bead covered with identical sequences; result: beads with thousands of identical ssDNA; beads are fixed on a glass slide or in a PicoTiterPlate (PTP)

6 Template fixation and amplification (Illumina/Solexa): slide surface with forward and reverse primers; hybridization; bridge amplification (local); result: clusters with thousands of identical ssDNA

7 Fluorescent dyes for each nucleotide (Illumina/Solexa): four nucleotides, each with a different fluorescent dye and a reversible terminator (blocks chain extension); imaging after addition of exactly one nucleotide

8 Pyrosequencing (Roche 454): provide one type of nucleotide at a time (e.g. TTP); if TMP is inserted → PPi; APS + PPi → ATP + SO4^2- (sulfurylase); ATP activates luciferase → light; apyrase degrades unincorporated nucleotides; light intensity gives the number of nucleotides added
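
The relation between light intensity and homopolymer length can be illustrated with an idealized flowgram. This is a sketch, not the instrument's base caller: it assumes noise-free signals that equal the number of incorporated nucleotides, and the flow order `"TACG"` as well as the names `flowgram` and `call_bases` are assumptions made for this example.

```python
from itertools import cycle

def flowgram(synthesized, flow_order="TACG", max_flows=100):
    """Return one (nucleotide, signal) pair per flow.

    The signal is the number of bases incorporated in that flow, i.e. the
    length of the homopolymer run that starts at the current position.
    """
    signals = []
    position = 0
    flows = cycle(flow_order)
    while position < len(synthesized) and len(signals) < max_flows:
        nucleotide = next(flows)
        run = 0
        while position < len(synthesized) and synthesized[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, run))
    return signals

def call_bases(signals):
    """Turn idealized signals back into a sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)

signals = flowgram("TTAGGGC")
print(signals)
print(call_bases(signals))
```

In real data the signal is analog, so distinguishing for example 5 from 6 identical bases becomes difficult; the integer signals here hide that problem.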

9 Pyrosequencing (Roche 454): (orange) beads with sulfurylase and luciferase; (green) cubes: CTP; APS is adenosine 5'-phosphosulfate

10 SOLiD sequencing by ligation: colors encode the first two nucleotides (a dinucleotide) of each oligonucleotide; sequence with primer shifted +/-1; to assemble the sequence: need to know one nucleotide (adapter) or align to reference sequence
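
Why one known nucleotide is enough to decode a color read can be shown with a small sketch. The slide does not give the color table; the code assumes the commonly described two-base encoding in which, with A=0, C=1, G=2, T=3, the color of a dinucleotide is the XOR of its two base codes. The names `encode_colors` and `decode_colors` are hypothetical.

```python
BASES = "ACGT"  # assumed coding: A=0, C=1, G=2, T=3

def encode_colors(sequence):
    """Color of each overlapping dinucleotide (XOR of the two base codes)."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(sequence, sequence[1:])]

def decode_colors(first_base, colors):
    """Decode a color read, given one known nucleotide (e.g. from the adapter)."""
    decoded = [first_base]
    for color in colors:
        # The previous base plus the color determines the next base.
        decoded.append(BASES[BASES.index(decoded[-1]) ^ color])
    return "".join(decoded)

colors = encode_colors("ATGGC")
print(colors)
print(decode_colors("A", colors))
```

Because every base is decoded from its predecessor, a single wrong color changes all following bases, which is one reason color reads are often aligned to a reference in color space instead.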

11 Comparison of sequencing technologies

12 Imaging of arrays: Solexa/Illumina: dense, disordered array; multicolor, synchronous. SOLiD: dense, disordered array; multicolor. Helicos: disordered array; monocolor, asynchronous

13 Library generation: How to store DNA fragments? In vivo: linker-flanked DNA in a (living) vector; amplification by natural replication with proof-reading; DNA fragment/insert size limited by vector; caution: loss of fragments that kill the vector/host. In vitro: linker-flanked DNA in a tube; amplification by PCR, higher error rate; caution: fast amplification of point mutations

14 Library generation spanning long distances: paired-end library (cloning based and cloning-free); linker with binding sites (BS) for restriction endonucleases; endonucleases cut 20nt downstream of the BS: MmeI (18/20nt) or EcoP15I (25/27nt); circularize again (cloning based) or ligate adaptors (cloning-free); amplification

15 Library generation spanning long distances: mate-pair library; caution: Illumina reads only 36bp; caution: Roche 454 one long ( 200bp), one short read; caution: reads might include adaptor sequences. [Figure: mate-pair library construction for Solexa/Illumina, Roche 454 and SOLiD; labels: 2-40kb fragments for all three platforms, ~200bp and ~60bp segments, reads of 36bp, ~300bp and 50bp.] Legend: A1, A2 terminal adapters; CA central adapter; lightning bolt: random site of fragmentation; encircled B: biotin group

16 Comparison of sequencing methods

17 Single molecule sequencing (Helicos): primer immobilized; add poly-A to DNA fragments; immobilize poly-T primer; highly sensitive fluorescence detection required; monocolor, asynchronous; improvement: two passes

18 Single molecule real-time sequencing (PacBio): polymerase immobilized; extremely small volume (zero-mode waveguide); nucleotides with different fluorescent labels on the 5'-phosphate; incorporation → millisecond fluorescent pulse; repeated sequencing (e.g. 15 times) → > 99.9% accuracy; read length 1000bp
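
How repeated sequencing of the same molecule improves accuracy can be sketched with a column-wise majority vote. This is a strong simplification: it assumes all passes have the same length and contain only substitution errors, whereas real single-molecule reads also contain insertions and deletions and would need to be aligned first. The function name `consensus` and the example reads are made up.

```python
from collections import Counter

def consensus(passes):
    """Column-wise majority vote over equally long reads of one molecule."""
    if len({len(read) for read in passes}) != 1:
        raise ValueError("this sketch expects at least one pass, all of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent base in this column
        for column in zip(*passes)
    )

# Five hypothetical passes over the same template, each with at most one error.
passes = ["ACGTTGCA", "ACGTAGCA", "ACCTTGCA", "ACGTTGCA", "ACGTTGGA"]
print(consensus(passes))
```

As long as errors in different passes fall at independent positions, the correct base wins the vote in every column, so the consensus is more accurate than any single pass.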

19 Nanopore sequencing: nanopore (gray) in a membrane; polymerase Φ29 (orange) passes DNA single strand through pore; changes in current through the nanopore are characteristic for nucleotides; improvement: DNA hairpin → sequencing of both strands

20 Advantages and disadvantages: Emulsion PCR: large amount of template DNA required; cumbersome to implement; high data density on array possible. Illumina: large amount of template DNA (for mate pairs) required; dephasing decreases quality with read length; short reads; deletions and substitutions. Single molecule technology: small amount of template DNA; dephasing is not a problem; better quality; longer reads up to 1000bp; problems with homopolymers
