Topic: Genetic Engineering. Erwin R. Schmidt, Institut für Molekulargenetik. Lecture #10, 01.07.2014
Pyrosequencing
The Pyrosequencing technology is a relatively new DNA sequencing method originally developed here at KTH at the Department of Biotechnology. The technology has been commercialized and is today marketed by Biotage AB. The technique utilizes the cooperativity between four different enzymes and the phenomenon of bioluminescence to monitor the incorporation of nucleotides into the DNA. A short description of the steps in the Pyrosequencing process is given below.
Initial step: The reaction mixture consists of the four enzymes (DNA polymerase, ATP sulfurylase, luciferase and apyrase), different substrates needed for the reactions and the single-stranded DNA to be sequenced.
Step 1 - Polymerase: One of the four nucleotides dNTP (dATP, dCTP, dGTP, dTTP) is added to the reaction mixture. If the added nucleotide is complementary to the base in the DNA strand, it is incorporated and inorganic pyrophosphate (PPi) is released.
Step 2 - ATP sulfurylase: The PPi is converted into ATP by the enzyme ATP sulfurylase.
Step 3 - Luciferase: The luciferase catalyzes a reaction where ATP is used to generate light. The amount of light is proportional to the amount of ATP, and hence, via the PPi, also proportional to the number of incorporated nucleotides. The light is then detected by a CCD camera.
Step 4 - Apyrase: Remaining dNTP and ATP are degraded by the apyrase before the next nucleotide in the iterative cycle is added to the reaction mixture.
My research is devoted to developing a good mathematical model of the reaction system. This will help us to understand the mechanisms governing the system in detail. Once a satisfactory model has been developed, it can be used to optimize the method with respect to substrate and enzyme concentrations as well as the choice of enzymes (kinetic parameters). Because the demand for better DNA sequencing techniques is steadily increasing as new applications arise, there is a lot to gain by optimization.
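Steps 1 to 4 above can be illustrated with a small simulation. This is only a minimal sketch of an idealized run, not a model of the real enzyme kinetics: it assumes noise-free signals, one light unit per incorporated nucleotide, and complete degradation of leftovers by apyrase. The names `simulate_flowgram`, `call_bases` and the dispensing order `ACGT` are hypothetical and chosen for illustration.

```python
# Watson-Crick pairing: the polymerase incorporates the complement of the template base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def simulate_flowgram(template, flow_order="ACGT", max_flows=100):
    """Idealized pyrosequencing run.

    `template` lists the template bases in the order the polymerase reads them.
    Returns a list of (dispensed_nucleotide, light_units) pairs, where one light
    unit stands for one incorporated nucleotide (one PPi -> one ATP -> light).
    """
    pos = 0
    flows = []
    for i in range(max_flows):
        if pos >= len(template):
            break
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        # Step 1: extend as long as the dispensed dNTP pairs with the template.
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            incorporated += 1
            pos += 1
        # Steps 2-3: light is proportional to the number of incorporations.
        flows.append((nucleotide, incorporated))
        # Step 4: apyrase removes leftovers, so nothing carries over to the next flow.
    return flows


def call_bases(flows):
    """Translate the flowgram back into the synthesized sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = simulate_flowgram("TTGCA")
print(flows)              # [('A', 2), ('C', 1), ('G', 1), ('T', 1)]
print(call_bases(flows))  # AACGT
```

The first flow returns a signal of 2 because two identical bases are incorporated in one step. In a real instrument the signal is analog, so longer runs of identical bases become increasingly hard to tell apart.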
Next Generation Sequencing (NGS) Erwin R. Schmidt Institut für Molekulargenetik Johannes Gutenberg Universität Mainz
DNA Sequencing: a brief historical overview; the different platforms of NGS; benchtop versus high output; cost and reliability; future technologies; summary
NGS: Short History of (Nucleotide) Sequencing How many generations do we have?
First Generation Sequencing: nearest-neighbor technology, combined with sequence- or base-specific nuclease digestion
The first nucleotide sequence of a complete biomolecule was the alanine tRNA of yeast (77 nt), determined by Robert W. Holley et al. in 1964. Nobel Prize in Physiology or Medicine 1968.
Generation 2: The real breakthrough! 1975-1977: Sanger sequencing (sequencing by synthesis), Nobel Prize in Chemistry 1980. 1977: Maxam and Gilbert sequencing (sequencing by chemical degradation), Nobel Prize in Chemistry 1980.
Development of Sanger Sequencing
1975: Sanger and Coulson published the +/- method. Sanger F, Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol. 1975 May 25;94(3):441-448.
1977: Sanger, Nicklen and Coulson published the chain terminator method. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977 Dec;74(12):5463-5467.
The Maxam and Gilbert method: based on base-specific chemical degradation of end-labelled DNA restriction fragments. In the same year (1977), but 10 months before Sanger published the chain terminator method, Maxam and Gilbert published their DNA sequencing method based on chemical degradation of end-labelled DNA restriction fragments! Maxam AM, Gilbert W. A new method for sequencing DNA. Proc Natl Acad Sci U S A. 1977 Feb;74(2):560-564.
Which sequencing method was superior? Maxam and Gilbert sequencing: a very robust method; not sensitive to secondary structures; shows base modifications. But: requires work with strong carcinogens and millicuries of radioisotopes; is not automatable; is very laborious and requires long exposure times. Maxam AM, Gilbert W. A new method for sequencing DNA. Proc Natl Acad Sci U S A. 1977 Feb;74(2):560-564.
Which sequencing method was superior? Sanger sequencing: reading the sequence was easier; no carcinogenic chemicals were involved; exposure times were only a few hours; the sequencing reactions could be done by a technician. But: the natural DNA polymerases are sensitive to secondary structures and to stretches of homopolymeric nucleotides. This changed only when the Sequenases became available.
Sanger sequencing has won the race. Number of citations: Maxam and Gilbert: 7,690; Sanger, Nicklen and Coulson: 62,757. Source: Google Scholar
Generation 3: on-line sequencing. A number of different techniques, all based on fluorescently labelled DNA fragments that could be detected and transferred automatically to a computer, with automated base calling.
Classical on-line sequencing is still in use: demand is still increasing; results are robust, with a low error rate (< 1/1,000 to 1/10,000 bp); up to 1,500 nt readable in a row; cost per sample ~€3-5 (0.14 cent/bp, ds); comprehensive service available commercially.
Generation #4: Next generation sequencing (NGS). 2007: NGS selected by Nature as the method of the year. NGS introduces a new dimension in sequence determination. Several platforms exist, providing different possibilities.
The advent of NGS is reflected by the number of genome projects and database entries http://www.genomesonline.org/cgi-bin/gold/index.cgi?page_requested=statistics
Bacterial genome projects in particular have boomed since 2008 http://www.genomesonline.org/cgi-bin/gold/index.cgi?page_requested=statistics
Complete Genome Projects: 12725 (Archaeal: 317; Bacterial: 12096; Eukaryal: 312). Finished: 2876; Permanent Draft: 9849. Last updated 2014-01-24. Source: http://genomesonline.org/cgi-bin/gold/index.cgi
Incomplete Genome Projects: 27988. Archaeal: 457; Bacterial: 19494; Eukaryal: 6413. Last updated 2014-01-24. Source: GOLD = Genomes Online Database at the DOE Joint Genome Institute, http://www.genomesonline.org/cgi-bin/gold/index.cgi
NGS has revolutionized genome science: reduction of costs; reduction of time; reduction of labour; increase in the bioinformatics challenge
The different platforms
The genome scale: 454/Roche Genome Sequencer FLX; ABI SOLiD Sequencing System; Illumina/Solexa HiSeq 2000/2500; Ion Torrent Proton; Pacific Biosciences; (Helicos)
The bench top scale: 454 GS Junior/Roche; Illumina MiSeq; Illumina NextSeq 500; Ion Torrent PGM/Life Technologies
454/Roche GS FLX: the basis is emulsion PCR and pyrosequencing. sstDNA: single-stranded template DNA
The number of sequences depends on the number of wells in the plate!
454/Roche GS FLX: Pyrosequencing. APS = adenosine phosphosulfate
Pyrosequencing is not suitable for sequencing homopolymer stretches longer than 6-7 nucleotides (n > 6-7)
GS FLX+ System
Sequencing Kit | GS FLX Titanium XL+ (new) | GS FLX Titanium XLR70
Read Length | Up to 1,000 bp | Up to 600 bp
Mode Read Length | 700 bp | 450 bp
Throughput Profile | 85% of total bases from reads > 500 bp; 45% of total bases from reads > 700 bp | 85% of total bases from reads > 300 bp; 20% of total bases from reads > 500 bp
Typical Throughput | 700 Mb | 450 Mb
Reads per Run | ~1,000,000 shotgun | ~1,000,000 shotgun, ~700,000 amplicon
Consensus Accuracy* | 99.997% | 99.995%
Run Time | 23 hours | 10 hours
Sample Input | gDNA or cDNA | gDNA, cDNA, or amplicons (PCR products)
Multiplexing: Multiplex Identifiers (MIDs): 132; Gaskets: 2, 4, 8, 16 regions
Data from Roche: http://454.com/products/gs-flx-system/
454/Roche GS FLX Titanium
Advantages: long read length (> 400 nt, up to 1,000); low error rate, but sensitive to homopolymers!
Disadvantages: data output < 0.7 Gb; cost per gigabase is the highest among all systems
Applied Biosystems SOLiD™ Sequencing. SOLiD = Sequencing by Oligonucleotide Ligation and Detection. Template preparation: emulsion PCR. Sequencing: hybridization and ligation.
Through successive rounds of ligation of labelled oligonucleotides to the template, each base in the template is determined twice.
Process of SOLiD Sequencing Figure from Clinical Chemistry April 2009 vol. 55 no. 4 641-658
Each base is sequenced twice!
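The slides do not spell out how "sequenced twice" is encoded. A minimal sketch, assuming the commonly described SOLiD two-base (color-space) scheme, in which each of four colors reports on a pair of adjacent bases, looks as follows. The 2-bit base codes and the function names `encode_colors` and `decode_colors` are illustrative choices, not vendor software.

```python
# 2-bit codes chosen so that XOR of two adjacent bases gives the color (0-3).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """One color per overlapping dinucleotide, so every base contributes to two colors."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


reference = encode_colors("ATGGC")
variant = encode_colors("ATCGC")  # single-base change G -> C at the third position
print(reference)                      # [3, 1, 0, 3]
print(variant)                        # [3, 2, 3, 3]
print(decode_colors("A", reference))  # ATGGC
```

In this scheme a true single-base change alters two adjacent colors, whereas an isolated miscall alters only one. That difference is what makes the double interrogation useful for error detection.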
Applied Biosystems SOLiD™ Sequencing
Advantages: very good data quality, since every base is sequenced twice (99.99% correct); high data output (SOLiD™ 4hq: ~300 Gb per run in 14 days); high multiplexing capacity (up to 1,536 samples per run); cost-effective: €2,000 per human genome
Disadvantages: maximum read length is 75 bases; 14 days run time for 2 x 75 bases
Data from: http://www3.appliedbiosystems.com/cms/groups/mcb_marketing/documents/generaldocuments/cms_061241.pdf
Illumina/Solexa™ Sequencing: sequencing by synthesis; modified chain-terminating method; bridge amplification; paired-end and mate-pair libraries possible
Illumina/Solexa™ Sequencing: clustering and sequencing
Illumina/Solexa™ Sequencing (HiSeq™ 2000/2500)
Advantages: very high data output (> 400 million paired-end reads per lane; ~600 gigabases per run); read length 2 x 150 bases paired-end (increasing); cost per Gb roughly < €50, or €1,500 per human genome
Disadvantages: hardware investment is high (~€600,000 plus periphery); medium-high error rate (~0.5%, increasing with read length); high maintenance costs (service contract > €80,000 per year)
Life Technologies/Ion Torrent: Sequencing by pH Monitoring. Based on sequencing by synthesis; available since 2010; emulsion PCR for library construction; beads with amplified molecules are primed with an adapter; beads are put in an Ion Chip that is sensitive to H+ ions; incorporation of a nucleotide releases an H+ ion, which is measured by the chip.
Ion Torrent: NGS by pH-change measurement on a semiconductor chip. [Figure with labels G, A, T, C; modified by E. R. Schmidt, source: Annual Reviews]
Life Technologies/Ion Torrent: Sequencing by pH Monitoring
Advantages: very cost-efficient (human genome < €1,000); read length 200 bases (increasing); very short running times (~2-4 h); hardware investment is low (~US$80,000)
Disadvantages: high error rate (> 1.0%, increasing with read length); especially sensitive to homopolymer stretches, leading to a high rate of deletions; medium data output (depending on chip, e.g. Proton PII = 32 Gb)
Pacific Biosciences/Single molecule real time (SMRT) sequencing: based on sequencing by synthesis on single molecules; available since 2010; special library construction leading to circular molecules (enables multiple sequencing of the same molecule); binding of an engineered DNA polymerase in a zero-mode waveguide manufactured on a silicon wafer (SMRT™ cell); fluorescently labelled dNTPs are measured in real time during incorporation.
Zero-mode waveguide
Pacific Biosciences/Single molecule polymerase active site monitoring
Advantages: read length up to 10,000 bases (average > 1,000 b); very short running times (~2 h); low running cost: according to the company, a human genome equivalent costs a few hundred dollars
Disadvantages: high error rate (> 10-15% for single-pass sequencing; repeated sequencing lowers the error rate to 2-3%); significant investment in hardware (> €600k)
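Why repeated sequencing of the same circular molecule lowers the error rate can be sketched with a simple majority-vote calculation. This is a deliberately crude model and not the vendor's consensus algorithm: it assumes that errors in different passes are independent, that every pass has the same per-base error rate `p`, and (pessimistically) that a position is called wrong whenever more than half of the passes are wrong. The function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(p, n_passes):
    """Probability that more than half of n independent passes are wrong at one position."""
    k_min = n_passes // 2 + 1
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


# Single-pass error rate of 15%, the upper value quoted on the slide.
for n in (1, 3, 5, 7):
    print(n, round(majority_error(0.15, n), 4))
```

Under these assumptions the per-position error falls from 15% for a single pass to about 6% for three passes and below 3% for five passes. Real reads contain many insertions and deletions, so actual consensus calling is more involved than a per-position vote.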
Helicos™ Sequencing (16 November 2012: Chapter 11 bankruptcy protection): sequencing by synthesis with single molecules as templates; modified chain-terminating method; no amplification step (single-molecule templates)
Benchtop NGS Sequencing: Illumina MiSeq™; Roche 454 Junior™; Ion Torrent PGM™
Costs and Performance of benchtop NGS*
Table 1: Price comparison of benchtop instruments and sequencing runs
Platform | List price | Approximate cost per run | Minimum throughput (read length) | Run time | Cost/Mb | Mb/h
454 GS Junior | $108,000 | $1,100 | 35 Mb (400 bases) | 8 h | $31 | 4.4
Ion Torrent PGM (314 chip) | $80,490 a,b | $225 c | 10 Mb (100 bases) | 3 h | $22.5 | 3.3
Ion Torrent PGM (316 chip) | | $425 | 100 Mb d (100 bases) | 3 h | $4.25 | 33.3
Ion Torrent PGM (318 chip) | | $625 | 1,000 Mb (100 bases) | 3 h | $0.63 | 333.3
MiSeq | $125,000 | $750 | 1,500 Mb (2 x 150 bases) | 27 h | $0.5 | 55.5
* From: Performance comparison of benchtop high-throughput sequencing platforms. Nicholas J Loman, Raju V Misra, Timothy J Dallman, Chrystala Constantinidou, Saheer Gharbia, John Wain & Mark J Pallen. Nature Biotechnology 30, 434-439 (2012) doi:10.1038/nbt.2198
Updating benchtop sequencing performance comparison. Sebastian Jünemann, Fritz Joachim Sedlazeck, Karola Prior, Andreas Albersmeier, Uwe John, Jörn Kalinowski, Alexander Mellmann, Alexander Goesmann, Arndt von Haeseler, Jens Stoye & Dag Harmsen. Nature Biotechnology 31, 294-296 (2013) doi:10.1038/nbt.2522. Published online 05 April 2013
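The Cost/Mb and Mb/h columns of Table 1 follow directly from the cost per run, the minimum throughput and the run time. The short sketch below recomputes them from the values on the slide. It assumes that the 3 h run time applies to all three PGM chips, and the function name `derived_metrics` is chosen only for illustration.

```python
# (platform, cost per run in USD, minimum throughput in Mb, run time in h), as on the slide
RUNS = [
    ("454 GS Junior", 1100, 35, 8),
    ("Ion Torrent PGM (314 chip)", 225, 10, 3),
    ("Ion Torrent PGM (316 chip)", 425, 100, 3),
    ("Ion Torrent PGM (318 chip)", 625, 1000, 3),
    ("MiSeq", 750, 1500, 27),
]


def derived_metrics(cost_usd, throughput_mb, run_time_h):
    """Return (cost per Mb in USD, throughput in Mb per hour)."""
    return cost_usd / throughput_mb, throughput_mb / run_time_h


for platform, cost, mb, hours in RUNS:
    cost_per_mb, mb_per_hour = derived_metrics(cost, mb, hours)
    print(f"{platform}: ${cost_per_mb:.2f}/Mb, {mb_per_hour:.1f} Mb/h")
```

The same two ratios can be used to compare newer instruments or chips, provided that throughput and run time are measured in the same way.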
http://www.genome.gov/images/content/cost_per_megabase.jpg
Generation 5: Future Technology: Nanopore Sequencing
From: https://www.nanoporetech.com/home In Oxford Nanopore's 'strand sequencing' method, a single-stranded DNA polymer is passed through a protein nanopore, and individual DNA bases on the strand are identified in sequence as the DNA molecule passes through. When a DNA polymer passes through a nanopore, a number of individual DNA bases occupy the aperture of the nanopore at any time. A successful method of DNA sequencing must identify the sequence of individual bases within this strand. Oxford Nanopore has engineered bespoke nanopores, and data analysis algorithms are used to translate the characteristic electronic signals into DNA sequence data. A method of controlled translocation of the strand through the nanopore is needed. Oxford Nanopore uses proprietary, highly processive enzymes to ratchet DNA through the nanopore. Oxford Nanopore has not disclosed the proprietary nanopore and enzyme machinery used in its GridION and MinION system. Oxford Nanopore has not signed a commercialisation agreement for strand sequencing and intends to commercialise strand sequencing products independently.
Applications of NGS: genome resequencing and SNP detection; genome de novo sequencing; transcriptome sequencing; ChIP sequencing (e.g. histone methylation); bisulfite sequencing for methylation analysis; exome enrichment sequencing; small RNA sequencing; genotyping by sequencing (GBS; RAD), i.e. reduced-complexity sequencing; ribosome profiling
No summary table! Equipment and technologies are too diverse, so my advice is: discuss your project with people who have experience with one or the other platform. NGS is a fantastic new technology that provides completely new possibilities! Projects that were unthinkable a few years ago are now easily feasible! Thank you very much for your attention!
Steffen Rapp, NGS Unit manager; Benjamin Rieger, bioinformatician; Nicole Naumann, technical assistant; Rudolf Baader, technical assistant