Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop


Growth in Next-Gen Sequencing Capacity
[Figure: sequencing output (bp) by platform: ABI 3730xl, 454 GS20, Solexa 1G, GAII, SOLiD, GAIIx, HiSeq. Adapted from Mardis, 2011, Nature. Slide courtesy of Rich Cronn.]

NGS Library Types
Fragment Library
Paired-end Library: Separation of bp
Mate-paired Library: Original separation of 2-5 kb
Ansorge, W New Biotechnology 25:

454 (2005)
Template Type: Clonally amplified by emulsion PCR
Sequencing Method: Sequencing by synthesis using single nucleotide addition
Imaging Method: Bioluminescence with charge coupled device (CCD) camera

Run times
GS Jr. Titanium: 10 hrs
314 chip: 2.5 hrs
FLX Titanium: 10 hrs
FLX+: 20 hrs

Illumina (Solexa) (2007)
Clonally Amplified Templates
Template Type: Clonally amplified by solid phase amplification
Sequencing Method: Sequencing by synthesis with cyclic reversible termination
Imaging Method: Four color imaging of single events using fluorescence

Solid-phase Amplification
Metzker, M Nature Reviews Genetics 11:

Cyclic reversible termination (CRT)
Metzker, M Nature Reviews Genetics 11:

Run times
MiSeq: 26 hrs
GAIIx: 14 days
HiSeq 1000: days
HiSeq: days

SOLiD (2008)
Template Type: Clonally amplified by emulsion PCR
Sequencing Method: Sequencing by ligation
Imaging Method: Four color imaging of single events by CCD camera
Run time, SOLiD 5500xl: 8 days

Ion Torrent (2010)
Semiconductor Sequencing
Run times: 314 chip 2.5 hrs; 316 chip hrs; 318 chip hrs

Ion Torrent Proton (2012)

Pacific Biosciences (2010)
Real Time Sequencing by Synthesis

Run Time and Reads for Current NGS Platforms
Platform | Run time | Remaining columns
3730xl (capillary) | 2 hrs |
PacBio RS | 2 hrs |
GS Jr. Titanium | 10 hrs |
314 chip | 2.5 hrs |
FLX Titanium | 10 hrs |
FLX+ | 20 hrs |
316 chip | 3 hrs |
MiSeq | 26 hrs |
318 chip | 4.5 hrs |
GAIIx | 14 days |
SOLiD 5500xl | 8 days | >1,410 d
HiSeq (two entries) | days |
Table note: circular consensus

Oxford Nanopore (2012)
Strand Sequencing: 64 triplet signals
Exonuclease Sequencing: 4

Oxford Nanopore (2012)
GridION
MinION

NGS Error Rates
Platform | Primary Errors | Single-pass Error Rate (%) | Final Error Rate (%)
3730xl (capillary) | Substitution | |
454 | Indel | 1 | 1
Illumina | Substitution | ~0.1 (85% of reads) | ~0.1 (85% of reads)
SOLiD | A-T bias | ~5 | 0.1
Ion Torrent | Indel | ~1 | ~1
PacBio RS | CG deletions | ~15 | 15
Oxford Nanopore | Deletions | 4 | 4

Run Time and Reads for Current and Announced NGS Platforms
Platform | Run time | Remaining columns
ABI 3730xl (capillary) | 2 hrs |
PacBio RS | 2 hrs |
GS Jr. Titanium | 10 hrs |
Oxford Nanopore MinION (2012) | 6 hrs or less | [0.1] [9,000] [1000]
314 chip | 2.5 hrs |
FLX Titanium | 10 hrs |
FLX+ | 20 hrs |
316 chip | 3 hrs |
MiSeq | 26 hrs |
318 chip | 4.5 hrs |
Oxford Nanopore GridION 2000 (2012) | [6 hrs or less] | [4] [10,000] [40,000]
Oxford Nanopore GridION 8000 (2013) | [6 hrs or less] | [10] [10,000] [100,000]
MiSeq upgrade (2012) | [36 hrs] |
Proton I (2012) | 4 hrs | [50] [200] [40,000]
Proton II (2013) | 4 hrs | [250] [400] [100,000]
GAIIx | 14 days |
HiSeq 2500 mini-cell (2012) | 42 hrs |
SOLiD 5500xl | 8 days |
HiSeq (two entries) | days |
Grey = based on company sources. Brackets = speculation.

How much will it cost?
Platform | Reagent Cost/run a | Reagent Cost/Mb | Minimum Unit Cost (% run)
ABI 3730xl (capillary) | $144 | $2308 | $6 (1%)
PacBio RS | $ c | $7-38 | $500 (100%)
454 GS Jr. Titanium | $1100 | $22 | $1500 (100%)
454 FLX Titanium | $6,200 | $12 | $2000 (12%)
454 FLX+ d | $6,200 | $7 | $2000 (12%)
314 chip | $350 | $7 | ~$750 (100%)
316 chip | $550 | $2 | ~$1000 (100%)
Oxford Nanopore MinION (2012) | $900 | $1 | ~$1100 (10%)
MiSeq | $1160 | $1 | ~$1400 (100%)
318 chip | $750 | $1 | ~$1200 (100%)
GAIIx | $17,575 | $0.19 | $3000 (14%)
HiScanSQ | $12,750 | $0.09 | $3000 (14%)
Proton I (2012) | $1000 | $0.09 | ? (100%)
SOLiD 5500xl | $10,503 | <$0.07 | $2000 (12%)
HiSeq 1000 | $10,220 | $0.04 | $3000 (12%)
HiSeq 2000 | $23,470 d | $0.04 | $3000 (6%)
HiSeq 2500 or MiSeq upgrades (2012) | ? | ? | ?
Oxford Nanopore GridION 2000 (2012) | varies | $ | ? ( 1%)
Oxford Nanopore GridION 8000 (2013) | varies | $ | ? ( 1%)
Proton II (2013) | [$1000] | [$0.01] | ? (100%)
Includes all stages of sample prep for a single sample (i.e., library prep through sequencing; capillary = sequencing only).

NGS Technology Summary
Platform | Year | Sequencing Method | Amplification | Detection | Features
454 | 2005 | Pyrosequencing | Emulsion PCR | Light | First NGS
Illumina (Solexa) | 2007 | Synthesis | Bridge PCR | Light | 90% of Market
SOLiD | 2008 | Ligation | Emulsion PCR | Light | Lowest Error Rate
Ion Torrent | 2010 | Synthesis | Emulsion PCR | Hydrogen Ion | Semiconductor Chip
Pacific Biosciences | 2010 | Synthesis | None = Single Molecule | Light | Anchored Polymerases
Oxford Nanopore | 2012 | Nanopore | None = Single Molecule | Electrical Conductivity | Run Until Sequencing

Modified from Travis C. Glenn Field guide to next-generation DNA sequencers. Molecular Ecology Resources 11:
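The two reagent columns in the cost table are related by simple division: cost per run divided by cost per Mb gives the yield per run that the table implies. The sketch below works that out for a few rows whose values survive intact in the transcription. The function names and the dictionary are illustrative, not part of the workshop material, and the per-Mb figures in the table are rounded, so the implied yields are approximate. The minimum unit cost column is ignored here.

```python
# Reagent cost per run and per Mb (USD), copied from the reagent cost table.
# Only rows with both values intact in the transcription are included.
REAGENT_COSTS = {
    "454 GS Jr. Titanium": (1100.0, 22.0),
    "454 FLX Titanium": (6200.0, 12.0),
    "314 chip": (350.0, 7.0),
    "MiSeq": (1160.0, 1.0),
    "GAIIx": (17575.0, 0.19),
    "HiSeq 2000": (23470.0, 0.04),
}


def implied_yield_mb(cost_per_run: float, cost_per_mb: float) -> float:
    """Yield per run in Mb implied by the two reagent cost columns."""
    if cost_per_mb <= 0:
        raise ValueError("cost_per_mb must be positive")
    return cost_per_run / cost_per_mb


def reagent_cost_for(target_mb: float, cost_per_mb: float) -> float:
    """Reagent cost in USD for a target amount of sequence at a given cost per Mb."""
    return target_mb * cost_per_mb


if __name__ == "__main__":
    for platform, (per_run, per_mb) in REAGENT_COSTS.items():
        yield_mb = implied_yield_mb(per_run, per_mb)
        print(f"{platform}: about {yield_mb:,.0f} Mb per run")
```

Because small projects cannot buy a fraction of a run on most platforms, the minimum unit cost column is often the better guide for small studies than the per-Mb figure.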

Purchase, additional instrument and service agreement costs.
Platform | Purchase Cost | Additional Instruments | Service Contract
ABI 3730xl (capillary) | $376,000 | - | $19,
GS Jr. Titanium | $108,000 | $16,000 | $12,
FLX to FLX+ upgrade | $29, | |
FLX+ | $450,000 | $30,000 | $50,000
PacBio RS | $695,000 | - | $85,000
Ion Torrent (314/316/318 chips) | $49,000 | $16,000* | $9,900*
Proton (2012) | $149,000 | $16,000* | $32,000*
SOLiD 5500xl | $251,000 | $54,000 | $44,400
MiSeq | $125,000 | - | $12,500
MiSeq upgrade (2012) | $0 | - | -
HiScanSQ | $405,000 | $55,000 | $41,500
GAIIx | $250,000 | $100,000 | $44,500
HiSeq 1000 | $560,000 | $55,000 | $62,000
HiSeq 1000 to 2000 upgrade | $175, | |
HiSeq 2000 | $690,000 | $55,000 | $75,900
HiSeq 2000 to 2500 upgrade (2012) | $50, | |
HiSeq 2500 (2012) | $690,000 | $55,000 | $75,900
Oxford Nanopore MinION (2012) | $0 | $0 | $0
Oxford Nanopore GridION 2000 (2012) | [$30,000]????
Oxford Nanopore GridION 8000 (2013) | [$30,000]????
*Includes optional OneTouch template preparation instrument.

Required Computational Resources
Platform | Computational Resources | Data File Sizes (GB)
3730xl (capillary) | $2,000 desktop |
GS Jr. Titanium | $5,000 desktop | <3 images, <1 sff
454 FLX Titanium | $5,000 desktop | 20 images, 4 sff
454 FLX+ | $5,000 desktop | 40 images, 8 sff
PacBio RS | $65,000 cluster | 20 pulsed, 2 Fastq
314 chip | $16,500 desktop server | 0.1 Fastq
316 chip | $16,500 desktop server | 0.6 Fastq
318 chip | $16,500 desktop server | [small]
Proton I (2012) | $75,000 cluster | [big]
Proton II (2013) | $75,000 cluster | [big]
SOLiD 5500xl | $35,000 cluster | 148
MiSeq | $5,000 desktop or BaseSpace cloud | 1
HiScanSQ | $222,000 cluster (or DIY for less) | 50
GAIIx | $222,000 cluster (or DIY for less) | 600
HiSeq 1000 | $222,000 cluster (or DIY for less) | 300
HiSeq 2000 | $222,000 cluster (or DIY for less) | 600
HiSeq 2500 (2012) | $222,000 cluster (or DIY for less) | [big]
Oxford Nanopore MinION (2012) | laptop | [small]
Oxford Nanopore GridION 2000 (2012) | ? | [small to big]
Oxford Nanopore GridION 8000 (2013) | ? | [small to big]
Desktops assume higher-end models with multiple processors, 8 GB RAM and 1 TB HD.
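To compare platforms on what it takes to get started, the purchase, additional instrument, service contract and computational resource figures above can be added up. The sketch below does this for rows whose values are complete in the transcription. It assumes the four figures are simply additive and that one service contract is bought at the outset; the slides do not state the contract period, and a "-" entry is treated as zero. All names in the code are illustrative.

```python
# (purchase, additional instruments, service contract) in USD, copied from the
# purchase cost table. A "-" entry in the table is treated as 0 (assumption).
INSTRUMENT_COSTS = {
    "SOLiD 5500xl": (251_000, 54_000, 44_400),
    "MiSeq": (125_000, 0, 12_500),
    "GAIIx": (250_000, 100_000, 44_500),
    "HiSeq 1000": (560_000, 55_000, 62_000),
    "HiSeq 2000": (690_000, 55_000, 75_900),
}

# Computational resource cost in USD, copied from the computational resources table.
COMPUTE_COSTS = {
    "SOLiD 5500xl": 35_000,
    "MiSeq": 5_000,
    "GAIIx": 222_000,
    "HiSeq 1000": 222_000,
    "HiSeq 2000": 222_000,
}


def startup_cost(platform: str, include_compute: bool = True) -> int:
    """Sum purchase, additional instruments, one service contract, and
    optionally the computational resources for a platform."""
    purchase, additional, service = INSTRUMENT_COSTS[platform]
    total = purchase + additional + service
    if include_compute:
        total += COMPUTE_COSTS[platform]
    return total


if __name__ == "__main__":
    # List platforms from cheapest to most expensive to set up.
    for name in sorted(INSTRUMENT_COSTS, key=startup_cost):
        print(f"{name}: ${startup_cost(name):,}")
```

Reagent costs per run are not included, so this is a setup estimate only.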
Primary Advantages and Disadvantages: Current Platforms

3730xl (capillary)
Advantages: Low cost for very small studies.
Disadvantages: Very high cost for large amounts of data.

454 GS Jr. Titanium
Advantages: Long read length. Low capital cost. Low cost per experiment.
Disadvantages: High cost per Mb.

454 FLX+
Advantages: Double the maximum read length of Titanium.
Disadvantages: High capital cost. High cost per Mb. Reagent issues. Upgrade issues.

PacBio
Advantages: Single molecule real-time sequencing. Longest available read length. Short instrument run time. Low cost per sample.
Disadvantages: High error rates. Low total number of reads per run. High cost per Mb. High capital cost. Many methods still in development. Weak company performance.

Ion Torrent 314/316/318
Advantages: Low cost per sample for small studies. Fast runs. Semiconductor chips. Few moving parts.
Disadvantages: Higher error rate than Illumina. Higher cost per Mb. Long sample prep.

SOLiD 5500xl
Advantages: Each lane of Flow-Chip can be run independently. High accuracy. Ability to rescue failed sequencing cycles. 96 validated barcodes per lane. High throughput.
Disadvantages: Not likely to be sold very long after the Ion Torrent Proton comes to market. Relatively short reads. More gaps in assemblies than Illumina data. Less even data distribution than Illumina. High capital cost.

MiSeq
Advantages: Low cost instrument and runs. Low cost/Mb for a small platform. Fastest run times and longest read lengths.
Disadvantages: Relatively few reads and higher cost/Mb compared to other platforms.

HiSeq
Advantages: One or two independent flow cells. Most reads, Gb per day and Gb per run. Lowest cost per Mb.
Disadvantages: High capital cost. High computation needs.

Primary Advantages and Disadvantages: New Platforms

Proton
Advantages: Moderately low-cost instrument for high throughput applications. Cost/Mb approaching HiSeq.
Disadvantages: Error rate likely higher than Illumina. Higher cost/Mb than HiSeq.

MiSeq Upgrade
Advantages: Same as MiSeq, but 3X more reads and 250X250 paired ends. Free upgrade.
Disadvantages: Reagent costs not announced yet, but likely to be higher than current MiSeq.

HiSeq 2500
Advantages: Same as HiSeq 2000, but can also run two 2-lane mini flow cells to achieve much faster run times and longer read lengths.
Disadvantages: Mini flow cell will have a higher cost per read than standard flow cell. Can't run mini and standard flow cells together.

Oxford Nanopore MinION
Advantages: No instrument. USB powered. No sample processing required. Could be used in the field.
Disadvantages: No data publicly available. High cost per Mb relative to other Nanopore sequencers.

Oxford Nanopore GridION
Advantages: Extremely long reads are feasible. Low-cost instrument (node). Error rate doesn't increase along the length of the read. Real time analysis allows run until sequencing.
Disadvantages: No data publicly available. Announced 4% error rates. Single use cartridges may require serial sequencing for efficiency.

Utility (according to Travis Glenn Univ. Georgia) of currently available DNA sequencing platforms for de novo assembly
Application: de novo assemblies

454 GS Jr.
BACs, plastids, & microbial: B (good but expensive)
Transcriptome: C (need multiple runs, expensive)
Plant & animal genome: D (cost prohibitive)

454 FLX+
BACs, plastids, & microbial: A (good, need to multiplex to be economical)
Transcriptome: A/B (good but expensive, not best for short RNAs)
Plant & animal genome: B/C (good as part of a mixed platform strategy, expensive to use alone)

MiSeq
BACs, plastids, & microbial: B (good, assembly more challenging than 454, longer reads may make it the best)
Transcriptome: B/A (may need multiple runs, assembly more challenging than 454)
Plant & animal genome: C (expensive, use to validate libraries for HiSeq)

HiSeq
BACs, plastids, & microbial: B/C (more data than needed unless highly indexed, assembly more challenging than 454)
Transcriptome: A/B (good, assembly more challenging than 454 but much more data available for analyses)
Plant & animal genome: A (primary data type in many current projects, requires mate-pair libraries)

[platform label missing]
BACs, plastids, & microbial: C (reads are shorter than Illumina & as expensive as 454)
Transcriptome: C (reads are shorter than Illumina & as expensive as 454)
Plant & animal genome: D (cost prohibitive, reads shorter than alternatives)

[platform label missing]
BACs, plastids, & microbial: B (good, data more challenging to assemble than 454 or Illumina)
Transcriptome: C (high cost, data more challenging to assemble than 454 or Illumina)
Plant & animal genome: B/C (good, data more challenging to assemble than 454 or Illumina)

Utility grades combine data characteristics (amount, quality, length), cost of data, and ease of assembling the data into the final desired product.

Utility (according to Travis Glenn Univ. Georgia) of currently available DNA sequencing platforms for resequencing
Application: Resequencing

454 GS Jr.
Targeted loci: B (good but expensive, need to limit loci)
Transcript counting: D (cost prohibitive)
Genome resequencing: D (cost prohibitive for large)

454 FLX+
Targeted loci: B (good but expensive, should limit loci)
Transcript counting: D (cost prohibitive)
Genome resequencing: C/D (cost prohibitive for large)

MiSeq
Targeted loci: A/B (good, fewer and higher cost reads than HiSeq)
Transcript counting: B (more expensive than HiSeq or SOLiD)
Genome resequencing: C (expensive for large)

HiSeq
Targeted loci: A (primary data type in many current projects, best for many loci)
Transcript counting: A (primary data type in many current projects)
Genome resequencing: A (primary data type in many current projects)

[platform label missing]
Targeted loci: C (OK but expensive, need to limit loci)
Transcript counting: D (cost prohibitive)
Genome resequencing: D (cost prohibitive)

[platform label missing]
Targeted loci: B (good, slightly less data per run than MiSeq)
Transcript counting: B/C (more expensive than HiSeq or SOLiD, new informatics pipelines needed, new error profile)
Genome resequencing: C (expensive for large)

Utility grades combine data characteristics (amount, quality, length), cost of data, and ease of assembling the data into the final desired product.