NGS: the basics
Human genome sequence. June 26th 2000: official announcement of the completion of the draft of the human genome sequence (truly finished in 2004), with Francis Collins (HGP) and Craig Venter (Celera). Costs: HGP: 3 billion $, 15 years; Celera: 200 million $, 2 years.
2004: two NIH Requests for Applications (RFAs). Current technologies are able to produce the sequence of a mammalian-sized genome of the desired data quality for $10 to $50 million; the goal of this initiative is to reduce costs by at least two orders of magnitude. It is anticipated that emerging technologies are sufficiently advanced that, with additional investment, it may be possible to achieve proof of principle or even early stage commercialization for genome-scale sequencing within five years. A parallel RFA solicits grant applications to develop technologies to meet the longer-term goal of achieving a four orders of magnitude cost reduction in about ten years, so that a mammalian-sized genome could be sequenced for approximately $1000.
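The two cost targets follow from the baseline and the orders of magnitude quoted in the RFA text. A minimal sketch of that arithmetic; the function name `reduced_cost` is illustrative and not part of the RFA:

```python
def reduced_cost(cost_usd, orders_of_magnitude):
    """Cost after a reduction by the given number of orders of magnitude."""
    return cost_usd / 10 ** orders_of_magnitude

# 2004 baseline from the RFA text: $10 to $50 million per mammalian-sized genome.
baseline_usd = (10e6, 50e6)

for orders in (2, 4):
    low, high = (reduced_cost(cost, orders) for cost in baseline_usd)
    print(f"{orders} orders of magnitude: ${low:,.0f} to ${high:,.0f}")
```

Four orders of magnitude applied to the low end of the baseline gives the $1000 genome.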
Increased efficiency, decreased costs: exponential cost decrease
Efficient integration of each individual step to slash costs
Massively parallel sequencing (next-generation sequencing). Key: direct sequencing of DNA without the bacterial cloning step. From colonies to polonies.
Roche 454 GS FLX
454: Library preparation
Clonal amplification of single molecules: emulsion PCR
454: Sequencing by pyrosequencing
GS FLX throughput (2011-2013): up to a million sequences, 700 bp long (up to 1 kb), in 23 hours
454: Game over! Jonathan Rothberg: "In the sequencing business, one needs to innovate or die. At 454 we were always first; first non-bacterial cloning, first commercialization, first next-gen individual human genome, Neanderthal, mammoth, deep sequencing, cancer sequencing, drug response studies, HIV, metagenomics, first drug target by whole genome sequencing, and many more firsts. Always innovating, always first."
454: Game over! In 2007, Roche acquired 454 for $155 million in cash and stock. Rothberg said that when Roche bought 454, the company was "two years ahead of everyone else," but after the purchase, "they lost that lead, no more firsts, no more innovation."
Rothberg strikes back! Rothberg: "When I woke up and found Roche had bought 454 without me, I had to restart. It cost three years. We had to invent a new scalable way to sequence, ion semiconductor sequencing, and establish a clear path towards both truly low-cost and mobile sequencing." He went on to found Ion Torrent, which was bought by Life Technologies in 2010 for $375 million in cash and stock, and another $350 million based on milestones.
Ion Torrent
Simple Natural Chemistry
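Fast Direct Detection: dNTP incorporation releases H+, which changes the pH, then the charge (Q) on the sensing layer, then the voltage (V) at the sensor plate. (Figure labels: sensing layer, sensor plate, bulk, drain, source, silicon substrate, to column receiver.) Nucleotides flow sequentially over the Ion semiconductor chip; direct detection of natural DNA extension; a few seconds per incorporation. Rothberg J.M. et al., Nature, doi:10.1038/nature10242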
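Sequential nucleotide flows can be illustrated with a small, idealized simulation: one nucleotide species is flowed at a time, and the signal at each flow is the number of bases incorporated, so a homopolymer gives a proportionally larger signal. This is a sketch under assumptions, not the vendor's base caller: the flow order `TACG`, the noise-free integer signals, and the function names are all illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def simulate_flows(template, flow_order="TACG", n_flows=12):
    """Return the number of bases incorporated at each nucleotide flow.

    `template` is written in the order the polymerase copies it; the
    synthesized strand is its base-by-base complement.
    """
    to_synthesize = [COMPLEMENT[base] for base in template]
    position = 0
    signals = []
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        # A homopolymer run is incorporated entirely within a single flow.
        while position < len(to_synthesize) and to_synthesize[position] == nucleotide:
            incorporated += 1
            position += 1
        signals.append(incorporated)
    return signals

def call_bases(signals, flow_order="TACG"):
    """Turn per-flow incorporation counts back into the synthesized sequence."""
    return "".join(
        flow_order[flow % len(flow_order)] * count
        for flow, count in enumerate(signals)
    )

signals = simulate_flows("AGGTCA")
print(signals)
print(call_bases(signals))
```

With real, noisy signals the count for long homopolymers has to be estimated from signal intensity, which is why flow-based chemistries tend to make homopolymer-length errors.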
Scalable Semiconductor Technology (figure labels: wafer, semiconductor manufacturing, chip, semiconductor packaging, chip cross section, semiconductor design)
The Chip is the Machine: scalability, simplicity, speed
Two machines, 5 chips. PGM: 314, 316, 318. Proton: P1, P2?
Ion Torrent specs. PGM 314 chip: 0.4 to 0.5 million reads, 30 to 100 Mb data; 316 chip: 2 to 3 million reads, 300 to 1000 Mb data; 318 chip: 4 to 5.5 million reads, 0.6 to 2 Gb data; 200 bp or 400 bp reads, 2 to 7 hours. Proton P1: 60 to 80 million reads, up to 10 Gb data, 200 bp reads, 2 to 4 hours. Proton P2: the Arlésienne (always announced, never seen)!
Ion Torrent: the barcode is read just before the insert. (Figure labels: sequencing primer, barcoded adapter, barcode, insert, biotin adapter.)
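Because the barcode is read immediately before the insert, assigning a read to a sample amounts to matching and trimming a prefix. A minimal sketch, assuming exact matching and barcodes of equal length; the barcode sequences and sample names are hypothetical, not real Ion Torrent barcodes:

```python
def demultiplex(read, barcodes):
    """Return (sample, insert) for a read that starts with a known barcode.

    Returns (None, read) when no barcode matches. Exact matching only:
    sequencing errors inside the barcode are not tolerated in this sketch.
    """
    for barcode, sample in barcodes.items():
        if read.startswith(barcode):
            return sample, read[len(barcode):]
    return None, read

# Hypothetical barcodes, for illustration only.
barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}

print(demultiplex("ACGTACGGGTTT", barcodes))
print(demultiplex("TTTTTTGGGTTT", barcodes))
```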
Ion Torrent paired-end sequencing
Illumina (formerly Solexa): Genome Analyzer, HiSeq, MiSeq
Solexa amplification step: amplification and sequencing on a solid support
Sequencing by synthesis. CRT: cyclic reversible termination
Sequencing by synthesis Amplification and sequencing on a solid support
Illumina: primary data analysis. 120 tiles per lane, 480 images per lane and cycle. 36 nt run = 138,240 images = 945 Gb; 2 × 50 nt run = 384,000 images = 1.3 Tb; 2 × 100 nt run = 768,000 images = 5.3 Tb.
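The image counts can be reproduced with a short calculation. Two values are assumptions that the slide does not state but that match its numbers: four images per tile and cycle (one per base) and an eight-lane flow cell.

```python
TILES_PER_LANE = 120   # from the slide
IMAGES_PER_TILE = 4    # assumption: one image per base (A, C, G, T)
LANES = 8              # assumption: eight-lane flow cell

def images_per_lane_and_cycle():
    return TILES_PER_LANE * IMAGES_PER_TILE

def images_per_run(cycles):
    """Total number of images for a run with the given number of cycles."""
    return images_per_lane_and_cycle() * LANES * cycles

for label, cycles in [("36 nt", 36), ("2 x 50 nt", 100), ("2 x 100 nt", 200)]:
    print(f"{label} run: {images_per_run(cycles):,} images")
```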
Image analysis (Illumina). Image registration: make image coordinates congruent by registering the images (A, C, G, T) between cycles. Cluster identification: a template of cluster positions is created from the first five cycles.
Cluster identification. If neighboring clusters have identical sequences during the first 5 cycles, they are called as one cluster; if they have different sequences during the first 5 cycles, they are called as two clusters. As a consequence, barcodes should not be placed in the first bases of the read; otherwise the probability of fusing two different clusters would be too high.
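The merging rule and its consequence for barcodes can be shown with a toy model. This is only an illustration of the rule as stated on the slide, not Illumina's image analysis: the coordinates, the distance threshold `max_distance`, and the sequences are hypothetical.

```python
from math import dist

def count_clusters(spots, max_distance=1.5, n_cycles=5):
    """Count clusters after merging neighbors with identical first bases.

    `spots` is a list of (x, y, sequence). Two spots are merged when they
    lie within `max_distance` of each other and share the same first
    `n_cycles` bases.
    """
    parent = list(range(len(spots)))

    def find(i):
        # Follow parent links to the representative of the merged group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi, seq_i) in enumerate(spots):
        for j in range(i + 1, len(spots)):
            xj, yj, seq_j = spots[j]
            close = dist((xi, yi), (xj, yj)) <= max_distance
            if close and seq_i[:n_cycles] == seq_j[:n_cycles]:
                parent[find(j)] = find(i)

    return len({find(i) for i in range(len(spots))})

# Same barcode (ACGTA) read first, different inserts: wrongly fused.
barcode_first = [(0, 0, "ACGTAGGCT"), (1, 0, "ACGTATTGA")]
# Inserts read first: the two neighbors stay separate.
insert_first = [(0, 0, "GGCTAACGTA"), (1, 0, "TTGACACGTA")]

print(count_clusters(barcode_first))
print(count_clusters(insert_first))
```

With only a handful of barcodes in a library, many neighboring clusters share their first bases when the barcode is read first, which is the fusion risk described above.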
Illumina paired-end sequencing
Barcoding with a single index (Illumina)
Barcoding with dual indexing (Illumina)
Illumina-Solexa throughput (end 2013): up to 3 billion sequences, up to 2 × 100 bp long, in 11 days (HiSeq 2000); or 0.6 billion, 2 × 150 bp, in 40 hours (HiSeq 2500); or 12 to 55 million, 2 × 250 bp, in 39 hours (MiSeq V2); or 22 to 25 million, 2 × 300 bp, in 65 hours (MiSeq V3).
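These figures can be converted into yield per run and per day. A rough sketch, assuming that each "sequence" is a cluster read from both ends (two reads per cluster) and taking the upper read counts; the function name is illustrative:

```python
def run_yield_gb(clusters, read_length, reads_per_cluster=2):
    """Approximate yield of one run in gigabases (Gb)."""
    return clusters * reads_per_cluster * read_length / 1e9

# (clusters, read length in bp, run time in hours), from the slide.
runs = {
    "HiSeq 2000": (3e9, 100, 11 * 24),
    "HiSeq 2500": (0.6e9, 150, 40),
    "MiSeq V3": (25e6, 300, 65),
}

for name, (clusters, read_length, hours) in runs.items():
    gb = run_yield_gb(clusters, read_length)
    print(f"{name}: {gb:.0f} Gb per run, {gb / hours * 24:.1f} Gb per day")
```

Under these assumptions the HiSeq 2500 gives less data per run than the HiSeq 2000 but more per day.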
SOLiD sequencing (Applied Biosystems)
SOLiD (Applied Biosystems): library
SOLiD (Applied Biosystems): library, emulsion PCR
SOLiD (Applied Biosystems): library
SOLiD (Applied Biosystems): sequencing
SOLiD (Applied Biosystems): sequencing
SOLiD throughput (early 2009): up to 0.2 billion sequences, up to 2 × 60 bp long, in 7 days
Complete Genomics. Step 1: fragment tagging. A human genome for $5,000.
Complete Genomics. Step 2: clonal DNA amplification. A human genome for $5,000.
Complete Genomics. Step 3: distribution over a patterned substrate, 1 billion spots per slide. A human genome for $5,000.
Complete Genomics. Step 4: sequencing by ligation. A human genome for $5,000.
Complete Genomics. Step 5: assembly. A human genome for $5,000.
Complete Genomics. Cost slashing: small volumes, "simple" equipment. A human genome for $5,000.
Third-generation sequencing: single-molecule sequencing, no PCR amplification
Helicos BioSciences: single-molecule fluorescent sequencing on a flow cell
Helicos. Cyclic reversible termination: a single DNA molecule is extended one base at a time, the blocking fluorescent label is removed and washed away, and the cycle is repeated
Helicos: improved cyclic reversible termination and single DNA molecule detection
Helicos throughput: up to 1 billion sequences, on average 32 bp long, in 7 days
Pacific Biosciences: long single-molecule sequencing
Pacific Biosciences: the label is on the phosphate and is detected transiently, while the nucleotide is held by a DNA polymerase tethered at the bottom of a nanoscale well (zero-mode waveguide, ZMW)
Pacific Biosciences: thousands of nanoscale waveguides concentrate light. The ZMW nanostructure provides excitation confinement in the zeptoliter (10^-21 liter) regime
Pacific Biosciences: label on the phosphate, not on the base
Pacific Biosciences: real-time detection of the incorporation of each base on thousands of molecules
Pacific Biosciences throughput. Each pore: 10 bases/sec. Claim: in 2013, a high-quality human genome in 15 minutes
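A back-of-the-envelope calculation shows what the 15-minute claim implies at 10 bases/sec per detection site. The genome size of about 3 billion bases and the coverage values are assumptions for illustration, and every site is taken to be productive for the whole run:

```python
from math import ceil

def sites_needed(genome_size, coverage, bases_per_second, seconds):
    """Number of detection sites that must run in parallel."""
    bases_per_site = bases_per_second * seconds
    return ceil(genome_size * coverage / bases_per_site)

GENOME_SIZE = 3_000_000_000   # assumption: about 3 billion bases
RUN_SECONDS = 15 * 60         # the 15 minutes of the claim

for coverage in (1, 30):      # hypothetical coverage values
    n = sites_needed(GENOME_SIZE, coverage, 10, RUN_SECONDS)
    print(f"{coverage}x coverage: {n:,} sites in parallel")
```

Even single-fold coverage needs hundreds of thousands of sites in parallel, far more than the thousands mentioned above.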
Third- or fourth-generation sequencing: single molecules, no fluorophore. Oxford Nanopore Technologies
Oxford Nanopore (figure labels: exonuclease, pore across lipid bilayer, nanopore array chip). Bases passing through the pore generate a change in the electrical conductance across the membrane, which can be measured electrically. A, T, G, C and MeC can be distinguished.
Oxford Nanopore
There are several more possibilities in the pipeline: BioNanomatrix, VisiGen, Dover Systems, Intelligent Bio-Systems, ZS Genetics, Reveo, LightSpeed Genomics