THE SEQUENCING TECHNOLOGY (R)EVOLUTION
TIM STAKENORG, IMEC
MB&C meeting, May 16, 2013
IMEC 2013
HISTORY OF SEQUENCING
384-322 BC
- Aristotle told his students that all inheritance comes from the father
1977 (2 independent methods published in PNAS)
- Maxam & Gilbert: chemical degradation method
- Sanger: ddNTP-mediated chain termination!!
1995 (Fleischmann et al., Science 269: 496)
- Haemophilus influenzae (first fully sequenced bacterial genome)
2001 (Science/Nature)
- First human genome (13 years, 300 million USD)
May 2005 (454 technology)
- 6 months, >30 million USD
HISTORY OF SEQUENCING
[Figure: transistor count (Moore's law) vs. sequenced kilobase pairs/day, on a log scale from 1E+00 to 1E+09, plotted against date of introduction (1970-2015). Processors shown range from the Intel 4004 to the Itanium 2, G80, RV770 and AMD K10. Sequencers shown include the manual slab gel, ABI373, ABI377, ABI3700, ABI 3730XL, Roche 454 Life Sciences, First Solid, 454 Titanium, ABI Solid3, Illumina HiSeq and Pacific Biosciences SMRT*.]
- 1st generation: slab gels, capillary electrophoresis
- 2nd generation: sequence by synthesis
- 3rd generation
Note: human genome = ~3 x 10^9 bases
HISTORY OF SEQUENCING
Still many challenges in post-processing of data
- Data handling
- Computational algorithms
THE FIRST GENERATION
FIRST GENERATION (SANGER)
Cycle sequencing (amplification) reaction
- PCR products of different length
- Last base is fluorescent (different color per base)
- Separation by size
Pros and Cons
- Extensive sample prep (-)
- High cost (-)
- Low throughput (-)
- Long read lengths (+)
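The read-out principle on this slide (fragments of every length, a labeled last base, separation by size) can be sketched in a few lines of Python. This is a toy model under loud assumptions: the function names are hypothetical, every possible fragment length is assumed to occur exactly once, and no signal noise is modeled.

```python
# Toy model of Sanger read-out; names and simplifications are illustrative only.

def sanger_fragments(synthesized):
    """Return one (length, labeled_last_base) pair per terminated fragment.

    Assumption: chain termination happens once at every position, so each
    prefix of the synthesized strand is present in the reaction mix.
    """
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_by_size(fragments):
    """Sort fragments by length (the separation step) and read the labels."""
    return "".join(base for _, base in sorted(fragments))


# Fragments arrive unordered; here they are simply reversed to show that
# the order in the mix does not matter once they are separated by size.
mixed = sanger_fragments("GATTACA")[::-1]
print(read_by_size(mixed))
```

Sorting by length stands in for electrophoresis: the shortest fragment reports the first base, the next shortest the second base, and so on.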
DRAFT GENOME
- 1990: Human Genome Project started
- 2001: first draft, over 10 years and $3 billion later
- 2003 (published 2004): finished human genome sequence
[Figure labels: February 2001, April 2011]
THE NUMBER OF GENES
- Human genome: ~3 Gbase (3,000,000 kbases)
- Average gene size: ~3 kbases, but sizes vary greatly (largest is dystrophin: 2.4 Mbases)
GENE SWEEP (Cold Spring Harbor Lab 2000-2003)
- Rules: a bet cost $1 in 2000, $5 in 2001 and $20 in 2002
- 165 bets
- Mean: 61,710
- Lowest: 25,947 (Lee Rowen)
- Highest: 153,478
THE HUMAN GENOME
- ~3 Gbase, 24 distinct chromosomes: 1-22, X, Y
- 21,500-24,000 genes
- Only 2% of the genome encodes genes
- About 46% of the genome is repetitive sequence
=> THERE IS A LOT OF GENOMIC DARK MATTER (or non-coding RNA)
THE HUMAN GENOME
THE HUMAN GENOME
Almost all (99.9%) nucleotide bases are exactly the same in all people. The 0.1% difference amounts to 1 difference per 1,000 base pairs.
Difference between individuals, by species:
- Humans (0.08-0.1%)
- Chimpanzees (0.12-0.17%)
- Drosophila simulans (2%)
- E. coli (5%)
- HIV-I (30%)
SNPs (a single base change found in more than 1% of humans)
- Harmless (e.g. change in phenotype)
- Harmful (e.g. diabetes, cancer, heart disease, Huntington's)
- Latent (e.g. susceptibility to lung cancer)
Photos from UN photo gallery www.un.org/av/photo
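As a rough worked example of the numbers on this slide, the snippet below multiplies the ~3 Gbase genome size by the 0.1% difference and applies the 1% frequency rule for SNPs. The function names are hypothetical and the calculation ignores insertions, deletions and structural variation.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gbase, as quoted in the slides
HUMAN_DIFFERENCE = 0.001        # 0.1 %, i.e. 1 difference per 1,000 bp


def expected_differences(genome_size_bp, difference_fraction):
    """Expected number of differing bases between two genomes."""
    return round(genome_size_bp * difference_fraction)


def is_snp(population_frequency, threshold=0.01):
    """Slide definition: a single base change seen in more than 1% of people."""
    return population_frequency > threshold


print(expected_differences(GENOME_SIZE_BP, HUMAN_DIFFERENCE))  # about 3 million
print(is_snp(0.05), is_snp(0.001))
```

Two unrelated people are therefore expected to differ at roughly three million positions, which is why variant interpretation is a data-handling problem as much as a sequencing problem.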
THE SECOND GENERATION (NEXT GEN)
SECOND GENERATION
Sequence by synthesis
- Step-wise base addition & read-out
- Washing steps between each step
Pros and Cons
- Extensive sample prep (-)
- Relatively costly reagents/run (-)
- Massively parallel sequencing (+)
- Relatively short fragments (-)
Examples: Roche 454 GS-FLX, Illumina HiSeq, ABI Solid, IonTorrent, etc.
2ND GENERATION: SAMPLE PREP
Extensive sample prep
- Library generation (generation of fragments with adapters)
- Clonal amplification
  - e.g. emPCR (e.g. Roche GS-FLX, ABI Solid, etc.)
  - e.g. bridge PCR (e.g. Solexa from Illumina)
FLUORESCENT READ-OUT
e.g. Illumina, ABI Solid (or Helicos on single molecule level)
BASE CALLING: NOISE FACTORS
- Phasing noise: leading / lagging
- Fading noise: exponential decay in fluorescent signal
- Cycle-dependent change in fluorophore cross-talk
Source: Erlich et al. Nature Methods 5: 679-682 (2008); http://www.cs.utoronto.ca/~brudno/csc2431w10/altacyclic_pres.pdf
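The first two noise factors can be illustrated with a small deterministic model of one cluster. This is a sketch, not the model from Erlich et al.: the function names, the lag/lead probabilities and the decay constant are all assumed values chosen for illustration, and cross-talk between fluorophores is left out.

```python
def cluster_signal(sequence, cycles, p_lag=0.01, p_lead=0.005, decay=0.98):
    """Per-cycle intensities for A, C, G, T from one cluster (toy model).

    Phasing: in each cycle a strand stalls (p_lag), advances by one base,
    or advances by two bases (p_lead). Fading: the total signal decays
    exponentially with the cycle number.
    """
    dist = {0: 1.0}  # dist[k] = fraction of strands with k bases incorporated
    signals = []
    for cycle in range(cycles):
        new_dist = {}
        for k, frac in dist.items():
            for step, p in ((0, p_lag), (1, 1 - p_lag - p_lead), (2, p_lead)):
                new_dist[k + step] = new_dist.get(k + step, 0.0) + frac * p
        dist = new_dist
        fade = decay ** cycle
        intensity = dict.fromkeys("ACGT", 0.0)
        for k, frac in dist.items():
            # A strand with k incorporated bases shows its last base, k - 1.
            if 1 <= k <= len(sequence):
                intensity[sequence[k - 1]] += fade * frac
        signals.append(intensity)
    return signals


def call_bases(signals):
    """Naive base caller: pick the brightest channel in every cycle."""
    return "".join(max(s, key=s.get) for s in signals)


seq = "ACGTACGTAC"
noisy = cluster_signal(seq, len(seq))
print(call_bases(noisy))
print([round(s[b], 3) for s, b in zip(noisy, seq)])  # correct-channel signal per cycle
```

In this model the correct channel loses intensity with every cycle while the neighboring bases leak in, which is one way to see why read length is limited in sequencing by synthesis.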
PYROSEQUENCING (OPTICAL)
e.g. Roche GS FLX 454
Figure from OMICS Journals (doi:10.4172/jcsb.1000019) and Nature Biotechnology (doi:10.1038/nbt1485)
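A minimal sketch of how a pyrosequencing read-out can be thought of: one nucleotide species is offered per flow, and in the ideal case the light signal is proportional to the number of bases incorporated in that flow. The function name, the flow order "TACG" and the noise-free signal are assumptions made for illustration.

```python
def flowgram(sequence, flow_order="TACG", max_flows=400):
    """Return (nucleotide, incorporated_bases) for each flow (ideal signal).

    A homopolymer run is incorporated in a single flow, so its signal
    equals the run length. max_flows guards against symbols that never
    match any flow (for example "N").
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(sequence) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


def decode(signals):
    """Rebuild the sequence from an ideal flowgram."""
    return "".join(nucleotide * run for nucleotide, run in signals)


flows = flowgram("TTAGGGC")
print(flows)
print(decode(flows))
```

Because a run of identical bases shows up as one larger peak rather than separate peaks, long homopolymers are the typical error source for this read-out once real signal noise is added.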
PYROSEQUENCING (ELECTRICAL)
e.g. IonTorrent (Life Technologies)
PYROSEQUENCING (ELECTRICAL)
Making small sequencing tests available (e.g. DNA Electronics/Roche)
THE THIRD GENERATION (NEXT NEXT-GEN)
THIRD GENERATION
Sequencing (by synthesis)
- Single molecule sensitivity
- Read-out during copying
Pros and Cons
- Potentially long fragments (+)
- Large cost reduction per run (+)
- Easier sample prep (+)
- Enzyme necessary: speed limited (1-3 bases/second/pore)
Examples: Pacific Biosciences, Oxford Nanopore, Visigen (now Life Tech), etc.
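To get a feel for what a limit of 1-3 bases/second/pore means, the following back-of-the-envelope sketch computes the time for a single pass over a ~3 Gbase genome. The function name and the numbers of parallel sites are hypothetical; coverage, idle pores and read overlap are ignored.

```python
def hours_per_genome_pass(genome_bp, bases_per_second, parallel_sites=1):
    """Hours needed to read genome_bp bases once across all parallel sites."""
    seconds = genome_bp / (bases_per_second * parallel_sites)
    return seconds / 3600


for sites in (1, 1_000, 100_000):
    hours = hours_per_genome_pass(3e9, 3, sites)
    print(f"{sites:>7} sites: {hours:,.1f} hours")
```

At 3 bases per second a single site would need roughly 32 years for one pass, so enzyme-limited methods depend on massive parallelization just as second generation instruments do.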
REAL-TIME SEQUENCING
Zero-mode waveguides (Pacific Biosciences)
Single Molecule Real-Time (SMRT) sequencing
[Figure label: 70 nm]
- Polymerase is immobilized in zero-mode waveguides (ZMW) with a volume of 20 zl
- Polymerase cleaves off the fluorescent tags
- Fluorescent read-out
- Diffusion time: microseconds
- Incorporation time: milliseconds
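The 20 zl observation volume matters because it holds very few free labeled nucleotides at any moment. The sketch below estimates the mean number of molecules in that volume as concentration x Avogadro's number x volume. The function name and the 1 µM nucleotide concentration are assumptions for illustration, not values from the slides.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
ZMW_VOLUME_L = 20e-21      # 20 zeptoliters, from the slide


def mean_molecules(concentration_molar, volume_liters):
    """Average number of molecules present in the volume at a given molarity."""
    return concentration_molar * AVOGADRO * volume_liters


assumed_concentration = 1e-6  # 1 micromolar, an assumed value
print(round(mean_molecules(assumed_concentration, ZMW_VOLUME_L), 4))
```

At the assumed concentration the volume contains about 0.012 molecules on average, so a freely diffusing nucleotide is only briefly present (microseconds), while a nucleotide held by the polymerase during incorporation (milliseconds) stands out clearly.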
MINIATURIZING DNA SEQUENCING
IMEC 2013 - Molecular Biology and Cytometry Course - SCK CEN
COMPARISON OF COMMERCIAL PRODUCTS
- Illumina: HiSeq 2000, began shipping in the third quarter of 2012. The instrument produces 2x150-base paired-end reads, which will increase to 2x250. That gives around 300 gigabases in approximately 60 hours.
- Roche: GS FLX+ system, coupled with its newest software, produces reads of up to 1,000 bp and beyond.
- Life Technologies (Ion Torrent): Ion Proton can sequence a human exome in a few hours. Proton II is basically a 50x improvement on their first chip (120 Gb), but with a somewhat higher error rate than Illumina.
- Pacific Biosciences: PacBio RS. The company's new XL chemistry produces reads averaging 5,000 bases apiece, though about 5% of those exceed 10,000 bases.
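The Illumina figure of around 300 gigabases in approximately 60 hours can be put in perspective with a short calculation. The function name is hypothetical, the genome size of ~3 Gbase is taken from the earlier slides, and the coverage value is a raw average that ignores duplicates, unmapped reads and uneven coverage.

```python
GENOME_BP = 3e9  # ~3 Gbase human genome, as quoted in the slides


def run_summary(output_gigabases, run_hours, genome_bp=GENOME_BP):
    """Return (gigabases per hour, average fold coverage of one genome)."""
    rate = output_gigabases / run_hours
    coverage = output_gigabases * 1e9 / genome_bp
    return rate, coverage


rate, coverage = run_summary(300, 60)
print(f"{rate:.1f} Gb/hour, about {coverage:.0f}x raw coverage of one human genome")
```

The same run could instead be split across several samples at lower coverage each, which is how such instruments are commonly used in practice.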
FUTURE FOURTH GENERATION
NANOPORE BASED SEQUENCING
e.g. Oxford Nanopore
NANOPORE BASED SEQUENCING
Hybridization assisted sequencing
e.g. Nabsys
- Short fragments are hybridized to DNA
- Their distance is measured
- In parallel for many fragments
e.g. Noblegen
- Replace bases by barcode
- Hybridize molecular beacons
- Unzip DNA fragments in pore
- Read fluorescent signals
FOURTH GENERATION
Direct read-out of DNA
- Nanopore based sequencing
- Electron microscopy
Pros and Cons
- In principle, simple sample prep
- Limited or no reagent costs
- Long read lengths
- No enzymatic reaction needed
- Ability to read RNA, DNA modifications, etc.
Examples: imec, IBM, Halcyon
NANOPORE BASED SEQUENCING
Figures from Hao Liu (Biodesign Institute) and http://www.mcb.harvard.edu/branton/index.htm
NANOPORE/NANOSLIT COMBINATION
Controlled translocation through a solid-state nanopore
- Electrically induced translocation
- Mechanical confinement of a single DNA strand
SERS in a plasmonic nanoslit
- Vibrational fingerprinting
- Molecular information in the pore
[Figure label: V]
IMEC 2013 - RESTRICTED
MOLECULAR SPECTROSCOPY BY SERS
The normal Raman effect
- Inelastic scattering of light by molecules through the excitation of molecular vibrations
- Spectroscopy
- Weak process!
Surface Enhanced Raman Scattering
- Hot spots near metal nanostructures (excitation of plasmons)
- Enhancement scales with E^4
- Single molecule resolution
SERS NANOSLIT
[Figure labels: λ=785 nm, Au, H2O, hot spot]
- Generating a hot spot using top-down designed plasmonic nanocavities
- Large and highly localized field enhancement
- Raman enhancement: 10^5-10^10 x (to single molecule levels)
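Since the SERS enhancement scales with the fourth power of the local field, the Raman enhancement range quoted here can be translated into the local field enhancement a hot spot has to provide. The sketch below assumes the simple E^4 approximation holds exactly; the function name is hypothetical.

```python
def field_enhancement(raman_enhancement):
    """Local field enhancement |E/E0| under the E^4 approximation."""
    return raman_enhancement ** 0.25


for raman in (1e5, 1e10):
    print(f"Raman enhancement {raman:.0e} -> field enhancement ~{field_enhancement(raman):.0f}x")
```

Under this approximation a Raman enhancement of 10^5 to 10^10 corresponds to a local field that is only about 18 to 316 times stronger than the incident field, which is why a well-designed nanocavity can reach single molecule levels.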
NEXT-GENERATION SEQUENCING
Basic principle
- 1st generation: Sanger sequencing with size separation of amplified fragments
- 2nd generation: site-selective amplification followed by iterative base-incorporation, read and wash steps
- 3rd generation: enzymatic reaction to continuously integrate and read out bases; true single molecule analyses
- 4th generation: direct read-out of bases (without copying); true single molecule analysis

Property | 1st generation | 2nd generation | 3rd generation | 4th generation
Sample preparation | Extensive | Extensive | Moderate | Almost none
Speed/base/site | Very low (<<1/sec) | Low (<1/sec) | Moderate (3/sec) | Very fast (~1 ms)
Throughput | Low | Very high | Very high | Very high
Accuracy | High | Low | Low (~80%) | NA
Read length | Long (~1000) | Short (~15-400) | Moderate (~450) | Very long (>1000)
De novo sequencing | Possible | Not possible | Difficult | Easy
Repeat regions | Limited | Highly limited | Limited | No intrinsic limit
DNA/protein derivatives | Indirectly | Indirectly | Indirectly | Yes
Reagent cost | Very high | Very high | High | None
Technology | Generation | On market | Single molecule | Nanopore (NP) / Enzymatic (E) based | Principle | Website
Illumina HiSeq * | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.illumina.com
Roche (FLX Titanium) | 2 | Yes | No | E | Light, sequence by synthesis | www.454.com
Polonator | 2 | Yes | ? | E | Fluorescence/Polony | www.polonator.org
Complete Genomics | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.completegenomics.com
Helicos (TSMS) | 2 | Yes | Yes | E | Fluorescent, sequence by synthesis | www.helicosbio.com
Life Tech (ABI Solid4) | 2 | Yes | No | E | Fluorescence, sequence by synthesis | www.appliedbiosystems.com
Life Tech (IonTorrent) | 2 (3) | Yes | No | E | Electrical, sequence by synthesis | www.iontorrent.com
Intelligent Bio | 2 | No | ? | E | Fluorescence, sequence by synthesis | www.intelligentbiosystems.com
GE Global | 2 | No | Yes | E | Fluorescence, sequence by synthesis | http://ge.geglobalresearch.com/blog/sequencing-a-human-sized-genomein-less-than-a-day/
GnuBio | 2 | No | No | E | Microdroplets, sequence by ligation | www.gnubio.com
Genizon Bioscience | 2? | No | No | E | http://www.genizon.com/images/pdfs/pihlak_linnarsson_nbt2008.pdf | www.geniozon.com
Light Speed | 2? | No | ? | ? | Light interference, patent: US 2009/0061526 | www.lsgen.com
Mobious Biosystems (Nexus) | 2? | No | No | E | ?? | www.mobious.com
Pacific Biosciences (tsmrt) | 3 | Yes | Yes | E | Fluorescence (SMRT), sequence by synthesis | www.pacificbiosciences.com
Oxford Nanopore | 3 | No | Yes | NP/E | Electrical, enzymatic cutting of DNA | www.nanoporetech.com
Visigen | 3 | No | ? | E | FRET measurement using TIRF | www.visigenbio.com
Cracker | 3 | No | Yes | E | SMRT, read-out on chip | www.crackerbio.com
IBM/Roche nanopore | 4 | No | Yes | NP/- | Electrical, tunneling using NPs | http://www-03.ibm.com/press/us/en/pressrelease/32037.wss
Nabsys | 4 | No | Yes | NP/- | Electrical, hybridization assisted NP sequencing | www.nabsys.com
NobleGen Biosciences | 4 | No | Yes | NP/- | Electrical, fluorescent after hybridization (Meller) | www.noblegenbio.com
Electronic Bio | 3 (4?) | No | Yes | NP/- | Electrical using biological NPs | www.electronicbio.com
Reveo | 4 | No | Yes | -/- | Electrical, tunneling using nano-knives | www.reveo.com
Base4 Innovation | 4 | No | Yes | ?? | Nanopore + optical? | www.base4innovation.co.uk
ZS Genetics | 4? | No | Yes | -/- | Electron microscopy | www.zsgenetics.com
Halcyon Molecular | 4? | No | Yes | -/- | Electron microscopy | www.halcyonmolecular.com
MANY APPLICATIONS & PUBLICATIONS
APPLICATIONS OF NEXT-GEN SEQUENCING
- Whole-genome sequencing
- Comparative genomics
- Genome re-sequencing
- Structural variation analysis
- Polymorphism discovery
- Meta-genomics
- Environmental sequencing
- Gene expression profiling
- Genotyping
- Population genetics
- Migration studies
- Ancestry inference
- Relationship inference
- Genetic screening
- Drug targeting
- Forensics
DOES NGS HAVE PROGNOSTIC VALUE?
... and for personal health?
... and for personal health?
- 2 virus infections during the test period (common cold and sinus infection)
- Diabetes developed during / after the 2nd infection (genetic risk had already been identified from whole genome sequencing)
DOES NGS HAVE PROGNOSTIC VALUE?
- Sequencing has gone through a revolution and has become affordable for some applications (e.g. exome sequencing)
- Personal genome sequencing is already possible, but the medical interpretation is still difficult
- Genome sequencing can predict disease risks
- Genome sequencing should be combined with other omics to monitor disease risk
- Integrated analyses are possible, but still need further improvement and understanding
- Regulatory information needs to be considered
- Every person is unique and longitudinal follow-up will provide further insight
- Longitudinal follow-up: case studies have proven value, but no good biomarkers yet
THANK YOU FOR YOUR ATTENTION