Opportunities offered by new sequencing technologies

Pierre Taberlet
Laboratoire d'Écologie Alpine, CNRS UMR 5553
Université Joseph Fourier, Grenoble, France

Nature Biotechnology, October 2008: special issue on next-generation sequencing
Molecular Ecology Resources (2008), 1, 3-17.
Nature Methods (2008), 5, 16-18.

Opportunities offered by new sequencing technologies

New perspectives in DNA sequencing
  Solexa (Illumina)
  SOLiD (Applied Biosystems)
  454 (Roche)
  Other available systems
  The future (VisiGen, Pacific Biosciences)
New opportunities for assessing farm animal biodiversity
  Sequencing versus SNP panels
  Importance of a good reference genome
  Whole genome sequencing
  Partial genome sequencing
Conclusions

New perspectives in DNA sequencing

DNA sequencing

Capillary electrophoresis
  500-1000 bp per sequencing reaction
  12 x 96 reactions per day (about 1 Mb per day)
Next-generation sequencers
  Roche 454: 0.4 Gb per day
  Solexa, SOLiD: 2 Gb per day

Genetic Analyzer™ / Solexa™

Company: Illumina
Website: www.solexa.com
Fragment length: 35-105 bases
Number of reads per run: 60 × 10^6
Total output per run: 6 Gb
Time per run: 3.5 days
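
The "about 1 Mb per day" figure for capillary electrophoresis follows directly from the numbers on the slide. The short sketch below redoes that arithmetic and compares it with the 2 Gb per day quoted for Solexa and SOLiD; the function and variable names are illustrative only.

```python
def daily_output_bp(read_length_bp, reads_per_day):
    """Bases produced per day = read length x number of reads per day."""
    return read_length_bp * reads_per_day

# Capillary electrophoresis: 12 plates of 96 reactions per day (from the slide).
CAPILLARY_REACTIONS_PER_DAY = 12 * 96

capillary_low = daily_output_bp(500, CAPILLARY_REACTIONS_PER_DAY)    # 576,000 bp
capillary_high = daily_output_bp(1000, CAPILLARY_REACTIONS_PER_DAY)  # 1,152,000 bp

# Next-generation figure quoted on the slide for Solexa and SOLiD.
NEXT_GEN_BP_PER_DAY = 2_000_000_000

# Fold increase relative to the best-case capillary output.
fold_increase = NEXT_GEN_BP_PER_DAY / capillary_high

print(f"Capillary: {capillary_low:,} to {capillary_high:,} bp per day")
print(f"Next-generation: about {fold_increase:,.0f} times more per day")
```

On these figures, a single next-generation instrument produces more than a thousand times the daily output of a capillary sequencer.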

The Illumina/Solexa technology (1)

The Illumina/Solexa technology (2)

The Illumina/Solexa technology (3)

The Illumina/Solexa technology (4)

The Illumina/Solexa technology (5)

SOLiD DNA Sequencer

Company: Applied Biosystems
Website: solid.appliedbiosystems.com
Fragment length: 50 bases (possibility of "paired-end" sequencing: from 0.6 to 10 kb)
Number of reads per run: 400 × 10^6
Total output per run: 20 Gb
Time per run: 6 days

454 GS FLX™

Company: Roche Diagnostics
Website: https://roche-appliedscience.com/sis/sequencing/flx/index/jsp
Fragment length: 400 bases
Number of reads per run: 1 × 10^6
Total output per run: 0.4 Gb
Time per run: 8 hours

Emulsion-based clonal amplification

- Mix the adapter-carrying DNA library with capture beads (limiting dilution)
- Add PCR reagents and emulsion oil to create a water-in-oil emulsion (micro-reactors)
- Perform emulsion PCR
- Break the micro-reactors and isolate the DNA-containing beads

Generation of millions of clonally amplified sequencing templates on each bead
No cloning and colony picking

Depositing DNA beads into the PicoTiter Plate

- Load beads into the PicoTiter Plate (44 µm wells)
- Load enzyme beads
- Centrifuge step

454 sequencing: base calling

- Count the photons generated for each flow
- Base call using signal thresholds (1-mer, 2-mer, 3-mer, 4-mer)
- Delivery of one nucleotide per flow ensures accurate base calling
- Flow order: T A C G; key sequence: TCAG
- Measures the presence or absence of each nucleotide at any given position
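
The base-calling idea on the slide (one nucleotide per flow, with signal thresholds separating 1-mers, 2-mers and so on) can be illustrated with a minimal sketch. It assumes signals already normalized so that one incorporated base gives a signal near 1.0, and it places the thresholds halfway between integer levels. Both are simplifying assumptions for illustration, not a description of the instrument's actual software; `call_bases` is a hypothetical name.

```python
FLOW_ORDER = "TACG"  # nucleotide flow order shown on the slide

def call_bases(signals, flow_order=FLOW_ORDER, max_homopolymer=4):
    """Convert normalized per-flow light signals into a base sequence.

    Each flow delivers a single nucleotide. A signal near 0 means no
    incorporation; a signal near n means a homopolymer of length n.
    Thresholds are assumed to sit halfway between integer levels.
    """
    sequence = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        # Round to the nearest integer homopolymer length, then clamp.
        length = int(signal + 0.5)
        length = max(0, min(length, max_homopolymer))
        sequence.append(base * length)
    return "".join(sequence)

# Eight flows (two cycles of T, A, C, G) reading the key sequence TCAG.
key_signals = [1.02, 0.05, 0.97, 0.10, 0.03, 1.10, 0.00, 0.95]
print(call_bases(key_signals))  # TCAG
```

Because signal differences shrink relative to noise as homopolymers get longer, a threshold scheme of this kind becomes less reliable for long runs of the same base.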

454 Evolution

GS Evolution                        Genome Sequencer 20   Genome Sequencer FLX   Genome Sequencer FLX Titanium
Read length                         100 bases             > 200 bases            > 400 bases
Number of clonal reads              > 200,000             > 400,000              > 1,000,000
System throughput per 8-hour shift  20-30 Mb              100 Mb                 0.4 Gb
Cost                                $6,000-9,000          $3,000-9,000           $10,000
Accuracy                            99.99%                99.99%                 99.99%

Heliscope™

Company: Helicos
Website: helicosbio.com
Fragment length: 25-50 bases
Number of reads per run: 100 × 10^6
Total output: 2 Gb per day
Time per run: -
Problem of accuracy (only 95%?)
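
The per-run specifications quoted for the Illumina/Solexa, SOLiD and 454 GS FLX systems can be cross-checked against each other. The sketch below derives the read length implied by "total output / number of reads" and the daily output implied by "total output / time per run". All input values are taken from the slides; the dictionary layout and names are illustrative.

```python
# (reads per run, total output per run in bases, run time in days)
SPECS = {
    "Illumina/Solexa": (60_000_000, 6_000_000_000, 3.5),
    "SOLiD": (400_000_000, 20_000_000_000, 6.0),
    "454 GS FLX": (1_000_000, 400_000_000, 8 / 24),
}

def summarize(specs):
    """Derive implied read length (bp) and output per 24 h (Gb) per platform."""
    summary = {}
    for name, (reads, output_bp, days) in specs.items():
        summary[name] = {
            "implied_read_length_bp": output_bp / reads,
            "gb_per_day": output_bp / 1e9 / days,
        }
    return summary

SUMMARY = summarize(SPECS)
for name, stats in SUMMARY.items():
    print(f"{name}: {stats['implied_read_length_bp']:.0f} bp reads, "
          f"{stats['gb_per_day']:.2f} Gb per 24 h")
```

The implied read lengths (100, 50 and 400 bases) agree with the quoted fragment lengths. The daily figure for the 454 system assumes back-to-back runs around the clock, whereas the slides quote 0.4 Gb per day, which corresponds to a single 8-hour run per day.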

Polonator

Website: www.polonator.org
Open-source sequencer, based on ligation sequencing
Output speed: ?
Read length: 26 bp reads
Cost: very low price

Complete Genomics

Website: www.completegenomicsinc.com
Technology: oriented towards the human genome; DNA nano-balls
Output speed: 6 Gb per day
Read length: variable
Cost: $5,000 per genome

NABsys

Website: www.nabsys.com
Based on nanopore technology
Output speed: ?
Read length: ?
Cost: ?

Harvard Nanopore

Website: golgi.harvard.edu/branton/index.htm
Technology: nanopore
Output speed: 10,000 bases/sec/nanopore
Read length: very long
Cost: ?

Harvard Nanopore

- 2 nanograms of genomic DNA (no amplification)
- Array of 100 nanopores
- High-quality draft sequence of one mammalian genome in ~20 hours

Intelligent Bio-Systems

Website: www.intelligentbiosystems.com
Technology: sequencing by synthesis (no amplification)
Output speed: ?
Read length: ? (for genome re-sequencing)
Cost: low?
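
The claim of a draft mammalian genome in about 20 hours can be checked against the quoted rate of 10,000 bases per second per nanopore and an array of 100 nanopores. The sketch below does this arithmetic. The genome size of 3 Gb is an assumption added here as a typical order of magnitude for a mammal; it is not stated on the slides.

```python
def array_yield_bases(pores, bases_per_sec_per_pore, hours):
    """Total bases read by a nanopore array running continuously."""
    return pores * bases_per_sec_per_pore * hours * 3600

# Values quoted on the slides: 100 pores, 10,000 bases/sec/pore, ~20 hours.
total_bases = array_yield_bases(pores=100, bases_per_sec_per_pore=10_000, hours=20)

# ASSUMPTION (not from the slides): a mammalian genome of roughly 3 Gb.
ASSUMED_GENOME_SIZE_BP = 3_000_000_000
coverage = total_bases / ASSUMED_GENOME_SIZE_BP

print(f"Total yield: {total_bases / 1e9:.0f} Gb")  # 72 Gb
print(f"Average coverage: {coverage:.0f}x")        # 24x
```

Under these assumptions the array would deliver about 24-fold coverage, which is consistent with the "high-quality draft" wording if the projected speed were actually achieved.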

VisiGen Biotechnologies, Inc.

Website: visigenbio.com
Based on new polymerase and nucleotides
Output speed: 50 × 10^6 bases per second
Read length: 1 kb?
Cost: ?

VisiGen Biotechnologies, Inc.

HOUSTON, TX, February 15, 2008 - VisiGen Biotechnologies, Inc., was awarded US Patent No. 7,329,492, "Methods for Real-time Single Molecule Sequence Determination." European and Australian counterparts have also recently issued.

VisiGen's President, Dr. Susan Hardin, Ph.D., said, "We have the real path to the $1,000 human genome." VisiGen's sequencing methodology can be used to sequence the genome of a human or any other life form. VisiGen's DNA sequencing machines will enable low-cost comprehensive genome analysis such as a one-day, $1,000 human genome.

VisiGen's patented technology is scalable. VisiGen's nanosequencing machines are designed to monitor massively parallel arrays to produce a DNA sequencing platform that will be capable of collecting DNA sequence data at the rate of 50 million bases per second or greater. VisiGen plans to offer a DNA sequencing service in late 2009 and to sell DNA sequencing machines and reagents 18 to 24 months later.

Pacific Biosciences

Website: www.pacificbiosciences.com
De novo synthesis of single molecules
Output speed: one genome in minutes
Read length: 10 kb
Cost: less than $100

Oxford Nanopore Technologies

Website: www.nanoporetech.com
De novo sequencing of single molecules without fluorescence
Output speed: ?
Read length: ?
Cost: ?

The different next-generation sequencing systems

Already available
- 454 (Roche)
- Illumina/Solexa
- SOLiD (ABI)

Already available but without a future
- Heliscope (Helicos)

Available soon?
- Pacific Biosciences
- VisiGen
- Oxford Nanopore Technologies
- NABsys
- Harvard Nanopore
- Complete Genomics
- Polonator
- Intelligent Bio-Systems

Cost and speed of sequencing technologies

[Figure: scale running from 10^8 down to 10^3 (?), with the annotations "Minimum sample preparation and reagents" and "Parallel assays or long reads"]

New opportunities for assessing farm animal biodiversity

Sequencing versus SNP panels (1)

- Current SNP panels might be strongly biased (ascertainment bias)
- Extremely useful for association studies in industrial breeds
- Might not be suitable for biodiversity assessment, depending on the breeds analyzed

Sequencing versus SNP panels (2)

[Figure] Observed heterozygosity calculated in 19 breeds using three different subsets of SNPs (unpublished results from Riccardo Negrini).

Breed codes: ANG = Angus, BMA = Beefmaster, BRM = Brahman, BSW = Brown Swiss, CHL = Charolais, GIR = Gir, GNS = Guernsey, HFD = Hereford, HOL = Holstein, JER = Jersey, LMS = Limousin, NDA = N'Dama, NEL = Nelore, NRC = Norwegian Red, PMT = Piedmontese, RGU = Red Angus, RMG = Romagnola, SGT = Santa Gertrudis, SHK = Sheko.
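
Observed heterozygosity, the statistic compared across breeds above, is the proportion of heterozygous genotypes averaged over loci. A minimal sketch of that calculation follows. The genotypes in the example are made-up toy values used only to exercise the function; they are not data from the study, and the function name is hypothetical.

```python
def observed_heterozygosity(genotypes_by_locus):
    """Mean over loci of the fraction of heterozygous individuals.

    genotypes_by_locus: list of loci; each locus is a list of genotypes,
    one per individual, given as (allele1, allele2) or None if missing.
    """
    per_locus = []
    for locus in genotypes_by_locus:
        called = [g for g in locus if g is not None]
        if not called:
            continue  # skip loci with no genotype calls at all
        heterozygous = sum(1 for a, b in called if a != b)
        per_locus.append(heterozygous / len(called))
    if not per_locus:
        raise ValueError("no locus with called genotypes")
    return sum(per_locus) / len(per_locus)

# Toy input (hypothetical): two SNPs typed in four animals of one breed.
toy_breed = [
    [("A", "G"), ("A", "A"), ("G", "G"), ("A", "G")],  # 2 of 4 heterozygous
    [("C", "C"), ("C", "T"), None, ("C", "C")],        # 1 of 3 called
]
toy_ho = observed_heterozygosity(toy_breed)
print(f"Observed heterozygosity: {toy_ho:.3f}")
```

Ascertainment bias arises upstream of this calculation: if the SNPs were chosen because they are polymorphic in a few discovery breeds, the same formula yields inflated values in those breeds and deflated values in distant ones, which is why the choice of SNP subset changes the ranking of breeds.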

Importance of a good reference genome

- Only a few genomes are of sufficiently high quality to allow efficient re-sequencing
- High-quality genomes:
    Caenorhabditis elegans (100 Mb)
    Drosophila melanogaster (123 Mb)
    Arabidopsis thaliana (115 Mb)
- The human genome is not of very high quality
- The current cattle genome cannot be efficiently used for re-sequencing

Whole genome sequencing (1)

1000 Genomes Project
- Re-sequencing of 1,200 human genomes
- Sequencing to be completed at the end of 2009, using Solexa and SOLiD sequencers
- Cost per genome: ~2,000

Genome 10K project
- De novo sequencing of 10,000 vertebrate genomes

Whole genome sequencing (2)

- Possibility to detect recent selection pressure
    By carrying out classical tests on coding sequences (analysis of silent versus nonsilent mutations)
    By estimating the coalescent time on a moving window over the whole genome
- Opens the door towards the use of adaptive genetic variation in conservation genetics

Nature, vol. 449 (18 October 2007), pages 913-918.
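
The moving-window scan mentioned above can be sketched in a few lines. Estimating coalescent time properly requires a population-genetic model, so this sketch substitutes a simpler stand-in: per-site heterozygosity summed over each window and divided by window length, a diversity measure that tends to drop in regions with a recent common ancestor. The statistic, the function name and the toy input are all assumptions for illustration, not the method of the cited study.

```python
def sliding_window_diversity(snp_positions, snp_heterozygosity,
                             sequence_length, window, step):
    """Return (start, end, diversity per bp) for each window.

    snp_positions: 0-based coordinates of variable sites.
    snp_heterozygosity: heterozygosity at each of those sites.
    Sites not listed are treated as invariant.
    """
    results = []
    start = 0
    while start < sequence_length:
        end = min(start + window, sequence_length)
        total = sum(h for pos, h in zip(snp_positions, snp_heterozygosity)
                    if start <= pos < end)
        results.append((start, end, total / (end - start)))
        start += step
    return results

# Toy input (hypothetical): three variable sites on a 100 bp sequence.
toy_windows = sliding_window_diversity(
    snp_positions=[5, 15, 95],
    snp_heterozygosity=[0.5, 0.5, 0.2],
    sequence_length=100, window=50, step=50,
)
for start, end, diversity in toy_windows:
    print(f"{start}-{end}: {diversity:.4f}")
```

Windows whose diversity falls far below the genome-wide background would be candidates for recent selection, to be confirmed with dedicated tests.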

Partial genome sequencing

Exon capture
- Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song XZ, Richmond TA, Middle CM, Rodesch MJ, Packard CJ, Weinstock GM, Gibbs RA (2007) Direct selection of human genomic loci by microarray hybridization. Nature Methods, 4, 903-905.
- Okou DT, Steinberg KM, Middle C, Cutler DJ, Albert TJ, Zwick ME (2007) Microarray-based genomic selection for high-throughput resequencing. Nature Methods, 4, 907-909.
- Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, Middle CM, Rodesch MJ, Albert TJ, Hannon GJ, McCombie WR (2007) Genome-wide in situ exon capture for selective resequencing. Nature Genetics, 39, 1522-1527.

Conclusions

Conclusions

- Inexorably towards whole genome sequencing
- Good reference genome required
- Challenge in bioinformatics
- Possibility to distinguish neutral versus adaptive variation, and to use both for designing conservation strategies

Which strategy to use now for studying farm animal biodiversity?

Thank you for your attention