Efficiency in Next-Generation Sequencing for Public Health
Patrick Van Roey
June 27, 2016

Overview
- Introduction
- Applied Genomic Technologies Core
- Implementing NGS for Public Health
- NGS workflow
- Multiplexing with different samples
- Interaction with DOH Epidemiology

Applied Genomic Technologies Core
- Implemented next-generation sequencing in 2011
  - Ion Torrent PGM
  - Research
  - Public Health: Salmonella
- Currently:
  - 3 MiSeq
  - 3 MiSeq DX
  - 1 NextSeq 500
- 6 NGS staff, 3 bioinformaticians

Applied Genomic Technologies Core
Fiscal Year 2015-16: 2853 NGS results reported

Public Health
  TB                 568   Drug resistance/surveillance (APHL)
  Foodborne         1156   GenomeTrakr/CDC/prospective
  Legionella         263   NYC outbreak/CDC-AMD
  Virus (WGS)        145   Flu, adenovirus
  Virus (targeted)   198   HCV hypervariable region
  Cystic Fibrosis          Newborn screening
Research
  WGS, amplicon, 16S microbiome

Implementing NGS for Public Health

  Public Health needs    vs.   NGS methods
  Faster                       Instrument/protocol time
  More accurate                Wealth of data/right data?
  Less expensive               Defined costs (kits)

- In general, next-generation sequencing methods are developed for human genomics, either whole genome or targeted
- How to best adapt for small genomes or small amplicon pools?
- MiSeq: best workflow and capacity

Implementing NGS for Public Health
Target:
- 5 work-day turn-around time from receipt of DNA
- Coverage at least good enough for accurate SNP detection: > 40x mean coverage, 20x minimum coverage at SNP
- Cost per specimen below $200 for bacteria (w/o personnel)
How to do this with samples from multiple laboratories?
- Variable (small) number of samples submitted at any time

Bacteria and Virus WGS
- MiSeq sequencer
- 500 cycle sequencing kit (2x250 bp paired-end), v2: $1135
- Output: ~15 million reads or 7.5 Gb
- One Salmonella genome (4.3 Mb) = 1750x mean coverage
- 20 Salmonella genomes = 85x mean coverage
- What to do with 4 Salmonella samples or with 30 Salmonella samples?
- Multiplexing with other samples.

Bacteria and Virus WGS
But: every organism behaves differently.
- Salmonella, Legionella easier than E. coli
- TB particularly difficult
Reasons:
- GC content
- Genome stability (DNA received is not intact)
- DNA extraction methods (from cell lysis)
Virus:
- RNA virus - dsDNA amplicons
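
The capacity arithmetic on the MiSeq slide can be reproduced with a short calculation. This is a minimal sketch that assumes the run output is shared evenly between samples and that every sequenced base maps to the genome; the function and constant names are illustrative and not part of any laboratory pipeline.

```python
# Sketch of the slide's capacity arithmetic.
# Assumption: run output is split evenly and all bases are usable.

RUN_OUTPUT_BASES = 7.5e9     # ~7.5 Gb per MiSeq v2 500-cycle run (from the slide)
SALMONELLA_GENOME = 4.3e6    # genome size quoted on the slide
KIT_COST_USD = 1135.0        # sequencing kit only; library prep is not included


def mean_coverage(run_output_bases, genome_size_bases, n_samples):
    """Mean coverage per genome when n_samples share one run evenly."""
    return run_output_bases / (genome_size_bases * n_samples)


def kit_cost_per_sample(kit_cost, n_samples):
    """Sequencing-kit cost per sample (excludes library prep and personnel)."""
    return kit_cost / n_samples


for n in (1, 4, 20, 30):
    cov = mean_coverage(RUN_OUTPUT_BASES, SALMONELLA_GENOME, n)
    cost = kit_cost_per_sample(KIT_COST_USD, n)
    print(f"{n:>2} genomes: {cov:6.0f}x mean coverage, ${cost:8.2f} kit cost each")
```

With these inputs the idealized figures come out near 1744x for one genome and 87x for twenty, consistent with the rounded 1750x and 85x on the slide. Four genomes would leave most of the run unused, which is the motivation for multiplexing with other samples.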

NGS Workflow: Nextera XT
- DNA Isolation
  - Standard volume and concentration request: 30 µl at 2 ng/µl (will try down to 0.2 ng/µl)
- DNA QC
  - Qubit
  - Tapestation for DNA size
- Library Preparation: Nextera XT
  - Little variation
- Library QC
- Library Pooling
  - Will load 600 µl denatured DNA at 13-15 pM
- Sequencing

NGS Workflow: Nextera XT
Library QC:
- Qubit
- Bioanalyzer, Tapestation or QIAxcel
Library Pooling:
- Normalization beads: OK for identical libraries, high concentrations
- Estimated run percentage: all other multiplexing

NGS Workflow: Nextera XT
Different protocols:
- FDA/GenomeTrakr
- CDC (several)
- CDC PulseNet (no normalization beads)
Most variations are minor but some are significant
- Advice: start with one, identify differences, test
- Example: vortex step
- TB: increase index PCR step from 12 to 15 cycles (literature)

NGS Workflow: Library QC
Library QC:
- Concentration: Qubit much more accurate than NanoDrop
- Library fragment size:
  - Smaller DNA fragments will overpopulate flowcell
  - Primer-dimers will reduce cluster density (eliminate with AMPure bead size selection)
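
Library concentration (ng/µl) and fragment size together determine the molar concentration of a library, which is what the 13-15 pM loading concentration in the workflow refers to. The sketch below uses the widely used approximation of 660 g/mol per base pair of double-stranded DNA; that constant, the function names, and the 14 pM default are assumptions for illustration, not values stated in the slides.

```python
# Sketch: convert a library's mass concentration to molarity.
# Assumption: 660 g/mol per base pair of double-stranded DNA (common approximation).

AVG_MASS_PER_BP = 660.0


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Molarity in nM from Qubit concentration (ng/µl) and mean fragment size (bp)."""
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_fragment_bp)


def dilution_factor(molarity_nm, loading_pm=14.0):
    """Fold dilution needed to bring a library from molarity_nm down to loading_pm."""
    return molarity_nm * 1000.0 / loading_pm


# Example values: a 2.62 ng/µl library with an 832 bp mean fragment size.
nm = library_molarity_nm(2.62, 832)
print(f"{nm:.2f} nM, dilute about {dilution_factor(nm):.0f}-fold to reach 14 pM")
```

The same mass of DNA contains fewer molecules when fragments are longer, which is one reason two libraries with equal Qubit readings can yield different read counts.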

NGS Workflow: Library QC
[Figure panels: normal, minor tailing, primer dimer, major tailing]

NGS Workflow: Library QC
Reasons for tailing:
- Low initial DNA concentration
- DNA extraction quality
- Technique
- Don't know
Effects of tailing:
- Low coverage
- Uneven coverage
- Insert sizes

NGS Workflow: Library Pooling
How much DNA do I need to put in the pool for each sample?
In principle:
- Two samples with the same amount of DNA yield the same number of reads
- Required number of reads = (genome size / 500) x required mean coverage
  (500 bases per read, from the 2x250 bp kit)

NGS Workflow: Library Pooling
In practice:
- If target is 40x, aim for 60x mean coverage.
- Adjust for experience with sample type: include in calculations by faking genome size
- Adjust for fragment size: if large, add more DNA
- Can you tolerate repeats?
- Do not put something new on a clinical run
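
The pooling rule above can be turned into a small calculation of the share of the run each sample should receive. This is a sketch under the "in principle" assumptions only: equal aim coverage for every sample, 500 bases per read, and no experience-based adjustment (which the slides apply by entering a larger "fake" genome size). The function names and the ~15 million read capacity check are illustrative.

```python
# Sketch of the "in principle" pooling arithmetic.
# Assumption: 500 bases per read and equal aim coverage for all samples.

BASES_PER_READ = 500
RUN_CAPACITY_READS = 15e6    # ~15 million reads per run, as quoted for the v2 kit


def required_reads(genome_size, mean_coverage, bases_per_read=BASES_PER_READ):
    """Required number of reads = (genome size / 500) x required mean coverage."""
    return genome_size / bases_per_read * mean_coverage


def run_percentages(samples, aim_coverage=60.0):
    """Return (percent of pool per sample, total reads needed).

    samples: list of (label, effective genome size in bases); enter an
    inflated genome size for sample types known to load poorly.
    """
    needs = [required_reads(size, aim_coverage) for _, size in samples]
    total = sum(needs)
    return [100.0 * n / total for n in needs], total


# Example composition: 4 TB, 4 Salmonella, 2 E. coli, 5 Legionella.
run2 = ([("TB", 5.0e6)] * 4 + [("Sal", 4.6e6)] * 4
        + [("E. c", 5.4e6)] * 2 + [("Leg", 3.5e6)] * 5)
pcts, total = run_percentages(run2)

for (label, _), pct in zip(run2, pcts):
    print(f"{label:5s} {pct:5.2f} % of pool")
print(f"reads needed: {total:,.0f}; fits on one run: {total <= RUN_CAPACITY_READS}")
```

Because every sample has the same aim coverage here, the percentages are simply proportional to the genome sizes entered. In practice the targets would be shifted by experience with each sample type.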

NGS Workflow: Sequencing results
Example: TB sequencing for drug resistance
Clinical application: need 5-day turn-around time

  Wed 5/11/16   18 TB + NC + PC submitted and entered in LIMS
  Thu 5/12/16   Library prep
  Fri 5/13/16   Sequencing
  Mon 5/16/16   Data released to TB/pipeline/report

16 is limit of TB on a run: need to split over 2 runs

NGS Workflow: Sequencing results
Run 1: 14 TB + NC

  Target run % per sample    7.2
  DNA concentration          0.5-1.0 ng/µl
  Library concentration      1.4-5.88 ng/µl
  Average fragment size      731-1120
  Reads passing filter       18.17 M
  % reads                    4.93-8.74
  Number of reads            0.896-1.587 M
  Coverage                   81-143x

NGS Workflow: Sequencing results
Run 2: 4 TB, 4 Salmonella, 2 E. coli, 5 Legionella
Reads passing filter: 17.23 M

NGS Workflow: Sequencing results

Lane  Spec.  Genome  DNA conc  Libr. conc  Libr.  Vol. added  Target  Read %   Read #     Cover.
             size    (ng/µl)               size   to pool     read %
1     NC                                          5.0                 0.0001          17  0
2     TB     5.0 M   0.74      0.58        688    13.3        6.8     7.11     1,226,124  110
3     TB             1.0       2.62        832    3.7         6.8     10.03    1,728,288  156
4     TB             1.0       1.6         890    6.5         6.8     6.88     1,186,383  107
5     TB             1.0       5.88        980    1.9         6.8     7.49     1,290,646  116
6     Sal    4.6 M   2.0       5.2         1387   4.8         6.5     7.89     1,360,648  133
7     Sal            2.0       5.07        1452   5.2         6.5     3.89       669,587  66
8     Sal            2.0       5.7         1367   4.4         6.5     6.79     1,169,512  114
9     Sal            2.0       7.67        1106   1.5         6.5     4.81       829,014  81
10    E. c   5.4 M   2.0       4.1         1146   4.2         9.0     7.58     1,306,449  109
11    E. c           2.0       3.9         1140   4.3         9.0     4.53       781,105  65
12    Leg    3.5 M   2.0       4.42        1075   2.4         6.0     13.57    2,338,903  301
13    Leg            2.0       3.59        984    2.7         6.0     4.93       849,194  109
14    Leg            2.0       4.13        1018   2.4         6.0     4.98       857,932  110
15    Leg            2.0       13.6        1080   0.8         6.0     2.45       422,736  54
16    Leg            2.0       3.8         1027   2.4         6.0     2.75       473,764  61
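
One way to build up knowledge of how well each sample type loads is to compare the observed read percentage with the target for every lane. The sketch below does this for the Run 2 values in the table; the ratio it computes and the function name are illustrative, and a single run is far too little data to fix a correction factor.

```python
# Sketch: observed vs. target read share per organism for Run 2.
# Values are (specimen, target read %, observed read %) copied from the table.
from collections import defaultdict

RUN2_LANES = [
    ("TB", 6.8, 7.11), ("TB", 6.8, 10.03), ("TB", 6.8, 6.88), ("TB", 6.8, 7.49),
    ("Sal", 6.5, 7.89), ("Sal", 6.5, 3.89), ("Sal", 6.5, 6.79), ("Sal", 6.5, 4.81),
    ("E. c", 9.0, 7.58), ("E. c", 9.0, 4.53),
    ("Leg", 6.0, 13.57), ("Leg", 6.0, 4.93), ("Leg", 6.0, 4.98),
    ("Leg", 6.0, 2.45), ("Leg", 6.0, 2.75),
]


def mean_loading_ratio(lanes):
    """Mean of observed/target read share per specimen type (1.0 = on target)."""
    ratios = defaultdict(list)
    for spec, target_pct, observed_pct in lanes:
        ratios[spec].append(observed_pct / target_pct)
    return {spec: sum(vals) / len(vals) for spec, vals in ratios.items()}


loading = mean_loading_ratio(RUN2_LANES)
for spec, ratio in loading.items():
    print(f"{spec:5s} observed/target = {ratio:.2f}")
```

A ratio persistently below 1.0 for a sample type would argue for raising its target share, which is the effect of entering a larger "fake" genome size in the pooling calculation.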

NGS workflow: multiplexing
Summary:
Can multiplex different samples; this requires
- very good library QC
- careful evaluation of relative amounts of DNA added
- knowledge of how well the type of sample loads
- aim for coverage ~50% higher than required
Do not load new material with clinical samples
If cost is a priority, evaluate tolerance for repeats

Interaction with NYSDOH Epidemiology
- Fall 2013 - Fall 2015: sequenced all Salmonella enterica serovar Enteritidis submitted to NYSDOH (~10 samples per week on average)
- All clusters reported to NYSDOH Epidemiology
- Detected on average one new cluster per week
- This approach was premature: too much new data, and small NGS clusters were difficult to tie to traditional epidemiology data

Acknowledgements
AGTC
- Nathalie Boucher
- Melissa Leisner
- Helen Ling
- Matthew Shudt
- Joshua Williams
- Zhen Zhang
Bioinformatics
- Pascal Lapierre
- Mike Palumbo
Funding
- NYSDOH
- Health Research Inc.
- CDC, APHL, FDA
- NIH
Collaborators
- Bacteriology: Kim Musser, Bill Wolfgang
- Mycobacteriology: Vincent Escuyer
- Virology: Kirsten St. George
- Newborn screening: Denise Kay