Advanced Molecular Detection
National Center for Immunization and Respiratory Diseases
Jonas Winchell, PhD
Respiratory Diseases Branch
How to think about AMD: Pre-AMD vs. Post-AMD
Technology Has Advanced
o 1993: 5,000 base pairs/day
o 2003: 50,000 base pairs/day
o 2013: 50,000,000,000 base pairs/day
*Human genome: 3,000,000,000 bp
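To put the throughput figures above in perspective, the short sketch below divides the human genome size from the footnote by each year's daily throughput. It assumes a single pass over the genome (no coverage depth, no sample prep time), so it is a back-of-the-envelope illustration rather than a statement about real sequencing runs.

```python
# Days needed to read one human genome's worth of bases at each throughput.
# Assumption: single-pass sequencing, ignoring coverage depth and prep time.
HUMAN_GENOME_BP = 3_000_000_000  # from the slide footnote

# Daily throughput figures quoted on the slide.
THROUGHPUT_BP_PER_DAY = {
    1993: 5_000,
    2003: 50_000,
    2013: 50_000_000_000,
}


def days_per_genome(bp_per_day: int, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Return the number of days needed to sequence genome_bp bases."""
    return genome_bp / bp_per_day


for year, rate in THROUGHPUT_BP_PER_DAY.items():
    days = days_per_genome(rate)
    print(f"{year}: {days:,.2f} days (about {days / 365:,.2f} years)")
```

At the 1993 rate a single pass takes 600,000 days; at the 2013 rate it takes a small fraction of one day.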
What Is AMD? Traditional Epidemiology and Lab Science + Advanced Genomic Sequencing + HPC Bioinformatics = Advanced Molecular Detection
What is AMD?
o 5-year initiative funded by Congressional appropriation
o ~$30 million/year, starting in 2014
o Only a fraction gets to labs/related activities
o Used to build critical genomics and bioinformatics capacity at CDC and PHLs
o To improve detection, control and prevention of infectious diseases
Why AMD?
o Results of a 2011 expert Blue Ribbon panel assembled to review the current state of CDC's bioinformatics capabilities
o Findings were less than flattering:
  o CDC seems ill-prepared to accomplish the rapid, data-intensive computations needed for disease outbreak situations
  o CDC has trailed the rest of the field
  o CDC lacks the organization, expertise, and computational infrastructure needed for NGS, metabolomics, proteomics
  o CDC runs the risk of going from outdated to obsolete and then to irrelevant
Blue Ribbon Recommendations
o Develop core bioinformatics activity to collaborate with and support program science across ID centers (hub and spoke)
o Improve program-level access to bioinformatics tools, training and resources
o Foster a community of practice/excellence around bioinformatics
o Improve collaboration with national labs, non-profit research organizations, academic institutions, and other governmental agencies
o Establish a research network that enables CDC scientific computing
Blue Ribbon Recommendations
o IT and laboratory infrastructure that support AMD approaches
o Public health workforce that uses AMD approaches
o Programs and projects that apply AMD technologies
Develop core bioinformatics capacity: IT and Lab Infrastructure
o High throughput sequencing and laboratory equipment
o High performance computing
o Networking and connectivity
o Scientific data storage, management and security
o Access to open source scientific software
Strategic investments: Scientific infrastructure
Critical laboratory and bioinformatics infrastructure at CDC, state/local PHLs, and key overseas laboratories.
o Sequencers, mass-spec, other instrumentation, reagents.
o High performance computing, workstations.
o Data storage, networking; data integration, knowledge management.
o Service contracts, software licensing, etc.
2014 Accomplishments
o Sequence submissions to public repositories (GenBank submissions): FY13 16,542; FY14 67,720
o Scientific computing usage (CPU-hours): FY13 67,252; FY14 341,593
o Increased server storage: 400 TB before AMD; 3600 TB in AMD Year 1
o Mobile-enabled website
Strategic Investments: Building a skilled workforce
Workforce development:
o Training for CDC and PHL staff (bioinformatics, genomics, -omics)
o New or re-tooled fellowship programs (bioinformatics, genomics)
o Recruitment of new staff and skillsets (bioinformaticians, data scientists, lab specialists, ...)
Applying AMD Technologies: Program and Project Alignment
o Development and standardization of methods and workflows:
  o Lab methods, analysis, data exchange
  o Secondary analyses and visualization
  o Sequencing of reference collections
o High-impact CDC projects and AMD activities:
  o Within CDC programs
  o With state and local PHLs
Establishing collaborations
o Consortia, partnerships and alignment of efforts:
  o Academic institutions
  o State, Federal (NIH, FDA, DHS, DoD, DoE/National Laboratories)
  o Non-Profit/NGO
  o International community
  o Commercial/For-Profit
o Pilot projects with state/local and other partners:
  o Outbreak detection, investigation and response
  o Leverage existing laboratory-based surveillance systems
Overarching Goals
o Goal 1: Improve pathogen identification and detection
o Goal 2: Develop new diagnostics to meet evolving public health needs
o Goal 3: Support states to establish and increase expertise and expand capacity
o Goal 4: Implement enhanced sustainable, integrated laboratory systems
o Goal 5: Develop tools for the prediction, modeling and early recognition of emerging infectious diseases
FY2016 Priorities
o Partner engagement
o State/local strategic planning
o Innovation
o Focus on evaluation
o Strategic project development and implementation
AMD projects http://www.cdc.gov/amd/index.html
The Next Big Hills to Climb for AMD
o Bolstering the bioinformatics
o Sharing the data... securely
o Making sense of a sea of data
o Synergizing efforts and energy
o Metagenomics
o CIDT
Some AMD Projects
o Create Better Vaccines
  o Pneumococcal
  o Pertussis
  o Influenza
o Identify Emerging Threats
  o EIP with state labs: FoodNet, Influenza Hospital Surveillance, Active Bacterial Core, etc.
  o Identifying unknown infections
  o Tracking dengue and chikungunya
o Tracking Diseases and Outbreaks
  o Expanding MicrobeNet
  o Tracking STD transmission
  o Sequencing foodborne pathogens
  o Improving TB detection and surveillance
Targeted resequencing and metagenomic analysis to identify and characterize pathogens in respiratory specimens from Unexplained Respiratory Disease Outbreak (URDO) responses
Current method: parallel real-time PCR for multi-pathogen detection (TaqMan Array Card and ViiA 7)
1. Extract total nucleic acid (TNA) with MagNA Pure Compact
2. Add qRT-PCR master mix and load onto TaqMan Array Card (TAC)
3. Run TAC on ViiA 7
4. Interpret results for presence/absence of 20 respiratory pathogens in 6 samples (2 × 1 µl reactions each)

Advanced molecular detection method: targeted resequencing with WaferGen SmartChip and Illumina MiSeq
Stages: sample prep; library prep; library QC and sequencing
1. Extract total nucleic acid (TNA) with MagNA Pure Compact
2. Add qRT-PCR master mix and load onto WaferGen SmartChip with custom panel
3. Ligation-free SeqReady Illumina library prep and sample barcoding take place on-chip
   On-chip cycling enables massively parallel singleplex amplification of x targets in y replicates for z samples, where x × y × z = 5,184 individual 100 nl wells
4. Centrifuge prepped amplicons off chip into global targeted resequencing pool
5. Purify library and size-select with AMPure beads
6. Quantitate and normalize
7. Sequence on Illumina MiSeq to obtain targeted amplicon reads
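The well budget in step 3 of the AMD method (targets × replicates × samples = 5,184 wells) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the panel sizes in the example are hypothetical and are not the actual URDO chip layout.

```python
# Well-budget arithmetic for a singleplex chip: one well per
# (target, replicate, sample) combination.
CHIP_WELLS = 5_184  # individual 100 nl wells per chip, from the slide


def wells_needed(targets: int, replicates: int, samples: int) -> int:
    """Wells consumed by a layout with one reaction per well."""
    return targets * replicates * samples


def max_samples(targets: int, replicates: int, wells: int = CHIP_WELLS) -> int:
    """Largest number of samples that fits on one chip for a given panel."""
    if targets <= 0 or replicates <= 0:
        raise ValueError("targets and replicates must be positive")
    return wells // (targets * replicates)


# Hypothetical layout: 54 targets x 4 replicates x 24 samples fills the chip.
print(wells_needed(54, 4, 24) == CHIP_WELLS)
print(max_samples(targets=54, replicates=4))
```

Adding targets or replicates reduces the number of samples per chip proportionally, which is the trade-off a custom panel design has to balance.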
Current pathogen detection and characterization respiratory panel
[Figure: tree of respiratory pathogens, with assays coded by type: identification, strain typing, drug resistance]
o Viruses: adenovirus (incl. 40/41); Coronaviridae (MERS); influenza (A, B; H1N1 and H3N2 oseltamivir resistance); Paramyxoviridae (human metapneumovirus, parainfluenzavirus 1, 2, 3, RSV); rhinovirus
o Bacteria: Bordetella (pertussis, parapertussis, bronchiseptica, holmesii); Chlamydophila pneumoniae; Haemophilus; Legionella pneumophila; Mycoplasma pneumoniae; Streptococcus (pneumoniae, agalactiae, pyogenes)
o Sequence-based typing loci: flaA, pilE, asd, mip, mompS, proA, neuA
o MLVA typing loci: Mpn13, Mpn14, Mpn15, Mpn16
o Drug resistance assays: macrolide resistance, penicillin resistance, oseltamivir resistance
o Serogroup assays: serogroup 1; serogroups 19A, 15B/C, 7F/7A
Optimization
Master mix selection: DNA amplification
o Targets tested: M. pneumoniae (detection); L. pneumophila (sequence-based typing, pilE locus); S. pneumoniae (serotyping, 19A)
o Each target was amplified with inner primers only and with all primers (library prep)
o SuperScript III one-step RT-PCR mix was functionally confirmed for DNA targets in wet chemistry
Master mix selection: WaferGen chip loading
o 1X SuperScript III master mix, enzyme, and UV dye solution were dispensed in a checkerboard pattern onto the WaferGen SmartChip to confirm the mix was compatible with the dispense system and was delivered precisely without cross-contamination between wells
Optimization of reverse transcription step (influenza A detection)
o RT temperatures tested: 45, 45.4, 46.3, 47.5, 49.2, 51.4, 52.9, 56, 57.7, 58.8, 59.7 and 60 °C, at stock and 10^-2 dilutions [gel image omitted]
o 10 minutes at 55 °C was determined to be the optimal condition for reverse transcription
Primer design
1. A target size of ~200 base pairs was determined to be the optimal amplicon size for MiSeq sequencing
2. Primer Tm values were modified such that the target-specific region had a lower Tm than the WaferGen adapter sequence to improve generation of library product
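The primer design rule above (target-specific region with a lower Tm than the adapter sequence) can be illustrated with a rough check. The sketch below uses the Wallace rule of thumb, Tm = 2 °C × (A+T) + 4 °C × (G+C), which is only a coarse approximation; it is not the method the slide describes, and real primer design would normally use a nearest-neighbor model. Both sequences are made up for illustration and are not WaferGen or URDOSeq sequences.

```python
# Rough Tm comparison between the two parts of a fusion primer.
# Assumption: Wallace rule as a stand-in for a proper Tm model.


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc


def target_tm_is_lower(target_region: str, adapter_region: str) -> bool:
    """True if the target-specific region melts below the adapter region."""
    return wallace_tm(target_region) < wallace_tm(adapter_region)


# Hypothetical sequences, for illustration only.
TARGET_SPECIFIC = "ACGTTGCAATGCCTAGAT"
ADAPTER = "GCCGTCGGCAGGCTCGACCG"

print(wallace_tm(TARGET_SPECIFIC), wallace_tm(ADAPTER))
print(target_tm_is_lower(TARGET_SPECIFIC, ADAPTER))
```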
Bioinformatics
Preliminary bioinformatic pipeline
Input: demultiplexed targeted amplicon sequencing reads (Illumina MiSeq)
I. Adapter trimming; low-quality sequence read removal; identification of common lab/reagent contaminants
II. Read mapping: alignment to global database (KRAKEN)
III. Data analysis: build internal target database from well-defined sequences
Output: targeted resequencing report (incl. strain types and drug resistance)
Pipeline data output visualization
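As a minimal sketch of the "low-quality sequence read removal" part of step I, the code below drops FASTQ reads whose mean Phred score falls under a threshold. It is not the pipeline described on the slide: the threshold of 20, the Phred+33 encoding, and the function names are assumptions, and a production pipeline would typically rely on a dedicated trimming tool.

```python
# Mean-quality read filter for 4-line FASTQ records.
# Assumptions: Phred+33 quality encoding, hypothetical threshold of Q20.
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ text."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.lstrip("@"), seq, qual


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (0.0 for an empty string)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_quality(records: Iterable[Record], min_mean_q: float = 20.0) -> Iterator[Record]:
    """Keep only reads whose mean Phred score is at least min_mean_q."""
    for name, seq, qual in records:
        if mean_phred(qual) >= min_mean_q:
            yield name, seq, qual


# Tiny in-memory example: 'I' encodes Q40, '!' encodes Q0.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGT", "+", "!!!!"]
kept = list(filter_by_quality(parse_fastq(example)))
print([name for name, _, _ in kept])
```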
Conclusions for URDOSeq to date
o We have developed and optimized 9 targets encompassing DNA and RNA pathogens (Mp, MacR of Mp, Spn, 16S, FluA & B, Hi, Lp, RSV)
o An ideal set of parameters for primer design has been established and has streamlined target design
o PCR and MiSeq loading and run conditions have been optimized
o The Minnesota and Massachusetts state public health labs have successfully completed initial testing and are now beginning to assist in target development and optimization
o An efficient bioinformatics pipeline has been created to rapidly process sequence data, and a report with an easy-to-interpret format has been developed
FY16 plans for URDOSeq
o Perform reproducibility studies at CDC, MA, and MN on current targets, including an unknown PT panel
o Each site will develop and optimize at least 5 new targets
o Have WaferGen create a prototype chip to demonstrate PoP of approach
o Optimize chip design based upon the above findings and generate first iteration of chips
o Test chips with previously characterized specimens
o Use chips side-by-side with current techniques during outbreaks
2nd AMD project: Using WGS to more rapidly solve Legionella outbreaks
Using AMD to more rapidly identify environmental sources of legionellosis
o ~250 unique Legionella genomes sequenced by PRS to date
o Bioinformatic pipelines for comparative analysis
State Lab Partnership - NYS
o Comparative WGS for environmental / clinical isolates of Legionella
o Build accessible database for states to use during outbreaks
o Compare to historical PFGE patterns from NYS lab
o Establish standardized workflows and data analysis algorithms that states can use to more rapidly investigate Legionella outbreaks
o Develop and deploy a wgMLST system to give labs the ability to effectively compare data
o Work towards CIDT to more rapidly solve outbreaks
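To show what comparing data in a wgMLST scheme involves, here is a minimal sketch that counts allele differences between two isolate profiles. The locus names, allele numbers, and isolate labels are hypothetical, and skipping loci with missing calls is one common convention rather than the rule used by the system described here.

```python
# Pairwise allele distance between wgMLST-style profiles.
# A profile maps locus name -> allele number (None = locus not called).
from typing import Dict, Optional

Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared = set(a) & set(b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


# Hypothetical profiles for one clinical and two environmental isolates.
clinical = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": None}
env_a = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": 7}
env_b = {"locus_0001": 1, "locus_0002": 5, "locus_0003": 3, "locus_0004": 7}

print(allele_distance(clinical, env_a))
print(allele_distance(clinical, env_b))
```

A small distance suggests two isolates may be related; what threshold counts as "related" depends on the organism and scheme and is not specified in this talk.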
Development of a whole genome pipeline
o Workflow can process ~34,000 genes in a few hours
Identifying Potential Environmental Sources Using Core Genome SNP Analysis
[Figure: core genome SNP tree; unassociated environmental isolates separated by ~2100 SNPs]
o WGS effectively excluded some potential environmental sources of Legionella pneumophila associated with hospital-acquired legionellosis where SBT was unable to do so.
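The core genome SNP comparison above comes down to counting positions that differ between aligned isolate genomes. The sketch below computes pairwise SNP distances over a toy alignment; the isolate names and sequences are invented, and ignoring gap and N positions is an assumption about how ambiguous sites are handled, not a description of the pipeline used here.

```python
# Pairwise SNP distances over a core-genome alignment.
# Assumption: positions with a gap or N in either sequence are skipped.
from itertools import combinations
from typing import Dict, Tuple

AMBIGUOUS = {"N", "-"}


def snp_distance(a: str, b: str) -> int:
    """Count differing, unambiguous positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in AMBIGUOUS and y not in AMBIGUOUS
    )


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates in the alignment."""
    return {
        (n1, n2): snp_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical isolate names.
toy = {
    "clinical_1": "ACGTACGTAC",
    "env_tower": "ACGTACGTAC",
    "env_unrelated": "ATGTTCGNAA",
}
for pair, dist in distance_matrix(toy).items():
    print(pair, dist)
```

An environmental isolate that differs from the clinical isolates at thousands of core positions, as in the figure, can be excluded as a source even when lower-resolution typing cannot separate them.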
Plans for FY16 and beyond
o Refine and deploy wgMLST database for external PHLs
o Generate high-quality, complete L. pneumophila genome sequences using PacBio for improved analysis
o Expand genome sequence database to identify informative regions that can be used in more targeted molecular assays
o Collaborate with other programs to characterize microbial communities in risky water samples and more rapidly detect and type L. pneumophila using metagenomics approaches
ACKNOWLEDGMENTS
o MA Public Health Laboratory: Dr. Sandra Smole and team
o MN Dept. of Health: Dr. David Boxrud, Ruth Lynfield and team
o NYS Wadsworth Center: Dr. Kim Musser and team
o CDC: Pneumonia Response and Surveillance Lab; labs within DVD, ID, and DBD (and others)