ESCMID Online Lecture Library. by author


1 The E. coli O104:H4 Outbreak: Importance of Rapid Genome Sequencing. Dag Harmsen, University of Münster, Germany

3 Fourth Dimension Needed: Place, Time, Person, Type!

4 Fourth Dimension Reloaded: Next Generation Sequencing, Bench-top Machines. Affordable; Speed; Simple workflow. Platforms: Ion Torrent Personal Genome Machine (PGM); Roche/454 GS Junior; Illumina MiSeq Personal Sequencing System.

5 Proof of Principle: World's Largest Hemolytic-Uremic Syndrome (HUS) Epidemic, German EHEC Outbreak 2011
Germany (RKI, July 26th, 2011): EHEC 3,481 cases, 18 deaths; HUS 852 cases, 32 deaths.
Europe / North America (WHO, July 21st, 2011): EHEC 89 cases, no deaths; HUS 52 cases, 2 deaths.
[Figure: EHEC and HUS cases and deaths. RKI, July 26th, 2011] [WWW:
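The German counts on this slide can be turned into a simple proportion of deaths among reported cases. The sketch below only does that arithmetic on the four numbers shown; the function name is made up for illustration, and the result is a crude ratio of reported figures, not an adjusted epidemiological estimate.

```python
# Counts copied from the slide (Germany, RKI, July 26th, 2011).
reported = {
    "EHEC": {"cases": 3481, "deaths": 18},
    "HUS": {"cases": 852, "deaths": 32},
}


def case_fatality_percent(cases: int, deaths: int) -> float:
    """Return deaths as a percentage of reported cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


for syndrome, counts in reported.items():
    percent = case_fatality_percent(counts["cases"], counts["deaths"])
    print(f"{syndrome}: {percent:.2f}% of reported cases died")
```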

6 Achievements: Prospective Genomic Epidemiology, German EHEC Outbreak 2011. [Timeline figure; scale labels: week, year] University Münster / Life Technologies; Tweetom; BGI / UKE / Crowd-sourcing & HPA; Univ. Göttingen / Illumina / PacBio

7 Phylogenetic Analysis of EHEC O104:H4. Phylogenetic analyses (by quick and dirty hybrid RefMap & de novo assemblies & BIGSdb MLST+; n = core genome genes, and minimum-spanning tree): strain LB (outbreak 2011) and strain (2001 German historic HUS-causing isolate) belong to the HUSEC041 complex; both strains are only distantly related to commonly isolated EHEC serotypes. Mellmann et al. (2011). PLoS One. 6: e22751 [PubMed]
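The minimum-spanning tree step named on this slide can be sketched as follows. This is a minimal illustration, assuming pairwise distances (for example, numbers of differing alleles) are already available; the isolate names and distances are invented and are not outbreak data, and Prim's algorithm is chosen here for brevity, not because the slide or the cited tools specify it.

```python
def minimum_spanning_tree(names, distance):
    """Build a minimum-spanning tree with Prim's algorithm.

    names: list of isolate labels.
    distance: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a list of (from, to, distance) edges.
    """
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge linking the tree to an isolate outside it.
        d, a, b = min(
            (distance[frozenset((a, b))], a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((a, b, d))
        in_tree.add(b)
    return edges


# Hypothetical isolates and hypothetical allelic distances.
isolates = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 1,
    frozenset(("A", "C")): 4,
    frozenset(("A", "D")): 5,
    frozenset(("B", "C")): 2,
    frozenset(("B", "D")): 6,
    frozenset(("C", "D")): 3,
}
tree = minimum_spanning_tree(isolates, dist)
print(tree)
```

Isolates joined by short edges differ at few loci; a long edge marks a distant relationship.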

8 Sprout-break: Sprouts as Source of E. coli O104:H4 Outbreak 2011. Sprouts as source of German EHEC outbreak 2011 suspected by epidemiologic food supply chain analysis (June 5th, 2011). Sprouts as source of German EHEC outbreak 2011 confirmed by RKI (June 10th, 2011).

9 Previous Comparable Outbreaks
Outbreaks compared: Japan, 1996, EHEC O157 (radish sprouts); USA, 2006, EHEC O157 (spinach); USA, 2008, S. Saintpaul (jalapeno peppers); Germany, 2011, EHEC O104 (sprouts).
Number of cases: Japan ~; USA 2006 ~210; USA 2008 ~1,500; Germany ~4,000.
Death: [values missing]
Onset of first cases to outbreak detection: Japan >7 weeks; USA 2006 ~3 weeks; USA 2008 ~4 weeks; Germany ~2 weeks.
Outbreak detection to identification of suspect vehicle: Japan >4 weeks; USA 2006 ~5 days; USA 2008 ~7 weeks; Germany ~3 weeks.
Length of outbreak: Japan ~12 weeks; USA 2006 ~6 weeks; USA 2008 ~16 weeks; Germany ~9 weeks.
Compiled by: Robert Koch Institute, Germany

10 Timeline: Microbial Surveillance and Disease. Red boxes indicate disease outbreaks, blue boxes indicate technological advances, and green boxes indicate events relating to surveillance. E. coli, Escherichia coli; GPHIN, Global Public Health Intelligence Network; InSTEDD, Innovative Support to Emergencies, Diseases and Disasters; ProMED-mail, Program for Monitoring Emerging Infectious Diseases; SARS, severe acute respiratory syndrome. Lipkin (2013). Nature Rev. Microbiol. 11: 133 [PubMed].

11 Genomics for Diagnostics: Dutch Klebsiella OXA-48 Outbreak 2011. Since June 2011 at the Dutch Maasstad hospital in Rotterdam (NL): 98 patients infected; 28 patients died; more than 2000 people were at risk of being infected (as of August 17th) by: release of cured patients, transfer of patients to other locations, visitors, hospital employees [RNW]. The Dutch National Institute for Public Health and the Environment (RIVM) was asked to assist the hospital.

12 Screening Test: Dutch K. pneumoniae OXA-48. Workflow: clinical isolate cultivation; DNA extraction; PGM sequencing; draft assembly; 36 candidate regions; comparison with >200 Klebsiella genomes; reduction to 2 candidate regions; multiplex PCR* based diagnostic test assay. Partners: Maasstad & RIVM; Periodontology, UKM; Life Technologies; Sanger Institute. * oxa-48, ctx-m-15, housekeeping gene, homo-polymer rich region. The test assay is used by Dutch hospitals to screen patients and isolate those with a positive result. RIVM. NATURE News Blog. Genomics Identifies Source of Klebsiella Outbreak. August 16, 2011 [Nature]. GenomeWeb. Münster Team Sequences Klebsiella Outbreak on Ion Torrent PGM. August 16, 2011 [GenomeWeb]. Science. Outbreak detectives embrace the genome era (2011). 333: 1818 [Science]

13 Microbial Genomics and Tool Development Relman (2011). N Engl J Med. 365: 347 [PubMed]

14 Insertion/deletion Errors on Read Level. 454 GSJ: Nebulization/Titanium/454 Sequencing Software v2.6. Illumina MiSeq: Covaris/TruSeq v2/MiSeq Reporter Software v1.x. Ion Torrent PGM: Bioruptor/100b/Torrent Server Software v1.5. Loman et al. (2012). Nature Biotechnology 30(4): 562 [PubMed].

15 Insertion/deletion and Substitution Errors on Read Level. Error rates were calculated by counting insertions, deletions (indels) and substitutions (Subs) in the mapping against the EHEC Sakai reference sequence for each uniquely mapped read. Table footnotes: & base pairs; + paired-end sequencing; # not officially available during time of study. GSJ, 454 GS Junior; MiSeq, Illumina MiSeq; PGM, Ion Torrent Personal Genome Machine. Jünemann et al. (2013). Nature Biotechnology 31(4): 294 [PubMed].
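The counting described on this slide can be illustrated on a single gapped pairwise alignment. This is a sketch under stated assumptions, not the pipeline of the cited study: the alignment strings are invented, every gap column is counted as one inserted or deleted base, and normalising by the number of read bases is an assumption made here for illustration.

```python
def count_read_errors(aligned_ref: str, aligned_read: str) -> dict:
    """Count insertions, deletions and substitutions in one gapped alignment.

    Both strings must have equal length; '-' marks a gap.
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have equal length")
    counts = {"insertions": 0, "deletions": 0, "substitutions": 0, "read_bases": 0}
    for ref_base, read_base in zip(aligned_ref, aligned_read):
        if ref_base == "-" and read_base == "-":
            continue  # empty column, carries no information
        if ref_base == "-":
            counts["insertions"] += 1  # base present only in the read
            counts["read_bases"] += 1
        elif read_base == "-":
            counts["deletions"] += 1  # base missing from the read
        else:
            counts["read_bases"] += 1
            if ref_base != read_base:
                counts["substitutions"] += 1
    return counts


def errors_per_100_bases(counts: dict) -> dict:
    """Scale the three error counts to errors per 100 read bases."""
    if counts["read_bases"] == 0:
        raise ValueError("no read bases in alignment")
    scale = 100.0 / counts["read_bases"]
    return {k: counts[k] * scale for k in ("insertions", "deletions", "substitutions")}


# Invented example: one insertion, one deletion, one substitution.
example = count_read_errors("ACGT-ACGTA", "ACGTTAC-TT")
print(example)
print(errors_per_100_bases(example))
```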

16 Genome-wide Gene by Gene Contiguity. Of the 5,230 chromosomal coding ORFs annotated with the gene feature in the E. coli Sakai NCBI entry, all 4,671 non-pseudo, non-paralogous genes (= targets, 78.8% of the bases of the complete genome) were taken as reference and mapped with Ridom SeqSphere+ v0.99beta against the de novo assemblies (MIRA 3.4.0) of newly generated NGS data (= extended MLST / MLST+). Figure: Evolution of genome contiguity for PGM, MiSeq and GSJ. The contiguity of the de novo assembly consensus sequences generated by MIRA was analysed for 4,671 non-pseudo, non-paralogous chromosomal coding genes of the E. coli Sakai NCBI reference sequence. This genome-wide gene-by-gene analysis was performed with the Ridom SeqSphere+ (Münster, Germany) software. Green: perfect genes that matched the reference gene with 100% identity and 100% overlap. Yellow: genes that matched with at least 97% identity and at least 97% overlap (but less than 100% identity and 100% overlap, mostly due to single indels). Red: genes that had no match or < 97% identity and/or < 97% overlap with the reference gene counterpart. Jünemann et al. (2013). Nature Biotechnology 31(4): 294 [PubMed].
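The colour scheme in this caption amounts to a three-way classification on identity and overlap. The sketch below encodes the thresholds as stated on the slide; the function name and the example values are hypothetical, and it is not the SeqSphere+ implementation.

```python
from typing import Optional


def classify_gene(identity: Optional[float], overlap: Optional[float]) -> str:
    """Classify one target gene by its best match in a de novo assembly.

    identity and overlap are percentages; None means no match was found.
    """
    if identity is None or overlap is None:
        return "red"  # no match at all
    if identity == 100.0 and overlap == 100.0:
        return "green"  # perfect gene
    if identity >= 97.0 and overlap >= 97.0:
        return "yellow"  # near-perfect, typically a single indel
    return "red"  # below 97% identity and/or below 97% overlap


# Hypothetical matches: (identity %, overlap %).
matches = [(100.0, 100.0), (99.8, 100.0), (100.0, 96.5), (None, None)]
colours = [classify_gene(i, o) for i, o in matches]
print(colours)
```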

17 It's the Consensus: Genome-wide Gene by Gene Analysis, Consensus Accuracy. Venn diagram of consensus accuracy for PGM, MiSeq and GSJ. Reported consensus errors were analysed for 4,632 coding NCBI Sakai reference genome genes that could be retrieved from the MIRA de novo assemblies using SeqSphere+ for all three platforms. Numbers of variants confirmed by bidirectional Sanger sequencing are indicated in parentheses (the validation of the eight substitution and fifteen indel variants identified using all three NGS platforms suggested that either the Sakai strain sequenced here experienced microevolutionary changes or the genome sequence deposited in 2001 contains sequencing errors). PGM, Ion Torrent Personal Genome Machine 300bp; MiSeq, Illumina MiSeq 2x 250bp PE; GSJ, 454 GS Junior with GSJ Titanium chemistry; bp, base pairs. Jünemann et al. (2013). Nature Biotechnology 31(4): 294 [PubMed].

18 De novo Assembly Metrics of Benchtop NGS Platforms. Datasets were assembled using MIRA with default settings, except that only contigs with a minimum of 100 reads were reported for each platform. Table footnotes: * kilo bases; $ two combined runs; & base pairs; + paired-end sequencing; § datasets were randomly sub-sampled to achieve approximately 75-fold coverage for MiSeq and 40-fold coverage for PGM; # not officially available during time of study. GSJ, 454 GS Junior; MiSeq, Illumina MiSeq; PGM, Ion Torrent Personal Genome Machine. Jünemann et al. (2013). Nature Biotechnology 31(4): 294 [PubMed].
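Random sub-sampling to a target fold coverage, as mentioned in the footnotes, can be sketched as below. This is only one plausible way to do it (shuffle the reads, then keep reads until the target number of bases is reached); the slide does not say how the authors sub-sampled, and the read lengths and genome size in the example are toy values, not study data.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, target_coverage, seed=0):
    """Return indices of randomly chosen reads reaching the target coverage.

    Coverage is taken as total sequenced bases divided by genome size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    target_bases = target_coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # seeded so the draw is repeatable
    chosen, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        chosen.append(index)
        total += read_lengths[index]
    return chosen


# Toy data: 1,000 reads of 100 bases on a 1,000-base "genome" is 100-fold coverage.
lengths = [100] * 1000
kept = subsample_to_coverage(lengths, genome_size=1000, target_coverage=40)
coverage = sum(lengths[i] for i in kept) / 1000
print(len(kept), coverage)
```

If the dataset holds fewer bases than the target requires, the sketch simply returns every read.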

19 Laboratory Developments Towards Finished Genomes. Fast & easy mate-pair protocols (insert > 5kb). Hybrid 2nd- & 3rd-generation assemblies: Koren et al. (2012). Nature Biotechnology. doi: /nbt.2280 [PubMed]. HGAP: Hierarchical Genome Assembly Process (10kb seeds).

20 NGS Challenges: Sample processing; NGS platform(s); tools, IT-infrastructure, & (bio)informaticians

21 Bioinformatics Challenge: Molecular Typing Esperanto by Standardized Genome Comparison. Average Nucleotide Identity (ANI) / k-mer approach: used in taxonomy for species delineation; no assembly needed (suitable for clone level?); whole genome finally reduced to a single number of (dis)similarity. Pair-wise (visual) genome comparison (Mauve-like): in general very problematic due to genomic rearrangement and recombination events; (nearly) impossible with draft genomes and more than 40 genomes. Genome-wide SNP approach: works especially well for ad hoc analysis of monomorphic organisms; various very subjective SNP selection approaches that are difficult to reproduce. Genome-wide gene by gene approach: works on gene level, i.e. the element of evolution; a sort of super/extended MLST (MLST+); hierarchical typing approaches possible; potentially enables plain-language reports.
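The point that a k-mer approach reduces two genomes to a single (dis)similarity number, without assembly, can be illustrated with a Jaccard index over k-mer sets. This is a deliberately simple stand-in and not ANI itself; it ignores reverse complements and sequencing errors, and the sequences and the choice of k are invented for the example.

```python
def kmer_set(sequence: str, k: int) -> set:
    """Return the set of all overlapping k-mers in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_jaccard(seq_a: str, seq_b: str, k: int) -> float:
    """Jaccard similarity of two k-mer sets: shared k-mers / all k-mers."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("sequences are shorter than k")
    return len(a & b) / len(union)


# Toy sequences sharing 2 of 6 distinct 3-mers.
similarity = kmer_jaccard("ACGTAC", "ACGTTC", k=3)
print(similarity)
```

Real genomes would call for a much larger k and canonical (strand-independent) k-mers; the sketch only shows how a whole comparison collapses into one number.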

22 Multi Locus Sequence Typing: One disruptive technology fits it all. Figure: MLST of 6 genes (gdh, pdhC, aroE, abcZ, pgm, adk) versus genome-wide MLST+. Shown are the six N. meningitidis MLST targets and symbolic red boxes across the genome for MLST+ targets. Multi Locus Sequence Typing (MLST): 5-7 housekeeping genes distributed over the whole genome. One single expanding nomenclature. Used mainly for population genetics and evolutionary studies. MLST+: genome-wide gene by gene typing; hundreds/thousands of core genome genes for typing, resulting in higher discrimination and more accurate strain typing. The heightened discrimination power of MLST+, coupled with rapid and affordable NGS, makes this complete solution ideal for everything from everyday microbial monitoring to outbreak investigation. Combining the best: comparability of MLST and discriminatory power of PFGE.
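Gene by gene typing compares isolates through allele profiles: each locus gets an allele number, and two isolates are compared by counting the loci at which those numbers differ. The sketch below uses the six locus names from the slide, but the allele numbers are invented, and skipping loci that are missing in either isolate is an assumption made here, not a rule taken from the slide.

```python
def allelic_distance(profile_a: dict, profile_b: dict) -> tuple:
    """Compare two allele profiles (locus name -> allele number or None).

    Returns (number of differing loci, number of loci compared).
    Loci missing or None in either profile are skipped.
    """
    compared = [
        locus
        for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    differing = sum(1 for locus in compared if profile_a[locus] != profile_b[locus])
    return differing, len(compared)


# Invented allele numbers for two hypothetical isolates.
isolate_1 = {"gdh": 1, "pdhC": 2, "aroE": 3, "abcZ": 4, "pgm": 5, "adk": 6}
isolate_2 = {"gdh": 1, "pdhC": 2, "aroE": 9, "abcZ": 4, "pgm": None, "adk": 7}
result = allelic_distance(isolate_1, isolate_2)
print(result)
```

The same function works unchanged whether a profile holds a handful of housekeeping genes or thousands of core genome genes, which is why the two schemes stay comparable.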

23 MLST+ Software Tools: BIGSdb and Ridom SeqSphere+. Jolley & Maiden (2010). BIGSdb. BMC Bioinformatics 11: 595 [PubMed]; Jünemann et al. (2013). Nature Biotechnology 31(4): 294 [PubMed].

24 Hierarchical Microbial Typing Approach. From bottom to top with increasing discriminatory power. MLST, multi locus sequence typing; rMLST, ribosomal MLST; SNP, single nucleotide polymorphism.
SNPs / Alleles, explorative: lineage specific; e.g., Köser et al. (2012). NEJM 366: 2267 [PubMed]
MLST+: standardized, species specific; Mellmann et al. (2011). PLoS One. 6: e22751 [PubMed]; Vogel et al. (2012). JCM 50: 1889 [PubMed]; Jolley et al. (2012). JCM 50: 3046 [PubMed]
rMLST*: pan-bacterial specific (also suited for identification); Jolley et al. (2012). Microbiology 158: 1005 [PubMed]
SNPs, confirmatory/canonical: species specific; e.g., Van Ert et al. (2007). JCM 45: 47 [PubMed]
MLST: Maiden et al. (1998). PNAS 95: 3140 [PubMed]; also needed for backward compatibility
* user must register (free for academicians) at

25 Next Tasks and Topics: whole genome microbial typing; early warning systems & GIS; resistome and pathogenome/toxome analysis; plain language report; unique signature detection for development of rapid molecular screening tests

26 EpiScanGis: Meningococcal infectious disease in Germany. Creation of a web-based application; user-friendly interface allowing queries as to the distribution of fine types, incidence rates etc.; since 2006 weekly prospective scans with SaTScan, display and storage of results; used by 50% of all local German public health authorities for informed decisions (vaccination campaigns). Elias et al. (2006). Emerg. Infect. Dis. 12: 1689 [PubMed]; Reinhardt et al. (2008). Int. J. Health Geogr. 7: 33 [PubMed]; Grundmann et al. (2010). PLoS Med 7: [PubMed].

27 A Sense of Plain Language Report Phenotypic antibiograms, resistome, and toxome of MRSA hospital outbreak Köser et al (2012). NEJM 366: 2267 [PubMed]

29 International Meeting on Microbial Epidemiological Markers, IMMEM-10, October 2-5, 2013, Paris. Abstract submission deadline: June 15th, 2013. Sessions: Bioinformatics tools for genome-based microbial surveillance; Outbreak genomics and epidemiology; Population genetics, phylogenomics, emergence; Molecular typing and epidemiology; Surveillance networks in practice; Phylodynamics of viral pathogens; Virulence: diagnostic and epidemiology; Resistance: diagnostic and epidemiology; Diagnostic by high-throughput sequencing; Social networks and transmission modelling; Strain tracking from global health to One Health. Round tables, workshops on various topics & networks.