Introduction to PulseNet WGS Tools in BioNumerics v7.6


1 National Center for Emerging and Zoonotic Infectious Diseases
Introduction to PulseNet WGS Tools in BioNumerics v7.6
Steven Stroika, PulseNet, CDC
PulseNet/OutbreakNet Regional Meeting, February 2019

2 Overview
  PulseNet Workflow
  Reference Identification
  Genotyping Tools
  Nomenclature and surveillance
  Timelines

3 Data Analysis Workflow with National Database
Diagram components: PHL Raw Sequence Data; Private Raw Sequence Storage; Reference ID Database; Organism-specific Database; PulseNet National Databases; Public Raw Sequence Data Storage
1. Sequence isolates using the PulseNet Key number. File naming format: Key-LabID-M###-YYMMDD
2. Save generated sequence files locally, on BaseSpace, or on an external hard drive
3a. Link sequence data to the Reference ID database by PulseNet Key name
3b. Submit data to the calculation engine (CE) for de novo assembly and species identification (genus and species by ANI)
3c. Verify quality
3d. Export entries
4a. Import entries from Reference ID
4b. Add demographic information for entries
4c. Submit to the CE for allele calls and genotyping results (serotype, AST, virulence)
4d. Verify quality and upload to the national database (the WGS ID is assigned automatically)
4e. Upload raw sequence reads to NCBI
4f. Perform surveillance in BioNumerics
Updated 10/23/2018
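The file naming format in step 1 can be checked before files are linked in step 3a. The following is only a sketch in Python, not part of BioNumerics or any PulseNet tool. It assumes that the Key and LabID fields contain no hyphens, that M### means the letter M followed by digits, and that YYMMDD is a real calendar date. The function name parse_run_name and the sample name are hypothetical.

```python
import re
from datetime import datetime

# Assumed reading of "Key-LabID-M###-YYMMDD"; the field rules are guesses,
# not an official PulseNet specification.
RUN_NAME = re.compile(
    r"^(?P<key>[A-Za-z0-9_]+)-"
    r"(?P<lab_id>[A-Za-z0-9]+)-"
    r"(?P<instrument>M\d+)-"
    r"(?P<date>\d{6})$"
)


def parse_run_name(name):
    """Return the name's fields as a dict, or None if the name does not fit."""
    match = RUN_NAME.match(name)
    if match is None:
        return None
    try:
        # Reject impossible dates such as February 30.
        run_date = datetime.strptime(match.group("date"), "%y%m%d").date()
    except ValueError:
        return None
    fields = match.groupdict()
    fields["date"] = run_date
    return fields


if __name__ == "__main__":
    # Made-up example name, for illustration only.
    print(parse_run_name("KEY0001-LAB1-M123-190201"))
```

A name that fails the check returns None, so a lab could flag it for renaming before submitting to the calculation engine.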

4 Reference Identification Database (RefID)
  Raw reads: QC (read quality, predicted coverage, contamination)
  De novo assembly: QC (N50, genome size, and coverage)
  Average Nucleotide Identity (ANI)
  Species-specific databases
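Two of the assembly QC metrics named on this slide have simple standard definitions. The sketch below uses the usual textbook definition of N50 and a plain bases-over-genome-size estimate of coverage. How the BioNumerics calculation engine computes these values is not stated in the slides, so this is an illustration only, and the function names are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def mean_coverage(total_read_bases, genome_size):
    """Average depth: sequenced bases divided by genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size


if __name__ == "__main__":
    # Toy contig lengths, not real assembly data.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
    print(mean_coverage(300_000_000, 3_000_000))
```

A higher N50 generally indicates a less fragmented assembly, which is why it sits alongside genome size and coverage as a quality check.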

5 What ANI Identifies
Columns: Genera, Species, ANI value (%), Genome size (MB)
  Campylobacter: coli, fetus, jejuni, lari, upsaliensis, hyointestinalis*
  Escherichia: albertii*, coli and Shigella, fergusonii*
  Listeria: innocua*, ivanovii*, marthii*, monocytogenes, seeligeri*, welshimeri*
  Salmonella: bongori, enterica
  Vibrio: cholerae, parahaemolyticus, vulnificus, alginolyticus*, cidicii*, cincinnatiensis*, fluvialis*, furnissii*, garveyi*, metoecus*, metschnikovii*, mimicus*, navarrensis*
* Will not be set to map for organism-specific database export

6 How Does Reference ID Look in BioNumerics?

7 Examination of Sequence Quality: Reference ID Database
  Warning

8 Genotyping Tools Available in BioNumerics Organism Databases
Organism databases: Listeria; Salmonella; Escherichia (O157/Non-O157/Shigella); Campylobacter
  Resistance: all four databases
  Plasmid: all four databases
Additional organism-specific tools listed on the slide:
  Lineage
  Virulence (genes for pathovars)
  Antigenic formula and serotype
  Virulence (stx/eae/etc., used to determine pathotype)
  Serotype

9 Genotyping Tools in Organism Databases
Center for Genomic Epidemiology Tools
Online tools hosted by CGE have been built into BioNumerics; they use BLAST to compare sequences to reference genomes.
  Serotype: SerotypeFinder (Escherichia) and SeqSero (Salmonella)
  Resistance genes: ResFinder in all organism databases
  Virulence genes: VirulenceFinder (Escherichia)
  Plasmids: PlasmidFinder in all organism databases

10 Genotyping Tools: View Results in the Main Window
Results output into database fields can be viewed in the main window:
  Serotype_wgs (Salmonella, Escherichia, Vibrio parahaemolyticus)
  AntigenForm_wgs (Salmonella)
  Pathotype (Escherichia)
  Toxin_wgs (Escherichia, Vibrio cholerae)

11 Genotyping Tools: View Results in an Experiment Card
Results associated with experiments can also be viewed in an experiment card:
  Resistance
  Virulence
  Plasmids
  Stx (Escherichia)
Click the green dot in the main window for a genotyping experiment.
An experiment card will appear with the output data.

12 MLST Sequence Types
Publicly available molecular typing schemas hosted by PubMLST and other sites have been integrated with BioNumerics:
  Escherichia: Achtman
  Listeria: PubMLST
  Salmonella: Achtman
  Campylobacter: species-specific schemes (jejuni/coli, fetus, lari, and upsaliensis)

13 Allele Databases and Analysis
Developed allele databases:
  Escherichia: wgMLST, cgMLST, Chromosome-Associated, Plasmid-Associated, MLST (Achtman)
  Campylobacter: wgMLST, cgMLST (jejuni/coli), MLST (species specific)
  Listeria: wgMLST, cgMLST, MLST
  Salmonella: wgMLST, cgMLST (Enterobase), MLST (Achtman)
For outbreak detection: cgMLST (all organisms); wgMLST (Listeria and Campylobacter, for further characterization)
Allele database in development for Vibrio
wgSNP analysis available in BioNumerics v7.6 (pipeline most similar to CFSAN)
Nomenclature in development for non-Listeria cgMLST schemes, with availability in March

14 Listeria WGS Nomenclature
Diagram labels: LMO (organism), version, allele code; Sequence A; Sequence B; partial names
Thresholds shown for successive digits of the allele code: 71 alleles, 51 alleles, 36 alleles, 19 alleles, 7 alleles, 0 alleles
When sequences have partial names, they are singletons at the cluster levels below their last digit.
Sequences A and B in the diagram are within approximately 7 alleles of each other by cgMLST.
These allele definitions are approximate and specific to the organism!

15 Allele Codes in BioNumerics
Callouts on the screenshot:
  These strains failed QC and should be resequenced (core genome less than 95%).
  The two top strains are indistinguishable, with 6 digits matching exactly. The bottom one is missing the 6th digit, so it relates to the other two within 7 alleles.
  These three strains are indistinguishable; 0 alleles different based on the core genome.
  Singleton: no close matches, name not assigned.
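The way shared digits translate into approximate allele distances can be sketched in a few lines of Python. This is an illustration, not the BioNumerics implementation. It assumes an allele code is written as dot-separated digits (the example codes are made up), that the six digit positions correspond in order to the approximate Listeria thresholds of 71, 51, 36, 19, 7, and 0 alleles, and that the QC rule is a simple 95% core genome cutoff. All function names are hypothetical.

```python
# Approximate Listeria thresholds, one per digit position (assumed order).
THRESHOLDS = [71, 51, 36, 19, 7, 0]


def shared_levels(code_a, code_b):
    """Count how many leading digits two allele codes have in common."""
    count = 0
    for digit_a, digit_b in zip(code_a.split("."), code_b.split(".")):
        if digit_a != digit_b:
            break
        count += 1
    return count


def approx_allele_distance(code_a, code_b):
    """Approximate upper bound on allele differences, or None when the
    codes do not share even the first digit."""
    levels = min(shared_levels(code_a, code_b), len(THRESHOLDS))
    if levels == 0:
        return None
    return THRESHOLDS[levels - 1]


def passes_core_qc(core_percent):
    """Strains with less than 95% of the core genome fail QC."""
    return core_percent >= 95.0


if __name__ == "__main__":
    full = "1.2.3.4.5.6"    # hypothetical six-digit code
    partial = "1.2.3.4.5"   # hypothetical code missing the 6th digit
    print(approx_allele_distance(full, full))
    print(approx_allele_distance(full, partial))
```

With these made-up codes, two identical six-digit codes map to 0 alleles, and a code missing the 6th digit maps to within about 7 alleles of its neighbors, matching the callouts on this slide.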

16 Cluster Detection Using Allele Codes
  Finds clusters based on allele code and user-defined thresholds
  Active in Listeria; may differ in other organism databases as allele codes are developed
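One way to picture cluster detection from allele codes is to group isolates that share the same leading digits up to a chosen level. The Python sketch below is a guess at the general idea, not the algorithm BioNumerics uses. It assumes allele codes are dot-separated digits, that the user-defined threshold is expressed as a digit level, and that a cluster needs at least two isolates. The function name, parameters, and isolate names are hypothetical.

```python
from collections import defaultdict


def find_clusters(allele_codes, level, min_size=2):
    """Group isolates whose allele codes agree on the first `level` digits.

    allele_codes maps an isolate name to a dot-separated code. Isolates
    whose (partial) code is shorter than `level` are skipped, because they
    are singletons at that level.
    """
    groups = defaultdict(list)
    for isolate, code in allele_codes.items():
        digits = code.split(".")
        if len(digits) < level:
            continue
        groups[".".join(digits[:level])].append(isolate)
    return {
        prefix: sorted(members)
        for prefix, members in groups.items()
        if len(members) >= min_size
    }


if __name__ == "__main__":
    # Made-up isolates and codes, for illustration only.
    codes = {
        "isolate_A": "1.1.1.1.1.1",
        "isolate_B": "1.1.1.1.1.1",
        "isolate_C": "1.1.1.1.1",
        "isolate_D": "2.1.1",
    }
    print(find_clusters(codes, level=5))
    print(find_clusters(codes, level=6))
```

Lowering the level widens the net (more alleles of difference tolerated), while raising it restricts clusters to more closely related isolates.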

17 WGS Tools Availability to PulseNet Participants
Laboratories are in the process of converting their existing PFGE databases to BioNumerics v7.6:
  Labs are migrating in a phased approach
  Expectation is that all labs will be migrated by early March
  35 labs have converted as of 2/1/19
WGS tools available by March, contingent on:
  WGS Analysis certification
    Available now for labs that have converted to BN 7.6
  Unforeseen technical hurdles, which could delay accessibility for external users
  Distribution of state labs' sequence data, fully annotated

18 Thank You!
For more information, contact CDC: CDC-INFO
#PulseNet
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.