Food Safety (Bio-)Informatics

Henk C. den Bakker
Assistant Professor in Bioinformatics and Epidemiology
Center for Food Safety, University of Georgia
hcd82599@uga.edu

Overview
- Short introduction to Food Safety Informatics
- The digital immune system

Food Safety Informatics?
- The use of information and computer science to advance food safety
- A combination of different individual disciplines: statistics, computer science, epidemiology, and bioinformatics
- Uses Big Data approaches and the Internet of Things

The rise of a digital immune system (DIS)
- Term coined by David Lipman
- Developed further by Michael Schatz and Adam Phillippy in 2012*
- Would work in much the same way as an adaptive, biological immune system:
  - Observe the microbial landscape
  - Detect potential threats
  - Neutralize threats before they can cause widespread harm
- Distributed sensor sequencing and bioinformatics, where a network of mobile sequencing devices serves a real-time stream of microbial genomes to a global compute cloud for analysis
*Schatz, M.C., & A. Phillippy. 2012. GigaScience 1 (1): 4. doi:10.1186/2047-217x-1-4.

What is necessary for a digital immune system?
- A catalogue of microbial diversity, so we can tell the normal from the abnormal (a potential threat)
- Centralized (genome) databases, such as NCBI, EMBL and DDBJ
- Rapid bioinformatics tools to deal with the growing amount of (real-time) data
- Sequencing devices (preferably inexpensive and portable) that can act as the sensors in a distributed, real-time sequencing network

The digital immune system http://hint.fm/wind/

Applying the digital immune system to food safety: The GenomeTrakr project
- Project spearheaded by the FDA*
- GenomeTrakr is the first distributed network of labs to utilize whole genome sequencing for pathogen identification
- Consists of 15 federal labs, 25 state health and university labs, 1 U.S. hospital lab, 2 other labs located in the U.S., 20 labs located outside of the U.S., and collaborations with independent academic researchers
- Data curation and bioinformatic analyses and support are provided by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health
- The GenomeTrakr network has sequenced more than 167,000 isolates and closed more than 175 genomes
- The network regularly sequences over 5,000 isolates each month
*https://www.fda.gov/food/foodscienceresearch/wholegenomesequencingprogramwgs/default.htm

The sensors and the network
- Illumina short-read sequencers, in particular the MiSeq
- Generate genome sequences as short reads, typically well over 200,000 reads per bacterial genome
https://www.illumina.com

The sensors and the network

Using whole genome sequencing (WGS) data in outbreak investigations
- WGS data give unprecedented resolution
- Genomic changes (single nucleotide polymorphisms, SNPs) can be used to infer relatedness with past and present strains
- After ~2 years of using WGS for outbreak investigations*:
  - WGS aids in finding the food vehicle for cold cases and sporadic cases, as it can phylogenetically link isolates from human cases and food
  - By sequencing both food product and patient-derived isolates, outbreaks can be confirmed following product testing, allowing for an early association of an outbreak with a contaminated food
  - WGS can help in a rapid and precise outbreak case definition, and thus productively redirect epidemiological resources
*Jackson et al. 2016. Clin Infect Dis. 63(3):380-6
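A minimal sketch of how SNP differences can be turned into a relatedness measure. It assumes the isolate sequences have already been aligned to the same coordinates (real pipelines call SNPs from reads against a reference genome); the function names, isolate names and sequences are hypothetical and only serve as illustration.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b  # skip gaps and ambiguous bases
    )


def pairwise_snp_distances(isolates):
    """Return {(name_1, name_2): SNP distance} for every pair of isolates."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


# Hypothetical toy alignment: a clinical isolate, a food isolate, and an
# unrelated background strain.
isolates = {
    "clinical": "ACGTACGTAC",
    "food": "ACGTACGTAC",
    "unrelated": "ACCTAGGTTC",
}
for pair, distance in pairwise_snp_distances(isolates).items():
    print(pair, distance)
```

A small distance between a clinical and a food isolate is what would suggest a link worth following up epidemiologically; what counts as "small" depends on the organism and is not specified here.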

NCBI Pathogen detection

The database is growing

How close is the GenomeTrakr network to a digital immune system?
Close, but far from real-time:
- Still dependent on classical microbiology to isolate pathogens, which adds days to weeks to the protocol
- Sequencers are state of the art, but the sequencing procedure takes 2 to 3 days
- The increasing size of the database is becoming prohibitive for real-time searches

New sequencing technologies and (quasi-)metagenome sequencing
- Novel sequencing protocols that need either no or very limited steps for enrichment of target organisms
- Novel sequencing technologies, e.g., Oxford Nanopore (https://nanoporetech.com)

The databases are getting larger and larger

Fortunately we can surf the Big Data wave Source: http://www.tech-dynamics.com/wp-content/uploads/2014/02/bigdatachart.png

A rediscovery of old data structures/algorithms
- Big Data is years ahead of the big increase in genomic data
- In an effort to speed up analyses and searches of genomic data, old data structures and algorithms are being rediscovered and/or reimplemented:
  - De Bruijn graph (De Bruijn, 1946): genome assembly
  - Bloom filter (Bloom, 1970)
  - MinHash (Broder, 1997): efficient comparison of datasets
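To make the first of these structures concrete, the following is a simplified sketch of how a De Bruijn graph is built from reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name and the toy read are assumptions for illustration; real assemblers add error correction, reverse complements and graph simplification on top of this.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # One edge per k-mer occurrence: prefix -> suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy example with a single hypothetical read and k = 3.
print(de_bruijn_graph(["ACGT"], 3))
```

Walking the edges of such a graph (here AC to CG to GT) spells out the underlying sequence, which is the idea genome assemblers build on.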

MinHash: comparing large datasets with smaller sketches
- Originally developed to compare large electronic documents (Broder, 1997)
- Summarizes each document as a fixed-size subset (sketch) of its information, using a specific criterion to select the members of the subset
- Example: a sketch of a thousand words is approximately large enough to infer the similarity of documents with millions of words
- Translated to bacterial genomes, we can use the same strategy to divide genomes up into words (k-mers) and use a MinHash approach to estimate the relatedness of these genomes
Ondov, Brian D. et al. 2016. Genome Biology 17 (1): 132.
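The sketching idea can be illustrated with a small bottom-s MinHash in Python. This is a simplified sketch, not the implementation of any particular tool: it uses SHA-1 from the standard library as the hash, ignores reverse complements, and the function names and default parameters are assumptions chosen for illustration.

```python
import hashlib


def kmer_set(sequence, k):
    """Return the set of all k-mers (words of length k) in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def hash_kmer(kmer):
    """Hash a k-mer to a 64-bit integer."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_sketch(sequence, k=21, sketch_size=1000):
    """Keep only the sketch_size smallest k-mer hashes as the sketch."""
    hashes = {hash_kmer(kmer) for kmer in kmer_set(sequence, k)}
    return sorted(hashes)[:sketch_size]


def jaccard_estimate(sketch_a, sketch_b, sketch_size=1000):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The smallest hashes of the union form a sketch of the union.
    merged = sorted(set_a | set_b)[:sketch_size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)
```

The selection criterion here is "smallest hash values": because a good hash behaves like a random ordering of the k-mers, the fraction of shared hashes among the smallest ones approximates the fraction of shared k-mers between the full genomes, while only the sketches need to be stored and compared.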

BIGSI: Searching microbial big data
- BLAST has been the traditional search algorithm for genetic and genomic database centers such as NCBI (US) and EBI (Europe)
- However, the majority of genomic datasets (by now hundreds of thousands) are stored as unassembled genomes, consisting of hundreds of thousands to millions of short reads
- BLAST is generally not fast enough to search these databases in real time

From Bloom filters to BIGSI
[Figure: Bloom filter diagram by David Eppstein, self-made, originally for a talk at WADS 2007, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2609777]
Advantages:
- Small storage for large sets of elements
- Fast search
Disadvantage:
- False positives
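A minimal Bloom filter sketch shows where these properties come from. The class name, the default sizes and the use of seeded SHA-256 as the hash family are assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions per element."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python integer used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by seeding one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True if every position is set: the item is *possibly* present.
        # False if any position is unset: the item is *definitely* absent.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


kmers = BloomFilter()
kmers.add("ACGTA")
print("ACGTA" in kmers)  # an added element is always reported as present
```

Storage stays at num_bits no matter how many elements are added, and a lookup only checks a few bit positions. The price is that unrelated elements can happen to map onto bits that are already set, which is the source of the false positives.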

BIGSI: extension of the Bloom filter
- BItsliced Genomic Signature Index (BIGSI)
- Allows for superfast search of big sequence databases
- 3 antibiotic resistance genes (MCR-1, MCR-2, MCR-3) could be searched in 1.73 seconds in a database of 447,833 viral and bacterial genomes
P. Bradley, H.C. den Bakker, E. Rocha, G. McVean, Z. Iqbal. 2017. bioRxiv 234955; doi: https://doi.org/10.1101/234955
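The bitslicing idea can be sketched as follows, under stated assumptions: every dataset gets a Bloom filter of the same size built with the same hash functions, the filters are stored as columns of a bit matrix, and a query only reads the few rows its k-mers hash to. This is a toy illustration, not the BIGSI implementation; the class name, parameters and dataset names are hypothetical.

```python
import hashlib


def kmer_set(sequence, k):
    """Return the set of all k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


class TinyBitslicedIndex:
    """Bloom filters of many datasets stored column-wise, queried row-wise."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.names = []
        # rows[r] is an integer whose bit j is set if dataset j set Bloom bit r.
        self.rows = [0] * num_bits

    def _positions(self, kmer):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_dataset(self, name, kmers):
        column = len(self.names)
        self.names.append(name)
        for kmer in kmers:
            for row in self._positions(kmer):
                self.rows[row] |= 1 << column

    def query_sequence(self, sequence, k):
        """Names of datasets that may contain every k-mer of the query."""
        query_kmers = kmer_set(sequence, k)
        if not query_kmers:
            return []
        hits = (1 << len(self.names)) - 1  # start with all datasets
        for kmer in query_kmers:
            for row in self._positions(kmer):
                hits &= self.rows[row]  # one AND answers for all datasets
        return [n for j, n in enumerate(self.names) if (hits >> j) & 1]


index = TinyBitslicedIndex()
index.add_dataset("isolate_A", kmer_set("ACGTACGTTGCA", 5))
index.add_dataset("isolate_B", kmer_set("TTTTTGGGGGCCCCC", 5))
print(index.query_sequence("ACGTACG", 5))
```

The work per query k-mer is a handful of row lookups and bitwise ANDs, independent of how many reads each dataset contains, which is what makes searching hundreds of thousands of datasets feasible. As with a single Bloom filter, reported hits can include false positives.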

Summary
- In food safety, the GenomeTrakr network is the closest thing we have to a digital immune system
- In order to use this network to detect threats early, we need further improvements in:
  - Sample preparation methods/culture-free methods
  - Sequencing technology (faster, easier, smaller)
  - Bioinformatics
- These improvements are coming fast