Exploring a fatal outbreak of Escherichia coli using PATRIC

On May 19, 2011, the Robert Koch Institute, Germany's national-level public health authority, was informed about a cluster of three cases of the hemolytic uremic syndrome in children admitted on the same day to the Hamburg university hospital. As the number of affected children began to rise, the institute realized that it had a problem on its hands. Adults were also being sickened, and their number began to increase as well. What was now considered an epidemic began to spread throughout Europe.

The hemolytic uremic syndrome associated with the epidemic has been characterized by the triad of acute renal failure (an abrupt loss of kidney function that develops within 7 days), hemolytic anemia (a condition in which red blood cells are destroyed and removed from the bloodstream) and thrombocytopenia (low platelet count). Diarrhea-associated hemolytic uremic syndrome occurs primarily in children, and a precipitating infection with Shiga-toxin producing Escherichia coli, mainly of serotype O157:H7, is usually the primary cause. In adults, the hemolytic uremic syndrome with prodromal diarrhea, indicating an infectious cause, is a rare event.

The serotype of the E. coli outbreak strain was determined to be O104:H4. A comparative genomic examination showed that the pathogen possessed genes typical of enteroaggregative E. coli, such as atta, aggr, aap, agga, and aggc, located on a virulence plasmid. In addition, the strain carried the gene for a Shiga-toxin 2 variant (stx2a). Other typical Shiga-toxin producing E. coli genes such as stx1, eae, and ehx were missing.[1]

Using the genomes isolated from this outbreak, we will use PATRIC tools to examine the presence or absence of specific genes. We will also compare the outbreak genomes to other similar genomes to see whether they show the same pattern of genes that are present or missing.

Creating genome groups

1. Log in to the PATRIC website so that you can use your workspace in the downstream analysis.

2. On the PATRIC homepage (patricbrc.org), open the Organisms tab at the top of the page.
3. When the tab opens to reveal the box listing the names of pathogens, click on Escherichia.
4. This will take you to the landing page for Escherichia, which summarizes all the information that PATRIC has about the genus, including the number of genomes, experiments associated with it, publications on it, and tools that can analyze the available data sorted at that taxonomic level.
5. Find the tab across the top that is labeled Genome List and click on it.

6. This will take you to the Genome List for the genus Escherichia. On the left you will see a dynamic filter, and on the right a table that lists the genomes.
7. At the top of the filter on the left hand side you can see a text box. Enter the word Germany in that box and then hit return.
8. This will filter the table on the right hand side to show all the genomes that were either isolated in Germany, or had that word mentioned in the information that was submitted when the genome became public. Other information about these genomes can be seen in the columns, including information like the host that the bacterium was isolated from.
9. One of the columns to the right of this table is titled Collection Date. Click on those words and it will sort the table in the order of the years that the bacteria were collected.

10. One click will sort the table from the earliest collection date.
11. A second click shows the most recently collected genomes.
12. Check each of the boxes next to the genome name from the organisms that were collected in 2011.
13. Click on the Add Genomes next to the folder icon in the Workspace header.
14. This will open up a pop-up window that allows you to save the group.
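The filter, sort and select steps above can be mimicked on a small table to make the logic explicit. The following is a minimal sketch, not PATRIC code: the records, field names and helper functions are all hypothetical, and the keyword match simply looks for the word in any metadata field, which is how step 8 describes the text box behaving.

```python
# Hypothetical genome metadata; none of these records come from PATRIC.
genomes = [
    {"name": "Genome A", "isolation_country": "Germany", "collection_year": 2011, "comments": ""},
    {"name": "Genome B", "isolation_country": "Germany", "collection_year": 2001, "comments": ""},
    {"name": "Genome C", "isolation_country": "France", "collection_year": 2011,
     "comments": "Sequenced by a reference lab in Germany"},
    {"name": "Genome D", "isolation_country": "USA", "collection_year": 2011, "comments": ""},
]


def matches_keyword(record, keyword):
    """True if the keyword appears in any metadata field (case-insensitive)."""
    needle = keyword.lower()
    return any(needle in str(value).lower() for value in record.values())


# Step 7-8: keyword filter. Genome C matches only because of its comment field.
hits = [g for g in genomes if matches_keyword(g, "Germany")]

# Step 9-11: sort by collection year, earliest first (pass reverse=True for newest first).
hits_sorted = sorted(hits, key=lambda g: g["collection_year"])

# Step 12: keep only the genomes collected in 2011.
selected_2011 = [g["name"] for g in hits_sorted if g["collection_year"] == 2011]
print(selected_2011)
```

The sketch also shows why the keyword box alone can be too generous: a genome can match because the word appears in a free-text field, so the Isolation Country filter is the stricter choice when the country of isolation matters.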

15. Select the Create New Group option.
16. Name the group and click Save to Workspace. The data is now saved, and you can use a number of tools to explore it.

Assignment

Create genome groups for the three categories below. Use the dynamic filter on the Genome List page, and remember that you can use the text box at the top to filter on specific terms (hint: like O104). You can also use the filters underneath the text box to further refine your search (hint: Isolation Country and Collection Date). When you complete your assignment, you will have four different groups, including the one we just created.

Create a group that contains all the O104 genomes collected in 2011 in Europe, not including Germany.
Create a group that contains all the E. coli genomes collected in 2011 in the United States.
Create a group that contains all the O104 genomes available in the PATRIC database, excluding those collected in 2011.

Comparing genome groups in PATRIC using the Protein Family Sorter tool

1. To look for presence or absence of the protein families within a genome group that you have created, click on the Tools tab and under Comparative Genomics, select the Protein Family Sorter tool.

2. This will take you to the landing page for that tool.
3. Scroll down in the Select Organism box until you see the genome groups you created. Select the boxes for the Germany 2011 group, the O104 group from 2011 that doesn't include Germany, and the O104 group that contains genomes isolated in years other than 2011.
4. Hit the Select button under the keyword search box.

5. This takes you to the Protein Family Sorter results page. On the right you will see a dynamic filter, and on the left a table that lists all the protein families.
6. One way you can examine differences in your genome groups is to visualize the data. To do this, click on the Heatmap tab at the top of the table (next to the Table tab).
7. This will take you to the heatmap view, where absence (black cells) and presence (yellow, mustard and orange cells) can be seen across all genomes. The genomes are on the y-axis, and the protein families on the x-axis.

8. You can order the protein families by the way the genes occur in a given genome. This is a good way to check for genomic islands, which are parts of a genome that were not directly inherited but were obtained from different bacteria by what is described as horizontal transfer. To do this using the Protein Family Sorter, click on the down arrow in the text box next to the words Advanced Clustering.
9. This will open up a list of genomes that are included in the groups. Scroll down until you find one of the German genomes (Escherichia coli O104:H4 str. Ty-2482). Click on that name.
10. This will arrange all the protein families in the order that the genes occur in the Ty-2482 strain. You'll notice that several of the genomes appear to have long black boxes associated with them. This means that these genomes could be missing a long section of the genome that is present in the reference strain. This is an indication of a genomic island.
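The "long black box" idea can be stated as a small computation: walk along the protein families in reference order and report stretches where a second genome has none of them. The sketch below is an illustration under stated assumptions, not how PATRIC builds its heatmap; the family IDs, the function name `absent_runs` and the `min_length` threshold are all hypothetical.

```python
def absent_runs(reference_order, present_in_genome, min_length=3):
    """Return runs of consecutive reference families missing from a genome.

    reference_order: family IDs in the order their genes occur in the reference.
    present_in_genome: set of family IDs found in the genome being compared.
    min_length: shortest run worth reporting (an arbitrary, hypothetical cutoff).
    """
    runs, current = [], []
    for family in reference_order:
        if family not in present_in_genome:
            # Still inside a black stretch of the heatmap row.
            current.append(family)
        else:
            # A present family ends the stretch; keep it if it is long enough.
            if len(current) >= min_length:
                runs.append(current)
            current = []
    if len(current) >= min_length:
        runs.append(current)
    return runs


# Made-up family IDs in reference gene order, and the families another genome carries.
reference_order = ["F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8"]
other_genome = {"F1", "F2", "F7", "F8"}
print(absent_runs(reference_order, other_genome))  # [['F3', 'F4', 'F5', 'F6']]
```

A long run only suggests a candidate island. Whether the region was horizontally acquired still has to be judged from the genes it contains, which is what the next steps do.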

11. To explore a particular section, you should use your mouse to draw a box around the area of the genome that is next to a black box.
12. This generates a pop-up window that gives the user choices on what they want to do with the selected data. Click the Show Proteins button at the bottom of the pop-up window.
13. This will open a new window that shows the genes found in that section of the heatmap view that you selected.
14. To see the order the genes occur in, first resize the table by changing the number at the bottom of it to include all the genes and hit return.
15. Then at the top of the table, click once on the column head that reads Alternative Locus Tag to reorder the genes from first to last.
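Once the genes are sorted by locus tag, a quick check is whether the tag numbers run consecutively, which suggests the genes sit side by side in the genome. The sketch below assumes a hypothetical tag format with a trailing number (for example `ABC_0101`); real locus tags vary, so the parsing would need adjusting.

```python
import re


def tag_number(locus_tag):
    """Extract the trailing integer from a locus tag such as 'ABC_0101'."""
    match = re.search(r"(\d+)$", locus_tag)
    if match is None:
        raise ValueError(f"no trailing number in {locus_tag!r}")
    return int(match.group(1))


def fraction_consecutive(locus_tags):
    """Fraction of adjacent pairs (after sorting) whose numbers differ by exactly one."""
    numbers = sorted(tag_number(tag) for tag in locus_tags)
    if len(numbers) < 2:
        return 0.0
    steps = [b - a for a, b in zip(numbers, numbers[1:])]
    return sum(1 for step in steps if step == 1) / len(steps)


# Made-up tags: four neighbours and one gene located further along the genome.
tags = ["ABC_0103", "ABC_0101", "ABC_0102", "ABC_0104", "ABC_0110"]
print(fraction_consecutive(tags))  # 0.75
```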

16. You can see that the majority of these genes are sequential (each of the locus tags increases numerically by one). Moreover, many of the names of these genes include the word phage. This word is derived from bacteriophage, a virus that infects bacteria. Bacteriophages are often associated with horizontal transfer of DNA, the transfer of genes between organisms in a manner other than traditional reproduction.

Assignment

Use the Protein Family Sorter and the groups you created to answer the following questions. Compare the group of all the genomes collected in Germany in 2011 with the O104 genomes that were not isolated in 2011. Go to the heatmap view and choose Escherichia_coli_O104-H4_str_01-09591 (isolated in 2001) as the reference. If you scroll down the heatmap (use the slider at the bottom of the view), you will see a large black box in strain E112/10. Use your mouse to select the proteins found in another genome that occur where the E112/10 genome is missing them. Many of these are metabolic proteins. From the other classes you have had, can you determine which pathways would be impacted in the E112/10 strain by not having these genes?

Comparing genomes in PATRIC using the Protein Family Sorter tool to look for specific genes

1. To look for presence or absence of the protein families within a genome group that you have created, click on the Tools tab and under Comparative Genomics, select the Protein Family Sorter tool.
2. This will take you to the landing page for that tool.
3. Scroll down in the Select Organism box until you see the genome group you created that contains the genomes from the Germany outbreak. Check the box in front of that group.

4. We are going to see if these genomes have the Shiga toxin genes described. Enter the word Shiga in the keyword search box and click on the Search button below the box.
5. This returns a table that has a filter on the right, and the results on the left. You can see that a single protein family has been found in these genomes.
6. If you look carefully at the name under the product description, it says Shiga-like toxin II subunit B precursor. The name is a hyperlink. Click on it.
7. This will take you to the summary information for all the genes in your genome group that were in that particular protein family. This information includes the names of the genomes, the various locus tags that identify the genes, and the length of the proteins.
8. To find out more information about any of the genes, click on any locus tag in the column called PATRIC ID.
9. This will take you to the landing page for that gene, where all the information available for it in PATRIC is summarized, including its different gene identifiers, tools and resources that can be used to examine this gene, and any publications that might have been written about it.
10. If you remember the story from above, the gene that was associated with the outbreak was Shiga-toxin 2 variant (stx2a). The "a" generally implies the A subunit, and we're looking at the B subunit here. What happened to A? A good thing is that these genes generally travel in pairs, so let's look at the genes around this one to see if we can find A. To do this, in the tabs along the top of the page, click on the one named Genome Browser.

11. This will open up a tool that shows you the gene you are looking at, and the genes surrounding it. The Shiga toxin subunit B gene that we were looking at has the locus tag fig|1048256.3.peg.1439.
12. Mousing over the gene immediately upstream reveals the A subunit.
13. If you click on that gene in the genome browser, a pop-up box shows you specific information about it. Double click on the first line under Feature Details.

14. This takes you to the landing page for this gene.
15. So there is a Shiga toxin subunit A in this genome. Why didn't we see it in the tool? Now you're exposed to some of the problems research biologists face. The gene is present, but we don't see it because it has not yet been assigned to a protein family. Look down the page under Functional Properties. You'll see that next to FIGFam Assignments, there is nothing assigned. This means that the gene is not assigned to a protein family. Below I've provided a comparison of the A and B subunits. You can see that the Shiga toxin B subunit has a FIGFam assignment, but A does not. That's why only the B subunit is seen in the Protein Family Sorter.

[Figure: Functional Properties for the Shiga toxin A subunit and the Shiga toxin B subunit]

Searching for specific genes in PATRIC

Scientists studying the 2011 outbreak found that genomes isolated from the E. coli bacteria associated with the epidemic carried certain genes that had previously been associated with virulence (atta, aggr, aap, agga, and aggc). In addition, these strains also carried the gene for a Shiga-toxin 2 variant (stx2a). In contrast, these same genomes were found to be missing other typical Shiga-toxin producing E. coli genes (stx1, eae, and ehx).

In this part of the exercise, we are about to embark on one of the most frustrating aspects of searching for information that research biologists encounter. In an age where there is an abundance of information about organisms, their genomes and genes, and how those genes are expressed, scientists are often unable to find the information that could help their research. Sometimes the data is located in different repositories, and each of these places calls the genes by different names or by different IDs. Scientists often rely on older publications that identify their gene of interest by a certain name, and that name may no longer exist in any resource. And sometimes, the annotation pipeline that is used to call the genes on a genome and name them may not recognize that a specific gene is there. Part of this exercise will be to map whatever data we can from the outbreak genomes in PATRIC and find the discrepancies in the available information.

1. In the search box at the top of the page, enter stx2a and coli. This will narrow the search to look at the E. coli genomes. Hit return.

2. This will take you to the Search Results page. This page is always structured in the same format, with the genes that best match your search term on top, followed by genomes. The search results also include taxa (if you're looking for a species, genus, family or higher) and experiments that match your search term.

[Figure: Search Results page, with sections for Genes, Genomes, Taxonomy and Experiments]

3. Look at the Features at the top of the results. These are the genes that match your search.

[Figure: Features results. Callouts: 85 genes match the search terms; gene name; genome name; RefSeq locus tag; a symbol indicating that the gene is a RefSeq annotation, which may or may not have a PATRIC annotation]

4. As there are 85 features that match this search, let's be more specific and refine it. In the search box enter stx2a and O104 and hit return.
5. The results table shows fewer genes.

6. Click on the name of the first gene in the list. This will take you to the landing page for that gene.

Assignment

Use the landing page to fill out the table below, and then search for the other genes in PATRIC. You will not be able to find all of them, and to locate some of them, you may have to broaden your scope (hint: start with the O104 genomes, and then change to coli if necessary).

Gene Name   PATRIC locus tag         E. coli strain         FigFam number   Product Description in PATRIC
atta
aggr
aap
agga
aggc
stx2a       fig|1090928.3.peg.1113   O104:H4 str. E112/10   None            Shiga-like toxin II subunit A precursor (EC 3.2.2.22)
stx1
eae
ehx

In a previous exercise, you learned how to use the FIGFam IDs in the Protein Family Sorter tool to see the presence or absence of certain genes across various genome groups. Use this technique to examine the genomes from the 2011 German outbreak.

Which genes do the genomes share, and which are they lacking?
Expand to the other outbreak genomes outside of Germany. Do they have a similar pattern?
Look carefully at the O104 genomes that were not part of the 2011 epidemic. Do any of those genomes have the same pattern as you see in the German genomes? What are the differences?

References

1. Frank, C., et al., Epidemic profile of Shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N Engl J Med, 2011. 365(19): p. 1771-80.