The Immune Epitope Database and Analysis Resource (IEDB)

1 The Immune Epitope Database and Analysis Resource (IEDB) Jason Greenbaum La Jolla Institute for Allergy and Immunology EMBRACE Bioinformatics of Immunology Workshop January 24, 2007 Lyngby, Denmark

2 Overview The IEDB Concept & Goals Walkthrough Analysis Tools Information Exchange Ontology Development Summary & Future Plans

4 What is an epitope? An epitope is defined as the chemical structure recognized by specific receptors of the immune system (antibodies, MHC molecules, and/or T cell receptors)

5 IEDB Organization Discovery Groups NIAID External Tool Developers Database Analysis Resource

6 Goals for the IEDB Catalog and organize an ever-growing body of immunological information B and T cell epitopes from infectious pathogens, experimental and self antigens Priority on Category A-C pathogens and emerging diseases Humans, non-human primates, rodents, and other species for which detailed epitope information is available Develop new methods to predict and model immune responses Assist in the development of vaccines and diagnostics Incorporate input from scientific community (Feedback and Forums)

7 Other sources of epitope information MHCPep SYFPEITHI FIMM HLA Ligand/Motif Database HIV Sequence Database

8 What sets the IEDB apart? Incorporation of positive AND negative data Finely detailed curation 10 full-time Ph.D.-level curators

9 What information is stored? Epitope information Sequence, structure, species, source protein, etc. Reference information Authors, journal, PMID, etc. Assay data Assay type, cell line(s), species, measurements, etc.

10 Curation Priorities Category A-C pathogens & toxins Emerging and Re-emerging infectious diseases Other Infectious diseases Allergens Self antigens involved in autoimmunity Transplant rejection antigens and other alloantigens Cancer epitopes

11 IEDB philosophy Inclusive curation We believe that it is not our job to decide what is good or bad data Context information Use of assay standards Our job is to catalog the information and allow the users (scientific community) to easily access it

12 Data sources Literature Targeted PubMed query Direct submission Large-scale epitope discovery contracts

13 The curation process PubMed query -> Abstract scan -> Curation (Curators) -> Peer review -> Epitope council review -> Finalized

14 Selection and Curation of Influenza A Literature References More than 16 million references are available in PubMed 2063 references are influenza related (~0.01%) 743 selected after abstract scan (~36%) 429 curated into IEDB (~58%)
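The percentages on this slide follow directly from the counts. A minimal check, assuming exactly 16 million as the PubMed total (the slide only says "more than 16 million", so the first percentage is an upper bound):

```python
# Counts taken from the slide; pubmed_total is an assumed lower bound.
pubmed_total = 16_000_000
influenza_related = 2063
selected_after_scan = 743
curated = 429


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"influenza related: {percent(influenza_related, pubmed_total):.2f}% of PubMed")
print(f"selected after abstract scan: {percent(selected_after_scan, influenza_related):.0f}%")
print(f"curated into IEDB: {percent(curated, selected_after_scan):.0f}%")
```

Each percentage is relative to the previous step of the funnel, not to the PubMed total.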

15 It all starts with PubMed

16 Automated Text Classification ~21,000 abstracts classified by expert into relevant: Yes / No Can this process be automated? Naive Bayes Classifier Analyze word frequencies in classified abstracts Cross-validated result: 50% of the irrelevant abstracts can be identified with few (<5%) false negative classifications Words and counts in curatable abstracts (Yes / No): tag 0 / 30; epitope-tagged 0 / 28; superantigens 3 / 97; adverse 2 / 21; seroconversion 1 / 20; phage-displayed 8 / 1; overlapping epitope-based 8 / 0; ~X-mers~ 12 / 0; ~MHC allele~ 72 / 10
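A minimal sketch of the word-frequency idea behind a Naive Bayes abstract classifier, assuming a multinomial model with add-one smoothing. The training sentences, function names, and labels are hypothetical illustrations, not the IEDB implementation or its data:

```python
import math
from collections import Counter

# Hypothetical toy abstracts labeled curatable ("yes") or not ("no").
TRAINING = [
    ("overlapping peptides mapped the epitope to an mhc allele", "yes"),
    ("binding of 9-mers to the mhc allele was measured", "yes"),
    ("phage-displayed peptides identified the antibody epitope", "yes"),
    ("seroconversion rates after vaccination were recorded", "no"),
    ("adverse events following vaccination were rare", "no"),
    ("the epitope-tagged protein was purified for localization", "no"),
]


def tokenize(text):
    """Lowercase and split on whitespace; real text needs more careful handling."""
    return text.lower().split()


def train(labeled_abstracts):
    """Count words per class and abstracts per class."""
    word_counts = {"yes": Counter(), "no": Counter()}
    doc_counts = Counter()
    for text, label in labeled_abstracts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the class with the higher log posterior (add-one smoothing)."""
    vocab = set(word_counts["yes"]) | set(word_counts["no"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("yes", "no"):
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            smoothed = word_counts[label][word] + 1
            score += math.log(smoothed / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, doc_counts = train(TRAINING)
print(classify("mhc allele binding of overlapping peptides", word_counts, doc_counts))
print(classify("adverse events and seroconversion", word_counts, doc_counts))
```

To keep false negatives low, as the slide requires, one would compare the difference of the two log scores against a tuned threshold instead of simply taking the larger score.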

17 Current database statistics January 17, 2007: […],853 references 54,626 records 23,979 distinct epitopes

18 Overview The IEDB Concept & Goals Walkthrough Analysis Tools Information Exchange Ontology Development Summary & Future Plans

19-38 Walkthrough (no slide text in the transcription)

39 Overview The IEDB Concept & Goals Walkthrough Analysis Tools Information Exchange Ontology Development Summary & Future Plans

40 Analysis tools available Viewing tools Epitope viewer Analytical tools Population coverage Conservancy analysis Predictive tools T cell epitope predictions MHC binding Antigen processing B cell epitope predictions
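As an illustration of what a conservancy analysis measures, the sketch below reports the fraction of protein sequences that contain an epitope. It assumes exact substring matching only; the slide does not describe how the IEDB tool works, and the function name and sequences here are made up:

```python
def conservancy(epitope, protein_sequences):
    """Fraction of sequences containing the epitope as an exact substring."""
    if not protein_sequences:
        raise ValueError("need at least one protein sequence")
    hits = sum(1 for sequence in protein_sequences if epitope in sequence)
    return hits / len(protein_sequences)


# Made-up sequence variants, for illustration only.
variants = [
    "MKTIIALSYIFCLVFA",
    "MKTIIALSYILCLVFA",
    "MKAIIALSYIFCLVFA",
    "MKTIIALSYIFCLAFA",
]
print(conservancy("ALSYIFCLV", variants))
```

A more useful tool would also accept matches below 100% identity; that is left out of this sketch.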

42 Analysis tools screenshot

43 B-cell epitope prediction tools Five different methods to predict linear Ab epitopes were selected 1. Chou and Fasman beta turn prediction Chou PY, Fasman GD. Adv Enzymol Relat Areas Mol Biol. 1978;47: 2. Emini surface accessibility scale Emini EA, Hughes JV, Perlow DS, Boger J. J Virol Sep;55(3): 3. Karplus and Schulz flexibility scale Karplus PA, Schulz GE. Naturwissenschaften 1985; 72: 4. Kolaskar and Tongaonkar antigenicity scale Kolaskar AS, Tongaonkar PC. FEBS Lett Dec 10;276(1-2): 5. Parker Hydrophilicity Prediction Parker JM, Guo D, Hodges RS. Biochemistry Sep 23; 25(19): These methods were implemented in the IEDB. Workshop on B cell epitope prediction tools: Greenbaum et al. (2007) J. Mol Recognit.
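Scale-based methods like these assign a value to each amino acid and average the values over a window that slides along the sequence. A minimal sketch of that general idea follows; the scale values are a hypothetical toy table, not the published values of any method listed, and the function name is an assumption:

```python
def window_scores(sequence, scale, window=7):
    """Average per-residue scale values over a sliding window.

    Returns (center_position, score) pairs with 1-based positions.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[residue] for residue in sequence]
    half = window // 2
    results = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        results.append((start + half + 1, mean))
    return results


# Hypothetical toy scale, NOT taken from any of the cited papers.
TOY_SCALE = {"A": 0.5, "D": 2.0, "G": 1.0, "K": 1.5, "L": -1.0, "S": 0.5}

scores = window_scores("ALLKDGSKDLA", TOY_SCALE, window=3)
best_position, best_score = max(scores, key=lambda item: item[1])
print(best_position, best_score)
```

Residues whose window score exceeds a chosen threshold would then be reported as candidate epitope positions; the threshold is a further parameter that this sketch leaves out.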

44 1. Specify protein sequence 2. Select a method 3. Click Submit

46 Planned/desired future tools Self-similarity Visualization of epitopes in genome New B cell epitope prediction tools Discotope Bepipred Ellipro SDSC

47 Overview The IEDB Concept & Goals Walkthrough Analysis Tools Information Exchange Ontology Development Summary & Future Plans

48 Data exchange XML Submissions IEDB XML Database export Tools HTTP Querying Linking

49 XML format (figure)

50 Direct submission issues Submitters unfamiliar with XML Completely automated validation impossible with XSD
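One reason schema validation alone falls short is that some checks span several fields, which is awkward to express in an XSD. The sketch below runs such a check in code: that each epitope sequence actually occurs in its source protein. The element names and sequences are hypothetical and do not reflect the IEDB XML format:

```python
import xml.etree.ElementTree as ET

# Hypothetical submission; element names are illustrative only.
SUBMISSION = """
<submission>
  <epitope id="e1">
    <sequence>SIINFEKL</sequence>
    <sourceProtein>MGSIGAASMEFCFDVFKELKSIINFEKLTEWTSSNVMEER</sourceProtein>
  </epitope>
  <epitope id="e2">
    <sequence>AAAAWWWW</sequence>
    <sourceProtein>MGSIGAASMEFCFDVFKEL</sourceProtein>
  </epitope>
</submission>
"""

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def check_submission(xml_text):
    """Return a list of human-readable problems found in the submission."""
    problems = []
    root = ET.fromstring(xml_text.strip())
    for epitope in root.iter("epitope"):
        epitope_id = epitope.get("id", "?")
        sequence = (epitope.findtext("sequence") or "").strip()
        source = (epitope.findtext("sourceProtein") or "").strip()
        if not sequence or not set(sequence) <= VALID_RESIDUES:
            # Single-field check: a schema pattern could also catch this.
            problems.append(f"{epitope_id}: missing or invalid sequence")
        elif sequence not in source:
            # Cross-field check: needs both values at once.
            problems.append(f"{epitope_id}: sequence not found in source protein")
    return problems


print(check_submission(SUBMISSION))
```

Checks of this kind can run after schema validation, so structural errors and content errors are reported separately to submitters.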

51 Linking to the IEDB Two methods Linking to query result NCBI linking based on IEDB-supplied XML

54 Linking to other databases Links provided to source protein records in GenBank/SwissProt Pubmed ID at NCBI

55 Overview The IEDB Concept & Goals Walkthrough Analysis Tools Information Exchange Ontology Development Summary & Future Plans

56 Ontology Development "define:ontology": 28 results on Google "An ontology is a controlled vocabulary that describes objects and the relations between them in a formal way, and has a grammar for using the vocabulary terms to express something meaningful within a specified domain of interest." Ontology of Immune Epitopes

57 Ontology goals Enumerate and unambiguously define all terms in database Determine relationships among entities Apply ontology to current database and to new records as they are entered
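These goals can be sketched as a small controlled vocabulary with is_a relations that records are checked against. The assay names follow the assay categories named elsewhere in the talk, but the hierarchy, field names, and functions are assumptions for illustration, not the IEDB ontology:

```python
# Hypothetical mini-vocabulary: each term maps to its is_a parent (None for the root).
IS_A = {
    "entity": None,
    "assay": "entity",
    "epitope": "entity",
    "linear epitope": "epitope",
    "MHC binding": "assay",
    "MHC ligand elution": "assay",
    "T cell response": "assay",
    "B cell response": "assay",
}


def is_kind_of(term, ancestor):
    """True if term equals ancestor or reaches it by following is_a links."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A[current]
    return False


def validate_record(record):
    """Return a list of problems with the record's assay_type field."""
    term = record.get("assay_type")
    if term not in IS_A:
        return [f"undefined term: {term!r}"]
    if not is_kind_of(term, "assay"):
        return [f"{term!r} is not a kind of assay"]
    return []


print(validate_record({"assay_type": "MHC binding"}))
print(validate_record({"assay_type": "MHC bindng"}))
print(validate_record({"assay_type": "linear epitope"}))
```

Because every term must be defined once and placed in the hierarchy, misspellings and misplaced values are caught when a record is entered and not later during analysis.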

58 Ontology projects Ontology for Biomedical Investigations (OBI) Gene Ontology (GO) MGED Ontology (MO)

59 Why is it important? Minimizing redundant work Error checking Enforcing data constraints Information exchange

60 Assay Components Are Shared T T B APC APC MHC Binding MHC Ligand Elution T Cell Response B Cell Response

61 IEDB Data Structure Sathiamurthy et al, Immunome Research, 2005

62 Towards a Formal Ontology Goal: Connecting information from resources Upper level ontology: Basic Formal Ontology (BFO) Shared concepts from other biomedical ontologies (OBI, GO, NCBI, FMA,...) We will host next OBI workshop in San Diego

63 Overview The IEDB Concept & Goals Walkthrough Analysis Tools Information Exchange Ontology Development Summary & Future Plans

64 Summary IEDB seeks to organize and consolidate all existing and future epitope information Expert curation Rich set of integrated analysis tools available/under development Interacting with IEDB possible through several channels Formal ontology development under way to ensure data consistency

65 Future Plans Continue with curations Add capability to the Analysis Resource Work with Discovery Groups in submitting data Develop interface for external tools Complete ontology development Expand the IEDB's utility and exposure

66 Acknowledgments Scott Stewart Scott Way Tom Carolan Louis Bulger Hussein Emami John Quaresma Rob Thurmond Ryan Shyffer Jane Herron Cindy Oliver Phil Bourne Julia Ponomarenko Ole Lund Søren Buus

67 Further reading & surfing Peters et al PLoS Biology Sette et al Immunity IEDB: