Functional Annotation: Preliminary Results
- Solomon Watts
- 5 years ago
Transcription
1 Functional Annotation: Preliminary Results Vani Rajan Gena Tang Neha Varghese Kevin Lee Gabriel Mitchell Tripp Jones Robert Petit Shaupu Qin
2 Outline Motivation Naming scheme Preliminary Program Results Database created Blast InterPro CDD Ab Initio Pathway Tools
3 The GaTech/CDC collaboration What sort of questions are we trying to answer? What annotations do we need to answer them? annotations questions discussion
4 Classification Schemes "So the first question I would want to ask is how Hhae strains are related to NTHi phylogenetically. Can you identify any features that would tell NTHi from Hhae without any uncertainty?" ~Xin Wang [Diagram labels: NTHi, Hhae, genetic feature A, genetic feature B, functional behavior B] In order to develop classification schemes that extend beyond standard MLST and 16S rRNA, we will need annotations of genes. Developing new classification protocols based on metabolic activity and the like will require annotation of functional networks.
5 Explaining Phenotypes "The second question is what is the genetic basis for Hhae hemolysis and pathogenicity." ~Xin Wang We need to have annotations of genes (and gene functions) and putative genetic networks that determine behaviors.
6 An aside about Pathogenicity There are many kinds of potential genetic signatures associated with pathogenicity (there are databases; our collaborators have a list of candidates as well). Genetic requirements may not be enough: the environmental context matters too. Mobile elements of particular interest: phages (as prophages; there are tools to detect them), plasmids (they can integrate), transposons, insertion sequences.
7 Virulence Factor Database CDC has a preliminary list of target genes. VFDB covers virulence factors in many bacterial species. Searching our genomes: (1) download all factors from VFDB; (2) create a blastdb of them; (3) BLAST all genomes against them; (4) record the hits with E < 0.001.
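The last step of the search above (recording hits with E < 0.001) can be sketched as a small filter. This is only an illustration, assuming the search was run with BLAST+ and written in the default 12-column tabular format (`-outfmt 6`); the gene and virulence-factor identifiers in the example are made up, and the slides do not say how the hits were actually recorded.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, max_evalue=1e-3):
    """Return the hits whose E-value is strictly below max_evalue."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] < max_evalue:
            hits.append(hit)
    return hits


# Two made-up hits: one strong, one that fails the E-value cutoff.
example = (
    "gene_0001\tVF_example_A\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t310\n"
    "gene_0002\tVF_example_B\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25.0\n"
)

for hit in significant_hits(example):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

Keeping the cutoff as a parameter makes it easy to reuse the same filter with the stricter thresholds used for naming later in the deck.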
10 Virulence Factor Database A track in the browser Integral to comparative analysis
11 Mobile Elements Insertion sequences Allow for recombination, genomic rearrangement, and plasmid formation Structure: inverted terminal repeats flanking a transposase (and an activity modulator) Search using ISfinder (web interface)
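The structure described above (inverted terminal repeats flanking a transposase) suggests a simple check: the start of the element should match the reverse complement of its end. The sketch below is a toy illustration of that idea only; it looks for perfect repeats, whereas real insertion-sequence repeats are often imperfect, and it is not how ISfinder works. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq, max_len=50):
    """Length of the longest perfect inverted repeat at the two ends of seq.

    The first n bases of the reverse complement of seq are the reverse
    complement of the last n bases of seq, so a shared prefix between seq
    and its reverse complement is an inverted terminal repeat.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


# Toy element: a 6-bp left repeat, a filler "body", and the matching right repeat.
left = "GAGTCA"
toy_element = left + "AAACCCAAA" + reverse_complement(left)
print(terminal_inverted_repeat_length(toy_element))
```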
15 Naming Scheme Previously used scheme (Kislyuk et al., 2010): UniProt result with > 91% aa identity and E < 10^-9; if hypothetical in UniProt, use the InterPro domain; genes with unknown function in other genomes: conserved hypothetical; else: putative uncharacterized protein.
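The naming rules above form a short decision cascade, which can be sketched as a function. This is one reading of the slide, not the code from Kislyuk et al.: the input layout (a dict with `name`, `pident`, `evalue`), the `conserved_elsewhere` flag, and the "domain-containing protein" wording for InterPro-based names are all assumptions made for illustration.

```python
def assign_name(uniprot_hit=None, interpro_domain=None, conserved_elsewhere=False,
                min_identity=91.0, max_evalue=1e-9):
    """Pick a product name following the cascade on the slide.

    uniprot_hit: optional dict with keys "name", "pident", "evalue" (assumed layout).
    interpro_domain: optional InterPro domain name for the gene.
    conserved_elsewhere: True if the gene occurs with unknown function in other genomes.
    """
    strong = (
        uniprot_hit is not None
        and uniprot_hit["pident"] > min_identity
        and uniprot_hit["evalue"] < max_evalue
    )
    # Rule 1: a strong, informative UniProt hit supplies the name directly.
    if strong and "hypothetical" not in uniprot_hit["name"].lower():
        return uniprot_hit["name"]
    # Rule 2: the hit is "hypothetical" in UniProt, so fall back to the InterPro domain.
    if strong and interpro_domain:
        return f"{interpro_domain} domain-containing protein"
    # Rule 3: unknown function, but the gene is seen in other genomes.
    if conserved_elsewhere:
        return "conserved hypothetical protein"
    # Rule 4: nothing else applies.
    return "putative uncharacterized protein"


print(assign_name({"name": "DNA gyrase subunit A", "pident": 98.0, "evalue": 1e-120}))
print(assign_name(conserved_elsewhere=True))
print(assign_name())
```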
16 Preliminary Results BLAST AND DATABASE CREATED
17 Blastp Goals: find hits with identity > 91% and E < 10^-9; assign name. Database: build database from organisms related to H. haemolyticus, from UniProt. Sanity check: Pasteurellaceae family.
21 Summary 1030 significant hits in H. influenzae (76%); 10 significant hits in other organisms; about 1 in 4 genes with no hits (313). Thresholds: identity > 91%, E < 10^-9. Pasteurellaceae database: no extra hits. To come: BLAST against other databases.
22 Preliminary Results INTERPRO SCAN
23 Degree Distribution of Database Hits
24 Sample InterProScan Output
25 InterPro Entry Link
26 Preliminary Results CONSERVED DOMAIN DATABASE
27 Conserved Domain DB Search by RPS-BLAST - uses the query sequence to search a database of pre-calculated PSSMs Helps give more information about function for genes which come up as: putative, conserved, unknown Allows scripted data downloads
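To make the idea of searching a query against pre-calculated PSSMs concrete, here is a toy sketch of scoring a sequence against a single position-specific scoring matrix. It slides the matrix along the query without gaps and keeps the best window. This is a deliberate simplification and not the RPS-BLAST algorithm; the matrix values, the default penalty, and the function name are invented for illustration.

```python
DEFAULT_SCORE = -2  # assumed penalty for residues not listed in a column


def best_ungapped_score(query, pssm):
    """Return (score, start) of the best ungapped placement of pssm on query.

    pssm is a list of dicts, one per column, mapping residue -> score.
    Returns None if the query is shorter than the matrix.
    """
    width = len(pssm)
    best = None
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Toy 3-column matrix that favors the motif G-K-[ST].
toy_pssm = [
    {"G": 5, "A": 1},
    {"K": 4},
    {"S": 3, "T": 3},
]

print(best_ungapped_score("MAGKTL", toy_pssm))
```

A position-specific matrix scores the same residue differently depending on the column, which is what lets domain models pick up weak but consistent similarity that a single substitution matrix would miss.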
28 Sample output GFF 57/1353 genes have no hit in CDD
29 Web Server Output
30 Preliminary Results AB INITIO
31 SignalP
32 SignalP prediction 1352 protein sequences in total; 238 predicted by the HMM to contain a signal peptide; 175 predicted by the neural network to contain a signal peptide.
33 LipoP Lipoproteins: 44/1352 predicted
34 TMHMM 275/1352: At least one transmembrane helix predicted
35 Preliminary Results The Working Concept of Pathway Tools
36 Components Pathway/Genome Navigator Pathway/Genome Editors PathoLogic
37 Input and PGDB Schema Input: annotated genome of the organism (genome sequence, locations of identified genes, identified functions of gene products). Sequence -> FASTA format. Annotation -> GenBank or PathoLogic format. Output: PGDB representation of the input file.
38 Metabolic Pathway Prediction 2 steps: 1. Creates the reactome using MetaCyc 2. Imports corresponding pathways from MetaCyc and then carries out a pruning process.
39 Operon Predictor Identifies operon boundaries by examining pairs of adjacent genes A and B Intergenic distance, functional relationship between A and B, membership in the same pathway, membership in same multimeric protein complex and so on.
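The pairwise idea above can be sketched using only the simplest of the listed features, intergenic distance, plus strand. This is a simplified illustration and not the Pathway Tools operon predictor, which also uses functional relationships, pathway membership, and complex membership; the 50 bp cutoff, the gene records, and the function name are assumptions.

```python
def group_into_operons(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bp.

    genes: list of dicts with keys "name", "start", "end", "strand"
    (1-based inclusive coordinates). Returns a list of gene lists.
    """
    ordered = sorted(genes, key=lambda g: g["start"])
    operons = []
    for gene in ordered:
        if operons:
            prev = operons[-1][-1]
            gap = gene["start"] - prev["end"] - 1  # bases between the two genes
            if gene["strand"] == prev["strand"] and gap <= max_gap:
                operons[-1].append(gene)  # extend the current candidate operon
                continue
        operons.append([gene])  # start a new candidate operon
    return operons


# Made-up coordinates: A and B are close, C is far away, D is on the other strand.
toy_genes = [
    {"name": "geneA", "start": 100, "end": 400, "strand": "+"},
    {"name": "geneB", "start": 420, "end": 900, "strand": "+"},
    {"name": "geneC", "start": 1200, "end": 1500, "strand": "+"},
    {"name": "geneD", "start": 1510, "end": 1800, "strand": "-"},
]

for operon in group_into_operons(toy_genes):
    print([g["name"] for g in operon])
```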
40 Pathway Hole Filler Pathway holes -> seen -> under-annotated genes. PHFiller uses UniProt to identify potential hole fillers.
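A pathway hole can be thought of as a reaction in a predicted pathway that has no gene assigned to it. The sketch below only finds such holes in toy data; it does not attempt the filling step, which the slide attributes to PHFiller and UniProt. The data layout, identifiers, and function name are hypothetical and are not the Pathway Tools schema.

```python
def find_pathway_holes(pathways, reaction_to_genes):
    """Return {pathway: [reactions with no assigned gene]} for pathways with holes.

    pathways: dict mapping pathway name -> ordered list of reaction IDs.
    reaction_to_genes: dict mapping reaction ID -> list of annotated genes.
    """
    holes = {}
    for pathway, reactions in pathways.items():
        missing = [r for r in reactions if not reaction_to_genes.get(r)]
        if missing:
            holes[pathway] = missing
    return holes


# Made-up pathways and annotations.
toy_pathways = {
    "pathway_1": ["rxn_a", "rxn_b", "rxn_c"],
    "pathway_2": ["rxn_d", "rxn_e"],
}
toy_annotations = {
    "rxn_a": ["gene_0001"],
    "rxn_c": ["gene_0007", "gene_0008"],
    "rxn_d": ["gene_0020"],
    "rxn_e": ["gene_0021"],
}

print(find_pathway_holes(toy_pathways, toy_annotations))
```

Each reported reaction is a candidate for the hole-filling search: a gene that was left unannotated or under-annotated may encode the missing enzyme.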
41 Pathway Tools Cellular Overview
42 Pathway Tools Cellular Overview
43 Next steps On receiving complete gene prediction results, perform all the annotation steps on one strain and then essentially automate the process and run it for the other five.