CBRS Chlamydiae community re-annotation

1 CBRS Chlamydiae community re-annotation

2 Session schedule
- Intro and automatic annotation (T. Weinmaier)
- Nomenclature
- Manual refinement
- Submission / Publication

4 Concept for Chlamydiae community re-annotation
Thomas Weinmaier
Chlamydial Basic Research Society meeting, San Antonio, Texas

9 Prototypic workflow for genome projects
Sequencing → Assembly → Primary annotation → Functional annotation (e.g. MxiH)
Annotation drawbacks:
- No standard procedure
- Not updated later

10 Comparative genomics
C. pneumoniae clinical isolates: AR39, CWL029, J138 and TW183
- Genome sizes ~1.23 Mb
- 6000 nucleotides different (99.5% identical)
[Tables: annotated genes in Genbank per isolate; genes without ortholog between isolates. Values were lost in transcription; only a trailing 72 next to TW183 survives.]
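The 99.5% figure follows from the two numbers on the slide. A minimal check, assuming a genome length of exactly 1.23 Mb (the slide gives only an approximate size) and treating each differing nucleotide as one mismatched position:

```python
def percent_identity(genome_length_bp: int, differing_nt: int) -> float:
    """Return the percentage of positions that are identical."""
    return 100.0 * (1.0 - differing_nt / genome_length_bp)


# Assumed inputs: ~1.23 Mb genome and 6000 differing nucleotides (from the slide).
identity = percent_identity(1_230_000, 6_000)
print(f"{identity:.1f}% identical")
```

With these inputs the result rounds to the 99.5% quoted on the slide.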

12 BLAST search at NCBI
Query: chlamydial protease-like activity factor (CPAF) [Waddlia chondrophila WSU ]
Search: BLASTP against NCBI RefSeq database
Description:
- chlamydial protease-like activity factor (CPAF) [Waddlia chondrophila WSU ]
- putative chlamydial protease-like activity factor [Parachlamydia acanthamoebae str. Hall's coccus]
- protease-like activity factor [Protochlamydia amoebophila UWE25]
- hypothetical protein CAB712 [Chlamydophila abortus S26/3]
- hypothetical protein CAB1_0732 [Chlamydophila abortus LLG]

16 83 chlamydial genomes
Problem:
- Different annotation strategies
- No update after submission
Re-annotation goal:
- Consistency
- Currentness
Solution:
- Consistent automatic re-annotation
- Manual refinement

20 Proposed re-annotation strategy
[Diagram: DNA sequences, Automatic annotation, Manually refined annotation (Chlamydiae researchers), Resubmission, Publication ("Reannotation of all publicly available chlamydial genomes", CBRS reannotation project consortium)]
Steps:
- DNA sequences from Genbank
- Automatic re-annotation
- Manual refinement
- Resubmission to Genbank
- Publication

22 Automatic annotation software: ConsPred
- Intrinsic information (gene prediction tools)
- Extrinsic information (BLAST search)
- Resolving overlaps (tRNAs, rRNAs, ncRNAs)
- Consensus prediction: best prediction by integrating multiple lines of evidence; spurious predictions are discarded
- Functional annotation
- Result: ConsPred annotation
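The idea of a consensus prediction can be illustrated with a toy example. The sketch below is not ConsPred's actual algorithm, which the slide does not describe. It assumes that predictions sharing strand and stop coordinate describe the same gene, that a gene is kept when at least two tools agree or a BLAST hit supports it, and that the most frequently predicted start wins. The tool labels, the `min_tools` threshold and all coordinates are hypothetical, and overlap resolution against tRNAs, rRNAs and ncRNAs is left out.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Prediction(NamedTuple):
    tool: str    # hypothetical gene-finder label
    strand: str  # "+" or "-"
    start: int   # coordinate of the predicted start codon
    stop: int    # coordinate of the predicted stop codon


def consensus(predictions, blast_supported, min_tools=2):
    """Merge per-tool predictions into one call per (strand, stop).

    A group is kept if at least `min_tools` tools predicted it (intrinsic
    evidence) or if its key is in `blast_supported` (extrinsic evidence).
    Everything else is treated as spurious and discarded.
    """
    groups = defaultdict(list)
    for p in predictions:
        groups[(p.strand, p.stop)].append(p)

    kept = []
    for (strand, stop), members in sorted(groups.items()):
        tools = sorted({m.tool for m in members})
        if len(tools) < min_tools and (strand, stop) not in blast_supported:
            continue  # single tool and no BLAST support: discard
        # Most frequently predicted start wins; ties go to the longer ORF.
        votes = Counter(m.start for m in members)
        best_start = max(votes, key=lambda s: (votes[s], abs(stop - s)))
        kept.append((strand, best_start, stop, tools))
    return kept


# Hypothetical input: three tools, one BLAST-supported gene on the minus strand.
preds = [
    Prediction("toolA", "+", 100, 400),
    Prediction("toolB", "+", 130, 400),
    Prediction("toolC", "+", 100, 400),
    Prediction("toolA", "-", 900, 600),
    Prediction("toolB", "+", 1000, 1200),
]
result = consensus(preds, blast_supported={("-", 600)})
for call in result:
    print(call)
```

In this toy input the gene ending at 400 is kept because three tools agree, the gene ending at 600 is kept because of its BLAST support, and the single-tool prediction ending at 1200 is dropped.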

23 Comparative genomics after re-annotation
C. pneumoniae clinical isolates: AR39, CWL029, J138 and TW183
- Genome sizes ~1.23 Mb
- 6000 nucleotides different (99.5% identical)
[Tables: annotated genes from ConsPred per isolate; genes without ortholog between isolates. Values were lost in transcription.]

24 [Figure: gene counts (genes, tRNAs, rRNAs, ncRNAs), ConsPred vs. Genbank; axis labels include Cmu, Cps, Cfe, Cpn, Cmu/ps, Cca/pe and Env. Values were lost in transcription.]

27 Proposed re-annotation strategy
[Diagram: DNA sequences, Automatic annotation, Manually refined annotation (Chlamydiae researchers), Resubmission, Publication ("Reannotation of all publicly available Chlamydiae", CBRS reannotation project consortium)]
Steps:
- DNA sequences from Genbank
- Automatic re-annotation
- Discussion with community (rules, nomenclature, evidence, ...)
- Resubmission to Genbank
- Publication

28 Nomenclature
- Existing references are retained
- Same locus_tag:
  - Genes with unchanged coordinates
  - Genes with changed start coordinate (new protein ID in DB)
- In-between locus_tag: newly annotated genes (e.g. CT_444.1)
- Removed genes lose locus_tag
- Evidence (PMID) of functional annotation is added as note
- Gene names?
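The locus_tag rules above can be written down as a small function. This is a sketch under assumptions, not the project's tooling: the status labels, the function name and the idea that an in-between tag is built from the upstream gene's tag plus a running suffix (suggested by the CT_444.1 example) are all hypothetical.

```python
from typing import Optional


def reannotated_locus_tag(
    status: str,
    old_tag: Optional[str] = None,
    upstream_tag: Optional[str] = None,
    index: int = 1,
) -> Optional[str]:
    """Return the locus_tag a gene carries after re-annotation.

    status is one of (hypothetical labels):
      "unchanged"      coordinates identical to the old annotation
      "start_changed"  same gene, different start (new protein ID in DB)
      "new"            gene not present in the old annotation
      "removed"        gene dropped by the re-annotation
    """
    if status in ("unchanged", "start_changed"):
        return old_tag  # existing tag is retained
    if status == "new":
        # Assumed scheme: upstream gene's tag plus ".<n>", e.g. CT_444.1
        return f"{upstream_tag}.{index}"
    if status == "removed":
        return None  # removed genes lose their locus_tag
    raise ValueError(f"unknown status: {status!r}")


print(reannotated_locus_tag("start_changed", old_tag="CPn_0008"))
print(reannotated_locus_tag("new", upstream_tag="CT_444"))
print(reannotated_locus_tag("removed", old_tag="CPn1119"))
```

Keeping the old tag whenever the gene itself survives means that locus_tags cited in existing literature stay valid after re-annotation.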

29 References - now
LOCUS       AE bp DNA circular BCT 05-MAR-2010
DEFINITION  Chlamydophila pneumoniae CWL029, complete genome.
ACCESSION   AE AE AE
VERSION     AE GI:
DBLINK      BioProject: PRJNA248
SOURCE      Chlamydophila pneumoniae CWL029
  ORGANISM  Chlamydophila pneumoniae CWL029
            Bacteria; Chlamydiae; Chlamydiales; Chlamydiaceae;
            Chlamydia/Chlamydophila group; Chlamydia.
REFERENCE   1 (bases 1 to )
  AUTHORS   Kalman,S., Mitchell,W., Marathe,R., Lammel,C., Fan,J.,
            Hyman,R.W., Olinger,L., Grimwood,J., Davis,R.W. and Stephens,R.S.
  TITLE     Comparative genomes of Chlamydia pneumoniae and C. trachomatis
  JOURNAL   Nat. Genet. 21 (4), (1999)
  PUBMED
REFERENCE   2 (bases 1 to )
  AUTHORS   Kalman,S., Mitchell,W., Marathe,R., Lammel,C., Fan,J.,
            Olinger,L., Grimwood,J., Davis,R.W. and Stephens,R.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-DEC-1998) Program in Infectious Diseases,
            University of California, 235 Earl Warren Hall, Berkeley, CA 94720, USA
Callouts on slide: Genome paper, Direct submission

30 References after re-annotation
LOCUS       AE bp DNA circular BCT 05-MAR-2010
DEFINITION  Chlamydophila pneumoniae CWL029, complete genome.
ACCESSION   AE AE AE
VERSION     AE GI:
DBLINK      BioProject: PRJNA248
SOURCE      Chlamydophila pneumoniae CWL029
  ORGANISM  Chlamydophila pneumoniae CWL029
            Bacteria; Chlamydiae; Chlamydiales; Chlamydiaceae;
            Chlamydia/Chlamydophila group; Chlamydia.
REFERENCE   1 (bases 1 to )
  AUTHORS   Kalman,S., Mitchell,W., Marathe,R., Lammel,C., Fan,J.,
            Hyman,R.W., Olinger,L., Grimwood,J., Davis,R.W. and Stephens,R.S.
  TITLE     Comparative genomes of Chlamydia pneumoniae and C. trachomatis
  JOURNAL   Nat. Genet. 21 (4), (1999)
  PUBMED
REFERENCE   2 (bases 1 to )
  AUTHORS   The Chlamydia re-annotation project consortium
  TITLE     Reannotation of all publicly available chlamydial genomes
  JOURNAL   XXX (2013)
  PUBMED    XXX
REFERENCE   3 (bases 1 to )
  AUTHORS   Kalman,S., Mitchell,W., Marathe,R., Lammel,C., Fan,J.,
            Olinger,L., Grimwood,J., Davis,R.W. and Stephens,R.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-DEC-1998) Program in Infectious Diseases,
            University of California, 235 Earl Warren Hall, Berkeley, CA 94720, USA
Callouts on slide: Genome paper, Re-annotation paper, Direct submission

34 Manual refinement: your contributions
Locus_tag | Product | Gene name | EC number | Pubmed ID | Taxonomic transfer level
CPn_0008 | HB2 protein | hb | | | Chlamydophila pneumoniae
CPn_0056 | Phosphoglucomutase/phosphomannomutase | | | | Chlamydophila pneumoniae
CPn_ | Tryptophanyl tRNA synthetase | trps_ | | | Chlamydiaceae
CPn1016 | Chlamydial protease-like activity factor | cpaf | | | Chlamydiales
Deleted: CPn1119 | Hypothetical protein | | | | Chlamydophila pneumoniae CWL029
[EC numbers and Pubmed IDs were lost in transcription.]

36 Genbank entry - now
     gene            /locus_tag="CPn_0008"
     CDS             /locus_tag="CPn_0008"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="AAD "
                     /db_xref="GI: "
                     /translation="MISGLLFLLVRREVPTVRSEEIPRGVSVTPSEEPALEKAQKEPE
                     TKKILDRLPKELDQLDTYIQEVFACLERLKDPKYEDRGLLTEAKEKLRVFDVVEKDMM
                     SEFLDIQRVLNEEAYYVEHCQDPLENIAYEIFSSQELRDYYCAGVCGYLPSGDARADR
                     LKRSVKEVMDRFMRVTWKSWEASVMLDHSYGVARELFKKAVGVLEESVYKILFKSYRD
                     AFYECEKAKIQRDGRFKWL"
Callouts on slide: Automatic annotation

37 Genbank entry after re-annotation
     gene            /locus_tag="CPn_0008"
                     /gene="hb2"
     CDS             /locus_tag="CPn_0008"
                     /gene="hb2"
                     /codon_start=1
                     /transl_table=11
                     /product="HB2 protein"
                     /protein_id="AAD "
                     /db_xref="GI: "
                     /note="product, gene name derived from literature (PMID )"
                     /translation="MISGLLFLLVRREVPTVRSEEIPRGVSVTPSEEPALEKAQKEPE
                     TKKILDRLPKELDQLDTYIQEVFACLERLKDPKYEDRGLLTEAKEKLRVFDVVEKDMM
                     SEFLDIQRVLNEEAYYVEHCQDPLENIAYEIFSSQELRDYYCAGVCGYLPSGDARADR
                     LKRSVKEVMDRFMRVTWKSWEASVMLDHSYGVARELFKKAVGVLEESVYKILFKSYRD
                     AFYECEKAKIQRDGRFKWL"
Callouts on slide: Manual annotation (x3), Indication of source

38 Genbank entry - now
     gene            /gene="mrsa"
                     /locus_tag="CPn_0056"
     CDS             /gene="mrsa"
                     /locus_tag="CPn_0056"
                     /codon_start=1
                     /transl_table=11
                     /product="phosphomannomutase"
                     /protein_id="AAD "
                     /db_xref="GI: "
                     /translation="MKEVEQRIRSLYDAVTAENICRWLSNDCTQQDAKTILGWLDTDP
                     AQLEDLFGATLTFGTGGLRSLMGIGTNRINLFTIRRTTQGLVQVLRAHLPHPGDPMRV
Callouts on slide: Automatic annotation (x3)

39 Genbank entry after re-annotation
     gene            /gene="mrsa"
                     /locus_tag="CPn_0056"
     CDS             /gene="mrsa"
                     /locus_tag="CPn_0056"
                     /codon_start=1
                     /transl_table=11
                     /product="phosphoglucomutase/phosphomannomutase"
                     /EC_number=" "
                     /protein_id="AAD "
                     /db_xref="GI: "
                     /note="product, EC_number derived from literature (PMID )"
                     /translation="MKEVEQRIRSLYDAVTAENICRWLSNDCTQQDAKTILGWLDTDP
                     AQLEDLFGATLTFGTGGLRSLMGIGTNRINLFTIRRTTQGLVQVLRAHLPHPGDPMRV
Callouts on slide: Automatic annotation (x2), Manual annotation (x2), Indication of source
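The before/after entries follow one pattern: manually curated qualifiers replace the automatic ones, and a /note records which qualifiers changed and where the evidence comes from. A minimal sketch of that pattern, assuming a CDS feature is held as a plain dictionary of qualifiers; the function name `refine_cds` is hypothetical, and the PMID value "XXX" is a placeholder because the slides do not preserve the real identifiers:

```python
def refine_cds(qualifiers, pmid, product=None, gene=None, ec_number=None):
    """Return a copy of the qualifiers with manual refinements applied.

    Qualifiers that actually change are listed in a /note together with
    the literature source, mirroring the note format shown on the slides.
    """
    updated = dict(qualifiers)  # leave the automatic annotation untouched
    changed = []
    for key, value in (("product", product), ("gene", gene), ("EC_number", ec_number)):
        if value is not None and updated.get(key) != value:
            updated[key] = value
            changed.append("gene name" if key == "gene" else key)
    if changed:
        updated["note"] = f"{', '.join(changed)} derived from literature (PMID {pmid})"
    return updated


# Automatic annotation of CPn_0008 as shown on the "now" slide.
automatic = {"locus_tag": "CPn_0008", "product": "hypothetical protein"}

# "XXX" is a placeholder, not a real PubMed ID.
refined = refine_cds(automatic, pmid="XXX", product="HB2 protein", gene="hb2")
for key, value in refined.items():
    print(f'/{key}="{value}"')
```

Returning a copy rather than editing in place keeps the automatic annotation available for the old-versus-new comparison planned for the manuscript.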

43 Submission
Why?
- Provides up-to-date knowledge for BLAST searches
- Facilitates future (multi-)genome projects
How?
- Chlamydia re-annotation project generates new Genbank files
- Owners of Genbank entries re-submit

44 Publication
- Authors: all contributors to manual re-annotation
- Research article: "Re-annotation of all publicly available chlamydial genomes", CBRS reannotation project consortium
Manuscript outline:
- Goal: improving existing genome annotations
- Method: community-based re-annotation
- Results:
  - Comparison between old and new annotation
  - Analysis of the chlamydial pan-genome
- Further ideas / contributions?

46 Thank you for your attention!
