Identifying Bacteriophages in Metagenomic Data Sets

1 Identifying Bacteriophages in Metagenomic Data Sets. RCAM 2017. Vanessa Jurtz, Technical University of Denmark. October 17,

2 Contents: Phages and why they matter; MetaPhinder - identifying phages; Further characterization of sequences; Phage cocktail data sets

3 Phages and why they matter: "Bacteria rule the world and phages rule bacteria" (Mya Breitbart). Phages are the most abundant organisms in the biosphere; they outnumber bacteria 10:1; they kill up to 50% of the bacteria produced every day; they impact the biogeochemical cycling of key elements such as carbon, nitrogen and phosphorus.

5 Phages and why they matter: 13 families based on morphology and nucleic acid composition (ICTV); phage genomes are single- or double-stranded DNA or RNA; most phages sequenced to date belong to the order Caudovirales (tailed dsDNA phages).

6 Phages and why they matter: figure adapted from Ceyssens et al., Introduction to Bacteriophage Biology and Diversity; ASM Press.

7 Phages and why they matter: Phages have greatly impacted our understanding of biology! Central dogma of molecular biology: DNA → RNA → proteins; first organisms to be sequenced: phage MS2 (ssRNA) in 1976 and phage φX174 (ssDNA) in 1977; phage typing, phage display, CRISPR-Cas.

8 Phages and why they matter (pictured: Félix d'Hérelle, Frederick Twort, George Eliava): Phages were discovered by F. Twort (1915) and F. d'Hérelle (1917). F. d'Hérelle was the first to apply phages for therapeutic purposes. G. Eliava founded the Eliava Institute in Tbilisi, Georgia, in 1923.

9 Phages and why they matter - Phage therapy: antibiotics are easy to produce and store; antibiotic resistance causes problems: post-antibiotic era (WHO 2014); phages are specific to certain bacteria; safety concerns (integrases, virulence factors); difficult licensing in Western countries; complex and dose-independent pharmacokinetics. (Pictured: Eliava Institute)

10 Challenges in phage identification and characterization: phage genomes are small and contribute only around 2-5% of the total DNA in a metagenomic sample; few fully sequenced phage genomes in public databases (< 6000); little annotation in general (protein function, host, etc.).

11 MetaPhinder: identifying phage sequences in metagenomic samples by database comparison

12 MetaPhinder - similar methods: MetaPhinder's aim is only to identify phage contigs; therefore the method itself remains very simple.

14 MetaPhinder (figure panels: genomic rearrangement; mosaic genomes)

15 MetaPhinder: ANI = average nucleotide identity; N = number of hits; id = blastn identity; al = alignment length; m_cov = merged coverage
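The formula that combined these symbols on the slide did not survive transcription. A minimal sketch of one plausible reading follows: the identities of the N blastn hits are averaged with weights proportional to alignment length, and the result is scaled by the merged coverage (the fraction of the contig covered by the union of all hits). That combination, the `Hit` tuple layout, and the use of the query span as a stand-in for alignment length are assumptions for illustration, not the confirmed MetaPhinder implementation.

```python
from typing import List, Tuple

# Hypothetical hit record: (percent identity, query start, query end).
# Coordinates are 1-based and inclusive.
Hit = Tuple[float, int, int]


def merged_coverage(hits: List[Hit], contig_length: int) -> float:
    """Fraction of the contig covered by the union of all hit intervals."""
    intervals = sorted((min(s, e), max(s, e)) for _, s, e in hits)
    covered = 0
    cur_start, cur_end = None, None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval (if any) and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / contig_length


def percent_ani(hits: List[Hit], contig_length: int) -> float:
    """Assumed %ANI: length-weighted mean identity times merged coverage."""
    if not hits:
        return 0.0
    # Alignment length is approximated by the span on the query (assumption).
    lengths = [abs(e - s) + 1 for _, s, e in hits]
    weighted_id = sum(h[0] * al for h, al in zip(hits, lengths)) / sum(lengths)
    return weighted_id * merged_coverage(hits, contig_length)


if __name__ == "__main__":
    # Invented toy hits on a 200 bp contig.
    toy_hits = [(90.0, 1, 100), (80.0, 51, 150)]
    print(percent_ani(toy_hits, contig_length=200))
```

Scaling by merged coverage rather than summed coverage means overlapping hits do not count the same contig region twice.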

16 MetaPhinder: Can we find a %ANI threshold to classify a contig as phage? Which method should be used for database comparison?

17 MetaPhinder: Which method should be used for database comparison? (Figure: AUC versus length [kbp] for blastn, KmerFinder and tblastx.)

18 MetaPhinder: Can we find a %ANI threshold to classify a contig as phage? Threshold = 1.7 %ANI. (Figure, panel A: true positive rate versus false positive rate, annotated with the AUC; panel B: true positive rate and false positive rate versus threshold [%ANI].)
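The threshold question on this slide can be illustrated with a small sketch that computes the true and false positive rate obtained at a given %ANI cut-off. The function name, the strict `>` comparison, and the toy values are all assumptions made for illustration; they are not data or code from the talk.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    phage_ani: Sequence[float],
    non_phage_ani: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) at a %ANI cut-off.

    A contig is called "phage" when its %ANI is strictly above the
    threshold (assumed convention).
    """
    true_positives = sum(1 for value in phage_ani if value > threshold)
    false_positives = sum(1 for value in non_phage_ani if value > threshold)
    return (
        true_positives / len(phage_ani),
        false_positives / len(non_phage_ani),
    )


if __name__ == "__main__":
    # Invented %ANI values for known phage and known non-phage contigs.
    phage = [0.5, 3.0, 40.0, 75.0]
    non_phage = [0.0, 0.2, 1.0, 2.5]
    tpr, fpr = rates_at_threshold(phage, non_phage, threshold=1.7)
    print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Sweeping `threshold` over a range of values and plotting the two rates against each other gives a ROC curve like the one in panel A.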

20 MetaPhinder: predicting prophage data sets

21 MetaPhinder: Practical experience on a data set of sewage samples from all over the world: the %ANI threshold is too low, which led to the development of MetaPhinder version 2.

22 MetaPhinder: No threshold specification! No need to redefine the threshold when the database is updated; contig selection is left to the discretion of the user.

24 MetaPhinder: min. 10 %ANI and %ANI > bacterial coverage
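The selection rule on this slide can be sketched as a single predicate. The function and parameter names are hypothetical, and it is assumed here that bacterial coverage is expressed in percent on the same 0-100 scale as %ANI and that "min. 10" means "at least 10".

```python
def select_as_phage(
    percent_ani: float,
    bacterial_coverage_percent: float,
    min_ani: float = 10.0,
) -> bool:
    """Keep a contig when its phage %ANI is at least `min_ani` and also
    exceeds its coverage by the bacterial database (both in percent)."""
    return percent_ani >= min_ani and percent_ani > bacterial_coverage_percent


if __name__ == "__main__":
    # Invented example contigs: (phage %ANI, bacterial coverage in %).
    for ani, bact in [(35.0, 5.0), (8.0, 0.0), (40.0, 60.0)]:
        print(ani, bact, select_as_phage(ani, bact))
```

Because the slides state that contig selection is left to the user, `min_ani` is exposed as a parameter rather than fixed.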

25 MetaPhinder limitations: small size of the phage database; no discovery of completely new phages possible; removal of prophage k-mers from the bacterial DB is incomplete (due to incomplete annotation). What about prophages? MetaPhinder is not designed for prophage annotation; use specialized software such as PHASTER, VirSorter or PhiSpy.

26 Further characterization of sequences

27 VirulenceFinder: searches for virulence genes of Listeria, S. aureus, E. coli and Enterococcus using blastn. Webservice:

28 ResFinder: identifies acquired antimicrobial resistance genes. Webservice:

29 VirulenceFinder and ResFinder results

30 HostPhinder (Julia Villaroel, PhD student DTU): HostPhinder identifies the bacterial host of a query phage genome based on its genomic similarity to a database of phage genomes with known hosts. Webservice:

31 HostPhinder: k-mer based comparison to the database; calculate coverage; use a scoring criterion in which the normalized coverages of database hits with the same host are summed. Correct predictions: genus 81%, species 74%. Webservice:
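The scoring criterion described on this slide can be sketched as follows. The slide does not say how coverages are normalized; dividing each coverage by the total coverage over all hits is an assumption, as are the function name and the `(host, coverage)` hit layout. This is an illustration of the idea, not HostPhinder's actual code.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def predict_host(
    hits: List[Tuple[str, float]],
) -> Tuple[Optional[str], Dict[str, float]]:
    """Sum normalized coverages per host and return the best-scoring host.

    `hits` holds one (host name, coverage) pair per database phage hit.
    Normalization by the total coverage of all hits is an assumption.
    """
    total = sum(coverage for _, coverage in hits)
    if total == 0:
        # No hits (or only zero-coverage hits): no prediction possible.
        return None, {}
    scores: Dict[str, float] = defaultdict(float)
    for host, coverage in hits:
        scores[host] += coverage / total
    best_host = max(scores, key=scores.get)
    return best_host, dict(scores)


if __name__ == "__main__":
    # Invented hits: two database phages infect Escherichia, one Salmonella.
    toy_hits = [("Escherichia", 0.6), ("Escherichia", 0.2), ("Salmonella", 0.7)]
    host, scores = predict_host(toy_hits)
    print(host, scores)
```

In the toy input the single best hit belongs to Salmonella, yet the summed score favors Escherichia; this is the behavior that summing over hits with the same host is meant to produce.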

32 HostPhinder: HostPhinder can only predict hosts that are part of the database! Webservice:

33 Phage Cocktail data sets: a phage cocktail is a phage solution for medical application consisting of several different phage species.

34 Phage Cocktail data sets. INTESTI cocktail (Henrike Zschach, PhD student DTU): active against E. coli, Enterococcus, Proteus, P. aeruginosa, Shigella, Salmonella and Staphylococcus; in use since 1937 (updated every 6 months); against intestinal infections; analysis in 2015/2016. PYO cocktail (Julia Villaroel, PhD student DTU): active against Staphylococcus, Streptococcus, Proteus, E. coli and P. aeruginosa; against skin or wound infections; analysis in

35 Phage Cocktail data sets (figure panels: INTESTI; PYO)

36 Phage Cocktail data sets (INTESTI and PYO): predicted hosts correspond well with the advertised specificity; no harmful genes discovered.

37 Phage Cocktail data sets, PYO cocktail: which DB phage is most similar to a given bin? Reverse engineer MetaPhinder!

38 Conclusion: MetaPhinder compares contigs to a phage database; the new version also compares sequences to a bacterial database; flexibility: users can create their own database; only a small number of sequenced phages are available in public databases; phage therapy provides an alternative to antibiotics, so a better understanding of phages is important.

39 Acknowledgments: Morten Nielsen (Professor DTU), Julia Villaroel (PhD student DTU), Henrike Zschach (PhD student DTU), Mette Voldby Larsen (CEO GoSeqIt), Ole Lund (Professor DTU), Frank Møller Aarestrup (Professor DTU)

40 e-value (figure: AUC versus length [kbp] for blastn %ANI + e-value 0.05, blastn %ANI + e-value 1, and blastn %ANI + e-value 1e)

41 KmerFinder vs. blastn (figure: KmerFinder q_cov versus blastn %ANI, with the reference curves $\%\mathrm{ANI} = 100 \cdot q_{cov}^{1/16}$ and $\%\mathrm{ANI} = 100 \cdot q_{cov}$)
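The first relation on this slide is garbled in the transcript; reading it as %ANI = 100 · q_cov^(1/16) is an interpretation. Under that reading, a hedged rationale is that if each base matches independently with probability p, a whole k-mer of length 16 matches with probability p^16, so k-mer query coverage q_cov ≈ p^16 and identity p ≈ q_cov^(1/16). The k-mer length of 16 is inferred from the exponent only, and the function name is hypothetical.

```python
def kmer_coverage_to_percent_ani(q_cov: float, k: int = 16) -> float:
    """Convert k-mer query coverage (a fraction in [0, 1]) to an estimated %ANI.

    Assumes independent per-base matches, so that q_cov is roughly
    identity ** k; k = 16 is inferred from the slide's exponent.
    """
    if not 0.0 <= q_cov <= 1.0:
        raise ValueError("q_cov must be a fraction between 0 and 1")
    return 100.0 * q_cov ** (1.0 / k)


if __name__ == "__main__":
    # Even modest k-mer coverage corresponds to high per-base identity.
    for q_cov in (0.05, 0.25, 0.5, 1.0):
        print(q_cov, round(kmer_coverage_to_percent_ani(q_cov), 1))
```

The second curve on the slide, %ANI = 100 · q_cov, corresponds to calling the same function with `k=1`.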

42 Top hit ANI vs. ANI all (figure: density of %ANI computed from the top hit only and from all hits)