Scoary: Pan-genome-wide association studies in bacteria

1 Scoary: Pan-genome-wide association studies in bacteria. Ola Brynildsrud, researcher and bioinformatician, Dep. of infectious disease epidemiology and modelling.

2 GWAS = genome-wide association studies. Strongly linked to human genomics; very few studies in bacteria. [Figure: Manhattan plot of SNP variants vs retinal vasoconstriction; x-axis: genome position, y-axis: association (Ikram et al., 2010)]

3 What is GWAS? GWAS, Pan-GWAS

4 Microbial GWAS applications: AMR; virulence mechanisms; insight into many kinds of biological processes

5 Bacteria vs Eukaryote
Feature | Bacteria | Eukaryote
Ploidy | Haploid | Diploid
Genetic re-assortment | Infrequent short gene conversion and horizontal gene transfer events | Homologous recombination and chromosome segregation linked to reproduction
Accessory (non-core) genes | Variable numbers in different species | Rare
Linkage disequilibrium | Variable across the genome and between species | Variable across the genome
Population structure | Asexual, generally highly structured, except for relatively rare homologous recombination events | Sexual, variable allele frequencies in subpopulations owing to non-random mating, ancestral divergence, drift
Confounders in genome-wide association studies | Population structure | Population structure
How to move from association to causality | Genetic reconstruction of mutations in laboratory strains, transposon mutant screens | Forward genetics in animal models or cultured tissue systems; linkage to known genetic diseases; large monogenic association studies
Current burden of proof for causality | Molecular Koch's Postulates | Combined genetic and experimental evidence

6 Possible association variants: SNPs; gene presence/absence (plasmids etc.); indels; k-mers

7 Antibiotic resistance: susceptible vs resistant

8 The optimal situation: susceptible and resistant populations are identical at the genomic level except for variant Y -> variant Y causes resistance. Rarely occurs in practice.

9 For each variant: Null hypothesis: variant and trait are independent. Alternative hypothesis: variant and trait are associated.
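The per-variant test can be sketched in a few lines. The slide does not name a specific statistic, so this example assumes Fisher's exact test on the 2x2 variant-by-trait table, which is one common choice for presence/absence data. The function names `contingency_table` and `fisher_exact_two_sided` are hypothetical, and the test ignores population structure entirely.

```python
from math import comb


def contingency_table(has_variant, has_trait):
    """Count isolates in each cell of the 2x2 variant-by-trait table."""
    a = b = c = d = 0
    for v, t in zip(has_variant, has_trait):
        if v and t:
            a += 1  # variant present, trait present
        elif v:
            b += 1  # variant present, trait absent
        elif t:
            c += 1  # variant absent, trait present
        else:
            d += 1  # variant absent, trait absent
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table (assumed test)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x isolates in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    variant = [1, 1, 1, 1, 0, 0, 0, 0]
    trait = [1, 1, 1, 0, 0, 0, 0, 1]
    print(fisher_exact_two_sided(*contingency_table(variant, trait)))
```

A small p-value argues against the null hypothesis for that variant, but with thousands of variants tested, a multiple-testing correction would also be needed.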

10 Population structure: when resistance emerges, variant Y is introduced, but so are many other variants. (Figure labels: Susceptible, Resistant)

11 NB: Population structure. Scenarios shown: Utopia; Decent; Uninformative; «p = 0.38». PROBLEMS: Y is multifactorial (result of multiple variants); Y is not dichotomous (e.g. MIC values); measurement/classification error

12 NB: Population structure. Goal: incorporate population structure into the analysis. Measure the impact of variant Y on resistance, devoid of population structure bias.

13 Methods of population structure correction: Many options, no solutions. Some tests come with a lot of assumptions (e.g. known branch lengths, evolutionary rates, accurate reconstruction of states at nodes). Tests varyingly assign significance to «Darwin's scenario». «Correlation does not equal causality». We argue for a test that sacrifices power in order to make as few assumptions as possible.

16 INPUT: Variant file (pan-genome from Roary, SNP file etc.); Traits file (have to make this yourself). Steps, ordered from computationally cheap to computationally expensive, each applied to the remaining genes: collapse correlated variants; population-naïve filtration; population-aware filtration; permutation (empirical p). OUTPUT: Results file (significant genes, associated statistics).
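Two of the steps on this slide, collapsing correlated variants and deriving an empirical p-value by permutation, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, "correlated" is simplified to "identical presence/absence pattern", and the permutation uses a toy agreement score with plain label shuffling, which does not account for population structure.

```python
import random
from collections import defaultdict


def collapse_correlated(variants):
    """Group variants that share an identical presence/absence pattern."""
    groups = defaultdict(list)
    for name, pattern in variants.items():
        groups[tuple(pattern)].append(name)
    return dict(groups)


def association_score(pattern, trait):
    """Toy statistic: isolates where variant and trait agree (or all disagree)."""
    concordant = sum(1 for v, t in zip(pattern, trait) if bool(v) == bool(t))
    # Taking the larger side captures both positive and negative association
    return max(concordant, len(trait) - concordant)


def empirical_p(pattern, trait, n_perm=1000, seed=0):
    """Fraction of label permutations scoring at least as high as observed."""
    rng = random.Random(seed)
    observed = association_score(pattern, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if association_score(pattern, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    variants = {"geneA": [1, 1, 0, 0], "geneB": [1, 1, 0, 0], "geneC": [0, 1, 0, 1]}
    print(collapse_correlated(variants))
    print(empirical_p([1] * 5 + [0] * 5, [1] * 5 + [0] * 5))
```

Collapsing first means each distinct pattern is tested once, which is why the cheap steps come before the expensive permutation step.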

17 Implementation: Python (command line script). GUI wrapper (sorry, the reviewers forced us).

18 Does it work? Tested on a dataset of just 21 strains of Staphylococcus intermedius. Identified the cfr gene, known to be associated with linezolid (LZD) resistance.

19 Does it work? 3,085 Streptococcus pneumoniae isolates with resistance data for beta-lactams. Multiple gene systems identified as important (gpsb, yoqj, pbp, cbpc etc.). Subset analysis: 100 isolates were enough to identify common variants with high power. Rare variants (e.g. genes in very few or nearly all strains) required larger sample sizes.

20 Availability: Free!

21 Installation Easily installed by pip (Python package manager) pip install scoary
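Once installed, a run needs the traits file that the workflow slide says you have to make yourself. The sketch below writes one, assuming a comma-separated layout with one row per isolate, one column per trait, and 1/0 for trait present/absent. That layout, the helper name `write_traits_file`, and the example isolate and trait names are assumptions to verify against the Scoary documentation. The command line is likewise assumed to take the form `scoary -g gene_presence_absence.csv -t traits.csv`; confirm with `scoary --help`.

```python
import csv


def write_traits_file(path, traits_by_isolate, trait_names):
    """Write a 0/1 traits table: one row per isolate, one column per trait."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header row: empty corner cell followed by the trait names
        writer.writerow([""] + list(trait_names))
        for isolate, values in traits_by_isolate.items():
            writer.writerow([isolate] + [int(bool(values[t])) for t in trait_names])


if __name__ == "__main__":
    # Hypothetical isolates and phenotype, for illustration only
    phenotypes = {
        "strain1": {"LZD_resistance": True},
        "strain2": {"LZD_resistance": False},
    }
    write_traits_file("traits.csv", phenotypes, ["LZD_resistance"])
```

Isolate names presumably need to match those in the variant file exactly, so it is worth generating both from the same sample sheet.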

22 Thanks: Vegard Eldholm, Jon Bohlin, Lonneke Scheffer, Marco Galardini, Anders Goncalves de Silva, Inês Mendes, Eric Deveaud, Jukka Corander