MSc in Genetics. Population Genomics of model species. Antonio Barbadilla. Course


1 Group Genomics, Bioinformatics & Evolution. Institut Biotecnologia i Biomedicina. Departament de Genètica i Microbiologia, UAB. Course

2 Outline: Cataloguing nucleotide variation at the genome scale; Population genomics in Drosophila melanogaster; Visualization of nucleotide variation; Readings & exercises


4 Mapping the footprint of natural selection throughout the genome of Drosophila melanogaster

5 To date, most population genetics studies have been based on fragmentary and non-random samples of the genome, providing a partial and often biased view of population genetic processes.

6 Data desideratum in this golden age of the study of genetic variation: a big sample of complete genome sequences from a single natural population of a model organism, plus complete sequences from out-group species. Goal: description and explanation of patterns of nucleotide variation at a large scale.

7 International project (US National Human Genome Research Institute): sequencing of a genetic reference panel of 192 wild lines of D. melanogaster. Drosophila Genetic Reference Panel (DGRP): A Community Resource for the Study of Genotypic and Phenotypic Variation in a Model Organism. Trudy F. C. Mackay

8 The Drosophila Genetic Reference Panel DGRP Sequencing of a D. melanogaster genetic reference panel of 192 wild-type inbred lines sampled from a single natural population (Raleigh, North Carolina, USA)


10 The Drosophila Genetic Reference Panel (DGRP): 192 lines for which extensive information on complex trait phenotypes has been collected

11 Genotypic and Phenotypic Space

12 DGRP sequencing strategies. Long reads (~500 bp; primary error: homopolymer length determination): de novo assembly, to identify large polymorphic features (>20 bp). Short paired-end reads (~45 bp; primary error: substitutions at the ends of the reads): mapping reads to the Drosophila reference genome (Mosaik), to identify small polymorphic features (<10-20 bp).

13 DGRP main goals: (1) create a community resource for the association mapping of quantitative trait loci (QTLs); (2) create a community resource containing the common sequence polymorphisms of Drosophila melanogaster (SNPs and structural variation); (3) create a test bench for statistical methods used in QTL association mapping studies for traits affecting human diseases.

14 DGRP, our task: (1) create and maintain a genome browser containing the high-resolution sequence polymorphism map of D. melanogaster; (2) genome-wide molecular population genetic analyses.

15 Questions that can be answered with a genome-wide perspective: which pattern or gradient does genetic variation follow along the chromosomes?

16 Raw data: 158 D. melanogaster genomes from the DGRP project (Freeze 1). MSc in Genetics, module Genomics & Proteomics.

17 Raw data: outgroup species


19 SNPs per chromosome arm (the #SNPs (all) and #SNPs (no singletons) counts did not survive transcription)
Arm      % no-singletons   % singletons
2L       73.37%            26.63%
2R       72.39%            27.61%
3L       70.54%            29.46%
3R       70.60%            29.40%
X        64.93%            35.07%
TOTAL    70.71%            29.29%
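
The singleton percentages on this slide can be illustrated with a minimal sketch. It assumes a toy alignment of equal-length sequences (the sequences and the function name `snp_summary` are hypothetical, not DGRP data or code), and it counts a SNP as a singleton when all non-major alleles together appear in exactly one sequence.

```python
from collections import Counter

def snp_summary(seqs):
    """Count SNPs and singleton SNPs in a list of aligned sequences."""
    n_snps = 0
    n_singletons = 0
    for column in zip(*seqs):          # iterate over alignment columns
        counts = Counter(column)
        if len(counts) < 2:
            continue                   # monomorphic site, not a SNP
        n_snps += 1
        # Singleton: only one sequence differs from the major allele.
        if len(column) - max(counts.values()) == 1:
            n_singletons += 1
    return n_snps, n_singletons

# Hypothetical toy alignment of four sequences.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACCTACGTAT", "ACGTACGAAT"]
n_snps, n_singletons = snp_summary(toy)
pct_singletons = 100 * n_singletons / n_snps
```

In this toy alignment three sites are polymorphic and two of them are singletons, so the no-singletons set keeps one SNP out of three.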

20 Data analysis approaches. Sliding windows along chromosome arms (ranging from 50 bp to 100 kb): specific and general patterns that genetic diversity (polymorphism and divergence) follows along the chromosome arms, and their correlation with other variables such as recombination rate, gene density, linkage disequilibrium, or structural regions. Coding-gene-centred approach: functional site classes of coding genes, both coding (synonymous, non-synonymous) and noncoding (5′ and 3′ UTR, intron, 5′ and 3′ intergenic); nucleotide variation and evidence of natural selection acting on each site class.
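
The sliding-window approach can be sketched as follows. This is a minimal illustration, assuming a toy alignment of equal-length sequences and using mean pairwise differences per site as the diversity measure; the function names, window size and data are hypothetical and much smaller than the 50 bp to 100 kb windows mentioned on the slide.

```python
from itertools import combinations

def pairwise_diversity(seqs, start, end):
    """Mean number of pairwise differences per site in columns [start, end)."""
    diffs = 0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        diffs += sum(1 for x, y in zip(a[start:end], b[start:end]) if x != y)
        n_pairs += 1
    return diffs / (n_pairs * (end - start))

def sliding_windows(seqs, window, step):
    """Return (start, end, diversity) for each window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        end = start + window
        results.append((start, end, pairwise_diversity(seqs, start, end)))
    return results

# Hypothetical toy alignment; real analyses would use whole chromosome arms.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACCTACGTAT", "ACGTACGAAT"]
windows = sliding_windows(toy, window=5, step=5)
```

Setting `step` equal to `window` gives non-overlapping windows; a smaller `step` gives overlapping ones, which smooths the profile at the cost of correlated estimates.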

21 Data visualization: the Population Drosophila Browser (PopDrowser). Ràmia M, Librado P, Casillas S, Rozas J & Barbadilla A. PopDrowser: the Population Drosophila Browser. Bioinformatics.


23 Polymorphism and divergence

24 Patterns of polymorphism and divergence along chromosome arms (PopDrowser). [Figure: chromosome arms 2L, 2R, 3L, 3R and X, each plotted between telomere (Tel) and centromere (Cen).]

25 Patterns of polymorphism and divergence along chromosome arms (PopDrowser): centromeric vs. non-centromeric regions within autosome arms; autosomes vs. X chromosome.

26 Polymorphism/divergence and recombination (PopDrowser): centromeric vs. non-centromeric regions within autosome arms; autosomes vs. X chromosome. Correlation with the rate of recombination by site class (Spearman's ρ; the ρ values did not survive transcription): rec < 2 cM/Mb, p < 2.2e-16; rec ≥ 2 cM/Mb, p = (value lost). Recombination data: Fiston-Lavier, et al. Drosophila melanogaster recombination rate calculator. Gene. doi: /j.gene
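
Spearman's ρ, the statistic reported on this slide, is the Pearson correlation of the ranks of the two variables. A minimal pure-Python sketch follows; the recombination and diversity values are invented for illustration and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented per-window values: recombination rate (cM/Mb) and diversity.
rec = [0.2, 0.5, 1.1, 1.8, 2.5, 3.0]
pi = [0.001, 0.002, 0.004, 0.006, 0.005, 0.007]
rho = spearman_rho(rec, pi)
```

Because only ranks enter the calculation, the statistic captures any monotonic relationship, which suits a comparison of windows below and above a recombination threshold.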

27 ANOVA of the regression model testing the effect of different genomic variables on nucleotide variation. Dependent variable: π at 4-fold sites (π 4-fold). Independent variables: recombination rate, genomic region (centromeric or non-centromeric), 4-fold divergence to D. yakuba, and gene density. Table columns: sum of squares, df, F-value, p-value (only the p-values and significance marks below survive). Autosomes: recombination rate, p < 2.2e-16 ***; region, p < 2.2e-16 ***; divergence, p < 2.2e-16 ***; gene density, **; residuals. X: recombination rate, p of order e-10 ***; divergence, *; region; gene density; residuals.


29 Dynamics of mutations. [Figure: allele frequency (0 to 1) against time.]
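
A trajectory of the kind plotted on this slide can be simulated with a minimal sketch. It assumes a neutral Wright-Fisher model with binomial sampling of gene copies each generation; the model choice, function name and parameter values are assumptions made for illustration, not taken from the slide.

```python
import random

def wright_fisher(n_copies, p0, generations, seed=0):
    """Simulate the frequency of a neutral allele under genetic drift."""
    rng = random.Random(seed)
    count = round(p0 * n_copies)       # starting number of allele copies
    freqs = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Each copy in the next generation is drawn independently.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        freqs.append(count / n_copies)
    return freqs

# A new mutation present as one copy among 100 gene copies.
trajectory = wright_fisher(n_copies=100, p0=0.01, generations=200, seed=1)
```

Frequencies 0 (loss) and 1 (fixation) are absorbing in this model, so most new mutations disappear quickly and only a few drift to fixation.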

30 McDonald-Kreitman test (MKT). Ratio of divergence: $\omega = k_i / k_{neut}$. Ratio of polymorphism: $\pi_i / \pi_{neut}$. Neutral expectation: $\pi = 4N\mu$ and $k = \mu$, so $\pi_i / \pi_{neut} = k_i / k_{neut}$ (only neutral fixation).
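
In practice the MKT is usually run on counts of polymorphic and divergent sites in the two site classes, arranged as a 2x2 table. The sketch below is one way to do that, assuming counts stand in for the polymorphism and divergence levels of the slide and using a two-sided Fisher's exact test; the counts and function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

def mkt(pn, ps, dn, ds):
    """Return (polymorphism ratio, divergence ratio, p-value).

    pn, ps: polymorphic sites in the selected and neutral classes.
    dn, ds: divergent (fixed) sites in the selected and neutral classes.
    """
    ratio_pol = pn / ps
    ratio_div = dn / ds
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    return ratio_pol, ratio_div, p_value

# Hypothetical counts with an excess of divergence in the selected class.
ratio_pol, ratio_div, p_value = mkt(pn=5, ps=40, dn=20, ds=30)
```

When the two ratios are equal the table is consistent with neutral fixation only; a divergence ratio clearly above the polymorphism ratio, as in these invented counts, points to adaptive fixations.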

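31 Extended McDonald-Kreitman test (ext MKT). $\pi_i / \pi_{neut} = k_i / k_{neut}$: only neutral fixation. $\pi_i / \pi_{neut} < k_i / k_{neut}$: neutral plus adaptive fixation. $\pi_i / \pi_{neut} > k_i / k_{neut}$: neutral plus weakly negative selection. With both adaptive fixation and weakly negative selection, $\pi_i / \pi_{neut}$ may be greater than, equal to, or less than $k_i / k_{neut}$.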

32 Estimated selection regimes. [Figure: stacked bar on a 0% to 100% axis with segments of 11%, 12%, 4% and 73%; legend: d (strongly deleterious), b (weakly deleterious), f-γ (old neutral), γ (new neutral).]

33 Selection genome-wide. [Figure: percentage of sites in each regime: d (strongly deleterious), b (weakly deleterious), f-γ (old neutral), γ (new neutral).]

34 Adaptive selection. α is calculated by aggregating nonsynonymous (0-fold) sites in windows along the chromosome arms: $\alpha = 1 - (\pi_{0\text{-fold}} / \pi_{4\text{-fold}}) / (k_{0\text{-fold}} / k_{4\text{-fold}})$. α = 0: neutral region; α > 0: adaptive selection; α < 0: negative selection. [Figure: α along chromosome arms 2L, 2R, 3L, 3R and X, between telomere (Tel) and centromere (Cen), with a reference line at the neutral value.]
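
A per-window calculation of α can be sketched as below. It assumes the standard estimator α = 1 - (π0-fold/π4-fold)/(k0-fold/k4-fold), for which a neutral region gives α = 0, positive values suggest adaptive fixation and negative values suggest segregating weakly deleterious variants. The window values and the function name are hypothetical.

```python
def alpha(pi_0fold, pi_4fold, k_0fold, k_4fold):
    """Estimate alpha from polymorphism (pi) and divergence (k) levels.

    Returns None when a denominator is zero and alpha is undefined.
    """
    if pi_4fold == 0 or k_0fold == 0 or k_4fold == 0:
        return None
    return 1 - (pi_0fold / pi_4fold) / (k_0fold / k_4fold)

# Hypothetical windows along a chromosome arm (start coordinate in bp).
windows = [
    {"start": 0, "pi0": 0.002, "pi4": 0.010, "k0": 0.02, "k4": 0.10},
    {"start": 100000, "pi0": 0.001, "pi4": 0.010, "k0": 0.04, "k4": 0.10},
    {"start": 200000, "pi0": 0.004, "pi4": 0.010, "k0": 0.02, "k4": 0.10},
]
alphas = [alpha(w["pi0"], w["pi4"], w["k0"], w["k4"]) for w in windows]
```

The three invented windows give values close to 0, 0.75 and -1, covering the neutral, adaptive and negative cases. Returning `None` for undefined windows keeps them out of any later averaging.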

35 Effect of local recombination on genetic variation. [Figure: nucleotide diversity against exchange coefficient for genes of Drosophila melanogaster.] Begun, D. J. & C. F. Aquadro. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356.

36 Adaptive propensity and recombination. Direction of Selection (DoS): Stoletzki, N. & Eyre-Walker, A. Mol. Biol. Evol. 28 (2011).
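
The Direction of Selection statistic of Stoletzki & Eyre-Walker is DoS = Dn/(Dn + Ds) - Pn/(Pn + Ps), where D and P are counts of divergent and polymorphic sites and n and s mark nonsynonymous and synonymous changes. A minimal sketch with hypothetical counts (the function name is also an assumption):

```python
def direction_of_selection(dn, ds, pn, ps):
    """DoS = Dn / (Dn + Ds) - Pn / (Pn + Ps).

    Returns None when a gene has no divergent or no polymorphic sites.
    """
    if dn + ds == 0 or pn + ps == 0:
        return None
    return dn / (dn + ds) - pn / (pn + ps)

# Hypothetical counts for one gene.
dos_adaptive = direction_of_selection(dn=20, ds=30, pn=5, ps=40)
dos_neutral = direction_of_selection(dn=10, ds=40, pn=10, ps=40)
```

DoS is positive when there is an excess of nonsynonymous divergence (adaptive evolution), zero under neutrality and negative when slightly deleterious polymorphisms segregate. It is bounded between -1 and 1, which makes it convenient to compare across genes or recombination classes.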

37 Double cost of sex (Maynard Smith 1978). Recombination and the Hill-Robertson effect (1966). Advantage of sex (Crow & Kimura 1965).

38 Conclusions. The 158 genomes have provided an unprecedented opportunity to perform the most comprehensive study of nucleotide variation to date. (1) Genome-wide patterns of polymorphism and divergence differ markedly between (i) centromeric and non-centromeric regions within autosome arms and (ii) autosomes and the X chromosome. (2) Natural selection is pervasive along the D. melanogaster genome, and the relative importance of different selection regimes depends on both the site classes and the genome regions considered. (3) There is a threshold value of recombination rate that determines the efficiency of selection in any genome region. (4) All the evidence together supports the faster-X hypothesis.

39 Bioinformatics of Genetic Diversity Genomics, Bioinformatics and Evolution Research group

40 Collaborators: Alfredo Ruiz (Genomics, Bioinformatics and Evolution group, Universitat Autònoma de Barcelona); Julio Rozas and Pablo Librado (Universitat de Barcelona); Trudy Mackay (Department of Genetics, N.C. State University); Casey M. Bergman (University of Manchester); Esther Betrán (University of Texas)

41 Exercises. Exercise 3: Suppose that both adaptive and weakly deleterious selection are acting on a DNA sequence. If you want to perform an MKT, find a statistical approach that accounts for weakly negative selection when detecting adaptive selection. With adaptive fixation plus weakly negative selection, $\pi_i / \pi_{neut}$ may be greater than, equal to, or less than $k_i / k_{neut}$.