Introduction to Population Genetics. Spezielle Statistik in der Biomedizin WS 2014/15


1 Introduction to Population Genetics Spezielle Statistik in der Biomedizin WS 2014/15

2 What is population genetics?
Describes the genetic structure and variation of populations.
- Causes
- Maintenance
- Changes
Theorizes on the evolutionary forces acting on populations.
- Drift
- Mutation
- Selection
- Migration

3 Phenotypic and Genotypic Variation
Phenotypic Variation
- Mendel 1866: paper on heredity; discrete phenotypic variation.
- Galton: study of human hereditary differences; continuous variation (e.g., height).
Genotypic Variation
- Classical Hypothesis: variation reflects the balance between mutation, which introduces it, and negative selection, which removes it.
- Balance Hypothesis: selection maintains variation because it favors heterozygotes or rare genotypes.
- Neutral Theory: maybe much of the variation has only very little effect?

4 Definitions
- Locus: place on a chromosome where an allele resides.
- Allele: the bit of DNA at that place. The same allele can have different DNA sequences.
- Site: a specific, unique position of a single nucleotide in a genome.
- Segregating (Polymorphic) Site: a site with different nucleotides in independently sampled alleles (e.g., Single Nucleotide Polymorphism; SNP).
- Silent (Synonymous) Polymorphism: alternative codons code for the same amino acid.
- Replacement (Non-Synonymous) Polymorphism: a nucleotide polymorphism that causes an amino acid polymorphism.

5 Example: DNA Variation in Drosophila
The alcohol dehydrogenase (ADH) locus in D. melanogaster has the typical exon-intron structure of eukaryotic genes. Kreitman & Gillespie (1983) analyzed 11 alleles from Florida, Washington, Africa, Japan and France.
- Exon: coding region.
- Intron: non-coding region.

6 DNA for Coding Region at ADH locus

7 ADH Alleles for Different Dmel Populations

8 The Great Obsession of Population Genetics
- What evolutionary forces could have led to such divergence between individuals within the same species?
- Why do silent polymorphisms preponderate over replacement polymorphisms?
Table: Summary of polymorphic sites within melanogaster and fixed differences between melanogaster and erecta.

9 McDonald-Kreitman Test
Tests whether both silent and replacement variation are neutral; follows the standard procedure for 2 × 2 contingency tables.
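As a rough illustration of the contingency-table procedure, the sketch below cross-classifies sites as polymorphic or fixed and as silent or replacement, then tests for independence. The counts are hypothetical and are not the values from the slide's table, and the choice of Fisher's exact test (rather than, say, a chi-square test) is an assumption made here.

```r
# Hypothetical counts, for illustration only (NOT the data from the slides).
mk_counts <- matrix(
  c(30,  5,    # polymorphic within species: silent, replacement
    20, 15),   # fixed between species:      silent, replacement
  nrow = 2, byrow = TRUE,
  dimnames = list(class = c("polymorphic", "fixed"),
                  type  = c("silent", "replacement"))
)

# Fisher's exact test of independence between the two classifications.
mk_test <- fisher.test(mk_counts)

# Ratio of replacement to silent changes within each class; under neutrality
# of both kinds of variation the two ratios are expected to be similar.
ratio_polymorphic <- mk_counts["polymorphic", "replacement"] /
  mk_counts["polymorphic", "silent"]
ratio_fixed <- mk_counts["fixed", "replacement"] / mk_counts["fixed", "silent"]

print(mk_counts)
print(c(polymorphic = ratio_polymorphic, fixed = ratio_fixed))
print(mk_test$p.value)
```

A small p-value would indicate that the replacement-to-silent ratio differs between polymorphic and fixed sites, which is the pattern the test is designed to detect.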

10 Segregating Sites and Nucleotide Mismatches
There are different ways to quantify the amount of nucleotide mismatches in sequence data.
- Segregating sites: S = number of nucleotide sites that differ among the aligned sequences
- Average number of nucleotide mismatches = (total number of nucleotide mismatches) / (total number of pairwise comparisons)
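Both quantities can be computed directly from an alignment. The following is a minimal sketch on four made-up sequences; the sequences and the variable names (`seqs`, `aln`, `avg_mismatches`) are assumptions for illustration only.

```r
# Hypothetical aligned sequences of equal length, for illustration only.
seqs <- c("ATGCCA", "ATGCTA", "ATGACA", "ATGCCA")

# One row per sequence, one column per nucleotide site.
aln <- do.call(rbind, strsplit(seqs, ""))

# S: number of columns that contain more than one distinct nucleotide.
S <- sum(apply(aln, 2, function(site) length(unique(site)) > 1))

# Number of mismatches for every unordered pair of sequences.
pairs <- combn(nrow(aln), 2)
mismatches <- apply(pairs, 2, function(ij) sum(aln[ij[1], ] != aln[ij[2], ]))

# Average number of mismatches = total mismatches / number of comparisons.
avg_mismatches <- sum(mismatches) / ncol(pairs)

print(S)
print(avg_mismatches)
```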

11 The Parameter θ
The parameter θ = 4Nu (N is the population size, u is the mutation rate) determines the level of variation under the neutral model. The estimates of both S and the average number of nucleotide mismatches can be related to θ. Testing for equality between these two estimates is one of many ways to detect departures from neutrality.
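One common way to relate the two statistics to θ, under the neutral infinite-sites model, is that the expected number of segregating sites in a sample of n sequences is θ times the sum 1 + 1/2 + ... + 1/(n-1), while the expected average number of pairwise mismatches is θ itself. The sketch below uses that relationship; the function name `theta_from_S` and the input values are hypothetical.

```r
# Estimate of theta from the number of segregating sites S in n sequences:
# S divided by a_n = 1 + 1/2 + ... + 1/(n - 1).
theta_from_S <- function(S, n) {
  S / sum(1 / seq_len(n - 1))
}

# Hypothetical summary statistics for one locus (illustration only).
n <- 4                 # number of sampled sequences
S <- 2                 # segregating sites
avg_mismatches <- 1    # average number of pairwise mismatches

theta_S  <- theta_from_S(S, n)
theta_pi <- avg_mismatches   # the average mismatch count estimates theta directly

# Under neutrality both estimate the same quantity, so a large difference
# between them hints at a departure from the neutral model.
difference <- theta_pi - theta_S

print(c(theta_S = theta_S, theta_pi = theta_pi, difference = difference))
```

Whether a given difference is large enough to matter depends on its sampling variance, which this sketch does not compute.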

12 Genotype Frequency and Allele Frequency
To introduce the notion of genotype and allele frequencies, we will not refer to a particular sample, but rather to one locus that has two alleles, A and a, segregating in the population.
Genotype frequency:
Genotype             AA      Aa      aa
Relative frequency   x_11    x_12    x_22
The frequency of the A allele in the population is p = x_11 + (1/2) x_12.
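As a small worked example of this formula, the sketch below converts genotype counts into relative frequencies and then into allele frequencies. The counts are made up for illustration.

```r
# Hypothetical genotype counts (illustration only).
counts <- c(AA = 30, Aa = 50, aa = 20)

# Relative genotype frequencies x_11, x_12, x_22.
x <- counts / sum(counts)

# Each AA individual carries two A alleles, each Aa individual carries one.
p <- x[["AA"]] + x[["Aa"]] / 2   # frequency of A
q <- x[["aa"]] + x[["Aa"]] / 2   # frequency of a

print(c(p = p, q = q))
```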

13 Hardy-Weinberg Law
Relates the allele frequencies to the genotype frequencies at an autosomal locus in a randomly mating population at equilibrium.
More assumptions:
- nonoverlapping generations
- sexual reproduction
- diploid
- diallelic
- P(male) = P(female)
- infinite population size
- no selection
- no mutation
- no migration

14 Hardy-Weinberg Law The Hardy-Weinberg genotype frequencies will remain unchanged in all generations after the first. This is a statement we can test against (null model)!
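A sketch of how the null model might be tested: with allele frequency p, the Hardy-Weinberg genotype frequencies are p^2, 2p(1-p) and (1-p)^2, and observed counts can be compared with these expectations. The helper `hw_test`, the counts, and the use of a chi-square statistic with one degree of freedom are assumptions made for this illustration.

```r
# Chi-square comparison of observed genotype counts with Hardy-Weinberg
# expectations. `counts` must be a numeric vector named AA, Aa, aa.
hw_test <- function(counts) {
  n <- sum(counts)
  # Allele frequency of A estimated from the sample.
  p <- (2 * counts[["AA"]] + counts[["Aa"]]) / (2 * n)
  expected <- n * c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)
  chi_sq <- sum((counts[names(expected)] - expected)^2 / expected)
  # 3 genotype classes - 1 - 1 estimated parameter = 1 degree of freedom.
  p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
  list(p = p, expected = expected, chi_sq = chi_sq, p_value = p_value)
}

observed <- c(AA = 30, Aa = 50, aa = 20)   # hypothetical counts
result <- hw_test(observed)
print(result$expected)
print(result$p_value)
```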

15 Problem
There are two islands of the same size, one inhabited by people with 5 fingers and one by people with 6 fingers. The islands, through some geological cataclysm, crash into one another. The first-generation hybrids between the inhabitants of the two islands have 6 fingers. How will the frequency of the six-finger allele change over time? (Thanks to Andrea Betancourt for the problem set)

16 Problem
[Figure: two populations, one consisting only of AA individuals and one only of aa individuals, contribute gametes to a common pool in which the two alleles have frequencies p and 1-p.]
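One way to explore the problem is to iterate random mating under the Hardy-Weinberg assumptions (in particular, an infinite population). The sketch below assumes that A denotes the six-finger allele, that it is dominant (because the hybrids have six fingers), and that mating is random across the merged population from the start; all variable names are hypothetical.

```r
# Assumption: equal island sizes, so half of all alleles are A after merging.
p <- 0.5
generations <- 5
allele_freq  <- numeric(generations)
six_fingered <- numeric(generations)

for (g in seq_len(generations)) {
  # Random mating: genotype frequencies follow Hardy-Weinberg proportions.
  genotype <- c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)
  # A is dominant, so AA and Aa individuals both have six fingers.
  six_fingered[g] <- genotype[["AA"]] + genotype[["Aa"]]
  # Allele frequency among the gametes forming the next generation.
  p <- genotype[["AA"]] + genotype[["Aa"]] / 2
  allele_freq[g] <- p
}

print(data.frame(generation = seq_len(generations), allele_freq, six_fingered))
```

Under these idealized assumptions the allele frequency does not move; in a finite population it would fluctuate from generation to generation.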

17 R exercise
> source("~/Desktop/exercises.r")
> run_simulation()
Population size? 10
Initial frequency of red? 0.5
(Hit q to quit)

18 Genetic Drift
- In finite populations, random changes in allele frequencies result from variation in the number of offspring between individuals.
- Because of these random changes, alleles can be lost from the population (genetic variation is removed).
- The random changes have no preferred direction.
- Model: Wright-Fisher model; binomial sampling.
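A minimal sketch of the Wright-Fisher model with binomial sampling follows. It is not the course's `exercises.r` script, whose contents are not shown here; the function name `simulate_drift` and its arguments are hypothetical.

```r
# Wright-Fisher drift for a diploid population of size N (2N gene copies).
# Returns the allele frequency in generations 0, 1, ..., `generations`.
simulate_drift <- function(N, p0, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    # Each of the 2N gene copies in the next generation is drawn
    # independently from the current allele pool (binomial sampling).
    p[t + 1] <- rbinom(1, size = 2 * N, prob = p[t]) / (2 * N)
  }
  p
}

set.seed(1)   # fixed seed so the run is reproducible
traj <- simulate_drift(N = 10, p0 = 0.5, generations = 50)
plot(0:50, traj, type = "l", ylim = c(0, 1),
     xlab = "generation", ylab = "allele frequency")
```

Once the frequency reaches 0 or 1 it stays there, which is how drift removes variation.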

19 Genetic Drift
> run_many_simulations()
Population size? 100
Initial frequency of red? 0.5
How many runs to simulate? 100
What effect does changing the population size have? Changing the initial frequency?

20 Genetic Drift What is the expected change in frequency under this model? Variance?
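Under binomial sampling of 2N gene copies, the expected change in frequency is 0 and its variance is p(1-p)/(2N). The sketch below checks this by summing over the exact binomial distribution rather than by simulation; the function name `drift_moments` is hypothetical.

```r
# Exact mean and variance of the one-generation change in allele frequency
# when the next generation is a binomial sample of 2N gene copies.
drift_moments <- function(N, p) {
  k <- 0:(2 * N)                              # possible allele counts
  prob <- dbinom(k, size = 2 * N, prob = p)   # their probabilities
  delta <- k / (2 * N) - p                    # resulting change in frequency
  mean_delta <- sum(delta * prob)
  var_delta <- sum((delta - mean_delta)^2 * prob)
  c(mean = mean_delta, variance = var_delta)
}

m <- drift_moments(N = 100, p = 0.5)
print(m)
print(0.5 * (1 - 0.5) / (2 * 100))   # p(1-p)/(2N) for comparison
```

The variance shrinks as N grows, which is why drift is weaker in large populations.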

21 Genetic Drift

22 Genetic drift (Variance effective size of Wright 1931)

23 Genetic Drift
Usually, N_e < N:
- Separate sexes
- Variance in offspring number
- Bottlenecks in population size
Wright 1931; Kimura 1983
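Two of these effects have simple standard formulas, sketched below: with separate sexes, N_e = 4 N_m N_f / (N_m + N_f), and with population size varying over generations, N_e is the harmonic mean of the sizes. The function names and the input numbers are hypothetical, and the slides' own formulas may be presented differently.

```r
# Effective size with N_m breeding males and N_f breeding females.
ne_sex_ratio <- function(n_males, n_females) {
  4 * n_males * n_females / (n_males + n_females)
}

# Effective size over several generations of varying census size:
# the harmonic mean, which is dominated by the smallest values.
ne_harmonic <- function(sizes) {
  length(sizes) / sum(1 / sizes)
}

print(ne_sex_ratio(10, 90))             # skewed sex ratio, census size 100
print(ne_harmonic(c(1000, 10, 1000)))   # one-generation bottleneck
```

In both examples the effective size falls well below the census size.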

24 Selection
Individuals with different genotypes may leave different numbers of offspring (on average). Given the fitness scheme below, what is the expected change in frequency for the A allele?
Genotype                        AA               Aa                    aa
Frequency                       p^2              2p(1-p)               (1-p)^2
Relative number of offspring    w_11             w_12                  w_22
Frequency in the offspring      p^2 w_11 / w̄    2p(1-p) w_12 / w̄     (1-p)^2 w_22 / w̄
Here w̄ = p^2 w_11 + 2p(1-p) w_12 + (1-p)^2 w_22 is the mean fitness.
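The frequency of A after selection follows from the offspring genotype frequencies: all alleles in AA offspring and half of those in Aa offspring are A. A minimal sketch, with the hypothetical function name `next_freq`:

```r
# Allele frequency of A after one generation of selection, given the
# relative numbers of offspring w11 (AA), w12 (Aa) and w22 (aa).
next_freq <- function(p, w11, w12, w22) {
  q <- 1 - p
  w_bar <- p^2 * w11 + 2 * p * q * w12 + q^2 * w22   # mean fitness
  # AA offspring contribute fully, Aa offspring contribute one half.
  (p^2 * w11 + p * q * w12) / w_bar
}

p <- 0.2
p_next <- next_freq(p, w11 = 1.2, w12 = 1.1, w22 = 1)   # illustrative values
delta_p <- p_next - p                                   # expected change
print(c(p_next = p_next, delta_p = delta_p))
```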

25 Selection

26 Selection
Individuals with different genotypes may leave different numbers of offspring (on average).
Genotype                        AA      Aa         aa
Frequency                       p^2     2p(1-p)    (1-p)^2
Relative number of offspring    w_11    w_12       w_22
                                1+s     1+hs       1

27 Selection

28 Selection
Selection as a Wright-Fisher process:
Genotype                        AA      Aa         aa
Frequency                       p^2     2p(1-p)    (1-p)^2
Relative number of offspring    1+2s    1+s        1
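A minimal sketch of how selection might be combined with Wright-Fisher sampling: each generation the frequency is first shifted deterministically according to the fitnesses 1+2s, 1+s, 1, and the next generation is then drawn by binomial sampling. This is not the course's `run_many_simulations()`; the function name `simulate_selection` and its arguments are hypothetical.

```r
# Wright-Fisher process with selection, fitnesses 1+2s (AA), 1+s (Aa), 1 (aa).
simulate_selection <- function(N, p0, s, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (t in seq_len(generations)) {
    # Deterministic step: mean fitness is 1 + 2*s*p, and the frequency of A
    # after selection is p * (1 + s + s*p) / (1 + 2*s*p).
    p_sel <- p[t] * (1 + s + s * p[t]) / (1 + 2 * s * p[t])
    # Stochastic step: sample 2N gene copies for the next generation.
    p[t + 1] <- rbinom(1, size = 2 * N, prob = p_sel) / (2 * N)
  }
  p
}

set.seed(2)   # fixed seed so the run is reproducible
traj <- simulate_selection(N = 100, p0 = 0.2, s = 0.1, generations = 100)
plot(0:100, traj, type = "l", ylim = c(0, 1),
     xlab = "generation", ylab = "allele frequency")
```

Setting `s = 0` reduces this to pure drift, which makes it easy to compare the two regimes with the same code.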

29 Selection
> run_many_simulations(s=0.1)
Population size? 100
Initial frequency of red? 0.2
How many runs to simulate? 100
What effect does changing N have? Try 10, 100, 1000 with s = 0.01 and p = 0.1.
Changing the initial frequency? Try 0.01, 0.1, 0.5 with N = 100, s = 0.01.

30 Selection
[Figure panels: N = 10; N = 100]

31 Selection
[Figure panels: p0 = 0.01; p0 = 0.1; p0 = 0.5]

32 Divergence between populations

33 Divergence between populations
Run simulations with genetic drift alone.
What effect does changing N have? Try, e.g., 10, 100, 1000. Note that the scale will change on the x-axis. To plot in a new window, type quartz() (macOS only).
Changing the initial frequency? Try, e.g., 0.01, 0.1, 0.5.

34 Divergence between populations
[Figure panels: Ne = 10; Ne = 100; Ne = 1000]

35 Divergence between populations
[Figure panels: p0 = 0.01; p0 = 0.1; p0 = 0.5]

36 Divergence between populations
Run simulations with selection. Two scenarios:
1) selection in opposite directions
> run_many_simulations(s=c(-0.1, 0.1))
2) selected vs. control population
> run_many_simulations(s=c(0.1, 0))

37 Divergence between populations
[Figure panels: selected vs. control; selected in opposite directions]

38 Divergence between populations
[Figure: selected in opposite directions; s = +/-0.01, N = 100, p0 = 0.1]

39 Selection vs. Drift
In a population of size 200, an allele changes frequency from 0.2 to 0.22 in a single generation.
- What is the probability of this occurring if the allele is neutral?
- What if the allele is favorable, with a heterozygous s of 0.1?

40 Selection vs. Drift
Neutral: binomial probability with 2N = 400, k = 0.22 * 400 = 88, and p = 0.2.
> dbinom(x = 88, size = 400, prob = 0.2)
[1]
Selected for, s = 0.1: binomial probability with 2N = 400, k = 88, and p' = p + (p * (1 - p) * s) / mean(w) ≈ 0.215.
> dbinom(x = 88, size = 400, prob = 0.215)
[1]
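The two probabilities can be computed side by side. The sketch below assumes the fitness scheme 1+2s, 1+s, 1 from slide 28, for which mean(w) = 1 + 2sp; it uses the unrounded post-selection frequency, so its result can differ slightly from one obtained with the rounded value 0.215. Variable names are hypothetical.

```r
N <- 200     # diploid population size, so 2N = 400 gene copies
p <- 0.2     # allele frequency before sampling
s <- 0.1     # heterozygous selection coefficient

# Allele count corresponding to a frequency of 0.22; round() guards against
# floating-point error in the product, since dbinom() expects an integer.
k <- round(0.22 * 2 * N)

# Expected frequency after selection: p + p(1-p)s / mean(w), mean(w) = 1 + 2sp.
p_sel <- p + p * (1 - p) * s / (1 + 2 * s * p)

prob_neutral  <- dbinom(k, size = 2 * N, prob = p)
prob_selected <- dbinom(k, size = 2 * N, prob = p_sel)

print(c(p_sel = p_sel, neutral = prob_neutral, selected = prob_selected))
```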

41 Selection vs. Drift
Suppose in the same population, the allele changes from 0.22 to 0.25 (k = 100). What is the probability of the whole sequence for both models?
- neutral?
- s = 0.1?

42 Selection vs. Drift
Suppose in the same population, the allele changes from 0.22 to 0.25 (k = 100). What is the probability of the whole sequence for both models?
- neutral?
> dbinom(x = 88, size = 400, prob = 0.2) * dbinom(x = 100, size = 400, prob = 0.22)
[1]
- s = 0.1? (p = 0.236)
> dbinom(x = 88, size = 400, prob = 0.215) * dbinom(x = 100, size = 400, prob = 0.236)
[1]
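The same calculation generalizes to a trajectory of any length: the probability of the whole sequence is the product of the one-generation binomial probabilities. The sketch below assumes the fitness scheme 1+2s, 1+s, 1 and uses unrounded post-selection frequencies; the function name `trajectory_prob` is hypothetical.

```r
# Probability of an observed sequence of allele frequencies under a
# Wright-Fisher model with selection coefficient s (s = 0 means neutral).
trajectory_prob <- function(freqs, N, s = 0) {
  k <- round(freqs[-1] * 2 * N)        # allele counts in generations 1, 2, ...
  p_prev <- freqs[-length(freqs)]      # frequencies in the preceding generation
  # Expected frequency after selection with fitnesses 1+2s, 1+s, 1.
  p_exp <- p_prev * (1 + s + s * p_prev) / (1 + 2 * s * p_prev)
  prod(dbinom(k, size = 2 * N, prob = p_exp))
}

freqs <- c(0.2, 0.22, 0.25)
prob_neutral  <- trajectory_prob(freqs, N = 200)
prob_selected <- trajectory_prob(freqs, N = 200, s = 0.1)

# Ratio above 1 means the data are more probable under s = 0.1.
print(c(neutral = prob_neutral, selected = prob_selected,
        ratio = prob_selected / prob_neutral))
```

Comparing the two probabilities as a ratio is one simple way to ask which model is better supported by the observed sequence.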

43 Summary
Population genetics is concerned with the genetic basis of evolution. In a population geneticist's world, evolution is the change in the frequencies of genotypes through time. Probabilistic models of evolution are constructed and checked for compatibility with real data.

44 Outlook
Some further topics not (yet) discussed here:
- Demographic inference
- Migration
- Recombination
- Epistasis
- ...