A brief introduction to population genetics

1 A brief introduction to population genetics

2 Population genetics. Definition: studies the distributions and changes of allele frequencies in populations over time. Effects considered: natural selection, genetic drift, mutation and gene flow; also recombination, population subdivision and population structure. Allows inferring past events as well as predicting future ones. History: fundamental work by Haldane, Wright and Fisher in the first half of the 20th century. Recent development: coalescent theory by Kingman in the 1980s, suitable for SNP data and computationally highly efficient.

3 Population genetics basics: Allele. An allele is one of the alternative forms of a gene or of the same genetic locus. It used to be a visible gene product (e.g. blond vs. red hair); now it is typically a SNP (e.g. rs (C) vs. rs (T)).

6 Population genetics basics: Allele. Alleles at a genetic locus do not need to be functional: in many studies we are interested in neutral variation. rs is associated e.g. with skin sensitivity to sun, hair color, nonmelanoma skin cancer and freckles. The genome provides millions of variable loci, the majority of them neutral. Inferring the presence of function for a locus/allele is of special interest.

7 Population genetics basics: Population model. Theoretical models assume a simplified population model. The most commonly used one is the Wright-Fisher model (WFM). It assumes: a haploid population, no sex, constant population size. The WFM can be generalised to: a diploid population; panmictic, random mating; variable population size. The WFM gives a good approximation for more complex populations.

8-12 Population genetics basics: Wright-Fisher model. Evolution of an idealised population (figure: the population shown generation by generation, from generation 1 to generation 10).
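The evolution of an idealised Wright-Fisher population can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's code: the function name `wright_fisher`, the choice of 20 gene copies, the starting frequency of 0.5 and the seed are all assumptions made for the example. Each generation, every offspring copy picks its parent uniformly at random from the previous generation, which amounts to binomial sampling of the allele count.

```python
import random

def wright_fisher(n_copies, p0, generations, seed=0):
    """Frequency of one allele per generation in a haploid Wright-Fisher population.

    n_copies, p0, generations and seed are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    count = round(p0 * n_copies)
    freqs = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Every offspring copy draws its parent at random: binomial sampling.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        freqs.append(count / n_copies)
    return freqs

# Ten generations of a population with 20 gene copies, starting at 50 %.
trajectory = wright_fisher(n_copies=20, p0=0.5, generations=10)
print(trajectory)
```

The frequency wanders from generation to generation although no selection is applied; once it reaches 0 or 1 it stays there.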

13 Population genetics basics: Ne. Population size is one central parameter in population genetics, abbreviated as N. Population size defines: how quickly variation is lost (forwards); how much frequencies change per generation (now); how quickly a sample coalesces to its most recent common ancestor, MRCA (backwards). Population size is measured in units of a WFM population, known as the effective population size, Ne. Ne can be very different from the census population size; some violations of the WFM can be corrected for.

14 Population genetics basics: Ne and Drift. Loss of variation and change of allele frequencies: known as genetic drift.

16 Population genetics basics: Ne and Drift. Genetic drift: at every locus, variation is eventually lost and one allele becomes fixed; at non-neutral loci, selection affects the chances of fixation. Variation is lost much more rapidly in small populations: there genetic drift prevails over selection, and even harmful alleles may get fixed. Variation once lost is lost forever: a population bottleneck reduces variation and population recovery cannot bring it back. New variation is created by mutations.
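The claim that variation is lost faster in small populations can be checked with a small simulation. The sketch below assumes a neutral haploid Wright-Fisher population without mutation; the helper names, the population sizes (10 and 100 gene copies), the number of replicates and the seed are hypothetical choices for illustration only.

```python
import random

def generations_until_fixed(n_copies, p0, rng):
    """Generations until an allele is lost or fixed under pure drift (no mutation)."""
    count = round(p0 * n_copies)
    t = 0
    while 0 < count < n_copies:
        p = count / n_copies
        # Binomial resampling of the next generation.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        t += 1
    return t

def mean_absorption_time(n_copies, replicates=100, seed=1):
    """Average time to loss of variation, starting from frequency 0.5."""
    rng = random.Random(seed)
    times = [generations_until_fixed(n_copies, 0.5, rng) for _ in range(replicates)]
    return sum(times) / len(times)

small = mean_absorption_time(10)    # small population (illustrative size)
large = mean_absorption_time(100)   # larger population (illustrative size)
print(small, large)
```

In such runs the small population typically loses its variation within a few tens of generations, while the larger one keeps it roughly ten times longer.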

17 Population genetics basics: Coalescent (figure: genealogies for N = 20 and N = 100, with the MRCA marked). Coalescence time: small populations coalesce faster and have a more recent MRCA. Conversely, Ne can be defined by coalescence time.

18 Coalescence of two lineages. The probability of coalescence r generations before the present follows a geometric distribution: $P(T = r) = [1 - 1/(2N)]^{r-1}\,[1/(2N)]$. Its mean is $1/p = 1/[1/(2N)] = 2N$. For N = 20 (2N = 40), the expected time for two random lineages to coalesce is 40 generations. For large N, the coalescence process follows an exponential distribution.
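The geometric distribution of the pairwise coalescence time can be verified numerically. The sketch below assumes 2N = 40 gene copies (i.e. N = 20 diploid individuals); the function name `coalescence_pmf` and the truncation of the sum at 5000 generations are choices made for this example.

```python
def coalescence_pmf(r, two_n):
    """P(two lineages coalesce exactly r generations ago) in a population of two_n copies."""
    p = 1.0 / two_n
    # No coalescence for r - 1 generations, then coalescence in generation r.
    return (1.0 - p) ** (r - 1) * p

two_n = 40  # assumed: N = 20 diploid individuals
horizon = 5000  # truncation point; the remaining probability mass is negligible
total = sum(coalescence_pmf(r, two_n) for r in range(1, horizon + 1))
mean = sum(r * coalescence_pmf(r, two_n) for r in range(1, horizon + 1))
print(total, mean)
```

The probabilities sum to one and the mean comes out at 2N generations, whatever value of 2N is plugged in.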

19 Coalescence of many lineages. For n lineages, the coalescence rate is $[n(n-1)/2]\,[1/(2N)]$. For 2N = 20, the rate and expected time to the next event are (table columns: lineages, coalescent rate, generations; total 32 generations). When n is large, coalescent events happen quickly; the last event (n = 2) is expected to take at least half of the total time.
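The rates and expected waiting times for 2N = 20 follow directly from the rate formula and can be tabulated with a short script. Starting from n = 5 lineages is an assumption here: it is the sample size for which the expected times add up to the stated total of 32 generations. The function name `coalescent_table` is likewise hypothetical.

```python
from fractions import Fraction

def coalescent_table(n_lineages, two_n):
    """Rows of (lineages k, coalescence rate, expected generations to next event)."""
    rows = []
    for k in range(n_lineages, 1, -1):
        rate = Fraction(k * (k - 1), 2) / two_n  # [k(k-1)/2] * [1/(2N)]
        rows.append((k, rate, 1 / rate))         # expected waiting time = 1 / rate
    return rows

rows = coalescent_table(5, 20)  # n = 5 is an assumed starting sample size
total = sum(time for _, _, time in rows)
for k, rate, time in rows:
    print(k, float(rate), round(float(time), 2))
print("total", float(total))
```

With these inputs the last step (two lineages) takes 20 of the 32 expected generations, in line with the statement that it accounts for at least half of the total time.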

20 Coalescence of many lineages. Expected coalescence times have large variance and tree shapes differ; expected genetic diversity is affected by tree structure.

21 Coalescence and Site frequency spectrum. Vertical branches are evolutionary time and mutations are random, so tree shapes have an expected distribution of allele frequencies; for many loci this is called the site frequency spectrum. With an outgroup, the ancestral and derived alleles can be inferred. Bamshad and Wooding, NRG, 2003

22 Site frequency spectrum, DAF and MAF. SFS: also known as the allele frequency spectrum. If the ancestral state is known: derived allele frequency (DAF); if not: minor allele frequency (MAF), or folded SFS. One of the most widely used summary statistics: at neutral sites it reflects population history; at non-neutral sites it reflects selection pressure.
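As a concrete illustration of the difference between the unfolded (DAF-based) and folded (MAF-based) spectrum, the following sketch counts sites by allele count. The input list of derived-allele counts and the sample size of 6 chromosomes are made-up example data, and the function names are hypothetical.

```python
from collections import Counter

def unfolded_sfs(derived_counts, n_chromosomes):
    """Number of sites with derived-allele count 1 .. n-1 (ancestral state known)."""
    counts = Counter(derived_counts)
    return [counts.get(i, 0) for i in range(1, n_chromosomes)]

def folded_sfs(derived_counts, n_chromosomes):
    """Number of sites with minor-allele count 1 .. n//2 (ancestral state unknown)."""
    minor = Counter(min(c, n_chromosomes - c) for c in derived_counts)
    return [minor.get(i, 0) for i in range(1, n_chromosomes // 2 + 1)]

# Made-up data: derived-allele counts at 8 SNPs in a sample of 6 chromosomes.
derived_counts = [1, 1, 2, 5, 1, 3, 4, 1]
n = 6
print(unfolded_sfs(derived_counts, n))  # [4, 1, 1, 1, 1]
print(folded_sfs(derived_counts, n))    # [5, 2, 1]
```

Folding merges the classes i and n - i, which is why the site with derived count 5 ends up among the singletons of the folded spectrum.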

23 Coalescence with nonconstant population size. Increasing population size: few coalescent events in the large phase. Decreasing population size: many coalescent events in the small phase.
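The effect of the current population size on recent coalescence can be made concrete with the pairwise coalescence probability. The sketch below asks how likely two lineages are to coalesce within the most recent 50 generations when the population is large during that phase (after an increase) versus small (after a decrease). The sizes of 2000 and 50 gene copies and the 50-generation window are arbitrary illustrative values.

```python
def prob_coalesce_within(generations, two_n):
    """Probability that two lineages coalesce within the given number of generations
    in a population of constant size two_n gene copies during that phase."""
    return 1.0 - (1.0 - 1.0 / two_n) ** generations

recent = 50  # length of the most recent phase, in generations (assumed)
after_increase = prob_coalesce_within(recent, two_n=2000)  # recent phase is large
after_decrease = prob_coalesce_within(recent, two_n=50)    # recent phase is small
print(round(after_increase, 3), round(after_decrease, 3))
```

In the large phase only a few percent of lineage pairs coalesce, whereas in the small phase most of them do, which is what reshapes the tree.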

24 Coalescence with nonconstant population size. Population increase and decrease affect the expected tree shape; mutations are random, so tree shape affects the expected SFS. Nielsen and Slatkin, 2013

25 Coalescence and Site frequency spectrum Bamshad and Wooding, NRG, 2003

26 Site frequency spectrum and Demography. Evidence for a bottleneck in human EUR and ASN populations: quick coalescence and deep branches, deficit of low frequencies. Keinan et al, NG, 2007

27 Site frequency spectrum and Selection (figure labels: negative and positive selection; syn sites, 3' UTR, cons 3' UTR, nonsyn sites, cons miRNA). Selection affects fitness and thus AF. Increase in low-frequency alleles due to negative selection. Chen and Rajewsky, NG, 2006

28 Derived allele frequencies and annotation liftover

29 Derived allele frequencies. We will look at the DAF of 1. different populations and 2. different genomic regions. The latter requires annotation that we liftover from threespined