Natural Selection Advanced Topics in Computa8onal Genomics

Size: px
Start display at page:

Download "Natural Selection Advanced Topics in Computa8onal Genomics"

Transcription

1 Natural Selection Advanced Topics in Computa8onal Genomics

2 Natural Selection Compara8ve studies across species O=en focus on protein- coding regions Genes under selec8ve pressure Immune- related func8ons Spermatogenesis Olfac8on and sensory percep8on Studies within species Based on polymorphic loci in genomes

3 Divergence and Selection Gene8c dri=: the stochas8c change in popula8on frequency of a muta8on due to the sampling process that is inherent in reproduc8on. Selec8ve sweep: The process by which new favourable muta8ons become fixed so quickly that physically linked alleles also become either fixed or lost depending on the phase of the linkage Fixa8on: the state where a muta8on has achieved a frequency of 100% in a natural popula8on Hitchhiking of neighboring polymorphisms

4 Complete vs Incomplete Sweep Complete sweep: the favored allele reaches a fixa8on Local varia8on is removed except the ones that arise by muta8on and recombina8on during the selec8ve sweep Incomplete sweep: posi8vely selected alleles are currently on the rise, but have not reached a frequency of 100%

5 Selective Sweep

6 Different Types of Natural Selection Direc8onal selec8on Posi8ve selec8on Reduces gene8c diversity The small region around the selected varia8on is over- represented Hitch- hiking Nega8ve selec8on (purifying selec8on) Can lead to background selec8on, where linked varia8ons are also removed Reduces gene8c diversity, but no skewing in allele frequencies

7 Different Types of Natural Selection Balancing selec8on: a selec8on for the maintenance of two or more alleles at a single locus in a popula8on Heterozygote advantage over homozygotes G6PD muta8on G6PD enzyme deficiency and sickle- cell anemia in homozygote state Par8al protec8on against malaria in the heterozygous state CFTR muta8on Causes cys8c fibrosis in the homozygous state Protects against asthma in the heterozygous state Disrup8ve (diversifying) selec8on: extreme values of the traits are favored than intermediate values

8 Time Scales for the Signatures of Selection

9 1. Proportion of Functional Changes Non- synonymous muta8ons are likely to induce func8onal changes O=en deleterious and Less likely to become common or reach fixa8on Posi8ve selec8on for occasional beneficial muta8on

10 McDonald-Kreitman Test Given sequences for a candidate gene for two species Count synonymous/nonsynonymous subs8tu8ons Count divergent/polymorphic loci Divergence: monomorphic sites that differ in the species Polymorphism: loci that are polymorphic in one or both species Divergence Plymorphism Synonymous D s P s Nonsynonymous (Replacement) D n P n If synonymous sites are at most weakly selected, the excess of divergent nonsynonymous subs8tu8ons must be the result of posi8ve selec8on Under the null hypothesis of no selec8on, D n /P n =D s /P s Perform Χ 2 test

11 McDonald-Kreitman Test

12 McDonald-Kreitman Test Comparison of sequences between species. Focuses on posi8ve selec8on of beneficial muta8ons. Limita8on: mul8ple muta8ons are required to be detected as posi8vely selected.

13 PRM1 Exon 2

14 2. Reduction in Genetic Diversity Hitch- hiking of neutral muta8ons near beneficial muta8ons Selec8ve sweep lowers the gene8c diversity Subsequent muta8ons restores gene8c diversity Rare alleles at low frequency ~ 1 million years for a rare allele to dri= to a high frequency allele

15 Low Diversity and Many Rare Alleles

16 Low Diversity and Many Rare Alleles Speed of posi8ve selec8on Rapid selec8on larger selected region easier detec8on but harder to dis8nguish between hitchhiking and beneficial varia8ons Recombina8on rate Lower recombina8on rate leads to a larger selected region

17 Low Diversity and Many Rare Alleles

18 Tajima s D Statistic Segrega8ng sites (S): the nucleo8de sites that are polymorphic in a sample of individuals Nucleo8de mismatches (π): the nucleo8de sites in the sample that differ between individual pairs of sequences

19 Tajima s D Statistic Under neutral evolu8on E(S) = θ Test if S/a=π where n 1 i=1 1 i a = n 1 i=1 1 i E(π) = θ If the frequency of the polymorphic variants are unequal (skewed allele frequencies with rare alleles), π - S/a will be nega8ve

20 3. High-frequency Derived Alleles Derived vs ancestral alleles Use alleles in closely related species for dis8nc8on Derived alleles typically have lower frequency, but under selec8ve pressure, the frequency rises Many high- frequency derived alleles as a signature for posi8ve selec8on

21 High-frequency Derived Alleles Duffy red cell an8gen in Africans: selec8on for resistance to P. vivax malaria (red: derived, grey: ancestral)

22 Rare vs Derived Alleles Popula8on expansion is a confounder for rare- allele test, but not for derived alleles High- frequency derived alleles rapidly dri= to near fixa8on, thus persists for a shorter period 8me.

23 4. Population Differences Geographically separate popula8ons and different selec8ve pressure on different popula8ons. Rela8vely large differences in allele frequencies between popula8ons as a signature for posi8ve selec8on. Useful only for posi8ve selec8on a=er popula8on differen8a8on (e.g., human migra8on out of Africa 50,000-75,000 years ago) F ST can be used as test sta8s8c

24 Extreme Population Differences FY*O allele for resistance to P. vivax malaria

25 5. Long Haplotypes Alleles under recent posi8ve selec8on, thus liale break- downs by recombina8on long haplotypes with high- frequency alleles This signature persists for a short period of 8me before recombina8on breaks down the chromosome.

26 Long Haplotypes LCT allele for lactase persistence (high frequency ~77% in European popula8ons but long haplotypes)

27 Determining Function of the Candidate Loci What is the associated trait that is being selected, given the candidate locus for selec8on? Examine the annota8ons of the genes Look for associa8ons to phenotypes in a popula8on Use compara8ve genomics e.g., SLC24A5 gene posi8vely selected in human popula8on. Zebrafish homolog of this gene determines pigmenta8on phenotype. It is harder to iden8fy func8on for varia8ons that reached fixa8on in human popula8on e.g., speech- related genes

28 Determining Function of Positively Selected CD36 Haplotype

29 Natural Selection and Diseases Posi8ve selec8on Response to pathogens, environmental condi8ons, diet. Beneficial muta8ons for infec8ous diseases can be selected rapidly Nega8ve selec8on Deleterious muta8on with only small or moderate effects can reach a low- level frequency with rela8ve low selec8ve pressure

30 Difficulties in Detecting Natural Selection Confounding effects of demography Popula8on boaleneck and expansion can leave signatures that look like a posi8ve selec8on Ascertainment bias for SNPs Regions where many sequences were used for ascertainment may appear to have more segrega8ng alleles at low frequencies with more haplotypes. Recombina8on rate Strong signature for selec8on for regions with low recombina8on rates

31 Neandertals and Modern Humans

32 Selective Sweeps in Modern Human Genomes Compared to Neandertals