Neutrality Test


1 Neutrality Test
- The neutral theory was first suggested by Kimura (1968) and King and Jukes (1969)
- The field has since shifted to using neutrality as a null hypothesis in tests for positive selection and selective sweeps
- Positive selection: a new advantageous allele is segregating (and rising in frequency) in a population
- Selective sweep: the reduction of neutral allele diversity at sites linked to a selected locus as the selected allele is fixed

2 Neutrality Test
Neutrality tests allow us to:
- Identify causes of species-specific phenotype differences
- Identify regions currently under selection
- Form hypotheses on function from genome data
Challenges in neutrality tests:
- Extracting the data
- Identifying the loci under selection

3 Neutrality Test
Two main classes of neutrality test:
- Allelic distribution and/or level of variability
- Comparisons of divergence/variability between different mutation classes within a locus
The former relies on major assumptions about population demographics.

4 Single-Locus Test
Ewens Sampling Formula
- Gives the sampling probability of an allele configuration under the infinite alleles model
- Ewens-Watterson Test: compare the observed homozygosity with the homozygosity expected under neutrality; if the observed value is larger than a threshold value, reject the null hypothesis
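The comparison of observed and expected homozygosity can be sketched for very small samples by exact enumeration. This is only an illustration under stated assumptions: it conditions on the sample size n and the number of distinct alleles k, and it uses the standard result that, under the Ewens sampling formula, an ordered set of allele counts (n1, ..., nk) has conditional probability proportional to 1/(n1 * ... * nk). The function names are hypothetical and not taken from the slides.

```python
def compositions(n, k):
    """Yield every ordered way of writing n as a sum of k positive integers."""
    if k == 1:
        yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def homozygosity(counts):
    """Sum of squared allele frequencies for a tuple of allele counts."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def ewens_watterson(counts):
    """Return (observed F, expected F, upper-tail p-value) given n and k.

    Each ordered configuration is weighted by 1 / (n1 * ... * nk); the
    weights are normalised by their sum, so no Stirling numbers are needed.
    """
    n, k = sum(counts), len(counts)
    f_obs = homozygosity(counts)
    total_w = weighted_f = tail_w = 0.0
    for comp in compositions(n, k):
        w = 1.0
        for c in comp:
            w /= c
        f = homozygosity(comp)
        total_w += w
        weighted_f += w * f
        if f >= f_obs - 1e-12:  # configurations at least as homozygous
            tail_w += w
    return f_obs, weighted_f / total_w, tail_w / total_w


if __name__ == "__main__":
    # Hypothetical sample: 4 gene copies, two alleles seen 1 and 3 times.
    print(ewens_watterson([1, 3]))
```

The number of configurations grows quickly with n and k, so a sketch like this is only practical for small samples; larger data sets are usually handled by simulation instead.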

5 Single-Locus Test
Tajima's D-Test
- Uses nucleotide data
- $D = (\hat{\theta}_\pi - \hat{\theta}_W) / S_{\hat{\theta}_\pi - \hat{\theta}_W}$
- D is the scaled difference between two estimates of $\theta = 4N_e\mu$
- $\hat{\theta}_\pi$ is an estimator of $\theta$ based on the average number of pairwise differences
- $\hat{\theta}_W$ is an estimator of $\theta$ based on the number of segregating sites
- $S_{\hat{\theta}_\pi - \hat{\theta}_W}$ is an estimate of the standard error of the difference of the two estimates

6 Single-Locus Test
D-Test
- Significant results are difficult to interpret, because demography and selection can both shift D
- Useful for detecting bottlenecks and population subdivision as well as selective sweeps
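As a rough illustration of the D statistic described above, the sketch below computes the pairwise-difference estimate, the segregating-sites estimate, and D from a small alignment. It assumes equal-length, gap-free sequences, and uses the standard constants from Tajima (1989) for the standard error. The function name and the example alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (theta_pi, theta_w, D) for a list of aligned sequences.

    D is None when it is undefined (no segregating sites, or zero variance).
    """
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal length")

    # Number of segregating sites: columns with more than one state.
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)

    # theta_pi: average number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    theta_pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # theta_w: segregating sites scaled by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1

    # Constants for the variance of (theta_pi - theta_w), Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)

    d = (theta_pi - theta_w) / sqrt(variance) if s > 0 and variance > 0 else None
    return theta_pi, theta_w, d


if __name__ == "__main__":
    # Hypothetical alignment in which every variant is a singleton.
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```

An excess of rare variants, as in this toy alignment, pulls the pairwise estimate below the segregating-sites estimate and makes D negative. A sweep or population growth can both produce that pattern, which is why significant values are hard to interpret.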

7 Multiple-Loci Test
Lewontin-Krakauer test
- Uses data from diallelic loci sampled from multiple populations
- $F = \sigma_p^2 / [\bar{p}(1 - \bar{p})]$
- $\bar{p}$ and $\sigma_p^2$ are the mean and variance of allele frequencies across populations
- If F is too large, the neutral hypothesis is rejected
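The F statistic above is simple enough to compute directly. The sketch below does so for one diallelic locus. It assumes the population (not sample) variance is intended, which the slide does not specify, and it does not implement the significance threshold. The function name and example frequencies are hypothetical.

```python
from statistics import fmean, pvariance


def lewontin_krakauer_f(freqs):
    """F = var(p) / (mean(p) * (1 - mean(p))) for one diallelic locus.

    freqs holds the frequency of one allele in each population.
    Assumption: population variance (divide by the number of populations).
    """
    p_bar = fmean(freqs)
    if not 0.0 < p_bar < 1.0:
        raise ValueError("locus is fixed in all populations; F is undefined")
    var_p = pvariance(freqs, mu=p_bar)
    return var_p / (p_bar * (1.0 - p_bar))


if __name__ == "__main__":
    # Hypothetical allele frequencies in four populations.
    print(lewontin_krakauer_f([0.2, 0.4, 0.6, 0.8]))
```

In practice F would be computed for many loci and the loci with unusually large values flagged. How large is "too large" depends on the demographic model, which is the challenge raised for multi-locus tests.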

8 Multiple-Loci Test
HKA Test
- Variability between and within species is compared for two or more loci
- Assumes that under neutrality:
- The expected number of segregating sites within species and the expected number of fixed differences between species are both proportional to the mutation rate
- The ratio of the two expectations is therefore constant among loci
- Therefore, if the divergence:polymorphism ratio at a locus is too high relative to the other loci, selection is inferred to be at work

9 Multiple-Loci Test
HKA Test
- Challenge: the variance in the number of segregating sites depends heavily on demographics
- Example: immigration from an unknown population
- M = immigration rate
- CV = coefficient of variation (standard deviation divided by mean) of the number of segregating sites

10 Multiple-Loci Test
Assumptions:
- Selection acts on specific target alleles/loci, setting them apart from the other loci
- Selection can be detected if loci differ significantly in how well they fit the neutral model
Challenges:
- The expected value and variance of D depend heavily on the demographic model

11 Multiple-Loci Test Example

12 Comparing Variability in Different Classes of Mutations
McDonald-Kreitman (MK) Type Tests
- Traditionally used to detect and measure adaptive evolution within a species: they test whether adaptive evolution has occurred and estimate the proportion of substitutions that resulted from positive selection.
- In general, the MK test compares the amount of polymorphism within a species with the divergence (substitutions) between species, at neutral and nonneutral (advantageous or deleterious) sites.

13 McDonald-Kreitman cont.
Setting up an MK test: set up a two-way contingency table.

              Synonymous   Nonsynonymous
Fixed         Ds           Dn
Polymorphic   Ps           Pn

Term clarification:
- Synonymous: a point mutation that does not change the encoded amino acid (a silent mutation); often used as the control
- Nonsynonymous: a point mutation that changes the encoded amino acid
- D s : the number of synonymous substitutions per gene
- D n : the number of nonsynonymous substitutions per gene
- P s : the number of synonymous polymorphisms per gene
- P n : the number of nonsynonymous polymorphisms per gene

14 McDonald-Kreitman cont.
- First applied in 1991 to the Adh gene in Drosophila.
- The test provides a way to estimate the proportion of substitutions that were fixed by positive selection rather than by genetic drift.
- Under neutrality, the ratio of ns. to s. variation within a species is expected to equal the ratio of ns. to s. variation between species: D n /D s = P n /P s

15 McDonald-Kreitman cont.
When positive or negative selection influences ns. variation, the ratios will no longer be equal.
- The ratio of ns. to s. between species is lower than the ratio of ns. to s. within species when negative selection is strong and deleterious alleles contribute heavily to polymorphism: D n /D s < P n /P s
- The ratio of ns. to s. within species is lower than the ratio of ns. to s. between species when positive selection is strong: D n /D s > P n /P s. Advantageous mutations do not necessarily contribute to polymorphism, but they do have an effect on divergence.
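The comparison of D n /D s with P n /P s amounts to a test of independence on the 2x2 table. The sketch below uses a two-sided Fisher's exact test, which is one common choice and an assumption here, since the slides do not say which test statistic is used. It also reports the direction of the departure. The function name and the counts in the example are illustrative only.

```python
from math import comb


def mk_test(dn, ds, pn, ps):
    """Two-sided Fisher's exact test on the MK table; returns (p_value, direction).

    direction is a rough label for which ratio is larger, not a proof
    of any particular kind of selection.
    """
    total = dn + ds + pn + ps
    n_fixed = dn + ds      # row total: fixed differences
    n_nonsyn = dn + pn     # column total: nonsynonymous changes
    denom = comb(total, n_fixed)

    def prob(k):
        # Hypergeometric probability of k nonsynonymous fixed differences.
        return comb(n_nonsyn, k) * comb(total - n_nonsyn, n_fixed - k) / denom

    lo = max(0, n_fixed - (total - n_nonsyn))
    hi = min(n_nonsyn, n_fixed)
    p_obs = prob(dn)
    # Sum over all tables that are no more probable than the observed one.
    p_value = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

    # Compare Dn/Ds with Pn/Ps by cross-multiplying to avoid division by zero.
    if dn * ps > pn * ds:
        direction = "Dn/Ds > Pn/Ps"
    elif dn * ps < pn * ds:
        direction = "Dn/Ds < Pn/Ps"
    else:
        direction = "Dn/Ds = Pn/Ps"
    return min(p_value, 1.0), direction


if __name__ == "__main__":
    # Illustrative counts: Dn=7, Ds=17, Pn=2, Ps=42.
    print(mk_test(7, 17, 2, 42))
```

A small p-value only says the two ratios differ; deciding which kind of selection or demographic history produced the difference still requires the kind of caution discussed for MK-type tests.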

16 McDonald-Kreitman cont.
Possible shortcomings of the MK type tests:
- It is not always clear what type of selection is acting upon a gene
- Example: changes in population size combined with weak selection against slightly deleterious mutations may either increase or decrease the number of ns polymorphisms
- An increase in pop size will lead to excessive ns polymorphisms
- Significant results from an MK test therefore cannot be interpreted directly as evidence for positive selection

17 The Genomic Rate of Adaptive Evolution
Additional work with MK tests by Smith and Eyre-Walker:
$\alpha = 1 - (D_s P_n)/(D_n P_s)$
In the above equation, $\alpha$ is the proportion of substitutions driven by positive selection. See research handouts.
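A minimal sketch of the α calculation from the four MK counts follows. The function name is hypothetical and the counts in the example are illustrative, not results from the handouts.

```python
def alpha_mk(dn, ds, pn, ps):
    """Proportion of substitutions attributed to positive selection:
    alpha = 1 - (Ds * Pn) / (Dn * Ps).
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Illustrative counts: Dn=7, Ds=17, Pn=2, Ps=42.
    print(round(alpha_mk(7, 17, 2, 42), 3))
```

When D n /D s = P n /P s the estimate is zero, and it becomes negative when nonsynonymous polymorphism is in excess, so a single-gene value should be read as a noisy estimate and not as a literal proportion.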

18 Tests Based on Allelic Distribution in ns and s Sites
- Some tests examine different types of sites (nonprotein coding sites)
- They look for differences in the allelic distributions (frequency spectra) between s and ns polymorphisms
- Used for genomic data sets in which a large number of polymorphisms can be obtained. Microsat data?
- Nielsen and Weinreich (1999) performed frequency spectra analysis in the human genome
- Differences in the average age of ns and s mutations provided evidence for selection

19 Tests Based on the $d_N/d_S$ Ratio, or $\omega$
The most direct method for showing the presence of positive selection is to demonstrate that the number of ns substitutions per ns site ($d_N$) is much larger than the number of s substitutions per s site ($d_S$).

20 Definitions
$d_N$ (alternatively designated $K_a$) is a measure of the degree to which two homologous coding sequences differ with respect to amino-acid content. Specifically, it indicates the degree to which two sequences differ at ns sites (sites where a substitution changes the aa). $d_N$ is the average number of nucleotide differences between the sequences per ns site.

21 More Definitions
$d_S$ (alternatively designated $K_s$) is a measure of the degree to which two homologous coding sequences differ with respect to silent nucleotide substitutions (substitutions that do not cause an amino-acid substitution). It indicates the degree to which two sequences differ at s sites (sites where a substitution does not change the aa). $d_S$ is the average number of nucleotide differences between sequences per synonymous site.
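Given these two definitions, the ratio ω can be sketched as follows. The example assumes the numbers of ns and s differences and of ns and s sites have already been counted for a pair of sequences. It also applies a Jukes-Cantor correction for multiple hits, which is one conventional choice and not something specified in the slides. All names and numbers are hypothetical.

```python
from math import log


def jukes_cantor(p):
    """Convert a proportion of differing sites into substitutions per site."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def dn_ds(ns_diffs, ns_sites, s_diffs, s_sites):
    """Return (dN, dS, omega) from pre-counted differences and sites.

    Assumption: site and difference counts come from some earlier
    counting step that this sketch does not implement.
    """
    d_n = jukes_cantor(ns_diffs / ns_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        raise ValueError("dS is zero; omega is undefined")
    return d_n, d_s, d_n / d_s


if __name__ == "__main__":
    # Hypothetical counts: 10 differences at 300 ns sites, 20 at 100 s sites.
    print(dn_ds(10, 300, 20, 100))
```

With these made-up counts ω is well below 1, the pattern expected for a gene dominated by purifying selection.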

22 Tests Based on the $d_N/d_S$ Ratio, or $\omega$
- A value of $d_N > d_S$ implies that ns mutations are fixed with a higher probability than neutral ones, due to positive selection.
- Testing $d_N \le d_S$ ($\omega \le 1$) for an entire gene is a very conservative test of neutrality.
- Purifying selection must occur frequently in functional genes to preserve function. Therefore, the average $d_N$ is expected to be much less than the average $d_S$, even if positive selection is occurring at some sites.

23 Differences in MK and $H_0: \omega \le 1$
- Demonstrating $\omega > 1$ is to date the only direct method available for detecting positive selection from DNA sequence data.
- Limitations: these methods assume no recombination, and the effect of strong codon bias on them had not been systematically explored (2001).
**Have the above limitations been investigated yet?