Linkage Disequilibrium Adele Crane & Angela Taravella
Overview: Introduction to linkage disequilibrium (LD); Measuring LD; Genetic and demographic factors shaping LD; Model predictions and expected LD decay; Patterns of LD in human populations; GWAS and fine-scale mapping
What is Linkage Disequilibrium (LD)? Linkage disequilibrium: non-independence of alleles at different sites (Pritchard & Przeworski, 2001). LD exists because of shared ancestry. In the absence of recombination, diversity arises through mutation. Ardlie et al. (2002)
Example: Allele Frequencies at Biallelic Loci
Locus 1 (alleles A and a) and Locus 2 (alleles B and b) are studied for LD.

Allele             A          a          B          b
Allele frequency   $p_A$      $p_a$      $p_B$      $p_b$

Gamete             AB         Ab         aB         ab
Gamete frequency   $p_{AB}$   $p_{Ab}$   $p_{aB}$   $p_{ab}$
Example: Allele Frequencies at Biallelic Loci
Expected gamete frequencies when the loci are in linkage equilibrium (the loci are independent): $p_{AB} = p_A p_B$, $p_{Ab} = p_A p_b$, $p_{aB} = p_a p_B$, $p_{ab} = p_a p_b$. How do we quantify the difference between expected and observed frequencies?
LD Measurements: $D$ and $D'$
Linkage disequilibrium coefficient $D$: $D = p_{AB} - p_A p_B$. $D = 0$ in linkage equilibrium; $D \neq 0$ in linkage disequilibrium. The sign of $D$ depends on how the alleles are labeled.
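The definition of $D$ can be checked with a few lines of Python. This is a minimal sketch: the haplotype frequencies below are hypothetical values chosen for illustration, and the function name `ld_coefficient` is not from the slides.

```python
def ld_coefficient(p_hap, p_first, p_second):
    """D = observed haplotype frequency minus the product of its allele frequencies."""
    return p_hap - p_first * p_second

# Hypothetical gamete frequencies (illustration only); they must sum to 1.
hap = {"AB": 0.5, "Ab": 0.1, "aB": 0.1, "ab": 0.3}

p_A = hap["AB"] + hap["Ab"]  # frequency of allele A
p_a = hap["aB"] + hap["ab"]  # frequency of allele a
p_B = hap["AB"] + hap["aB"]  # frequency of allele B

# D measured on the AB haplotype.
D_AB = ld_coefficient(hap["AB"], p_A, p_B)

# Same data, but tracking the aB haplotype instead: the sign flips.
D_aB = ld_coefficient(hap["aB"], p_a, p_B)

print(round(D_AB, 4), round(D_aB, 4))
```

With these values the magnitude of $D$ is the same under both labelings and only the sign changes, which is why the sign alone carries no biological meaning.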
LD Measurements: $D$ and $D'$
The normalized coefficient $D'$ is a better measure, because $D$ depends on allele frequencies. $D' = |D| / D_{max}$, where $D_{max}$ is the maximum value $|D|$ can take given the allele frequencies. $D' = 1$ if the alleles have not been separated by recombination during the history of the sample analyzed (complete linkage disequilibrium); $D' < 1$ if LD has been disrupted. Weakness: $D'$ values can be inflated by small samples or by low minor-allele frequencies.
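A short sketch of the normalization follows. The slide does not spell out $D_{max}$; the code assumes the standard definition, $\min(p_A p_b, p_a p_B)$ when $D > 0$ and $\min(p_A p_B, p_a p_b)$ when $D < 0$. The function name and the example frequencies are hypothetical.

```python
def d_prime(p_AB, p_A, p_B):
    """Return |D| / D_max for two biallelic loci (assumed standard D_max)."""
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    if d_max == 0:
        # A monomorphic locus leaves D' undefined; 0.0 is a choice made here.
        return 0.0
    return abs(D) / d_max

# Hypothetical sample with only three of the four haplotypes present
# (AB = 0.5, aB = 0.3, ab = 0.2, Ab absent): no recombinant has appeared.
complete = d_prime(p_AB=0.5, p_A=0.5, p_B=0.8)

# Hypothetical sample with all four haplotypes present
# (AB = 0.5, Ab = 0.1, aB = 0.1, ab = 0.3).
partial = d_prime(p_AB=0.5, p_A=0.6, p_B=0.6)

print(round(complete, 4), round(partial, 4))
```

The first sample reaches $D' = 1$ even though the two loci have different allele frequencies, while the second falls below 1 because all four haplotypes are present.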
LD Measurements: $r^2$ ($\Delta^2$)
$r^2 = 1$ if the alleles have not been separated by recombination and have the same allele frequency (perfect linkage disequilibrium). $r^2$ is less inflated by small sample sizes.
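The formula itself did not survive on this slide. The sketch below assumes the standard definition $r^2 = D^2 / (p_A p_a p_B p_b)$ and uses hypothetical haplotype frequencies to contrast perfect LD with a case where no recombinant haplotype exists but the allele frequencies differ.

```python
def r_squared(p_AB, p_A, p_B):
    """r^2 = D^2 / (p_A * p_a * p_B * p_b), assuming both loci are polymorphic."""
    D = p_AB - p_A * p_B
    return D * D / (p_A * (1.0 - p_A) * p_B * (1.0 - p_B))

# Hypothetical perfect LD: only AB (0.6) and ab (0.4) exist, so p_A == p_B.
perfect = r_squared(p_AB=0.6, p_A=0.6, p_B=0.6)

# Hypothetical three-haplotype sample (AB = 0.5, aB = 0.3, ab = 0.2):
# no recombinant haplotype, but the allele frequencies differ (0.5 vs 0.8).
unequal = r_squared(p_AB=0.5, p_A=0.5, p_B=0.8)

print(round(perfect, 4), round(unequal, 4))
```

Only the first sample gives $r^2 = 1$; the second stays well below 1 because the two loci differ in allele frequency, which matches the slide's condition for perfect LD.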
Comparison of $D'$ and $r^2$
Figure (Pritchard & Przeworski, 2001): simulated decay of $|D'|$ (plotted as +) and $r^2$ as a function of genetic distance (cM) under constant population size and random mating. $D'$ and $r^2$ behave differently: a high value of $D'$ can coincide with a low value of $r^2$. $D'$ values show more random variation.
LD parameter $\rho$
$r^2$ is inversely related to $\rho = 4N_e c$, where $N_e$ is the effective population size and $c$ is the recombination rate (which varies over time and across regions). $\rho$ is a scaled recombination rate. Expected $r^2$: $E(r^2) \approx 1/(1+\rho)$; for large $N_e$, $E(r^2) \approx 1/\rho$. LD increases as $\rho$ decreases. Advantages: LD can be compared across studies that use different marker spacing or types of data (SNP vs. microsatellite data), and $\rho$ provides an estimate of the recombination rate per generation.
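The approximation $E(r^2) \approx 1/(1+\rho)$ is easy to tabulate. In this sketch the value $N_e = 10^4$ matches the standard model used later in these slides, while the recombination fractions are hypothetical values picked for illustration.

```python
def expected_r2(n_e, c):
    """Approximate E(r^2) as 1 / (1 + rho), with rho = 4 * N_e * c."""
    rho = 4.0 * n_e * c
    return 1.0 / (1.0 + rho)

N_E = 10_000  # effective population size

# Hypothetical recombination fractions between two markers.
for c in (1e-5, 1e-4, 1e-3):
    rho = 4.0 * N_E * c
    print(f"c = {c:g}  rho = {rho:g}  E(r^2) ~ {expected_r2(N_E, c):.4f}")

close_pair = expected_r2(N_E, 1e-4)  # rho = 4
far_pair = expected_r2(N_E, 1e-3)    # rho = 40
```

As $\rho$ grows, the value approaches $1/\rho$, so doubling either $N_e$ or $c$ roughly halves the expected $r^2$ between distant markers.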
Genetic factors shaping LD
Recombination: hotspots with high recombination break down haplotype blocks, giving low LD. Comparisons of LD in different parts of the genome may not be informative unless local recombination rates are known. Myers hotspot motif.
Mutation: introduces diversity into haplotype blocks (especially in non-recombining regions).
Inversions: suppress recombination, so strong LD can develop.
Gene conversion: can affect short-scale LD; LD may be broken up by gene conversion.
Selection Shaping LD
Hitchhiking effect: a haplotype near a favored variant is swept to high frequency or fixation.
Background selection: loss of diversity at a neutral locus due to negative selection against linked deleterious alleles.
Epistatic selection: epistasis is interaction between genes (e.g., suppression of phenotypic expression). Selection needs to be strong to maintain allelic association over long distances.
Demographic Factors Shaping LD
Inbreeding: mating between related individuals. Decreased diversity can increase LD. Minor effect in humans.
Bottlenecks: a temporary reduction in population size can increase LD. Long-term bottlenecks can sharply reduce $N_e$ and thus raise LD. Populations outside of Africa have higher LD.
Admixture: LD between unlinked sites is seen at the time of admixture. Recent admixture of populations with different allele frequencies increases LD over long ranges. This LD decays rapidly and breaks down with random mating.
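The admixture point can be illustrated numerically. The sketch below mixes two hypothetical source populations, each in linkage equilibrium internally, and then applies the standard random-mating decay $D_t = D_0 (1 - c)^t$, with $c = 0.5$ for unlinked loci. All allele frequencies, the mixing proportion, and the function names are assumptions made for illustration.

```python
def admixture_d(m, pA1, pB1, pA2, pB2):
    """D right after mixing two populations that are each in linkage equilibrium.

    m is the fraction of the mixed population drawn from population 1.
    """
    p_AB = m * pA1 * pB1 + (1 - m) * pA2 * pB2  # haplotypes are inherited intact
    p_A = m * pA1 + (1 - m) * pA2
    p_B = m * pB1 + (1 - m) * pB2
    return p_AB - p_A * p_B

def decay_d(d0, c, generations):
    """D after some generations of random mating: D_t = D_0 * (1 - c)^t."""
    return d0 * (1 - c) ** generations

# Hypothetical source populations with very different allele frequencies.
d0 = admixture_d(m=0.5, pA1=0.9, pB1=0.8, pA2=0.1, pB2=0.2)

# Unlinked loci recombine freely (c = 0.5), so D halves every generation.
d5 = decay_d(d0, c=0.5, generations=5)

print(round(d0, 5), round(d5, 5))
```

Neither source population has any LD of its own, yet the mixture does. For unlinked loci most of it is gone within a handful of generations, whereas tightly linked loci (small $c$) would retain it far longer.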
Demographic Models and Expected LD Decay
Figure (Pritchard & Przeworski, 2001): four panels plotting r hat against genetic distance (cM).
Standard model: panmictic population of constant size ($N_e = 10^4$). Considerable variability is expected.
Kruglyak model: exponential population growth, from $10^4$ to $5 \times 10^9$. Low LD between loci is expected under this model because of the large $N_e$.
1-island sample: all individuals are drawn from the same subpopulation.
2-island sample: individuals are drawn equally from both subpopulations. Population structure tends to increase levels of LD.
Demographic Models and Expected LD Decay
Different growth models (Pritchard & Przeworski, 2001). Neutral model (solid line): population growth reduces LD, but the effect is not as great as under the Kruglyak model. Kruglyak model (long dashed line): an expanding population gives a dramatic reduction in LD. LD decay can be used to make inferences about human demographic history.
Patterns of LD in Human Populations
LD in global populations: LD increases outside of Africa (bottlenecks). Figure: LD decay averaged across populations within each of six geographic regions.
LD in African populations: southern African origin for modern humans (Henn et al., 2011). Figure: the highest correlation coefficient (in blue) indicates the best fit with a potential geographic origin.
GWAS and LD
Genome-wide association study (GWAS): testing cases and controls to identify variants potentially associated with a disease trait. A marker with low $r^2$ to the disease susceptibility mutation gives little power to detect the association at the marker locus, so we want a marker locus in strong LD with the disease susceptibility mutation. This requires a marker density that gives a high probability of strong LD between at least one marker locus and the disease susceptibility mutation.
Issues with finding causative SNPs (gene localization): long-range LD is problematic, and human populations vary in LD and recombination.
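One common way to quantify the power loss is the rule of thumb that detecting an association through a marker needs roughly $1/r^2$ times the sample size that would suffice if the causal variant were typed directly. The sketch below applies that rule of thumb; the baseline sample size and the $r^2$ values are hypothetical, and the function name is not from the slides.

```python
import math

def required_sample_size(n_causal, r2):
    """Approximate sample size needed at a marker with LD r2 to the causal variant.

    Uses the 1 / r^2 rule of thumb; n_causal is the sample size that would be
    enough if the causal variant itself were genotyped.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_causal / r2)

N_CAUSAL = 1000  # hypothetical baseline sample size

strong_ld = required_sample_size(N_CAUSAL, 0.5)   # marker in fairly strong LD
weak_ld = required_sample_size(N_CAUSAL, 0.25)    # marker in weak LD

print(strong_ld, weak_ld)
```

Halving $r^2$ doubles the required sample size under this approximation, which is why marker panels aim for at least one marker in strong LD with every plausible causal site.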
References
Ardlie, K., Kruglyak, L., & Seielstad, M. "Patterns of linkage disequilibrium in the human genome." Nature Reviews Genetics 3 (2002): 299-309. doi:10.1038/nrg777
Henn, B. M., et al. "Hunter-gatherer genomic diversity suggests a southern African origin for modern humans." Proceedings of the National Academy of Sciences 108.13 (2011): 5154-5162.
International HapMap Consortium. "A haplotype map of the human genome." Nature 437 (2005): 1299-1320. doi:10.1038/nature04226
Jallow, M., et al. "Genome-wide and fine-resolution association analysis of malaria in West Africa." Nature Genetics 41.6 (2009): 657-665.
Jobling, M., Hurles, M., & Tyler-Smith, C. Human Evolutionary Genetics: Origins, Peoples & Disease. Garland Science, 2013.
Pritchard, J. K., & Przeworski, M. "Linkage disequilibrium in humans: models and data." The American Journal of Human Genetics 69.1 (2001): 1-14.