Office Hours. We will try to find a time


1 Office Hours We will try to find a time. If you haven't done so yet, please mark times when you are available at: Thanks!

2 Hardy Weinberg Equilibrium Biostatistics 666

3 A model genomewide association study

4 A first-pass genomewide association study

5 Before analyzing any dataset Explore and investigate the data and study design For quantitative traits, we often Compare distributions with prior studies Plot density and histograms Check for outliers For genetic data Compare allele frequencies with prior studies Check missing data rates Check duplicate concordance Check Mendelian inheritance Check Hardy-Weinberg Equilibrium

6 Allele Frequency Notation Alleles are alternative forms of a particular sequence Allele frequency is the proportion of chromosomes of that type in a population For two alleles Frequencies are usually labeled $p$ and $q = 1 - p$ For more than 2 alleles Frequencies are usually labeled $p_A$, $p_B$, $p_C$, ... subscripts A, B and C indicate allele name

7 Genotype The pair of alleles carried by an individual If there are n alternative alleles there will be n(n+1)/2 possible genotypes Homozygous Genotype Genotype where the two alleles are in the same state Heterozygous Genotypes Genotype where the two alleles are in different states
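The count n(n+1)/2 on this slide can be checked by listing unordered pairs of alleles. The sketch below is only an illustration; the helper name `genotypes` and the single-letter allele labels are assumptions, not part of the lecture.

```python
from itertools import combinations_with_replacement


def genotypes(alleles):
    """Return every unordered pair of alleles, i.e. every possible genotype."""
    return ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]


three = genotypes(["A", "B", "C"])
print(three)       # ['AA', 'AB', 'AC', 'BB', 'BC', 'CC']
print(len(three))  # 6, which equals 3 * (3 + 1) / 2
```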

8 Genotype Frequencies Since alleles occur in pairs, these are a useful descriptor of genetic data However, in any non-trivial study we might have a lot of frequencies to estimate... $p_{AA}$, $p_{AB}$, $p_{AC}$, $p_{BB}$, $p_{BC}$, $p_{CC}$

9 The simple part Genotype frequencies lead to allele frequencies For example, for two alleles: $p_A = p_{AA} + \tfrac{1}{2} p_{AB}$ and $p_B = p_{BB} + \tfrac{1}{2} p_{AB}$ Fortunately, the reverse is also possible!
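A minimal sketch of both directions for two alleles. The forward direction is the formula on this slide; the reverse direction assumes Hardy-Weinberg proportions, which the slides go on to justify. The function names `allele_freqs` and `hwe_genotype_freqs` and the example frequencies are made up for illustration.

```python
def allele_freqs(p_AA, p_AB, p_BB):
    """Allele frequencies (p_A, p_B) from two-allele genotype frequencies."""
    if abs(p_AA + p_AB + p_BB - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    # Each heterozygote contributes half of its alleles to A and half to B.
    return p_AA + 0.5 * p_AB, p_BB + 0.5 * p_AB


def hwe_genotype_freqs(p_A):
    """Genotype frequencies (AA, AB, BB) implied by p_A, assuming HWE."""
    p_B = 1.0 - p_A
    return p_A ** 2, 2 * p_A * p_B, p_B ** 2


p_A, p_B = allele_freqs(0.5, 0.2, 0.3)
print(p_A, p_B)
print(hwe_genotype_freqs(p_A))
```

Note that the round trip does not return the starting genotype frequencies unless they were already in Hardy-Weinberg proportions.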

10 Hardy-Weinberg Equilibrium Random union of gametes Relationship described in 1908 Hardy, British mathematician Weinberg, German physician Shows n allele frequencies determine n(n+1)/2 genotype frequencies Large populations, random mating, no natural selection Hardy (1908) Mendelian Proportions in a Mixed Population, Science Weinberg (1908) Über den Nachweis der Vererbung beim Menschen, Jahr. Ver.f. Vaterland Nat. Würz

11 Random Mating: Mating Type Frequencies
Mating     Frequency
AA x AA    $p_{AA}^2$
AA x AB    $2 p_{AA} p_{AB}$
AA x BB    $2 p_{AA} p_{BB}$
AB x AB    $p_{AB}^2$
AB x BB    $2 p_{AB} p_{BB}$
BB x BB    $p_{BB}^2$
Total      1

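12 Random Mating: Offspring Genotype Frequencies
Mating     Frequency            Offspring AA                 Offspring AB                 Offspring BB
AA x AA    $p_{AA}^2$           $p_{AA}^2$                   0                            0
AA x AB    $2 p_{AA} p_{AB}$    $p_{AA} p_{AB}$              $p_{AA} p_{AB}$              0
AA x BB    $2 p_{AA} p_{BB}$    0                            $2 p_{AA} p_{BB}$            0
AB x AB    $p_{AB}^2$           $\tfrac{1}{4} p_{AB}^2$      $\tfrac{1}{2} p_{AB}^2$      $\tfrac{1}{4} p_{AB}^2$
AB x BB    $2 p_{AB} p_{BB}$    0                            $p_{AB} p_{BB}$              $p_{AB} p_{BB}$
BB x BB    $p_{BB}^2$           0                            0                            $p_{BB}^2$
Total      1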

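13 And now (primes denote frequencies in the offspring generation)
$p'_{AA} = p_{AA}^2 + p_{AA} p_{AB} + \tfrac{1}{4} p_{AB}^2 = \left(p_{AA} + \tfrac{1}{2} p_{AB}\right)^2 = p_A^2$
$p'_{BB} = p_{BB}^2 + p_{BB} p_{AB} + \tfrac{1}{4} p_{AB}^2 = \left(p_{BB} + \tfrac{1}{2} p_{AB}\right)^2 = p_B^2$
$p'_{AB} = 2 p_{AA} p_{BB} + p_{AA} p_{AB} + p_{AB} p_{BB} + \tfrac{1}{2} p_{AB}^2 = 2 \left(p_{AA} + \tfrac{1}{2} p_{AB}\right)\left(p_{BB} + \tfrac{1}{2} p_{AB}\right) = 2 p_A p_B$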
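The offspring frequencies derived on this slide can be checked numerically. The sketch below applies them to an arbitrary starting population; the function name `next_generation` and the starting frequencies are illustrative assumptions.

```python
def next_generation(p_AA, p_AB, p_BB):
    """Offspring genotype frequencies after one round of random mating."""
    new_AA = p_AA ** 2 + p_AA * p_AB + 0.25 * p_AB ** 2
    new_AB = 2 * p_AA * p_BB + p_AA * p_AB + p_AB * p_BB + 0.5 * p_AB ** 2
    new_BB = p_BB ** 2 + p_BB * p_AB + 0.25 * p_AB ** 2
    return new_AA, new_AB, new_BB


start = (0.5, 0.2, 0.3)           # not in Hardy-Weinberg proportions
gen1 = next_generation(*start)    # approximately (0.36, 0.48, 0.16)
gen2 = next_generation(*gen1)     # unchanged, up to rounding
print(gen1)
print(gen2)
```

With allele frequency p_A = 0.6, the first generation already matches p_A^2, 2 p_A p_B, p_B^2, and a second round of random mating leaves it unchanged.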

14 Conclusion Genotype frequencies are a function of allele frequencies Equilibrium reached in one generation Independent of initial genotype frequencies Random mating, etc. required Conform to binomial expansion $(p_A + p_B)^2 = p_A^2 + 2 p_A p_B + p_B^2$ Holds in almost all human populations Although a very small excess in the proportion of heterozygotes is expected. Why?

15 Simple HWE Exercise Enables useful inferences for some single gene traits Particularly, when mode of inheritance is simple and known (e.g. dominant, recessive) If the defective alleles of the cystic fibrosis (CFTR) gene have a cumulative frequency of 1/50 what is: The proportion of carriers in the population? (These are individuals with one defective allele) The proportion of affected children at birth? (These are individuals with two defective alleles)
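One way to work the exercise, assuming Hardy-Weinberg proportions and treating all defective alleles as a single allele with frequency q = 1/50. Exact fractions are used so the arithmetic is easy to follow; the variable names are illustrative.

```python
from fractions import Fraction

q = Fraction(1, 50)   # cumulative frequency of defective CFTR alleles
p = 1 - q             # frequency of the normal allele

carriers = 2 * p * q  # heterozygotes: one defective allele
affected = q ** 2     # homozygotes: two defective alleles

print(carriers, float(carriers))  # 49/1250 0.0392
print(affected, float(affected))  # 1/2500 0.0004
```

Under these assumptions roughly 1 person in 25 is a carrier and 1 birth in 2,500 is affected.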

16 Checking Hardy-Weinberg Equilibrium A common first step in any genetic study is to verify that the data conforms to Hardy-Weinberg equilibrium Deviations can occur due to: Systematic errors in genotyping, Unexpected population structure, Presence of homologous regions in the genome, Association with trait in case-control studies. Which of these causes would you expect to increase the proportion of heterozygotes?

17 Testing Hardy-Weinberg Equilibrium Consider a sample of 2N alleles
$n_A$ alleles of type A, with frequency $p_A = n_A / (2N)$
$n_B$ alleles of type B, with frequency $p_B = n_B / (2N)$
$n_{AA}$ genotypes of type AA, with expected count $N p_A^2$
$n_{AB}$ genotypes of type AB, with expected count $2 N p_A p_B$
$n_{BB}$ genotypes of type BB, with expected count $N p_B^2$

18 Simple Approach Calculate allele frequencies and expected counts Construct chi-squared test statistic (1-df for 2 alleles) $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over the genotype classes Convenient, but can be inaccurate, especially when one allele is rare
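A minimal sketch of the simple approach for two alleles, using only the standard library. The function name `hwe_chisq` and the example counts are assumptions. The p-value uses the identity that a 1-df chi-squared variable is a squared standard normal, so its upper tail is `erfc(sqrt(x / 2))`.

```python
import math


def hwe_chisq(n_AA, n_AB, n_BB):
    """Return (chi-squared statistic, 1-df p-value) for a two-allele HWE test."""
    N = n_AA + n_AB + n_BB
    p_A = (2 * n_AA + n_AB) / (2 * N)
    p_B = 1.0 - p_A
    if p_A == 0.0 or p_B == 0.0:
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_AA, n_AB, n_BB)
    expected = (N * p_A ** 2, 2 * N * p_A * p_B, N * p_B ** 2)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2))
    return chisq, p_value


print(hwe_chisq(36, 48, 16))  # counts that match HWE expectations
print(hwe_chisq(50, 0, 50))   # no heterozygotes at all: strong deviation
```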

19 HapMap Project ( )

20 International Effort Genotyping being carried out in: Canada (McGill) China (Beijing, Hong Kong, Shanghai) Japan (Riken) United Kingdom (Sanger) United States (Baylor, Illumina, UCSF/WashU, Broad) International Working Groups: Analysis, QA/QC, Dataflow, Ethical Issues,

21 The Hapmap Samples European Ancestry 90 CEPH family members (CEU, 30 trios) African Ancestry 90 Yoruba from Ibadan, Nigeria (YRI, 30 trios) East Asian 45 Japanese from Tokyo, Japan (JAT) 45 Han Chinese from Beijing, China (HCB)

22 QC Exercise Repeatedly attempted 1,295 markers Each genotyped at 8.8 centers on average 116,536 genotypes Compared each center's repeated attempts to each other and to consensus 99.99% completion rate (14 missing genotypes in consensus) error rate (5 Mendelian inconsistencies in 38,845 consensus nuclear families) Genotype error rate estimates Average data producer had error rate of vs its own repeat experiments Average data producer had error rate of based on Mendelian inconsistencies Average data producer had error rate of vs consensus of all others

23 QC filters Good quality SNPs had to: Have ≤ 1 discrepancy in 5 duplicates Have ≤ 1 Mendelian inconsistency in 60 trios Have <20% missing genotypes Have Hardy-Weinberg p-value > 0.001 Phase I release, in 2005, included about 1 million SNPs Interesting observation: Rare variants failed HWE test whenever one homozygote was observed

24 A Better Approach: Exact Test of Genotypic Proportions Figure out exact probability of each genotype configuration, conditional on allele counts Sum probability of configurations with equal or lesser probability than that observed Approach analogous to Fisher's exact test for contingency tables
$P_{HWE} = \sum_{n^*_{AB}} I\left[ P(N_{AB} = n_{AB} \mid N, n_A) \ge P(N_{AB} = n^*_{AB} \mid N, n_A) \right] \times P(N_{AB} = n^*_{AB} \mid N, n_A)$

25 Probability of $N_{AB}$ heterozygotes, conditional on allele counts $n_A$ and $n_B$ Number of possible arrangements for 2N alleles is $(2N)! / (n_A!\, n_B!)$ Number of possible arrangements with $n_{AB}$ heterozygotes is $2^{n_{AB}} N! / (n_{AA}!\, n_{AB}!\, n_{BB}!)$ The ratio of these two quantities gives $P(N_{AB} = n_{AB} \mid N, n_A)$

26 Probability of $n_{AB}$ Heterozygotes To reconstruct formula, first calculate: Possible rearrangements for 2N alleles Possible rearrangements with $n_{AB}$ heterozygotes
$P(N_{AB} = n_{AB} \mid N, n_A) = \frac{2^{n_{AB}} N!}{n_{AA}!\, n_{AB}!\, n_{BB}!} \times \frac{n_A!\, n_B!}{(2N)!}$
Calculation can be carried out efficiently in recursive fashion
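The ratio of arrangement counts can be evaluated directly for small samples. The sketch below uses exact fractions so that the probabilities over all feasible heterozygote counts sum to exactly 1; the function name `prob_het` and the example counts are assumptions, and direct factorials become slow for large N.

```python
from fractions import Fraction
from math import factorial


def prob_het(n_AB, N, n_A):
    """P(N_AB = n_AB | N, n_A) as the ratio of arrangement counts."""
    n_B = 2 * N - n_A
    # Heterozygote counts must match the parity of n_A and fit both allele counts.
    if n_AB < 0 or n_AB > min(n_A, n_B) or (n_A - n_AB) % 2:
        return Fraction(0)
    n_AA = (n_A - n_AB) // 2
    n_BB = (n_B - n_AB) // 2
    with_het = Fraction(2 ** n_AB * factorial(N),
                        factorial(n_AA) * factorial(n_AB) * factorial(n_BB))
    all_arrangements = Fraction(factorial(2 * N),
                                factorial(n_A) * factorial(n_B))
    return with_het / all_arrangements


# 5 individuals carrying 4 copies of allele A: feasible counts are 0, 2 and 4.
for k in (0, 2, 4):
    print(k, prob_het(k, 5, 4))
```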

27 Recursion for $P(N_{AB} = n_{AB} \mid N, n_A)$
$P(N_{AB} = n_{AB} + 2 \mid N, n_A) = \frac{2^{n_{AB}+2} N!}{(n_{AA} - 1)!\, (n_{AB} + 2)!\, (n_{BB} - 1)!} \times \frac{n_A!\, n_B!}{(2N)!}$
$= \frac{2^2\, n_{AA}\, n_{BB}}{(n_{AB} + 2)(n_{AB} + 1)} \times \frac{2^{n_{AB}} N!}{n_{AA}!\, n_{AB}!\, n_{BB}!} \times \frac{n_A!\, n_B!}{(2N)!}$
$= \frac{4\, n_{AA}\, n_{BB}}{(n_{AB} + 2)(n_{AB} + 1)}\, P(N_{AB} = n_{AB} \mid N, n_A)$
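A minimal sketch of the exact test built on this recursion. It starts from the smallest feasible heterozygote count with an unnormalized weight of 1, steps upward two heterozygotes at a time, and normalizes at the end. This is a simplification for illustration, not the implementation from the recommended reading; the function name `hwe_exact_test` is an assumption, and exact fractions are used to avoid overflow and tie-breaking issues at the cost of speed.

```python
from fractions import Fraction


def hwe_exact_test(n_AA, n_AB, n_BB):
    """Exact HWE p-value: total probability of configurations that are
    no more likely than the observed one, given the allele counts."""
    n_A = 2 * n_AA + n_AB
    n_B = 2 * n_BB + n_AB
    het = n_A % 2                 # smallest feasible heterozygote count
    hom_A = (n_A - het) // 2
    hom_B = (n_B - het) // 2
    weights = {het: Fraction(1)}  # unnormalized probabilities
    while hom_A > 0 and hom_B > 0:
        # P(het + 2) = 4 * n_AA * n_BB / ((het + 2) * (het + 1)) * P(het)
        ratio = Fraction(4 * hom_A * hom_B, (het + 2) * (het + 1))
        weights[het + 2] = weights[het] * ratio
        het += 2
        hom_A -= 1
        hom_B -= 1
    total = sum(weights.values())
    observed = weights[n_AB]
    return float(sum(w for w in weights.values() if w <= observed) / total)


print(hwe_exact_test(2, 0, 3))  # fewest possible heterozygotes
print(hwe_exact_test(1, 2, 2))  # most likely configuration
```

For 5 individuals with 4 copies of allele A, the three feasible configurations have probabilities 10/210, 120/210 and 80/210, so the two calls above return about 0.048 and 1.0 respectively.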

28 Exact Test Observed

29 Type I Error Rate Is Periodic! Exact test in red, χ² test in blue

30 Comparison of Test Statistics

31 Comparison of Type I Error Rates

32 Poor Genotype Calling is the most common real life cause of deviations from HWE Most modern studies exclude markers that fail HWE tests In genomewide association studies, thresholds of p < $10^{-3}$ to $10^{-6}$ are common

33 Example of Good & Bad Genotype Calling

34 More Genotype Calling Examples

35 Summary Hardy Weinberg Equilibrium holds in most human population samples Deviations from HWE can indicate population structure or natural selection, but most often are genotyping artifacts Exact tests for Hardy Weinberg equilibrium provide accurate results and are typically recommended

36 Recommended Reading I Wigginton et al. (2005) A note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet 76:887-93