Conditioning on Known Associations Improves the Power of Association Studies

1 Conditioning on Known Associations Improves the Power of Association Studies Noah Zaitlen Harvard School of Public Health

2 Does Conditioning on Known Associations Improve the Power of Association Studies? It Depends

3 Outline
Conditioning on SNPs in Linkage Disequilibrium
Conditioning on SNPs in Linkage Equilibrium
Naïve Conditioning Improves Continuous Phenotypes
Naïve Conditioning Harms Case Control Phenotypes
Informed Conditioning Always Improves Power

4 Conditioning unmasks SNPs hidden due to LD
[Figure: two SNPs in LD, r = [...]. Unconditioned: χ² = 2 and χ² = 3. Conditioned on 2 and conditioned on 1: χ² = 45 and χ² = 42.]
SNPs in LD with opposite effect directions, or in negative LD, reduce effect size.
WTCCC RA: rs[...] and rs[...] are in negative LD with χ² [...] and 2.5. Conditioning changes this to [...] and 43.3, resulting in a new hit for RA.
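
The masking effect on this slide can be reproduced in a small simulation. The sketch below is an illustration only, not the analysis behind the slide: the sample size, LD strength, effect sizes, the latent-normal way of generating correlated alleles, and the helper names `assoc_chi2` and `simulate_negative_ld` are all assumptions. The statistic is the common approximation of n times the squared (partial) correlation.

```python
import numpy as np

def assoc_chi2(y, x, covariates=None):
    """Approximate 1-df chi-square for SNP x: n times the squared partial correlation."""
    n = len(y)
    design = [np.ones(n)]
    if covariates is not None:
        design.append(covariates)
    C = np.column_stack(design)
    # Residualize phenotype and test SNP on the intercept plus any conditioned SNPs.
    ry = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    rx = x - C @ np.linalg.lstsq(C, x, rcond=None)[0]
    r2 = (rx @ ry) ** 2 / ((rx @ rx) * (ry @ ry))
    return n * r2

def simulate_negative_ld(n=50_000, rho=-0.95, beta=0.1, seed=0):
    """Two SNPs in negative LD, both raising a continuous phenotype (assumed setup)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    # Two haplotypes per individual; an allele is 1 when its latent normal is positive.
    hap_a = rng.multivariate_normal([0.0, 0.0], cov, size=n) > 0
    hap_b = rng.multivariate_normal([0.0, 0.0], cov, size=n) > 0
    geno = hap_a.astype(float) + hap_b.astype(float)  # shape (n, 2), values 0/1/2
    x1, x2 = geno[:, 0], geno[:, 1]
    y = beta * x1 + beta * x2 + rng.normal(0.0, 1.0, size=n)
    return x1, x2, y

x1, x2, y = simulate_negative_ld()
marginal = assoc_chi2(y, x1)
conditional = assoc_chi2(y, x1, covariates=x2)
print(f"SNP 1 unconditioned chi2 = {marginal:.1f}, conditioned on SNP 2 = {conditional:.1f}")
```

Because the two risk alleles rarely occur together, each SNP's marginal signal is partly cancelled by the other; adjusting for one SNP removes that cancellation for the other.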

5 Conditioning is not a test for multiple causal variants
[Figure: unconditioned χ² = 40 and χ² = 37; conditioned on 3 and conditioned on 1: χ² = 38 and χ² = 35; both conditioned on 2: χ² = 1 and χ² = 1]

6 Outline
Conditioning on SNPs in Linkage Disequilibrium
Conditioning on SNPs in Linkage Equilibrium
Naïve Conditioning Improves Continuous Phenotypes
Naïve Conditioning Harms Case Control Phenotypes
Informed Conditioning Always Improves Power

7 Conditioning improves power in continuous data
y = 0.5*x1 + [...]*x2 + N(0, 0.044)
r² = 0.90

8 Conditioning improves power in continuous data
[Figure: Unconditioned vs. Conditioned; r² = [...] and r² = [...]]
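
The reasoning for continuous phenotypes is that conditioning on a strongly associated SNP removes the variance it explains, so a second, independent SNP is tested against less residual noise. A minimal sketch under stated assumptions: the second coefficient (0.02), the noise standard deviation (0.2), the allele frequencies, the sample size, and the helper name `chi2_for_snp` are illustrative choices, not values from the slides.

```python
import numpy as np

def chi2_for_snp(y, snp, conditioned_on=None):
    """Approximate 1-df chi-square: n times the squared (partial) correlation."""
    n = len(y)
    cols = [np.ones(n)]
    if conditioned_on is not None:
        cols.append(conditioned_on)
    C = np.column_stack(cols)
    # Remove the intercept and any conditioned SNP from both phenotype and test SNP.
    ry = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    rs = snp - C @ np.linalg.lstsq(C, snp, rcond=None)[0]
    return n * (rs @ ry) ** 2 / ((rs @ rs) * (ry @ ry))

def simulate_continuous(n=20_000, b1=0.5, b2=0.02, noise_sd=0.2, seed=1):
    """Two SNPs in linkage equilibrium (independent) and a continuous phenotype."""
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(2, 0.5, size=n).astype(float)
    x2 = rng.binomial(2, 0.5, size=n).astype(float)
    y = b1 * x1 + b2 * x2 + rng.normal(0.0, noise_sd, size=n)
    return x1, x2, y

x1, x2, y = simulate_continuous()
print("x2 unconditioned chi2:", round(chi2_for_snp(y, x2), 1))
print("x2 conditioned on x1: ", round(chi2_for_snp(y, x2, conditioned_on=x1), 1))
```

In this setup x1 accounts for most of the phenotypic variance, so the conditioned statistic for x2 is expected to be several times larger than the unconditioned one.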

9 Conditioning Reveals Novel eQTLs
108 YRI individuals
[Figure labels: Genes; tx start; tx end; 1 MB on each side; Largest eQTL; New eQTL After Conditioning On Largest]
Counting SNPs 500 kb away from conditioned SNPs:
χ² > 11: increase from 272 to 381 eQTLs (FDR 1%)
χ² > 13: increase from 39 to 46 eQTLs
(Barbara Stranger, Manolis Dermitzakis, et al., in submission)

10 Outline
Conditioning on SNPs in Linkage Disequilibrium
Conditioning on SNPs in Linkage Equilibrium
Naïve Conditioning Improves Continuous Phenotypes
Naïve Conditioning Harms Case Control Phenotypes
Informed Conditioning Always Improves Power

11 Naïve conditioning decreases power for low prevalence
WTCCC T1D data
Conditioned on 5 SNPs from chromosome 6 explaining ~8% of the phenotypic variance
Compared χ² statistic of naïve and informed to none for SNPs with χ² > 20 for both
Naïve/None: N = 24, mean = 0.78, std = [...]

12 Naïve decreases power for low prevalence: T1D QQ Plot

13 Naïve Conditioning decreases power in case control studies
y = 0.1*x1 + [...]*x2 + N(0, 0.05)

14 Associated variants are correlated in case control studies
y = 0.1*x1 + [...]*x2 + N(0, 0.05)
[Figure panels: Controls; Cases]
Cor(x1, x2)² = [...]
Cor(x2, y)² = 0.74
Cor(x2, y | x1)² = 0.24
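
How ascertainment can correlate variants that are independent in the population is easy to see in simulation. The sketch below assumes a liability-threshold disease model with much larger effects than the slide uses, a 1% prevalence, and allele frequencies of 0.5; the function name `ascertainment_correlations` is hypothetical. It is meant only to show the direction of the effect.

```python
import numpy as np

def ascertainment_correlations(n=1_000_000, prevalence=0.01,
                               beta1=1.0, beta2=1.0, seed=0):
    """Correlation of two independent risk SNPs in the population and among cases."""
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(2, 0.5, size=n)
    x2 = rng.binomial(2, 0.5, size=n)
    # Assumed disease model: additive liability with unit-variance normal noise.
    liability = beta1 * x1 + beta2 * x2 + rng.normal(0.0, 1.0, size=n)
    # Individuals above the (1 - prevalence) quantile of liability are cases.
    threshold = np.quantile(liability, 1.0 - prevalence)
    is_case = liability > threshold
    r_population = np.corrcoef(x1, x2)[0, 1]
    r_cases = np.corrcoef(x1[is_case], x2[is_case])[0, 1]
    return r_population, r_cases

r_population, r_cases = ascertainment_correlations()
print(f"population r = {r_population:.3f}, cases r = {r_cases:.3f}")
```

Among cases, carrying risk alleles at one SNP makes it less necessary to carry them at the other, so the two genotypes become negatively correlated even though they are independent in the population. A test that adjusts for x1 in an ascertained sample therefore also removes part of the signal at x2.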

15 Outline
Conditioning on SNPs in Linkage Disequilibrium
Conditioning on SNPs in Linkage Equilibrium
Naïve Conditioning Improves Continuous Phenotypes
Naïve Conditioning Harms Case Control Phenotypes
Informed Conditioning Always Improves Power

16 Naïve Conditioning
Null: Logistic Regression Likelihood with OR1 ... OR4
Alternate: Logistic Regression Likelihood with OR1 ... OR4 + N
Logit Model
Ignores Prevalence
Ignores Ascertainment

17 Informed Conditioning
Prevalence: F
Null: Multiplicative Relative Risk Likelihood with RR1 ... RR4
Alternate: Multiplicative Relative Risk Likelihood with RR1 ... RR4 + N
Breaks correlation between associated variants
Accounts for Prevalence
Accounts for Ascertainment
Induces Random Ascertainment
Right Thing to Do if using Relative Risk Model
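
The slide does not spell out the informed statistic, so the sketch below does not implement it. It only illustrates one property of the multiplicative relative risk model the slide names: if risk is baseline_risk * RR1^g1 * RR2^g2 and the two SNPs are independent in the population, their genotypes remain exactly independent among cases, while controls pick up a small correlation that grows with prevalence. The relative risks are borrowed from the next slide's example (RR1 = 10, RR2 = 1.2); the baseline risk, allele frequencies, and function names are assumptions.

```python
import numpy as np

def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the risk allele."""
    return np.array([(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2])

def case_control_covariances(maf1, maf2, rr1, rr2, baseline_risk):
    """Exact covariance of two SNP genotypes among cases and among controls."""
    g = np.arange(3)
    # Population joint genotype distribution for two independent SNPs.
    joint = np.outer(genotype_freqs(maf1), genotype_freqs(maf2))
    # Multiplicative relative risk model: risk = baseline * RR1^g1 * RR2^g2.
    risk = baseline_risk * np.outer(float(rr1) ** g, float(rr2) ** g)
    if risk.max() > 1.0:
        raise ValueError("baseline_risk too large for these relative risks")

    def covariance(weights):
        w = weights / weights.sum()
        mean1 = (w.sum(axis=1) * g).sum()
        mean2 = (w.sum(axis=0) * g).sum()
        return (w * np.outer(g, g)).sum() - mean1 * mean2

    return covariance(joint * risk), covariance(joint * (1.0 - risk))

cov_cases, cov_controls = case_control_covariances(
    maf1=0.2, maf2=0.3, rr1=10, rr2=1.2, baseline_risk=0.001)
print(f"cases: {cov_cases:.2e}, controls: {cov_controls:.2e}")
```

The case distribution factorizes because the risk itself is a product of one term per SNP; the control weights, 1 minus that product, do not factorize, which is where prevalence enters.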

18 Informed Conditioning increases power in case control studies
[Figure: Naïve vs. Informed; y-axis: χ² ratio; x-axis: prevalence; N = 6000, RR1 = 10, RR2 = 1.2, RR3 = [...]]

19 Naïve conditioning decreases power for low prevalence
WTCCC T1D data
Conditioned on 5 SNPs from chromosome 6 explaining ~8% of the phenotypic variance
Compared χ² statistic of naïve and informed to none for SNPs with χ² > 20 for both
Naïve/None: N = 24, mean = 0.78, std = [...]
Informed/None: N = 57, mean = 1.01, std = [...]

20 Naïve decreases power for low prevalence: T1D QQ Plot

21 Mid Prevalence WTCCC T2D QQ Plot

22 Naïve Conditioning CAN help mid prevalence with enough controls (taken from Voight et al., Nature Genetics 2010)

23 Mid Prevalence T2D + Controls QQ Plot

24 When To Condition
Case Control
            Low Prevalence   Mid Prevalence   High Prevalence OR Randomly Ascertained Samples (Cross Sectional Studies)
None        YES              NO               NO
Naïve       NO               +CONTROLS        YES
Informed    YES              YES              YES

25 Summary
Conditioning on SNPs in LD is not necessarily a test for multiple variants but can improve power
Naïve Conditioning improves power of continuous phenotypes in cross sectional studies
Naïve Conditioning in case control studies can help or hurt depending on prevalence
Informed Conditioning is a new statistic that outperforms both unconditioned and naïve tests

26 Acknowledgements
Alkes Price, Bogdan Pasaniuc, Hyun Min Kang
Harvard School of Public Health