Genome Wide Association Studies

1 Genome Wide Association Studies Liz Speliotes M.D., Ph.D., M.P.H. Instructor of Medicine and Gastroenterology Massachusetts General Hospital Harvard Medical School Fellow Broad Institute

2 Outline: Introduction to human genetics; HapMap; SNP genotyping; Phenotype; Association; Viewing and reporting results; Imputation; Meta-analysis; Summary

3 Uses of human genetics: Find genetic variants in HUMANS that causally influence HUMAN traits. Potential to help estimate risk of developing disease. Generate hypotheses from these regarding genes that affect processes. Understand the biology underlying traits. Potential targets for therapeutics.

4 Mendelian conditions GENE -> CONDITION Free (dominant) or attached (recessive) earlobes Wet (dominant) or dry (recessive) earwax

5 Most traits are genetic, but complex. Genes (Gene 1, Gene 2, Gene 3, ..., Gene N) and environment (nutrition, environment in utero, etc.) together influence the trait.

6 Advances in human genetics: Categorizing human genetic variation. 99.5% of the genome is identical between individuals; the differences are mostly single nucleotide polymorphisms (SNPs).

7 HapMap: International HapMap Project, 290 individuals of European, African, and Asian ancestries. Mostly common variants categorized; about 2.8 million variants categorized.

8 Linkage disequilibrium [Figure: LD plots for European ancestry and African ancestry; r^2 = 1 shown in black, r^2 = 0 shown in white]
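The r^2 values shaded in LD plots like these can be computed from phased haplotypes. The following is a minimal sketch, assuming two biallelic SNPs and a list of phased haplotypes; the function name `ld_r2` and the example data are hypothetical and not taken from the slides.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, one per
    phased chromosome. Illustrative helper, not from the original deck.
    """
    n = len(haplotypes)
    # Pick the alleles on the first haplotype as the reference alleles.
    ref1, ref2 = haplotypes[0][0], haplotypes[0][1]
    p1 = sum(1 for h in haplotypes if h[0] == ref1) / n
    p2 = sum(1 for h in haplotypes if h[1] == ref2) / n
    p12 = sum(1 for h in haplotypes if h[0] == ref1 and h[1] == ref2) / n
    d = p12 - p1 * p2  # disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Hypothetical data: alleles always travel together (complete LD).
linked = [("A", "G"), ("A", "G"), ("C", "T"), ("C", "T")]
# Hypothetical data: all four allele combinations equally common (no LD).
unlinked = [("A", "G"), ("A", "T"), ("C", "G"), ("C", "T")]
r2_linked = ld_r2(linked)
r2_unlinked = ld_r2(unlinked)
```

In the plot's color scheme, the first pair of SNPs would be drawn black and the second white.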

9 Migration of common ancestors has led to the formation of related but distinguishable populations

10 Linkage disequilibrium [Figure: LD plots for European ancestry and African ancestry; r^2 = 1 shown in black, r^2 = 0 shown in white]

11 Genotyping methods Affymetrix Illumina

12 Content, quality, cost
SNP genotyping robustness:
  Platform     Good DNA     Not-so-good DNA                 Inter-lab consistency
  Affymetrix   Good         More sensitive to problems      OK
  Illumina     Very good    Not as sensitive to problems    Good
SNP content: roughly equal; not as critical with improving methods for imputation
Cost: ~$500 ~$ (varies by product)

13 Calling genotype

14 Quality control. Eliminate: poorly genotyping individuals; poorly genotyping SNPs (call rate below 95% or 99%); SNPs not in Hardy-Weinberg (HW) equilibrium, $p^2 + 2pq + q^2 = 1$, where p = frequency of allele a and q = frequency of allele A, i.e. q = 1 - p.
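One common way to flag SNPs that depart from Hardy-Weinberg equilibrium is a 1-degree-of-freedom chi-square test comparing observed genotype counts with the counts expected from p^2, 2pq, and q^2. The sketch below assumes that test (many pipelines use an exact test instead); the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi2_test(n_hom_a, n_het, n_hom_A):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_hom_a, n_het, n_hom_A: observed counts of aa, aA, and AA genotypes.
    Returns (chi2, p_value). Illustrative helper, not from the slides.
    """
    n = n_hom_a + n_het + n_hom_A
    p = (2 * n_hom_a + n_het) / (2 * n)  # frequency of allele a
    q = 1 - p                            # frequency of allele A
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_a, n_het, n_hom_A)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts that match HW proportions exactly.
chi2_ok, p_ok = hwe_chi2_test(25, 50, 25)
# Hypothetical counts with no heterozygotes at all (strong departure).
chi2_bad, p_bad = hwe_chi2_test(50, 0, 50)
```

A SNP like the second example would typically be removed before association testing, since a missing heterozygote cluster often indicates a genotype-calling problem.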

15 Phenotype. Characterize the phenotype (mean, SD, etc.) and transform it if needed. Think about confounders: age, gender. It may be nice, but is not necessary, to have some previous idea that the trait is heritable (i.e. has a genetic component).

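16 Association. Continuous trait: linear regression. Dichotomous trait: logistic regression. [Figure: A and C alleles at one SNP in lean individuals versus obese individuals; allele frequencies 70% A / 30% C in one group and 15% C in the other; the remaining A frequency and the P value are not recoverable from the transcription]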
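The comparison of allele frequencies between two groups can be illustrated with a simple allelic chi-square test on a 2x2 table of allele counts. This is a sketch of one basic test, not the regression models named on the slide, and the allele counts are hypothetical values chosen for illustration.

```python
import math


def allelic_chi2(a_group1, c_group1, a_group2, c_group2):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Arguments are counts of the A and C alleles in each group.
    Returns (chi2, p_value). Illustrative helper, not from the slides.
    """
    a, b, c, d = a_group1, c_group1, a_group2, c_group2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1000 alleles per group, C frequency 30% versus 15%.
chi2_diff, p_diff = allelic_chi2(700, 300, 850, 150)
# Hypothetical counts: identical allele frequencies in both groups.
chi2_same, p_same = allelic_chi2(700, 300, 700, 300)
```

In practice a logistic regression with covariates is usually preferred for dichotomous traits because it can adjust for confounders such as age, gender, and ancestry.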

17 Linear regression
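For a continuous trait, a single-SNP test regresses the trait on genotype coded as the number of copies of one allele (0, 1, or 2). The sketch below is a minimal ordinary least squares fit without covariates; the additive coding, the function name, and the data are assumptions made for illustration.

```python
import math


def snp_linear_regression(genotypes, trait):
    """Ordinary least squares of trait on allele count (0, 1, 2).

    Returns (beta, intercept, standard_error_of_beta).
    Illustrative helper, not from the slides; no covariates are included.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx                      # per-allele effect on the trait
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    return beta, intercept, se


# Hypothetical data: six individuals, trait rises with allele count.
genotypes = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 1.4, 1.8, 2.1, 2.3]
beta, intercept, se = snp_linear_regression(genotypes, trait)
```

The ratio `beta / se` is the test statistic; a real analysis would add covariates and repeat the fit for every SNP.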

18 Things to watch out for. Multiple hypothesis testing: P value < 5 x 10^-8
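The 5 x 10^-8 threshold is commonly explained as a Bonferroni correction of a 0.05 significance level for roughly one million independent common-variant tests. The sketch below shows that arithmetic; the figure of one million tests is a widely used convention assumed here, not a number stated on the slide.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumption: about 1,000,000 independent tests across the genome.
threshold = bonferroni_threshold(0.05, 1_000_000)
```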

19 Things to watch out for. Multiple hypothesis testing: P value < 5 x 10^-8. Stratification: use EIGENSTRAT/PLINK to get axes of variation across ancestries and correct for them in the regression.

20 Stratification

21 Things to watch out for. Multiple hypothesis testing: P value < 5 x 10^-8. Stratification: use EIGENSTRAT/PLINK to get axes of variation across ancestries and correct for them in the regression. QC problems: association to plate, etc.

22 Manhattan plots (Willer, Speliotes et al Nat Gen 2009)

23 QQ plots. Lambda = median of the observed chi-square statistics / 0.455 (Willer, Speliotes et al Nat Gen 2009)
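The lambda on this slide is the genomic control inflation factor: the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null, about 0.455. A minimal sketch follows; the function name and the input statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom (~0.455).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Inflation factor: median observed chi-square over the null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical statistics whose median equals the null median.
lambda_null = genomic_control_lambda([0.1, 0.4549364, 3.0])
# Hypothetical statistics whose median is twice the null median.
lambda_inflated = genomic_control_lambda([0.2, 0.9098728, 6.0])
```

A lambda close to 1 suggests little inflation; values well above 1 can point to stratification or QC problems, though they can also arise from many true associations in large samples.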

24 Regional plots Willer, Speliotes et al Nat Gen 2009

25 Reporting results Willer, Speliotes et al Nat Gen 2009

26 Implicate genes/pathways in humans SH2B1 MC4R 65% of people 38% of people Willer, Speliotes et al Nat Gen 2009

27 Celebrate!!!

28 No genome-wide association: how best to proceed?

29 Is power the problem?

30 Statistical power. Real effect; samples. Need a large enough sample to pick up effects of a certain size. Possible solution: combine with others and META-ANALYZE.
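The dependence of power on sample size can be sketched with a normal approximation for a single-SNP linear regression on a standardized trait. The sketch assumes an additive model, Hardy-Weinberg proportions, and a small per-allele effect, and it ignores the negligible opposite tail; the function name, effect size, and allele frequency are hypothetical.

```python
import math
from statistics import NormalDist


def gwas_power(n_samples, maf, beta_sd, alpha=5e-8):
    """Approximate power to detect an additive SNP effect.

    maf: minor allele frequency.
    beta_sd: per-allele effect in trait standard deviations.
    Normal approximation; illustrative helper, not from the slides.
    """
    # Fraction of trait variance explained by the SNP.
    var_explained = 2 * maf * (1 - maf) * beta_sd ** 2
    # Expected value of the test's z statistic.
    expected_z = math.sqrt(n_samples * var_explained)
    # Two-sided critical value at the chosen significance level.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(expected_z - z_crit)


# Hypothetical effect: 0.05 SD per allele at 50% allele frequency.
power_10k = gwas_power(10_000, 0.5, 0.05)
power_100k = gwas_power(100_000, 0.5, 0.05)
```

Under these assumptions the same effect is almost never detected with 10,000 samples but almost always detected with 100,000, which is the motivation for combining studies.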

31 Combine across platforms: imputation. Affymetrix, Illumina, imputed. Software: IMPUTE, MACH2QTL.

32 Fixed effects meta analysis Frayling et al Science 2008
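A fixed-effects meta-analysis is typically done by inverse-variance weighting: each study's effect estimate is weighted by one over its squared standard error. The sketch below assumes that standard approach with effects reported on the same scale and for the same allele; the function name and the study values are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas: per-study effect estimates for the same allele.
    ses: per-study standard errors.
    Returns (pooled_beta, pooled_se, two_sided_p).
    Illustrative helper, not from the slides.
    """
    weights = [1 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal P value.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical: two studies with the same estimate and standard error.
pooled_beta, pooled_se, pooled_p = fixed_effects_meta([0.1, 0.1], [0.02, 0.02])
```

With two equally precise studies the pooled standard error shrinks by a factor of the square root of two, so the combined evidence is stronger than either study alone.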

33 Interpreting results. Validity: things that could make data false (QC, stratification, not genome-wide significant). Significance and replication. Generalizability: population heterogeneity, phenotype definition.

34 What do I need to do this? Phenotype data; genotype data; bioinformatician; genetic statistician; computer. Programs are mostly freeware.

35 Uses of human genetics: Find genetic variants in HUMANS that causally influence HUMAN traits. Potential to help estimate risk of developing diseases/traits. Generate hypotheses from these regarding genes that affect processes. Understand the biology underlying traits. Potential targets for therapeutics.

36 Genetics of human traits: Insights into the very essence of who we are and why we do things. Suffering from endogenous susceptibilities. Understand & treat susceptibilities. Improve health and efficiency of health care delivery, and decrease costs.