Genome Wide Association Studies Liz Speliotes M.D., Ph.D., M.P.H. Instructor of Medicine and Gastroenterology Massachusetts General Hospital Harvard Medical School Fellow Broad Institute
Outline Introduction to Human Genetics HapMAP SNP genotyping Phenotype Association Viewing & reporting results Imputation Meta analysis Summary
Uses of human genetics Find genetic variants in HUMANS that causally influence HUMAN traits Potential to help estimate risk of developing disease Generate hypotheses from these regarding genes that affect processes Understanding the biology underlying traits Potential targets for therapeutics
Mendelian conditions GENE -> CONDITION Free (dominant) or attached (recessive) earlobes Wet (dominant) or dry (recessive) earwax
Most traits are genetic, but complex Genes Gene 1 Gene 2 Gene 3... Gene N Trait Environment Nutrition Environment in utero Etc.
Advances in human genetics Categorizing human genetic variation: human genomes are about 99.5% identical; the differences are mostly single nucleotide polymorphisms (SNPs)
HapMap International HapMap Project: 290 individuals of European/African/Asian ancestries Mostly common variants categorized About 2.8 million variants categorized http://www.hapmap.org/
Linkage disequilibrium European Ancestry African Ancestry R2=1 black R2=0 white
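The R2 shading in these plots is the squared correlation between alleles at two SNPs. A minimal sketch of that calculation, assuming phased haplotypes with alleles coded 0/1 (the function name `ld_r2` and the toy data are hypothetical, not from the slides):

```python
from collections import Counter


def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs, one per
    phased chromosome, with alleles coded 0 or 1 (an assumed encoding).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Toy data: SNPs that always travel together vs. SNPs that are independent.
perfect = [(0, 0)] * 6 + [(1, 1)] * 4
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
print(ld_r2(perfect), ld_r2(independent))
```

In the plot's color scheme, the first pair would be drawn black (R2 = 1) and the second white (R2 = 0).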
Migration of common ancestors has led to the formation of related but distinguishable populations
Genotyping methods Affymetrix Illumina
Content, quality, cost: SNP genotyping
Platform     Robustness, good DNA   Robustness, not-so-good DNA    Inter-lab consistency
Affymetrix   Good                   More sensitive to problems     OK
Illumina     Very good              Not as sensitive to problems   Good
SNP content: roughly equal; not as critical with improving methods for imputation
Cost: ~$500; ~$375-500 (varies by product)
Calling genotype
Quality control Eliminate: poorly genotyping individuals; poorly genotyping SNPs (call rate <95% or <99%); SNPs not in Hardy-Weinberg (HW) equilibrium, $p^2 + 2pq + q^2 = 1$, where p = frequency of allele a and q = frequency of allele A, i.e. (1 - p)
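One common way to flag SNPs that depart from HW equilibrium is a 1-degree-of-freedom chi-square test of observed against expected genotype counts. A minimal sketch under that assumption (the function name and example counts are hypothetical; production pipelines often use an exact test instead):

```python
import math


def hwe_chi_square(n_hom_a, n_het, n_hom_A):
    """Chi-square test for Hardy-Weinberg equilibrium at one SNP.

    Arguments are observed genotype counts: a/a, a/A, A/A.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_hom_a + n_het + n_hom_A
    p = (2 * n_hom_a + n_het) / (2 * n)   # frequency of allele a
    q = 1 - p                             # frequency of allele A
    if p == 0 or q == 0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_a, n_het, n_hom_A)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square, written with the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


in_equilibrium = hwe_chi_square(25, 50, 25)   # matches p^2, 2pq, q^2 exactly
no_hets = hwe_chi_square(50, 0, 50)           # no heterozygotes at all
print(in_equilibrium, no_hets)
```

A SNP like the second example, with far fewer heterozygotes than expected, is the kind that usually signals a genotyping problem and gets dropped.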
Phenotype Characterize phenotype: mean, SD, etc.; transform if needed Think about confounders: age, gender It may be nice, but not necessary, to have some previous idea that the trait is heritable (i.e. has a genetic component)
Association Lean individuals vs. obese individuals [Figure: A and C alleles carried by individuals in each group] Continuous trait: linear regression Dichotomous trait: logistic regression Allele frequencies in the two groups: 70% A, 30% C versus 85% A, 15% C; $P < 10^{-20}$
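To make the comparison of allele frequencies concrete, here is a minimal sketch of an allelic 2x2 chi-square test. This is a simpler stand-in for the logistic regression named on the slide (it cannot adjust for covariates), and the sample size of 1000 alleles per group is an assumption chosen only to reproduce the 70%/30% and 85%/15% frequencies, so the resulting p-value will not match the slide's.

```python
import math


def allelic_test(a_group1, c_group1, a_group2, c_group2):
    """Chi-square test on a 2x2 table of allele counts.

    Returns (odds_ratio, chi2, p_value) with 1 degree of freedom.
    """
    a, b, c, d = a_group1, c_group1, a_group2, c_group2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    # Upper tail of a 1-df chi-square distribution.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1000 alleles per group at the slide's frequencies.
odds_ratio, chi2, p_value = allelic_test(700, 300, 850, 150)
print(odds_ratio, chi2, p_value)
```

With larger samples the same frequency difference gives a much smaller p-value, which is why sample size matters so much in these studies.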
Linear regression
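For a continuous trait, the per-SNP test is typically a regression of the trait on the number of copies of one allele (0, 1 or 2). A minimal single-SNP sketch with no covariates, assuming an additive model; the function name and the toy trait values are hypothetical:

```python
import math


def regress_trait_on_dosage(dosages, trait):
    """Ordinary least squares of trait on allele dosage (0, 1 or 2).

    Returns (beta, intercept, standard_error_of_beta).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx                       # trait change per extra allele copy
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of the slope
    return beta, intercept, se


# Toy data: six individuals, trait rises by about 3 units per allele copy.
dosages = [0, 0, 1, 1, 2, 2]
trait = [20.0, 22.0, 23.0, 25.0, 26.0, 28.0]
beta, intercept, se = regress_trait_on_dosage(dosages, trait)
print(beta, intercept, se)
```

In a real analysis, confounders such as age and gender would enter as additional predictors in a multiple regression.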
Things to watch out for Multiple hypothesis testing: P value $< 5 \times 10^{-8}$
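The genome-wide threshold can be read as a Bonferroni correction. The sketch below assumes a significance level of 0.05 and roughly one million independent common-variant tests; that count is the commonly cited figure, not something stated on the slide.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, threshold):
    """True when a SNP's p-value clears the corrected threshold."""
    return p_value < threshold


# Assumed inputs: alpha = 0.05 and about one million independent tests.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(threshold)
print(is_genome_wide_significant(1e-9, threshold))
print(is_genome_wide_significant(1e-6, threshold))
```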
Things to watch out for Multiple hypothesis testing: P value $< 5 \times 10^{-8}$ Stratification: use Eigenstrat/PLINK to get axes of variation across ancestries and correct for them in the regression
Stratification
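The axes of variation used to correct for stratification are principal components of the genotype data. A toy sketch of the idea, using plain power iteration on a small individuals-by-SNPs matrix; this is not how Eigenstrat or PLINK are implemented, and the function name and genotype data are hypothetical:

```python
import math


def first_principal_component(genotypes, n_iter=200):
    """Return PC1 scores (one per individual) via power iteration.

    `genotypes` is a list of rows, one per individual, each a list of
    allele dosages (0, 1 or 2) for the same SNPs.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP column so PC1 reflects differences between individuals.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # n x n similarity matrix between individuals.
    k = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    # Power iteration: multiply a start vector by K and renormalize, repeatedly.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        if norm == 0:
            raise ValueError("no variation between individuals")
        v = [val / norm for val in w]
    return v


# Toy data: three individuals from each of two populations with different
# allele frequencies at four SNPs.
genotypes = [
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],
]
scores = first_principal_component(genotypes)
print(scores)
```

The first three scores share one sign and the last three the other, so PC1 separates the two populations; including such scores as covariates in the association regression is the correction step.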
Things to watch out for Multiple hypothesis testing: P value $< 5 \times 10^{-8}$ Stratification: use Eigenstrat/PLINK to get axes of variation across ancestries and correct for them in the regression QC problems: association to plate, etc.
Manhattan plots Willer, Speliotes et al Nat Gen 2009
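QQ plots Lambda = median of the observed chi-square test statistics / 0.455 (the median of a 1-degree-of-freedom chi-square distribution) Willer, Speliotes et al Nat Gen 2009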
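Lambda, the genomic inflation factor, can be computed directly from a list of association p-values. A minimal sketch, assuming two-sided p-values from 1-degree-of-freedom tests (the function name and the simulated p-values are hypothetical):

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution; the slide rounds this to 0.455.
CHI2_1DF_MEDIAN = 0.4549


def genomic_inflation(p_values):
    """Lambda = median observed chi-square statistic / expected median.

    Each p-value must lie strictly between 0 and 1.
    """
    norm = NormalDist()
    # Convert each two-sided p-value back to a 1-df chi-square statistic.
    chi2 = [norm.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-behaved null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
# Squaring shifts p-values toward zero, mimicking inflated test statistics.
lam_inflated = genomic_inflation([p * p for p in null_p])
print(lam, lam_inflated)
```

A lambda close to 1 suggests the test statistics are well calibrated; values clearly above 1 point to stratification or other systematic problems.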
Regional plots Willer, Speliotes et al Nat Gen 2009
Reporting results Willer, Speliotes et al Nat Gen 2009
Implicate genes/pathways in humans SH2B1 MC4R 65% of people 38% of people Willer, Speliotes et al Nat Gen 2009
Celebrate!!!
No genome wide association How to best proceed
Is power the problem?
Statistical power Real effect Samples Need a large enough sample to pick up effects of a certain size Possible Solution: combine with others and META ANALYZE
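A rough sense of how sample size drives power comes from a standard large-sample approximation for a quantitative trait: the test statistic has non-centrality n * v / (1 - v), where v is the fraction of trait variance explained by the SNP. The sketch below assumes that approximation and the genome-wide threshold of 5e-8; the function name and the example effect size are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n_samples, variance_explained, alpha=5e-8):
    """Approximate power to detect a SNP at significance level alpha.

    Uses a normal approximation to a 1-df test whose non-centrality
    parameter is n * v / (1 - v).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)          # two-sided critical value
    ncp = n_samples * variance_explained / (1 - variance_explained)
    shift = ncp ** 0.5                            # expected z-score under the alternative
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical SNP explaining 0.1% of trait variance, at increasing sample sizes.
for n in (1_000, 10_000, 100_000):
    print(n, gwas_power(n, 0.001))
```

Under these assumptions a single cohort of a thousand people has almost no chance of finding such a variant, which is the motivation for combining cohorts by meta-analysis.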
Combine across platforms: imputation Affymetrix Illumina Imputed IMPUTE MACH2QTL
Fixed effects meta analysis Frayling et al Science 2008
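A fixed-effects meta-analysis is usually computed by inverse-variance weighting: each study's effect estimate is weighted by one over its squared standard error. A minimal sketch under that assumption (the function name and the per-study numbers are hypothetical, and this is not claimed to be the exact procedure used in the cited paper):

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1 / total)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return beta, se, z, p_value


# Two hypothetical studies reporting the same effect with the same precision.
pooled_beta, pooled_se, z, p_value = fixed_effects_meta([0.10, 0.10], [0.05, 0.05])
print(pooled_beta, pooled_se, z, p_value)
```

Pooling leaves the effect estimate unchanged here but shrinks the standard error by a factor of sqrt(2), which is where the gain in power comes from.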
Interpreting results Validity Things that could make data false QC, stratification, not genome wide significant Significance and replication Generalizability Population heterogeneity Phenotype definition
What do I need to do this? Phenotype data Genotype data Bioinformatician Genetic statistician Computer Programs are mostly freeware
Uses of human genetics Find genetic variants in HUMANS that causally influence HUMAN traits Potential to help estimate risk of developing diseases/traits Generate hypotheses from these regarding genes that affect processes Understanding the biology underlying traits Potential targets for therapeutics
Genetics of human traits Insights into the very essence of who we are and why we do things Suffering from endogenous susceptibilities Understand & treat susceptibilities Improve health, efficiency of health care delivery, and decrease costs