Genome Wide Association Studies

Genome Wide Association Studies. Liz Speliotes, M.D., Ph.D., M.P.H.; Instructor of Medicine and Gastroenterology, Massachusetts General Hospital, Harvard Medical School; Fellow, Broad Institute

Outline: Introduction to Human Genetics; HapMap; SNP genotyping; Phenotype; Association; Viewing & reporting results; Imputation; Meta analysis; Summary

Uses of human genetics: find genetic variants in HUMANS that causally influence HUMAN traits; potential to help estimate risk of developing disease; generate hypotheses from these regarding genes that affect processes; understanding the biology underlying traits; potential targets for therapeutics

Mendelian conditions: GENE -> CONDITION. Free (dominant) or attached (recessive) earlobes; wet (dominant) or dry (recessive) earwax

Most traits are genetic, but complex: many genes (Gene 1, Gene 2, Gene 3... Gene N) and the environment (nutrition, environment in utero, etc.) together influence a trait

Advances in human genetics. Categorizing human genetic variation: genomes are 99.5% identical; the differences are mostly single nucleotide polymorphisms (SNPs)

HapMap. International HapMap Project: 290 individuals of European/African/Asian ancestries; mostly common variants categorized; about 2.8 million variants categorized. http://www.hapmap.org/

Linkage disequilibrium. [Figure: LD plots for European ancestry and African ancestry; r2 = 1 shown in black, r2 = 0 in white]
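
The r2 shading in the LD plots can be illustrated with a minimal sketch. It assumes the haplotype frequency and the two allele frequencies are already known; the function name and the example frequencies are hypothetical and not taken from the slides.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele a at SNP 1 and allele b at SNP 2
    p_a, p_b: frequencies of allele a and allele b (hypothetical inputs)
    """
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Alleles that always travel together: r2 close to 1 (black in the plot).
perfect = ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3)
# Alleles that combine at random: r2 close to 0 (white in the plot).
independent = ld_r2(p_ab=0.09, p_a=0.3, p_b=0.3)
```

In practice haplotype frequencies are estimated from phased or genotype data, which this sketch does not attempt.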

Migration of common ancestors has led to the formation of related but distinguishable populations

Genotyping methods: Affymetrix, Illumina

Content, quality, cost: SNP genotyping
Robustness with good DNA: Affymetrix good; Illumina very good
Robustness with not-so-good DNA: Affymetrix more sensitive to problems; Illumina not as sensitive to problems
Inter-lab consistency: Affymetrix OK; Illumina good
SNP content: roughly equal; not as critical with improving methods for imputation
Cost: Affymetrix ~$500; Illumina ~$375-500 (varies by product)

Calling genotype

Quality control. Eliminate: poorly genotyping individuals; poorly genotyping SNPs (<95 or 99%); SNPs not in Hardy-Weinberg (HW) equilibrium, $p^2 + 2pq + q^2 = 1$, where p = frequency of allele a and q = frequency of allele A, or (1 - p)
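
A minimal sketch of the HW equilibrium check, assuming a 1-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts. The slides do not say which test was used (exact tests are also common), and the function name and counts are hypothetical.

```python
import math


def hwe_chi_square(n_hom_a, n_het, n_hom_A):
    """Chi-square test of observed genotype counts against p^2, 2pq, q^2."""
    n = n_hom_a + n_het + n_hom_A
    p = (2 * n_hom_a + n_het) / (2 * n)   # frequency of allele a
    q = 1 - p                             # frequency of allele A
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; HW test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_a, n_het, n_hom_A)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts that match HW proportions exactly (p = 0.6).
chi2_ok, p_ok = hwe_chi_square(360, 480, 160)
# No heterozygotes at all: a strong departure, often a genotyping artifact.
chi2_bad, p_bad = hwe_chi_square(500, 0, 500)
```

A SNP would then be dropped when its p-value falls below a chosen threshold; the threshold itself is a study-specific choice not given in the slides.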

Phenotype. Characterize the phenotype (mean, SD, etc.); transform; think about confounders (age, gender). It may be nice, but is not necessary, to have some previous idea that the trait is heritable (i.e. has a genetic component)

Association. [Figure: alleles at one SNP compared between lean individuals and obese individuals; 70% A / 30% C versus 85% A / 15% C, P < 10^-20]. Continuous trait: linear regression. Dichotomous trait: logistic regression

Linear regression
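
A minimal sketch of a single-SNP linear regression, assuming an additive coding (0, 1 or 2 copies of one allele) and no covariates. The p-value uses a large-sample normal approximation, and the data are made up; a real analysis would add confounders such as age and gender and would normally use dedicated software.

```python
import math
from statistics import mean


def snp_linear_regression(allele_counts, phenotypes):
    """Regress a continuous phenotype on allele count; return (beta, se, p)."""
    n = len(allele_counts)
    x_bar, y_bar = mean(allele_counts), mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in allele_counts)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(allele_counts, phenotypes))
    beta = sxy / sxx                        # effect per copy of the allele
    intercept = y_bar - beta * x_bar
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(allele_counts, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)     # standard error of beta
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return beta, se, p_value


# Hypothetical toy data: allele counts and a BMI-like phenotype.
counts = [0, 0, 1, 1, 2, 2]
bmi = [20.0, 22.0, 23.0, 25.0, 26.0, 28.0]
beta, se, p_value = snp_linear_regression(counts, bmi)
```

In a genome-wide scan the same regression is repeated for every SNP, which is why the significance threshold has to account for the number of tests.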

Things to watch out for: multiple hypothesis testing (P value < 5 x 10^-8); stratification (use Eigenstrat/PLINK to get axes of variation across ancestries and correct for them in the regression)

Stratification

Things to watch out for: multiple hypothesis testing (P value < 5 x 10^-8); stratification (use Eigenstrat/PLINK to get axes of variation across ancestries and correct for them in the regression); QC problems (association to plate, etc.)
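
The 5 x 10^-8 threshold is commonly explained as a Bonferroni correction. The slides do not give the derivation; the worked line below assumes a family-wise error rate of 0.05 and roughly one million independent common-variant tests, which is the conventional rationale.

$$
\alpha_{\text{per test}} = \frac{\alpha_{\text{family}}}{\text{number of tests}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
$$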

Manhattan plots Willer, Speliotes et al Nat Gen 2009

QQ plots. Lambda = median observed chi-square statistic / 0.455. Willer, Speliotes et al Nat Gen 2009
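
A minimal sketch of the lambda calculation, assuming each SNP's p-value comes from a 1-degree-of-freedom test so it can be converted back to a chi-square statistic. The function name and the example p-values are hypothetical.

```python
from statistics import NormalDist, median


def genomic_control_lambda(p_values):
    """Median observed 1-df chi-square divided by its expected median (0.455)."""
    nd = NormalDist()
    # A two-sided p-value maps to z = inv_cdf(p / 2); z squared is the chi-square.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / 0.455


# Evenly spaced p-values mimic a scan with no inflation: lambda is near 1.
n = 1001
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
lambda_null = genomic_control_lambda(null_p)

# Halving every p-value mimics systematic inflation: lambda rises above 1.
lambda_inflated = genomic_control_lambda([p / 2 for p in null_p])
```

A lambda well above 1 points to stratification or QC problems rather than to many true associations.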

Regional plots Willer, Speliotes et al Nat Gen 2009

Reporting results Willer, Speliotes et al Nat Gen 2009

Implicate genes/pathways in humans SH2B1 MC4R 65% of people 38% of people Willer, Speliotes et al Nat Gen 2009

Celebrate!!!

No genome-wide association: how best to proceed?

Is power the problem?

Statistical power (real effect, samples): need a large enough sample to pick up effects of a certain size. Possible solution: combine with others and META ANALYZE
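
A rough sketch of how sample size drives power, assuming a single SNP tested with a 1-degree-of-freedom test on a quantitative trait and a large-sample normal approximation. The function name, the variance-explained figure and the sample sizes are hypothetical, not values from the slides.

```python
from statistics import NormalDist


def gwas_power(n_samples, variance_explained, alpha=5e-8):
    """Approximate power to detect a SNP explaining a given share of trait variance."""
    nd = NormalDist()
    z_crit = -nd.inv_cdf(alpha / 2)   # two-sided critical value
    # Expected size of the test statistic (square root of the non-centrality).
    shift = (n_samples * variance_explained / (1 - variance_explained)) ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# A SNP explaining 0.1% of trait variance (hypothetical effect size).
power_small_study = gwas_power(2_000, 0.001)
power_large_study = gwas_power(50_000, 0.001)
```

With these made-up numbers the small study has almost no chance of reaching genome-wide significance, while the large one usually would, which is the argument for pooling studies.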

Combine across platforms: imputation. Affymetrix and Illumina genotypes -> imputed genotypes. Programs: IMPUTE, MACH2QTL

Fixed effects meta analysis Frayling et al Science 2008
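
A minimal sketch of a fixed effects meta analysis, assuming the inverse-variance weighted form in which each study contributes a per-SNP effect estimate and standard error. The slides do not say which weighting was used, and the function name and numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted pooled effect; returns (beta, se, p)."""
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return beta, se, p_value


# Two hypothetical studies of the same SNP, each short of 5e-8 on its own.
_, _, p_single = fixed_effects_meta([0.10], [0.02])
pooled_beta, pooled_se, p_pooled = fixed_effects_meta([0.10, 0.10], [0.02, 0.02])
```

The pooled estimate keeps the same effect size but has a smaller standard error, so the combined p-value can cross the genome-wide threshold even when neither study does alone.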

Interpreting results. Validity: things that could make data false (QC, stratification, not genome-wide significant); significance and replication. Generalizability: population heterogeneity; phenotype definition

What do I need to do this? Phenotype data; genotype data; bioinformatician; genetic statistician. Computer programs are mostly freeware

Uses of human genetics: find genetic variants in HUMANS that causally influence HUMAN traits; potential to help estimate risk of developing diseases/traits; generate hypotheses from these regarding genes that affect processes; understanding the biology underlying traits; potential targets for therapeutics

Genetics of human traits: insights into the very essence of who we are and why we do things; suffering from endogenous susceptibilities; understand & treat susceptibilities; improve health and the efficiency of health care delivery, and decrease costs