Population structure, heritability, and polygenic risk

Alicia Martin, Daly Lab
October 18, 2016
armartin@broadinstitute.org, @genetisaur

Project goals
- Call local ancestry in a large case/control PTSD cohort of African Americans
- Estimate heritability using local ancestry tracts; compare and contrast this estimate with SNP-based heritability in this cohort and in a European cohort (in progress)
- Perform admixture mapping
- Considerations: transferability of polygenic risk scores, cross-population heritability
(Work with Karestan Koenen, Mark Daly, Laramie Duncan, and Caroline Nievergelt)

Data overview
# | Study | PI | Analyst | N total | N AA | Data label
1 | GTP (Grady Trauma Project) | Kerry Ressler | Lynn Almli | 4752 | 3492 | gt2y
2 | Detroit (DNHS) | Monica Uddin | Guia Guffanti | 812 | 650 | dnhy
3 | Genetics of Substance Dependence | Joel Gelernter | Pingxing Xie | 5451 | 3100 | gsdy
4 | Marine Resilience Study | Caroline Nievergelt / Dewleen Baker | Adam Maihofer | 4036 | 226 | mrsy
5 | Family Study of Cocaine Dependence | Laura Bierut | Louis Fox | 1271 | 653 | fscy
6 | COGEND | Laura Bierut | Louis Fox | 2768 | 711 | cogy
7 | Nurses Health Study | Karestan Koenen | Andrew Ratanatharathorn | 1378 | |
8 | Stein South Africa | Dan Stein / Kerry Ressler | Lynn Almli | 434 | |
9 | Ohio National Guard | Israel Liberzon | Tony King | 239 | |
Summary statistics from imputed data:
10 | Duke | J. Beckham / M. Hauser / A. Ashley-Koch | Melanie Garrett | 1963 | |
11 | National Center for PTSD (Boston) | Mark Miller / Mark Logue | Mark Logue | 652 | |
Total | | | | 23,756 | 8,832 |

Local ancestry calling strategy
1. Merge intersecting genotyped SNPs (N=421,607 with MAF > 0.05)
2. Phase the aggregated dataset with HAPI-UR 3x and take the best combined phase
3. Split jointly phased haplotypes into reference + 50 sets of admixed samples for computational feasibility
4. Aggregate local ancestry calls across all runs
5. Collapse local ancestry output
[Diagram: gt2y + dnhy + gsdy + mrsy + fscy + cogy + YRI + CEU -> (1) AA + reference genos -> (2) AA + reference jointly phased haplotypes -> (3) local ancestry runs 1 through 50 -> (4) combined local ancestry calls -> (5) collapsed bed files, ancestry karyograms, and plink files]
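The final "collapse" step can be illustrated with a minimal sketch. It assumes per-SNP ancestry labels for a single haplotype and merges consecutive SNPs with the same label into tracts, similar in spirit to rows of a bed file. The function name, input layout, and toy values are hypothetical; the slides do not show the actual collapse script.

```python
from itertools import groupby


def collapse_tracts(positions, calls):
    """Collapse per-SNP ancestry calls on one haplotype into contiguous tracts.

    positions: sorted base-pair positions of the SNPs (hypothetical input layout)
    calls:     ancestry label per SNP, same length as positions
    Returns a list of (start_bp, end_bp, ancestry) tuples.
    """
    if len(positions) != len(calls):
        raise ValueError("positions and calls must have the same length")
    tracts = []
    # groupby yields runs of consecutive SNPs that share the same ancestry label
    for ancestry, run in groupby(zip(positions, calls), key=lambda pc: pc[1]):
        run = list(run)
        tracts.append((run[0][0], run[-1][0], ancestry))
    return tracts


# Toy haplotype with made-up positions and labels
positions = [100, 250, 400, 900, 1500, 2100]
calls = ["AFR", "AFR", "EUR", "EUR", "EUR", "AFR"]
tracts = collapse_tracts(positions, calls)
for start, end, ancestry in tracts:
    print(start, end, ancestry)
```

Here each tract ends at its last SNP; a real pipeline would also need a rule for where boundaries fall between SNPs with different calls.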

Heritability estimates
$h^2$ estimate | Kinship matrix | $\hat{h}^2$ | SE | N
$h^2_g$ | REAP | 0.018 | 0.046 | 7548
$h^2_g$ | GCTA GRM | 0.02 | 0.048 | 7248
$h^2_\gamma$ | local ancestry GRM | ?? | |
$h^2_\gamma$ = proportion of phenotypic variation described by variation in local ancestry
$\sigma^2_\gamma$ = phenotypic variation explained by variation in local ancestry
$\sigma^2_e$ = residual phenotypic variance
$$h^2_\gamma = \frac{\sigma^2_\gamma}{\sigma^2_\gamma + \sigma^2_e}$$
$F_{STC}$ = weighted allele frequency differences between ancestral populations at causal loci
$\theta$ = genome-wide ancestry proportions
$$h^2_\gamma = 2 F_{STC}\,\theta (1 - \theta)\,h^2$$
Zaitlen, N., et al. (2014). Nat. Genet. 46, 1356-1362.
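As a worked example of the relation $h^2_\gamma = 2 F_{STC}\,\theta(1-\theta)\,h^2$ from Zaitlen et al. (2014), the sketch below rearranges it to recover $h^2$ from a local-ancestry estimate. The input values are hypothetical and chosen only to show the arithmetic; they are not estimates from this cohort.

```python
def h2_gamma_from_h2(h2, fst_c, theta):
    """Forward relation: h2_gamma = 2 * F_STC * theta * (1 - theta) * h2."""
    return 2.0 * fst_c * theta * (1.0 - theta) * h2


def h2_from_h2_gamma(h2_gamma, fst_c, theta):
    """Rearranged relation: h2 = h2_gamma / (2 * F_STC * theta * (1 - theta))."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be strictly between 0 and 1")
    if fst_c <= 0.0:
        raise ValueError("fst_c must be positive")
    return h2_gamma / (2.0 * fst_c * theta * (1.0 - theta))


# Hypothetical inputs: h2_gamma = 0.01, F_STC = 0.16, theta = 0.8
h2 = h2_from_h2_gamma(h2_gamma=0.01, fst_c=0.16, theta=0.8)
print(round(h2, 4))  # 0.01 / 0.0512, roughly 0.195
```

Because the scaling factor $2 F_{STC}\,\theta(1-\theta)$ is small, a modest $h^2_\gamma$ corresponds to a much larger $h^2$, and the standard error scales up by the same factor.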

1000 Genomes phase 3 populations
Auton, A., et al. (2015). Nature 526, 68-74.

Substantial global genetic diversity in 1000 Genomes
[Figure: panels for K=5, K=6, and K=7. Groups: Europeans, East Asians, Africans, South Asians, Admixed Americas. Population labels: TSI, CDX, KHV, CHS, CHB, JPT, GWD, MSL, YRI, ESN, LWK, STU, GIH, PJL, ITU, BEB, ACB, ASW, PUR, CLM, MXL, PEL, FIN, CEU, GBR, IBS]

Varying admixture proportions across populations in the Americas
[Figure: admixture proportions (0.0 to 1.0) in three panels: Reference panel (NAT, CEU, YRI), African American (ACB, ASW), Hispanic/Latino (PUR, CLM, MXL, PEL)]
African Americans: ACB = African Caribbean in Barbados; ASW = African Ancestry in SW US
Hispanic/Latinos: CLM = Colombians; MXL = Mexicans; PUR = Puerto Ricans; PEL = Peruvians
NAT = Mao et al. (2007). AJHG 80, 1171-1178.

Admixed samples in the Americas

Admixture tracts inform subcontinental-level ancestral populations
[Figure: HG01893 (Peruvian)]
RFMix: Maples, B.K., et al. (2013). AJHG 93, 278-288.

Ancestry-specific PCA provides insight into subcontinental admixture origins
[Figure: PC1 vs PC2. Reference: AFR, EUR, NAT. Admixed: ACB, ASW, CLM, MXL, PEL, PUR]
ASPCA: Moreno-Estrada, A., et al. (2013). PLoS Genetics 9, e1003925.

African Americans have northern European tracts, Hispanics have southern European tracts
[Figure: PC1 vs PC2. Reference: FIN, CEU, GBR, IBS, TSI. Admixed: ACB, ASW, CLM, MXL, PEL, PUR]
ASPCA: Moreno-Estrada, A., et al. (2013). PLoS Genetics 9, e1003925.

African Americans have African tracts closest to Nigerian reference panel
[Figure: PC1 vs PC2. Reference: ESN, GWD, LWK, MSL, YRI. Admixed: ACB, ASW]
ASPCA: Moreno-Estrada, A., et al. (2013). PLoS Genetics 9, e1003925.

Africans have more genetic variation than out-of-Africa populations
[Figure: super populations AFR, AMR, EAS, EUR, SAS]
1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature 526, 68-74.

Biased genetic discoveries
[Figure: two panels, Global population and PGC GWAS (SCZ, BIP, MDD, ADHD). Categories shown: East Asian, Latino, African, Middle Eastern, European, Oceanic, South Asian]

Europeans (and Hispanic/Latinos) are overrepresented in disease databases
1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature 526, 68-74.

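Computing polygenic risk scores from summary statistics
$$X = \sum_{i=1}^{m} g_i \beta_i$$
where $g_i$ is the genotype and $\beta_i$ the effect size at variant $i$.
LD clumping for all variants with MAF $\geq$ 0.01:
- Apply p-value threshold (p = 0.01)
- Thin for LD within window ($R^2$ = 0.5, window = 250kb)
-> (P+T in LDpred paper)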
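The score $X = \sum_i g_i \beta_i$ and the pruning-and-thresholding (P+T) idea can be sketched in a few lines. This is a simplified greedy version under stated assumptions: pairwise $R^2$ values are supplied as a precomputed matrix, the MAF filter is taken as already applied, and all function names and toy values are hypothetical. Dedicated clumping tools differ in detail.

```python
def polygenic_score(genotypes, betas, kept):
    """X = sum over retained variants of g_i * beta_i for one individual."""
    return sum(genotypes[i] * betas[i] for i in kept)


def prune_and_threshold(pvals, positions, r2, p_max=0.01, r2_max=0.5, window=250_000):
    """Greedy P+T: visit SNPs from most to least significant and keep a SNP
    only if no already-kept SNP lies within the window and is in LD with it."""
    candidates = sorted(
        (i for i, p in enumerate(pvals) if p <= p_max), key=lambda i: pvals[i]
    )
    kept = []
    for i in candidates:
        in_ld = any(
            abs(positions[i] - positions[j]) <= window and r2[i][j] >= r2_max
            for j in kept
        )
        if not in_ld:
            kept.append(i)
    return sorted(kept)


# Toy summary statistics for four SNPs (made-up values)
pvals = [1e-6, 1e-4, 0.2, 5e-3]
positions = [1_000, 50_000, 120_000, 900_000]
betas = [0.10, 0.08, -0.05, 0.02]
r2 = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

kept = prune_and_threshold(pvals, positions, r2)
score = polygenic_score([2, 1, 0, 1], betas, kept)
print(kept, score)
```

In the toy data, SNP 1 is dropped because it is in LD with the more significant SNP 0, SNP 2 fails the p-value threshold, and SNP 3 is kept because it lies outside the 250kb window.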

Polygenic risk score for height reflects adaptive event in Europeans and bias
[Figure: density of the European height polygenic risk score by region (N. Europe, S. Europe)]
Wood, A.R., et al. (2014). Nature Genetics 46, 1173-1186.

Polygenic risk score for height reflects adaptive event in Europeans and bias
[Figure: two density panels. European height score by region (N. Europe, S. Europe); Global height score by super population (AFR, AMR, EAS, EUR, SAS)]
Wood, A.R., et al. (2014). Nature Genetics 46, 1173-1186.

Polygenic risk score for Type II diabetes highlights role of demography
[Figure: density of polygenic score by super population (AFR, AMR, EAS, EUR, SAS) in two panels: Global T2D (EUR) score and Global T2D (Multi-ethnic) score]
European: Gaulton, K.J., et al. (2015). Nat. Genet. 47, 1415-1425.
Multi-ethnic: Mahajan, A., et al. (2014). Nat. Genet. 46, 234-244.

Coalescent model for simulation framework
Demographic model: Gravel, S., et al. (2011). Proc. Natl. Acad. Sci. U.S.A. 108, 11983-11988.
msprime: Kelleher, J., Etheridge, A.M., and McVean, G. (2015). PLoS Comput Biol 1-22.

Simulation steps
- Simulate chr20 genotypes ($\mu$ = 2e-8 mutations/(bp*generation)) with the HapMap recombination map for 200k each: Africans, East Asians, Europeans
- Assign true causal effect sizes to $m$ evenly spaced variants as: $\beta \sim N(0, h^2/m)$
- As before, define $X$ as: $X = \sum_{i=1}^{m} g_i \beta_i$
- Normalize: $Z_X = \frac{X - \mu_X}{\sigma_X}$
- Compute true PRS as (such that total variance is $h^2$): $G = \sqrt{h^2}\, Z_X$
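The effect-size and true-PRS steps can be sketched with the standard library alone. The sketch draws $\beta_i \sim N(0, h^2/m)$, forms $X = \sum_i g_i \beta_i$, standardizes it to $Z_X$, and scales by $\sqrt{h^2}$. The toy genotypes are random allele counts, not coalescent simulations, and the function name and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_true_prs(genotypes, h2, rng):
    """genotypes: one list of allele counts (0/1/2) per individual at m causal variants."""
    m = len(genotypes[0])
    # beta_i ~ N(0, h2 / m); random.gauss takes a standard deviation, not a variance
    betas = [rng.gauss(0.0, math.sqrt(h2 / m)) for _ in range(m)]
    x = [sum(g * b for g, b in zip(row, betas)) for row in genotypes]
    mu_x = statistics.fmean(x)
    sigma_x = statistics.pstdev(x)
    z_x = [(xi - mu_x) / sigma_x for xi in x]   # normalized score Z_X
    g_true = [math.sqrt(h2) * z for z in z_x]   # true PRS with variance h2
    return betas, z_x, g_true


rng = random.Random(42)
n, m, h2 = 500, 20, 0.5
# Toy genotypes: two independent allele draws per variant at a random frequency
freqs = [rng.uniform(0.05, 0.95) for _ in range(m)]
genotypes = [[(rng.random() < f) + (rng.random() < f) for f in freqs] for _ in range(n)]

betas, z_x, g_true = simulate_true_prs(genotypes, h2, rng)
print(statistics.pvariance(g_true))  # should be very close to h2
```

Standardizing before scaling is what fixes the variance of the true PRS at $h^2$ regardless of the allele frequencies at the causal variants.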

Simulation steps
- Compute the total liability for each individual ($\epsilon$ is standard normal noise), such that: $T = \sqrt{h^2}\, Z_X + \sqrt{1 - h^2}\, Z_\epsilon$, where $h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_\epsilon}$
- Assuming a 5% prevalence, assign case status to the 10,000 European individuals at the most extreme end of the liability distribution
- Randomly assign control status to 10,000 different European individuals
- Run a simulated GWAS, computing Fisher's exact test for all sites with MAF $\geq$ 0.01
- Clump SNPs into LD blocks for all sites with p $\leq$ 1e-2, $R^2 \geq$ 0.5 in Europeans, and a window size of 250kb
- Compute inferred PRS from summary stats and compare with true PRS
- Evaluate over 50 simulations for $m$ = 200, 500, 1000 and $h^2$ = 0.33, 0.50, 0.67
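The liability-threshold step can be sketched as follows. The code assumes standardized scores $Z_X$ are already available, computes $T = \sqrt{h^2} Z_X + \sqrt{1-h^2}\,\epsilon$, labels the top fraction of liabilities as cases, and samples an equal number of controls from the remaining individuals. The function name and the toy sample size of 2,000 are hypothetical; in the slides, 5% of the simulated Europeans gives 10,000 cases.

```python
import math
import random


def assign_case_control(z_x, h2, prevalence, rng):
    """Return (liability, case indices, control indices) under a liability threshold model."""
    # T = sqrt(h2) * Z_X + sqrt(1 - h2) * eps, with eps drawn as standard normal noise
    liability = [
        math.sqrt(h2) * z + math.sqrt(1.0 - h2) * rng.gauss(0.0, 1.0) for z in z_x
    ]
    n_cases = round(prevalence * len(liability))
    # Rank individuals from highest to lowest liability
    ranked = sorted(range(len(liability)), key=liability.__getitem__, reverse=True)
    cases = ranked[:n_cases]
    # Controls are drawn at random from individuals who are not cases
    controls = rng.sample(ranked[n_cases:], n_cases)
    return liability, cases, controls


rng = random.Random(7)
z_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for standardized true scores
liability, cases, controls = assign_case_control(z_x, h2=0.67, prevalence=0.05, rng=rng)
print(len(cases), len(controls))
```

Drawing controls only from non-cases keeps the two groups disjoint, matching the "different 10,000 individuals" wording of the slide.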

True vs inferred PRS with same causal variants, different effect sizes are inconsistent
$h^2$ = 0.67, $m$ = 1000
[Figure: panels G, H, I]

Best performance in European study population
$h^2$ = 0.67, $m$ = 1000, 50 replicates
[Figure: Pearson's correlation (0.00 to 1.00) by super population: AFR, EAS, EUR, ALL]

http://biorxiv.org/content/early/2016/08/23/070797