Supplementary Figure 1. Quantile-quantile plot for the combined analysis of cohorts 1 and 2.

Supplementary Figure 1. Quantile-quantile plot for the combined analysis of cohorts 1 and 2.
Quantile-quantile plot of the observed -log10(P values) versus the expectation under the null hypothesis. Data are presented for the meta-analysis of cohorts 1 and 2 after imputation and quality control. The overall genomic control inflation factor (λGC) is 1.023, indicating that inflation due to population structure is negligible. SNPs at which the P value is smaller than are represented by triangles at the top of the plot. The gray region represents the 95% concentration band.
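The inflation factor quoted in this caption can be illustrated with a minimal sketch. It assumes the common median-based definition (median observed 1-d.f. chi-square statistic divided by its expected median under the null); the caption does not state which estimator the authors used, and the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    """Median-based genomic control inflation factor (assumed definition).

    Each P value must lie in (0, 1].
    """
    norm = NormalDist()
    # Convert each two-sided P value to a 1-d.f. chi-square statistic:
    # chi2 = z^2, where z is the normal quantile at p / 2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Expected median of a 1-d.f. chi-square distribution (about 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spaced P values mimic the null distribution, so the factor is close to 1.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
null_lambda = genomic_inflation_factor(null_p)
```

A value near 1, such as the 1.023 reported above, suggests little systematic inflation of the test statistics.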

Supplementary Figure 2. Fine-mapping at the FOXO3 locus.
Prognosis GWAS results (combined cohorts) at the FOXO3 locus; adapted from the LocusTrack plot (ref. 1). Top, SNPs in the region with their -log10(P value) plotted against genomic position and colored according to LD with the lead SNP (rs ). Genes in the region are indicated. The expanded plot includes SNPs, genes, and ChIP-seq data from the ENCODE Project (ref. 2). H3K4me1 and H3K27ac data from CD14+ monocytes and p300 binding data from myeloid K562 cells are shown (no monocyte data were available). The transcription factor binding track displays regions of transcription factor binding identified in a large collection of ChIP-seq experiments performed by the ENCODE Project (further details available at

Supplementary Figure 3. Transcription of XACT in a range of human tissues.
RNA sequencing data from the XACT locus in a range of human tissues. Raw data were downloaded and aligned against the hg19 genome using STAR (ref. 3). (a–d) The data sets comprised GEO series GSE45326 (ref. 4; n = 1 per tissue) (a,b) and the Illumina Human BodyMap 2.0 project (ArrayExpress E-MTAB-513, n = 1 per tissue) (c,d). The bar plots in a and c depict FPKM for the human tissues studied. The tables in b and d contain the raw and normalized data for each tissue.
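For readers unfamiliar with the FPKM unit used in these bar plots, the following is a minimal sketch of the standard definition (fragments per kilobase of transcript per million mapped fragments). The caption does not describe the exact normalization pipeline, so the function name and the example numbers are illustrative assumptions only.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million).
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 20_000_000)
```

Because the value is scaled by both transcript length and library size, it allows expression to be compared across tissues sequenced to different depths.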

Supplementary Figure 4. Relationship between association at classical HLA alleles and the frequency with which these alleles occur in non-ancestral MHC 8.1 haplotypes.
Linear regression demonstrating the relationship between the classical HLA alleles that were associated with prognosis and the frequency with which they occur in haplotypes other than the ancestral MHC 8.1 haplotype in Caucasians. Allele frequency and haplotype data were obtained from the National Bone Marrow Donor Program (Six-Locus High Resolution HLA A~C~B~DRB3/4/5~DRB1~DQB1 Haplotype Frequencies). Data were not available for HLA-DQA1. In our data, the frequency with which the lead SNP (rs ) was associated with non-AH8.1 haplotypes was , suggesting that rs is a better tag for AH8.1 than any of its constituent HLA alleles (and explaining the difference in P values between the HLA alleles and the SNP association).

Supplementary Figure 5. The genetic association signals for Crohn's disease prognosis and susceptibility at the MHC region are distinct.
(a) Manhattan plots for 22,125 MHC SNPs that were common to this analysis of Crohn's disease prognosis (top; blue) and a large recent meta-analysis of Crohn's disease susceptibility (5,956 cases, 14,927 controls; ref. 5; bottom; red). (b) Scatterplot directly comparing the association P values between Crohn's disease susceptibility and prognosis at these 22,125 common SNPs. Dotted lines indicate the significance threshold for suggestive association (P < ). No SNPs that showed suggestive association in one analysis (of susceptibility or prognosis) were also suggestively associated in the other.

Supplementary Figure 6. Protein-protein interaction analysis of genes implicated at prognosis-associated loci.
DAPPLE analysis of prognosis-associated SNPs (meta P < ) demonstrating known interactions between proteins at implicated loci. Colored dots represent genes at prognosis-associated loci. Gray dots represent proteins at other nonassociated loci. Gray lines represent known interactions.

Supplementary Figure 7. Relationship between the observed P value and power for each of the 170 Crohn's disease susceptibility variants.
Scatterplot of the statistical power to detect a weak general effect (OR = 1.25) plotted against the observed P value in the prognosis analysis for each of the 170 Crohn's disease susceptibility variants. The line of best fit (dotted line) was calculated by linear regression. Lack of correlation between power and P value is consistent with the null hypothesis that none of the disease susceptibility variants are individually associated with prognosis.

Supplementary Figure 8. Genetic risk scores using the extended Crohn's disease SNP list (P < ).
(a–c) Box-and-whisker plots of weighted genetic risk scores between good- and poor-prognosis Crohn's disease subgroups. (a) L1 (ileal disease, n = 742). (b) L2 (colonic disease, n = 724). (c) L3 (ileocolonic disease, n = 947). Boxes represent the mean and interquartile range. Whiskers represent maximum and minimum values. Genetic risk scores were calculated using an extended list of Crohn's disease-associated SNPs (P < ) and their published values (ref. 6). (d) Distribution of unweighted risk allele counts in the extended list of Crohn's disease SNPs between the good-prognosis and poor-prognosis Crohn's disease subgroups. Purple histogram bars represent the poor-prognosis Crohn's disease subgroup, and yellow histogram bars represent the good-prognosis Crohn's disease subgroup. Statistical significance was assessed using unpaired two-tailed Student's t tests, stratified by disease location; n = 2,413.
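The difference between the weighted scores in panels a to c and the unweighted counts in panel d can be sketched as follows. This assumes the conventional weighting by the natural log of each SNP's published odds ratio; the caption only says "published values", so the weighting scheme, function names, and example numbers are all assumptions for illustration.

```python
import math


def unweighted_risk_score(risk_allele_counts):
    """Total number of risk alleles carried (0, 1 or 2 per SNP)."""
    return sum(risk_allele_counts)


def weighted_risk_score(risk_allele_counts, odds_ratios):
    """Risk allele counts weighted by log odds ratio (assumed weighting)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(count * math.log(odds_ratio)
               for count, odds_ratio in zip(risk_allele_counts, odds_ratios))


# Hypothetical individual genotyped at three SNPs.
counts = [2, 1, 0]
published_ors = [1.2, 1.1, 1.5]
raw_score = unweighted_risk_score(counts)
weighted_score = weighted_risk_score(counts, published_ors)
```

Scores computed this way for each patient could then be compared between prognosis subgroups, for example with the t tests described in the caption.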

Genome-wide association study identifies distinct genetic contributions to prognosis and susceptibility in Crohn's disease
Supplementary Material
Inventory
This file contains the following:
Supplementary Tables 1–6 and 9
Supplementary Note
Supplementary References

Supplementary Table 1. Confirmatory genotyping of 99,126 imputed SNPs
Columns: Chr. | Number of SNPs | Concordance
Rows: Chr. 1 to Chr. 22 (Chr. 21 marked a) | rs | rs | Total
Genotyping was performed using the Illumina Immunochip array in 880 patients unless indicated. Genotyping of rs and rs (which showed association with prognosis) was performed using TaqMan SNP genotyping assays.
a Chromosome 21 was imputed against the 1000 Genomes dataset owing to gaps in the UK10K data.
Chr., chromosome; SNPs, single nucleotide polymorphisms.

Supplementary Table 2. Allelic associations at the XACT locus (rs ) by gender
Columns: Gender | Subgroup | n | Genotypes | RAF | OR | 95% CI | P value (allelic) | P value (heterogeneity)
Rows: Male (Ind. CD / - / ; Agg. CD / - / ) | Female (Ind. CD / 30 / ; Agg. CD / 21 / )
Genotypes are presented as major allele homozygotes / heterozygotes / minor allele homozygotes.
The odds ratio is presented with respect to risk of poor-prognosis disease.
P value (heterogeneity) was assessed using the Breslow-Day test.
The allele frequencies of males and females within the indolent and aggressive CD groups were not significantly different (P = 0.96 and 0.08, respectively; chi-squared test).
RAF, risk allele frequency; OR, odds ratio; 95% CI, 95% confidence interval for OR; Ind., indolent (good-prognosis) CD; Agg., aggressive (poor-prognosis) CD; P value (allelic), P value based on allele counts.
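An allelic odds ratio and its 95% confidence interval, as reported in this table, can be derived from a 2 x 2 table of allele counts. The sketch below assumes the standard Woolf (log-based) interval; the table notes do not state how the interval was computed, and the counts used here are hypothetical.

```python
import math


def allelic_odds_ratio(risk_agg, other_agg, risk_ind, other_ind, z=1.96):
    """Odds ratio for poor prognosis per risk allele, with a Woolf 95% CI.

    Arguments are allele counts: risk and other alleles in the aggressive
    (poor-prognosis) group, then in the indolent (good-prognosis) group.
    """
    odds_ratio = (risk_agg * other_ind) / (other_agg * risk_ind)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / risk_agg + 1 / other_agg
                          + 1 / risk_ind + 1 / other_ind)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not taken from the study.
or_value, ci_lower, ci_upper = allelic_odds_ratio(30, 70, 20, 80)
```

For a locus on the X chromosome such as XACT, males contribute one allele each, which is one reason the table reports the sexes separately and tests for heterogeneity between them.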

Supplementary Table 3. Allelic associations at class I and II HLA genes
Columns: HLA allele | Multigene haplotype | OR | 95% CI | Z value | P value | P value after conditioning on rs
Rows: HLA-B*08:01 (Ancestral MHC) | HLA-DRB1*03:01 (Ancestral MHC) | HLA-DQA1*05:01 (Ancestral MHC) | HLA-DQB1*02:01 (Ancestral MHC) | HLA-C*07:01 (Ancestral MHC) | HLA-DRB1*01:03 (None)
HLA alleles were imputed from SNP data using HLA*IMP:02.
Z statistics and P values were generated by logistic regression.
The odds ratio is presented with respect to risk of poor-prognosis disease.
Bonferroni multiple-testing correction was applied for the number of alleles examined (199). Significance threshold: P = 2.5 × 10^-4.
HLA-DRB1*01:03 (the strongest IBD susceptibility allele) is included for comparison.
OR, odds ratio; 95% CI, 95% confidence interval for OR.
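The Bonferroni threshold quoted in the table notes can be reproduced with a short calculation. The sketch assumes a family-wise significance level of 0.05, which is not stated in the source but is consistent with the reported threshold for 199 alleles; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 across the 199 HLA alleles examined.
hla_threshold = bonferroni_threshold(0.05, 199)
```

The result is approximately 2.5 × 10^-4, so an allele would be called significant only if its P value falls below that figure.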

Supplementary Table 4. Allele frequencies of four prognosis-associated SNPs in Crohn's disease patients by disease location
Columns: SNP | Candidate gene(s) | Ileal (L1, n = 742): MAF (ind.), MAF (agg.), OR | Colonic (L2, n = 724): MAF (ind.), MAF (agg.), OR | Ileocolonic (L3, n = 947): MAF (ind.), MAF (agg.), OR | P value (heterogeneity)
Rows: rs (MHC) | rs (FOXO3) | rs (XACT) | rs (IGFBP1/IGFBP3)
Disease location data were available for 2,413 (88.2%) of samples.
P value (heterogeneity) was assessed using the Breslow-Day test.
The odds ratio is presented with respect to the minor allele and the risk of poor-prognosis (aggressive) disease.
MAF, minor allele frequency; ind., indolent (good-prognosis) CD; agg., aggressive (poor-prognosis) CD; OR, odds ratio.

Supplementary Table 5. Zero-inflated Poisson regression model incorporating disease duration into analysis of treatment escalation rate
Columns: Parameter | Estimate | Standard error | z value | P value
Rows: rs (MHC), x 10^-8 | rs (FOXO3), x 10^-6 | rs (XACT), x 10^-8 | rs (IGFBP1 / IGFBP3), x 10^-4 | Disease duration
Zero-inflated Poisson regression was used to analyse whether allelic variation at the four outcome-associated SNPs was associated with the number of treatment escalations (immunomodulators and abdominal surgeries) independent of disease duration (by incorporating disease duration as a separate term within the regression model). Vuong's closeness test was used to confirm that a zero-inflated Poisson model was the most appropriate model for the data.
Chr., chromosome; SNP, single nucleotide polymorphism; MAF, minor allele frequency; OR, odds ratio.
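To make the model in this table concrete, here is a minimal sketch of the zero-inflated Poisson probability mass function with a log-linear rate that includes disease duration as its own term. The log link, the coefficient names, and all numeric values are assumptions for illustration; the source does not give the fitted parameterization or the software used.

```python
import math


def expected_rate(intercept, beta_snp, beta_duration, allele_count, duration_years):
    """Poisson rate under an assumed log link with a separate duration term."""
    return math.exp(intercept + beta_snp * allele_count
                    + beta_duration * duration_years)


def zip_pmf(k, rate, pi_zero):
    """P(Y = k) for a zero-inflated Poisson count.

    pi_zero is the probability of a structural zero (a patient who would
    never escalate treatment); otherwise counts follow Poisson(rate).
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


# Hypothetical coefficients and patient: one minor allele, 5 years of disease.
rate = expected_rate(-1.0, 0.3, 0.1, allele_count=1, duration_years=5)
prob_no_escalation = zip_pmf(0, rate, pi_zero=0.3)
```

The extra mass at zero is what distinguishes this model from an ordinary Poisson regression, which is the comparison that Vuong's test addresses.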

Supplementary Table 6. Association results at prognosis-associated loci using a different definition of poor-prognosis CD
Columns: Chr. | SNP | Candidate gene(s) | MAF (ind. CD) | OR | 95% CI | P value
Rows: 6, rs (MHC) | X, rs (XACT) | rs (FOXO3) | rs (IGFBP1 / IGFBP3), x 10^-8
Poor-prognosis disease was defined as requirement for abdominal surgery within the first 2 years following diagnosis (n = 763).
The odds ratio is presented with respect to the minor allele and the risk of poor-prognosis CD.
Chr., chromosome; ind., indolent (good-prognosis) CD; SNP, single nucleotide polymorphism; MAF, minor allele frequency; OR, odds ratio; 95% CI, 95% confidence interval.

Supplementary Table 7. SNPsea results for enrichment of prognosis-associated genes in known biological pathways (Gene Ontology)
Data provided in separate .xlsx file

Supplementary Table 8. SNPsea results for enrichment of prognosis-associated genes in primary human cell types
Data provided in separate .xlsx file

Supplementary Table 9. Association statistics of prognosis-associated SNPs in CD case-control analysis
Columns: Chr. | SNP | Candidate genes | MAF cases | MAF controls | P
Rows: 6, rs (MHC) | X, rs a (XACT) | rs (FOXO3) | rs (IGFBP1 / IGFBP3)
Data are from the International IBD Genetics Consortium meta-analysis (ref. 5; 5,956 cases, 14,927 controls).
a Association statistics for rs were obtained from the Wellcome Trust Case Control Consortium CD GWAS (ref. 7), as X chromosome data were not included in the International IBD Genetics Consortium meta-analysis.
Chr., chromosome; SNP, single nucleotide polymorphism; MAF, minor allele frequency.

Supplementary Table 10. Association statistics for 170 CD susceptibility SNPs in GWAS of prognosis
Data provided in separate .xlsx file

Supplemental Note
The UK IBD Genetics Consortium consists of the following people (excluding those already listed as authors): Craig Mowat 1, Cathryn Edwards 2, Hazel Drummond 3, Nick Kennedy 3, Charlie W. Lees 3, Kirstin Taylor 4, Christopher G. Mathew 4, Alison Simmons 5, William G. Newman 6, Christopher Hawkey 7, Ailsa Hart 8, Paul Henderson 9, Richard K. Russell 10, Jeffrey C. Barrett 11.
Affiliations
1 Gastrointestinal Unit, Ninewells Hospital & Medical School, Dundee, UK.
2 Department of Gastroenterology, Torbay Hospital, Torquay, UK.
3 Gastrointestinal Unit, Division of Medical Sciences, School of Molecular and Clinical Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK.
4 Department of Medical and Molecular Genetics, King's College London School of Medicine, 8th Floor Guy's Tower, Guy's Hospital, London, UK.
5 Translational Gastroenterology Unit, Experimental Medicine Division, John Radcliffe Hospital, Headington, Oxford, UK.
6 Department of Medical Genetics, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK.
7 Nottingham Digestive Diseases Centre, Queens Medical Centre, Nottingham, UK.
8 St Mark's Hospital, Harrow, Middlesex, UK.
9 Paediatric Gastroenterology and Nutrition, Child Life and Health, College of Medicine and Veterinary Medicine, University of Edinburgh, Royal Hospital for Sick Children, Edinburgh, UK.
10 Department of Paediatric Gastroenterology, Hepatology and Nutrition, Royal Hospital for Children, Glasgow, UK.
11 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Supplemental References
1 Cuellar-Partida, G., Renteria, M. E. & MacGregor, S. LocusTrack: Integrated visualization of GWAS results and genomic annotation. Source Code Biol Med 10, 1 (2015).
2 The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, (2012).
3 Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, (2013).
4 Nielsen, M. M. et al. Identification of expressed and conserved human noncoding RNAs. RNA 20, (2014).
5 Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat Genet (2015).
6 Wei, Z. et al. Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease. Am J Hum Genet 92, (2013).
7 Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, (2007).