Nature Genetics: doi: /ng Supplementary Figure 1. Eigenvector plots for the three GWAS including subpopulations from the NCI scan.

Supplementary Figure 1. Eigenvector plots for the three GWAS including subpopulations from the NCI scan. The NCI subpopulations are as follows: NITC, Nutrition Intervention Trial Cohort; SHNX, Shanxi Cancer Genetics Study; SING, Singapore Chinese Cohort; SMHS, Shanghai Men's Health Study; SWHS, Shanghai Women's Health Study. Beijing refers to the study by Wu et al., and Henan refers to the study by Wang et al. The plot on the left uses the first and second eigenvectors, whereas the plot on the right uses the second and third eigenvectors. The plots show that the first eigenvector partially separates the subpopulations.

Supplementary Figure 2. QQ plots of observed versus expected P values for the joint genome-wide stage 1 analysis from a logistic regression model adjusted for age, sex, study and two eigenvectors from a joint model to adjust for population stratification. The red dots represent a plot using all SNPs, whereas the green dots represent a plot excluding SNPs (and SNPs within 500 kb of SNPs) previously reported to be associated with ESCC risk. The inflation factor λ for the plot using all SNPs was

Supplementary Figure 3. QQ plots of observed versus expected P values for the Beijing study based on a logistic regression model adjusted for age, sex and three study-specific eigenvectors (first, fifth and seventh) to adjust for population stratification. The red dots represent a plot using all SNPs, whereas the green dots represent a plot excluding SNPs (and SNPs within 500 kb of SNPs) previously reported to be associated with ESCC risk. The inflation factor λ for the plot using all SNPs was

Supplementary Figure 4. QQ plots of observed versus expected P values for the Henan study from a logistic regression model adjusted for age and sex. This study did not require eigenvector adjustment for population stratification because the ESCC risk base model did not reveal statistically significant eigenvectors. The red dots represent a plot using all SNPs, whereas the green dots represent a plot excluding SNPs (and SNPs within 500 kb of SNPs) previously reported to be associated with ESCC risk. The inflation factor λ for the plot using all SNPs was

Supplementary Figure 5. QQ plots of observed versus expected P values for the NCI study based on a logistic regression model adjusted for age, sex and the first study-specific eigenvector to adjust for population stratification. The red dots represent a plot using all SNPs, whereas the green dots represent a plot excluding SNPs (and SNPs within 500 kb of SNPs) previously reported to be associated with ESCC risk. The inflation factor λ for the plot using all SNPs was

Supplementary Figure 6. Association results, recombination and linkage disequilibrium plot for the region at 6p21.32 for the NCI and Henan scans combined (meta), the Henan replication set and the combined estimate from both stages. The region at 6p21.32: 32,508,399–32,689,013 for the HLA class II locus was plotted. Association results from a trend test in −log10 P values (y axis, left; gray diamonds, stage 1 results including the NCI and Henan scans; purple diamonds, Henan replication results; red diamonds, combined results) of the SNPs are shown according to their chromosomal positions (x axis). Linkage disequilibrium structure based on 1000 Genomes Project CHB data (n = 91) was visualized with snp.plotter software. Owing to the hypervariability of the locus, the number of SNPs was pruned so that no two sites were within the same 500-bp window. The line graph shows the likelihood ratio statistics (y axis, right) for recombination hotspot as determined by SequenceLDhot software on the basis of the background recombination rates inferred by PHASE v2.1 using 100 randomly sampled controls from the NCI scan (red line) and Henan scan (blue line) data from stage 1. Physical locations are based on NCBI human genome Build 37. Gene annotation was based on the NCBI RefSeq genes from the UCSC Genome Browser.
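The pruning step described in the caption for Supplementary Figure 6 can be illustrated with a minimal sketch. The caption does not say how windows were defined or which site in a window was retained, so the fixed, non-overlapping 500-bp bins and the "keep the first site" rule below are assumptions, and `prune_by_window` is a hypothetical helper name.

```python
def prune_by_window(positions, window=500):
    """Keep at most one site per fixed window of `window` base pairs.

    Assumption: windows are non-overlapping bins starting at position 0,
    and the lowest-coordinate site in each bin is the one retained.
    """
    kept = []
    seen_bins = set()
    for pos in sorted(positions):
        bin_index = pos // window  # which 500-bp bin this site falls in
        if bin_index not in seen_bins:
            seen_bins.add(bin_index)
            kept.append(pos)
    return kept


# Illustrative coordinates only (not taken from the study data).
example_sites = [32508399, 32508450, 32508600, 32509100]
pruned_sites = prune_by_window(example_sites)
```

Here the second site shares a bin with the first and is dropped, so three sites remain.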

Supplementary Figure 7. QQ plots of observed versus expected P values for the joint analysis from a logistic regression model adjusted for age and sex. The green dots represent results from a model without adjustment for population stratification, whereas the red dots represent results from a model including two eigenvectors to adjust for population stratification. The inflation factor λ was 1.17 before adjustment and 1.01 after adjustment.
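For readers unfamiliar with the inflation factor λ quoted in these captions, the sketch below shows one common way to compute it: the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The supplement does not state which formula was used, so this is an assumption, and the function names are hypothetical. Only the Python standard library is used.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided P value in (0, 1] to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def inflation_factor(p_values):
    """Genomic inflation factor: median observed chi-square over the null median."""
    stats = [p_to_chi2(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN


def expected_p_values(n):
    """Expected P values under the null for the x axis of a QQ plot."""
    return [(i - 0.5) / n for i in range(1, n + 1)]
```

Applied to P values that follow the null distribution, `inflation_factor` returns a value close to 1; values well above 1, such as the 1.17 reported before adjustment, suggest residual stratification or other systematic inflation.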

Supplementary Figure 8. QQ plots of observed versus expected P values for the Beijing study from a logistic regression model adjusted for age and sex. The green dots represent results from a model without adjustment for population stratification, whereas the red dots represent results from a model including three study-specific eigenvectors (first, fifth and seventh) to adjust for population stratification. The inflation factor λ was 1.38 before adjustment and 1.00 after adjustment. This adjustment accounts for differences in estimates for SNPs rs , rs and rs , which had P values of , and before adjustment and , and after adjustment.

Supplementary Figure 9. QQ plots of observed versus expected P values for the NCI study from a logistic regression model adjusted for age and sex. The green dots represent results from a model without adjustment for population stratification, whereas the red dots represent results from a model including one study-specific eigenvector to adjust for population stratification. The inflation factor λ was before adjustment and 1.01 after adjustment.

Supplementary Table 1. Characteristics of ESCC cases and controls used in the study*
Columns: Study/phase; Controls: N, Age mean (s.d.), Sex male (%); Cases: N, Age mean (s.d.), Sex male (%)
NCI scan (9.56) (8.74)
Henan scan (14.91) (9.60)
Beijing scan (8.51) (9.74)
Henan replication (12.67) (9.00)
Beijing replication (14.17) (9.07)
* Nine subjects from the NCI and Henan scans were the same individuals and they were used only once in the joint analysis

Supplementary Table 2. Summary of SNPs prior to imputation
GWAS scan NCI Beijing Henan Combined
PreQC (1) 556, , ,865
QC excludes 95,481 7, ,425
PostQC/PreImputation 461, , ,440
PostImputation (2) 40,497,507 40,499,458 40,497,507
Merging (3) 40,373,742
QC excludes (4) 32,817,527
Final number of SNPs 7,556,215
Notes: (1) Overlap in the number of SNPs among all three GWAS scans after QC is 130,787; (2) Overlap in the number of SNPs after imputation is 40,497,507; (3) Incompatible SNPs among sets were automatically excluded; (4) Excluded SNPs with INFO < 0.3 or MAF < 0.01 in study control population
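Note (4) of Supplementary Table 2 describes a post-imputation filter, and the last three rows of the table are related by simple subtraction. The sketch below illustrates both under stated assumptions: the `Snp` record and `passes_qc` helper are hypothetical names, and the example record is a placeholder rather than study data. Only the thresholds and the three counts come from the table.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical record for one imputed SNP."""
    name: str
    info: float         # imputation quality (INFO) score
    control_maf: float  # minor allele frequency in study controls


def passes_qc(snp, min_info=0.3, min_maf=0.01):
    """Keep a SNP unless INFO < 0.3 or control MAF < 0.01 (note 4 of the table)."""
    return snp.info >= min_info and snp.control_maf >= min_maf


# Counts reported in Supplementary Table 2.
merged_snps = 40_373_742
qc_excluded = 32_817_527
final_snps = merged_snps - qc_excluded

# The difference reproduces the "Final number of SNPs" row.
assert final_snps == 7_556_215

# Placeholder SNP for illustration only.
example = Snp(name="placeholder_snp", info=0.25, control_maf=0.12)
example_kept = passes_qc(example)
```

The placeholder SNP is excluded because its INFO score falls below 0.3, even though its allele frequency is acceptable.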

Supplementary Table 3. Replication data testing for 14 SNPs that passed Stage 1
SNP CHR LOC GROUP CATEGORY* INFO NUM_CONTROL NUM_CASE REFERENCE_ALLELE EFFECT_ALLELE EFFECT_ALLELE_FREQ_CONTROL EFFECT_ALLELE_FREQ_CASE OR CI P P heterog.
rs chr scan analysis (Stage 1) i,i,i C G ( ) 4.07E-06
rs chr Henan Replication g C G ( ) 2.70E-11
rs chr Beijing Replication g C G ( ) 2.92E-06
rs chr Combined ( ) 7.72E E-01
rs chr scan analysis (Stage 1) i,i,i C T ( ) 3.91E-08
rs chr Henan Replication g C T ( ) 2.67E-02
rs chr Beijing Replication g C T ( ) 1.55E-06
rs chr Combined ( ) 3.10E E-02
rs chr scan analysis (Stage 1) i,i,i G A ( ) 6.17E-07
rs chr Henan Replication g G A ( ) 1.00E-03
rs chr Beijing Replication g G A ( ) 3.59E-01
rs chr Combined ( ) 1.18E E-02
rs chr scan analysis (Stage 1) i,i,i T A ( ) 5.18E-08
rs chr Henan Replication g T A ( ) 2.36E-02
rs chr Beijing Replication g T A ( ) 4.50E-01
rs chr Combined ( ) 7.30E E-03
rs chr scan analysis (Stage 1) i,i,i C A ( ) 1.14E-05
rs chr Henan Replication g C A ( ) 3.82E-01
rs chr Beijing Replication g C A ( ) 2.42E-02
rs chr Combined ( ) 8.48E E-02
rs chr scan analysis (Stage 1) g,i,i A G ( ) 1.00E-04
rs chr Henan Replication g A G ( ) 1.44E-02
rs chr Beijing Replication g A G ( ) 2.86E-01
rs chr Combined ( ) 1.69E E-01
rs chr scan analysis (Stage 1) g,g,g C T ( ) 3.59E-06
rs chr Henan Replication g C T ( ) 5.85E-02
rs chr Beijing Replication g C T ( ) 7.04E-01
rs chr Combined ( ) 5.65E E-02
rs chr scan analysis (Stage 1) i,i,i G A ( ) 4.30E-06
rs chr Beijing Replication g G A ( ) 1.89E-01
rs chr scan analysis (Stage 1)+Beijing Replication ( ) 2.88E E-02
rs chr scan analysis (Stage 1) g,g,g G A ( ) 2.52E-06
rs chr Henan Replication g G A ( ) 7.49E-01
rs chr Beijing Replication g G A ( ) 1.87E-01
rs chr Combined ( ) 1.65E E-03
rs27209 chr scan analysis (Stage 1) i,i,i G A ( ) 4.33E-06
rs27209 chr Beijing Replication g G A ( ) 5.63E-01
rs27209 chr scan analysis (Stage 1)+Beijing Replication ( ) 3.70E E-04
rs chr scan analysis (Stage 1) g,i,i T C ( ) 6.12E-07
rs chr Henan Replication g T C ( ) 5.32E-01
rs chr Beijing Replication g T C ( ) 6.50E-01
rs chr Combined ( ) 1.64E E-05
rs chr scan analysis (Stage 1) i,i,i A G ( ) 3.79E-06
rs chr Henan Replication g A G ( ) 9.65E-01
rs chr Beijing Replication g A G ( ) 4.19E-01
rs chr Combined ( ) 2.48E E-04
rs chr scan analysis (Stage 1) i,i,i C T ( ) 2.92E-05
rs chr Henan Replication g C T ( ) 1.56E-01
rs chr Beijing Replication g C T ( ) 7.18E-01
rs chr Combined ( ) 5.12E E-04
rs chr scan analysis (Stage 1) g,i,i A T ( ) 5.42E-06
rs chr Henan Replication g A T ( ) 9.32E-01
rs chr Beijing Replication g A T ( ) 1.80E-01
rs chr Combined ( ) 6.57E E-05
*Categories in Stage 1 are listed in the order Beijing, Henan, and NCI, where 'g' indicates that the SNP was genotyped on the array and 'i' indicates that the SNP was imputed

Supplementary Table 4. Individual GWAS analysis results and meta-analysis
SNP CHR LOC GROUP CATEGORY* INFO NUM_CONTROL NUM_CASE REFERENCE_ALLELE EFFECT_ALLELE EFFECT_ALLELE_FREQ_CONTROL EFFECT_ALLELE_FREQ_CASE OR CI P P heterog.
rs chr nci i C G ( ) 1.60E-02
rs chr beijing i C G ( ) 7.06E-02
rs chr henan i C G ( ) 7.64E-05
rs chr beijing+henan+nci ( ) 6.23E E-01
rs chr nci i C T ( ) 1.21E-04
rs chr beijing i C T ( ) 3.12E-02
rs chr henan i C T ( ) 1.19E-02
rs chr beijing+henan+nci ( ) 4.74E E-01
rs chr nci i G A ( ) 4.27E-06
rs chr beijing i G A ( ) 8.83E-01
rs chr henan i G A ( ) 3.50E-04
rs chr beijing+henan+nci ( ) 5.62E E-02
rs chr nci i T A ( ) 7.72E-08
rs chr beijing i T A ( ) 1.82E-02
rs chr henan i T A ( ) 6.82E-01
rs chr beijing+henan+nci ( ) 1.99E E-02
rs chr nci i C A ( ) 1.11E-04
rs chr beijing i C A ( ) 9.60E-03
rs chr henan i C A ( ) 3.46E-01
rs chr beijing+henan+nci ( ) 5.41E E-01
rs chr nci i A G ( ) 6.49E-02
rs chr beijing g A G ( ) 3.74E-02
rs chr henan i A G ( ) 1.17E-03
rs chr beijing+henan+nci ( ) 8.38E E-01
rs chr nci g C T ( ) 2.74E-02
rs chr beijing g C T ( ) 6.64E-03
rs chr henan g C T ( ) 7.08E-04
rs chr beijing+henan+nci ( ) 4.33E E-01
rs chr nci i G A ( ) 9.48E-04
rs chr beijing i G A ( ) 1.35E-02
rs chr henan i G A ( ) 5.09E-01
rs chr beijing+henan+nci ( ) 7.54E E-01
rs chr nci g G A ( ) 1.36E-02
rs chr beijing i G A ( ) 1.50E-03
rs chr henan g G A ( ) 2.99E-02
rs chr beijing+henan+nci ( ) 8.29E E-01
rs27209 chr nci i G A ( ) 4.19E-02
rs27209 chr beijing i G A ( ) 2.64E-03
rs27209 chr henan i G A ( ) 5.51E-04
rs27209 chr beijing+henan+nci ( ) 2.94E E-01
rs chr nci i T C ( ) 1.44E-04
rs chr beijing g T C ( ) 3.68E-02
rs chr henan i T C ( ) 1.97E-01
rs chr beijing+henan+nci ( ) 1.17E E-01
rs chr nci i A G ( ) 1.62E-04
rs chr beijing i A G ( ) 3.85E-02
rs chr henan i A G ( ) 2.39E-01
rs chr beijing+henan+nci ( ) 1.62E E-01
rs chr nci i C T ( ) 8.18E-02
rs chr beijing i C T ( ) 8.64E-03
rs chr henan i C T ( ) 8.50E-03
rs chr beijing+henan+nci ( ) 1.06E E-01
rs chr nci i A T ( ) 7.94E-04
rs chr beijing g A T ( ) 2.77E-02
rs chr henan i A T ( ) 2.86E-02
rs chr beijing+henan+nci ( ) 5.47E E-01
* 'g' indicates that the SNP was genotyped on the array and 'i' indicates that the SNP was imputed
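The combined rows in Supplementary Table 4 pool the three per-study estimates. The supplement does not state the pooling method here, so the sketch below assumes a fixed-effect, inverse-variance meta-analysis on the log odds ratio scale, with standard errors recovered from 95% confidence intervals. It reports Cochran's Q as the heterogeneity statistic. The function names and input values are hypothetical and are not taken from the table.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return beta, se


def fixed_effect_meta(studies):
    """Pool (OR, CI low, CI high) tuples by inverse-variance weighting.

    Assumption: a fixed-effect model; the source does not name its method.
    """
    estimates = [log_or_and_se(*study) for study in studies]
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(beta / se)))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, (b, _) in zip(weights, estimates))
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - Z_95 * se),
        "ci_high": math.exp(beta + Z_95 * se),
        "p": p_value,
        "q": q,
    }


# Hypothetical per-study odds ratios with 95% confidence intervals.
pooled = fixed_effect_meta([(1.20, 1.00, 1.44), (1.10, 0.95, 1.27), (1.25, 1.05, 1.49)])
```

Under a fixed-effect model, Q is compared against a chi-square distribution with (number of studies − 1) degrees of freedom to obtain a heterogeneity P value; that step is omitted here to keep the sketch within the standard library.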

Supplementary Table 5. Association results for top 4 SNPs stratified by study group and alcohol status
Group SNP CHR LOC GROUP CATEGORY* INFO NUM_CONTROL NUM_CASE REFERENCE_ALLELE EFFECT_ALLELE EFFECT_ALLELE_FREQ_CONTROL EFFECT_ALLELE_FREQ_CASE OR CI P P heterog.
Beijing rs chr nondrinker i C G ( ) 4.92E-01
Beijing rs chr drinker i C G ( ) 7.99E-02
Beijing rs chr drinker+nondrinker ( ) 8.56E E-01
Henan rs chr nondrinker i C G ( ) 5.75E-02
Henan rs chr drinker i C G ( ) 5.04E-01
Henan rs chr drinker+nondrinker ( ) 4.61E E-01
NCI rs chr nondrinker i C G ( ) 5.33E-02
NCI rs chr drinker i C G ( ) 7.95E-02
NCI rs chr drinker+nondrinker ( ) 1.43E E-01
Beijing rs chr nondrinker i C T ( ) 2.02E-02
Beijing rs chr drinker i C T ( ) 4.41E-01
Beijing rs chr drinker+nondrinker ( ) 2.80E E-01
Henan rs chr nondrinker i C T ( ) 5.31E-02
Henan rs chr drinker i C T ( ) 5.10E-01
Henan rs chr drinker+nondrinker ( ) 4.38E E-01
NCI rs chr nondrinker i C T ( ) 5.36E-04
NCI rs chr drinker i C T ( ) 1.70E-01
NCI rs chr drinker+nondrinker ( ) 1.97E E-01
Beijing rs chr nondrinker i G A ( ) 6.05E-01
Beijing rs chr drinker i G A ( ) 7.72E-01
Beijing rs chr drinker+nondrinker ( ) 8.53E E-01
Henan rs chr nondrinker i G A ( ) 3.77E-02
Henan rs chr drinker i G A ( ) 4.75E-01
Henan rs chr drinker+nondrinker ( ) 1.28E E-01
NCI rs chr nondrinker i G A ( ) 1.04E-06
NCI rs chr drinker i G A ( ) 7.89E-01
NCI rs chr drinker+nondrinker ( ) 5.13E E-02
Beijing rs chr nondrinker i T A ( ) 2.13E-01
Beijing rs chr drinker i T A ( ) 4.23E-02
Beijing rs chr drinker+nondrinker ( ) 2.12E E-01
Henan rs chr nondrinker i T A ( ) 9.98E-01
Henan rs chr drinker i T A ( ) 3.52E-01
Henan rs chr drinker+nondrinker ( ) 6.59E E-01
NCI rs chr nondrinker i T A ( ) 4.06E-06
NCI rs chr drinker i T A ( ) 1.45E-02
NCI rs chr drinker+nondrinker ( ) 2.04E E-01
* 'g' indicates that the SNP was genotyped on the array and 'i' indicates that the SNP was imputed

Supplementary Table 6. Association results for top 4 SNPs stratified by study group and tobacco status
Group SNP CHR LOC GROUP CATEGORY* INFO NUM_CONTROL NUM_CASE REFERENCE_ALLELE EFFECT_ALLELE EFFECT_ALLELE_FREQ_CONTROL EFFECT_ALLELE_FREQ_CASE OR CI P P heterog.
Beijing rs chr nonsmoker i C G ( ) 7.79E-01
Beijing rs chr smoker i C G ( ) 4.09E-02
Beijing rs chr nonsmoker+smoker ( ) 8.22E E-01
Henan rs chr nonsmoker i C G ( ) 3.24E-02
Henan rs chr smoker i C G ( ) 7.22E-01
Henan rs chr nonsmoker+smoker ( ) 3.78E E-01
NCI rs chr nonsmoker i C G ( ) 3.30E-02
NCI rs chr smoker i C G ( ) 1.50E-01
NCI rs chr nonsmoker+smoker ( ) 1.26E E-01
Beijing rs chr nonsmoker i C T ( ) 1.77E-01
Beijing rs chr smoker i C T ( ) 1.15E-01
Beijing rs chr nonsmoker+smoker ( ) 3.80E E-01
Henan rs chr nonsmoker i C T ( ) 1.11E-01
Henan rs chr smoker i C T ( ) 1.22E-01
Henan rs chr nonsmoker+smoker ( ) 3.37E E-01
NCI rs chr nonsmoker i C T ( ) 1.13E-03
NCI rs chr smoker i C T ( ) 7.44E-02
NCI rs chr nonsmoker+smoker ( ) 4.29E E-01
Beijing rs chr nonsmoker i G A ( ) 8.27E-01
Beijing rs chr smoker i G A ( ) 6.78E-01
Beijing rs chr nonsmoker+smoker ( ) 8.84E E-01
Henan rs chr nonsmoker i G A ( ) 1.61E-01
Henan rs chr smoker i G A ( ) 8.11E-01
Henan rs chr nonsmoker+smoker ( ) 1.71E E-01
NCI rs chr nonsmoker i G A ( ) 5.84E-08
NCI rs chr smoker i G A ( ) 1.86E-01
NCI rs chr nonsmoker+smoker ( ) 3.64E E-03
Beijing rs chr nonsmoker i T A ( ) 3.90E-02
Beijing rs chr smoker i T A ( ) 1.12E-01
Beijing rs chr nonsmoker+smoker ( ) 1.07E E-01
Henan rs chr nonsmoker i T A ( ) 7.55E-01
Henan rs chr smoker i T A ( ) 1.82E-01
Henan rs chr nonsmoker+smoker ( ) 7.47E E-01
NCI rs chr nonsmoker i T A ( ) 2.04E-03
NCI rs chr smoker i T A ( ) 4.26E-05
NCI rs chr nonsmoker+smoker ( ) 3.28E E-01
* 'g' indicates that the SNP was genotyped on the array and 'i' indicates that the SNP was imputed

Supplementary Table 7. Summary of genomic annotation by HaploReg v2 and RegulomeDB
chr pos(hg19) LD * variant Ref Alt Alternative allele frequency Conservation Histone marks AFR AMR ASN EUR GERP SiPhy Promoter Enhancer DNase Proteins bound eQTL tissues Motifs changed GENCODE genes RefSeq genes dbSNP func annot RegulomeDB score
hit rs G A GM bound proteins 6.7kb 5' of HLA-DQA1 16kb 5' of HLA-DQA
rs A G Pou2f2 7.8kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs G A,C kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs C T GM12878 MAZR,p kb 5' of HLA-DQA1 15kb 5' of HLA-DQA
rs A T Arid3a,Foxp1 7.5kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs C T GM12878, HMEC 12 cell types 11 bound proteins 5.6kb 5' of HLA-DQA1 15kb 5' of HLA-DQA
rs A T GM12878 ERalpha-a 7kb 5' of HLA-DQA1 16kb 5' of HLA-DQA
rs C G NF-AT 7.7kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs G A AP-1,Irf 7.5kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs G C POL2,POL24H8 PU.1,Pbx3,SP2 HLA-DQA1 217bp 3' of HLA-DQA
rs C G HES1 8.6kb 5' of HLA-DQA1 18kb 5' of HLA-DQA
rs T C HES1 8.6kb 5' of HLA-DQA1 18kb 5' of HLA-DQA
rs A C,G,T GM12878 FibroP 7 bound proteins HLA-DQA1 HLA-DQA1 missense
rs G A GM12878 FibroP 9 bound proteins NRSF HLA-DQA1 HLA-DQA1 missense
rs C T NHEK, GM12878, HMEC GM19238,Urothelia RFX5,TBP 8 altered motifs 4.8kb 5' of HLA-DQA1 14kb 5' of HLA-DQA1 3a
rs G A STAT,TCF4 15kb 5' of HLA-DQA1 24kb 5' of HLA-DRB
rs G A AP-1,PRDM1,PU.1 HLA-DQA1 4.7kb 5' of HLA-DQA
rs A T SETDB1 5 altered motifs 10kb 5' of HLA-DQA1 20kb 5' of HLA-DQA
rs A G altered motifs 7.7kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs G A GM12878 LXR,Znf143 16kb 5' of HLA-DRB1 17kb 5' of HLA-DRB1 intronic
rs C G GM12878 TAL1 15kb 5' of HLA-DRB1 15kb 5' of HLA-DRB1 intronic
rs C A altered motifs 11kb 5' of HLA-DQA1 20kb 5' of HLA-DQA
rs G A GM12878 CD20+ POL2,POL24H8,TBP Foxm1,Pou2f2 6.5kb 5' of HLA-DQA1 16kb 5' of HLA-DQA
rs T G GM12878 Sox,TAL1 6.9kb 5' of HLA-DQA1 16kb 5' of HLA-DQA
rs A G GM12878 SIX5,Sox 6.9kb 5' of HLA-DQA1 16kb 5' of HLA-DQA
rs AC A altered motifs 9.8kb 5' of HLA-DQA1 19kb 5' of HLA-DQA
rs T C LBP-1,Nanog,TEF-1 7.9kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs G A Ets,Foxp3 HLA-DQA1 5.3kb 5' of HLA-DQA
rs A T altered motifs 9kb 5' of HLA-DQA1 18kb 5' of HLA-DQA
rs G A HLA-DQA1 7.8kb 5' of HLA-DQA
rs A C SETDB1 10kb 5' of HLA-DQA1 19kb 5' of HLA-DQA
rs T A GM bound proteins 7 altered motifs 6.6kb 5' of HLA-DQA1 16kb 5' of HLA-DQA
rs G C NHEK, GM12878, HMEC GATA,HNF1,Nkx3 4.6kb 5' of HLA-DQA1 14kb 5' of HLA-DQA
rs A G GM12878, HMEC, NHEK 14 cell types 19 bound proteins Pou5f1,TBX5 5.4kb 5' of HLA-DQA1 15kb 5' of HLA-DQA
rs C T GM12878 Pax-5,RREB-1,Znf143 12kb 5' of HLA-DQA1 21kb 5' of HLA-DQA
rs C T GM12878 POL2 ATF3,E2F,XBP-1 HLA-DQA1 HLA-DQA1 intronic
rs C G altered motifs 506bp 3' of HLA-DQA1 3.9kb 3' of HLA-DQA
rs T TC altered motifs 11kb 5' of HLA-DQA1 20kb 5' of HLA-DQA
rs A G GM altered motifs 15kb 5' of HLA-DRB1 15kb 5' of HLA-DRB1 intronic
rs C G GM12878 H9ES,Myometr,RPTEC JUND 7 altered motifs HLA-DQA1 593bp 5' of HLA-DQA1 3a
rs G C Ets,Znf143,p kb 5' of HLA-DQA1 19kb 5' of HLA-DQA
rs A T Maf,STAT 9.5kb 5' of HLA-DQA1 19kb 5' of HLA-DQA
rs T G GM cell types 4 bound proteins Pou2f2 18kb 5' of HLA-DRB1 18kb 5' of HLA-DRB
rs T C altered motifs 3.3kb 3' of HLA-DQA1 6.8kb 3' of HLA-DQA
rs G T altered motifs 7.5kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
rs C T altered motifs 4.4kb 3' of HLA-DQA1 7.9kb 3' of HLA-DQA
rs C G GM altered motifs HLA-DQA1 4kb 5' of HLA-DQA
rs G T GM12878 GM altered motifs HLA-DQA1 8.9kb 5' of HLA-DQA
rs G A POL2,POL24H8 Mef2,THAP1 98bp 3' of HLA-DQA1 3.5kb 3' of HLA-DQA
rs C A altered motifs 8.1kb 5' of HLA-DQA1 17kb 5' of HLA-DQA
hit rs C G HSMM 6 cell types Th1,CD34+_Mobilized,Th2 TCF12 TMEM173 TMEM173 synonymous
rs G A Huvec, H1 LNCaP 4 altered motifs 2.7kb 3' of TMEM kb 3' of TMEM173 1f
rs A G cell types BATF,Irf 223bp 5' of TMEM bp 5' of TMEM
rs G A H1 CEBPB 4.2kb 3' of TMEM kb 3' of TMEM173 1f
rs T A Huvec Hoxd10 489bp 5' of ECSCR 497bp 5' of ECSCR
hit rs C T altered motifs ATP1B2 ATP1B2 intronic
rs A G Th1,Hepatocytes,Osteobl Schadt_Liver AP-2,EBF,NRSF ATP1B2 ATP1B2 1f
rs G A,C,T H1 Adult_CD4_Th0 ATP1B2 ATP1B2 3'-UTR
rs G A H1 Fibrobl ATP1B2 ATP1B
rs T G H1 8 cell types RAD21 12 altered motifs ATP1B2 ATP1B2 3'-UTR 1b
rs C T cell types FibroP,Medullo 9 altered motifs ATP1B2 ATP1B2 5'-UTR
rs C T K bp 5' of ATP1B2 4.6kb 5' of ATP1B2 1f
rs G C cell types K562 8 cell types POL2 6 altered motifs ATP1B2 509bp 5' of ATP1B2 2b
rs T C altered motifs 1.7kb 3' of TP53 2.3kb 3' of ATP1B
rs C T GM12878 HNF4,LBP-1,Zbtb3 TNFSF12 TNFSF12 intronic
rs C T H1 FibroP TAF1 12 altered motifs ATP1B2 ATP1B2 3'-UTR
rs G A H1 BDP1,HDAC2 409bp 3' of ATP1B2 406bp 3' of ATP1B2 2b
*LD against susceptibility loci (hit) was calculated using the 1000 Genomes CHB data, and the threshold was set to r² ≥ 0.8 for rs and rs and r² ≥ 0.5 for rs because no surrogates passed the 0.8 threshold
RegulomeDB scores: 1a, eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak; 1b, eQTL + TF binding + any motif + DNase Footprint + DNase peak; 1c, eQTL + TF binding + matched TF motif + DNase peak; 1d, eQTL + TF binding + any motif + DNase peak; 1e, eQTL + TF binding + matched TF motif; 1f, eQTL + TF binding / DNase peak; 2a, TF binding + matched TF motif + matched DNase Footprint + DNase peak; 2b, TF binding + any motif + DNase Footprint + DNase peak; 2c, TF binding + matched TF motif + DNase peak; 3a, TF binding + any motif + DNase peak; 3b, TF binding + matched TF motif; 4, TF binding + DNase peak; 5, TF binding or DNase peak; 6, other; 7, unknown

Supplementary Table 8. Independent and joint analysis of 12 previously reported loci associated with risk of ESCC
Columns: Loci, Gene, rsid, Ref_Allele, Effect_Allele; then, for each of NCI, Henan, Beijing and Combined: Effect_Allele_Freq_Control, Effect_Allele_Freq_Case, P_value, OR, 95% CI
10q23.33 PLCE1,KIAA1516 rs A G E ( ) E ( ) E ( ) E ( )
2q33.1 CASP8,ALS2CR12 rs A G E ( ) E ( ) E ( ) E ( )
21q22.12 RUNX1 rs A G E ( ) E ( ) E ( ) E ( )
22q12.1 CHEK2,CCDC117,XBP1 rs T C E ( ) E ( ) E ( ) E ( )
5q11.2 PDE4D,PDE4DN2 rs C A E ( ) E ( ) E ( ) E ( )
16q12.1 HEATR3 rs C T E ( ) E ( ) E ( ) E ( )
3q27.3 ST6GAL1 rs G A E ( ) E ( ) E ( ) E ( )
6p21.1 rs T C E ( ) E ( ) E ( ) E ( )
17q21.2 JUP,HAP1 rs A T E ( ) E ( ) E ( ) E ( )
17p13.3 SMG6 rs C A E ( ) E ( ) E ( ) E ( )
20p13 C20orf54 rs C T E ( ) E ( ) E ( ) E ( )
18p11.21 PTPN2 rs A G E ( ) E ( ) E ( ) E ( )

Supplementary Table 9. Associations in individual GWAS for top 4 SNPs before and after PC adjustment
Beijing
SNP; Before PC Adjustment: OR (95% CI), P; After PC Adjustment: OR (95% CI), P
rs ( ) 2.72E ( ) 5.60E-02
rs ( ) 2.50E ( ) 3.12E-02
rs ( ) 1.38E ( ) 8.83E-01
rs ( ) 2.57E ( ) 1.82E-02
NCI
SNP; Before PC Adjustment: OR (95% CI), P; After PC Adjustment: OR (95% CI), P
rs ( ) 1.78E ( ) 1.60E-02
rs ( ) 1.28E ( ) 1.21E-04
rs ( ) 3.76E ( ) 4.27E-06
rs ( ) 6.84E ( ) 7.72E-08
Henan
SNP; Before PC Adjustment: OR (95% CI), P; After PC Adjustment*: OR (95% CI), P
rs ( ) 7.64E-05
rs ( ) 1.19E-02
rs ( ) 3.50E-04
rs ( ) 6.82E-01
*Baseline risk model indicated no need for PC adjustment (lambda value before PC adjustment 1.02)