Supplementary Materials


Genome-wide association study identifies 1p36.22 as a new susceptibility locus for hepatocellular carcinoma in chronic hepatitis B virus carriers

Hongxing Zhang 1, Yun Zhai 1, Zhibin Hu 2,12, Chen Wu 3,12, Ji Qian 4,12, Weihua Jia 5,6,12, Fuchao Ma 7, Wenfeng Huang 7, Lixia Yu 1, Wei Yue 1, Zhifu Wang 1, Peiyao Li 1, Yang Zhang 1, Renxiang Liang 8, Zhongliang Wei 8, Ying Cui 7, Weimin Xie 7, Mi Cai 1, Xinsen Yu 9, Yunfei Yuan 5, Xia Xia 1, Xiumei Zhang 1, Hao Yang 1, Wei Qiu 1, Jingmin Yang 4, Feng Gong 1, Minshan Chen 10, Hongbing Shen 2, Dongxin Lin 3, Yixin Zeng 5,6, Fuchu He 1,11 & Gangqiao Zhou 1

1 State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing, China
2 Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
3 Department of Etiology and Carcinogenesis, Cancer Institute and Hospital, Chinese Academy of Medical Sciences, Beijing, China
4 MOE Key Laboratory of Contemporary Anthropology and Center for Evolutionary Biology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
5 State Key Laboratory of Oncology in Southern China, Guangzhou, Guangdong, China
6 Department of Experimental Research, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong, China
7 Cancer Institute of Guangxi, Nanning, Guangxi, China
8 Liver Cancer Institute at Fusui County, Guangxi, China
9 Disease Prevention and Control Center at Haimen County, Jiangsu, China
10 Department of Hepatobiliary Oncology, Sun Yat-Sen University Cancer Center, Guangzhou, Guangdong, China
11 Institute of Biomedical Sciences, Fudan University, Shanghai, China
12 These authors contributed equally to this work.

Correspondence should be addressed to: Prof. Gangqiao Zhou or Prof. Fuchu He, The State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, , P. R. China. Phone & fax: or

Index

Supplementary Tables:
Supplementary Table 1: Summary of the quality controls in the GWAS stage.
Supplementary Table 2: Summary description of the samples used in this study.
Supplementary Table 3: The distribution of the P values obtained in the GWAS stage.
Supplementary Table 4: Summary of the GWA study and replication studies of top 45 SNPs.
Supplementary Table 5: TDT investigation for rs polymorphism in 159 family trios.
Supplementary Table 6: Association results of rs and rs in the GWAS stage (Guangxi population).
Supplementary Table 7: Haplotypes for three SNPs in 1p36.22 locus and their associations with HCC risk in the GWAS stage.
Supplementary Table 8: Independence tests for rs and rs in five case-control populations.
Supplementary Table 9: Stratification analysis for rs by sex and age at diagnosis in the five independent case-control sample sets of Chinese ancestry.
Supplementary Table 10: Association results for the rs and rs in the GWAS stage (Guangxi population).
Supplementary Table 11: Correlation between protein expression levels of UBE4B, KIF1B, and PGD and rs genotypes in HCC tissues and paired adjacent non-tumor tissues by IHC.
Supplementary Table 12: The allele and genotype frequencies of rs in different populations.
Supplementary Table 13: Recent GWA studies in liver diseases or liver traits.
Supplementary Table 14: Primers and probes used for genotyping assays.
Supplementary Table 15: Primers used for the quantitative real-time PCR assays.

Supplementary Figures:
Supplementary Figure 1: An overview of the study workflow.
Supplementary Figure 2: Plots of the first five components from the multidimensional scaling analysis (MDS, PLINK) using 707 participants in the GWAS.
Supplementary Figure 3: Summary of genome-wide association results.
Supplementary Figure 4: Forest plot for rs across all six studies.
Supplementary Figure 5: The association between rs genotypes and age at diagnosis.
Supplementary Figure 6: Regional plots for associations in the region surrounding rs in the GWAS stage.
Supplementary Figure 7: Protein expression of UBE4B, KIF1B, and PGD by immunohistochemical staining in representative HCC tissues and paired adjacent non-tumor tissues.
Supplementary Figure 8: Quantitative real-time RT-PCR for KIF1Bβ.
Supplementary Figure 9: Power to detect a genetic effect of various sizes (OR = 0.25, 0.50, 0.53, 0.60, or 0.75) versus study sample size.
Supplementary Figure 10: Identification of outliers by STRUCTURE and PLINK.

Supplementary Note

Supplementary Table 1: Summary of the quality controls in the GWAS stage.

(a) Samples to be removed, as identified by PLINK and STRUCTURE.

Samples to be removed, n (criterion)
PLINK
  Low CR: 0 (CR < 90%)
  Sex-mismatched: 3 (F < 0.8 for males; F > 0.2 for females)
  Duplicates or close relatives: 4 (PI_HAT > 0.25)
  Outlier: 1 (> 4 SD from a nearest neighbor)
STRUCTURE
  Outlier: 1 (high coefficient of ancestry from other population)

CR, genotyping call rate; SD, standard deviation.

(b) An overview of the SNPs by chromosome.

Columns: Chr.; SNPs on chip, n; Criteria of QC, n (CR < 90%, MAF < 5%, P HWE < ); SNPs failing QC, n (%); SNPs passing QC, n (%)

SNPs failing QC, n (%)    SNPs passing QC, n (%)
(34.4)                    (65.6)
(34.5)                    (65.5)
(33.1)                    (66.9)
(34.8)                    (65.2)
(32.2)                    (67.8)
(30.5)                    (69.5)
(32.0)                    (68.0)
(32.8)                    (67.2)
(32.3)                    (67.7)
(32.6)                    (67.4)
(32.7)                    (67.3)
(32.9)                    (67.1)
(33.9)                    (66.1)
(32.7)                    9322 (67.3)
(33.4)                    8408 (66.6)
(33.6)                    8896 (66.4)
(33.3)                    6552 (66.7)
(34.5)                    8607 (65.5)
(29.6)                    3831 (70.4)
(33.0)                    7359 (67.0)
(31.0)                    4301 (69.0)
(34.9)                    3472 (65.1)
(37.3)                    6142 (62.7)
Sum (33.2)                (66.8)

CR, genotyping call rate; MAF, minor allele frequency in controls; QC, quality control; P HWE, P value for Hardy-Weinberg equilibrium test in controls.
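The SNP-level criteria in Supplementary Table 1(b) (CR < 90%, MAF < 5%, and a Hardy-Weinberg P-value cutoff) can be illustrated with a minimal Python sketch. This is not the authors' PLINK workflow: the function name `snp_qc`, its parameters, and the use of a 1-degree-of-freedom chi-square approximation for the Hardy-Weinberg test are assumptions made for illustration, and the Hardy-Weinberg threshold is passed in as `hwe_p_min` because its value is not legible in the table above. For simplicity the sketch computes all three quantities from a single set of genotype counts.

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing, hwe_p_min, cr_min=0.90, maf_min=0.05):
    """Apply call-rate, MAF and HWE filters to one SNP (illustrative only).

    n_aa, n_ab, n_bb: genotype counts; n_missing: samples with no call.
    hwe_p_min: hypothetical HWE P-value cutoff supplied by the caller.
    Returns (passes, call_rate, maf, p_hwe).
    """
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False, 0.0, 0.0, float("nan")
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (
        called * freq_a ** 2,
        2 * called * freq_a * (1.0 - freq_a),
        called * (1.0 - freq_a) ** 2,
    )
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_hwe = math.erfc(math.sqrt(chi2 / 2.0))
    passes = call_rate >= cr_min and maf >= maf_min and p_hwe >= hwe_p_min
    return passes, call_rate, maf, p_hwe


# Genotype counts AA/AG/GG = 192/141/26 are the Guangxi controls listed in
# Supplementary Table 8(b); the cutoff 1e-3 is a placeholder, not the paper's value.
result = snp_qc(192, 141, 26, 0, hwe_p_min=1e-3)
```

PLINK offers an exact Hardy-Weinberg test, so P values from this approximation may differ slightly for rare alleles.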

Supplementary Table 2: Summary description of the samples used in this study.

Columns, in order: Guangxi population (cases, n = 348; controls, n = 359; P); Beijing population (cases, n = 276; controls, n = 266; P); Jiangsu population (cases, n = 507; controls, n = 215; P).

Age, years
  Mean (SD): 45.8 (10.6) 41.6 (12.1) (9.1) 55.5 (12.2) (10.8) 52.9 (11.2)
  ≤ 49, n (%): 227 (65.2) 267 (74.4) (26.4) 73 (27.4) (37.3) 84 (39.1) 0.65
  > 49, n (%): 121 (34.8) 92 (25.6) 203 (73.6) 193 (72.6) 318 (62.7) 131 (60.9)
Sex, n (%)
  Female: 45 (12.9) 49 (13.6) (14.1) 41 (15.4) (20.9) 40 (18.6) 0.54
  Male: 303 (87.1) 310 (86.4) 237 (85.9) 225 (84.6) 401 (79.1) 175 (81.4)
Smoking status, n (%)
  Non-smoker: 224 (64.4) 208 (57.9) (61.6) 13 (4.9) (46.9) 116 (54.0) 0.10
  Smoker: 124 (35.6) 151 (42.1) 90 (32.6) 8 (3.0) 268 (52.9) 99 (46.0)
  Unknown: (5.8) 245 (92.1) 1 (0.2) 0
Smoking level, pack-years
  Mean (SD): 20.5 (19.7) 20.3 (19.0) (13.9) 14.8 (13.8) (7.0) 15.5 (10.2)
  ≤ 18, n (%): 60 (48.4) 80 (53.0) (57.8) 5 (62.5) (7.1) 20 (20.2) 0.67
  > 18, n (%): 64 (51.6) 71 (47.0) 38 (42.2) 3 (37.5) 23 (8.6) 20 (20.2)
  Unknown: (84.3) 59 (59.6)
Drinking status, n (%)
  Non-drinker: 256 (73.6) 262 (73.0) (64.1) 12 (4.5) (45.6) 128 (59.5)
  Drinker: 92 (26.4) 97 (27.0) 83 (30.1) 9 (3.4) 275 (54.2) 87 (40.5)
  Unknown: (5.8) 245 (92.1) 1 (0.2) 0
First-degree family history of HCC, n (%)
  Negative: 292 (83.9) 338 (94.2) (5.8) 231 (86.8) (81.5) 183 (85.1) 0.28
  Positive: 56 (16.1) 21 (5.8) 0 12 (4.6) 93 (18.3) 32 (14.9)
  Unknown: (94.2) 23 (8.6) 1 (0.2) 0

Supplementary Table 2 continued.

Columns, in order: Guangdong population (cases, n = 751; controls, n = 509; P); Shanghai population (cases, n = 428; controls, n = 440; P); Guangxi family trios (probands, n = 159; parents, n = 318).

Age, years
  Mean (SD): 49.3 (11.6) 48.1 (11.4) (9.1) 51.2 (8.5) (7.2) 64.1 (8.2)
  ≤ 49, n (%): 392 (52.2) 278 (54.6) (45.1) 193 (43.9) (95.6) 10 (3.1)
  > 49, n (%): 359 (47.8) 231 (45.4) 235 (54.9) 247 (56.1) 7 (4.4) 308 (96.9)
Sex, n (%)
  Female: 99 (13.2) 81 (15.9) (11.9) 57 (13.0) (7.5) 159 (50.0)
  Male: 652 (86.8) 428 (84.1) 377 (88.1) 383 (87.0) 147 (92.5) 159 (50.0)
Smoking status, n (%)
  Non-smoker: 110 (14.6) 113 (22.2) (53.3) 214 (48.6) (52.2) 219 (68.9)
  Smoker: 95 (12.6) 95 (18.7) 200 (46.7) 226 (51.4) 76 (47.8) 99 (31.1)
  Unknown: 546 (72.8) 301 (59.1)
Smoking level, pack-years
  Mean (SD): 20.8 (15.6) 22.0 (12.9) (15.7) 14.5 (11.3) (20.3) 31.3 (25.7)
  ≤ 18, n (%): 52 (54.7) 36 (37.9) (31.5) 167 (73.8) (43.4) 37 (37.4)
  > 18, n (%): 43 (45.3) 59 (62.1) 85 (42.5) 57 (25.2) 43 (56.6) 62 (62.6)
  Unknown: (26.0) 2 (0.9) 0 0
Drinking status, n (%)
  Non-drinker: 112 (14.9) 115 (22.6) (58.4) 220 (50.0) (56.6) 237 (74.5)
  Drinker: 93 (12.4) 93 (18.3) 174 (40.7) 220 (50.0) 69 (43.4) 81 (25.5)
  Unknown: 546 (72.7) 301 (59.1) 4 (0.9)
First-degree family history of HCC, n (%)
  Negative: 174 (23.2) 185 (36.3) (10.3) 431 (98.0) (75.5) NA
  Positive: 31 (4.1) 23 (4.5) 64 (15.0) 9 (2.0) 39 (24.5) NA
  Unknown: 546 (72.7) 301 (59.2) 320 (74.7) 0 0 NA

NA, not applicable; SD, standard deviation. P values were calculated by t test (2-sided) for means of age and smoking level, and by χ² test (2-sided) for other variables.

Supplementary Table 3: The distribution of the P values obtained in the GWAS stage.

Chr. P < P < P < P < P P Sum

The association between each SNP genotype and HCC risk was evaluated under an additive model using logistic regression while controlling for confounding factors (age, sex, smoking and drinking status, pack-years of smoking, and family history), and a P value was calculated for each SNP.

Supplementary Table 4: Summary of the GWA study and replication studies of top 45 SNPs.

Columns: GWAS rank; SNP; Chr; Location a; Gene; Allele b; GWAS (Guangxi population): 348 cases c, 359 controls c, OR (95% CI), P; Replication 1 (Beijing population): 276 cases c, 266 controls c, OR (95% CI), P.

1 rs A/G 2/106/208 1/47/ ( ) 7.0E-8 1/27/240 0/35/ ( )
2 rs SLC19A3 A/C 2/91/231 1/40/ ( ) 5.1E-7 3/42/231 4/41/ ( )
3 rs KIF1B G/A 8/100/240 26/141/ ( ) 5.8E-6 5/86/185 24/109/ ( ) 3.5E-6
4 rs LOC T/C 6/95/234 1/57/ ( ) 1.8E-5 6/48/222 4/45/ ( )
5 rs G/T 0/77/239 0/39/ ( ) 1.8E-5 0/8/268 0/10/ ( )
6 rs T/C 6/83/242 29/106/ ( ) 2.2E-5 17/106/139 14/99/ ( )
7 rs KCNK5 T/C 2/100/218 0/59/ ( ) 2.3E-5 3/61/212 3/73/ ( )
8 rs LOC A/G 75/184/74 52/177/ ( ) 2.6E-5 48/148/80 45/115/ ( )
9 rs T/G 3/83/229 0/48/ ( ) 2.8E-5 1/30/237 2/32/ ( )
10 rs T/G 5/93/243 27/116/ ( ) 3.4E-5 13/110/149 13/96/ ( )
11 rs ZDHHC14 T/C 4/45/297 8/92/ ( ) 3.5E-5 15/79/168 8/60/ ( ) 8.3E-3
12 rs LOC C/T 48/176/109 38/135/ ( ) 3.6E-5 32/130/101 27/116/ ( )
13 rs ZDHHC14 G/T 4/47/293 8/94/ ( ) 3.6E-5 failed failed failed failed
14 rs CCDC148 A/T 0/74/239 0/36/ ( ) 3.6E-5 0/0/276 0/0/
15 rs ZDHHC14 T/C 4/47/297 7/95/ ( ) 3.6E-5 failed failed failed failed
16 rs KCNK9 A/C 35/84/217 14/57/ ( ) 3.9E-5 11/77/188 10/92/ ( )
17 rs A/G 34/170/142 61/198/ ( ) 4.0E-5 68/120/85 59/134/ ( )
18 rs CPNE8 G/C 10/97/241 22/143/ ( ) 4.0E-5 6/61/200 11/88/ ( ) 3.2E-3
19 rs COL18A1 T/C 3/71/272 8/115/ ( ) 4.1E-5 4/52/218 5/56/ ( )
20 rs A/C 45/38/246 5/64/ ( ) 4.6E-5 9/59/208 9/75/ ( )
21 rs T/G 4/69/261 1/33/ ( ) 4.8E-5 40/129/103 6/72/ ( ) 1.1E
22 rs T/C 12/78/258 17/133/ ( ) 5.0E-5 25/106/140 27/107/ ( )
23 rs LPHN2 T/C 30/145/170 16/113/ ( ) 5.1E-5 10/107/158 9/98/ ( )
24 rs PRKCQ A/G 18/125/205 43/148/ ( ) 5.1E-5 29/112/126 32/110/ ( )
25 rs PRKCQ G/T 18/130/200 44/152/ ( ) 5.2E-5 25/118/122 31/111/ ( )
26 rs CACNA2D3 G/C 24/152/171 52/182/ ( ) 5.4E-5 27/126/121 33/106/ ( )
27 rs LIMCH1 G/A 3/90/222 1/52/ ( ) 5.6E-5 1/31/242 0/33/ ( )
28 rs PDE8A G/T 1/76/245 1/38/ ( ) 5.6E-5 1/28/247 2/21/ ( )
29 rs NKAIN2 C/T 1/93/231 2/47/ ( ) 5.7E-5 1/36/234 2/20/ ( )
30 rs G/A 0/73/249 0/34/ ( ) 5.8E-5 0/27/237 0/26/ ( ) 0.94
31 rs PRUNE2 A/C 16/121/181 4/88/ ( ) 6.5E-5 10/78/182 8/66/ ( )
32 rs A/G 8/36/297 11/81/ ( ) 6.7E-5 5/67/202 2/60/ ( )
33 rs MYO18B A/G 2/89/237 3/44/ ( ) 6.7E-5 0/44/222 0/38/ ( )
34 rs ATF6 A/G 24/133/191 51/152/ ( ) 6.8E-5 33/123/110 38/105/ ( )
35 rs A/G 42/176/105 29/136/ ( ) 6.8E-5 61/128/87 46/130/ ( )
36 rs PHACTR3 C/G 14/127/207 6/93/ ( ) 8.0E-5 6/77/191 6/58/ ( )
37 rs G/A 2/77/250 1/43/ ( ) 8.3E-5 6/91/177 3/36/ ( ) 3.7E-7
38 rs TPPP2 G/A 17/151/171 13/108/ ( ) 8.4E-5 0/12/261 0/7/ ( )
39 rs C/T 86/158/79 59/143/ ( ) 8.5E-5 72/127/76 44/127/ ( ) 4.9E-3
40 rs PRKCQ C/G 18/128/200 42/152/ ( ) 8.6E-5 25/118/130 33/106/ ( )
41 rs RPL13AP3 G/A 32/158/157 61/182/ ( ) 8.9E-5 26/107/141 30/109/ ( )
42 rs PFTK1 G/C 47/179/120 30/155/ ( ) 9.3E-5 30/116/129 21/110/ ( )
43 rs CACNA2D3 C/T 18/141/188 34/188/ ( ) 9.9E-5 11/101/153 20/99/ ( )
44 rs ATF6 A/C 24/124/190 52/140/ ( ) 9.9E-5 30/125/112 38/105/ ( )
45 rs ATF6 A/G 24/128/192 52/146/ ( ) 9.9E-5 32/129/106 37/106/ ( ) 0.55

Supplementary Table 4 continued.

Columns: GWAS rank; SNP; Chr; Location a; Gene; Allele b; Replication 2 (Jiangsu population): 507 cases c, 215 controls c, OR (95% CI), P.

1 rs A/G
2 rs SLC19A3 A/C
3 rs KIF1B G/A 26/181/300 21/101/ ( ) 3.9E-5
4 rs LOC T/C
5 rs G/T
6 rs T/C
7 rs KCNK5 T/C
8 rs LOC A/G
9 rs T/G
10 rs T/G
11 rs ZDHHC14 T/C
12 rs LOC C/T
13 rs ZDHHC14 G/T
14 rs CCDC148 A/T
15 rs ZDHHC14 T/C
16 rs KCNK9 A/C
17 rs A/G
18 rs CPNE8 G/C 12/142/353 10/60/ ( )
19 rs COL18A1 T/C
20 rs A/C
21 rs T/G 5/62/440 2/25/ ( )
22 rs T/C
23 rs LPHN2 T/C
24 rs PRKCQ A/G
25 rs PRKCQ G/T
26 rs CACNA2D3 G/C
27 rs LIMCH1 G/A
28 rs PDE8A G/T
29 rs NKAIN2 C/T
30 rs G/A
31 rs PRUNE2 A/C
32 rs A/G
33 rs MYO18B A/G
34 rs ATF6 A/G
35 rs A/G
36 rs PHACTR3 C/G
37 rs G/A 3/70/434 0/31/ ( )
38 rs TPPP2 G/A
39 rs C/T 118/241/141 55/94/ ( )
40 rs PRKCQ C/G
41 rs RPL13AP3 G/A
42 rs PFTK1 G/C
43 rs CACNA2D3 C/T
44 rs ATF6 A/C
45 rs ATF6 A/G

Chr, chromosome; OR, odds ratio; CI, confidence interval.
a Genomic position (NCBI Build 36).
b Minor allele/major allele.
c Number of minor homozygotes/number of heterozygotes/number of major homozygotes.

P values, ORs and 95% CIs were calculated under an additive model by logistic regression, adjusting for age, sex, smoking and drinking status, smoking level and family history of HCC in the GWAS stage, and for age and sex in the replication stages.

The 45 most significantly associated SNPs (P ≤ 10⁻⁴) in the GWAS stage were further genotyped in replication 1 (Beijing population). rs (ranked eleventh in the GWAS scan) showed a significant association in replication 1, but with an effect opposite to that in the GWAS scan, and was therefore not genotyped in subsequent replications. The genotyping of rs and rs failed in replication 1; these two SNPs were not genotyped further because they were in absolute LD with rs (r² = 1; CHB data). The P value, OR and 95% CI were not available for rs in replication 1 (Beijing population) because the SNP was monomorphic. Five SNPs were replicated in replication 1 (P < 0.05, with effects in the same direction as in the GWAS stage) and were then genotyped in replication 2. Only rs, which ranked third in the GWAS scan, was validated again and went forward to replication 3 and 4.

Supplementary Table 5: TDT investigation for rs polymorphism in 159 family trios.

Families: All (n = 159)

Genotype and allele distribution in probands, n (%)
  AA: 91 (57.2)
  AG: 60 (37.7)
  GG: 8 (5.0)
  A: 242 (76.1)
  G: 76 (23.9)

TDT (columns: Allele; Transmitted; Nontransmitted; χ²; P value)
  rs [a]
  rs [g]

TDT, transmission/disequilibrium test.
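For readers unfamiliar with the transmission/disequilibrium test, the statistic is commonly computed as a McNemar-type chi-square on the transmissions from heterozygous parents. The sketch below assumes that standard form; the function name `tdt` is illustrative, and the example counts are hypothetical because the transmitted and nontransmitted counts are not legible in the table above.

```python
import math


def tdt(transmitted, nontransmitted):
    """McNemar-type TDT statistic for one allele.

    transmitted / nontransmitted: number of times heterozygous parents did or
    did not pass the allele to an affected child.
    Returns (chi2, p) with p taken from a chi-square with 1 degree of freedom.
    """
    total = transmitted + nontransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - nontransmitted) ** 2 / total
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical counts, for illustration only (not taken from the study).
chi2, p = tdt(60, 40)
```

Only heterozygous parents are informative, so the counts entering the test are usually far smaller than the number of trios.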

Supplementary Table 6: Association results of rs and rs in the GWAS stage (Guangxi population).

Columns: Polymorphisms; Cases, n (%) (n = 348); Controls, n (%) (n = 359); OR (95% CI) a; P value a; OR (95% CI) b; P value b.

rs
  GG: 229 (65.8) 192 (53.5) 0.59 ( ) ( )
  GA: 111 (31.9) 142 (39.5)
  AA: 8 (2.3) 25 (7.0)
rs
  TT: 226 (64.9) 192 (53.5) 0.61 ( ) (0- ) 1.0
  TG: 114 (32.8) 142 (39.5)
  GG: 8 (2.3) 25 (7.0)

OR, odds ratio; CI, confidence interval. P values, ORs and 95% CIs were calculated under an additive model by logistic regression while adjusting for age, sex, smoking and drinking status, pack-years of smoking, and family history of HCC.
a Values before adjustment for rs
b Values after adjustment for rs

Supplementary Table 7: Haplotypes for three SNPs in 1p36.22 locus and their associations with HCC risk in the GWAS stage.

Columns: Haplotype; Cases, 2n (%) (n = 348); Controls, 2n (%) (n = 359); OR (95% CI); P value b.

A-T-G: 566 (81.3) 525 (73.1) 1
G-G-A: 114 (16.4) 192 (26.7) 0.55 ( )
Others a: 16 (2.3) 1 (0.1) ( )

The haplotype is in the order of rs , rs and rs
a Others include haplotypes A-G-A and G-G-G.
b No correction was made for testing multiple alleles.

Supplementary Table 8: Independence tests for rs and rs in five case-control populations.

(a) Association results of rs before and after adjustment for rs

Columns: Genotype; Cases, n (%); Controls, n (%); OR (95% CI) a; P value a; OR (95% CI) b; P value b.

Guangxi (GWAS stage), cases n = 348, controls n = 359
  TT: 287 (82.5) 323 (90.0) 1.89 ( ) ( )
  TC: 60 (17.2) 35 (9.7)
  CC: 1 (0.3) 1 (0.3)
Beijing (Replication stage 1), cases n = 276, controls n = 266
  TT: 207 (75) 205 (77.1) 1.19 ( ) ( ) 0.64
  TC: 57 (20.7) 58 (21.8)
  CC: 12 (4.3) 3 (1.1)
Jiangsu (Replication stage 2), cases n = 507, controls n = 215
  TT: 407 (80.3) 190 (88.4) 1.35 ( ) ( ) 0.19
  TC: 96 (18.9) 22 (10.2)
  CC: 4 (0.8) 3 (1.4)
Guangdong (Replication stage 3a), cases n = 751, controls n = 509
  TT: 637 (84.8) 449 (88.2) 1.63 ( ) ( ) 0.14
  TC: 105 (14) 57 (11.2)
  CC: 9 (1.2) 3 (0.6)
Shanghai (Replication stage 3b), cases n = 428, controls n = 440
  TT: 342 (79.9) 361 (82) 1.24 ( ) ( ) 0.65
  TC: 80 (18.7) 77 (17.5)
  CC: 6 (1.4) 2 (0.5)
Pooled, cases n = 2310, controls n = 1789
  TT: 1880 (81.4) 1528 (85.4) 1.33 ( ) ( ) 0.025
  TC: 398 (17.2) 249 (13.9)
  CC: 32 (1.4) 12 (0.7)

OR, odds ratio; CI, confidence interval. P values, ORs and 95% CIs were calculated under an additive model by logistic regression while adjusting for age, sex, smoking and drinking status, pack-years of smoking, family history of HCC, and population, where appropriate.
a Values before adjustment for rs
b Values after adjustment for rs

(b) Association results of rs before and after adjustment for rs

Columns: Genotype; Cases, n (%); Controls, n (%); OR (95% CI) a; P value a; OR (95% CI) b; P value b.

Guangxi (GWAS stage), cases n = 348, controls n = 359
  AA: 240 (69.0) 192 (53.5) 0.53 ( ) ( )
  AG: 100 (28.7) 141 (39.3)
  GG: 8 (2.3) 26 (7.2)
Beijing (Replication stage 1), cases n = 276, controls n = 266
  AA: 185 (67.0) 133 (50.0) 0.49 ( ) ( )
  AG: 86 (31.2) 109 (41.0)
  GG: 5 (1.8) 24 (9.0)
Jiangsu (Replication stage 2), cases n = 507, controls n = 215
  AA: 300 (59.2) 93 (43.3) 0.58 ( ) ( )
  AG: 181 (35.7) 101 (47.0)
  GG: 26 (5.1) 21 (9.8)
Guangdong (Replication stage 3a), cases n = 751, controls n = 509
  AA: 497 (66.2) 279 (54.8) 0.65 ( ) ( )
  AG: 228 (30.4) 195 (38.3)
  GG: 26 (3.5) 35 (6.9)
Shanghai (Replication stage 3b), cases n = 428, controls n = 440
  AA: 275 (64.2) 239 (54.3) 0.66 ( ) ( )
  AG: 141 (32.9) 169 (38.4)
  GG: 12 (2.8) 32 (7.3)
Pooled, cases n = 2310, controls n = 1789
  AA: 1497 (64.8) 936 (52.3) 0.62 ( ) ( )
  AG: 736 (31.9) 715 (40.0)
  GG: 77 (3.3) 138 (7.7)

OR, odds ratio; CI, confidence interval. P values, ORs and 95% CIs were calculated under an additive model by logistic regression while adjusting for age, sex, smoking and drinking status, pack-years of smoking, family history of HCC, and population, where appropriate.
a Values before adjustment for rs
b Values after adjustment for rs
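As a rough plausibility check on the pooled genotype counts in Supplementary Table 8(b), a crude (unadjusted) allelic odds ratio can be computed directly from the counts. This is only a sketch: the ORs reported in the table come from covariate-adjusted logistic regression under an additive model, which this calculation does not reproduce. The function name `allelic_odds_ratio` and the Woolf (log-normal) confidence interval are choices made here for illustration, and the interval treats the two alleles of each person as independent observations.

```python
import math


def allelic_odds_ratio(case_counts, control_counts, z=1.96):
    """Crude allelic OR for the minor allele with a Woolf-type CI.

    case_counts / control_counts: genotype counts given as
    (major homozygotes, heterozygotes, minor homozygotes).
    Returns (odds_ratio, lower, upper). All allele counts must be non-zero.
    """
    def allele_counts(counts):
        major_hom, het, minor_hom = counts
        minor = 2 * minor_hom + het
        major = 2 * major_hom + het
        return minor, major

    a, b = allele_counts(case_counts)      # minor, major alleles in cases
    c, d = allele_counts(control_counts)   # minor, major alleles in controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Pooled AA/AG/GG counts from Supplementary Table 8(b): cases, then controls.
pooled_or, pooled_lower, pooled_upper = allelic_odds_ratio(
    (1497, 736, 77), (936, 715, 138)
)
```

With these counts the crude estimate falls close to the pooled OR of 0.62 listed in the table, as expected when adjustment for covariates changes the estimate only slightly.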

Supplementary Table 9: Stratification analysis for rs by sex and age in the five independent case-control sample sets of Chinese ancestry.

Values are case / control, n (%). Columns, in order: (1) GWAS stage, Guangxi; (2) Replication stage 1, Beijing; (3) Replication stage 2, Jiangsu; (4) Replication stage 3, Guangdong; (5) Replication stage 3, Shanghai; (6) Pooled replication stages 1, 2 and 3; (7) Pooled GWAS and replication stages.

Sex: Female
  n: 45 / 49; 39 / 41; 106 / 40; 99 / 81; 51 / 57; 295 / 219; 340 / 268
  AA: 28 (62.2) / 28 (57.1); 27 (69.2) / 20 (48.8); 52 (49.1) / 15 (37.5); 59 (59.6) / 49 (60.5); 27 (52.9) / 29 (50.9); 165 (55.9) / 113 (51.6); 193 (56.8) / 141 (52.6)
  AG: 15 (33.3) / 18 (36.7); 12 (30.8) / 18 (43.9); 49 (46.2) / 21 (52.5); 33 (33.3) / 24 (29.6); 20 (39.2) / 22 (38.6); 114 (38.6) / 85 (38.8); 129 (37.9) / 103 (38.4)
  GG: 2 (4.4) / 3 (6.1); 0 (0.0) / 3 (7.3); 5 (4.7) / 4 (10.0); 7 (7.1) / 8 (9.9); 4 (7.8) / 6 (10.5); 16 (5.4) / 21 (9.6); 18 (5.3) / 24 (9.0)
  OR (95% CI): 0.86 ( ); 0.39 ( ); 0.63 ( ); 0.96 ( ); 0.99 ( ); 0.80 ( ); 0.81 ( )
  P value

Sex: Male
  n: 303 / 310; 237 / 225; 401 / 175; 652 / 428; 377 / 383; 1667 / 1211; 1970 / 1521
  AA: 212 (70.0) / 164 (52.9); 158 (66.7) / 113 (50.2); 248 (61.8) / 78 (44.6); 438 (67.2) / 230 (53.7); 248 (65.8) / 210 (54.8); 1092 (65.5) / 631 (52.1); 1304 (66.2) / 795 (52.3)
  AG: 85 (28.1) / 123 (39.7); 74 (31.2) / 91 (40.4); 132 (32.9) / 80 (45.7); 195 (29.9) / 171 (40.0); 121 (32.1) / 147 (38.4); 522 (31.3) / 489 (40.4); 607 (30.8) / 612 (40.2)
  GG: 6 (2.0) / 23 (7.4); 5 (2.1) / 21 (9.3); 21 (5.2) / 17 (9.7); 19 (2.9) / 27 (6.3); 8 (2.1) / 26 (6.8); 53 (3.2) / 91 (7.5); 59 (3.0) / 114 (7.5)
  OR (95% CI): 0.50 ( ); 0.51 ( ); 0.57 ( ); 0.60 ( ); 0.62 ( ); 0.60 ( ); 0.59 ( )
  P value
  P heterogeneity

Age: ≤ 49
  n: 227 / 267; 73 / 73; 189 / 84; 392 / 278; 193 / 193; 847 / 628; 1074 / 895
  AA: 161 (70.9) / 144 (53.9); 54 (74.0) / 32 (43.8); 105 (55.6) / 35 (41.7); 265 (67.6) / 152 (54.7); 130 (67.4) / 102 (52.8); 554 (65.4) / 321 (51.1); 715 (66.6) / 465 (52.0)
  AG: 61 (26.9) / 104 (39.0); 18 (24.7) / 31 (42.5); 77 (40.7) / 38 (45.2); 109 (27.8) / 103 (37.1); 60 (31.1) / 75 (38.9); 264 (31.2) / 247 (39.3); 325 (30.3) / 351 (39.2)
  GG: 5 (2.2) / 19 (7.1); 1 (1.4) / 10 (13.7); 7 (3.7) / 11 (13.1); 18 (4.6) / 23 (8.3); 3 (1.6) / 16 (8.3); 29 (3.4) / 60 (9.6); 34 (3.2) / 79 (8.8)
  OR (95% CI): 0.51 ( ); 0.20 ( ); 0.54 ( ); 0.65 ( ); 0.54 ( ); 0.57 ( ); 0.56 ( )
  P value

Age: > 49
  n: 121 / 92; 203 / 193; 318 / 131; 359 / 231; 235 / 247; 1115 / 802; 1236 / 894
  AA: 79 (65.3) / 48 (52.2); 131 (64.5) / 101 (52.3); 195 (61.3) / 58 (44.3); 232 (64.6) / 127 (55.0); 145 (61.7) / 137 (55.5); 703 (63.0) / 423 (52.7); 782 (63.3) / 471 (52.7)
  AG: 39 (32.2) / 37 (40.2); 68 (33.5) / 78 (40.4); 104 (32.7) / 63 (48.1); 119 (33.1) / 92 (39.8); 81 (34.5) / 94 (38.1); 372 (33.4) / 327 (40.8); 411 (33.3) / 364 (40.7)
  GG: 3 (2.5) / 7 (7.6); 4 (2.0) / 14 (7.3); 19 (6.0) / 10 (7.6); 8 (2.2) / 12 (5.2); 9 (3.8) / 16 (6.5); 40 (3.6) / 52 (6.5); 43 (3.5) / 59 (6.6)
  OR (95% CI): 0.58 ( ); 0.58 ( ); 0.62 ( ); 0.67 ( ); 0.80 ( ); 0.68 ( ); 0.67 ( )
  P value
  P heterogeneity

OR, odds ratio; CI, confidence interval. P values, ORs and 95% CIs were calculated under an additive model by logistic regression while adjusting for age, sex, smoking and drinking status, pack-years of smoking, family history of HCC, and population, where appropriate. P heterogeneity was calculated to compare the ORs between the strata of sex (female and male) or age (≤ 49 and > 49).

Supplementary Table 10: Association results for the rs and rs in the GWAS stage (Guangxi population).

Columns: Polymorphisms and genotypes; Cases, n (%) (n = 348); Controls, n (%) (n = 359); OR (95% CI); P value.

rs
  TT: 189 (54.3) 159 (44.3) 0.81 ( )
  TC: 125 (35.9) 170 (47.4)
  CC: 34 (9.8) 30 (8.4)
rs
  GG: 221 (63.5) 237 (66.0) 1.07 ( ) 0.63
  GA: 114 (32.8) 111 (30.9)
  AA: 13 (3.7) 11 (3.1)

OR, odds ratio; CI, confidence interval. P values, ORs and 95% CIs were calculated under an additive model by logistic regression while adjusting for age, sex, smoking and drinking status, pack-years of smoking, and family history of HCC. rs is present on the Affymetrix SNP Array 5.0 and was therefore directly genotyped in the GWAS stage. The genotypes of rs were imputed in our GWAS samples with HapMap CHB data (Release 22) as the reference panel. The average genotype probability was > 98%.

Supplementary Table 11: Correlation between protein expression levels of UBE4B, KIF1B, and PGD and rs genotypes in HCC tissues and paired adjacent non-tumor tissues by IHC.

Columns: Proteins; Genotypes; Expression levels, n (Negative, Low, High); P value (AG + GG vs. AA); P value (Tumor vs. Adjacent).

UBE4B
  Tumor: AA AG GG
  Adjacent: 0.83 AA AG GG
KIF1Bα
  Tumor: AA AG GG
  Adjacent: 0.16 AA AG GG
Total KIF1B isoforms
  Tumor: AA AG GG
  Adjacent: AA AG GG
PGD
  Tumor: AA AG GG
  Adjacent: 0.16 AA AG GG

IHC, immunohistochemical. Total KIF1B isoforms denote all KIF1B isoforms, including KIF1Bα and KIF1Bβ. Expression levels were classified into three groups (negative, low, and high expression) according to the scores of the immunohistochemistry signals. Differences in protein levels between the tumors and paired adjacent non-tumor tissues were assessed by a Wilcoxon test. Differences in protein levels between the rs AA and AG + GG genotype carriers were assessed by a trend χ² test.

Supplementary Table 12: The allele and genotype frequencies of rs in different populations.

Columns: Populations; Sample size; Allele, n (%) (A, G); Genotype, n (%) (AA, AG, GG).

In the present study
  Chronic HBV carriers with HCC: A (80.7); G 890 (19.3); AA 1497 (64.8); AG 736 (31.9); GG 77 (3.3)
  Chronic HBV carriers without HCC: A (72.3); G 991 (27.7); AA 936 (52.3); AG 715 (40.0); GG 138 (7.7)
  Non-HBV carrier controls: A (68.6); G 116 (31.4); AA 89 (48.1); AG 76 (41.1); GG 20 (10.8)
  Random controls: A (72.1); G 541 (27.9); AA 504 (52.0); AG 389 (40.1); GG 76 (7.9)
In HapMap
  ASW: A (95.3); G 5 (4.7); AA 48 (90.6); AG 5 (9.4); GG 0 (0.0)
  YRI: A (95.6); G 10 (4.4); AA 103 (91.2); AG 10 (8.8); GG 0 (0.0)
  LWK: A (90.0); G 18 (10.0); AA 73 (81.1); AG 16 (17.8); GG 1 (1.1)
  MKK: A (85.3); G 42 (14.7); AA 106 (74.1); AG 32 (22.4); GG 5 (3.5)
  CHB: A (72.0); G 47 (28.0); AA 44 (52.4); AG 33 (39.3); GG 7 (8.3)
  CHD: A (67.6); G 55 (32.4); AA 37 (43.5); AG 41 (48.2); GG 7 (8.2)
  GIH: A (71.6); G 50 (28.4); AA 45 (51.1); AG 36 (40.9); GG 7 (8.0)
  JPT: A (71.8); G 48 (28.2); AA 46 (54.1); AG 30 (35.3); GG 9 (10.6)
  CEU: A (68.1); G 72 (31.9); AA 53 (46.9); AG 48 (42.5); GG 12 (10.6)
  TSI: A (67.0); G 58 (33.0); AA 42 (47.7); AG 34 (38.6); GG 12 (13.6)
  MEX: A (64.0); G 36 (36.0); AA 21 (42.0); AG 22 (44.0); GG 7 (14.0)

ASW, African ancestry in southwest USA; YRI, Yoruba in Ibadan, Nigeria; LWK, Luhya in Webuye, Kenya; MKK, Maasai in Kinyawa, Kenya; CHB, Chinese Han in Beijing, China; CHD, Chinese in Metropolitan Denver, Colorado; GIH, Gujarati Indians in Houston, Texas; JPT, Japanese in Tokyo, Japan; CEU, Utah residents with northern and western European ancestry from the CEPH collection; TSI, Toscani in Italia; MEX, Mexican ancestry in Los Angeles, California. Non-HBV carrier controls are subjects negative for both hepatitis B surface antigen and antibody immunoglobulin G to hepatitis B core antigen. Random controls are subjects with no information on HBV infection.

Supplementary Table 13: Recent GWA studies in liver diseases or liver traits.

Columns: Diseases/traits; Genes; SNPs; OR (95% CI) or Beta coefficient (standard error); P; Reference; In our GWAS data: OR (95% CI), P.

Chronic hepatitis B
  HLA-DPA1 rs ( ) 1.3E-13
  HLA-DPB1 rs ( ) 1.8E-12
Non-alcoholic fatty liver disease
  PNPLA3 rs e-10
Response to hepatitis C treatment
  IL28B rs ( ) 1.1E-25
  IL28B rs ( ) 2.8E-27
  IL28B rs ( ) 2.7E-32
Drug-induced liver injury due to Flucloxacillin
  HLA-B rs ( ) 8.7E-33
Liver-enzyme levels
  ALT CPN1 rs (0.008) 2.9E-08
  ALT CHUK rs (0.007) 3.6E-07
  ALT CHUK rs (0.007) 4.5E-07
  ALT PNPLA3 rs (0.010) 8.2E-12
  ALT SAMM50 rs (0.009) 9.4E-07
  GGT HNF1A rs (0.001) 3.2E-08
  GGT GGT1 rs (0.001) 3.9E-06
  ALP NBPF3-ALPL rs (0.005) 1.4E-10
  ALP GPLD1 rs (0.006) 4.4E-08
  ALP GPLD1 rs (0.006) 2.2E-08
  ALP ABO rs (0.005) 1.4E-08
  ALP ABO rs (0.006) 6.8E-10
  ALP ABO rs (0.006) 6.2E-10
  ALP ABO rs (0.006) 1.4E-08
  ALP ABO rs (0.005) 4.6E-29
  ALP ABO rs (0.006) 3.4E-07
  ALP ABO rs (0.006) 3.1E-08
  ALP JMJD1C rs (0.005) 4.7E-07
  ALP REEP3 rs (0.005) 3.9E-07

Reference: 1 1 2,3 4 5,6 5,
In our GWAS data, OR (95% CI) and P: 1.27 ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 0.91

OR, odds ratio; CI, confidence interval; -, not available.

26 Supplementary Table 14: Primers and probes used for genotyping assays. SNPs Primers Sequences SNPstream a rs Forward 5'-AATAAATATTGCTGCACAATTACCA-3' Reverse 5'-TATTTAACTTCTCTGAGCCTCAGTTT-3' Extension 5'-ATCTAACGCACCTACGACCTCATTGTCTTATTCCAACAAAAACCC-3' rs NA NA rs Forward 5'-CTTCAGCCTCATTTTTTTTTTATAC-3' Reverse 5'-TTCCCTGCTTTGAAAATTTG-3' Extension 5'-GTGATTCTGTACGTGTCGCCAAACACATAGTGCCTCTATGAGTCC-3' rs Forward 5'-AAATTGCTTATTTTGCTTCACATC-3' Reverse 5'-ATGGTTACATGTTACATCAGGTAGC-3' Extension 5'-GGCTATGATTCGCAATGCTTGAGTGAGTTGTTGAAAGCTGATTTC-3' rs Forward 5'-GAGTTAATTTACTGTGCCTGCAC-3' Reverse 5'-GTCCTGGGTCCGTGGGAT-3' Extension 5'-CTCACTATCTGACAAGCCACGTTGTCTGGGCCCGAACAAGTTGAA-3' rs Forward 5'-ATTTTAACCTATGTGTCCCTTGG-3' Reverse 5'-AGAGGTTATACTCATTCTCTATAATTTTGC-3' Extension 5'-CAGCACTATTACCATCACGTTATAATTTTGCAATACATTGAATTC-3' rs Forward 5'-CTCCTTCCTTCCTAAGCTCC-3' Reverse 5'-AGTGTGTATAGACCAGTCCAAAGG-3' Extension 5'-CAACAATACGAGCCAGCAAGTTTCTTGTGTTCTAAGGTTTTCCAT-3' rs Forward 5'-AACGAACACATCCAGAGCTC-3' Reverse 5'-AGGTGTTGGAAAACCTTAATAAAGA-3' Extension 5'-AACATACAGACGCACTCCTCATAAAGACTGCTAGGCAGAAATGCT-3' rs Forward 5'-TGAGGTAAGATGCAGATCCC-3' Reverse 5'-CCACCAATTACCTGCTGC-3' Extension 5'-CAAGCAACGACCTACTACAACCGGCCGAAAGCCCGTGTTCTTCTC-3' rs Forward 5'-TTGTGCTTTTATAAAGTAGTTCCAATT-3' Reverse 5'-TAGTTGAGGAATATTATATTTACCTTGTAGG-3' Extension 5'-CAACAAGTAATCCGCAGACTTTTACCTTGTAGGAGTTTCCGTAGA-3' rs Forward 5'-ATTCAGGCCATGCACTGG-3' Reverse 5'-ACACACCCCAGCTTTCCT-3' Extension 5'-CAAGACCGCAACTAGATACAAAGAAGAGGTGAAGTAGAGTCCAGG-3' rs Forward 5'-TCCTAAGCACACTGCATGC-3' Reverse 5'-TTCCTCACATTCATTTCCTTC-3' Extension 5'-CGATCACCTCACTAGAACAACATTCACTCATTCATTCAATCAACA-3' rs Forward 5'-ATGAGCCAGGACCCTCAC-3 Reverse 5'-TGAGAAGGAGTTCGGAGGA-3 Extension 5'-CGACTGTAGGTGCGTAACTCCACAGCTGCCCCCTTTGTCTTTCAA-3 rs Forward 5'-ATTGTATGATGATCAACACTAAGATCA-3 Reverse 5'-TGAGTTTTAATTCTTTGGAAGTCAA-3 Extension 5'-CGTGCCGCTCGTGATAGAATAACTAATTGAGTTCACAGGCTTAAA-3 rs Forward 5'-TAATGAGGATTCACTGAGGCTG-3 Reverse 
5'-TGTCCAGGGCAGACCCAC-3 Extension 5'-AGAGCGAGTGACGCATACTACAAATGAAGTTTGCATTCCACAAGC-3 rs Forward 5'-TTAGGAACTTGCCTCTTGTCA-3 Reverse 5'-AATTAGGTCCCTGTTCTAAGCAG-3 Extension 5'-GGATGGCGTTCCGTCCTATTGGCTTTGCAGCTGTCTCCCTAGCAG-3 rs Forward 5'-CACAAATGACTCCAAGCTTAATAGT-3 Reverse 5'-ATTGAGTCTTTCCAATGAGGC-3 Extension 5'-AGGGTCTCTACGCTGACGATCAGGAATTCAGAAAGAGCAAATCCA-3 rs Forward 5'-AATTTTACTCATATTCCTTCAAAGAGAA-3 Reverse 5'-TAAATTTACTTCTTTGTCATAAACTTGTAGC-3 Extension 5'-GGATGGCGTTCCGTCCTATTTTCAAACAGAAGTAATGAGGCTTGC-3 rs Forward 5'-ATCTGTGATTCCAATTGTATTATGTTAC-3 Reverse 5'-AAATGGAAAACCAAAGAGATAAGAG-3

27 Extension 5'-AGCGATCTGCGAGACCGTATTACTTGAGATCGTATTGTAATCCCT-3 rs Forward 5'-TAATCTCCTTTATAATAAACCGGTAAA-3 Reverse 5'-GAGTTCCCACAACCCTCTC-3 Extension 5'-GGCTATGATTCGCAATGCTTGTGTTTCCCTGAGGTCTGTGAACCA-3 rs Forward 5'-CTGTGTATAAATACCATTTTATTAGGA-3 Reverse 5'-TAACTCTTCCCCTTAGTCTCTGC-3 Extension 5'-ACGCACGTCCACGGTGATTTAATGTTTTGGTTTAGGTCTCAGAAG-3 rs Forward 5'-AATGAACAATACGTGTACAACTTGG-3 Reverse 5'-TTCTACTCATGATTGCAACAAGG-3 Extension 5'-AGATAGAGTCGATGCCAGCTTTGTCTCTATTCTGGTCTACAGTAC-3 rs Forward 5'-TTCAAAAGGCACAAATGAGG-3 Reverse 5'-TGTTTTCAGCAAGAATTCACAC-3 Extension 5'-AGCGATCTGCGAGACCGTATCCAGTGTAACTAGATTCTTATCTTG-3 rs Forward 5'-TTTCAAGGACAGAACCTAGGG-3 Reverse 5'-TTTGGCCTTTCTCAAACCA-3 Extension 5'-GACCTGGGTGTCGATACCTAAGGTATGGAATTCCGAGGTCCTGGG-3 rs Forward 5'-TAAAATACTTAGTGTCCTTGGAAGAGTAT-3 Reverse 5'-CTAAATGTAACCCAAACCTTTCAT-3 Extension 5'-AGGGTCTCTACGCTGACGATTTTCTCTTATGTTTTCTCTGACTTC-3 rs Forward 5'-TTAACACTGCTCAGCAGATAGC-3 Reverse 5'-CTTTTTAATTTTATAAATGTCGGGAG-3 Extension 5'-CGTGCCGCTCGTGATAGAATGCTTCCTTATGTGTGTTAAAGCAAA-3 rs Forward 5'-ATACCTCTCTTACTCTGGTTGGC-3 Reverse 5'-TAGAGGAAGGTTGTACAGGTCTG-3 Extension 5'-GTGATTCTGTACGTGTCGCCGCCAGGCATGTGCTCATGCTTCCCA-3 rs Forward 5'-AAACTGAAAGTTTAAGTAATCTGCC-3 Reverse 5'-AATGTGTCTCTCTTTTCAGACTAAATC-3 Extension 5'-GTGATTCTGTACGTGTCGCCCTAGGGCTCTGAAGCACAGTTTTTA-3 rs Forward 5'-CCTATAAAGAGGCGGAAGAG-3 Reverse 5'-TCTTGATTGAGATTTGCATGAA-3 Extension 5'-ACGCACGTCCACGGTGATTTAGTGACACTCTAAGTGCTACGCTGA-3 rs Forward 5'-TTGGAGTTTGAAAAACATTGTATTC-3 Reverse 5'-TATTTGTTGGAGAAGTCAATTAATCA-3 Extension 5'-CGACTGTAGGTGCGTAACTCACCAGCTAAACAAATGTACCAGACT-3 rs Forward 5'-TCCATCATCCATAATCTTCTATCAT-3 Reverse 5'-ATCATGGTGTTTTTTGTTGAGTG-3 Extension 5'-GACCTGGGTGTCGATACCTAAGACTGTCAGTTGATGAATTATGTG-3 rs Forward 5'-CCGAGAGGAAGGTGCTGA-3 Reverse 5'-AGAAGAGCTTGAAGTTGTTCAGA-3 Extension 5'-CGACTGTAGGTGCGTAACTCCTCACCCCAGGGTGTCACTGAAGAT-3 rs Forward 5'-ATATATCTGTGTCTATATGTGTATGTGTATATCTG-3 Reverse 5'-CACATGCACTACATACATACCCTC-3 Extension 
5'-GGCTATGATTCGCAATGCTTCCTGTGTGTGTGCATGTGTGTGTAT-3 rs Forward 5'-AAATTTTGTGCTCATTCCACTATT-3 Reverse 5'-CTGCTCTGTGGCTTGGAG-3 Extension 5'-CGTGCCGCTCGTGATAGAATATCTCCCTGCTCTTTTCATCTTAAC-3 rs Forward 5'-TATTATATCCCTTTAACCTGGTGG-3 Reverse 5'-TATTGAGATCCCAGAAGACAGAG-3 Extension 5'-GTGATTCTGTACGTGTCGCCGAAAGTCTAAGACTACTCTATAAGT-3 rs Forward 5'-TGGATTCTTGGTCGTTCTGT-3 Reverse 5'-AGTTGGGGCATCTTTTAAGG-3 Extension 5'-AGAGCGAGTGACGCATACTATGGTTCTGGAAAGTGTCTGACAGCA-3 rs Forward 5'-AGTTTCTTGTTAGAGCATCCCTC-3 Reverse 5'-AAAGTGATGGAGTTAACCAGGA-3 Extension 5'-GGCTATGATTCGCAATGCTTAGCAATGCTCTATAAGACACAGGGT-3 rs Forward 5'-AAAAGGGCTTTGTTCCTAGG-3 Reverse 5'-ATGCTCTCTCTGTGCCTGG-3 Extension 5'-AGATAGAGTCGATGCCAGCTAATGCCCCTCACTCCCACTCAAGTA-3

rs   Forward    5'-GTACAAAATAGTGTGTATAAATATGATGTCAT-3'
     Reverse    5'-TCTTTTCAGCCTGTCCGG-3'
     Extension  5'-GGATGGCGTTCCGTCCTATTGACAATCAAGAGACTAGTAACAGTG-3'
rs   Forward    5'-CTTTAAAAATCTGCTAGATTTCATTTC-3'
     Reverse    5'-TAGGGATTTTAACCAAATTTTAGGA-3'
     Extension  5'-AGCGATCTGCGAGACCGTATCCACTGTGTTTGTGAGCCATGCAAA-3'
rs   Forward    5'-TAATATCATGAGTAGCTTTAACACATAATTG-3'
     Reverse    5'-ATCAGCCACAATCCAGTTAAAA-3'
     Extension  5'-AGGGTCTCTACGCTGACGATCCAGTTTCTGTTCACTCTCACAATG-3'
rs   Forward    5'-TTTTCAGACAAGTCAGCAGG-3'
     Reverse    5'-ACAAGTCTCTTTCCACCTCTTC-3'
     Extension  5'-ACGCACGTCCACGGTGATTTCTTGAATTCTGGTGCTCAAGAATAA-3'
rs   Forward    5'-TGTATACTGACATAGAAGGCTCTTCA-3'
     Reverse    5'-CAAAACCCCCTAGCTTAATATTTT-3'
     Extension  5'-GCGGTAGGTTCCCGACATATGACAGAATTCCAGTTTGTTAAAATG-3'
rs   Forward    5'-TTACACTATGGTTATAAAAAGGATGAGC-3'
     Reverse    5'-AATTTTTGAATTATCCTTAACTTCCC-3'
     Extension  5'-GCGGTAGGTTCCCGACATATATCTTAGGGTACCTCTCTGATGTAG-3'
rs   Forward    5'-AAAGATTAAAAAGAAGGAAGAAAAAGTC-3'
     Reverse    5'-ATATCTTAAACAGGCAATAGAGGGA-3'
     Extension  5'-GCGGTAGGTTCCCGACATATAAACAAATTTATCCATTGTGACATC-3'

TaqMan b
rs   Forward    5'-TTGGGTGGGTGGAAAGAAATC-3'
     Reverse    5'-GTACCACTATTGTAGATAATTTCCTTCCAAA-3'
     Probe1     5'(FAM)-TACTGACCATACTGCC-3'
     Probe2     5'(HEX)-TGTACTGACAATACTGC-3'
rs   Forward    5'-TGCTTATTTTGCTTCACATCTTTTG-3'
     Reverse    5'-GAGGTCTATCCTGGGGCTTTG-3'
     Probe1     5'(FAM)-CTGATTTCCGAATTC-3'
     Probe2     5'(HEX)-TGATTTCTGAATTCCT-3'
rs   Forward    5'-TTTCCAGCACTTAATGAAAACACATAG-3'
     Reverse    5'-CAAAGTTAAATTTCCCTGCTTTGAA-3'
     Probe1     5'(FAM)-TATGAGTCCATATTGAGTC-3'
     Probe2     5'(HEX)-TATGAGTCCGTATTGAGT-3'
rs   Forward    5'-GAAAACGGAAATGCTTTCAAACA-3'
     Reverse    5'-ACAAATAGTAGCAAAGCCTAGTAAGCAA-3'
     Probe1     5'(FAM)-CTTGCCTAATCTTAAC-3'
     Probe2     5'(HEX)-CTTGCGTAATCTTAAC-3'
rs   Forward    5'-GCTGTCTGTAGATTTACTGGCTGTGT-3'
     Reverse    5'-TTTTAACTCTTCCCCTTAGTCTCTGC-3'
     Probe1     5'(FAM)-TCTCAGAAGGTCCCATGA-3'
     Probe2     5'(HEX)-TCTCAGAAGTTCCCATGAA-3'
rs   Forward    5'-TAACCAGGAAAGAAAAAGGGACAT-3'
     Reverse    5'-TCACTACAGAGAGGAAAAAAGCAATG-3'
     Probe1     5'(FAM)-CAGAGCTATACCCTGTG-3'
     Probe2     5'(HEX)-CAGAGCTACACCCTGT-3'
rs   Forward    5'-GGGCAGCTTACTTGAACTGACA-3'
     Reverse    5'-TTGGCAACCACCTGAAACTG-3'
     Probe1     5'(FAM)-ATATCTGCATACTCTG-3'
     Probe2     5'(HEX)-ATCTGCACACTCTG-3'

Direct DNA sequencing c
rs   Forward    5'-TTCAAAATGTGCATGTTGGTAT-3'
     Reverse    5'-ATGTCCCCTTTTCTCAGTGC-3'

a Primers and probes used for the 12-plex SNPstream genotyping assays. NA, not applicable. Appropriate primers and a probe could not be designed for rs for the SNPstream genotyping assay. For rs, rs and rs602890, primers and probes were designed successfully, but genotyping failed in the 12-plex SNPstream genotyping assays.

b Primers and MGB probes used for the TaqMan genotyping assays. PCR was performed with an initial 2 min at 50 °C and 10 min at 95 °C, followed by 40 cycles of 15 sec at 95 °C and 1 min at 60 °C.

c Primers used for direct DNA sequencing. The primers were used to amplify and sequence the target region (529 bp) containing rs. PCR conditions were as follows: 38 cycles of denaturation at 95 °C, annealing at 55 °C, and primer extension at 72 °C, each step for 30 seconds.

Supplementary Table 15: Primers used for the quantitative real-time PCR assays.

Genes    Primers  Sequences
KIF1Bβ   Forward  5'-CGGTTCCACTGGTTCAAACT-3'
         Reverse  5'-CTCATCTTGCTGCTCGTCAG-3'
GAPDH    Forward  5'-CCAGAACATCATCCCTGC-3'
         Reverse  5'-GGAAGGCCATGCCAGTGAGC-3'

The following conditions were applied: denaturation for 1.5 min at 95 °C; 45 cycles of 10 sec at 95 °C, 20 sec at 60 °C, and 20 sec at 72 °C.

Supplementary Figure 1: An overview of the study workflow. Numbers refer to cases and controls, family trios, and SNPs genotyped and analyzed. The top 45 significantly associated SNPs in the GWAS stage were genotyped in replication stage 1. Any SNP confirmed there (P < 0.05, with the same direction of effect as in the GWAS stage) was then genotyped in the replication stage 2, 3 and 4 samples. In the end, only rs was confirmed in all six sample sets. A meta-analysis combining all the case-control and TDT studies was performed for rs.

Supplementary Figure 2: Plots of the first five components from the multidimensional scaling analysis (MDS, PLINK) using the 707 participants in the GWAS. The red points are cases and the green points are controls. (a) Plot of the first and second components; (b) plot of the first and third components; (c) plot of the first and fourth components; and (d) plot of the first and fifth components.

Supplementary Figure 3: Summary of genome-wide association results. The genome-wide P values of the additive model test from 294,566 SNPs in 348 HBV-related HCC cases and 359 controls are presented by chromosome. The x-axis represents genomic position, and the y-axis shows -log10(P). Within each chromosome shown on the x-axis, the data are plotted from the p-ter end. The inset shows quantile-quantile plots of the observed P values (obtained in the analysis of the GWAS, in red) versus the expected P values (under the null hypothesis). The solid dark line represents the null hypothesis of no true association. The dashed dark line with gradient λ (inflation coefficient) is fitted to the lower 90% of the distribution of observed test statistics. The plot is based on the entire set of 294,566 SNPs that passed quality control.
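The gradient λ described in this caption can be illustrated with a short sketch. It assumes 1-df chi-square test statistics and a least-squares line through the origin fitted to the lowest 90% of the sorted statistics; the function names and the through-origin fit are illustrative assumptions, not the authors' documented procedure.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_1df_quantile(q):
    """Quantile of the 1-df chi-square distribution at probability q (0 <= q < 1)."""
    # A 1-df chi-square variable is the square of a standard normal variable.
    return _STD_NORMAL.inv_cdf((1.0 + q) / 2.0) ** 2


def p_to_chi2(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    return chi2_1df_quantile(1.0 - p)


def inflation_lambda(chi2_stats, lower_fraction=0.9):
    """Slope of observed vs. expected chi-square quantiles over the lower fraction.

    Assumption: the line is fitted by least squares through the origin.
    """
    observed = sorted(chi2_stats)
    n = len(observed)
    # Expected quantiles under the null, using mid-rank plotting positions.
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(n)]
    k = max(1, int(n * lower_fraction))
    numerator = sum(o * e for o, e in zip(observed[:k], expected[:k]))
    denominator = sum(e * e for e in expected[:k])
    return numerator / denominator
```

A value of λ close to 1 suggests little inflation of the test statistics; values well above 1 would point to population stratification or other systematic bias.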

Supplementary Figure 4: Forest plot for rs across all six studies. We plot the odds ratio (OR) (blue square) and the 95% CI (horizontal blue line) for each study. A vertical dashed dark line indicates the final OR across all six studies. The top six bars represent data from six studies and the blue diamond below them summarizes their meta-analyzed effect. The area of each square is proportional to the weight of each study in the meta-analysis. Overall, the meta-analysis gave a joint P value of (joint OR = 0.61, 95% CI = ). P heterogeneity = 0.60.
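How study weights, a pooled OR and a heterogeneity statistic of the kind shown in this forest plot can be obtained is sketched below. The sketch assumes a fixed-effect, inverse-variance model on the log-OR scale; the caption does not state which model was used, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def se_from_ci(lower, upper, z=Z_95):
    """Recover the standard error of log(OR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def fixed_effect_meta(odds_ratios, standard_errors):
    """Inverse-variance fixed-effect pooling of per-study odds ratios.

    Assumption: standard_errors are on the log-OR scale.
    """
    log_ors = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * pooled_se),
               math.exp(pooled + Z_95 * pooled_se)),
        "q": q,
        # Relative weights, which would set the area of each square.
        "weights": [w / total for w in weights],
    }
```

Under this model, a study with a narrower confidence interval receives a larger weight and therefore a larger square in the plot.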

Supplementary Figure 5: The association between rs genotypes and age at diagnosis. The vertical bars indicate the standard deviation (SD). The risk genotype was associated with a 1.1-year younger age at diagnosis (AA vs. AG + GG, P = 0.020, t test).

Supplementary Figure 6: Regional plots for associations in the region surrounding rs in the GWAS stage. All SNPs in the GWAS stage are plotted with their P values (shown as -log10 values) for the additive model test as a function of genomic position (NCBI Build 36). rs is indicated by a large blue diamond with its P value in the GWAS stage. Other data points are shaded with increasing red intensity to indicate LD with rs: red signifies r² ≥ 0.8, orange 0.5 ≤ r² < 0.8, yellow 0.2 ≤ r² < 0.5, and white r² < 0.2. Recombination rates, which were estimated from phased haplotypes in HapMap Release 22 (NCBI Build 36), were downloaded from the HapMap website and plotted in light blue. Genomic locations of genes on the NCBI Build 36 human assembly were adapted from the University of California at Santa Cruz Genome Browser. rs is located in intron 13 of PADI6. The LD structure for a ~250 kb region surrounding rs in the Guangxi case-control population (707 individuals) is shown. Pairwise LD (measured by r²) between SNPs is indicated by the redness of individual spots in the triangular graphic; the most intense red spots have r² = 1.

Supplementary Figure 7: Protein expression of UBE4B, KIF1B, and PGD by immunohistochemical staining in representative HCC tissues and paired adjacent non-tumor tissues. Panels a, b, e, f, i, j, m, and n: tumor tissues; panels c, d, g, h, k, l, o, and p: adjacent non-tumor tissues. Total KIF1B isoforms denote all KIF1B isoforms including KIF1Bα and KIF1Bβ. The scale bar represents 200 μm in panels a, c, e, g, i, k, m, and o, and 50 μm in panels b, d, f, h, j, l, n, and p.

Supplementary Figure 8: Quantitative real-time RT-PCR for KIF1Bβ. Expression of the KIF1Bβ isoform for the three different genotypes of rs was measured in RNA from EBV-transformed blood lymphocyte cell lines derived from 67 unrelated Chinese individuals: 46 individuals are AA, 17 are AG and 4 are GG. Normalization for mRNA quantity was performed with human GAPDH control primers for each sample, and final abundance figures were adjusted to yield an arbitrary value of 1 for rs AA carriers using the Ct method. The vertical bars indicate the standard deviation (SD). Regressing the relative expression on the number of minor alleles (rs [G]) an individual carries, with adjustment for age and sex, we found that the expression of KIF1Bβ was increased by an estimated 30.5% with each G allele carried (P = ). Further, compared to the AA carriers, the G allele carriers had markedly elevated KIF1Bβ transcription (P = ; t test).
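The normalization described in this caption can be sketched as follows. The sketch assumes the comparative Ct (ΔΔCt) approach with roughly 100% amplification efficiency, which the caption does not state explicitly; the function name and the input layout are hypothetical.

```python
from statistics import mean


def relative_expression(samples, reference_genotype="AA"):
    """Scale GAPDH-normalized expression so the reference group averages 1.

    samples: list of (genotype, ct_target, ct_gapdh) tuples.
    Assumption: one PCR cycle corresponds to a twofold change in template.
    """
    # Normalize each sample to GAPDH: 2^-(Ct_target - Ct_GAPDH).
    normalized = [(g, 2.0 ** -(ct_t - ct_g)) for g, ct_t, ct_g in samples]
    # Calibrate against the mean of the reference genotype group.
    reference_mean = mean(v for g, v in normalized if g == reference_genotype)
    return [(g, v / reference_mean) for g, v in normalized]
```

With this scaling, a sample whose target Ct is one cycle lower than that of the AA group, at the same GAPDH Ct, has a relative expression of 2.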

Supplementary Figure 9: Power to detect genetic effects of various sizes (OR = 0.25, 0.50, 0.53, 0.60, or 0.75) versus study sample size. Power is reported here as the probability that a SNP is identified in a scan. Vertical and horizontal dashed lines show the power of our GWAS stage to identify rs, mapping to 1p36.22: given an HBV-related HCC prevalence of 0.5%, 348 cases and 359 controls, an OR of 0.53, a minor (G) allele frequency of 21.9% and a P value of , the power was estimated to be 45%.
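A rough sketch of how such a power curve can be approximated is given below. It uses a normal approximation to a two-sided allelic test and treats the stated minor allele frequency as the control frequency; it ignores disease prevalence, the significance threshold `alpha` is a placeholder, and the function name is hypothetical, so the output is not expected to reproduce the 45% reported in the caption.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def allelic_power(n_cases, n_controls, control_maf, odds_ratio, alpha):
    """Approximate power of a two-sided allelic case-control test.

    Assumptions: control_maf is the allele frequency in controls, each
    subject contributes two independent alleles, and the unpooled
    standard error is used under both hypotheses.
    """
    p0 = control_maf
    case_odds = odds_ratio * p0 / (1.0 - p0)
    p1 = case_odds / (1.0 + case_odds)  # implied allele frequency in cases
    se = math.sqrt(p1 * (1.0 - p1) / (2 * n_cases)
                   + p0 * (1.0 - p0) / (2 * n_controls))
    shift = abs(p1 - p0) / se
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


# Example call with the sample sizes, OR and allele frequency from the
# caption; alpha = 1e-4 is a hypothetical threshold.
power = allelic_power(348, 359, 0.219, 0.53, alpha=1e-4)
```

Evaluating the function over a range of sample sizes and odds ratios would trace curves of the kind plotted in the figure.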

Supplementary Figure 10: Identification of an outlier by STRUCTURE and PLINK. (a) The distribution of the admixture vector for every participant in the GWAS stage (708 cases and controls) was determined by STRUCTURE. The internal controls were individuals sampled from the HapMap project (57 YRIs, 60 CEUs, 45 CHBs and 44 JPTs); (b) Plot of the first and second components from the multidimensional scaling analysis (MDS, PLINK) using the 708 participants in the GWAS. The red points are cases and the green points are controls. The dark circled point is the outlier (one control subject) identified by both STRUCTURE and PLINK, which was then excluded from analysis.