Nature Genetics: doi: /ng.3143

Supplementary Figure 1 Quantile-quantile plot of the association P values obtained in the discovery sample collection. The two clear outlying SNPs indicated for follow-up assessment are rs and rs.

Supplementary Figure 2 Principal-component analysis of the discovery sample collection comprising 432 patients with enteric fever and 2,011 controls from Vietnam. Shown here are plots of the first against the second principal component (top panel) and of the first against the third principal component (bottom panel). Patients with enteric fever are labeled as cases in red. Controls are labeled in yellow.

Supplementary Figure 3 Principal-component analysis of the Vietnamese enteric fever cases and controls in the context of Asian populations, some of which are from the Asian 1000 Genomes Project. CDX refers to Chinese Dai individuals from Xishuangbanna, China. CHB refers to Han Chinese in Beijing. CHD refers to Chinese individuals in metropolitan Denver, Colorado. CHS refers to Southern Han Chinese. JPT refers to Japanese individuals. KHV refers to Kinh Vietnamese from Ho Chi Minh City. SIMES refers to Singaporean Malays, and SINDI refers to South Indians in Singapore. The upper panel shows all populations analyzed, whereas the lower panel shows the ancestral matching of the enteric fever cases and controls with more clarity.

Supplementary Figure 4 Manhattan plot of the association P values obtained in the discovery sample collection. The horizontal red line denotes P = , the threshold for genome-wide significance, and the horizontal dotted blue line denotes P = . SNPs surpassing genome-wide significance in the discovery collection are labeled.

Supplementary Figure 5 Manhattan plot of the association P values obtained in the discovery sample collection after genome-wide imputation using the 1000 Genomes Project Asian reference panel.

Supplementary Figure 6 Genotyping clouds for SNP rs across different batches. Top, Illumina Human Exome BeadChip; bottom, Illumina 660W BeadChip. The genotypes for wild type, heterozygous carrier and homozygous minor allele are clearly distinguished.

Supplementary Figure 7 Genotyping clouds for SNP rs across different batches. Top, Salmonella cases, Illumina OmniExpress; bottom, controls, Illumina 660W BeadChip. The genotypes for wild type, heterozygous carrier and homozygous minor allele are clearly distinguished.

Supplementary information
Dunstan SJ and co-authors

Supplementary note

HLA and population stratification

We performed further analysis to ensure that the association observed between SNP rs and enteric fever is not due to cryptic population stratification between the enteric fever cases and controls. Explicit adjustment for the first 5 and the first 10 principal components using logistic regression did not change the magnitude of the association observed at rs. These data imply that the association between enteric fever and the rs SNP marker is genuine, rather than the result of subtle ethnic differences between the cases and controls (Supplementary Table 1).

Specificity of the association to enteric fever

We further examined the frequency of rs in our available Vietnamese population datasets generated from genetic studies of diseases other than enteric fever, namely dengue fever and primary angle closure glaucoma (refs. 16,17), as well as tuberculosis. We observed that the frequency of the minor allele at rs was consistently 5.5% in the dengue fever, tuberculosis and primary angle closure glaucoma collections (Supplementary Table 2). Among the 356 angle closure glaucoma patients there were 35 heterozygous carriers and two homozygotes for the minor allele of rs, whereas among the 432 patients with enteric fever there were only nine heterozygous carriers and no homozygotes. These data imply that the significant under-representation of the minor allele at rs in the 432 cases is confined to the individuals with enteric fever (Supplementary Table 2).
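The allele frequencies implied by the genotype counts quoted above can be checked with a short calculation. The following is a minimal sketch, assuming a biallelic SNP and the usual count-based estimate (each heterozygote carries one minor allele, each minor-allele homozygote carries two); the function name is illustrative and is not taken from the study.

```python
def minor_allele_frequency(n_het: int, n_hom_minor: int, n_total: int) -> float:
    """Minor allele frequency of a biallelic SNP from genotype counts."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    minor_alleles = n_het + 2 * n_hom_minor   # one per carrier, two per homozygote
    return minor_alleles / (2 * n_total)      # each individual has two alleles

# Counts quoted in the supplementary note.
pacg = minor_allele_frequency(n_het=35, n_hom_minor=2, n_total=356)
enteric = minor_allele_frequency(n_het=9, n_hom_minor=0, n_total=432)

print(f"PACG patients:        {pacg:.3f}")     # 0.055
print(f"Enteric fever cases:  {enteric:.3f}")  # 0.010
```

The glaucoma collection reproduces the 5.5% figure, whereas the 432 enteric fever cases carry the minor allele at roughly 1%.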

Supplementary Table 1 Association analysis for rs, unadjusted and adjusted for the principal components of genetic stratification using logistic regression (likelihood ratio) in the GWAS discovery sample collection.

CHR SNP Effect Allele Reference Allele PC-adjustment OR P
6 rs C A None x
rs C A None. Adjusted for gender x
rs C A First x
rs C A First x

Supplementary Table 2 Genotype counts and allele frequency distributions for rs in the two Vietnamese case-control collections for enteric fever, as well as Dengue shock syndrome, tuberculosis, and primary angle closure glaucoma (PACG). The marked under-representation of the minor C allele of rs appears to be specifically confined to cases with enteric fever.

Columns: Vietnam discovery, Salmonella enteric fever (Case, Control); Vietnam replication, Salmonella enteric fever (Case, Control); Vietnam, Dengue cases; Vietnam, Tuberculosis cases; Vietnam, PACG cases.
Rows: AA, AC, CC, Total.

Frequency refers to the frequency of the minor C allele. All patient and control collections are unique, with no overlap.

Supplementary Table 3 Association results in the replication attempt for rs and rs.

CHR SNP Gene A1 MAF cases MAF controls A2 P OR Collection
4 rs GUCY1A3 G A 1.82 x Discovery Vietnam
4 rs GUCY1A3 G A Replication Nepal
6 rs DQB1-DRB1 A G 4.50 x Discovery Vietnam
6 rs DQB1-DRB1 A G Replication Nepal

MAF cases: minor allele frequency in cases
MAF controls: minor allele frequency in controls
A1: minor allele
A2: major allele
P: P value for association with enteric fever

Supplementary Table 4 Association analysis between rs and enteric fever in Nepal, stratified by self-reported ethnic group.

SNP Ethnicity N cases N controls MAF cases MAF controls Direction
rs Newar Resistance
Brahmin Resistance
Chhetri Resistance
Lama Non-polymorphic
Magar Non-polymorphic
Rai Resistance
Tamang Resistance
Chaudhary Non-polymorphic
Gurung Non-polymorphic
Bishwakarma Non-polymorphic
Limbu Resistance
Pariyar Non-polymorphic
Thapa Non-polymorphic
Tharu Non-polymorphic
Total with clear self-reported ethnicity

Stratified Cochran-Mantel-Haenszel test for ethnic groups where rs was polymorphic (Newar, Brahmin, Chhetri, Rai, Tamang and Limbu; 284 cases and 295 controls): OR = 0.28, P = 9.5 × 10^-4
For all ethnic groups with clear self-reported ethnicity (363 cases and 337 controls): OR = 0.25, P = 2 × 10^-4
CMH: Cochran-Mantel-Haenszel
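For readers unfamiliar with the stratified Cochran-Mantel-Haenszel test reported in Supplementary Table 4, the sketch below shows how a pooled odds ratio and a 1-degree-of-freedom P value can be obtained from per-stratum allele counts. It is a textbook formulation without continuity correction, which is an assumption; the software and settings used by the authors are not stated here. The function name and the allele counts in the example are hypothetical and are not values from the table.

```python
import math

def cmh_test(strata):
    """Pooled Mantel-Haenszel odds ratio and CMH chi-square P value (1 d.f.).

    Each stratum is a tuple (a, b, c, d): minor and major allele counts
    in cases, followed by minor and major allele counts in controls.
    """
    num = den = 0.0    # numerator and denominator of the pooled odds ratio
    diff = var = 0.0   # summed (observed - expected) and summed variance of a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue   # a stratum this small carries no information
        num += a * d / n
        den += b * c / n
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if den == 0 or var == 0:
        raise ValueError("no informative strata")
    odds_ratio = num / den
    chi2 = diff * diff / var                     # no continuity correction (assumption)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square upper tail, 1 d.f.
    return odds_ratio, p_value

# Hypothetical allele counts for two ethnic strata (NOT values from the table).
example = [(5, 195, 20, 180), (3, 97, 10, 110)]
odds_ratio, p_value = cmh_test(example)
print(f"OR = {odds_ratio:.2f}, P = {p_value:.1e}")
```

In this formulation a stratum in which the SNP is non-polymorphic adds zero to every sum, so it does not change the pooled estimate.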

Supplementary Table 5 Stratified analysis for enteric fever cases infected with S. Typhi and S. Paratyphi A in the Nepal replication collection.

CHR SNP Gene A1 MAF cases MAF controls A2 P OR Collection
6 rs DQB1-DRB1 A G Nepal; S. Typhi 1
6 rs DQB1-DRB1 A G Nepal; S. Paratyphi A

cases had blood culture-confirmed S. Typhi.
cases had blood culture-confirmed S. Paratyphi A.

Supplementary Table 6 (Please refer to web-based Excel table) List of all 5,422 binary markers within the broad HLA region and the accompanying association results. For each marker we list an index (col 1), a description of the marker (col 2; that is, polymorphic amino acid, nucleotide or classical allele), whether the marker is genotyped or imputed (col 3), the name of the gene that the marker lies within (col 4), the marker position in the hg18 genome build (col 5), the ID used by the imputation method (SNP2HLA) (col 6), the SNP identifier if the marker is a SNP (col 7), the position in the amino acid sequence if applicable (col 8) and, if applicable, the classical allele identifier (col 9). We also define the reference and non-reference alleles being tested (cols 10 and 11). In addition to testing amino acid positions within the protein structure, we also tested amino acids in the signal peptide (negative amino acid positions). Alleles can be nucleotides, amino acid residues, or groups of nucleotides or amino acid residues. In many instances where loci are highly polymorphic, the alleles have been exhaustively grouped. We then present association results (cols 12-17): case and control allele frequencies for each allele, and the OR and P value for both the unconditional and the conditional analysis. Finally, we present imputation accuracy quality (INFO) scores. These INFO scores represent the observed variance of the dosages divided by the expected variance of the dosages, based on Hardy-Weinberg equilibrium and allele frequencies. We calculated INFO scores separately for cases and controls and then averaged them.
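The INFO score described above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: dosages are expected minor-allele counts between 0 and 2, the allele frequency is estimated as half the mean dosage, the expected variance under Hardy-Weinberg equilibrium is 2p(1 - p), and the population (not sample) variance is used. The function names and the toy dosages are illustrative and are not taken from SNP2HLA or from the study data.

```python
from statistics import fmean, pvariance

def info_score(dosages):
    """Observed dosage variance divided by the variance expected under HWE."""
    p = fmean(dosages) / 2.0            # allele frequency estimated from dosages
    expected = 2.0 * p * (1.0 - p)      # binomial variance of the allele count
    if expected == 0:
        raise ValueError("marker is monomorphic; INFO is undefined")
    return pvariance(dosages) / expected

def averaged_info(case_dosages, control_dosages):
    """INFO computed separately in cases and controls, then averaged."""
    return (info_score(case_dosages) + info_score(control_dosages)) / 2.0

# Toy data (hypothetical): perfectly called genotypes in exact HWE at p = 0.1 ...
certain = [0.0] * 81 + [1.0] * 18 + [2.0] * 1
# ... and the same marker with every dosage shrunk halfway toward the mean,
# mimicking uncertain imputation.
mean_dosage = fmean(certain)
uncertain = [mean_dosage + 0.5 * (d - mean_dosage) for d in certain]

print(f"certain:   {info_score(certain):.2f}")
print(f"uncertain: {info_score(uncertain):.2f}")
print(f"averaged:  {averaged_info(certain, uncertain):.2f}")
```

Confidently imputed markers give a score close to 1, while dosages that regress toward the population mean lower the observed variance and hence the score.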

Supplementary Table 7 Study power as a function of disease allele frequency and allele odds ratio for a) the Nepal replication collection (comprising 595 enteric fever cases and 386 controls) and b) the Nepal and Vietnam replication collections (746 cases and 1,054 controls). Shaded areas represent >90% statistical power for detection.

a)
Minor allele frequency Allele Odds Ratio
% 2.4% 5.6% 11.1% 19.4% 30.0% 66.10% >99% >99%
% 6.7% 16.7% 32.1% 50.6% 68.2% 95.80% >99% >99%
% 17.7% 40.6% 66.1% 84.8% 94.7% >99% >99% >99%
% 27.0% 56.3% 81.1% 94.1% 98.6% >99% >99% >99%
% 32.4% 63.6% 86.4% 96.4% >99% >99% >99% >99%
% 33.4% 64.5% 86.8% 96.5% >99% >99% >99% >99%

b)
Minor allele frequency Allele Odds Ratio
% 7.30% 18.60% 36.30% 56.70% 74.90% 98.10% >99% >99%
% 21.70% 49% 75.90% 91.90% 98.10% >99% >99% >99%
% 49.90% 83.50% 97.10% >99% >99% >99% >99% >99%
% 66.10% 93.20% >99% >99% >99% >99% >99% >99%
% 72.90% 95.70% >99% >99% >99% >99% >99% >99%
% 73.50% 95.70% >99% >99% >99% >99% >99% >99%