Comparison of Methods to Account for Relatedness in Genome-Wide Association Studies with Family-Based Data


Jakris Eu-ahsunthornwattana 1,2, E. Nancy Miller 3†, Michaela Fakiola 3, Wellcome Trust Case Control Consortium 2 ¶, Selma M. B. Jeronimo 4, Jenefer M. Blackwell 3,5, Heather J. Cordell 1*

1 Institute of Genetic Medicine, Newcastle University, International Centre for Life, Newcastle upon Tyne, United Kingdom
2 Division of Medical Genetics, Department of Internal Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Ratchathevi, Bangkok, Thailand
3 Cambridge Institute for Medical Research, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, United Kingdom
4 Department of Biochemistry, Center for Biosciences, Universidade Federal do Rio Grande do Norte, Natal, Brazil
5 Telethon Institute for Child Health Research, Centre for Child Health Research, The University of Western Australia, Subiaco, Western Australia, Australia

Abstract

Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it is not always clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in precise details of methodology implemented and through various user-chosen options such as the method and number of SNPs used to estimate the kinship (relatedness) matrix.
We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error-rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. Given their strong concordance, in most applications, the choice of precise LMM implementation cannot be based on power/type I error considerations but must instead be based on considerations such as speed and ease-of-use.

Citation: Eu-ahsunthornwattana J, Miller EN, Fakiola M, Wellcome Trust Case Control Consortium 2, Jeronimo SMB, et al. (2014) Comparison of Methods to Account for Relatedness in Genome-Wide Association Studies with Family-Based Data. PLoS Genet 10(7): e doi: /journal.pgen

Editor: Gonçalo R. Abecasis, University of Michigan, United States of America

Received September 20, 2013; Accepted May 2, 2014; Published July 17, 2014

Copyright: © 2014 Eu-ahsunthornwattana et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Wellcome Trust (Grant Reference ).
This study makes use of data generated by the Wellcome Trust funded WTCCC2 project (Grant Reference ). JEa receives scholarship and funding from Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* heather.cordell@newcastle.ac.uk
¶ Membership of the Wellcome Trust Case Control Consortium 2 is listed in Text S1.
† Deceased.

PLOS Genetics | July 2014 | Volume 10 | Issue 7

Author Summary

Recently, statistical approaches known as linear mixed models (LMMs) have become popular for analysing data from genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it has not always been clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several different LMM approaches (and software implementations) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals. We also compare the LMM results to those obtained using alternative analysis methods. Overall, we find strong concordance between the results from the different LMM approaches and high correlation between the results from LMMs and most alternative approaches. We conclude that LMM approaches perform well in comparison to competing approaches and, in most applications, the precise LMM implementation will not be too important, and can be chosen on the basis of speed or convenience.

Introduction

Recently, approaches based on linear mixed models have been proposed as appealing alternatives to principal-component-based approaches when adjusting for population substructure in genome-wide association studies of apparently unrelated individuals [1–4]. These methods build upon work originally described in the animal breeding literature, and subsequently developed in the human genetics literature, in which a genetic effect of interest (e.g. the number of copies of a particular allele at a particular test SNP) is included as a fixed effect in a regression model, with an additional random effect also included to model genetic correlation between individuals. The covariance structure for the random effect is generally assumed to correspond to that implied by a polygenic model, incorporating the genetic relationship (kinship) between each pair of individuals. Although use of this linear mixed model (LMM) was originally proposed for pedigrees with known relationships [5–10], this approach has recently gained popularity for use with samples of unknown or uncertain relationship [1–3,11–13], including apparently unrelated samples who may nevertheless display distant levels of common ancestry. For this purpose, the kinship coefficients between all pairs of individuals (modelling either close or distant relatedness) are estimated, prior to fitting the linear mixed model, on the basis of genome-wide genotype data, rather than being fixed at their known theoretical values.

Fitting a full linear mixed model for each SNP in turn across the genome is computationally challenging. These computational considerations have led to the development of several faster approximations for constructing tests of the fixed SNP effects of interest in the linear mixed model [1,2,9,10,14]. These approximate tests have been implemented in various software packages including MERLIN, GenABEL, EMMAX, TASSEL, FaST-LMM, Mendel and MMM. The MMM [15] and FaST-LMM [4] packages, in common with the package GEMMA [16], also provide fast implementations of an exact (rather than an approximate) model, which in principle can lead to a small increase in power [15,16], depending on the true underlying level of relatedness. A limited comparison of several LMM implementations, via application to real and simulated data from Genetic Analysis Workshop 18 (GAW18) [17], was performed by Eu-ahsunthornwattana et al. [18].
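As a compact restatement of the model described in the Introduction, the LMM can be written in a commonly used form. The symbols below ($y$, $X$, $\beta$, $g$, $\varepsilon$, $K$, $\sigma_e^2$) are notation assumed for this sketch rather than taken from the paper; $K$ stands for a matrix derived from the pairwise kinships, whether theoretical or estimated.

```latex
y = X\beta + g + \varepsilon, \qquad
g \sim N\!\left(0,\ \sigma_g^2 K\right), \qquad
\varepsilon \sim N\!\left(0,\ \sigma_e^2 I\right)
% X holds the fixed effects (including the allele count at the test SNP),
% g is the polygenic random effect, and \varepsilon is residual error.
% For any constant c > 0:  \sigma_g^2 K = (\sigma_g^2 / c)\,(cK),
% so rescaling K is absorbed by the estimated genetic variance.
```

Under this formulation the test of interest is on the component of $\beta$ for the test SNP, with $\sigma_g^2$ and $\sigma_e^2$ either re-estimated for each SNP (exact methods) or estimated once and reused (approximate methods).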
In the GAW18 data, which comprised 959 Mexican-American individuals from 20 families, the LMM implementations investigated performed rather similarly to one another in terms of the association test statistics and p-values achieved; however, no formal quantification of power or type 1 error was performed. Eu-ahsunthornwattana et al. [18] also investigated the performance of the various LMM implementations when applied naively to longitudinal traits (repeated measures) available in GAW18, simply by treating each measurement as if it came from a separate person and expanding out the genetic data set accordingly (resulting in an expanded data set containing many apparent twins, triplets, quadruplets etc., depending on how many measurements are available for each person). Although this approach is not strictly correct (as it does not distinguish between correlations in trait values due to genetic factors and correlations due to non-genetic within-individual factors), Eu-ahsunthornwattana et al. found this procedure generated only minimal inflation in the resulting distribution of genome-wide test statistics.

Here we expand the investigation of Eu-ahsunthornwattana et al. [18] to perform a more comprehensive comparison of LMM approaches (involving a larger number of software implementations) and to conduct a formal investigation of power and type 1 error. We also compare the LMM approaches to traditional family-based approaches ("within-family" association tests based on the transmission of high-risk alleles within pedigrees [19–23]), and to alternative previously-proposed approaches based on extending standard case/control tests (such as the Armitage trend test) to allow for either known [24,25] or known and unknown [26] relatedness.
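The naive treatment of repeated measures described above (each measurement treated as a separate person, with the genetic data duplicated accordingly) can be illustrated with a minimal sketch. The function name, the dictionary-based data layout and the `_v1`, `_v2` identifier suffixes are all hypothetical choices for this illustration and do not correspond to any package's input format.

```python
from typing import Dict, List, Tuple


def expand_repeated_measures(
    genotypes: Dict[str, List[int]],
    measurements: Dict[str, List[float]],
) -> Tuple[Dict[str, List[int]], Dict[str, float]]:
    """Treat each repeated measurement as if it came from a separate person.

    genotypes:    person ID -> allele counts (0, 1 or 2) at each SNP
    measurements: person ID -> trait values, one per visit
    Returns expanded genotype and phenotype dictionaries keyed by pseudo-ID.
    """
    expanded_genotypes: Dict[str, List[int]] = {}
    expanded_phenotypes: Dict[str, float] = {}
    for person_id, values in measurements.items():
        for visit, value in enumerate(values, start=1):
            # Hypothetical naming scheme: one pseudo-individual per visit.
            pseudo_id = f"{person_id}_v{visit}"
            # Copy the genotype vector, creating an apparent identical twin.
            expanded_genotypes[pseudo_id] = list(genotypes[person_id])
            expanded_phenotypes[pseudo_id] = value
    return expanded_genotypes, expanded_phenotypes


# Toy data (invented): person A has three visits, person B has one.
geno = {"A": [0, 1, 2], "B": [2, 1, 0]}
pheno = {"A": [120.0, 125.0, 118.0], "B": [135.0]}
exp_geno, exp_pheno = expand_repeated_measures(geno, pheno)
```

In the toy data, person A becomes three genetically identical pseudo-individuals, which is why the expanded data set appears to contain twins, triplets and so on. The sketch also makes the limitation visible: nothing in the expanded data distinguishes within-person correlation from genetic correlation.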
The programs compared (see Table 1) differ in the precise details of the methodology implemented (such as whether an LMM approach is used, and, if so, whether an exact method or an approximation is used) and through various user-chosen options such as the specific method and number of SNPs used to estimate the kinship matrix. We investigate the sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error-rate in both real and simulated data (into which artificial simulated disease loci have been inserted). The approaches are compared via application to real and simulated data derived from a genome-wide association study of visceral leishmaniasis (VL) in 348 Brazilian families comprising 3626 individuals (1970 with both genotype and phenotype data). This Brazilian family data set was used (together with a larger Indian case/control data set) by Fakiola et al. [13] to identify, at genome-wide levels of significance, a replicable association between variants in the HLA region on chromosome 6 and visceral leishmaniasis. Although in [13] the HLA locus (analysed using the LMM package MMM [15]) did not achieve genome-wide levels of significance in the Brazilian data set alone (p-value = ), this locus was the only one to show strong evidence of association in both Brazilian and Indian data sets, and achieved convincing replication in a separate Indian cohort.

Results

Estimation of kinship coefficients using genome-wide SNP data

Before embarking on a detailed comparison of different methods, we explored the use of different SNP sets (containing different numbers of SNPs) for estimating pairwise kinship measures, in order to identify a robust set of SNPs that could be used for subsequent comparisons.
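To make the idea of estimating pairwise kinship measures from genome-wide SNP data concrete, the following is a minimal pure-Python sketch of one common estimator, a realised relationship matrix built from standardised genotypes. It is an illustration under stated assumptions, not the computation performed by any particular package; the function name and the list-of-lists genotype layout are hypothetical.

```python
from typing import List


def realised_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Estimate pairwise relatedness from SNP genotypes.

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at SNP j.
    Each SNP is centred by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; K is the average cross-product over SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardised = [[0.0] * m for _ in range(n)]
    used = 0
    for j in range(m):
        p = sum(genotypes[i][j] for i in range(n)) / (2.0 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNP: carries no information, left as zeros
        sd = (2.0 * p * (1.0 - p)) ** 0.5
        for i in range(n):
            standardised[i][j] = (genotypes[i][j] - 2.0 * p) / sd
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs available")
    return [
        [
            sum(a * b for a, b in zip(standardised[i], standardised[k])) / used
            for k in range(n)
        ]
        for i in range(n)
    ]


# Toy genotypes (invented): individuals 0 and 1 are genetically identical.
toy = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1], [1, 0, 1, 2]]
K = realised_relationship_matrix(toy)
```

Because every SNP is centred on its sample mean, estimates of this kind can be negative for pairs who share fewer alleles than average. With only a handful of SNPs the entries are very noisy, which is the intuition behind comparing SNP sets of different sizes.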
We considered using either the full genome-wide set of SNPs (545,433 SNPs), a pruned set of 50,129 SNPs (selected to have minor allele frequencies >0.4 and chosen to be in approximate linkage equilibrium via the --indep command in PLINK [27]), or a thinned set of 1900 evenly-spaced SNPs that were selected from the pruned SNPs based purely on physical position using the software package MapThin. In addition to exploring the kinship estimates provided by various LMM software packages, we also investigated those provided by the software packages PLINK [27] and KING [28]. KING implements two different kinship estimation methods: KING-homo (KING_H), which assumes population homogeneity, and KING-robust (KING_R), which provides robust relationship inference in the presence of population substructure.

A comparison of the kinship estimates output by different software packages based on the pruned set of SNPs is shown in Figure 1 (similar results were seen for the full and thinned SNP sets, data not shown). Although the scale on which the kinship estimates are measured differs between different packages, the measures themselves are highly correlated, particularly those from EMMAX-BN, FaST-LMM, GenABEL, GEMMA and MMM. Kinship measures from EMMAX-IBS and PLINK were also quite well correlated, although they tended to differ slightly from those in the previous group. Kinship measures are used within the LMM framework to structure the variance/covariance matrix of the genetic random effect (see Methods). Thus, the scale of measurement (i.e. whether the kinship measure actually reflects an estimate of the kinship per se, or a rescaled measure such as twice the kinship) should not be too important, as any rescaling will be compensated for by a similar rescaling of the estimated genetic variance parameter $\sigma_g^2$ (see Methods). Kinship estimates from both KING methods tended to differ most from the other methods, with the frequent output of negative kinship estimates (compared to most other methods for which the kinship estimates are bounded at 0) among the less related individuals. This was more pronounced for KING_R than for KING_H. We consider later the possible implications of these (rather small) differences in estimated kinships for subsequent association testing.

Table 1. Summary of methods/software packages investigated.

| Package/method and version | Approach | Kinship estimation method | Reference(s) |
|---|---|---|---|
| EMMAX emmax-intel tar.gz | LMM (approximate) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | [1] |
| FaST-LMM v2.04 | LMM (approximate or exact) | Kinship matrix estimated internally using user-supplied set of SNPs, using SNPs selected through FaST-LMM-Select procedure, or set to theoretical/estimated values calculated externally | [4] [30] [31] |
| GEMMA v0.91 | LMM (exact) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | [16] |
| GenABEL v1.7-6 (FASTA) | LMM (approximate) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | [9] [39] |
| GenABEL v1.7-6 (GRAMMAR-Gamma) | LMM (approximate) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | [14] [39] |
| GTAM (implemented in MASTOR v0.3) | LMM (approximate) | Kinship matrix calculated externally (assumed to reflect known (theoretical) pedigree relationships) | [8] |
| Mendel v13.2 | LMM (approximate or exact) | Kinship matrix estimated internally using theoretical pedigree relationships, estimated within estimated pedigree clusters (using all SNPs), or fully estimated (using all SNPs) | [35] |
| MMM v1.01 | LMM (approximate or exact) | Kinship matrix estimated internally using user-supplied set of SNPs, or set to theoretical/estimated values calculated externally | [15] |
| FBAT v2.0.4 | Transmission of alleles within pedigrees | Method by definition uses known (theoretical) pedigree relationships | [21] [23] |
| MASTOR v0.3 | Retrospective quantitative trait version of MQLS | Kinship matrix calculated externally (assumed to reflect known (theoretical) pedigree relationships) | [25] |
| MQLS v1.5 | Adjusted version of retrospective case/control test | Kinship matrix calculated externally (assumed to reflect known (theoretical) pedigree relationships) | [24] |
| ROADTRIPS v1.2 (RM test) | Adjusted version of retrospective case/control test | Kinship matrix calculated externally (assumed to reflect known (theoretical) pedigree relationships). Further correction based on genome-wide set of SNPs applied internally. | [26] |

doi: /journal.pgen t001

Within any given method, we found the kinship measures (for each pair of individuals) and p-values obtained (in the real data set) based on the full SNP set to be very similar to those based on the pruned set, whereas those calculated based on the thinned set were less similar (see Figure S1). The performance of the different SNP sets in terms of controlling the genome-wide type 1 error rate (i.e. controlling the genomic inflation factor λ [29] to the desired level of λ = 1) in the real data set is shown in Figure 2 (see Figure S2 for full QQ plots). All packages performed well when using the full or pruned set of SNPs (λ = ), but performance deteriorated when the thinned set was used (λ mostly about ).
This was most pronounced for GenABEL (GRAMMAR-Gamma) (λ = ). Our intuition is that, although 1900 SNPs may be sufficient to accurately model close relationships (such as full sib or parent-offspring), many more SNPs will be required to accurately model distant relationships within pedigrees (such as cousins, second cousins, third cousins etc.) or even more distant relationships between pedigrees. Results obtained using theoretical kinships were inflated for all methods (λ ≈ 1.11), suggesting the presence of additional relatedness/population structure that is not well accounted for by known family relationships. Regardless of the method or SNP set used, adjustment always resulted in substantially lower inflation than was seen (λ = 1.23) in unadjusted analysis.

Listgarten et al. [30] proposed an automated method, FaST-LMM-Select, to select the most appropriate set of SNPs to use for kinship estimation when testing for association in a LMM framework. The method proceeds by ordering SNPs according to their linear regression p-values and then constructing kinship matrices with an increasing number of ordered SNPs, until the first minimum genomic control factor λ is obtained. We investigated this strategy within the FaST-LMM package using either the full or pruned set of SNPs as a starting point (see Figure S3). We found that the first minimum genomic control factor (achieved using 3–10 ordered SNPs) was generally higher than the desired value of λ = 1; the genomic control factor subsequently decreased to considerably less than 1, and then increased back to 1 once all (pruned or full) SNPs had been included.
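Two quantities in the passage above lend themselves to a short illustration: the genomic inflation factor λ, and the "first minimum" stopping rule. The sketch below assumes the usual definition of λ as the median 1-df chi-square statistic divided by its expected median under the null, and reads "first minimum" as the first point at which λ stops decreasing. Both function names are hypothetical, and the example λ values are invented for illustration, not taken from the study.

```python
from statistics import NormalDist, median
from typing import List

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values: List[float]) -> float:
    """Genomic inflation factor from two-sided 1-df p-values."""
    normal = NormalDist()
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # A 1-df chi-square statistic is the square of a standard normal
        # quantile; the lower tail is used for numerical stability.
        z = normal.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def first_local_minimum(values: List[float]) -> int:
    """Index of the first value that is followed by a strictly larger one.

    Returns the last index if the sequence never increases.
    """
    if not values:
        raise ValueError("empty sequence")
    for i in range(len(values) - 1):
        if values[i + 1] > values[i]:
            return i
    return len(values) - 1


# Invented lambdas for kinship matrices built from increasing SNP counts.
snp_counts = [1, 2, 5, 10, 1000, 50000]
lambdas = [1.23, 1.15, 1.09, 1.12, 0.95, 1.00]
chosen = snp_counts[first_local_minimum(lambdas)]
```

In the invented series the rule stops at a λ that is still above 1, even though λ later dips below 1 and returns to 1 when all SNPs are used. This is the qualitative pattern reported above and shows why a first-minimum rule can stop too early.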
The automated version of FaST-LMM-Select available as an option within the current version of the FaST-LMM package uses a slightly different strategy involving k-fold cross-validation [31]. Within each cross-validation fold, the ordering of SNPs and the calculation of genomic control factors (as varying numbers of SNPs are included in the kinship calculation) are carried out within the training data, which are then used to predict the test data. The final number of SNPs to be used in the kinship calculation for the entire data set is that which minimizes the mean-squared error summed over all folds. (See FaST-LMM documentation and [31] for more details). Lippert et al. [31] found this procedure to show some advantage over using all SNPs (including a large number of presumably irrelevant SNPs) in simulations that included population stratification (but not familial relatedness) of quantitative phenotypes in randomly ascertained individuals. Application of this automated procedure to the real disease phenotype in our highly ascertained set of Brazilian pedigrees resulted in no SNPs selected for calculation of kinships when applied to the full SNP set, or two SNPs selected when applied to the pruned SNP set, resulting in a genomic control value of λ = 1.17 when these two SNPs were used to adjust for relatedness in the subsequent association analysis. We conclude that, at least for our data set, there is no particular advantage in using the FaST-LMM-Select procedure; indeed, this procedure seems to work less well than simply using all pruned or full SNPs for estimating pairwise kinships. For the remainder of the manuscript we therefore focus on results obtained using the pruned set of SNPs to estimate kinships (apart from genome-wide analysis in the program Mendel, which by default always uses the entire set of SNPs that has been read in).

Comparison of LMM and alternative analysis approaches

We compared the performance of the different LMM and alternative approaches listed in Table 1 through their application to real and simulated data derived from the Brazilian family data set of Fakiola et al. [13]. The simulation scenarios (see Methods) included a binary disease trait influenced by either two strong (sim-D1) or two weak (sim-D2) genetic effects or a quantitative trait (sim-Q) influenced by two strong genetic effects. In all cases the genetic effects were governed by two SNPs (rs and rs233722) located on chromosomes 6 and 12 respectively. In addition to the effects at rs and rs233722, we also allowed for 22 weaker polygenic effects caused by genotype at the 100th SNP on each autosomal chromosome.
Where applicable, we used either the default analysis options within each program, or else explored the use of different options as indicated below. The program FaST-LMM uses either maximum likelihood (ML) or restricted maximum likelihood (REML). (In early versions of FaST-LMM the default was ML but in later versions the default became REML). After some experimentation, we deemed the ML option to be the most reliable in the presence of strong genetic effects, and have therefore used ML for all results presented here.

The success of the various approaches in controlling the overall genome-wide type 1 error rate (i.e. controlling the genomic inflation factor λ [29] to the desired level of λ = 1) is shown in Table 2. All methods that made use of estimated kinships performed well, apart from Mendel when estimation was restricted only to estimated pedigree clusters (which gave λ = 1.10) and MQLS, for which use of estimated kinships (in the 1972 genotyped individuals) appeared to result in slightly deflated genomic inflation factors. For all other methods, use of estimated kinships reduced the genomic inflation factor to around 1, compared to a value of λ = 1.23 in the real data (and up to 1.43 in the simulated data) when performing an unadjusted analysis. Methods that used only theoretical kinships based on known pedigree information performed well in the simulated data sets, but were less successful at controlling inflation for the real data set, suggesting that our real data contains additional, more complicated, relatedness or population substructure that is not accounted for by known family relationships. The Brazilian populations studied here are believed to be long-term (>200 years) admixtures of Caucasian, Negroid and Native Indian ethnic backgrounds, as confirmed in recent analysis of a subset of our families [32].
The discrepancy between the genomic inflation factors seen in our real and simulated data results suggests that our (relatively simplistic) simulation scenarios have not been able to fully mimic the underlying population structure existent in the real data; although our simulation strategy (see Methods) was designed to generate trait correlations that reflect close familial relationships, we did not specifically endeavour to generate correlations due to population stratification or more distant/cryptic relationships. To investigate the relative contributions of phenomena such as admixture/population stratification/cryptic relationships to the inflation observed in our real data when using theoretical (pedigree-based) kinships, we applied the ADMIXTURE program [33] to our pruned set of SNPs to estimate ancestry proportions (assuming 3 ancestral populations) in each individual. Although the variation in ancestry proportion estimated within each individual was quite large (standard deviation ≈ 0.08–0.15 depending on ancestral population), there was no evidence (P > 0.14) for a relationship between estimated ancestry proportion and disease status, suggesting that the inflation in test statistics observed when using theoretical kinships is more likely to be due to unmeasured cryptic relationships and/or subtle population substructure, than to population substructure or admixture directly related to the Caucasian, Negroid and Native Indian ethnicities.
This conclusion was supported by the fact that logistic regression analysis allowing for the ancestry proportions as covariates resulted in a genomic control inflation factor of 1.17, only slightly reduced from the unadjusted genomic control inflation factor of 1.23. We also used as covariates in a logistic regression analysis the first nine coordinates obtained from a multidimensional scaling (MDS) analysis of the pruned SNPs in PLINK (having considered between one and ten coordinates, nine was the number that minimised the genomic control inflation factor). The resulting genomic control inflation factor was 1.08, considerably smaller than the unadjusted inflation factor of 1.23, but still not perfectly controlled. Inclusion of MDS coordinates as covariates, similar to including principal components scores, might be expected to account for more subtle levels of population substructure than are accounted for by the use of the ADMIXTURE program (and may possibly also indirectly account for relatedness), which perhaps explains the greater success of this procedure. However, the fact that LMM approaches based on estimated kinships still do better (with respect to controlling λ) than does the MDS approach suggests there may still be levels of known or cryptic relatedness that are not well captured by these first nine coordinates.

An intuitive overview of the expected power provided by the different (real and simulated) data sets can be obtained from Figure S4, which shows Manhattan plots from a FaST-LMM analysis of a single replicate of real or simulated data. The real phenotype data shows a noticeable signal in the HLA region on chromosome 6, consistent with the main finding in [13], while for all simulated traits the primary associated regions are correctly identified without any obvious false signals. A formal comparison of power and type 1 error for the different analysis methods using 1000 simulation replicates is shown in Figure 3.
All methods apart from an unadjusted analysis show acceptable levels of type 1 error (although note that the type 1 error rate for FBAT appears to be slightly conservative). In terms of power, all LMM approaches (including GTAM and Mendel) and MASTOR show similar performance, apart from MMM which shows slightly higher power than other methods for detection of loci involved in the (strong) simulated quantitative trait. ROADTRIPS and MQLS show slightly lower power than the LMM approaches, while the approaches implemented in FBAT appear to be considerably less powerful than those implemented in the LMM and other packages (even allowing for FBAT's slightly conservative levels of type 1 error). The lower power of FBAT is likely to be caused by the smaller effective sample size (357 cases compared to 357 pseudo-controls in FBAT, versus 357 cases compared to 1613 genuine controls in the LMM and other alternative approaches), due to the way the FBAT test statistics are constructed. These results are consistent with a visual examination of the Manhattan plots obtained from the different methods using either the real data or a single replicate of the simulated data (Figure 4, Supplementary Figures S5–S6), with FBAT achieving much lower levels of significance around the true or simulated phenotype-associated SNPs than do the other methods. (The results from all LMM methods not displayed in Figure 4 and Supplementary Figures S5–S6 were indistinguishable from FLMM_E, data not shown).

Figure 1. Comparison of kinship estimates (pruned SNPs) using different software packages. Plots above the diagonal show a comparison of kinship measures, with correlations between the kinship measures indicated below the diagonal. EM_BN = EMMAX (Balding-Nichols), EM_IBS = EMMAX (IBS method), FLMM_C = FaST-LMM using covariance matrix, FLMM_R = FaST-LMM using realised relationship matrix, GA = GenABEL, GMA_C = GEMMA using centred genotypes, GMA_S = GEMMA using standardised genotypes, KING_H = KING with homogeneous population assumption, KING_R = KING with robust estimation. doi: /journal.pgen g001

Although the LMM (and several alternative) approaches show similar overall levels of power, an interesting separate question is the degree of concordance between the different methods with respect to the association signals detected.
In the real data set we found the p-values obtained at each SNP from the different LMM methods to be highly concordant (Figure S7), while the concordance between the LMM methods and alternative approaches (Figure S8) is high for all methods other than FBAT (although lower than is observed among methods within the LMM class). The test implemented in FBAT is statistically uncorrelated with that implemented in the LMM and other alternative approaches; it is therefore not surprising that little concordance is seen between the test statistics achieved at the vast majority of (presumably null) SNPs. Figure S8 also shows that methods that use phenotype information from non-genotyped family members (MQLS3626 and RT3626, which use all 3626 individuals regardless of whether or not they have genotype data) are most similar to each other and less similar to methods that use information only from the genotyped individuals.

Figure 2. Genomic control factors obtained using different software packages and different strategies for modelling kinships. PLINK = analysis in PLINK with no adjustment made for relatedness. Other methods/software packages are listed in Table 1 (see Table 2 for abbreviated names of methods). Pedigree = theoretical kinships based on known pedigree relationships used to adjust for relatedness. Thinned = kinships based on 1900 thinned SNPs used to adjust for relatedness. Pruned = kinships based on 50,129 pruned SNPs used to adjust for relatedness. Full = kinships based on 545,433 SNPs used to adjust for relatedness. doi: /journal.pgen g002

The high concordance between the different LMM methods (and, to a slightly lesser extent, between LMM methods and all methods other than FBAT) is also seen for the simulated (weak disease) trait (Figure S9); similar results were found for the other simulated traits and other LMM methods (data not shown). A formal comparison of the concordance between top hits identified by the different methods in the simulated data (1000 simulation replicates, comparison restricted to true and null simulated regions) is shown in Table 3.
Using EM_BN as reference, the concordance between the top SNPs identified is seen to be extremely high for all other methods except FBAT, suggesting again that all methods except FBAT provide essentially the same inference.

Feeding externally estimated kinship coefficients into LMMs

Most LMM packages (although not Mendel) allow a separation between the estimation of kinships step and the association testing step. This is convenient as it allows the user to read in theoretical or estimated kinships as desired, and to consider using an alternative package for estimating kinships to the one used for the actual association testing. We investigated performing an analysis in FaST-LMM (exact calculation), but with the kinships estimated from various different software packages (see Figure S10 and Table S1). Use of the "wrong" kinship estimates (chosen to be inversely related to the theoretical kinship value) resulted in very similar results to unadjusted analyses (λ = 1.23 in the real trait, 1.12 in the simulated strong disease trait, and 1.43 in the simulated quantitative trait). Results based on kinship estimates from KING_R and KING_H were very similar to those obtained using FaST-LMM's own realised relationship matrix (FLMM_R) for all traits, and provided good control of the genome-wide error rate (λ ≈ 1) in spite of the unusual pattern in KING's estimated kinships that had been noted in Figure 1. Estimation of kinships using PLINK was less satisfactory, leading to inflated genomic control factors in both real and simulated data sets. This is consistent with previous results [28] suggesting that PLINK performs less well than KING for relationship estimation. Interestingly, although KING_R has been shown to have an advantage over KING_H in non-homogeneous populations when the goal is relationship estimation for its own sake [28], this advantage is not apparent here, where the goal is instead to adjust for potentially different levels of relatedness, from close family relationships to more distant relationships (perhaps mimicking population membership), while performing association testing.

Computational efficiency and ease-of-use

Given that many of the software implementations we investigated (and in particular all the various LMM implementations) showed similar levels of power and type 1 error, and gave rather similar inference in terms of localisation of signals and −log10 p-values achieved, an important practical consideration when deciding what implementation to use is the ease-of-use and computational efficiency. Ease-of-use is necessarily somewhat subjective as it depends on a user's prior experience and software/operating system preferences. Computational efficiency can, in theory, be examined more objectively; in practice, however, the total time required to perform an analysis is dependent on the computer architecture available (in particular the ability of the system and of any given program to allow multithreading), demands of competing users and the availability of (and ability of any given program to make use of) facilities for parallel processing, e.g. a multi-node compute cluster. These considerations make it hard to perform a genuine head-to-head comparison between different packages. In Table S2 we present an approximate comparison (carried out on the same machine, without use of parallel processing) together with some comments concerning ease-of-use.
Since many groups (including ourselves) use PLINK [27] to perform initial quality control of genome-wide association data, we considered programs that could use PLINK files directly (or with just a few easily-implemented transformation steps) to be the easiest to use, while those programs that required more extensive data transformation, creation of additional input files and/or external estimation of kinships were considered harder. With respect to computational speed, as a rule of thumb we found Mendel (theoretical kinships), FaST-LMM (approximate) and GenABEL (GRAMMAR-Gamma) to be the fastest LMM implementations, taking between 3 minutes and a quarter of an hour on our system to analyse 545,433 SNPs in 1972 genotyped individuals. These were closely followed by EMMAX and MMM (approximate), which took around half an hour; GenABEL (FASTA), GEMMA, FaST-LMM (exact) and MMM (exact), which typically took 1-2 hours; Mendel (estimated kinships), which took around 2.5 hours; and GTAM, which took around 4 hours. Of the non-LMM methods, FBAT, MQLS and MASTOR were the fastest, taking a few hours to perform the analysis, while ROADTRIPS was the slowest, taking several days. Inputting estimated (rather than theoretical) kinships into MQLS increased the time taken to around 4 days (and appeared to over-correct the genomic inflation, see Table 2), while an analysis inputting estimated (rather than theoretical) kinships into ROADTRIPS was still running (with analysis completed for only 38,926 of the desired 545,433 SNPs) after more than 2 months.
Neither MQLS nor ROADTRIPS was designed for analysis of unrelated individuals, so both are most likely optimised for reading in and working with relatively sparse kinship matrices (in which individuals from different pedigrees are assumed to have kinships equal to 0); to force the programs to consider estimated kinships between all individuals we had to recode the pedigree names to pretend that everyone comes from the same pedigree, which most likely considerably increases processing and memory requirements.

Analysis of longitudinal phenotypes

Eu-ahsunthornwattana et al. [18] investigated a strategy for analysing longitudinal traits (repeated measures) in a linear mixed model framework simply by treating each measurement as if it came from a different individual, and expanding out the genetic data set accordingly (resulting in an expanded data set containing many apparent twins, triplets, quadruplets etc., depending on how many measurements are available for each person). We investigated this strategy in the current data set using a single replicate of data (498 individuals) simulated under either a longitudinal (sim-L20) or longitudinal polygenic (sim-P20) model (see Methods). Results (Table 4) showed that EMMAX, FaST-LMM and GEMMA were successful in keeping the genomic inflation factor at about 1, whereas GenABEL (FASTA) and MMM showed some inflation, particularly in the polygenic longitudinal simulation, and GenABEL (GRAMMAR-Gamma) showed strong deflation. Comparison of the concordance in −log10 p-values achieved by the different methods (data not shown) indicated that, although the results from different methods were highly correlated (in terms of the top SNPs identified), the actual p-values achieved were very different, consistent with the differences seen in overall distribution of test statistics.
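The expansion strategy described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `_t1`, `_t2` pseudo-ID suffixes and the in-memory data layout are all hypothetical and chosen only for illustration. The sketch assumes each person has a list of repeated phenotype values, and shows that pseudo-individuals derived from the same person end up sharing that person's genotypes and self-kinship.

```python
def expand_repeated_measures(ids, genotypes, measurements):
    """Turn each repeated measurement into its own pseudo-individual.

    ids          : list of individual IDs
    genotypes    : dict mapping ID -> list of genotype codes
    measurements : dict mapping ID -> list of phenotype values (one per timepoint)
    Returns a list of (pseudo_id, phenotype, genotype_list) tuples.
    """
    rows = []
    for ind in ids:
        for t, y in enumerate(measurements[ind], start=1):
            # Each timepoint gets a copy of the same genotype vector.
            rows.append((f"{ind}_t{t}", y, list(genotypes[ind])))
    return rows


def expand_kinship(kinship, counts):
    """Expand a kinship matrix (list of lists) to the pseudo-individuals.

    counts[i] is the number of measurements for individual i. Copies of the
    same person inherit that person's self-kinship, so they look like twins,
    triplets, and so on.
    """
    index = [i for i, m in enumerate(counts) for _ in range(m)]
    return [[kinship[a][b] for b in index] for a in index]
```

With 20 measurements per person, as in the simulated longitudinal traits, each real individual would contribute a 20 x 20 block of identical entries to the expanded matrix.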
Analysing each repeated measure as if it comes from a different individual treats our data set as a larger pseudo data set containing many apparent twins/triplets/quadruplets (actually, in this case, 20-tuplets). Although less satisfactory than a proper longitudinal analysis that takes into account correlations due to both relatedness between individuals and repeated measures within individuals [34], our intuition was that the LMM framework would absorb the effect of repeated measures within individuals into the genetic component of variance estimated, resulting in an overall correct distribution of test statistics. For EMMAX, FaST-LMM and GEMMA, this intuition appears to have been correct. Although for GenABEL (FASTA) and MMM the resulting distribution of test statistics is inflated, the linear relationship between the observed and desired test statistics means that test statistics following the desired distribution could be obtained simply by dividing the observed χ² test statistics by the observed genomic control inflation factor, in an approach akin to standard genomic control [29]. We also investigated a proper longitudinal analysis implemented within the R software package longGWAS [34]. QQ plots from longGWAS (data not shown) indicated acceptable genomic control inflation factors (λ = 1.00 and 0.97 for sim-L20 and sim-P20 respectively). A comparison of longGWAS with our (improper) approach using FaST-LMM (data not shown) indicated that the results (in terms of the −log10 p-values obtained at each SNP) from longGWAS and FaST-LMM were highly correlated for both sim-L20 and sim-P20. Although the proper analysis implemented in longGWAS might be considered theoretically most appealing, we note that longGWAS was considerably slower than FaST-LMM, taking approximately 19 hours (in comparison to 5.5 minutes for FaST-LMM), when run in parallel for each of 22 chromosomes.
If run as a single process (all chromosomes), this translates to about 9.5 days for longGWAS versus 7.6 hours for FaST-LMM. Thus, given the satisfactory performance of FaST-LMM, and the high correlation between the results obtained from FaST-LMM and those from longGWAS, from a practical point of view, FaST-LMM (or possibly EMMAX or GEMMA) would seem the more attractive option.
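The genomic control rescaling mentioned above for inflated test statistics can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure used by any of the packages compared here: the function names are hypothetical, the statistics are assumed to be 1-degree-of-freedom χ² values, and the inflation factor is taken as the median observed statistic divided by the median of the null χ² distribution with 1 degree of freedom (approximately 0.455).

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic control factor: median observed statistic / null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_correct(chi2_stats):
    """Divide every statistic by the inflation factor; return (corrected, lambda)."""
    lam = genomic_inflation(chi2_stats)
    return [x / lam for x in chi2_stats], lam


def chi2_to_p(x):
    """P-value for a 1-df chi-square statistic, via the standard normal CDF."""
    z = x ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(z))
```

After correction the rescaled statistics have an inflation factor of exactly 1 by construction; whether they follow the desired distribution more generally relies on the linear relationship between observed and expected statistics noted in the text.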

Table 2. Genomic control inflation factors achieved in real data or in a single replicate of the simulated data sets.

Method | Description | Kinships used | Real disease (VL) | Simulated strong (sim-d1) | Simulated weak (sim-d2) | Simulated quantitative (sim-q)
Unadjusted | Standard linear or logistic regression | None | | | |
EM_BN | EMMAX (Balding-Nichols kinships) | Estimated | | | |
EM_IBS | EMMAX (IBS kinships) | Estimated | | | |
FLMM_A | FaST-LMM (approximate calculation) | Estimated | | | |
FLMM_E | FaST-LMM (exact calculation) | Estimated | | | |
GA_FA | GenABEL (FASTA) | Estimated | | | |
GA_GRG | GenABEL (GRAMMAR-Gamma) | Estimated | | | |
GMA_C | GEMMA using centred genotypes | Estimated | | | |
GMA_S | GEMMA using standardised genotypes | Estimated | | | |
GTAM | GTAM (implemented in MASTOR) | Pedigree | | | |
Mendel_T | Mendel with theoretical kinships | Pedigree | | | |
Mendel_P | Mendel with kinships estimated within estimated pedigree clusters | Estimated | | | |
Mendel | Mendel with fully estimated kinships | Estimated | | | |
MMM_E | MMM (exact calculation) | Estimated | | | |
MMM_G | MMM (GLS approximation) | Estimated | | | |
FBATaff (a) | FBAT (transmissions to affecteds only) | Pedigree | | | |
FBATboth | FBAT (transmissions to all individuals) | Pedigree | | | |
MASTOR | MASTOR (implemented in MASTOR) | Pedigree | | | |
MQLS1972 (a) | MQLS (using 1972 genotyped individuals) | Pedigree | | | |
MQLS3626 (a,b) | MQLS (using all 3626 individuals with or without genotype data) | Pedigree | 1.16 | | |
MQLS1972_E | MQLS using 1972 genotyped individuals and estimated kinships | Estimated | | | |
RT1972 (a) | ROADTRIPS (using 1972 genotyped individuals) | Pedigree & estimated | | | |
RT3626 (a,b) | ROADTRIPS (using all 3626 individuals with or without genotype data) | Pedigree & estimated | | | |

(a) FBATaff, MQLS and ROADTRIPS are only applicable to binary traits and so do not have results in the Simulated quantitative column.
(b) In the simulated data sets, MQLS and RT could only be based on the 1972 individuals with simulated phenotypes, and so no simulated trait results are displayed in the MQLS3626 and RT3626 rows.
doi: /journal.pgen t002

Another program that can, in theory, implement a proper longitudinal analysis is the lmekin function within the R package coxme. We found this function to be computationally infeasible for analysis of genome-wide data, but application to a selected set of 2423 SNPs (of different effect sizes) in the sim-L20 data suggested that the results were very similar to those obtained from GenABEL (FASTA), EMMAX, FaST-LMM, GEMMA and MMM. However, we were unable to get lmekin to give meaningful results (most results were 'NA') when applied to the sim-P20 data. We also speculated that a proper longitudinal analysis should, in theory, be implementable in the package Mendel [35], through making use of Mendel's ability to include household effects. (Effectively one would trick Mendel into fitting the correct model by designating all individuals (with each timepoint considered as a separate individual) to be members of a single pedigree, with the individuals corresponding to separate timepoints within a single real individual designated as belonging to the same household.) We attempted to fit this model in Mendel for our sim-L20 and sim-P20 data sets, but were unable to obtain reliable results. (If included, household effects were continually estimated at 0, and, regardless of whether or not household effects were included, the SNP association tests showed highly inflated significance values, with no correct localisation of true sim-L20 signals as had been seen for FaST-LMM (Figure S4) and little correlation between −log10 p-values from Mendel and those from these other packages.) We speculate that the algorithm used by Mendel may be adversely affected by the presence of many highly-related individuals (e.g. repeated measures that in actuality pertain to a single individual), causing the test statistics generated to be unreliable.

Discussion

Here we have demonstrated, through simulations and application to real data, that linear mixed model approaches such as those implemented in the packages GenABEL, EMMAX, FaST-LMM,

Figure 3. Power and type 1 error of different methods. Powers (left hand plots) are defined as the proportion of replicates (out of 1000) in which both simulated disease loci are detected, with detection corresponding to any SNP within 40 kb of the simulated disease locus reaching the specified p-value threshold. Type 1 errors (right hand plots) are defined as the proportion of null SNPs (out of 20,000 = 20 null SNPs times 1000 simulation replicates) that reach the specified p-value threshold. Horizontal dashed lines indicate the target p-value thresholds (i.e. the expected type 1 error rates). doi: /journal.pgen g003

Figure 4. Manhattan plots for the real phenotype using FaST-LMM exact and alternative software packages. The points marked in red denote the confirmed significant region from Fakiola et al. (2013). FLMM_E = FaST-LMM using exact calculation, MQLS1972 = MQLS using 1972 genotyped individuals, RT1972 = ROADTRIPS using 1972 genotyped individuals, FBATaff = FBAT using transmissions to affecteds only, FBATboth = FBAT using transmissions to both affecteds and unaffecteds. Results from all other LMM methods were indistinguishable from FLMM_E and so are not shown. doi: /journal.pgen g004
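The definitions of power and type 1 error given in the Figure 3 caption can be made concrete with a short Python sketch. It is illustrative only and rests on several assumptions not stated in the text: the function names and data layout are hypothetical, all SNPs and loci are taken to lie on a single chromosome with positions in base pairs, "within 40 kb" is read as an absolute distance of at most 40,000 bp, and "reaching" a threshold is read as p ≤ threshold.

```python
def locus_detected(snps, locus_pos, threshold, window=40_000):
    """True if any SNP within `window` bp of the locus has p <= threshold.

    snps is an iterable of (position_bp, p_value) pairs.
    """
    return any(abs(pos - locus_pos) <= window and p <= threshold
               for pos, p in snps)


def power(replicates, locus_positions, threshold):
    """Proportion of replicates in which every simulated locus is detected."""
    hits = sum(
        all(locus_detected(rep, loc, threshold) for loc in locus_positions)
        for rep in replicates
    )
    return hits / len(replicates)


def type1_error(null_p_values, threshold):
    """Proportion of null SNP tests (pooled over replicates) with p <= threshold."""
    return sum(p <= threshold for p in null_p_values) / len(null_p_values)
```

Pooling the 20 null SNPs over 1000 replicates would give the 20,000 tests referred to in the caption.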

Table 3. Concordance between top SNPs identified by different methods. Mean (standard deviation) in 1000 replicates of proportion of top t SNPs within null and true regions that overlap with top t SNPs from EM_BN.

Trait | Method (a) | t = 5 | t = 10 | t = 15 | t = 20 | t = 25
sim-d1 | Unadjusted | (0.042) | (0.030) | (0.033) | (0.032) | (0.027)
sim-d1 | EM_IBS | (0.017) | (0.009) | (0.015) | (0.013) | (0.012)
sim-d1 | FLMM_A | (0.009) | (0.003) | (0.007) | (0.004) | (0.003)
sim-d1 | FLMM_E | (0.021) | (0.005) | (0.008) | (0.005) | (0.004)
sim-d1 | GA_FA | (0.018) | (0.005) | (0.011) | (0.008) | (0.008)
sim-d1 | GA_GRG | (0.021) | (0.011) | (0.017) | (0.010) | (0.008)
sim-d1 | GMA_C | (0.021) | (0.004) | (0.009) | (0.005) | (0.004)
sim-d1 | GMA_S | (0.021) | (0.005) | (0.008) | (0.005) | (0.004)
sim-d1 | GTAM | (0.022) | (0.022) | (0.025) | (0.022) | (0.020)
sim-d1 | Mendel | (0.025) | (0.019) | (0.024) | (0.021) | (0.018)
sim-d1 | MMM_E | (0.041) | (0.004) | (0.009) | (0.005) | (0.004)
sim-d1 | MMM_G | (0.036) | (0.003) | (0.007) | (0.005) | (0.005)
sim-d1 | FBATaff | (0.253) | (0.115) | (0.090) | (0.080) | (0.072)
sim-d1 | FBATboth | (0.130) | (0.084) | (0.078) | (0.075) | (0.071)
sim-d1 | MASTOR | (0.038) | (0.024) | (0.027) | (0.024) | (0.022)
sim-d1 | MQLS | (0.062) | (0.040) | (0.043) | (0.041) | (0.038)
sim-d1 | RT | (0.059) | (0.037) | (0.042) | (0.041) | (0.038)
sim-d2 | Unadjusted | (0.060) | (0.041) | (0.039) | (0.040) | (0.036)
sim-d2 | EM_IBS | (0.029) | (0.024) | (0.025) | (0.028) | (0.024)
sim-d2 | FLMM_A | (0.027) | (0.024) | (0.025) | (0.029) | (0.026)
sim-d2 | FLMM_E | (0.035) | (0.025) | (0.025) | (0.030) | (0.026)
sim-d2 | GA_FA | (0.044) | (0.024) | (0.026) | (0.030) | (0.026)
sim-d2 | GA_GRG | (0.038) | (0.026) | (0.027) | (0.030) | (0.026)
sim-d2 | GMA_C | (0.035) | (0.025) | (0.025) | (0.030) | (0.026)
sim-d2 | GMA_S | (0.035) | (0.025) | (0.025) | (0.030) | (0.026)
sim-d2 | GTAM | (0.050) | (0.036) | (0.037) | (0.036) | (0.032)
sim-d2 | Mendel | (0.051) | (0.033) | (0.035) | (0.036) | (0.031)
sim-d2 | MMM_E | (0.037) | (0.025) | (0.025) | (0.030) | (0.026)
sim-d2 | MMM_G | (0.028) | (0.024) | (0.025) | (0.029) | (0.026)
sim-d2 | FBATaff | (0.255) | (0.201) | (0.157) | (0.128) | (0.102)
sim-d2 | FBATboth | (0.246) | (0.146) | (0.111) | (0.099) | (0.088)
sim-d2 | MASTOR | (0.075) | (0.038) | (0.038) | (0.039) | (0.033)
sim-d2 | MQLS | (0.107) | (0.056) | (0.053) | (0.051) | (0.047)
sim-d2 | RT | (0.099) | (0.055) | (0.053) | (0.052) | (0.047)
sim-q | Unadjusted | (0.049) | (0.038) | (0.040) | (0.034) | (0.033)
sim-q | EM_IBS | (0.020) | (0.016) | (0.020) | (0.017) | (0.015)
sim-q | FLMM_A | (0.000) | (0.000) | (0.004) | (0.005) | (0.004)
sim-q | FLMM_E | (0.009) | (0.008) | (0.005) | (0.005) | (0.005)
sim-q | GA_FA | (0.006) | (0.010) | (0.010) | (0.010) | (0.012)
sim-q | GA_GRG | (0.034) | (0.010) | (0.018) | (0.014) | (0.012)
sim-q | GMA_C | (0.009) | (0.007) | (0.004) | (0.004) | (0.004)
sim-q | GMA_S | (0.009) | (0.008) | (0.005) | (0.005) | (0.005)
sim-q | GTAM | (0.032) | (0.028) | (0.030) | (0.024) | (0.022)
sim-q | Mendel | (0.021) | (0.020) | (0.027) | (0.022) | (0.019)
sim-q | MMM_E | (0.100) | (0.008) | (0.004) | (0.004) | (0.004)
sim-q | MMM_G | (0.100) | (0.003) | (0.003) | (0.004) | (0.003)
sim-q | FBAT | (0.101) | (0.067) | (0.059) | (0.067) | (0.066)
sim-q | MASTOR | (0.020) | (0.027) | (0.030) | (0.025) | (0.023)

(a) See Table 2 for description of methods.
doi: /journal.pgen t003
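The concordance measure summarised in Table 3 (the proportion of a method's top t SNPs that also appear among the top t SNPs of the reference method, EM_BN) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, p-values are held in dictionaries keyed by SNP identifier, ties are broken alphabetically by SNP identifier, and the spread over replicates is summarised with the sample standard deviation.

```python
from statistics import mean, stdev


def top_t(p_values, t):
    """Set of the t SNPs with the smallest p-values (ties broken by SNP ID)."""
    ranked = sorted(p_values, key=lambda snp: (p_values[snp], snp))
    return set(ranked[:t])


def concordance(reference, other, t):
    """Proportion of the reference method's top t SNPs also in the other method's top t."""
    return len(top_t(reference, t) & top_t(other, t)) / t


def summarise(per_replicate_concordance):
    """Mean and sample standard deviation over simulation replicates."""
    return mean(per_replicate_concordance), stdev(per_replicate_concordance)
```

For one replicate, `concordance(em_bn_pvalues, other_pvalues, 5)` would give the quantity behind the t = 5 column, and `summarise` would then be applied across the 1000 replicates; here `em_bn_pvalues` and `other_pvalues` are placeholder names for per-SNP p-value dictionaries.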