Direct Power Comparisons between Simple LOD Scores and NPL Scores for Linkage Analysis in Complex Diseases


Am. J. Hum. Genet. 65:…, 1999

Paula C. Abreu,1 David A. Greenberg,3 and Susan E. Hodge1,2,4

1Division of Biostatistics, School of Public Health, and 2Department of Psychiatry, College of Physicians and Surgeons, Columbia University, 3Departments of Psychiatry and Biomathematics, Mount Sinai Medical Center, and 4Division of Clinical-Genetic Epidemiology, New York State Psychiatric Institute, New York

Summary

Several methods have been proposed for linkage analysis of complex traits with unknown mode of inheritance. These methods include the LOD score maximized over disease models (MMLS) and the nonparametric linkage (NPL) statistic. In previous work, we evaluated the increase of type I error when maximizing over two or more genetic models, and we compared the power of MMLS to detect linkage, in a number of complex modes of inheritance, with analysis assuming the true model. In the present study, we compare MMLS and NPL directly. We simulated 100 data sets with 20 families each, using 26 generating models: (1) 4 intermediate models (penetrance of heterozygote between that of the two homozygotes); (2) 6 two-locus additive models; and (3) 16 two-locus heterogeneity models (admixture α = 1.0, .7, .5, and .3; α = 1.0 replicates simple Mendelian models). For LOD scores, we assumed dominant and recessive inheritance with 50% penetrance. We took the higher of the two maximum LOD scores and subtracted 0.3 to correct for multiple tests (MMLS-C). We compared expected maximum LOD scores and power, using MMLS-C and NPL as well as the true model. Since NPL uses only the affected family members, we also performed an affecteds-only analysis using MMLS-C. The MMLS-C was both more powerful than NPL for most cases we examined, except when linkage information was low, and close to the results for the true model under locus heterogeneity. We still found better power for the MMLS-C compared with NPL in affecteds-only analysis. The results show that use of two simple modes of inheritance at a fixed penetrance can have more power than NPL when the trait mode of inheritance is complex and when there is heterogeneity in the data set.

Received December 17, 1998; accepted for publication June 17, 1999; electronically published August 9, 1999.
Address for correspondence and reprints: Dr. David Greenberg, Box 1229, Mount Sinai Medical Center, New York, NY. E-mail: dg@shallot.sld.mssm.edu
© 1999 by The American Society of Human Genetics. All rights reserved.

Introduction

Several methods have been proposed for the linkage analysis of complex traits, including maximum likelihood-based methods (LOD scores) and nonparametric approaches, such as affected sib pair (ASP) methods and the nonparametric linkage (NPL) statistic (Kruglyak et al. 1996). The maximum likelihood method uses all the data available and is the most powerful method available when the true model is used. NPL is less powerful but does not require specification of mode of inheritance. Although specification of mode of inheritance appears to be a disadvantage of maximum LOD score (MLS) methods, it has been shown that LOD scores calculated with approximated genetic parameters (Mod score [Clerget-Darpoux et al. 1986], MMLS [Greenberg 1990], or MODs [Hodge and Elston 1994]) are almost as powerful as LOD scores calculated under the correct model. One can analyze linkage under two different genetic models and choose the one leading to the higher LOD score, thus increasing one's chances of detecting true linkage with minimal cost in increased type I error (Hodge et al. 1997).

We proposed that a prudent approach to linkage analysis in common disease is first to calculate LOD scores assuming two simple models, dominant and recessive, each with an arbitrary 50% penetrance, then to take the higher of the two LOD scores as the raw test statistic, and, finally, to correct for multiple tests. We call this test statistic MMLS-C (Greenberg et al. 1998). However, the question of the power to detect linkage remained unanswered. In Greenberg et al. (1998), we compared the power of the MMLS-C with analysis under the true model (i.e., the generating model [GM]). Analysis under the true model is the best analysis one can expect and can be considered a gold standard that is unattainable for most common diseases. Using a broad range of complex genetic models (including intermediate and two-locus [2L] additive models), we showed that the MMLS-C approach usually had 70% or greater power to detect linkage compared with the gold standard. The relative power drops only when the power to detect linkage under the true model becomes low (<50%). That work also confirmed that, when linkage for a complex model is examined, it is the mode of inheritance at the linked locus being examined that is important in detection of linkage, not the overall inheritance of the disease. The inheritance at the linked locus is well approximated by a dominant or recessive model with reduced penetrance (Greenberg and Hodge 1989).

Table 1. Penetrances for the Additive2 and Additive3 models (two-locus genotypes AA, Aa, aa by BB, Bb, bb). Capital letters denote disease alleles.

The present work focuses on the issue of power to detect linkage for MMLS-C versus NPL. Following the guidelines and results from our two previous studies, we first compare the MMLS-C approach with NPL. Second, we compare the power to detect linkage of MMLS-C with analysis under the true model in the presence of heterogeneity. (Heterogeneity models were not examined in Greenberg et al. [1998].) Third, we perform an affecteds-only comparison between MMLS-C and NPL, to remove the contribution of unaffected family members, thus comparing the power of the two methods on an equal footing, since NPL uses only affecteds. We were also interested in determining the reduction in power of MMLS-C if we used only affected individuals. In this work we address three questions: (1) What effect does heterogeneity have on power to detect linkage by means of MMLS-C? (2) What is the power of NPL analysis, compared with MMLS-C, for intermediate, additive, and heterogeneity models? and (3) If we perform a comparison using only affected family members, how well does MMLS-C perform, compared with NPL analysis? To answer our questions, we simulated data under a number of simple and complex models, including intermediate and 2L additive models, as in Greenberg et al. (1998), as well as 12 new 2L heterogeneity models.
We (1) quantify and compare the power to detect linkage of MMLS-C versus the gold standard of assuming the true model in the presence of locus heterogeneity, (2) compare the power to detect linkage for MMLS-C versus NPL analysis, and (3) compare the power of both statistics for affecteds-only data when the GMs are intermediate, additive, or 2L heterogeneity.

Methods

GMs

We used a number of GMs, including all those in our previous work (Greenberg et al. 1998). We examined a total of 26 GMs: 4 single-locus intermediate GMs, 3 2L Additive2 GMs, 3 2L Additive3 GMs, and 16 2L heterogeneity GMs. We generated family data for a single marker. We generated data sets under the following genetic models:

Intermediate Models. In these models, the heterozygote penetrance, f2, lies between the two homozygote penetrances, f1 and f3. We set f1 = 90% and f3 = 0, and then we varied f2 over 10%, 30%, 50%, and 80%. There was always one disease locus linked to the marker, with recombination fraction θ = …. The frequency of the disease allele was .01. These models are denoted Int10, Int30, Int50, and Int80, respectively.

2L Additive Models. The Additive2 models require at least two disease alleles, in total, at the two loci for a person to be affected. One of the two disease loci is linked to the marker, with θ = 0.01; the other disease locus is unlinked. The disease allele frequency at the linked locus is fixed at .01, and at the unlinked locus it is varied over 0.01, 0.05, and 0.1. Table 1 shows the penetrances for this model. The Additive3 models require at least three disease alleles, in total, at the two loci for a person to be affected. Only one of the two disease loci is linked to the marker, as in the Additive2 models. Again, the disease allele frequency at the linked locus is fixed at .01, and at the unlinked locus it is varied over 0.01, 0.05, and 0.1. See Table 1 for the penetrances for this model.

2L Heterogeneity Models. Heterogeneity is generated as a 2L model in which inheritance is either dominant or recessive at both loci, and penetrance is 80% or 20% at both loci.
We did not generate data sets in which one locus has a dominant and the other a recessive mode of inheritance. θ for the linked locus in the heterogeneity models is also …. We generated data in which there was linkage between the marker and 100% (H100), 70% (H70), 50% (H50), and 30% (H30) of families segregating disease in the general population. Throughout the present article, we refer to these GMs by the mode of inheritance, penetrance, and percent of families with linkage in the data set; for example, D20/H70 represents a GM with a dominant mode of inheritance, 20% penetrance, and 70% of families with linkage in the data set. We refer to the 2L heterogeneity models as D+D and R+R. This follows the notation of previous publications (Durner and Greenberg 1992), in which 2L heterogeneity models are referred to as D+D (i.e., D or D). This is in contrast to DD (D and D), which indicates a 2L epistatic model (Greenberg 1981).

The population prevalence for all heterogeneity models was set at 1%, resulting in different gene frequencies, depending on the model. For the dominant models with linkage in 100% of families (α = 1), the disease allele frequency was always .006. When the GM was recessive and α = 1, the disease allele frequency was always .01. For the remaining heterogeneity models, with α ≤ .7, the gene frequencies assumed single-locus analysis and were calculated as follows: $Q = k/f$, where Q represents the population frequency of at-risk genotypes, k represents population prevalence, and f is the generating penetrance. Then $Q = 1 - q_1^2 q_2^2$ for D+D models, and $Q = 1 - (1 - q_1^2)(1 - q_2^2)$ for R+R models, where, in all models, $p_i$ is the frequency of the dominant allele and $q_i$ is the frequency of the recessive allele at the ith locus (see table 2 for the case f = 1).

Table 2. Penetrances for the heterogeneity models D+D and R+R (two-locus genotypes AA, Aa, aa by BB, Bb, bb). Capital letters denote disease alleles; thus, in the R+R models, A and B are the recessive alleles, with frequencies q1 and q2, respectively.

Data Simulation

For each of our 26 GMs, 100 data sets of 20 nuclear families each were simulated. The nuclear families were simulated according to a well-characterized family-size distribution (Cavalli-Sforza and Bodmer 1971). All matings were fully informative for the marker. Families were selected for linkage analysis if they had at least two affected children. Data sets were generated by use of our extensively tested simulation program (Greenberg 1989; Durner and Greenberg 1992; Greenberg and Doneshka 1996), which uses a random process for each step in the simulation (e.g., selecting the mating type, family size, and segregation of alleles from parents to offspring). For the 2L models, we specified the penetrance of each of the nine possible genotypes.

Figure 1. Power curves for D50, R50, NPL, TRUE, and MMLS-C analyses of 100 data sets generated under the Int50 model (f1 = .9, f2 = .5, f3 = 0).

Analysis Models (AMs)

We analyzed the simulated data for linkage, using two-point parametric and nonparametric methods. We used the following statistics:

MMLS-C Analysis. We chose an arbitrary penetrance of 50% to analyze our data, as described in the Introduction. Misspecification of the penetrance does not generally have a strong effect on the LOD score (only on estimation of θ), as long as the dominance is specified correctly (Greenberg and Hodge 1989; Hodge and Elston 1994). We used the following algorithm:
1. Analyze under the assumption of simple dominant inheritance, with 50% penetrance (D50).
2. Analyze under the assumption of simple recessive inheritance, with 50% penetrance (R50).
3. Choose the larger of the two resultant maximum LOD score (Zmax) values as the MMLS score.
4. Correct for the increase in type I error by subtracting 0.3 from the MMLS. The resultant score is the corrected MMLS score (MMLS-C).

Table 3
ELODs and ELOD Standard Deviations for the Intermediate and Additive GMs under Different AMs

GM                               MMLS-C        NPL           TRUE
Intermediate (f2 =):a
  .1                             2.71 (1.24)   2.69 (1.20)   3.57 (1.24)
  .3                             … (1.48)      3.06 (1.09)   4.24 (1.34)
  .5                             … (1.66)      3.86 (1.43)   5.98 (1.67)
  .8                             … (1.80)      5.89 (1.69)   … (2.22)
Additive with two alleles:b
  .01                            4.13 (1.60)   3.61 (1.54)   4.97 (1.60)
  .05                            … (1.22)      1.30 (1.05)   1.99 (1.13)
  .1                             … (.69)       .49 (.64)     1.19 (1.72)
Additive with three alleles:b
  .01                            … (2.14)      5.58 (1.75)   6.47 (2.07)
  .05                            … (1.39)      3.21 (1.42)   4.05 (1.42)
  .1                             3.78 (1.45)   3.18 (1.10)   4.18 (1.36)

NOTE. ELOD standard deviations given in parentheses.
a Values shown are f2; f1 is fixed at .9; see text.
b Values shown are gene frequencies at the unlinked locus; see text.
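The gene-frequency rule for the heterogeneity models can be checked numerically. The sketch below is a minimal illustration, assuming Hardy-Weinberg equilibrium and independent loci (which the at-risk formulas for D+D and R+R imply). All function names are hypothetical. The text does not say how Q was apportioned between the two loci when α ≤ .7, so the sketch only evaluates the formulas and solves the single-locus cases.

```python
import math


def at_risk_frequency(prevalence: float, penetrance: float) -> float:
    """Q = k / f: population frequency of at-risk genotypes."""
    return prevalence / penetrance


def dominant_allele_frequency(Q: float) -> float:
    """Single dominant locus under Hardy-Weinberg equilibrium.

    At-risk genotypes carry at least one dominant allele, so Q = 1 - q**2,
    where q is the recessive (normal) allele frequency. Returns p = 1 - q.
    """
    return 1.0 - math.sqrt(1.0 - Q)


def recessive_allele_frequency(Q: float) -> float:
    """Single recessive locus: only the recessive homozygote is at risk, Q = q**2."""
    return math.sqrt(Q)


def q_dd(q1: float, q2: float) -> float:
    """D+D model: at risk unless recessive homozygote at both loci.

    Here q1 and q2 are the recessive (normal) allele frequencies.
    """
    return 1.0 - (q1 ** 2) * (q2 ** 2)


def q_rr(q1: float, q2: float) -> float:
    """R+R model: at risk if recessive homozygote at either locus.

    Here q1 and q2 are the recessive (disease) allele frequencies.
    """
    return 1.0 - (1.0 - q1 ** 2) * (1.0 - q2 ** 2)


if __name__ == "__main__":
    k = 0.01  # 1% population prevalence, as in the text
    for f in (0.8, 0.2):  # generating penetrances used for the heterogeneity GMs
        Q = at_risk_frequency(k, f)
        print(f"f = {f}: Q = {Q:.4f}, "
              f"dominant p = {dominant_allele_frequency(Q):.4f}, "
              f"recessive q = {recessive_allele_frequency(Q):.4f}")
```

A quick consistency check: removing all risk at the second locus (q2 = 1 in D+D, q2 = 0 in R+R) reduces both two-locus formulas to their single-locus forms, 1 − q1² and q1².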

Table 4
ELODs and ELOD Standard Deviations for the GMs for Affecteds-Only Analyses

GM                       MMLS-C        NPL
Intermediate (f2 =):a
  .1                     … (1.14)      2.69 (1.20)
  .3                     … (.99)       3.06 (1.09)
  .5                     … (1.36)      3.86 (1.43)
  .8                     … (1.44)      5.89 (1.69)
Additive with two alleles:b
  .01                    … (1.38)      3.61 (1.55)
  .05                    … (1.04)      1.30 (1.05)
  .1                     … (.58)       .49 (.64)
Additive with three alleles:b
  .01                    … (1.89)      5.58 (1.73)
  .05                    … (1.27)      3.21 (1.42)
  .1                     … (1.05)      3.18 (1.10)
Homogeneity (α = 1):
  D20/H100               3.42 (1.00)   2.74 (1.14)
  D80/H100               … (1.58)      6.06 (2.01)
  R20/H100               … (1.77)      6.00 (1.53)
  R80/H100               … (1.36)      8.12 (1.99)
Heterogeneity (α = .7):
  D80/H70                3.15 (1.52)   2.92 (1.53)
  R80/H70                … (1.96)      5.13 (1.95)
Heterogeneity (α = .5):
  D80/H50                … (1.28)      1.89 (1.28)
  R80/H50                … (1.60)      3.06 (1.55)

NOTE. ELOD standard deviations given in parentheses.
a Values shown are f2; f1 is fixed at .9.
b Values shown are gene frequencies at the unlinked locus.

The LOD scores were calculated by GENEHUNTER for all single-locus models (D, R, and Int). TMLINK (Lathrop and Ott 1990) was used to calculate the TRUE score (analysis under the true model) for the two-locus models (additive and heterogeneity). To calculate MMLS-C for the heterogeneity models, we used the maximum heterogeneity LOD score of GENEHUNTER, which is maximized over α.

NPL Analysis. The NPL method (Kruglyak et al. 1996) can use either NPLall or NPLpairs, both of which use only affected individuals. We calculate only NPLall. Kruglyak et al. (1996) constructed the NPL score on the basis of a score statistic (Whittemore and Halpern 1994a, 1994b). Theoretically, once the score statistic is standardized with the appropriate weights, it follows a standard normal distribution (asymptotically). The normalized NPL score for the ith pedigree under the null hypothesis of no linkage has mean 0 and variance 1. NPL analysis is implemented in the computer program GENEHUNTER (Kruglyak et al. 1996). Asymptotically, the NPL scores follow a normal distribution, allowing us to transform NPL scores into LOD-score units: $\mathrm{LOD} = \mathrm{NPL}^2 / 4.605$.
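Steps 3 and 4 of the MMLS-C algorithm, and the NPL-to-LOD conversion, reduce to a few lines of arithmetic. What follows is a minimal Python sketch with hypothetical function names. It takes maximum LOD scores already produced by a linkage program (it does not compute any likelihoods), and it uses 4.605 ≈ 2 ln 10, the factor that converts a χ²₁ statistic to LOD units. Mapping negative NPL scores to 0 is our assumption, not a rule stated in the text.

```python
import math

# 2 * ln(10) ~= 4.605: converts a chi-square(1) statistic to LOD units.
TWO_LN10 = 2.0 * math.log(10.0)


def mmls_c(zmax_d50: float, zmax_r50: float, correction: float = 0.3) -> float:
    """Steps 3-4: take the larger of the D50 and R50 maximum LOD scores,
    then subtract 0.3 to correct for having tried two models."""
    return max(zmax_d50, zmax_r50) - correction


def npl_to_lod(npl: float) -> float:
    """Map an (asymptotically standard normal) NPL score to LOD units.

    LOD = NPL**2 / (2 ln 10). A negative NPL indicates no excess allele
    sharing; returning 0 for it is an assumption of this sketch.
    """
    if npl <= 0.0:
        return 0.0
    return npl * npl / TWO_LN10


def lod_to_npl(lod: float) -> float:
    """Inverse mapping, for reading NPL thresholds off a LOD axis."""
    return math.sqrt(TWO_LN10 * lod)


print(round(lod_to_npl(3.0), 2))   # 3.72
print(round(mmls_c(2.4, 3.1), 2))  # 2.8 (hypothetical D50 and R50 maxima)
```

The first line of output reproduces the correspondence used in the Results, where a LOD score of 3.0 is paired with an NPL score of about 3.7.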
To confirm the assumption of normality, we plotted both the transformed NPL scores and the LOD scores on the horizontal axis of one graph. On a second graph we then plotted the exact significance levels (P values) obtained by GENEHUNTER as values on the horizontal axis. (Figures are not shown but are available on request.) Since all matings were fully informative for the marker, we would expect to see approximately the same power curve in both graphs. For all the GMs, the original NPL scores followed a normal distribution. The NPL score calculated by GENEHUNTER was recognized to be conservative in the presence of missing data. Kong and Cox (1997) proposed a statistic (KAC) that has the appropriate significance level regardless of whether there is missing information. The GENEHUNTER PLUS programs implement KAC. Badner et al. (1998) showed that KAC is more powerful than the NPL score, depending on how much information is missing. In our study, we use a fully informative marker, and all family members are genotyped; therefore, the NPL has the same power as the KAC.

Affecteds-Only Analysis. Since the NPL score uses only information from affected individuals, whereas LOD-score calculations use all available individuals in the pedigrees, we would expect some loss of power for the NPL statistic for that reason alone. Therefore, we were also interested in determining the reduction in power of MMLS-C if we used only affected individuals. To answer this question, we did a second type of comparison, coding all unaffected individuals as unknown and evaluating the performance of MMLS-C. We did this analysis for most of the 26 GMs, except when power to detect linkage even under the true analysis was low or when the standard deviation of the expected maximum LOD scores (ELODs) was high. (The excluded GMs were α = .70 and α = .50 at low penetrances, and all models with α = .30.)

Results

Calculation and Presentation of Power Results

We report three different test statistics. We focus primarily on the MMLS-C and the NPL in LOD-score units. In addition, all data sets were analyzed for linkage under the true model.
The maximum LOD score from this analysis is reported as the TRUE score. We performed two types of analyses: analyses including all family data, and affecteds-only analyses. For all analyses, ELODs were calculated by taking the mean of the 100 values of the particular statistic. We also calculated the standard deviations of the ELODs. For the power calculations, the values of each statistic were ordered from highest to lowest over the 100 data sets for a given model. Observed power levels, P(Z), were determined as a function of score for each test statistic T (T is the maximum Zmax score, NPL score, or maximum HLOD score), as follows:

$P(Z) = (\text{number of data sets yielding } T \ge Z)/N$,

where N represents the number of data sets generated for the simulation (i.e., 100).

Figure 2. Power curves for D50, R50, NPL, TRUE, and MMLS-C analyses of 100 data sets generated under the Additive3 model. The unlinked gene frequency is .01.

Figures 1–4 show selected power curves for MMLS-C and NPL, as well as curves for the corresponding D50 and R50 analyses (i.e., without correction for multiple testing) for comparison. In the graphs, power is plotted as a function of LOD scores and NPL scores. For example, if a LOD score of 3.0 (corresponding to an NPL score of 3.7) on the X-axis shows a power of 0.8 on the Y-axis, this means that 80% of the 20-family data sets reached a LOD score of 3.0 or higher. We now describe the results for the different generating models.

Intermediate Models. For the intermediate models, as the heterozygote penetrance rises, MMLS-C consistently outperforms NPL, reaching a difference in ELODs of 3.3 LOD-score units in 20-family data sets when the GM is Int80. Table 3 presents the ELODs and the respective ELOD standard deviations for MMLS-C, NPL, and TRUE scores for the intermediate and additive GMs. (We include the TRUE results for the intermediate and additive models for ease of comparison; however, these comparisons were already made in Greenberg et al. [1998].) The NPL ELODs are almost as good as MMLS-C (ratio NPL:MMLS-C = 0.99) when the power to detect linkage was low (only 64%) even with the gold standard analysis, but this ratio drops to 0.64 as the information for linkage increases (table 3). For example, when the f2 penetrance is high for intermediate models, the power to detect linkage is high. At low f2 penetrance, the power to detect linkage is low for both methods, but, for all thresholds we examined, MMLS-C performs better than NPL. Figure 1 shows the power curves for one intermediate model, Int50, where, for a threshold of 3.0 (P value = .001), the power for MMLS-C is 95%, and for NPL 64%.
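The power definition P(Z) translates directly into code. The following sketch uses hypothetical function names and made-up scores for five data sets (not results from this study). It counts the data sets reaching each threshold, which is equivalent to reading the ordered list described above, and it reports the ELOD as the mean of the per-data-set maximum scores. The text does not specify whether the standard deviation uses n or n − 1, so the sample (n − 1) form used here is an assumption.

```python
from statistics import mean, stdev


def empirical_power(scores, z):
    """P(Z): fraction of simulated data sets whose statistic T is >= Z."""
    return sum(1 for t in scores if t >= z) / len(scores)


def power_curve(scores, thresholds):
    """Power at each LOD-score threshold, as plotted in the power-curve figures."""
    return [(z, empirical_power(scores, z)) for z in thresholds]


def elod_summary(scores):
    """ELOD (mean of the per-data-set maxima) and its standard deviation.

    The sample (n - 1) standard deviation is an assumption of this sketch.
    """
    return mean(scores), stdev(scores)


# Hypothetical maximum LOD scores from five simulated data sets
# (illustrative only; the study used 100 data sets per GM).
toy = [3.5, 2.1, 4.0, 0.8, 3.0]
print(power_curve(toy, [2.0, 3.0, 4.0]))  # [(2.0, 0.8), (3.0, 0.6), (4.0, 0.2)]
print(elod_summary(toy))
```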
For a threshold of 4.0, there was a drop in power of 50% for NPL compared with MMLS-C. Table 4 shows ELODs and standard deviations (in parentheses) for the two statistics for complex models in an affecteds-only comparison. The NPL analysis remained the same as in table 3. Table 4 shows that, for the intermediate models, the differences in ELODs between the two statistics are smaller than when all family members are included. The ELOD differences between the statistics range from 0.23 to …. But as the heterozygote penetrance increases, so does the difference in ELODs of MMLS-C and NPL. Although, for the intermediate models, the unaffected individuals contribute little information for linkage, there is still an increase in ELOD of 24% under MMLS-C compared with NPL (table 4) when the GM is Int50.

2L Additive Models: Additive2 Models. For the Additive2 model, when both loci have the same gene frequency (.01), MMLS-C gives an ELOD of 4.13, whereas NPL yields an ELOD of 3.61 (table 3). For the Additive2 model, the power to detect linkage decreases as the frequency of the disease allele at the unlinked locus increases. As the information for linkage decreases, we expect that the power for both MMLS-C and NPL will be similar, since the effect of the model assumptions on the analysis will be small. This is what we observe. When the gene frequency at the unlinked locus for the Additive2 models is .05 or .1, the ELODs for MMLS-C and NPL have similar values. This similarity arises from the fact that there is not much information for linkage in these models. The standard deviations of the ELODs for MMLS-C and NPL under these two models are relatively large (close to the corresponding ELOD).
The ELODs for the true model under these two models are also less than 3.0. Thus, the power to detect linkage under any analysis conditions is relatively low for these GMs. There is little difference in the power of NPL and MMLS-C for affecteds-only analysis when the GMs are additive (table 4).

Table 5
ELODs and ELOD Standard Deviations for the 2L Heterogeneity GMs under Different AMs

GM              MMLS-C        NPL           TRUE
Homogeneity (α = 1):
  D20/H100      3.17 (1.36)   2.74 (1.14)   3.88 (1.09)
  D80/H100      … (2.00)      6.06 (2.01)   … (2.47)
  R20/H100      … (2.13)      6.00 (1.53)   7.48 (1.89)
  R80/H100      … (1.85)      8.12 (1.99)   … (2.21)
Heterogeneity (α = .7):
  D20/H70       … (1.09)      1.60 (1.03)   1.73 (.96)
  D80/H70       4.74 (1.96)   2.92 (1.53)   5.52 (2.11)
  R20/H70       … (1.46)      2.37 (1.50)   2.71 (1.41)
  R80/H70       … (2.38)      5.13 (1.95)   8.15 (2.54)
Heterogeneity (α = .5):
  D20/H50       .94 (.91)     1.05 (.89)    .97 (.70)
  D80/H50       … (1.78)      1.89 (1.28)   3.69 (1.94)
  R20/H50       … (1.18)      1.53 (1.09)   1.70 (1.00)
  R80/H50       … (1.81)      3.06 (1.55)   4.69 (1.82)

NOTE. ELOD standard deviations given in parentheses.

Table 6. Power to achieve a given Z value (Z = 3.0 and Z = 4.0) under the MMLS-C and true models, and MMLS-C:TRUE (M:T) ratio, under locus heterogeneity, for homogeneity (α = 1: D20/H100, D80/H100, R20/H100, R80/H100) and heterogeneity (α = .7: D20/H70, D80/H70, R20/H70, R80/H70; α = .5: D20/H50, D80/H50, R20/H50, R80/H50) GMs. Footnote: little information for linkage; power under the true model ≤10%.

2L Additive Models: Additive3 Models. When three disease alleles are required for an individual to be affected, figure 2 suggests that recessive inheritance provides a good approximation to this model when the disease allele frequency at the unlinked locus is .01, the same as at the linked locus (i.e., the gene frequency combination .01, .01). As the frequency of the disease allele at the unlinked locus increases to .05, we see a drop in power. At (.01, .1), the ELODs increase again. For this gene frequency combination, dominant inheritance seems to provide a better description of the inheritance model (data not shown). The difference in ELODs between MMLS-C and NPL is not very great in this model. MMLS-C has higher ELODs by approximately half a LOD-score unit, compared with NPL, and 8% greater power to detect linkage (fig. 2). When the gene frequency of the unlinked locus is .1, the ELOD for MMLS-C is 3.78, versus 3.18 when the NPL test statistic is used, as shown in table 3. The ELOD ratio for NPL versus MMLS-C ranges from 85% to 93%.

2L Heterogeneity Models. For the models with heterogeneity, the power to detect linkage for all analysis methods decreases as the percentage of families with linkage in the data set decreases, as expected. Also, for all α levels, as the generating penetrance increases, the MMLS-C and NPL scores increase.
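Under heterogeneity, the LOD-based analysis uses the heterogeneity LOD (HLOD) maximized over α, as described in the Methods. What follows is a minimal sketch, assuming the standard admixture formulation in which each family contributes log10[α·10^LODi + (1 − α)] at a fixed θ. The per-family LODs and function names are hypothetical. θ is held fixed rather than maximized jointly, and this is not GENEHUNTER's implementation. The sketch shows how unlinked families can pull the homogeneity LOD (α = 1) below zero while the HLOD, with α estimated, stays positive.

```python
import math


def hlod(family_lods, alpha):
    """Admixture LOD at a fixed recombination fraction.

    family_lods: per-family LOD scores (log10 likelihood ratios) at that theta.
    alpha: proportion of families assumed to be linked.
    Each family contributes log10(alpha * 10**LOD_i + (1 - alpha)).
    """
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)


def max_hlod(family_lods, steps=1000):
    """Maximize the HLOD over alpha on an evenly spaced grid in [0, 1].

    alpha = 0 gives HLOD = 0, so the maximum is never negative.
    Returns (maximum HLOD, alpha at the maximum).
    """
    best_h, best_a = 0.0, 0.0
    for i in range(steps + 1):
        a = i / steps
        h = hlod(family_lods, a)
        if h > best_h:
            best_h, best_a = h, a
    return best_h, best_a


# Hypothetical per-family LODs: three families supporting linkage,
# two strongly against it (as unlinked families might look).
lods = [1.2, 0.9, 1.1, -2.0, -1.5]
print(round(sum(lods), 2))  # homogeneity LOD (alpha = 1)
print(max_hlod(lods))       # HLOD maximized over alpha, with the alpha estimate
```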
The ELODs for the GMs under locus heterogeneity are presented in table 5. When the GM is D20/H100, the ELOD from MMLS-C is 3.17, versus 2.74 for the NPL statistic and 3.88 for TRUE. When the generating penetrance is 80%, the difference between MMLS-C and NPL is 3.3 LOD-score units (table 5). Table 6 compares the power achieved by the MMLS-C versus the TRUE statistics, at LOD-score thresholds of 3.0 and 4.0. Table 6 also shows the ratio of the MMLS-C power to the TRUE power in the presence of locus heterogeneity. The MMLS-C approach usually had 70% of the power to detect linkage compared with the power obtained with the TRUE analysis. When the power to detect linkage under the GM is <50%, the power of MMLS-C drops to 56%. Table 7 presents the power to achieve Z of 3.0 and 4.0 by MMLS-C and NPL analyses, for all GMs. There is a corresponding relative difference in power of 40% when NPL versus MMLS-C is used for the D20/H100 GM: 55% of data sets reach a threshold of 3.0 when MMLS-C is used, whereas 33% of data sets reach this level when NPL is used. For the recessive models, both statistics are robust compared with the TRUE statistic. For R20/H100, when ELODs for NPL are compared with ELODs for MMLS-C, we see a ratio of 0.87 (table 5). For the GM R80/H100, there is 100% power to detect linkage if one uses MMLS-C or NPL for the 3.0 and 4.0 thresholds (table 7). However, for higher heterogeneity LOD-score thresholds, MMLS-C has better power; for example, if we look at a cutoff of Z = 5.0, the power of NPL drops to 90%, whereas, with the MMLS-C, all data sets still reach this threshold (power = 100%). For the D80/H70 model, the ELOD for MMLS-C is 4.74 versus 2.92 for NPL. Figures 3a and 3b represent the power obtained with MMLS-C and NPL for R80/H100 and R80/H70. We see that the power difference between MMLS-C and NPL decreases as α decreases. In both figures, we can see that the MMLS-C outperforms NPL. We see a similar pattern for the H50 dominant and recessive models with 80% penetrance (D80/H50 and R80/H50). For the low-penetrance models in which α = 0.5, the maximum difference between MMLS-C and NPL is small and the information for linkage is low (standard deviation of ELOD ≥0.9). Both statistics lack power to detect linkage. All models in H30 gave small ELODs with high standard deviations and low power to detect linkage, as we expect when there is little information for linkage.

Table 7. Power to achieve a given Z value (Z = 3.0 and Z = 4.0) for MMLS-C and NPL, and NPL:MMLS-C (N:M) ratio, for the intermediate (a), Additive2 and Additive3 (b), homogeneity (α = 1), and heterogeneity (α = .7 and α = .5) GMs.
a Values shown are f2; f1 is fixed at .9.
b Values shown are gene frequencies at the unlinked locus.
c Little information for linkage. Power under the true model ≤10%.
d Power = 0%.

For the affecteds-only analysis, when the GM is homogeneous (α = 1), the simple Mendelian D20 has an ELOD for MMLS-C of 3.42, versus an ELOD of 2.74 for the NPL statistic. For D80/H70, the ELOD for the true model is …. The ELOD for MMLS-C was 3.15 and, for NPL, 2.92. Thus, use of NPL analysis would lead to a slight loss in power to detect linkage. When the GM is recessive with 80% penetrance under homogeneity (α = 1) and heterogeneity (α = .7), we observed high power for both NPL and MMLS-C in the affecteds-only analysis (fig. 4). For the affecteds-only analysis under heterogeneity, the ELODs for MMLS-C are higher than for NPL; the difference ranged from 0.2 to 0.9, as shown in table 4.
We also verified that, with linkage in 50% of families, the power to detect linkage is low and MMLS-C and NPL have very close ELODs, with standard deviations of ….

Discussion

The purpose of this simulation study was to answer three questions: First, how does MMLS-C perform compared with the TRUE analysis in the presence of locus heterogeneity? Second, how does the power of NPL analysis compare with the power of the MMLS-C analysis for complex models and for heterogeneity models? Third, when only affected individuals are included in the analysis, how does MMLS-C perform compared with NPL analysis? We have shown that, for the GMs we examined, the MMLS-C approach does not substantially decrease the power to detect linkage compared with the true model, even in the presence of heterogeneity. The general pattern was that NPL had lower ELODs than the MMLS-C under all the models examined. In the presence of locus heterogeneity, as the information for linkage in a data set increases, the difference between MMLS-C and NPL increases, whether or not unaffected family members are included. MMLS-C and NPL had approximately equal power to detect linkage when there was very little information for linkage (fig. 5). When the TRUE analysis had power <85%, the general pattern for models under locus heterogeneity was that NPL had less power than MMLS-C. As the proportion of families with linkage increases (i.e., as α approaches 1), the difference in ELODs for MMLS-C and NPL also increases (figs. 4 and 5).

Figure 3. Power curves for D50, R50, NPL, TRUE, and MMLS-C analyses of 100 data sets generated under locus heterogeneity: a, GM R80, α = 1; b, GM R80, α = .7.

Figure 4. Power curves for MMLS-C and NPL analyses of 100 data sets generated as R80 with α = 1 and with α = .7, for affecteds-only analyses.

Performing the MMLS-C analysis with affecteds only lowers the power of the analysis compared with analysis that includes unaffected individuals. As table 4 shows, the MMLS-C power is still higher than that of NPL for many (but not all) of the models we examined. We also compared the power of MMLS-C to detect linkage when all individuals are included with the power of MMLS-C when only affected individuals are included (table 8). When penetrance was high, excluding the unaffected individuals lowered the power on average by 25%. In the two cases (GM is Int10 or D20/H100) in which the power was slightly higher for the affecteds-only analysis, the ELODs had a large standard deviation (see table 8). In the present study, we simulated 26 different models, to look at the power to detect linkage for MMLS-C versus NPL. In Greenberg et al. (1998), we reported only results for θ = 0.0.
In fact, we had also examined θ = 0.01 and θ = …. There was no inherent difference in the behavior of MMLS-C, except that, of course, the LOD scores were higher when θ was smaller. Our focus in the present study is on the relative power of MMLS-C and NPL. Therefore, we did not explore θ in this study. Also, because of the large number of models, we wanted to keep to a reasonable number of calculations. Note that we investigated only nuclear families. Vieland et al. (1992, 1993) looked at the analysis of 2L models, using single-locus analysis for nuclear families and pedigrees. Given the results from the Vieland et al. studies, we would not expect fundamental differences for nuclear families versus pedigrees.

Figure 5. Expected HLOD scores for TRUE, MMLS-C, and NPL analyses of 100 data sets generated under locus heterogeneity: a, GM D80 at α = 1, .7, .5, and .3; b, GM D20 at α = 1, .7, .5, and .3; c, GM R80 at α = 1, .7, .5, and .3; d, GM R20 at α = 1, .7, .5, and .3.

There are currently no means to incorporate heterogeneity in model-free analysis. Therefore, model-free methods will be weakened in their ability to detect linkage in the presence of heterogeneity. In contrast, LOD-score methods allow us to estimate α (i.e., the percentage of families with linkage in the data set). When looking at heterogeneity models, we maximized the likelihood with respect to α (HLOD), but we are simultaneously maximizing the LOD score over θ. This could be viewed as introducing another degree of freedom (Ott 1991) and therefore requiring further correction of the significance level. On the other hand, in two-point analysis, the estimates of θ and α are highly correlated, so perhaps maximizing over α does not add a degree of freedom. To get an idea of the distribution of the HLOD, we compared HLOD with different χ² curves. We simulated data sets with no linkage under two dominant and two recessive 2L heterogeneity models. Data sets were analyzed for linkage by maximizing the HLOD over dominance model. The resulting significance levels very closely matched a two-sided χ²₁, just as the raw (uncorrected) MMLS curves had done (Hodge et al. 1997). (Figure not shown but available on request.) This means that, for a given type I error, an investigator would need to increase the LOD score used as a cutoff by 0.3 LOD-score units, the same correction as for MMLS-C; that is, an additional correction for type I error is not needed for maximization of HLOD. Therefore, since the maximum HLOD follows approximately the same distribution as the maximum LOD, we use the same scale and threshold LOD scores in our tables and figures.
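The 0.3-unit shift can be motivated numerically. A single LOD score is usually referred to a 50:50 mixture of 0 and χ²₁, whereas the text reports that the maximized statistics behaved like a two-sided (full) χ²₁. The sketch below finds how much the LOD cutoff must rise under a full χ²₁ to keep the same nominal p-value. It is an illustration with hypothetical function names, not the derivation in Hodge et al. (1997). The shift comes out close to 0.3, slightly below log10 2 ≈ 0.301.

```python
import math

TWO_LN10 = 2.0 * math.log(10.0)  # chi-square(1) statistic = 2 ln(10) * LOD


def chi2_1_sf(x):
    """Upper tail of a chi-square with 1 df: P(X >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def p_half_chi2(lod):
    """Nominal p for a LOD when the null is a 50:50 mixture of 0 and chi-square(1)."""
    return 0.5 * chi2_1_sf(TWO_LN10 * lod)


def p_full_chi2(lod):
    """p for a LOD whose null distribution is a full (two-sided) chi-square(1)."""
    return chi2_1_sf(TWO_LN10 * lod)


def matching_lod(p, lo=0.0, hi=20.0, iters=200):
    """Bisection for the LOD whose full chi-square(1) p-value equals p."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_full_chi2(mid) > p:
            lo = mid  # p-value still too large: the cutoff must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


for z in (2.0, 3.0, 4.0):
    shift = matching_lod(p_half_chi2(z)) - z
    print(f"LOD {z}: raise cutoff by {shift:.2f}")
```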
As expected, the power to detect significant evidence of linkage is reduced in the presence of heterogeneity. We found that power was reduced by 2% to 90% (not shown), depending on both the amount of heterogeneity in the data set and the penetrance of the disease. The combination of low penetrance and even a moderate level of heterogeneity can noticeably reduce power. Figures 5a and 5c show that, when penetrance is high, both MMLS-C and NPL have reasonably good power to detect linkage, although MMLS-C has more. Figures 5b and 5d show that the power to detect linkage is low when penetrance is low and α ≤ 70%. Figures 5a and 5c also show that, as the percentage of families with linkage in the data set increases, so does the difference in power to detect linkage between MMLS-C and NPL. In complex diseases, in which the trait may be influenced by several different loci, at each locus either one or both alleles contribute to the trait expression, thus approximating either dominant or recessive inheritance at that locus. Our previous work (Greenberg et al. 1998) and several other studies (Greenberg 1989; Vieland et al. 1992; Goldin and Weeks 1993; Durner et al. 1999) showed that the important assumption in the analysis is the mode of inheritance at the specific disease locus being analyzed. The action of the other locus can be incorporated into the reduced penetrance (Greenberg and Hodge 1989).
Table 8 Comparison of MMLS-C ELODs for the GMs when Unaffected Individuals Are Included and for Affecteds-only Analyses (columns: ELODs for Affecteds Only and for All, and the Affecteds:All ratio; rows: intermediate models, additive models with two and with three alleles, and heterogeneity models D20/H, D80/H, R20/H, and R80/H at α = 1 and D80/H and R80/H at α = .7 and .5)
a Values shown are f2; f1 is fixed at .9.
b Values shown are gene frequencies at the unlinked locus.
c Large standard deviation of ELOD, 1.2.
Greenberg and Berger (1994) investigated the reliability of a method for determining the mode of inheritance from the linkage data. The method examined the difference between the maximum LOD scores calculated under the dominant and recessive AMs. They showed that, if this difference was 1.5, then the higher of the two maximum LOD scores reflected the correct mode of inheritance with high reliability, and that a difference of 2.5 essentially guarantees a correct inference of the mode of inheritance. Therefore, the MMLS-C approach can also yield knowledge about the mode of inheritance of the disease. For the Additive3 model, we saw that the gene frequency at the unlinked locus determined which assumed mode of inheritance at the linked locus led to the higher LOD score. When the model at the linked locus is misspecified, the LOD score drops, leading to a loss of power to detect linkage.
In Durner et al. (1999), the authors carefully examined the work of Dizier et al. (1996), who analyzed linkage data generated under complex modes of inheritance with both ASP and LOD-score methods. However, their LOD-score analysis used genetic parameters derived from segregation analysis. Dizier et al. (1996) concluded that there are certain models in which ASP analysis has more power to detect linkage than LOD scores. Durner et al. (1999) showed, however, that had Dizier et al. (1996) used the MMLS-C analysis instead of parameters from segregation analysis, they would have had more power to detect linkage with LOD scores than with either ASP or NPL.
Various studies have compared the power of different linkage methods with that of the NPL statistic. Lin et al. (1997) evaluated the performance of NPL under single Mendelian models and models with heterogeneity and concluded that, under a model with a major gene effect, likelihood-based methods (MMLS) tend to be more powerful, whereas, for a minor gene effect, the NPL statistic is generally superior to the other tests. Davis and Weeks (1997) also examined a variety of statistics for linkage analysis with different GMs and family structures. They showed that NPL had lower power than other methods when there was heterogeneity in the data and when families were ascertained through two or more affected children.
We have focused on two-point parametric versus two-point model-free analysis. We looked at some specific GMs (one-locus and 2L), trying to cover a broad range of genetic models, and we concluded that MMLS-C has better power than NPL under the range of GMs we examined. Our intention was to show that parametric methods remain a powerful tool even when the underlying genetic model is unknown. However, there might be circumstances in which model-free methods are better suited for genetic analysis than parametric methods. Our current work in preparation compares the power of MMLS-C and NPL in multipoint analysis for the same GMs; our results thus far demonstrate that the conclusions of this work apply equally well to multipoint analysis.
We conclude that:
1. Our proposed statistic, MMLS-C, is simple and robust, and its power to detect linkage is often almost as great as that obtained with the true model.
2. MMLS-C has more power than NPL for complex models.
3. MMLS-C yields better power to detect linkage than NPL under heterogeneity (α < 1).
4. MMLS-C has more power than NPL when only affecteds are analyzed. For the affecteds-only analysis, MMLS-C was more powerful than NPL in most of the cases we examined.
5. As the information for linkage goes down, so does the difference between MMLS-C and NPL.
6. When only affected family members were analyzed, the expected LOD score was on average 25% lower than when we included the unaffected family members.
7. The inheritance at one locus approximates either dominant or recessive inheritance. An advantage of MMLS-C is that it provides information about the mode of inheritance at the locus being tested, whereas NPL does not.
The results show that our approach, using two simple modes of inheritance at fixed penetrance, can have more power than NPL when the mode of inheritance of the trait is complex and in the presence of locus heterogeneity. Mendelian models, despite their simplicity, provide a reasonable approximation for a locus-by-locus search for disease genes.
Acknowledgments
We thank Dr. Martin Durner for helpful discussions and Mehdi Keddache for computer assistance. This work was supported by NIH grants MH-48858, DK-31813, MH-28274, DK-31775, NS-27941, and DK
References
Badner JA, Gershon ES, Goldin LR (1998) Optimal ascertainment strategies to detect linkage to common disease alleles. Am J Hum Genet 63:
Cavalli-Sforza LL, Bodmer WF (1971) The genetics of human populations. Freeman, San Francisco
Clerget-Darpoux FC, Bonaïti-Pellié C, Hochez J (1986) Effects of misspecifying genetic parameters in lod score analysis. Biometrics 42:
Davis S, Weeks DE (1997) Comparison of nonparametric statistics for detection of linkage in nuclear families: single marker evaluation. Am J Hum Genet 61:
Durner M, Greenberg DA (1992) Effect of heterogeneity and assumed mode of inheritance on lod scores. Am J Med Genet 42:
Durner M, Vieland VJ, Greenberg DA (1999) Further evidence of increased power of LOD scores compared with nonparametric methods. Am J Hum Genet 64:
Dizier MH, Babron MC, Clerget-Darpoux FC (1996) Conclusions of LOD-score analysis for family data generated under two-locus models. Am J Hum Genet 58:
Goldin LR, Weeks DE (1993) Two-locus models of disease: comparison of likelihood and nonparametric linkage methods. Am J Hum Genet 53:
Greenberg DA (1981) A simple method for testing two-locus models of inheritance. Am J Hum Genet 33:
Greenberg DA (1989) Inferring mode of inheritance by comparison of lod scores. Am J Med Genet 34:
Greenberg DA (1990) Linkage analysis assuming a single-locus mode of inheritance for traits determined by two loci: inferring mode of inheritance and estimating penetrance. Genet Epidemiol 7:
Greenberg DA, Abreu PC, Hodge SE (1998) The power to detect linkage in complex disease using simple genetic models. Am J Hum Genet 63:
Greenberg DA, Berger B (1994) Using lod-score differences to determine mode of inheritance: a simple, robust method even in the presence of heterogeneity and reduced penetrance. Am J Hum Genet 55:
Greenberg DA, Doneshka P (1996) The partitioned association-linkage (PAL) test: distinguishing necessary from susceptibility loci. Genet Epidemiol 13:
Greenberg DA, Hodge SE (1989) Linkage analysis under random and genetic reduced penetrance. Genet Epidemiol 6:
Hodge SE, Abreu PC, Greenberg DA (1997) Magnitude of type I error when single-locus linkage analysis is maximized over models: a simulation study. Am J Hum Genet 60:
Hodge SE, Elston RC (1994) Lods, wrods, and mods: the interpretation of lod scores calculated under different models. Genet Epidemiol 11:
Kong A, Cox NJ (1997) Allele-sharing models: lod scores and accurate linkage tests. Am J Hum Genet 61:
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:
Lathrop GM, Ott J (1990) Analysis of complex diseases under oligogenic models and intrafamilial heterogeneity by the LINKAGE programs. Am J Hum Genet 47:A188
Lin MW, Zhao JH, Curtis D, Sham P (1997) Power comparisons of nonparametric linkage tests. Am J Med Genet (Neuropsychiatr Genet) 74:601
Ott J (1991) Analysis of Human Genetic Linkage. 2d ed. Johns Hopkins University Press, Baltimore
Vieland VJ, Greenberg DA, Hodge SE (1993) Adequacy of single-locus approximations for linkage analyses of oligogenic traits: extension to arbitrary pedigree structures. Hum Hered 43:
Vieland VJ, Hodge SE, Greenberg DA (1992) Adequacy of single-locus approximations for linkage analyses of oligogenic traits. Genet Epidemiol 9:45-59
Whittemore AS, Halpern J (1994a) Probability of gene identity by descent: computation and applications. Biometrics 50:
Whittemore AS, Halpern J (1994b) A class of tests for linkage using affected pedigree members. Biometrics 50: