Evaluating the statistical power of goodness-of-fit tests for health and medicine survey data

8 th World IMACS / MODSIM Congress, Carns, Australa 3-7 July 29 http://mssanz.org.au/modsm9 Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data Steele, M.,2, N. Smart, C. Hurst 3 and J. Chaselng 4 PHCRED, Faculty of Health Scence and Medcne, Bond Unversty 2 Faculty of Busness, Technology and Sustanable Development, Bond Unversty 3 Faculty of Health, Queensland Unversty of Technology 4 Faculty of Health Scence and Medcne, Bond Unversty Emal: msteele@bond.edu.au Abstract: oodness-of-ft test statstcs are wdely used n health and medcne related surveys however lttle regard s usually gven to ther statstcal power. Ths paper nvestgates the smulated power of fve categorcal goodness-of-ft test statstcs used to analyze health and medcne survey data collected on a 5-pont Lert scale. The test statstcs used n ths power study are Pearson s Ch-Square, the Kolmogorov-Smrnov test statstc for dscrete data, the Log-Lelhood Rato, the Freeman-Tuey and the specal case of the Dvergence statstc defned by Cresse and Read (984). Recommendatons based on these smulatons are provded on whch of these Decreasng trend Trangular Step Platyurtc goodness-of-ft test statstcs s the most powerful overall and whch s the most powerful for the Fgure. Type of alternatve dstrbutons used n the predefned unform null aganst the four general power studes. shaped alternatve dstrbutons (see Fgure ) nvestgated n ths paper. Keywords: goodness-of-ft, power, ch-square tests, dscrete 92

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data. INTRODUCTION Although wdely used n the analyss of health and medcne related survey data and other types of categorcal data, only a lmted number of publshed studes are avalable on the power of dscrete goodness-of-ft test statstcs. Those avalable typcally compare the power of the Pearson (9) Ch-Square wth test statstcs based on the emprcal dstrbuton functon (e.g. Choulaan et al., 994, Petttt and Stephens, 977, From, 996, Steele and Chaselng, 26) however the recent papers by Steele et al. (28) and Ampadu (28) do compare several goodness-of-ft test statstcs that are asymptotcally equvalent to the χ 2 dstrbuton. Ths paper consders dscrete dstrbutons on a 5- pont Lert (932) scale as commonly used n health and medcne surveys and compares the Monte Carlo smulated power of Pearson s Ch- Table. Test statstcs used n health and medcne surveys usng Lert data Test Statstc Equaton ( O E ) 2 Pearson s Ch-Square 2 Χ = (Pearson, 9) = E Dscrete Kolmogorov-Smrnov max = O E (Petttt and Stephens, 977) Log-Lelhood Rato O = 2 O ln (Wls, 935) = E Freeman-Tuey = 4 (Freeman and Tuey, 95) ( O ) 2 E λ Dvergence wth λ=⅔ 2 O O = (Cresse and Read, 984) λ( λ ) + = E where s the number of cells, O s the observed frequency n cell, E s the expected frequency n cell. Square, the dscrete Kolmogorov-Smrnov, the Log-Lelhood Rato, the Freeman-Tuey and the Dvergence statstc wth λ=⅔ (see Table ). As wth any power study sutable null and alternatve hypotheses need to be defned. oodness-of-ft tests based on multnomal data have been undertaen n a number of currently unpublshed physotherapy projects at Bond Unversty and some of these have been nterested n testng for unformty on the 5-pont Lert scale (that s p =.2 for =, 2, Table 2. Dstrbutons used n the power study 3, 4 and 5). In ths study the Monte Carlo Cell Probabltes smulated power of the test statstcs gven n Dstrbuton Table wll be smulated for a unform null 2 3 4 5 dstrbuton aganst the four alternatve Unform.2.2.2.2.2 dstrbutons gven n Table 2. In Secton 2 the Decreasng.45.9.4.2. calculaton of the smulated power s dscussed and the results of the power studes for each Step.5.5.2.25.25 alternatve dstrbuton are gven n Secton 3. Trangular.27.8..8.27 Concludng comments are made n Secton 4 on Platyurtc.95.27.27.27.95 whch are the most powerful of these fve test statstcs for health and medcne related Lert type data. 2. CALCULATION OF THE SIMULATED POWER For consstency ths power study uses smlar null and alternatve dstrbutons and sample sze to those used by Steele and Chaselng (26) and Steele et al. (28). The major dfference beng that the number of cells s fve as s common n Lert type data. The sample szes are, 2, 3, 5, and 2, representng 2, 4, 6,, 2 and 4 observatons per cell under the unform null dstrbuton. The power of each test statstc s estmated from smulated random samples. In some cases the dscrete nature of the data precludes a crtcal value at the nomnated 5% level of sgnfcance. In such cases lnear nterpolaton s used to enable meanngful power comparson. = 93

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data 3. RESULTS OF THE POWER STUDY 3.. Decreasng Alternatve Hypothess Fgure 2 confrms the common recommendatons made n the power studes dentfed n Secton that Emprcal Dstrbuton Functon (EDF) test statstcs such as the dscrete Kolmogorov- Smrnov are generally more powerful for these trend type alternatve dstrbutons. Although all fve test statstcs are shown to have very hgh power for the larger sample szes there are some other dfferences n power dentfed. It s clear for ths alternatve dstrbuton that the Freeman- Tuey test has relatvely lower power than the other four test statstcs and that the Log- Lelhood Raton test has slghtly lower power than Pearson s Ch-Square and the Dvergence test wth λ=⅔. Based on these results for a 5-pont Lert scale t appears that the dscrete Kolmogorov-Smrnov s more powerful at dentfyng the trend alternatve should one exst. Should a Ch-Square type test statstc be desred, then ether the Dvergence wth λ=⅔ or Pearson s Ch-Square should be used wth hgher power. 3.2. Step alternatve hypothess As was shown to occur wth the decreasng alteratve n Secton 3. the power of the Kolmogorov-Smrnov test statstc s shown n Fgure 3 to be greater than the power of the other four test statstcs for ths step type ncreasng alternatve dstrbuton. Although the powers are calculated for a 5-pont Lert scale t agan agrees wth the general comments about EDF test statstcs when testng a unform null aganst a trend type alternatve dstrbuton. It s nterestng to note that all of the Ch-Square type test statstcs produce approxmately the same power. Such results were not observed n the more comprehensve power study of Ch-Square type test statstcs wth cells by Steele and Chaselng (27). It s mportant to note that even for the sample sze of 5, that s observatons per cell under the unform null dstrbuton, the power of all fve test statstcs s below.35 whch ndcates that a very large sample sze s requred to correctly reject ths unform null n favor of ths step type alternatve when usng a Lert scale. 3.3. Trangular alternatve hypothess For ths partcular trangular alternatve dstrbuton Fgure 4 shows that the powers of all the Ch-Square type test statstcs are generally much hgher than the dscrete Kolmogorov- Smrnov test statstc. As was the case wth the step type alternatve n Secton 3.2 the power of all.8.6.4.2 2 3 5 2 Sample Sze Fgure 2. Smulated power for a unform null aganst a decreasng alternatve dstrbuton.8.6.4.2 2 3 5 2 Sample Sze Fgure 3. Smulated power for a unform null aganst a step alternatve dstrbuton.8.6.4.2 2 3 5 2 Sample Sze Fgure 4. Smulated power for a unform null aganst a trangular alternatve dstrbuton 94

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data the test statstcs were very low even for sample szes as hgh as per cell under the unform null dstrbuton. Clearly a very large sample sze s requred to ncrease to power of these test statstcs for a trangular alternatve dstrbuton when the data are collected on a 5-pont Lert scale 3.4. Platyurtc alternatve hypothess Fgure 5 shows that the powers of all four Ch- Square type test statstcs are relatvely smlar, and qute low for sample szes up to and ncludng 6 per cell under the unform null dstrbuton. The power of the dscrete Kolmogorov-Smrnov only becomes compettve for the very large sample sze of 4 per cell under the unform null dstrbuton..8.6.4.2 2 3 5 2 Sample Sze Fgure 5. Smulated power for a unform null aganst a platyurtc alternatve dstrbuton 4. CONCLUSION AND RECOMMENDATIONS As shown by Steele and Chaselng (26) and Steele et al. (28), albet for a larger number of cells, t s dffcult to mae general recommendatons as to the most powerful goodness-of-ft test statstc for the specfc alternatve dstrbutons used n ths study. ven the wdespread use n the health and medcal survey research communty of Pearson s Ch-Square for Lert type data some comments relatng to power are needed: For sample szes less than sx per cell under the unform null dstrbuton, the smulated powers of all four test statstcs were very poor for all alternatve dstrbutons wth the excepton of the decreasng trend dstrbuton. The smulated power of the Freeman-Tuey test statstc s generally shown to be relatvely less than the power of all the other nvestgated test statstcs. There s generally no mprovement n the smulated power for the Dvergence test statstc wth λ=⅔ over ether Pearson s Ch-Square or the Log-Lelhood Rato test statstcs. REFERENCES Ampadu, C. (28), On the powers of some new Ch-Square type statstcs. Far East Journal of Theoretcal Statstcs, 26, 59-72. Choulaan, V., Lochart, R.A., and Stephens, M.A. (994), Cramér-von Mses statstcs for dscrete dstrbutons. The Canadan Journal of Statstcs, 22, 25 37. Cresse, N., and Read, T.R.C. (984), Multnomal goodness-of-ft tests. Journal of the Royal Statstcal Socety Seres B, 46, 44-464. Freeman, M.F., and Tuey, J.W. (95), Transformatons related to the angular and the square root. Annals of Mathematcal Statstcs, 2, 67-6. From, S.. (996), A new goodness of ft test for the equalty of multnomal cell probabltes verses trend alternatves. Communcatons n Statstcs-Theory and Methods, 25, 367-383. Lert, R. (932), A technque for the measurement of atttudes. Archves of Psychology, 4, -55. Pearson, K. (9), On the crteron that a gven system of devatons from the probable n the case of a correlated system of varables s such that t can be reasonably supposed to have arsen from random samplng. Phlosophcal Magazne Seres 5, 5, 57-75. Petttt, A.N., and Stephens, M.A. (977), The Kolmogorov-Smrnov goodness-of-ft statstc wth dscrete and grouped data. Technometrcs, 9, 25-2. Steele, M., and Chaselng, J. (26), s of dscrete goodness-of-ft test statstcs for a unform null aganst a selecton of alternatve dstrbutons. Communcatons n Statstcs-Smulaton and Computaton, 35, 67-75. Steele, M., Hurst, C., and Chaselng, J. (26), The power of Ch-Square type goodness-of-ft test statstcs. Far East Journal of Theoretcal Statstcs, 26, 9-9. 95

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data Wls, S.S. (935), The lelhood test of ndependence n contngency tables. Annals of Mathematcal Statstcs, 6, 9-96. 96