Evaluating the statistical power of goodness-of-fit tests for health and medicine survey data

Similar documents
1 Basic concepts for quantitative policy analysis

Volume 30, Issue 4. Who likes circus animals?

Construction of Control Chart Based on Six Sigma Initiatives for Regression

COMPARISON ANALYSIS AMONG DIFFERENT CALCULATION METHODS FOR THE STATIC STABILITY EVALUATION OF TAILING DAM

Do not turn over until you are told to do so by the Invigilator.

Evaluating The Performance Of Refrigerant Flow Distributors

CONSUMER PRICE INDEX METHODOLOGY (Updated February 2018)

Analyses Based on Combining Similar Information from Multiple Surveys

International Trade and California Employment: Some Statistical Tests

EVALUATING THE PERFORMANCE OF SUPPLY CHAIN SIMULATIONS WITH TRADEOFFS BETWEEN MULITPLE OBJECTIVES. Pattita Suwanruji S. T. Enns

Consumption capability analysis for Micro-blog users based on data mining

Experiments with Protocols for Service Negotiation

A STUDY ON PROPERTIES OF WATER SUBSTITUTE SOLID PHANTOM USING EGS CODE

A SIMULATION STUDY OF QUALITY INDEX IN MACHINE-COMPONF~T GROUPING

FIN DESIGN FOR FIN-AND-TUBE HEAT EXCHANGER WITH MICROGROOVE SMALL DIAMETER TUBES FOR AIR CONDITIONER

A NONPARAMETRIC APPROACH TO SHORT-RUN PRODUCTION ANALYSIS IN A DYNAMIC CONTEXT. Elvira Silva *

emissions in the Indonesian manufacturing sector Rislima F. Sitompul and Anthony D. Owen

6.4 PASSIVE TRACER DISPERSION OVER A REGULAR ARRAY OF CUBES USING CFD SIMULATIONS

Simulation of Steady-State and Dynamic Behaviour of a Plate Heat Exchanger

WISE 2004 Extended Abstract

K vary over their feasible values. This allows

COMPARING TWO NONLINEAR STRUCTURES FOR SECONDARY AIR PROCESS MODELLING

RIGOROUS MODELING OF A HIGH PRESSURE ETHYLENE-VINYL ACETATE (EVA) COPOLYMERIZATION AUTOCLAVE REACTOR. I-Lung Chien, Tze Wei Kan and Bo-Shuo Chen

Extended Abstract for WISE 2005: Workshop on Information Systems and Economics

Identifying Factors that Affect the Downtime of a Production Process

tems are then tested to determne f they are truly dmensonally dstnct from the rest of the test tems. In ths way, DIMTEST can be used to confrm whether

POTENTIAL CUSTOMER TYPOLOGY FOR ENTRANCE ON POWDERY PAINTING MARKET. Vojtěch SPÁČIL

A Group Decision Making Method for Determining the Importance of Customer Needs Based on Customer- Oriented Approach

The link between immigration and trade in Spain

Simulation of the Cooling Circuit with an Electrically Operated Water Pump

Sporlan Valve Company

Comparison of robust M estimator, S estimator & MM estimator with Wiener based denoising filter for gray level image denoising with Gaussian noise

Volume 29, Issue 2. How do firms interpret a job loss? Evidence from the National Longitudinal Survey of Youth

Estimating Peak Load of Distribution Transformers

Market Dynamics and Productivity in Japanese Retail Industry in the late 1990s

Base flow separation using exponential smoothing and its impact on continuous loss estimates

A Longer Tail?: Estimating The Shape of Amazon s Sales Distribution Curve in Erik Brynjolfsson, Yu (Jeffrey) Hu, Michael D.

Numerical Analysis about Urban Climate Change by Urbanization in Shanghai

Evidence of Changes in Preferences Among Beef Cuts Varieties: An Application of Poisson Regressions. Authors

An Empirical Study about the Marketization Degree of Labor Market from the Perspective of Wage Determination Mechanism

Acta Chim. Slov. 1998, 45(2), pp

Fast Algorithm for Prediction of Airfoil Anti-icing Heat Load *

Willingness to Pay for Beef Quality Attributes: Combining Mixed Logit and Latent Segmentation Approach

Regression model for heat consumption monitoring and forecasting

The Spatial Equilibrium Monopoly Models of the Steamcoal Market

Numerical Analysis of Comfort and Energy Performance of Radiant Heat Emission Systems

Thermodynamic Equilibria Modeling of Ternary Systems of Solid Organics in Compressed Carbon Dioxide

An Evaluation of Alternative Cash, Share, and Flexible Leasing Arrangements for South. Carolina Grain Farms * Todd D. Davis **

Benjamin Vivès 1 and Roger N. Jones

Ronald C. Henry Yu-Shuo Chang Clifford H. Spiegelman NRCSE. T e c h n i c a l R e p o r t S e r i e s. NRCSE-TRS No. 071.

Driving Factors of SO 2 Emissions in 13 Cities, Jiangsu, China

Abstract. Radiative Heating - Definition. Introduction. Motivation and Objective

Impact of Internet Technology on Economic Growth in South Asia with Special Reference to Pakistan

SIMULATION RESULTS ON BUFFER ALLOCATION IN A CONTINUOUS FLOW TRANSFER LINE WITH THREE UNRELIABLE MACHINES

MEASURING TECHNICAL EFFICIENCY OF ONION (Allium cepa L.) FARMS IN BANGLADESH M. A. BAREE 1. Abstract

The ranks of Indonesian and Japanese industrial sectors: A further study

Incorporating the Sampling Variability from an Employee Perception Survey into the Ranking Process of U.S. Government Agencies

Development and production of an Aggregated SPPI. Final Technical Implementation Report

Hysteresis in Regional Unemployment Rates in Turkey

International Trade and California s Economy: Summary of the Data

Introducing income distribution to the Linder hypothesis

Bid-Response Models for Customized Pricing

Guidelines on Disclosure of CO 2 Emissions from Transportation & Distribution

RELATIONSHIP BETWEEN BUSINESS STRATEGIES FOLLOWED BY SERVICE ORGANIZATIONS AND THEIR PERFORMANCE MEASUREMENT APPROACH

The Application of Uninorms in Importance-Performance Analysis

Gender Wage Differences in the Czech Public Sector: A Micro-level Case

Implementing Activity-Based Modeling Approach for Analyzing Rail Passengers Travel Behavior

A Comparison of Unconstraining Methods to Improve Revenue Management Systems

CHICKEN AND EGG? INTERPLAY BETWEEN MUSIC BLOG BUZZ AND ALBUM SALES

Viscosities of nickel base super alloys

Do Competing Suppliers Maximize Profits as Theory Suggests? An Empirical Evaluation

Biomass Energy Use, Price Changes and Imperfect Labor Market in Rural China: An Agricultural Household Model-Based Analysis.

Welfare Gains under Tradable CO 2 Permits * Larry Karp and Xuemei Liu

MULTIPLE FACILITY LOCATION ANALYSIS PROBLEM WITH WEIGHTED EUCLIDEAN DISTANCE. Dileep R. Sule and Anuj A. Davalbhakta Louisiana Tech University

Measuring the determinants of pork consumption in Bloemfontein, Central South Africa

Econometric Methods for Estimating ENERGY STAR Impacts in the Commercial Building Sector

Experimental Validation of a Suspension Rig for Analyzing Road-induced Noise

The Employment Effects of Low-Wage Subsidies

Relationship Between the Uncompensated Price-Elasticity and the Income- Elasticity of Demand Under Conditions of Additive Preferences

Journal of Biomedical Science 2009, 16:52

Willingness to Pay for the Quality of Drinking Water

Impact of E-Auctions on Public Procurement Effectiveness

Modified-LOPA; a Pre-Processing Approach for Nuclear Power Plants Safety Assessment

Learning Curve: Analysis of an Agent Pricing Strategy Under Varying Conditions

EVALUATE THE IMPACT OF CONTEMPORARY INFORMATION SYSTEMS AND THE INTERNET ON IMPROVING BANKING PERFORMANCE

On Measuring the Criticality of Various Variables and Processes in Organization Information Systems: Proposed Methodological Procedure

Experimental design methodologies for the identification of Michaelis- Menten type kinetics

RULEBOOK on the manner of determining environmental flow of surface water

FPGA Device and Architecture Evaluation Considering Process Variations

Multi-UAV Task Allocation using Team Theory

Calculation and Prediction of Energy Consumption for Highway Transportation

Wei Zheng College of Science, Hebei North University, Zhangjiakou , Hebei, China

Identifying and Correcting Publication Selection Bias in the Efficiency-Wage Literature: Heckman Meta-Regression

Numerical Flow Analysis of an Axial Flow Pump

Bundling With Customer Self-Selection: A Simple Approach to Bundling Low-Marginal-Cost Goods

San Juan National Forest - American Marten Snow Track Index Evaluation.doc 1

A Scenario-Based Objective Function for an M/M/K Queuing Model with Priority (A Case Study in the Gear Box Production Factory)

An Analysis of Auction Volume and Market Competition for the Coastal Forest Regions in British Columbia

Bulletin of Energy Economics.

Avian Abundance and Habitat Structure

Transcription:

8 th World IMACS / MODSIM Congress, Carns, Australa 3-7 July 29 http://mssanz.org.au/modsm9 Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data Steele, M.,2, N. Smart, C. Hurst 3 and J. Chaselng 4 PHCRED, Faculty of Health Scence and Medcne, Bond Unversty 2 Faculty of Busness, Technology and Sustanable Development, Bond Unversty 3 Faculty of Health, Queensland Unversty of Technology 4 Faculty of Health Scence and Medcne, Bond Unversty Emal: msteele@bond.edu.au Abstract: oodness-of-ft test statstcs are wdely used n health and medcne related surveys however lttle regard s usually gven to ther statstcal power. Ths paper nvestgates the smulated power of fve categorcal goodness-of-ft test statstcs used to analyze health and medcne survey data collected on a 5-pont Lert scale. The test statstcs used n ths power study are Pearson s Ch-Square, the Kolmogorov-Smrnov test statstc for dscrete data, the Log-Lelhood Rato, the Freeman-Tuey and the specal case of the Dvergence statstc defned by Cresse and Read (984). Recommendatons based on these smulatons are provded on whch of these Decreasng trend Trangular Step Platyurtc goodness-of-ft test statstcs s the most powerful overall and whch s the most powerful for the Fgure. Type of alternatve dstrbutons used n the predefned unform null aganst the four general power studes. shaped alternatve dstrbutons (see Fgure ) nvestgated n ths paper. Keywords: goodness-of-ft, power, ch-square tests, dscrete 92

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data. INTRODUCTION Although wdely used n the analyss of health and medcne related survey data and other types of categorcal data, only a lmted number of publshed studes are avalable on the power of dscrete goodness-of-ft test statstcs. Those avalable typcally compare the power of the Pearson (9) Ch-Square wth test statstcs based on the emprcal dstrbuton functon (e.g. Choulaan et al., 994, Petttt and Stephens, 977, From, 996, Steele and Chaselng, 26) however the recent papers by Steele et al. (28) and Ampadu (28) do compare several goodness-of-ft test statstcs that are asymptotcally equvalent to the χ 2 dstrbuton. Ths paper consders dscrete dstrbutons on a 5- pont Lert (932) scale as commonly used n health and medcne surveys and compares the Monte Carlo smulated power of Pearson s Ch- Table. Test statstcs used n health and medcne surveys usng Lert data Test Statstc Equaton ( O E ) 2 Pearson s Ch-Square 2 Χ = (Pearson, 9) = E Dscrete Kolmogorov-Smrnov max = O E (Petttt and Stephens, 977) Log-Lelhood Rato O = 2 O ln (Wls, 935) = E Freeman-Tuey = 4 (Freeman and Tuey, 95) ( O ) 2 E λ Dvergence wth λ=⅔ 2 O O = (Cresse and Read, 984) λ( λ ) + = E where s the number of cells, O s the observed frequency n cell, E s the expected frequency n cell. Square, the dscrete Kolmogorov-Smrnov, the Log-Lelhood Rato, the Freeman-Tuey and the Dvergence statstc wth λ=⅔ (see Table ). As wth any power study sutable null and alternatve hypotheses need to be defned. oodness-of-ft tests based on multnomal data have been undertaen n a number of currently unpublshed physotherapy projects at Bond Unversty and some of these have been nterested n testng for unformty on the 5-pont Lert scale (that s p =.2 for =, 2, Table 2. Dstrbutons used n the power study 3, 4 and 5). In ths study the Monte Carlo Cell Probabltes smulated power of the test statstcs gven n Dstrbuton Table wll be smulated for a unform null 2 3 4 5 dstrbuton aganst the four alternatve Unform.2.2.2.2.2 dstrbutons gven n Table 2. In Secton 2 the Decreasng.45.9.4.2. calculaton of the smulated power s dscussed and the results of the power studes for each Step.5.5.2.25.25 alternatve dstrbuton are gven n Secton 3. Trangular.27.8..8.27 Concludng comments are made n Secton 4 on Platyurtc.95.27.27.27.95 whch are the most powerful of these fve test statstcs for health and medcne related Lert type data. 2. CALCULATION OF THE SIMULATED POWER For consstency ths power study uses smlar null and alternatve dstrbutons and sample sze to those used by Steele and Chaselng (26) and Steele et al. (28). The major dfference beng that the number of cells s fve as s common n Lert type data. The sample szes are, 2, 3, 5, and 2, representng 2, 4, 6,, 2 and 4 observatons per cell under the unform null dstrbuton. The power of each test statstc s estmated from smulated random samples. In some cases the dscrete nature of the data precludes a crtcal value at the nomnated 5% level of sgnfcance. In such cases lnear nterpolaton s used to enable meanngful power comparson. = 93

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data 3. RESULTS OF THE POWER STUDY 3.. Decreasng Alternatve Hypothess Fgure 2 confrms the common recommendatons made n the power studes dentfed n Secton that Emprcal Dstrbuton Functon (EDF) test statstcs such as the dscrete Kolmogorov- Smrnov are generally more powerful for these trend type alternatve dstrbutons. Although all fve test statstcs are shown to have very hgh power for the larger sample szes there are some other dfferences n power dentfed. It s clear for ths alternatve dstrbuton that the Freeman- Tuey test has relatvely lower power than the other four test statstcs and that the Log- Lelhood Raton test has slghtly lower power than Pearson s Ch-Square and the Dvergence test wth λ=⅔. Based on these results for a 5-pont Lert scale t appears that the dscrete Kolmogorov-Smrnov s more powerful at dentfyng the trend alternatve should one exst. Should a Ch-Square type test statstc be desred, then ether the Dvergence wth λ=⅔ or Pearson s Ch-Square should be used wth hgher power. 3.2. Step alternatve hypothess As was shown to occur wth the decreasng alteratve n Secton 3. the power of the Kolmogorov-Smrnov test statstc s shown n Fgure 3 to be greater than the power of the other four test statstcs for ths step type ncreasng alternatve dstrbuton. Although the powers are calculated for a 5-pont Lert scale t agan agrees wth the general comments about EDF test statstcs when testng a unform null aganst a trend type alternatve dstrbuton. It s nterestng to note that all of the Ch-Square type test statstcs produce approxmately the same power. Such results were not observed n the more comprehensve power study of Ch-Square type test statstcs wth cells by Steele and Chaselng (27). It s mportant to note that even for the sample sze of 5, that s observatons per cell under the unform null dstrbuton, the power of all fve test statstcs s below.35 whch ndcates that a very large sample sze s requred to correctly reject ths unform null n favor of ths step type alternatve when usng a Lert scale. 3.3. Trangular alternatve hypothess For ths partcular trangular alternatve dstrbuton Fgure 4 shows that the powers of all the Ch-Square type test statstcs are generally much hgher than the dscrete Kolmogorov- Smrnov test statstc. As was the case wth the step type alternatve n Secton 3.2 the power of all.8.6.4.2 2 3 5 2 Sample Sze Fgure 2. Smulated power for a unform null aganst a decreasng alternatve dstrbuton.8.6.4.2 2 3 5 2 Sample Sze Fgure 3. Smulated power for a unform null aganst a step alternatve dstrbuton.8.6.4.2 2 3 5 2 Sample Sze Fgure 4. Smulated power for a unform null aganst a trangular alternatve dstrbuton 94

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data the test statstcs were very low even for sample szes as hgh as per cell under the unform null dstrbuton. Clearly a very large sample sze s requred to ncrease to power of these test statstcs for a trangular alternatve dstrbuton when the data are collected on a 5-pont Lert scale 3.4. Platyurtc alternatve hypothess Fgure 5 shows that the powers of all four Ch- Square type test statstcs are relatvely smlar, and qute low for sample szes up to and ncludng 6 per cell under the unform null dstrbuton. The power of the dscrete Kolmogorov-Smrnov only becomes compettve for the very large sample sze of 4 per cell under the unform null dstrbuton..8.6.4.2 2 3 5 2 Sample Sze Fgure 5. Smulated power for a unform null aganst a platyurtc alternatve dstrbuton 4. CONCLUSION AND RECOMMENDATIONS As shown by Steele and Chaselng (26) and Steele et al. (28), albet for a larger number of cells, t s dffcult to mae general recommendatons as to the most powerful goodness-of-ft test statstc for the specfc alternatve dstrbutons used n ths study. ven the wdespread use n the health and medcal survey research communty of Pearson s Ch-Square for Lert type data some comments relatng to power are needed: For sample szes less than sx per cell under the unform null dstrbuton, the smulated powers of all four test statstcs were very poor for all alternatve dstrbutons wth the excepton of the decreasng trend dstrbuton. The smulated power of the Freeman-Tuey test statstc s generally shown to be relatvely less than the power of all the other nvestgated test statstcs. There s generally no mprovement n the smulated power for the Dvergence test statstc wth λ=⅔ over ether Pearson s Ch-Square or the Log-Lelhood Rato test statstcs. REFERENCES Ampadu, C. (28), On the powers of some new Ch-Square type statstcs. Far East Journal of Theoretcal Statstcs, 26, 59-72. Choulaan, V., Lochart, R.A., and Stephens, M.A. (994), Cramér-von Mses statstcs for dscrete dstrbutons. The Canadan Journal of Statstcs, 22, 25 37. Cresse, N., and Read, T.R.C. (984), Multnomal goodness-of-ft tests. Journal of the Royal Statstcal Socety Seres B, 46, 44-464. Freeman, M.F., and Tuey, J.W. (95), Transformatons related to the angular and the square root. Annals of Mathematcal Statstcs, 2, 67-6. From, S.. (996), A new goodness of ft test for the equalty of multnomal cell probabltes verses trend alternatves. Communcatons n Statstcs-Theory and Methods, 25, 367-383. Lert, R. (932), A technque for the measurement of atttudes. Archves of Psychology, 4, -55. Pearson, K. (9), On the crteron that a gven system of devatons from the probable n the case of a correlated system of varables s such that t can be reasonably supposed to have arsen from random samplng. Phlosophcal Magazne Seres 5, 5, 57-75. Petttt, A.N., and Stephens, M.A. (977), The Kolmogorov-Smrnov goodness-of-ft statstc wth dscrete and grouped data. Technometrcs, 9, 25-2. Steele, M., and Chaselng, J. (26), s of dscrete goodness-of-ft test statstcs for a unform null aganst a selecton of alternatve dstrbutons. Communcatons n Statstcs-Smulaton and Computaton, 35, 67-75. Steele, M., Hurst, C., and Chaselng, J. (26), The power of Ch-Square type goodness-of-ft test statstcs. Far East Journal of Theoretcal Statstcs, 26, 9-9. 95

Steele et al., Evaluatng the statstcal power of goodness-of-ft tests for health and medcne survey data Wls, S.S. (935), The lelhood test of ndependence n contngency tables. Annals of Mathematcal Statstcs, 6, 9-96. 96