Understanding the Dimensionality and Reliability of the Cognitive Scales of the UK Clinical Aptitude Test (UKCAT): Summary Version of the Report

Dr Paul A. Tiffin, Reader in Psychometric Epidemiology, Durham University
December 2013

Background

Although it has been used to select medical and dental students since 2006, the structure of the UKCAT has not been published to date. The dimensionality of the UKCAT responses has implications for how the test scores are summarised and used in the selection process. Moreover, a number of independent researchers have questioned the very high internal reliability figures cited in the annual technical reports. Understanding the dimensionality of the UKCAT allows accurate reliability estimates to be derived using widely accepted methods. Consequently, the UKCAT research panel commissioned an item-level analysis, involving both exploratory and confirmatory factor analyses, in order to elicit the structure of the test and hence establish its reliability. The analysis was conducted in December 2013.

Summary of Methods

The UKCAT is delivered in a number of versions which use differing permutations of the four subtests: abstract reasoning (AR), decision analysis (DA), quantitative reasoning (QR) and verbal reasoning (VR). For this study, item responses to version 1 of the 2013 test were analysed for a total of 5,053 testees, randomly divided into 2,488 observations used as the training dataset and 2,565 observations used as the prediction (confirmatory) dataset. A parallel analysis (Horn, 1965), adapted for binary response data, was used to evaluate the dimensionality of the response data. Items sharing a question stem are not locally independent, which could artefactually inflate reliability coefficients; to reduce this risk, only one item from each group of questions related to the same stem was randomly selected for inclusion in the final analysis.
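Horn's parallel analysis retains only those factors whose eigenvalues exceed what random data of the same dimensions would produce. The sketch below illustrates the idea using Pearson correlations on continuous data; the report itself used a variant adapted for binary responses (e.g. via tetrachoric correlations), so the function and its details here are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's (1965) parallel analysis: retain the factors whose observed
    eigenvalues exceed the mean eigenvalues obtained from random data of
    the same shape. Illustrative only: uses Pearson correlations, whereas
    binary item data would call for tetrachoric correlations."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand_eig = np.zeros(p)
    for _ in range(n_iter):
        r = rng.standard_normal((n, p))
        rand_eig += np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    rand_eig /= n_iter
    # Number of observed eigenvalues larger than their random counterparts
    return int(np.sum(obs_eig > rand_eig))
```

Applied to data simulated from a single common factor, this returns one retained factor; the report's conclusion that "no more than three distinct factors" were present rests on the same comparison of observed against random eigenvalues.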
The only subtest to which this approach could not be applied was decision analysis, as all the items in a given form of decision analysis (only two forms are used in each sitting) relate to a single stem. Once plausible structures for the response data had been established using the parallel analysis and exploratory factor analyses, these were tested against the confirmatory dataset from version 1 of the 2013 UKCAT. Confirmation of the data structure was also conducted in version 6 of the 2013 test. Thus confirmation was conducted in

datasets which related to both the same version and a completely different version of the UKCAT. In terms of reliability, in addition to McDonald's omega, a binary form of Cronbach's alpha was derived using FACTOR (Lorenzo-Seva, 2013), as well as a conventional Cronbach's alpha derived using Stata version 13. Test information curves for the UKCAT subtests were plotted, based on the confirmatory factor analyses of responses to version 6 of the 2013 test, utilising a two-parameter logistic (2-PL) model in Mplus v7.1.

Main findings

The findings from the parallel analysis strongly suggested that no more than three distinct factors could be extracted from the response data for the 2,488 testees in the UKCAT 2013 version 1 training dataset. One-, two-, three- and the current four-factor models were all fitted to both the confirmatory dataset for version 1 of the 2013 UKCAT (N=2,565) and all the responses to version 6 of the 2013 UKCAT (N=4,042). Fit differed relatively little between the competing models, as estimated by the Tucker-Lewis Index (TLI), regarded as a more conservative and trustworthy index because it penalises model complexity (Hu and Bentler, 1999). A secondary, general factor could also be conceptualised as lying beneath the four factors representing the scales; this hierarchical factor model also showed an acceptable fit to the response data (TLI=0.96). In particular, it is worth noting that a two-factor model contrasting verbal (VR) with non-verbal (DA, AR and QR; see Figure 1) items had a TLI of 0.941, compared with 0.946 for the original four-factor (i.e. four-subscale) model. There may be theoretical and empirical reasons why the test should perhaps be regarded as composed of verbal and non-verbal components (Deary et al., 2007).
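The TLI used to compare the competing models can be computed from the fitted model's chi-square statistic and that of the baseline (null) model; the function below is a sketch of the standard formula (the name is mine, not the report's).

```python
def tucker_lewis_index(chi2_model, df_model, chi2_null, df_null):
    """Tucker-Lewis Index (also known as the non-normed fit index).
    Dividing each chi-square by its degrees of freedom is what penalises
    model complexity: a more heavily parameterised model has fewer df,
    so the same misfit yields a worse ratio."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)
```

Values close to 1 indicate good fit; Hu and Bentler (1999) suggest values in the region of 0.95 as a cutoff, which puts the 0.941 (two-factor) and 0.946 (four-factor) values reported above on the borderline of conventional acceptability.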

Figure 1. Path diagram depicting correlations between the factors in an alternative, postulated two-factor structure for the UKCAT response scores. This model comprises a general factor represented by the DA, AR and QR items and a second factor represented by the VR items. Note that, for simplicity, only a few indicators (shown as squares) are included for each latent variable; error (residual) variances and loadings are also omitted for clarity.

Internal consistency reliability indices were generated for the UKCAT subscales and total scores (in the case of McDonald's omega, the latter was generated for G factor scores). Reliability values for the UKCAT cognitive scales were acceptable, though generally lower than those cited in previous technical reports, with omega values ranging from 0.49 (QR) to 0.87 (DA). Cronbach's alpha values (binary version) were almost identical (Table 1).

In item response theory (IRT) more importance is placed on test information than on traditional reliability. Test information is inversely related to the standard error of measurement (SEM): at any given trait level, the SEM is the reciprocal of the square root of the test information. This is intuitive; the smaller the error, the more information a test provides to discriminate between testees. The test information curves for version 6 of the 2013 UKCAT are depicted in Figures 2-5. The horizontal axes relate to the level of the trait under evaluation (as a z score with a mean of zero and an SD of 1). It can be seen that the DA and AR subscales yield relatively little information on more able candidates; the problem is particularly acute for DA in the form of the test evaluated. In contrast, information peaks around the average level of ability for the QR and VR subtests.

Subtest | Alpha range (PV Technical Report, 2012) | Cronbach's alpha (usual) | Cronbach's alpha (binary) | McDonald's omega*
AR | 0.86-0.87 | 0.53 | 0.65 | 0.66
DA | 0.66-0.67 | 0.69 | 0.87 | 0.87
QR | 0.78-0.79 | 0.38 | 0.49 | 0.49
VR | 0.74-0.75 | 0.45 | 0.62 | 0.62
All items (i.e. loadings on G factor for Ω_h) | NA | 0.77 | 0.89 | 0.41

Table 1. Internal consistency reliability estimates for the UKCAT scales, derived from responses to version 6 of the UKCAT in 2013 (N=4,042). *Omega is generated as Ω_t for the subtests and Ω_h for the general factor; the latter reflects the internal consistency reliability of a summed score.

Figure 2. The test information curve for the abstract reasoning subtest of the UKCAT. The vertical axis represents the amount of information available on each candidate, whilst the horizontal axis represents the level of the underlying trait (as a z score).
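The omega values in Table 1 are model-based: for a standardised single-factor solution, omega is the squared sum of the loadings divided by that quantity plus the summed unique variances. A minimal sketch of Ω_t under those assumptions (for Ω_h, only the general-factor loadings would enter the numerator; the loadings in the example are hypothetical, not the UKCAT estimates):

```python
def omega_total(loadings):
    """McDonald's omega for a standardised unidimensional factor solution:
    the common-factor variance, (sum of loadings)^2, as a proportion of
    total variance, where each item's unique variance is 1 - lambda_i^2."""
    lam_sum = sum(loadings)
    unique = sum(1.0 - lam * lam for lam in loadings)
    return lam_sum * lam_sum / (lam_sum * lam_sum + unique)
```

For example, four items all loading 0.7 give omega of about 0.79; weaker loadings, as on the QR items, drive the coefficient down towards the 0.49 reported above.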

Figure 3. The test information curve for the decision analysis subtest of the UKCAT. The vertical axis represents the amount of information available on each candidate, whilst the horizontal axis represents the level of the underlying trait (as a z score).

Figure 4. The test information curve for the quantitative reasoning subtest of the UKCAT. The vertical axis represents the amount of information available on each candidate, whilst the horizontal axis represents the level of the underlying trait (as a z score).

Figure 5. The test information curve for the verbal reasoning subtest of the UKCAT. The vertical axis represents the amount of information available on each candidate, whilst the horizontal axis represents the level of the underlying trait (as a z score).

Summary

The UKCAT responses fit well into a multidimensional model that can also be conceptualised as relating to a general ("G") factor. This finding lends some support to the use of the total UKCAT score in selection. There may also be a case for distinguishing between the verbal and non-verbal scales of the UKCAT in the selection process (e.g. summing the VR score and the average of the QR, AR and DA scores). The internal consistency reliability of the UKCAT is acceptable, though generally not as high as the indices reported in the Pearson VUE technical reports would suggest. This may be due to the dependency of items nested in the same question stem in the original analyses. The abstract reasoning and decision analysis subtests are mainly composed of items that are relatively easy. This may make it difficult to distinguish reliably between more able candidates, and these scales may benefit from the inclusion of more difficult items. Given the relatively high correlation between the non-verbal scores of the UKCAT, these subtests may be tending to measure the same underlying construct. This may present an opportunity to shorten this aspect of the test, creating space for novel tests or items tapping into different traits or abilities.

Acknowledgements

Many thanks to Dr Brad Wu (Senior Psychometrician at Pearson VUE) for his invaluable help in obtaining the data and meta-data used in the analyses. I am also grateful to the UKCAT Board for agreeing to fund this work.

References

Deary, I., et al. (2007). Intelligence and educational achievement. Intelligence 35: 13-21.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika 30: 179-185.
Hu, L., and P. M. Bentler (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6(1): 1-55.
Lorenzo-Seva, U. (2013). FACTOR. University of Tarragona.
McDonald, R. P. (1978). Generalizability in factorable domains: Domain validity and generalizability. Educational and Psychological Measurement 38(1): 75-79.
McManus, I. C., et al. (2013). Construct-level predictive validity of educational attainment and intellectual aptitude tests in medical student selection: Meta-regression of six UK longitudinal studies. BMC Medicine 11: 243.
Schmid, J., and J. N. Leiman (1957). The development of hierarchical factor solutions. Psychometrika 22: 53-61.
Wu, B. (2012). Technical Report, UK Clinical Aptitude Test (UKCAT) Consortium. Chicago, IL: Pearson VUE.
Zinbarg, R. E., et al. (2005). Cronbach's α, Revelle's β, and McDonald's Ω_h: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika 70(1): 123-133.