Automated Test Assembly for COMLEX-USA: A SAS Operations Research (SAS/OR) Approach
Dr. Hao Song, Senior Director for Psychometrics and Research
Dr. Hongwei Patrick Yang, Senior Research Associate
Introduction
- Automated test assembly (ATA) automates test form construction through constrained optimization, replacing manual assembly
- Improved effectiveness and efficiency in constructing multiple parallel test forms
- Improved psychometric quality:
  - Increased form comparability and less variation
  - Targeting the ability of the candidate population, so that pass/fail decisions at the cut score are more accurate
Introduction
- In this ATA demonstration, we use the optimization procedure PROC OPTMODEL, part of SAS Operations Research (SAS/OR), as the tool for ATA
  - SAS is the official statistical analysis platform at NBOME
  - SAS is an industry-standard product in mathematical and statistical computing
  - This matters for operational work on COMLEX-USA, a high-stakes licensure test designed to protect the public
- Note: Operations research applies advanced analytical methods to help make better decisions
Three Fundamental Components
- In the ATA work, we use mixed or pure integer programming, linear or nonlinear
- Three fundamental components need to be established:
  - Decision variables
  - Constraints, including both content and psychometric constraints
  - Objective function(s)
Decision Variables
- The decision variables are binary (0 or 1) and indicate the inclusion or exclusion of each item in each test form:
  $$x_{if} = \begin{cases} 1, & \text{if item } i \text{ is assigned to form } f \\ 0, & \text{otherwise} \end{cases}$$
- Here $i = 1, \dots, N$ and $f = 1, \dots, M$, where $N$ is the total number of items in the item pool and $M$ is the total number of forms to be assembled
Constraints
- Constraints are test specifications that need to be met. Typical constraints include:
- To restrict the test length to exactly $n$ items for form $f$:
  $$\sum_{i=1}^{N} x_{if} = n$$
- To ensure that item $i$ is selected no more than once across all $M$ forms (the sum runs over the forms $f$):
  $$0 \le \sum_{f=1}^{M} x_{if} \le 1$$
- To limit the number of items on a certain topic (say, OPP items, or a set of enemy items) to between $l$ and $u$ on any given form, let $t_i$ be a binary indicator equal to 1 if item $i$ falls into the topic and 0 otherwise. Then:
  $$l \le \sum_{i=1}^{N} x_{if}\, t_i \le u$$
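The three constraint types above can be checked against a candidate assignment with a few lines of Python. This is only an illustrative sketch, not the SAS/OR code used in the demonstration: the function name `check_constraints`, the 6-item pool, and the topic indicator values are all hypothetical.

```python
# Hypothetical toy data: 6 items, 2 forms. x[i][f] = 1 if item i is on form f.
x = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 0],
    [0, 1],
]
# Hypothetical topic indicator: t[i] = 1 if item i belongs to the topic.
t = [1, 0, 1, 0, 0, 1]


def check_constraints(x, t, n, lower, upper):
    """Return True if the assignment x meets all three constraint types."""
    num_items, num_forms = len(x), len(x[0])
    for f in range(num_forms):
        # Test length: exactly n items on form f.
        if sum(x[i][f] for i in range(num_items)) != n:
            return False
        # Topic coverage: between lower and upper topic items on form f.
        topic_count = sum(x[i][f] * t[i] for i in range(num_items))
        if not lower <= topic_count <= upper:
            return False
    # No overlap: each item is used on at most one form.
    return all(sum(row) <= 1 for row in x)


print(check_constraints(x, t, n=3, lower=1, upper=2))
```

In an optimization model these conditions are stated declaratively and the solver searches for an assignment that satisfies them; the function here only verifies an assignment that is already given.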
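Objective Function(s)
- Finally, the objective function requires the test information function (TIF) of each assembled form to be as close as possible to the target value at the cut score $\theta = \theta_c$:
  $$\underset{x}{\text{Minimize}} \; \left| \sum_{i=1}^{N} I_i(\theta_c)\, x_{if} - T(\theta_c) \right|$$
  where $I_i(\theta_c)$ is the information of item $i$ and $T(\theta_c)$ is the target information, both at the cut score
- Careful consideration is given to keeping examinations comparable over the years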
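To make the objective concrete, the sketch below assembles a single form from a toy pool by exhaustive search, picking the $n$ items whose summed information at the cut score is closest to the target. It is not the PROC OPTMODEL formulation used in the demonstration: exhaustive search stands in for an integer programming solver, and the function name `assemble_form`, the information values, and the target are all hypothetical.

```python
from itertools import combinations


def assemble_form(info_at_cut, n, target):
    """Pick the n items whose summed information is closest to target.

    Exhaustive search over all n-item subsets: feasible only for toy
    pools. An operational pool needs an integer programming solver.
    """
    best_items, best_gap = None, float("inf")
    for items in combinations(range(len(info_at_cut)), n):
        # Absolute deviation of the form's information from the target.
        gap = abs(sum(info_at_cut[i] for i in items) - target)
        if gap < best_gap:
            best_items, best_gap = items, gap
    return best_items, best_gap


# Hypothetical item information values at the cut score for 6 items.
info_at_cut = [0.25, 0.20, 0.10, 0.15, 0.05, 0.22]
items, gap = assemble_form(info_at_cut, n=3, target=0.60)
print(items, round(gap, 4))
```

The number of subsets grows combinatorially with pool size, which is why operational ATA relies on a solver rather than enumeration.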
Measures of Test Quality
- The test information function (TIF) tells us how precisely the test estimates ability across the whole range of ability scores
- At a given ability $\theta$, a higher TIF value indicates that the test measures that ability more precisely
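As a rough illustration of how a TIF is computed, the sketch below assumes the Rasch model, under which the information of an item with difficulty $b$ is $P(\theta)\,[1 - P(\theta)]$ and the TIF is the sum over the items on the form. The choice of the Rasch model, the function names, and the difficulty values are assumptions made for this example only.

```python
import math


def rasch_item_information(theta, b):
    """Item information under the Rasch model: P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def form_information(theta, difficulties):
    """TIF at theta: sum of item information over the items on the form."""
    return sum(rasch_item_information(theta, b) for b in difficulties)


# Hypothetical item difficulties for a 5-item form.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(form_information(theta, difficulties), 3))
```

Because these difficulties are centered at 0, the form is most informative near $\theta = 0$ and less informative toward the extremes. The standard error of the ability estimate is approximately $1/\sqrt{\text{TIF}}$, which is why a higher TIF at the cut score supports more accurate pass/fail decisions.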
Data and Constraints Applied
- In developing the ATA engine, one-level data was used, with the target latent ability cut score $\theta = \theta_c$
  - Data sources: anchor, operational, and pretest items
- The following criteria are specified as constraints in this ATA demonstration:
  - Blueprint Dimension 1 criteria
  - Blueprint Dimension 2 criteria
  - Life stage in clinical presentations
  - Number of items in a test form
  - etc.
New ATA Forms vs. Previous Forms: TIF by Ability
New ATA Forms vs. Previous Forms
- The newly assembled ATA forms (in red) are plotted alongside a set of forms (in blue) assembled with the traditional manual method
- Figure 1 overlays both groups of graphs, plotting one statistic across all assembled forms:
  - The value of the test information function at ability levels from -4 to +4
New ATA Forms vs. Previous Forms
- Within each group of graphs, the forms are closely equivalent
  - The graphs overlap closely
- The new ATA forms show noticeably less variability around the cut score $\theta = \theta_c$ than the traditional forms do
New ATA Forms vs. Previous Forms
- Across the two groups of graphs, over a major portion of the ability continuum, the new ATA forms show higher test information function values than the traditional forms
New ATA Forms vs. Previous Forms
- In sum, the new and the traditional forms are reasonably comparable in terms of equivalency within their respective groups
- The new ATA forms can be better tailored to candidate ability, particularly around the cut score $\theta = \theta_c$
Impact Analysis for Classification Accuracy
- To further evaluate the new ATA forms, we conducted an impact analysis through a simulation study based on empirical administration data
- Assuming the same cohort of candidates took the newly assembled ATA forms, we compare their between-year examination scores and pass/fail decisions
Impact Analysis for Classification Accuracy
- Figure 2 plots the newly estimated ability ($\theta$) values after equating (vertical axis) against their previous estimates (horizontal axis) for two selected ATA forms
- In each plot, the points fall around a 45° reference line, indicating that the newly equated ability estimates closely match their previously obtained values
Impact Analysis for Classification Accuracy
- In each scatterplot, an ordinary least squares regression line, with the equated ability estimates as the dependent variable and the previous estimates as the predictor, almost completely overlaps the 45° reference line
  - This is additional evidence supporting the new, equated ability estimates from the ATA forms
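The comparison between the regression line and the 45° reference line amounts to checking that the fitted slope is close to 1 and the intercept close to 0. A minimal sketch of that check follows; the function name `ols_line` and the ability estimates are hypothetical and are not taken from Figure 2.

```python
def ols_line(x, y):
    """Return (slope, intercept) of the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical ability estimates for six candidates.
previous = [-1.2, -0.4, 0.1, 0.7, 1.5, 2.3]  # predictor (horizontal axis)
equated = [-1.1, -0.5, 0.2, 0.6, 1.6, 2.2]   # dependent (vertical axis)

slope, intercept = ols_line(previous, equated)
print(round(slope, 3), round(intercept, 3))
```

A slope near 1 with an intercept near 0 means the fitted line nearly coincides with the identity line, which is the pattern the slide describes.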
Impact Analysis for Classification Accuracy
- Table 1 cross-tabulates two sets of classification results obtained with the same classification criterion for measured ability:
  - Results based on the actual data from first-time candidates in one recent administration cycle
  - Results based on data simulated from the ATA forms as if administered to the same group of candidates
Impact Analysis for Classification Accuracy
- The passing rate from ATA ranges from 91.52% to 92.33% depending on the form, which is highly comparable across forms
  - Close to the actual passing rate of 92%
- The failing rate from ATA ranges from 7.67% to 8.48% depending on the form
Impact Analysis for Classification Accuracy
- The sensitivity estimate ranges from 97.25% to 98.03% across all ATA forms
  - Definition: proportion of truly qualified candidates who actually pass the examination
- The specificity estimate ranges from 78.55% to 81.80% across all ATA forms
  - Definition: proportion of truly unqualified candidates who actually fail the examination
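Both statistics follow directly from a 2x2 cross-tabulation of true status against examination outcome. The sketch below shows the arithmetic; the function name `accuracy_stats` and the four cell counts are invented for illustration and are not the values in Table 1.

```python
def accuracy_stats(tp, fn, fp, tn):
    """Compute classification statistics from 2x2 cell counts.

    tp: qualified and pass      fn: qualified and fail
    fp: unqualified and pass    tn: unqualified and fail
    """
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),  # qualified candidates who pass
        "specificity": tn / (tn + fp),  # unqualified candidates who fail
        "pass_rate": (tp + fp) / total,
    }


# Hypothetical counts for 1,000 candidates (not the study data).
stats = accuracy_stats(tp=900, fn=20, fp=16, tn=64)
for name, value in stats.items():
    print(f"{name}: {value:.2%}")
```

When most candidates pass, the unqualified group is small, so a few misclassified candidates move specificity much more than they move sensitivity.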
Conclusions
- The ATA approach is preferred over the manual assembly approach because its forms are:
  - More equivalent, with reduced variability among forms over the continuum of candidate ability
  - More rigorous in psychometric and content properties
    - Higher test information function along a major portion of the ability continuum
    - More content constraints factored into form assembly
  - As accurate as traditional forms in scoring and classifying candidates
Conclusions
- In actual form assembly, we go even further to keep our strong commitment to the public
  - Numerous communications among the Test Development team, the Psychometrics and Research team, and external subject matter experts on both content and psychometric issues
  - Enhancing form equivalency to the greatest possible extent
  - Flexibility to add other content and psychometric constraints whenever needed
  - Multiple stages of ATA, so that feedback from item and form review meetings can be factored into each stage
Conclusions
- This small-scale study is based on rigorous mathematical optimization procedures implemented in an industry-standard software package, and it demonstrates ATA as one of many ongoing innovations at NBOME
  - It is a *demonstration* of the ATA work only
  - It should not be viewed as reflecting the full process typically used in a real ATA project at NBOME
Feel Free to Follow Up with Questions!
- If you have any remaining questions, please do not hesitate to contact Dr. Hao Song or Dr. Hongwei Patrick Yang by e-mail or phone:
  - Dr. Hao Song: HSong@nbome.org, 773-714-0622 extension 294
  - Dr. Hongwei Patrick Yang: Pyang1@nbome.org, 773-714-0622 extension 290
And, finally, on behalf of NBOME THANK YOU! 2013 NBOME