THE EFFECTIVENESS OF COMPUTERIZED ADAPTIVE TESTING ON ADVANCED PROGRESSIVE MATRICES TEST PERFORMANCE 1
Aries Yulianto
Faculty of Psychology, University of Indonesia

Abstract

Although computers are still rarely used for test administration in Indonesia, there is considerable opportunity to develop such use. This experiment measured the effectiveness of computerized test administration, especially computerized adaptive testing (CAT). Two weeks before the experiment, subjects had taken the Advanced Progressive Matrices (APM) test in paper-and-pencil test (PPT) form. The subjects were then randomly assigned to six experimental groups to take the same test in classical computerized test (CT) or CAT form, with time limits of 25 minutes, 50 minutes, or no limit. Test scores were estimated with the maximum likelihood method. Following Embretson and Reise (2000), items with difficulty (b) between -0.5 and 0.5 were chosen as the first items administered through CAT. Subsequent items were chosen by the maximum information criterion, and administration stopped when the standard error of the score was 0.4 or smaller. There was no significant difference between CAT and PPT scores, but there was a significant difference between CT and PPT scores. The study found that CAT is effective because it consumed less time and administered fewer items (12 on average) than CT and PPT (36 items in total).

Keywords: Computerized Adaptive Testing, Progressive Matrices, Paper-Pencil Test.

INTRODUCTION

Psychological tests are in wide use in Indonesia, from diagnosis to selection purposes and from academic to industrial settings. The ultimate aim of psychological testing is to select people, with the main objective of placing the right person in the right place. Most tests are delivered with paper-and-pencil administration; only a small number are administered as performance tests. As a result, some time is required to administer, score, and report test results.
This becomes a heavier job for the tester when large numbers of examinees are involved. Unfortunately, fast reporting is a major objective in almost all testing situations. Another problem arises from using the same tests over several decades: most tests lack security, so their reliability and validity need to be questioned. On the other hand, computers have increasingly been used for many purposes and in many settings, and governments and individuals promote computer use in most aspects of life. Unfortunately, using the computer as a method of test administration has received little attention. Some software has been built to help testers score and report test results, but testers still administer tests with paper and pencil.

Many nonverbal tests are used as assessment tools, such as Raven's Progressive Matrices (PM), the General Intelligence Test subtest 5 (TIU-5), the Culture-Fair Intelligence Test (CFIT), and the Figure Reasoning Test (FRT). McAulay, Deary, Ferguson, and Frier (2001) found that nonverbal ability reflects adaptive ability and problem solving better than verbal ability. Verbal tests are considered a disadvantage for some groups of examinees, such as people with hearing, verbal, or vision disabilities, people with mental retardation, or children with severe emotional disturbance (Bracken & McCallum, in Fives & Flanagan, 2002). Among nonverbal ability tests, the PM test is one of the most frequently used (Murphy & Davidshofer, 2001). The PM test was constructed on the basis of Spearman's g-factor theory of intelligence and is widely used in basic research and intellectual screening (Gregory, 2000). As a culture-fair test, the PM is also used in general cognitive ability research to compare intellectual ability across nations, races, or majority-minority groups. Ackerman (2000) used the PM test to find a major factor in adult intelligence. In neuropsychological settings, the PM has been used to assess the intellectual ability of brain-damaged patients (Caffarra, Vezzadini, Zonato, Copelli, & Venneri, 2003).

Over the past three decades in the U.S., computers have increasingly been used to automate the administration, scoring, and interpretation of results from a wide variety of psychological measures, including assessments of ability and academic achievement (Brown & Weiss, 1977), neuropsychological status (e.g., Jenskins, Fitzpatrick, Garrat, Peto, & Steward-Brown, 2001), vocational interests, and personality (e.g., Butcher, Perry, & Atlis, 2000; Simms & Clark, 2005). Computers provide an objective, efficient, and reliable means of delivering assessment services to clients and research participants. A concern in both research and clinical settings is the length of many personality measures. For instance, an hour or longer is often required to complete such measures as the 567-item MMPI-2, the 344-item Personality Assessment Inventory, or the 240-item NEO Personality Inventory-Revised (NEO-PI-R). The time required for such assessments is difficult to accommodate in many applied and research settings.

1 Paper presented at the International Meeting of the Psychometric Society (IMPS) 2007, Tokyo, Japan, 9-13 July 2007.
Managed care companies have limited the types of assessments for which they will reimburse practitioners to those that require less time and effort to administer, score, and interpret. Research time is also scarce and costly. Moreover, long measures can lead to fatigue and drifting attention in many test takers, which ultimately compromises the validity of the test profile and complicates test interpretation.

With developing technology, the shift from paper-and-pencil administration to computer-administered testing started in the 1970s (Bunderson, Inouye, & Olsen, 1989). In this first generation of computerized testing, the computer was used to deliver items just as in the paper-and-pencil test. This offered several advantages, such as fast scoring, immediate reporting, better standardization of test administration, increased test security, and reduced measurement error. Combined with Item Response Theory (IRT), the computer can deliver items suited to the examinee's ability, so each examinee receives a different set of items. This second-generation use of computer administration is known as computerized adaptive testing.

Computerized Adaptive Testing

In the most basic sense, computerized adaptive testing (CAT) permits the selection and administration of items that are individually tailored to the trait or ability level of the examinee, with the potential of substantial item and time savings (Embretson & Reise, 2000). A typical CAT selects and administers only those items that provide the most psychometric information (i.e., yield the lowest standard errors of measurement) at a given trait level. For example, IRT and CAT have been shown to offer noteworthy solutions to the challenge of constructing patient-based health status measures that are both more practical and more reliable over a wide range of score levels (Ware, Gandek, Sinclair, & Bjorner, 2005). Figure 1 shows the scheme of CAT administration:

1. Start with an initial ability estimate.
2. Select and deliver an optimum item.
3. Evaluate the response.
4. Re-estimate the ability and its standard error.
5. If the stopping rule is not satisfied, return to step 2; otherwise end the test.

Figure 1. Scheme of CAT administration.

Considerations in CAT administration

Embretson and Reise (2000) state some considerations in CAT administration:

Item bank. The basic goal of CAT is to administer a set of items that is, in some sense, maximally efficient and informative for each examinee. Because of the primary importance of the item bank in determining the efficiency of CAT, much thought and research has gone into issues involving the creation and maintenance of item banks. No precise number can be given regarding how many items are required, but a rough estimate is around 100. Items in the bank should be calibrated with one of the item-parameter models: 1PL, 2PL, or 3PL.

Administer the first item. If the examinee population can be assumed to be normally distributed, then a reasonable choice for starting a CAT is an item of moderate difficulty, such as one with a difficulty parameter between -0.5 and 0.5. If some prior information is available regarding the examinee's position on the trait continuum, it can be used in selecting the difficulty level of the first item; the average θ of the examinee population could be used as the initial ability estimate to make the CAT optimal (Thissen & Mislevy, 1990). Some testers like to begin a CAT with an easy item so that the examinee has a success experience, which may in turn alleviate problems such as test anxiety (Embretson & Reise, 2000).

Score the examinee's ability. There are three main methods for estimating an examinee's ability: (a) Maximum Likelihood (ML), (b) Maximum a Posteriori (MAP), and (c) Expected a Posteriori (EAP).
Some researchers do not endorse the use of priors because they can affect scores; for example, if few items are administered, ability estimates may be pulled toward the mean of the prior distribution. For this reason, some researchers have implemented a step-size procedure for assigning scores at the beginning of a CAT.
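The ML scoring option can be made concrete for the Rasch (1PL) model that this study uses later. The following is a minimal sketch, not the paper's implementation; the bisection bounds and tolerance are my own arbitrary choices.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_theta(responses, bs, lo=-4.0, hi=4.0, tol=1e-6):
    """Maximum-likelihood trait estimate via bisection on the score equation.

    The ML estimate solves sum_i (u_i - P_i(theta)) = 0. For an all-correct
    or all-incorrect pattern the likelihood is monotone in theta, so no
    finite ML estimate exists and None is returned (the caveat noted in the
    Dependent Measure section below).
    """
    if all(u == 1 for u in responses) or all(u == 0 for u in responses):
        return None
    def score(theta):
        return sum(u - rasch_p(theta, b) for u, b in zip(responses, bs))
    # score() decreases in theta: positive below the solution, negative above.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With one correct and one incorrect response on two items of difficulty 0, the estimate is θ = 0; with two correct out of three such items it is ln 2 ≈ 0.69.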
Select the next item. Two strategies can be used to select the next item: maximum information and minimum expected posterior standard deviation; Thissen and Mislevy (1990) call the latter strategy Bayesian estimation. The maximum information strategy selects the item that provides the most psychometric information at the examinee's current estimated ability level, and usually corresponds to ML scoring. The second strategy selects the item that minimizes the examinee's expected posterior standard deviation, that is, the item that makes the examinee's standard error smallest. This typically corresponds to Bayesian scoring procedures and does not always yield the same result as the maximum information strategy.

Test termination. In CAT, after every item response the examinee's trait level and standard error are re-estimated and the computer selects the next item to administer. This cannot go on forever, so the CAT algorithm needs a stopping rule. There are four stopping rules: (1) variable length, (2) fixed length, (3) variable-fixed length, and (4) time limit. Under the variable-length rule, the test terminates when the standard error falls below some acceptable value; Thissen and Mislevy (1990) call this the target strategy. Its advantage is consistency with the classical-theory assumption of equal measurement error variance, which suits statistical analyses that take measurement error into account. The standard error (SE) limit varies among researchers. Ury used an SE of .3162 or smaller, since √(1 − .90) = .3162, so this limit yields the same result as a classical reliability coefficient of .90 (Thissen & Mislevy, 1990). Hornke (2000) used .38 as the SE limit. Blais and Raiche (2002) found in their simulation that if the SE is .40 or smaller, the SE of the ability estimate differs by only .03 from the previous estimation.
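For the Rasch model, maximum-information selection and the variable-length SE rule both have simple closed forms: item information is P(1 − P), and the SE of the ML estimate is 1/√(test information). The sketch below is illustrative only; the function names are mine, not the study's.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_info(theta, b):
    """Fisher information of a Rasch item: I(theta) = P(1 - P)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def select_max_info(theta, bank, used):
    """Maximum-information selection: the unused item most informative at theta.

    For the Rasch model this is simply the item whose difficulty b is
    closest to the current ability estimate.
    """
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_info(theta, bank[i]))

def standard_error(theta, administered_bs):
    """SE of the ML ability estimate: 1 / sqrt(sum of item informations)."""
    return 1.0 / math.sqrt(sum(item_info(theta, b) for b in administered_bs))

def should_stop(theta, administered_bs, se_limit=0.40):
    """Variable-length stopping rule: terminate once SE <= se_limit."""
    return standard_error(theta, administered_bs) <= se_limit
```

The default `se_limit=0.40` mirrors the Blais and Raiche (2002) recommendation adopted later in this study; a stricter limit such as .3162 simply requires more administered information before stopping.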
The second termination strategy, fixed length, depends on the number of items delivered; Thissen and Mislevy (1990) call this strategy maximum number of items. Its advantages are that it is easy to implement and that item usage can be predicted. These two strategies can be combined in the third, variable-fixed length strategy, for cases where the item bank could run out before the precision target is reached. Thissen and Mislevy (1990) suggest a fourth strategy, terminating the test after a specific time; this strategy is an advantage for speeded tests, but not for power tests. Embretson and Reise (2000) recommend the SE criterion as an effective strategy for test termination, since it uses CAT's own algorithm.

The basic objective of this study is to show that CAT delivers a test more efficiently than conventional administrations: the paper-and-pencil test and the classical computerized test. To address this objective, two independent variables were involved: test administration type and working-time limit. The research question is: do test administration type and working-time limit influence test performance on the APM?

Method

Participants

First, 298 undergraduate students of the Faculty of Psychology, University of Indonesia, took the 36-item APM in paper-and-pencil form. Two weeks later, one hundred and twenty of these students (112 women and 8 men) volunteered to take part in the experiment in the faculty's computer laboratory.
Design

The experiment followed a 2 (test administration type: classical computerized test / computerized adaptive test) x 3 (time limit: 25 minutes / 50 minutes / no limit) randomized between-subjects factorial design. The subjects were randomly assigned to six experimental groups to take the APM test in classical computerized test (CT) or computerized adaptive test (CAT) form, with a time limit of 25 minutes, 50 minutes, or no limit.

Procedure

This research involved test performance as the dependent variable and two independent variables: test administration type and working-time limit.

Manipulation. The APM test was delivered using the Fastest Pro 1.6 trial version software, which has two options for delivering a test, classical or adaptive, and a feature to control the time limit. Test administration was set to three variations: 25 minutes (the same as paper-and-pencil administration), 50 minutes (twice the paper-and-pencil limit), and no time limit. Unlike in paper-and-pencil administration, the test instructions in computerized administration were presented individually on the computer monitor. The CAT administration was configured as follows:

Item bank. One-parameter logistic (1PL, or Rasch) item parameters were estimated using ACER QUEST. For this purpose, previously available test data from 1,216 subjects were added. All 36 items were considered fit and were used as the item bank; difficulty parameters varied from to .

Administer the first item. An item with a difficulty parameter between -0.5 and 0.5 was randomly selected, because the subjects' trait levels were assumed to be normally distributed.

Score the examinee's ability. Subjects' trait levels were estimated using ML.

Select the next item. The item with maximum information at the current ability estimate was selected for delivery; with the maximum-information method, CAT administration is more effective (Embretson & Reise, 2000).

Test termination.
A variable-length criterion was used to terminate the test: following the Blais and Raiche (2002) recommendation, the test terminated when the SE was .40 or below.

Dependent Measure

The test score, the subject's trait level (θ), was estimated using maximum likelihood (ML). Although no ML estimate can be obtained from a perfect all-endorsed or all-not-endorsed response pattern, the ML trait estimator has several positive asymptotic features (Embretson & Reise, 2000): it is unbiased (the expected value of the estimate equals the true θ), it is efficient, and its errors are normally distributed.

Statistical Analyses

To compare subjects' estimated θ across two test administrations, paper-and-pencil versus computer administration (CT or CAT), a paired-sample t-test was used. Factorial analysis of variance was used to examine the main effect of each independent variable, test administration type and working-time limit, as well as their interaction. With a significance level of 0.05, data were analyzed with SPSS.
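The paired-sample t-test compares the same subjects under two administrations by testing whether the mean pairwise difference is zero. A minimal standard-library sketch (the helper name and the illustrative scores are mine, not the study's data):

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-sample t statistic: the mean of the pairwise differences
    divided by the standard error of that mean (df = n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical theta estimates for the same ten subjects under two
# administrations (illustrative values only).
ppt = [0.8, 0.5, 1.1, -0.2, 0.6, 0.9, 0.3, 0.7, 1.0, 0.4]
cat = [0.7, 0.6, 1.0, -0.1, 0.5, 1.0, 0.2, 0.8, 0.9, 0.5]
t = paired_t(ppt, cat)  # near zero here: the mean paired difference is 0
```

The statistic is then compared with the critical t value for n − 1 degrees of freedom at the chosen significance level (about 2.26 for df = 9 at α = .05, two-tailed).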
Results

Means, standard deviations, and minimum-maximum estimated ability levels (θ) for each experimental group are shown in Table 1. Although the CAT groups had a smaller mean than the CT groups, the difference was not significant (F = .721, p > .05). In other words, the subjects in this experiment had equal abstract reasoning ability as measured by the APM test.

Subjects' θ under paper-and-pencil administration differed significantly from their θ under CT administration (t = 3.479, p < .01); the CT mean (M = .4879) was lower than the paper-and-pencil mean (M = .6737). This did not happen in the comparison between paper-and-pencil and CAT administration: although θ under CAT (M = .5059) was lower than under paper-and-pencil administration (M = .5469), the difference was not significant (t = .547, p > .05). From these results, CAT has an advantage over CT administration in that it keeps the θ estimate close to the true θ (assuming the paper-and-pencil score approximates the true θ). However, when subjects' θ in the CT groups was compared with subjects' θ in the CAT groups, no difference was found (F = 2.202, p > .05); this result is not consistent with the previous one.

Table 1. Means, standard deviations, and minimum-maximum θ scores for each experimental group (CT and CAT at each time limit); the numeric entries are not preserved in this transcription.

Comparisons between the two types of test administration at each time limit showed similar results. With the 25-minute limit, there was no difference between CAT and CT in the θ estimate (F = .035, p > .05). Although the CAT θ estimate was higher than CT's, there was no significant difference with the 50-minute limit (F = 1.748, p > .05).
A similar result was found in the groups with no time limit (F = 1.339, p > .05). Comparing the three time limits within CT administration, there was no significant difference in the θ estimates (F = .160, p > .05); a similar result was found within CAT administration (F = .408, p > .05).
Figure 2. Mean plot of estimated marginal means of θ by test administration (CT, CAT) and time limit (25 minutes, 50 minutes, no limit), showing the interaction effect.

One purpose of this experiment was to show that CAT is more efficient than the other types of test administration. Efficiency is evaluated by the amount of time spent administering the test, which depends on the number of items administered: the fewer the items, the less time spent. Table 2 shows the average number of items for each treatment condition. Under every time-limit condition, CAT administration delivered fewer items (12 on average) than CT (34 on average), and the number of items delivered differed significantly between the two types of administration. We can therefore conclude that CAT is more efficient than CT, because it delivers fewer items with no difference in the ability estimate.

Table 2. Average number of items for each treatment condition (CT and CAT at each time limit); the numeric entries are not preserved in this transcription.
Discussion

This experiment shows that CAT is a more efficient method of delivering a test than the classical methods (paper-and-pencil administration and the classical computerized test). This result is consistent with Embretson and Reise's (2000) argument that an IRT-based CAT requires fewer items than a conventional or paper-and-pencil test.

One thing that needs further exploration is the contribution of psychological factors to test performance, especially in computerized testing. As noted earlier, examinees in Indonesia usually take tests in paper-and-pencil form, so performance in a computerized setting may differ from performance on a paper-and-pencil test. Tonidandel, Quinones, and Adams (2002) found that test anxiety correlates negatively with test performance, supporting an earlier finding by Wise (1997b) that anxiety rising during a test decreases test performance; this can happen because computerized testing is unfamiliar (Wise, 1997a). Since the subjects in this experiment were all college students who were familiar with computers, I assume there was little or no test anxiety resulting from computer administration. As a consequence, this finding should not be generalized to populations unfamiliar with computers; further research should consider the effect of psychological factors on test performance.

References

Blais, J., & Raiche, G. (2002). Some features of the sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules. Paper presented at the 11th International Objective Measurement Workshop, New Orleans, April 2002 (unpublished).

Brown, J.L., & Weiss, D.J. (1977). An adaptive testing strategy for achievement test batteries.

Bunderson, C.V., Inouye, D.K., & Olsen, J.B. (1989). The four generations of computerized educational measurement. In R.L. Linn (Ed.), Educational Measurement (3rd ed.). New York: American Council on Education & Macmillan.

Butcher, J.M., Perry, J.L., & Atlis, M.M. (2000). Validity and utility of computer-based test interpretation. Psychological Assessment, 12(1).

Caffarra, P., Vezzadini, G., Zonato, F., Copelli, S., & Venneri, A. (2003). A normative study of a shorter version of Raven's Progressive Matrices. Neurological Sciences, 24.

Embretson, S.E., & Reise, S.P. (2000). Item Response Theory for Psychologists. New Jersey: Lawrence Erlbaum Associates.

Fives, C.J., & Flanagan, R. (2002). A review of the Universal Nonverbal Intelligence Test (UNIT): An advance for evaluating youngsters with diverse needs. School Psychology International, 23(4).

Gregory, R.J. (2000). Psychological Testing: History, Principles, and Applications (3rd ed.). MA: Allyn & Bacon.

Hornke, L.F. (2000). Item response times in computerized adaptive testing. Psicológica, 21.

Jenskins, C., Fitzpatrick, R., Garrat, A., Peto, V., & Steward-Brown, S. (2001). Can item response theory reduce patient burden when measuring health status in neurological status? Journal of Neurology, 71(2).

McAulay, V., Deary, I.J., Ferguson, S.C., & Frier, B.M. (2001). Acute hypoglycemia in humans causes attentional dysfunction while nonverbal intelligence is preserved. Diabetes Care, 24(10).

Murphy, K.R., & Davidshofer, K.O. (2001). Psychological Testing: Principles and Applications (5th ed.). New Jersey: Prentice-Hall.

Simms, L.J., & Clark, L.A. (2005). Validation of a computerized adaptive version of the Schedule for Nonadaptive and Adaptive Personality (SNAP). Psychological Assessment, 17(1).

Thissen, D., & Mislevy, R.J. (1990). Testing algorithms. In H. Wainer, N.J. Dorans, R. Flaugher, & B.F. Green (Eds.), Computerized Adaptive Testing: A Primer. New Jersey: Lawrence Erlbaum Associates.

Tonidandel, S., Quinones, M.A., & Adams, A.A. (2002). Computer-adaptive testing: The impact of test characteristics on perceived performance and test takers' performance. Journal of Applied Psychology, 87(2).

Ware, J.E. Jr., Gandek, B., Sinclair, S.J., & Bjorner, J.B. (2005). Item response theory and computerized adaptive testing: Implications for outcomes measurement in rehabilitation. Rehabilitation Psychology, 50(1).

Wise, S.L. (1997a). Examinee issues in CAT. Paper presented at the annual meeting of the National Council on Measurement in Education (unpublished).

Wise, S.L. (1997b). Overview of practical issues in a CAT program. Paper presented at the annual meeting of the National Council on Measurement in Education (unpublished).
More informationDealing with Variability within Item Clones in Computerized Adaptive Testing
Dealing with Variability within Item Clones in Computerized Adaptive Testing Research Report Chingwei David Shin Yuehmei Chien May 2013 Item Cloning in CAT 1 About Pearson Everything we do at Pearson grows
More informationEvaluating the Technical Adequacy and Usability of Early Reading Measures
This is a chapter excerpt from Guilford Publications. Early Reading Assessment: A Practitioner's Handbook, Natalie Rathvon. Copyright 2004. chapter 2 Evaluating the Technical Adequacy and Usability of
More informationA Gradual Maximum Information Ratio Approach to Item Selection in Computerized Adaptive Testing. Kyung T. Han Graduate Management Admission Council
A Gradual Maimum Information Ratio Approach to Item Selection in Computerized Adaptive Testing Kyung T. Han Graduate Management Admission Council Presented at the Item Selection Paper Session, June 2,
More informationHarrison Assessments Validation Overview
Harrison Assessments Validation Overview Dan Harrison, Ph.D. 2016 Copyright 2016 Harrison Assessments Int l, Ltd www.optimizepeople.com HARRISON ASSESSMENT VALIDATION OVERVIEW Two underlying theories are
More informationChapter Standardization and Derivation of Scores
19 3 Chapter Standardization and Derivation of Scores This chapter presents the sampling and standardization procedures used to create the normative scores for the UNIT. The demographic characteristics
More informationAn Approach to Implementing Adaptive Testing Using Item Response Theory Both Offline and Online
An Approach to Implementing Adaptive Testing Using Item Response Theory Both Offline and Online Madan Padaki and V. Natarajan MeritTrac Services (P) Ltd. Presented at the CAT Research and Applications
More informationUsing the CTI to Assess Client Readiness for Career and Employment Decision Making
Using the CTI to Assess Client Readiness for Career and Employment Decision Making James P. Sampson, Jr., Gary W. Peterson, Robert C. Reardon, Janet G. Lenz, & Denise E. Saunders Florida State University
More informationDesign of Intelligence Test Short Forms
Empirical Versus Random Item Selection in the Design of Intelligence Test Short Forms The WISC-R Example David S. Goh Central Michigan University This study demonstrated that the design of current intelligence
More informationTest Partnership Insights Series Technical Manual
Test Partnership Insights Series Technical Manual 2017 First published March 2017 All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic
More informationRedesign of MCAS Tests Based on a Consideration of Information Functions 1,2. (Revised Version) Ronald K. Hambleton and Wendy Lam
Redesign of MCAS Tests Based on a Consideration of Information Functions 1,2 (Revised Version) Ronald K. Hambleton and Wendy Lam University of Massachusetts Amherst January 9, 2009 1 Center for Educational
More informationUK Clinical Aptitude Test (UKCAT) Consortium UKCAT Examination. Executive Summary Testing Interval: 1 July October 2016
UK Clinical Aptitude Test (UKCAT) Consortium UKCAT Examination Executive Summary Testing Interval: 1 July 2016 4 October 2016 Prepared by: Pearson VUE 6 February 2017 Non-disclosure and Confidentiality
More informationInnovative Item Types Require Innovative Analysis
Innovative Item Types Require Innovative Analysis Nathan A. Thompson Assessment Systems Corporation Shungwon Ro, Larissa Smith Prometric Jo Santos American Health Information Management Association Paper
More informationWeb-Based Assessment: Issues and Applications in Personnel Selection
Web-Based Assessment: Issues and Applications in Personnel Selection John A. Weiner Psychological Services, Inc. June 22, 2004 IPMAAC 28th Annual Conference on Personnel Assessment 1 Introduction Computers
More informationDetermining the accuracy of item parameter standard error of estimates in BILOG-MG 3
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Public Access Theses and Dissertations from the College of Education and Human Sciences Education and Human Sciences, College
More informationTest and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III
Test and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III Throughout his career, Wechsler emphasized that factors other than intellectual ability are involved in
More informationSales Selector Technical Report 2017
Sales Selector Technical Report 2017 Table of Contents Executive Summary... 3 1. Purpose... 5 2. Development of an Experimental Battery... 5 3. Sample Characteristics... 6 4. Dimensions of Performance...
More informationCONSTRUCTING A STANDARDIZED TEST
Proceedings of the 2 nd SULE IC 2016, FKIP, Unsri, Palembang October 7 th 9 th, 2016 CONSTRUCTING A STANDARDIZED TEST SOFENDI English Education Study Program Sriwijaya University Palembang, e-mail: sofendi@yahoo.com
More informationTheory and Characteristics
Canadian Journal of School Psychology OnlineFirst, published on September 19, 2008 as doi:10.1177/0829573508324458 Reynolds, C. R., & Kamphaus, R. W. (2003). RIAS: Reynolds Intellectual Assessment Scales.
More information1. BE A SQUEAKY WHEEL.
Tips for Parents: Intellectual Assessment of Exceptionally and Profoundly Gifted Children Author: Wasserman, J. D. Source: Davidson Institute for Talent Development 2006 The goal of norm-referenced intelligence
More informationStandardized Measurement and Assessment
Standardized Measurement and Assessment Measurement Identify dimensions, quantity, capacity, or degree of something Assign a symbol or number according to rules (e.g., assign a number for height in inches
More informationJournal of Statistical Software
JSS Journal of Statistical Software May 2012, Volume 48, Issue 8. http://www.jstatsoft.org/ Random Generation of Response Patterns under Computerized Adaptive Testing with the R Package catr David Magis
More informationInternational Journal in Foundations of Computer Science & Technology (IJFCST) Vol.6, No.1, January Azerbaijan, Iran
DESIGNING DIGITAL COMPREHENSIVE SYSTEM TO TEST AND ASSESS THE INTELLIGENTLY BEHAVIORS OF FROM 6 TO 12 YEARS OLD CHILDREN BASED ON THE WECHSLER INTELLIGENCE THEORY Yaser Rahmani 1 and Ahmad Habibizad Navin
More informationA SIMULATION MODEL FOR INTEGRATING QUAY TRANSPORT AND STACKING POLICIES ON AUTOMATED CONTAINER TERMINALS
A SIMULATION MODEL FOR INTEGRATING QUAY TRANSPORT AND STACKING POLICIES ON AUTOMATED CONTAINER TERMINALS Mark B. Duinkerken, Joseph J.M. Evers and Jaap A. Ottjes Faculty of OCP, department of Mechanical
More informationUsing the WASI II with the WAIS IV: Substituting WASI II Subtest Scores When Deriving WAIS IV Composite Scores
Introduction Using the WASI II with the WAIS IV: Substituting WASI II Subtest Scores When Deriving WAIS IV Composite Scores Technical Report #2 November 2011 Xiaobin Zhou, PhD Susan Engi Raiford, PhD This
More informationEquivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests
Equivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests Q-interactive Technical Report 4 Mark H. Daniel, PhD Senior Scientist for Research Innovation
More information(1960) had proposed similar procedures for the measurement of attitude. The present paper
Rasch Analysis of the Central Life Interest Measure Neal Schmitt Michigan State University Rasch item analyses were conducted and estimates of item residuals correlated with various demographic or person
More informationAdministration duration for the Wechsler Adult Intelligence Scale-III and Wechsler Memory Scale-III
Archives of Clinical Neuropsychology 16 (2001) 293±301 Administration duration for the Wechsler Adult Intelligence Scale-III and Wechsler Memory Scale-III Bradley N. Axelrod* Psychology Section (116B),
More informationNear-Balanced Incomplete Block Designs with An Application to Poster Competitions
Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania
More informationAn introduction to: Q-interactive. October 2014 Jeremy Clarke Technology Consultant
An introduction to: Q-interactive October 2014 Jeremy Clarke Technology Consultant Good bye paper Changing Times What is Q-interactive? Q-interactive is a Comprehensive Digital Assessment Platform, where
More informationBalancing Security and Efficiency in Limited-Size Computer Adaptive Test Libraries
Balancing Security and Efficiency in Limited-Size Computer Adaptive Test Libraries Cory oclaire KSH Solutions/Naval Aerospace edical Institute Eric iddleton Naval Aerospace edical Institute Brennan D.
More informationField Testing and Equating Designs for State Educational Assessments. Rob Kirkpatrick. Walter D. Way. Pearson
Field Testing and Equating Designs for State Educational Assessments Rob Kirkpatrick Walter D. Way Pearson Paper presented at the annual meeting of the American Educational Research Association, New York,
More informationJOB DESCRIPTION 1. JOB IDENTIFICATION. Job Title: Assistant Clinical Psychologist : Adult. Department: Psychological Services
JOB DESCRIPTION 1. JOB IDENTIFICATION Job Title: Assistant Clinical Psychologist : Adult Department: Psychological Services Accountable to: Consultant Adult Psychology. Job Holder Reference: MHS472 No
More informationUnderstanding the Dimensionality and Reliability of the Cognitive Scales of the UK Clinical Aptitude test (UKCAT): Summary Version of the Report
Understanding the Dimensionality and Reliability of the Cognitive Scales of the UK Clinical Aptitude test (UKCAT): Summary Version of the Report Dr Paul A. Tiffin, Reader in Psychometric Epidemiology,
More informationApplication of Multilevel IRT to Multiple-Form Linking When Common Items Are Drifted. Chanho Park 1 Taehoon Kang 2 James A.
Application of Multilevel IRT to Multiple-Form Linking When Common Items Are Drifted Chanho Park 1 Taehoon Kang 2 James A. Wollack 1 1 University of Wisconsin-Madison 2 ACT, Inc. April 11, 2007 Paper presented
More informationESTIMATING TOTAL-TEST SCORES FROM PARTIAL SCORES IN A MATRIX SAMPLING DESIGN JANE SACHAR. The Rand Corporatlon
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 1980,40 ESTIMATING TOTAL-TEST SCORES FROM PARTIAL SCORES IN A MATRIX SAMPLING DESIGN JANE SACHAR The Rand Corporatlon PATRICK SUPPES Institute for Mathematmal
More informationPersonnel Psychology Centre: Recent Achievements and Future Challenges
Personnel Psychology Centre: Recent Achievements and Future Challenges PRESENTATION TO THE EUROPEAN ASSOCIATION OF TEST PUBLISHERS SEPTEMBER 2016 The Public Service Commission (PSC) Independent agency
More informationRaven's Advanced Progressive Matrices (APM)
Raven's Advanced Progressive Matrices (APM) Development 888-298-6227 TalentLens.com Copyright 2007 NCS Pearson, Inc. All rights reserved. Copyright 2007 by NCS Pearson, Inc. All rights reserved. No part
More informationCHAPTER 2 Understanding the Legal Context of Assessment- Employment Laws and Regulations with Implications for Assessment
CHAPTER 2 Understanding the Legal Context of Assessment- Employment Laws and Regulations with Implications for Assessment The number of laws and regulations governing the employment process has increased
More informationPsychometric Issues in Through Course Assessment
Psychometric Issues in Through Course Assessment Jonathan Templin The University of Georgia Neal Kingston and Wenhao Wang University of Kansas Talk Overview Formative, Interim, and Summative Tests Examining
More informationWorker Types: A New Approach to Human Capital Management
Worker Types: A New Approach to Human Capital Management James Houran, President, 20 20 Skills Employee Assessment 20 20 SKILLS ASSESSMENT 372 Willis Ave. Mineola, NY 11501 +1 516.248.8828 (ph) +1 516.742.3059
More informationCultural Intelligence
Cultural Intelligence Group Report for Bethel College May 28, 2014 www.culturalq.com info@culturalq.com Page 1 Overview This report provides summary feedback on Cultural Intelligence (CQ) of those who
More informationIntroducing WISC-V Spanish Anise Flowers, Ph.D.
Introducing Introducing Assessment Consultant Introducing the WISC V Spanish, a culturally and linguistically valid test of cognitive ability in Spanish for use with Spanish-speaking children ages 6:0
More informationAbility tests, such as Talent Q Elements, have been scientifically proven* to be strong predictors of job performance.
Talent Q Elements Ability tests, such as Talent Q Elements, have been scientifically proven* to be strong predictors of job performance. Elements is a suite of online adaptive ability tests measuring verbal,
More informationReliability & Validity Evidence for PATH
Reliability & Validity Evidence for PATH Talegent Whitepaper October 2014 Technology meets Psychology www.talegent.com Outline the empirical evidence from peer reviewed sources for the validity and reliability
More informationspecialist is 20 or fewer clients. 3= Ratio of clients per employment specialist.
SUPPORTED EMPLOYMENT FIDELITY SCALE* 1/7/08 Rater: Site: Date: Total Score: Directions: Circle one anchor number for each criterion. Criterion Data Anchor Source** Staffing 1. Caseload size: Employment
More informationThe Application of the Item Response Theory in China s Public Opinion Survey Design
Management Science and Engineering Vol. 5, No. 3, 2011, pp. 143-148 DOI:10.3968/j.mse.1913035X20110503.1z242 ISSN 1913-0341[Print] ISSN 1913-035X[Online] www.cscanada.net www.cscanada.org The Application
More informationConstruct-Related Validity Vis-A-Vis Internal Structure of the Test
Construct-Related Validity Vis-A-Vis Internal Structure of the Test Rufina C. Rosaroso (PhD) 1, Enriqueta D. Reston (PhD) 2, Nelson A. Rosaroso (Phd) 3 1 Cebu Normal University, 2,3 University of San Carlos,
More informationSaville Consulting Wave Professional Styles Handbook
Saville Consulting Wave Professional Styles Handbook PART 1: OVERVIEW Chapter 2: Applications This manual has been generated electronically. Saville Consulting do not guarantee that it has not been changed
More informationEvaluating the Performance of CATSIB in a Multi-Stage Adaptive Testing Environment. Mark J. Gierl Hollis Lai Johnson Li
Evaluating the Performance of CATSIB in a Multi-Stage Adaptive Testing Environment Mark J. Gierl Hollis Lai Johnson Li Centre for Research in Applied Measurement and Evaluation University of Alberta FINAL
More informationALTE Quality Assurance Checklists. Unit 1. Test Construction
s Unit 1 Test Construction Name(s) of people completing this checklist: Which examination are the checklists being completed for? At which ALTE Level is the examination at? Date of completion: Instructions
More informationAcademic Screening Frequently Asked Questions (FAQ)
Academic Screening Frequently Asked Questions (FAQ) 1. How does the TRC consider evidence for tools that can be used at multiple grade levels?... 2 2. For classification accuracy, the protocol requires
More informationPRINCIPLES AND APPLICATIONS OF SPECIAL EDUCATION ASSESSMENT
PRINCIPLES AND APPLICATIONS OF SPECIAL EDUCATION ASSESSMENT CLASS 3: DESCRIPTIVE STATISTICS & RELIABILITY AND VALIDITY FEBRUARY 2, 2015 OBJECTIVES Define basic terminology used in assessment, such as validity,
More informationDiscoveries with item response theory (IRT)
Chapter 5 Test Modeling Ratna Nandakumar Terry Ackerman Discoveries with item response theory (IRT) principles, since the 1960s, have led to major breakthroughs in psychological and educational assessment.
More informationKey Elements of the CIP Approach
Key Elements of the CIP Approach James P. Sampson, Jr., Gary W. Peterson, Robert C. Reardon, and Janet G. Lenz Florida State University Copyright 2003 by James P. Sampson, Jr., Gary W. Peterson, Robert
More informationInfluence of the Big Five Personality Traits of IT Workers on Job Satisfaction
, pp.126-131 http://dx.doi.org/10.14257/astl.2016.142.23 Influence of the Big Five Personality Traits of IT Workers on Job Satisfaction Hyo Jung Kim 1Dept. Liberal Education University, Keimyung University
More informationMultidimensional Aptitude Battery-II (MAB-II) Clinical Report
Multidimensional Aptitude Battery-II (MAB-II) Clinical Report Name: Sam Sample ID Number: 1000 A g e : 14 (Age Group 16-17) G e n d e r : Male Years of Education: 15 Report Date: August 19, 2010 Summary
More informationFrequently Asked Questions (FAQs)
I N T E G R A T E D WECHSLER INTELLIGENCE SCALE FOR CHILDREN FIFTH EDITION INTEGRATED Frequently Asked Questions (FAQs) Related sets of FAQs: For general WISC V CDN FAQs, please visit: https://www.pearsonclinical.ca/content/dam/school/global/clinical/canada/programs/wisc5/wisc-v-cdn-faqs.pdf
More informationAn Exploration of the Robustness of Four Test Equating Models
An Exploration of the Robustness of Four Test Equating Models Gary Skaggs and Robert W. Lissitz University of Maryland This monte carlo study explored how four commonly used test equating methods (linear,
More informationChapter 9 External Selection: Testing
Chapter 9 External Selection: Testing Substantive Assessment Methods are used to make more precise decisions about the applicants & to separate finalists from candidates; they are used after the initial
More informationMastering Modern Psychological Testing Theory & Methods Cecil R. Reynolds Ronald B. Livingston First Edition
Mastering Modern Psychological Testing Theory & Methods Cecil R. Reynolds Ronald B. Livingston First Edition Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies
More informationpersonality assessment s average coefficient alpha of.83 is among the highest of all assessments. It
Validity and reliability of the WorkPlace Big Five Profile 4.0 Today s organizations and leaders face a demanding challenge in choosing from among thousands of personality assessment products and services.
More informationStefanie Moerbeek, Product Developer, EXIN Greg Pope, Questionmark, Analytics and Psychometrics Manager
Stefanie Moerbeek, Product Developer, EXIN Greg Pope, Questionmark, Analytics and Psychometrics Manager Stefanie Moerbeek introduction EXIN (Examination institute for Information Science), Senior Coordinator
More informationIN HUMAN RESOURCE MANAGEMENT
RESEARCH AND PRACTICE IN HUMAN RESOURCE MANAGEMENT Lu, L. & Lin, G. C. (2002). Work Values and Job Adjustment of Taiwanese workers, Research and Practice in Human Resource Management, 10(2), 70-76. Work
More informationAudience: Six to eight New employees of YouthCARE, young staff members new to full time youth work.
YouthCARE Youth Workers and Audience: Six to eight New employees of YouthCARE, young staff members new to full time youth work. Goal: To prepare new youth workers to critically think about and demonstrate
More informationALTE Quality Assurance Checklists. Unit 1. Test Construction
ALTE Quality Assurance Checklists Unit 1 Test Construction Name(s) of people completing this checklist: Which examination are the checklists being completed for? At which ALTE Level is the examination
More informationPresented by Anne Buckett, Precision HR, South Africa
The customisation of simulation exercises and other challenges as part of a large skills audit project for development Presented by Anne Buckett, Precision HR, South Africa Part of soon to be published
More information