Second Generation, Multidimensional, and Multilevel Item Response Theory Modeling


1 Second Generation, Multidimensional, and Multilevel Item Response Theory Modeling
Li Cai
CSE/CRESST, Department of Education, Department of Psychology, UCLA
With contributions from Mark Hansen, Scott Monroe, and Ji Seung Yang
NCME, 2012 April 16

2 Introduction: Multidimensionality
Modern educational and psychological assessments can be complex in dimensionality.
- Consider the PISA assessment framework.
- Consider many multi-faceted health outcomes measures.

3 Introduction: Two-tier Model for PISA (Cai, 2010)

4 Introduction: Two-tier Model for PISA (Cai, 2010)

5 Introduction: NIH (NIDA) PROMIS Smoking Module
Project goal: to develop, evaluate, and standardize item banks to assess cigarette smoking behavior and constructs associated with smoking for both daily and non-daily smokers.
The bifactor structure for the Dependence/Craving domain:
- G: dependence/craving (55)
- S1: first cig of the day (3)
- S2: automatic/mindless (4)
- S3: heavy (5)
- S4: out of control (8)
- S5: withdrawal (8)
- S6: cravings (3)
- S7: if I couldn't smoke (4)
- S8: can't quit (2)
- S9: temptations (5)
- S10: consistency (6)
- S11: when not allowed (3)

6 Introduction: Multilevel Data
At the same time, large-scale data collection efforts typically employ multi-stage sampling, resulting in natural nesting of respondents within independent sampling units.
- Two-stage stratified sampling design of PISA within countries
- Three-stage PPS sampling design of NAEQ (the new national assessment in China)
- Of course, there is NAEP and other large-scale surveys.
In addition, multilevel data structures arise when the study design involves repeated measurement or multiple raters or sources of information.

7 Current Models
- Calibration of item parameters typically employs the unidimensional model, assuming normality.
- Single-level latent variable models.
- Aggregated/group inference by conditioning (latent regression model) and the plausible value methodology.
- Model fit evaluation?
New Models?
As an aside (but useful for operational psychometrics): summed-score-based calculations (EAP to IRT scale score translation tables, item fit, linking, etc.) from the perspective of IRT without the Rasch (equal discrimination) requirement.

8 New Models? From Yesterday (Yang, Monroe, and Cai)
Multilevel
- Clustering
- Variances at L1, L2
- Intraclass Correlation
Multidimensional
Multiple groups
Efficient computation
Constraints
Scores at L1 and L2
Fit tests

9 Some of the Equations: Multilevel Two-Tier Model

10 Item Parameter Estimation: Curse of Dimensionality
- High-dimensional integration problem in likelihood-based estimation and inference.
- Decrease the computational burden through analytical dimension reduction (Gibbons & Hedeker, 1992).
- Only (p+q+1)-dimensional quadrature is required.
- Individual response pattern scores for all level-1 latent variables can be produced as posterior expectations.
- Similarly dimensioned quadrature computations provide the basis for score reporting systems based on posterior expectations for level-2 variables, e.g., school- or state-level achievement.
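The dimension-reduction idea can be illustrated with a minimal sketch for the simplest case: a single-level bifactor model with dichotomous 2PL-type items. This is not the multilevel two-tier implementation from the talk. The function names, the logistic item parameterization (`a_gen`, `a_spec`, `c`), and the example parameter values are all assumptions made for illustration. Because each item loads on the general dimension and on at most one specific dimension, the specific dimensions can be integrated out one at a time, so only nested one-dimensional sums (a 2-dimensional grid) are needed regardless of the number of specific factors.

```python
import numpy as np

def normal_quadrature(n_points):
    """Gauss-Hermite nodes and weights rescaled to a standard normal density."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)
    return nodes, weights / weights.sum()

def item_prob(u, a_gen, a_spec, c, g, s):
    """Probability of response u (0/1) under an assumed logistic bifactor item model."""
    p = 1.0 / (1.0 + np.exp(-(a_gen * g + a_spec * s + c)))
    return p if u == 1 else 1.0 - p

def bifactor_marginal(u, a_gen, a_spec, c, group, n_points=21):
    """Marginal probability of pattern u and the EAP score on the general dimension.

    group[j] is the index of the specific factor that item j loads on.
    """
    nodes, w = normal_quadrature(n_points)
    g = nodes[:, None]  # general dimension varies along rows
    s = nodes[None, :]  # one specific dimension varies along columns
    like_g = np.ones(n_points)
    for k in np.unique(group):
        cond = np.ones((n_points, n_points))
        for j in np.flatnonzero(group == k):
            cond = cond * item_prob(u[j], a_gen[j], a_spec[j], c[j], g, s)
        # Integrate out this specific dimension; what remains depends on g only.
        like_g = like_g * (cond @ w)
    marginal = like_g @ w
    eap_general = (nodes * like_g) @ w / marginal
    return marginal, eap_general

# Hypothetical parameters: four items, two specific factors.
a_gen = np.array([1.2, 0.8, 1.5, 1.0])
a_spec = np.array([0.7, 0.9, 0.5, 1.1])
c = np.array([0.0, -0.5, 0.3, 0.2])
group = np.array([0, 0, 1, 1])

prob_high, eap_high = bifactor_marginal(np.array([1, 1, 1, 1]), a_gen, a_spec, c, group)
prob_low, eap_low = bifactor_marginal(np.array([0, 0, 0, 0]), a_gen, a_spec, c, group)
print(prob_high, eap_high, prob_low, eap_low)
```

With two specific factors, a brute-force product rule would use a 3-dimensional grid; the nested form above evaluates the same sum while only ever holding a 2-dimensional grid, and that stays true as more specific factors are added.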

11 Non-normality in Latent Variables: Empirical Histogram Item Bifactor Model
- Non-normal latent variables have been studied extensively in the unidimensional IRT setting (e.g., work by Carol Woods).
- Little discussion of non-normality in latent variables for multidimensional models.
- Cai & Woods proposed an item bifactor model wherein the general dimension is characterized non-parametrically.
- Supports multiple-group estimation.
- Only a 2-dimensional integral is needed for MML estimation.
- EAP scores are a by-product.

12 Non-normality in Latent Variables: Empirical Histogram Item Bifactor Model

13 What About Model Fit? Sparseness and Overall Model Fit Tests
- In IRT, full-information Pearson's X² or the likelihood ratio statistic G² may be used to examine model fit.
- However, if the number of response patterns is large (relative to the sample size), the underlying table can become sparse.
- Maydeu-Olivares and colleagues proposed statistics based on lower-order margins, particularly M₂, which is based on first- and second-order marginal residuals.
- M₂ is successful for testing unidimensional dichotomous IRT models, but adaptation of M₂ to hierarchical and polytomous item factor models is not straightforward.
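For reference, the two full-information statistics can be computed directly from the observed response-pattern counts and the model-implied pattern probabilities. The sketch below uses the standard textbook definitions; the function name and the toy counts are hypothetical. The `expected` counts sit in the denominator of X², which is where sparseness causes trouble: with many possible patterns and a modest sample, most expected counts are close to zero.

```python
import numpy as np

def full_information_fit(observed, expected_prob):
    """Pearson X^2 and likelihood ratio G^2 over the full response-pattern table."""
    observed = np.asarray(observed, dtype=float)
    expected = observed.sum() * np.asarray(expected_prob, dtype=float)
    x2 = np.sum((observed - expected) ** 2 / expected)
    nonzero = observed > 0  # patterns with zero observed count contribute 0 to G^2
    g2 = 2.0 * np.sum(observed[nonzero] * np.log(observed[nonzero] / expected[nonzero]))
    return x2, g2

# Toy table with four response patterns (illustrative counts only).
x2, g2 = full_information_fit([10, 20, 30, 40], [0.25, 0.25, 0.25, 0.25])
print(x2, g2)
```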

14 What About Model Fit? Challenges of Dimensionality
Two challenges arise:
1. When the number of response categories increases, even the second-order marginals become sparse.
2. The calculation of expected cell frequencies, Jacobian elements, and weight matrix elements for high-dimensional IRT models can be computationally burdensome.
Cai & Hansen (in press; BJMSP) tackle (1) by a further reduction, M₂*, and resolve (2) by extending Gibbons & Hedeker's (1992) strategy of dimension reduction for item parameter estimation to goodness-of-fit testing. For bifactor models, the maximum dimension of integration is 2, regardless of the number of factors.

15 Summed-Score Computations: Lord-Wingersky Algorithm Version 2.0
For hierarchical multidimensional item factor models, can we obtain similar summed-score posterior based indices as in unidimensional IRT models?
Lord-Wingersky Algorithm Version 2.0:
- Part 1: For each testlet, form likelihoods for within-testlet summed scores. This is standard Lord-Wingersky as applied to 2-dimensional quadrature grids.
- Part 2: For each within-testlet summed score likelihood, integrate out the specific dimension. This is the same as in bifactor/testlet item parameter estimation.
- Part 3: Treat testlets as items. Treat testlet scores as item scores. Apply standard Lord-Wingersky.
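The three parts can be sketched in a few lines for dichotomous items under an assumed logistic bifactor parameterization. This is an illustrative reading of the steps on the slide, not the authors' implementation. The function names, the item parameters, and the use of Gauss-Hermite quadrature with a standard normal prior on every dimension are assumptions.

```python
import numpy as np

def normal_quadrature(n_points):
    """Gauss-Hermite nodes and weights rescaled to a standard normal density."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_points)
    return nodes, weights / weights.sum()

def lord_wingersky(score_probs):
    """Standard recursion: combine per-item score probabilities into summed-score likelihoods.

    Each element has shape (n_scores, ...grid); the result has shape (max_sum + 1, ...grid).
    """
    like = score_probs[0]
    for probs in score_probs[1:]:
        new = np.zeros((like.shape[0] + probs.shape[0] - 1,) + like.shape[1:])
        for x in range(probs.shape[0]):
            new[x:x + like.shape[0]] += like * probs[x]
        like = new
    return like

def summed_score_likelihoods(a_gen, a_spec, c, group, n_points=21):
    nodes, w = normal_quadrature(n_points)
    g, s = nodes[:, None], nodes[None, :]
    testlet_likes = []
    for k in np.unique(group):
        items = []
        for j in np.flatnonzero(group == k):
            p = 1.0 / (1.0 + np.exp(-(a_gen[j] * g + a_spec[j] * s + c[j])))
            items.append(np.stack([1.0 - p, p]))  # shape (2, general, specific)
        within = lord_wingersky(items)            # Part 1: on the 2-dimensional grid
        testlet_likes.append(within @ w)          # Part 2: integrate out the specific dimension
    total = lord_wingersky(testlet_likes)         # Part 3: testlets treated as items
    return total, nodes, w

# Hypothetical parameters: four items in two testlets.
a_gen = np.array([1.2, 0.8, 1.5, 1.0])
a_spec = np.array([0.7, 0.9, 0.5, 1.1])
c = np.array([0.0, -0.5, 0.3, 0.2])
group = np.array([0, 0, 1, 1])

total, nodes, w = summed_score_likelihoods(a_gen, a_spec, c, group)
score_prob = total @ w                      # marginal probability of each summed score
eap = (total * nodes) @ w / score_prob      # EAP on the general dimension per summed score
print(score_prob, eap)
```

The `eap` vector is the kind of summed score to scale score translation table mentioned earlier in the talk, here for the general dimension only.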

16 Summed-Score Computations: Item Fit?
- The new algorithm provides a convenient computational shortcut for obtaining the Orlando-Thissen-Bjorner S-X² type item fit statistics.
- Hierarchical (or reparameterized higher-order) models probably remain the only multidimensional IRT models where summed scores correspond nicely with latent traits.
- Ying Li and Andre Rupp recently examined the performance of these statistics in a paper in EPM.

17 Summed-Score Computations: Score Combinations
Reproducing scoring table from Thissen & Wainer (2001) Chapter 7 material on Wisconsin 3rd grade mixed format test.
[Scoring table not recoverable from the transcript; surviving headings: MC, Open-ended, (Summed), Rated, Score, Sum.]

18 Summed-Score Computations: Calibrated Projection Linking
Thissen et al.'s (2011) paper described a linking method that fuses calibration with projection and demonstrated how one may conduct summed score based projection linking.
Figure on the right stolen from 2010 IMPS presentation by Thissen.

19 Software Implementations
As the result of a National Cancer Institute SBIR development contract, many of the multidimensional models are implemented in IRTPRO (ssicentral.com):
- Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling [Computer software]. Lincolnwood, IL: Scientific Software International.
The newer multilevel, non-parametric, and model fit testing procedures are implemented in flexMIRT (flexmirt.com):
- Cai, L. (2012). flexMIRT: Flexible multilevel item factor analysis and test scoring [Computer software]. Seattle, WA: Vector Psychometric Group, LLC.
The National Cancer Institute also funded:
- Wu, E. J. C., & Bentler, P. M. (2011). EQSIRT: A user-friendly IRT program [Computer software]. Encino, CA: Multivariate Software, Inc.

20 Acknowledgements
Thank you for bearing with me. And many thanks to the program organizers and discussant!
LCAI [at] UCLA [dot] EDU
Part of this research is made possible by grants from the Institute of Education Sciences (R305B and R305D100039) and grants from the National Institute on Drug Abuse (R01DA and R01DA030466).
I would like to thank the following members of my research group at UCLA: Mark Hansen, Ji Seung Yang, Scott Monroe, and the RAND/UCLA PROMIS Smoking Initiative Group. I would also like to thank Dave Thissen at UNC-Chapel Hill.


Three Research Approaches to Aligning Hogan Scales With Competencies Three Research Approaches to Aligning Hogan Scales With Competencies 2014 Hogan Assessment Systems Inc Executive Summary Organizations often use competency models to provide a common framework for aligning

More information

STAAR-Like Quality Starts with Reliability

STAAR-Like Quality Starts with Reliability STAAR-Like Quality Starts with Reliability Quality Educational Research Our mission is to provide a comprehensive independent researchbased resource of easily accessible and interpretable data for policy

More information

Running head: GROUP COMPARABILITY

Running head: GROUP COMPARABILITY Running head: GROUP COMPARABILITY Evaluating the Comparability of lish- and nch-speaking Examinees on a Science Achievement Test Administered Using Two-Stage Testing Gautam Puhan Mark J. Gierl Centre for

More information

An Exploration of the Robustness of Four Test Equating Models

An Exploration of the Robustness of Four Test Equating Models An Exploration of the Robustness of Four Test Equating Models Gary Skaggs and Robert W. Lissitz University of Maryland This monte carlo study explored how four commonly used test equating methods (linear,

More information

Practical Exploratory Factor Analysis: An Overview

Practical Exploratory Factor Analysis: An Overview Practical Exploratory Factor Analysis: An Overview James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Practical Exploratory Factor

More information

Mastering Modern Psychological Testing Theory & Methods Cecil R. Reynolds Ronald B. Livingston First Edition

Mastering Modern Psychological Testing Theory & Methods Cecil R. Reynolds Ronald B. Livingston First Edition Mastering Modern Psychological Testing Theory & Methods Cecil R. Reynolds Ronald B. Livingston First Edition Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies

More information

Automated Test Assembly for COMLEX USA: A SAS Operations Research (SAS/OR) Approach

Automated Test Assembly for COMLEX USA: A SAS Operations Research (SAS/OR) Approach Automated Test Assembly for COMLEX USA: A SAS Operations Research (SAS/OR) Approach Dr. Hao Song, Senior Director for Psychometrics and Research Dr. Hongwei Patrick Yang, Senior Research Associate Introduction

More information

Chapter 7. Measurement Models and Confirmatory Factor Analysis. Overview

Chapter 7. Measurement Models and Confirmatory Factor Analysis. Overview Chapter 7 Measurement Models and Confirmatory Factor Analysis Some things have to be believed to be seen. Overview Ralph Hodgson Specification of CFA models Identification of CFA models Naming and reification

More information

Harrison Assessments Validation Overview

Harrison Assessments Validation Overview Harrison Assessments Validation Overview Dan Harrison, Ph.D. 2016 Copyright 2016 Harrison Assessments Int l, Ltd www.optimizepeople.com HARRISON ASSESSMENT VALIDATION OVERVIEW Two underlying theories are

More information

Southern Cross University Tania von der Heidt Southern Cross University Don R. Scott Southern Cross University

Southern Cross University Tania von der Heidt Southern Cross University Don R. Scott Southern Cross University Southern Cross University epublications@scu Southern Cross Business School 2007 Partial aggregation for complex structural equation modelling (SEM) and small sample sizes: an illustration using a multi-stakeholder

More information

Appendix A Mixed-Effects Models 1. LONGITUDINAL HIERARCHICAL LINEAR MODELS

Appendix A Mixed-Effects Models 1. LONGITUDINAL HIERARCHICAL LINEAR MODELS Appendix A Mixed-Effects Models 1. LONGITUDINAL HIERARCHICAL LINEAR MODELS Hierarchical Linear Models (HLM) provide a flexible and powerful approach when studying response effects that vary by groups.

More information

STAT 2300: Unit 1 Learning Objectives Spring 2019

STAT 2300: Unit 1 Learning Objectives Spring 2019 STAT 2300: Unit 1 Learning Objectives Spring 2019 Unit tests are written to evaluate student comprehension, acquisition, and synthesis of these skills. The problems listed as Assigned MyStatLab Problems

More information

SURVEY OF SOFTWARE FOR THE TEST QUALITY ANALYSIS. Varazdat Avetisyan

SURVEY OF SOFTWARE FOR THE TEST QUALITY ANALYSIS. Varazdat Avetisyan 82 SURVEY OF SOFTWARE FOR THE TEST QUALITY ANALYSIS Varazdat Avetisyan Abstract: A test method of checking and evaluating the knowledge is one of the most reliable and promising ways to increase educational

More information

SAS/STAT 13.1 User s Guide. Introduction to Multivariate Procedures

SAS/STAT 13.1 User s Guide. Introduction to Multivariate Procedures SAS/STAT 13.1 User s Guide Introduction to Multivariate Procedures This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is

More information

Martin Senkbeil and Jan Marten Ihme. for Grade 9

Martin Senkbeil and Jan Marten Ihme. for Grade 9 neps Survey papers Martin Senkbeil and Jan Marten Ihme NEPS Technical Report for Computer Literacy: Scaling Results of Starting Cohort 3 for Grade 9 NEPS Survey Paper No. 29 Bamberg, November 2017 Survey

More information

Statistics & Analysis. Confirmatory Factor Analysis and Structural Equation Modeling of Noncognitive Assessments using PROC CALIS

Statistics & Analysis. Confirmatory Factor Analysis and Structural Equation Modeling of Noncognitive Assessments using PROC CALIS Confirmatory Factor Analysis and Structural Equation Modeling of Noncognitive Assessments using PROC CALIS Steven Holtzman, Educational Testing Service, Princeton, NJ Sailesh Vezzu, Educational Testing

More information

Measurement of Employee Productivity using Cluster Analysis of BehavioralIntegrity

Measurement of Employee Productivity using Cluster Analysis of BehavioralIntegrity IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727 PP 33-37 www.iosrjournals.org Measurement of Employee Productivity using Cluster Analysis of BehavioralIntegrity AnupamaVishwasGajbhiye,

More information

Operational Check of the 2010 FCAT 3 rd Grade Reading Equating Results

Operational Check of the 2010 FCAT 3 rd Grade Reading Equating Results Operational Check of the 2010 FCAT 3 rd Grade Reading Equating Results Prepared for the Florida Department of Education by: Andrew C. Dwyer, M.S. (Doctoral Student) Tzu-Yun Chin, M.S. (Doctoral Student)

More information

What is Multilevel Structural Equation Modelling?

What is Multilevel Structural Equation Modelling? What is Multilevel Structural Equation Modelling? Nick Shryane Social Statistics Discipline Area University of Manchester nick.shryane@manchester.ac.uk 1 What is multilevel SEM? 1. What is SEM? A family

More information

_DTIC DJUNT1C. o-psychometric DEVELOPMENTS RELATED TO corn. November 1993 SEMIANNUAL TECHNICAL REPORT FOR THE PROJECT TESTS AND SELECTION

_DTIC DJUNT1C. o-psychometric DEVELOPMENTS RELATED TO corn. November 1993 SEMIANNUAL TECHNICAL REPORT FOR THE PROJECT TESTS AND SELECTION b "*. 4 " _DTIC irlelecte 1 November 1993 DJUNT1C SEMIANNUAL TECHNICAL REPORT FOR THE PROJECT o-psychometric DEVELOPMENTS RELATED TO corn TESTS AND SELECTION IN Grant supported by Office of the Chief of

More information

From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here.

From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here. From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here. Contents List of Figures xv Foreword xxiii Preface xxv Acknowledgments xxix Chapter

More information

PVM Pharmacy Customer Satisfaction Structural Equations Models. Multivariate Solutions

PVM Pharmacy Customer Satisfaction Structural Equations Models. Multivariate Solutions PVM Pharmacy Customer Satisfaction Structural Equations Models Multivariate Solutions Basics of Structural Equation Models Structural equation modeling (SEM) is a statistical technique for building and

More information

Lecture 6: GWAS in Samples with Structure. Summer Institute in Statistical Genetics 2015

Lecture 6: GWAS in Samples with Structure. Summer Institute in Statistical Genetics 2015 Lecture 6: GWAS in Samples with Structure Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 25 Introduction Genetic association studies are widely used for the identification

More information

NEPS Working Papers. NEPS Technical Report Scaling the Data of the Competence Tests. NEPS Working Paper No. 14. Steffi Pohl & Claus H.

NEPS Working Papers. NEPS Technical Report Scaling the Data of the Competence Tests. NEPS Working Paper No. 14. Steffi Pohl & Claus H. NEPS Working Papers Steffi Pohl & Claus H. Carstensen NEPS Technical Report Scaling the Data of the Competence Tests NEPS Working Paper No. 14 Bamberg, October 2012 Working Papers of the German National

More information

Structural Equation Modeling (SEM)

Structural Equation Modeling (SEM) May, 2018 Bentler, Peter M. Ph.D., Clinical Psychology, Stanford University 1964 NIMH Postdoctoral Fellow, Educational Testing Service 1964-65 Staff Psychologist, Psychology Clinic, UCLA 1965-79 Assistant

More information

ACHIEVEMENT VARIANCE DECOMPOSITION 1. Online Supplement

ACHIEVEMENT VARIANCE DECOMPOSITION 1. Online Supplement ACHIEVEMENT VARIANCE DECOMPOSITION 1 Online Supplement Ones, D. S., Wiernik, B. M., Wilmot, M. P., & Kostal, J. W. (2016). Conceptual and methodological complexity of narrow trait measures in personality-outcome

More information

Differential Item Functioning

Differential Item Functioning Differential Item Functioning Ray Adams and Margaret Wu, 29 August 2010 Within the context of Rasch modelling an item is deemed to exhibit differential item functioning (DIF) if the response probabilities

More information

Archives of Scientific Psychology Reporting Questionnaire for Manuscripts Describing Primary Data Collections

Archives of Scientific Psychology Reporting Questionnaire for Manuscripts Describing Primary Data Collections (Based on APA Journal Article Reporting Standards JARS Questionnaire) 1 Archives of Scientific Psychology Reporting Questionnaire for Manuscripts Describing Primary Data Collections JARS: ALL: These questions

More information

Computer Software for IRT Graphical Residual Analyses (Version 2.1) Tie Liang, Kyung T. Han, Ronald K. Hambleton 1

Computer Software for IRT Graphical Residual Analyses (Version 2.1) Tie Liang, Kyung T. Han, Ronald K. Hambleton 1 1 Header: User s Guide: ResidPlots-2 (Version 2.0) User s Guide for ResidPlots-2: Computer Software for IRT Graphical Residual Analyses (Version 2.1) Tie Liang, Kyung T. Han, Ronald K. Hambleton 1 University

More information

Han Du. Department of Psychology University of Notre Dame Notre Dame, IN Telephone:

Han Du. Department of Psychology University of Notre Dame Notre Dame, IN Telephone: Han Du Department of Psychology University of Notre Dame Notre Dame, IN 46556 Email: hdu1@nd.edu Telephone: 5748556736 EDUCATION Ph.D. in Quantitative Psychology University of Notre Dame Expected: 2017

More information

The Occupational Personality Questionnaire Revolution:

The Occupational Personality Questionnaire Revolution: The Occupational Personality Questionnaire Revolution: Applying Item Response Theory to Questionnaire Design and Scoring Anna Brown, Principal Research Statistician Professor Dave Bartram, Research Director

More information

USE OF POLYCHORIC INDEXES TO MEASURE THE IMPACT OF SEVEN SUSTAINABILITY PROGRAMS ON COFFEE GROWERS LIVELIHOOD IN COLOMBIA

USE OF POLYCHORIC INDEXES TO MEASURE THE IMPACT OF SEVEN SUSTAINABILITY PROGRAMS ON COFFEE GROWERS LIVELIHOOD IN COLOMBIA USE OF POLYCHORIC INDEXES TO MEASURE THE IMPACT OF SEVEN SUSTAINABILITY PROGRAMS ON COFFEE GROWERS LIVELIHOOD IN COLOMBIA GARCIA, Carlos; OCHOA, Gustavo; GARCIA, Julián; MORA, Juan; CASTELLANOS, Juan.

More information

Influence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic

Influence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic Influence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic Brian E. Clauser, Kathleen Mazor, and Ronald K. Hambleton University

More information

(1960) had proposed similar procedures for the measurement of attitude. The present paper

(1960) had proposed similar procedures for the measurement of attitude. The present paper Rasch Analysis of the Central Life Interest Measure Neal Schmitt Michigan State University Rasch item analyses were conducted and estimates of item residuals correlated with various demographic or person

More information

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization Business Intelligence Analytics and Data Science A Managerial Perspective 4th Edition Sharda TEST BANK Full download at: https://testbankreal.com/download/business-intelligence-analytics-datascience-managerial-perspective-4th-edition-sharda-test-bank/

More information

The prediction of economic and financial performance of companies using supervised pattern recognition methods and techniques

The prediction of economic and financial performance of companies using supervised pattern recognition methods and techniques The prediction of economic and financial performance of companies using supervised pattern recognition methods and techniques Table of Contents: Author: Raluca Botorogeanu Chapter 1: Context, need, importance

More information

R E COMPUTERIZED MASTERY TESTING WITH NONEQUIVALENT TESTLETS. Kathleen Sheehan Charles lewis.

R E COMPUTERIZED MASTERY TESTING WITH NONEQUIVALENT TESTLETS. Kathleen Sheehan Charles lewis. RR-90-16 R E S E A RC H R COMPUTERIZED MASTERY TESTING WITH NONEQUIVALENT TESTLETS E P o R T Kathleen Sheehan Charles lewis. Educational Testing service Princeton, New Jersey August 1990 Computerized Mastery

More information

CBC Conference Dr Emma Beard

CBC Conference Dr Emma Beard CBC Conference 2017 Using time-series analysis to examine the effects of adding or removing components of digital behavioural interventions and associations between outcomes and patterns of usage Dr Emma

More information