CRITERION-REFERENCED TEST DEVELOPMENT

Similar documents
Audit - The process of conducting an evaluation of an entity's compliance with published standards. This is also referred to as a program audit.

A Practical Guide for Creating Performance Metrics and Valid Tests

Purpose of assessment and use in candidate monitoring or decisions are consequential

Glossary of Standardized Testing Terms

2016 Technical Assistance Conference

What are the Steps in the Development of an Exam Program?

What to Consider When Setting the Pass Point on Hiring Examinations. Presented by Dr. Jim Higgins Executive Director, BCGi

Assessment Solutions TEST VALIDATION SERVICES. Ramsay Corporation custom develops tests and uses a content validation model to document the process.

STAAR-Like Quality Starts with Reliability

UNIVERSITY OF SWAZILAND. FACULTY OF EDUCATION. DEPARTMENT OF EDUCATIONAL FOUNDATIONS AND MANAGEMENT. SUPPLEMENTARY EXAMINATION PAPER 2012/2013

Test Development: Ten Steps to a Valid and Reliable Certification Exam Linda A. Althouse, Ph.D., SAS, Cary, NC

Essential Elements of a Job Analysis

ALTE Quality Assurance Checklists. Unit 1. Test Construction

ALTE Quality Assurance Checklists. Unit 4. Test analysis and Post-examination Review

Glossary of Terms Ability Accommodation Adjusted validity/reliability coefficient Alternate forms Analysis of work Assessment Band Battery

Setting Standards. John Norcini, Ph.D.

PASSPOINT SETTING FOR MULTIPLE CHOICE EXAMINATIONS

Conducting Interviews: What Can Go Wrong and What Can Go Right. Presented by Dr. Jim Higgins Executive Director, BCGi

Title VII Case Study: Plaintiff v. U.S. City

Hiring is Testing. (Even If You're Not Calling It a Test.) Geoff Burcaw, Senior Consultant. Hilary Ricardo, Senior Consultant

DEVELOPMENT OF AN ESSAY-WRITING TEST FOR A LAW ENFORCEMENT JOB. John D. Kraft June 2000

More than 70 percent of the students are likely to respond correctly. Easy

Document Control Information

UW Flexible Option Competency-Based Learning Model

Setting Fair, Defensible Cut Scores More Than One Method to Use

Overview. Approaches to Addressing Adverse Impact: Opportunities, Facades, and Pitfalls. What is Adverse Impact? The d-statistic

Applying the Principles of Item and Test Analysis

CONTENT VALIDITY REPORT FOR THE TEST PREPARATION MANUAL (TPM) 8th Ed. READING ABILITY TEST FOR ENTRY-LEVEL FIREFIGHTERS

TIPS ON BUILDING AND USING COMPETENCY ASSESSMENTS

Operational Guidelines for the Recognition of Prior Learning (RPL) Under NTVQF

Advanced Human Resource Management Topics Exam Summary: Undergraduate, Masters, and Doctoral Levels

INTERMEDIATE QUALIFICATION

Reliability & Validity

Using a Performance Test Development & Validation Framework

KeyMath Revised: A Diagnostic Inventory of Essential Mathematics (Connolly, 1998) is

The 1995 Stanford Diagnostic Reading Test (Karlsen & Gardner, 1996) is the fourth

Setting the Standard in Examinations: How to Determine Who Should Pass. Mark Westwood

Selection. Copyright 2016 Pearson Education, Inc. 6-1

ROBERT L. MATHIS JOHN H. JACKSON. Presented by: Prof. Dr. Deden Mulyana, SE., M.Si. Chapter 8. SECTION 2 Staffing the Organization


"Charting the Course... MOC B Designing a Microsoft SharePoint 2010 Infrastructure. Course Summary

Innovative Item Types Require Innovative Analysis

THE GENERAL ASSEMBLY OF PENNSYLVANIA SENATE BILL

29 CFR Ch. XIV ( Edition)

The Mullen Scales of Early Learning (Mullen, 1992) is an individually administered,

The Examination for Professional Practice in Psychology: The Enhanced EPPP Frequently Asked Questions

CONSTRUCTING A STANDARDIZED TEST

INDEPENDENT ELECTRICAL CONTRACTORS SAFETY PROGRAM AWARENESS & RETENTION KIT

An Introduction to Psychometrics. Sharon E. Osborn Popp, Ph.D. AADB Mid-Year Meeting April 23, 2017

7 Statistical characteristics of the test

The Standards for Educational and Psychological Testing: Zugzwang for the Practicing Professional?

Large Muscle Development

Uniform Guidelines On Employee Selection Procedures (1978)

McGraw-Hill Education

PROMOTION INDEX CODE: EFFECTIVE DATE:

Examination Report for Testing Year Board of Certification (BOC) Certification Examination for Athletic Trainers.

National Council for Strength & Fitness

Chapter 9 External Selection: Testing

COPYRIGHTED MATERIAL. Contents

IREC Accredited Training Provider (FULL APPLICATION) IREC Standard 01023

BERNARD O'MEARA University of Ballarat, Ballarat, Australia; Deakin University, Geelong, Victoria, Australia

The Legal Defensibility of Assessments: What You Need to Know

HEALTH, SAFETY AND ENVIRONMENT MANAGEMENT SYSTEM MANUAL

Testing Issues and Concerns: An Introductory Presentation to the State Directors of Career and Technical Education

Note: This article is reprinted with the permission of The Journal for Civil Aviation (CAT). The photographs from the magazine have not been included.

Physical Ability Testing

All required readings on the Moodle site for this course.

Stefanie Moerbeek, Product Developer, EXIN Greg Pope, Questionmark, Analytics and Psychometrics Manager

Uniform Guidelines on Employee Selection Procedures

HEWLETT-PACKARD COMPANY CORPORATE GOVERNANCE GUIDELINES

AGENDA. Secrets of Competency Testing: Writing Items for Hospice and Palliative Certification Examinations

NEW YORK CITY COLLEGE OF TECHNOLOGY The City University of New York School of Arts & Sciences Department of Social Science Course Outline

NATIONAL CREDENTIALING BOARD OF HOME HEALTH AND HOSPICE TECHNICAL REPORT: JANUARY 2013 ADMINISTRATION. Leon J. Gross, Ph.D. Psychometric Consultant

Overview. Considerations for Legal Defensibility in Standard Setting. Portland, Oregon September 16, 2016

SUCCEED AT INSTRUCTION TABLE OF CONTENTS

Understanding Your GACE Scores

Standardized Measurement and Assessment

Technical Report Core Skills Entry-Level Assessment

KENEXA PROVE IT! VALIDATION SUMMARY. Kenexa Prove It!

Frequently Asked Questions (FAQs)

PRINCIPLES AND APPLICATIONS OF SPECIAL EDUCATION ASSESSMENT

A powerful predictor of job performance. INTRODUCTION

MCS Experience in Selection & Recruitment (Saudi Arabia)

PSYC C performance appraisal 11/27/11 [Arthur] 1

A Test Development Life Cycle Framework for Testing Program Planning

Recordkeeping for Good Governance Toolkit. GUIDELINE 13: Digital Recordkeeping Readiness Self-assessment Checklist for Organisations

Customer Care Ability Test. Administrator s Manual C.C.A.T. Developed by J. M. Llobet, Ph.D EDI #T0036DL

CHAPTER 2 Understanding the Legal Context of Assessment- Employment Laws and Regulations with Implications for Assessment

Developing, Validating, and Analyzing Training, Education & Experience (TEE) Requirements

Business Administration of PTC Windchill 11.0

Transcription:

Pfeiffer

CRITERION-REFERENCED TEST DEVELOPMENT: TECHNICAL AND LEGAL GUIDELINES FOR CORPORATE TRAINING, 3rd Edition

Sharon A. Shrock and William C. Coscarelli

List of Figures, Tables, and Sidebars

Introduction: A Little Knowledge Is Dangerous
    Why Test?
    Why Read This Book?
    A Confusing State of Affairs
        Misleading Familiarity
        Inaccessible Technology
        Procedural Confusion
    Testing and Kirkpatrick's Levels of Evaluation
    Certification in the Corporate World
    Corporate Testing Enters the New Millennium
    What Is to Come...

PART I: BACKGROUND: THE FUNDAMENTALS

1 Test Theory
    What Is Testing?
    What Does a Test Score Mean?
    Reliability and Validity: A Primer
        Reliability
            Equivalence Reliability
            Test-Retest Reliability
            Inter-Rater Reliability
        Validity
            Face Validity
            Content Validity
            Concurrent Validity
            Predictive Validity
    Concluding Comment

2 Types of Tests
    Criterion-Referenced Versus Norm-Referenced Tests
        Frequency Distributions
        Criterion-Referenced Test Interpretation
    Six Purposes for Tests in Training Settings
    Three Methods of Test Construction (One of Which You Should Never Use)
        Topic-Based Test Construction
        Statistically Based Test Construction
        Objectives-Based Test Construction

PART II: OVERVIEW: THE CRTD MODEL AND PROCESS

3 The CRTD Model and Process
    Relationship to the Instructional Design Process
    The CRTD Process
        Plan Documentation
        Analyze Job Content
        Establish Content Validity of Objectives
        Create Items
            Create Cognitive Items
            Create Rating Instruments
        Establish Content Validity of Items and Instruments
        Conduct Initial Test Pilot
        Perform Item Analysis
            Difficulty Index
            Distractor Pattern
            Point-Biserial
        Create Parallel Forms or Item Banks
        Establish Cut-Off Scores
            Informed Judgment
            Angoff
            Contrasting Groups
        Determine Reliability
            Determine Reliability of Cognitive Tests
                Equivalence Reliability
                Test-Retest Reliability
            Determine Reliability of Performance Tests
        Report Scores
    Summary

PART III: THE CRTD PROCESS: PLANNING AND CREATING THE TEST

4 Plan Documentation
    Why Document?
    What to Document
    The Documentation

5 Analyze Job Content
    Job Analysis
        Job Analysis Models
        Summary of the Job Analysis Process
        DACUM
    Hierarchies
        Hierarchical Analysis of Tasks
        Matching the Hierarchy to the Type of Test
            Prerequisite Test
            Entry Test
            Diagnostic Test
            Posttest
            Equivalency Test
            Certification Test
    Using Learning Task Analysis to Validate a Hierarchy
        Bloom's Original Taxonomy
            Knowledge Level
            Comprehension Level
            Application Level
            Analysis Level
            Synthesis Level
            Evaluation Level
        Using Bloom's Original Taxonomy to Validate a Hierarchy
        Bloom's Revised Taxonomy
        Gagne's Learned Capabilities
            Intellectual Skills
            Cognitive Strategies
            Verbal Information
            Motor Skill
            Attitudes
        Using Gagne's Intellectual Skills to Validate a Hierarchy
        Merrill's Component Design Theory
            The Task Dimension
            Types of Learning
        Using Merrill's Component Design Theory to Validate a Hierarchy
        Data-Based Methods for Hierarchy Validation
    Who Killed Cock Robin?

6 Content Validity of Objectives
    Overview of the Process
    The Role of Objectives in Item Writing
    Characteristics of Good Objectives
        Behavior Component
        Conditions Component
        Standards Component
    A Word from the Legal Department About Objectives
    The Certification Suite
        Certification Levels in the Suite
            Level A: Real World
            Level B: High-Fidelity Simulation
            Level C: Scenarios
        Quasi-Certification
            Level D: Memorization
            Level E: Attendance
            Level F: Affiliation
        How to Use the Certification Suite
            Finding a Common Understanding
            Making a Professional Decision
                The correct level to match the job
                The operationally correct level
                The consequences of lower fidelity
    Converting Job-Task Statements to Objectives
    In Conclusion

7 Create Cognitive Items
    What Are Cognitive Items?
    Classification Schemes for Objectives
        Bloom's Cognitive Classifications
    Types of Test Items
        Newer Computer-Based Item Types
        The Six Most Common Item Types
            True/False Items
            Matching Items
            Multiple-Choice Items
            Fill-In Items
            Short Answer Items
            Essay Items
    The Key to Writing Items That Match Jobs
        The Single Most Useful Improvement You Can Make in Test Development
        Intensional Versus Extensional Items
        Show Versus Tell
        The Certification Suite
    Guidelines for Writing Test Items
        Guidelines for Writing the Most Common Item Types
    How Many Items Should Be on a Test?
        Test Reliability and Test Length
        Criticality of Decisions and Test Length
        Resources and Test Length
        Domain Size of Objectives and Test Length
        Homogeneity of Objectives and Test Length
        Research on Test Length
        Summary of Determinants of Test Length
    A Cookbook for the SME
    Deciding Among Scoring Systems
        Hand Scoring
        Optical Scanning
        Computer-Based Testing
        Computerized Adaptive Testing

8 Create Rating Instruments
    What Are Performance Tests?
    Product Versus Process in Performance Testing
    Four Types of Rating Scales for Use in Performance Tests (Two of Which You Should Never Use)
        Numerical Scales
        Descriptive Scales
        Behaviorally Anchored Rating Scales
        Checklists
    Open Skill Testing

9 Establish Content Validity of Items and Instruments
    The Process
    Establishing Content Validity: The Single Most Important Step
        Face Validity
        Content Validity
    Two Other Types of Validity
        Concurrent Validity
        Predictive Validity
    Summary Comment About Validity

10 Initial Test Pilot
    Why Pilot a Test?
    Six Steps in the Pilot Process
        Determine the Sample
        Orient the Participants
        Give the Test
        Analyze the Test
        Interview the Test-Takers
        Synthesize the Results
    Preparing to Collect Pilot Test Data
    Before You Administer the Test
        Sequencing Test Items
        Test Directions
        Test Readability Levels
            Lexile Measure
        Formatting the Test
        Setting Time Limits
            Power, Speed, and Organizational Culture
    When You Administer the Test
        Physical Factors
        Psychological Factors
        Giving and Monitoring the Test
        Special Considerations for Performance Tests
    Honesty and Integrity in Testing
        Security During the Training-Testing Sequence
        Organization-Wide Policies Regarding Test Security

11 Statistical Pilot
    Standard Deviation and Test Distributions
        The Meaning of Standard Deviation
        The Five Most Common Test Distributions
        Problems with Standard Deviations and Mastery Distributions
    Item Statistics and Item Analysis
        Item Statistics
            Difficulty Index
            P-Value
            Distractor Pattern
            Point-Biserial Correlation
        Item Analysis for Criterion-Referenced Tests
            The Upper-Lower Index
            Phi
        Choosing Item Statistics and Item Analysis Techniques
    Garbage In-Garbage Out

12 Parallel Forms
    Paper-and-Pencil Tests
    Computerized Item Banks
    Reusable Learning Objects

13 Cut-Off Scores
    Determining the Standard for Mastery
        The Outcomes of a Criterion-Referenced Test
        The Necessity of Human Judgment in Setting a Cut-Off Score
        Consequences of Misclassification
        Stakeholders
        Reusability
        Performance Data
    Three Procedures for Setting the Cut-Off Score
        The Issue of Substitutability
        Informed Judgment
        A Conjectural Approach: The Angoff Method
        Contrasting Groups Method
    Borderline Decisions
        The Meaning of Standard Error of Measurement
        Reducing Misclassification Errors at the Borderline
    Problems with Correction-for-Guessing
    The Problem of the Saltatory Cut-Off Score

14 Reliability of Cognitive Tests
    The Concepts of Reliability, Validity, and Correlation
        Correlation
    Types of Reliability
    Single-Test-Administration Reliability Techniques
        Internal Consistency
        Squared-Error Loss
        Threshold-Loss
    Calculating Reliability for Single-Test-Administration Techniques
        Livingston's Coefficient Kappa (k²)
        The Index S
    Outcomes of Using the Single-Test-Administration Reliability Techniques
    Two-Test-Administration Reliability Techniques
        Equivalence Reliability
        Test-Retest Reliability
    Calculating Reliability for Two-Test-Administration Techniques
        The Phi Coefficient
            Description of Phi
            Calculating Phi
            How High Should Phi Be?
        The Agreement Coefficient
            Description of the Agreement Coefficient
            Calculating the Agreement Coefficient
            How High Should the Agreement Coefficient Be?
        The Kappa Coefficient
            Description of Kappa
            Calculating the Kappa Coefficient
            How High Should the Kappa Coefficient Be?
        Comparison of φ, p₀, and κ
    The Logistics of Establishing Test Reliability
        Choosing Items
        Sample Test-Takers
        Testing Conditions
    Recommendations for Choosing a Reliability Technique
    Summary Comments

15 Reliability of Performance Tests
    Reliability and Validity of Performance Tests
    Types of Rating Errors
        Error of Standards
        Halo Error
        Logic Error
        Similarity Error
        Central Tendency Error
        Leniency Error
    Inter-Rater Reliability
        Calculating and Interpreting Kappa (κ)
        Calculating and Interpreting Phi (φ)
    Repeated Performance and Consecutive Success
    Procedures for Training Raters
    What If a Rater Passes Everyone Regardless of Performance? What Should You Do?
    What If You Get a High Percentage of Agreement Among Raters But a Negative Phi Coefficient?

16 Report Scores
    CRT Versus NRT Reporting
    Summing Subscores
    What Should You Report to a Manager?
    Is There a Legal Reason to Archive the Tests?
    A Final Thought About Testing and Teaching

PART IV: LEGAL ISSUES IN CRITERION-REFERENCED TESTING

17 Criterion-Referenced Testing and Employment Selection Laws
    What Do We Mean by Employment Selection Laws?
    Who May Bring a Claim?
    A Short History of the Uniform Guidelines on Employee Selection Procedures
        Purpose and Scope
        Legal Challenges to Testing and the Uniform Guidelines
        Reasonable Reconsideration
        In Conclusion
    Balancing CRTs with Employment Discrimination Laws
    Watch Out for Blanket Exclusions in the Name of Business Necessity
    Adverse Impact, the Bottom Line, and Affirmative Action
        Adverse Impact
        The Bottom Line
        Affirmative Action
    Record-Keeping of Adverse Impact and Job-Relatedness of Tests
    Accommodating Test-Takers with Special Needs
        Testing, Assessment, and Evaluation for Disabled Candidates
    Test Validation Criteria: General Guidelines
    Test Validation: A Step-by-Step Guide
        1. Obtain Professional Guidance
        2. Select a Legally Acceptable Validation Strategy for Your Particular Test
        3. Understand and Employ Standards for Content-Valid Tests
        4. Evaluate the Overall Test Circumstances to Assure Equality of Opportunity
    Keys to Maintaining Effective and Legally Defensible Documentation
        Why Document?
        What Is Documentation?
        Why Is Documentation an Ally in Defending Against Claims?
        How Is Documentation Used?
            Compliance Documentation
            Documentation to Avoid Regulatory Penalties or Lawsuits
            Use of Documentation in Court
            Documentation to Refresh Memory
            Documentation to Attack Credibility
        Disclosure and Production of Documentation
        Pay Attention to Document Retention Policies and Protocols
        Use Effective Word Management in Your Documentation
        Use Objective Terms to Describe Events and Compliance
        Avoid Inflammatory and Off-the-Cuff Commentary
        Develop and Enforce Effective Document Retention Policies
        Make Sure Your Documentation Is Complete
        Make Sure Your Documentation Is Capable of "Authentication"
    In Conclusion
    Is Your Criterion-Referenced Testing Legally Defensible? A Checklist
    A Final Thought

Epilogue: CRTD as Organizational Transformation

References

Index

About the Authors
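The contents above name several statistics without showing them in computational form. As an illustration only, not the book's worked procedure, the following sketch computes two of the item statistics listed under Chapter 11, "Statistical Pilot": the difficulty index (p-value) and the point-biserial correlation. The score matrix is hypothetical, and the formulas are the standard classical ones.

```python
"""Illustrative item statistics for a pilot test (hypothetical data)."""
import statistics

# Hypothetical pilot results: 6 test-takers x 4 dichotomously scored
# items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
totals = [sum(row) for row in scores]

def difficulty_index(item: int) -> float:
    """P-value: the proportion of test-takers answering the item correctly."""
    column = [row[item] for row in scores]
    return sum(column) / len(column)

def point_biserial(item: int) -> float:
    """Correlation between the 0/1 item score and the total test score.

    Uses r_pb = (M_correct - M_all) / SD_all * sqrt(p / q); assumes the
    item was neither missed nor answered correctly by everyone.
    """
    column = [row[item] for row in scores]
    p = sum(column) / len(column)
    mean_correct = statistics.mean(t for t, x in zip(totals, column) if x == 1)
    mean_all = statistics.mean(totals)
    sd_all = statistics.pstdev(totals)
    return (mean_correct - mean_all) / sd_all * (p / (1 - p)) ** 0.5

for i in range(len(scores[0])):
    print(f"Item {i + 1}: p = {difficulty_index(i):.2f}, "
          f"r_pb = {point_biserial(i):.2f}")
```

A hard item has a low p-value; an item whose point-biserial is near zero or negative fails to separate high scorers from low scorers and is a candidate for revision.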
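Chapter 13 lists the Angoff method as a conjectural approach to setting the cut-off score. A minimal sketch follows, with entirely hypothetical judge ratings: each judge estimates, for every item, the probability that a minimally competent (borderline) performer would answer it correctly; the sum of a judge's estimates is that judge's expected borderline score, and the mean across judges is the recommended cut-off.

```python
"""Angoff-style cut-off score from hypothetical judge ratings."""

# judge -> per-item probability estimates for a 10-item test.
ratings = {
    "Judge A": [0.90, 0.80, 0.60, 0.75, 0.85, 0.70, 0.65, 0.90, 0.80, 0.70],
    "Judge B": [0.85, 0.75, 0.55, 0.80, 0.90, 0.65, 0.60, 0.85, 0.75, 0.65],
    "Judge C": [0.95, 0.85, 0.65, 0.70, 0.80, 0.75, 0.70, 0.95, 0.85, 0.75],
}

# A judge's expected score for a borderline master is the sum of that
# judge's item probabilities.
expected = {judge: sum(probs) for judge, probs in ratings.items()}

# The cut-off score is the mean expected borderline score across judges.
cut_score = sum(expected.values()) / len(expected)

for judge, score in expected.items():
    print(f"{judge}: expected borderline score = {score:.2f}")
print(f"Recommended cut-off: {cut_score:.2f} of 10 items")
```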
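Chapter 14 presents the phi coefficient, the agreement coefficient (p₀), and the kappa coefficient as indexes of how consistently a criterion-referenced test classifies people as masters or nonmasters across two administrations. The sketch below, again with hypothetical classifications, shows that all three fall out of the same 2×2 classification table.

```python
"""Decision-consistency indexes (p0, kappa, phi) from hypothetical
master/nonmaster classifications on two test administrations."""
from math import sqrt

# 1 = classified as master, 0 = nonmaster, for ten test-takers.
form_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
form_2 = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]

n = len(form_1)
a = sum(x == 1 and y == 1 for x, y in zip(form_1, form_2))  # master/master
b = sum(x == 1 and y == 0 for x, y in zip(form_1, form_2))  # master/nonmaster
c = sum(x == 0 and y == 1 for x, y in zip(form_1, form_2))  # nonmaster/master
d = sum(x == 0 and y == 0 for x, y in zip(form_1, form_2))  # nonmaster/nonmaster

# Agreement coefficient: proportion of identical classifications.
p0 = (a + d) / n

# Kappa: agreement corrected for the agreement expected by chance.
p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
kappa = (p0 - p_chance) / (1 - p_chance)

# Phi: correlation between the two sets of dichotomous classifications.
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(f"p0 = {p0:.2f}, kappa = {kappa:.2f}, phi = {phi:.2f}")
```

p₀ is the easiest to interpret but is inflated by chance agreement; kappa and phi correct for chance in different ways, which is why the chapter compares all three.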