Pfeiffer

CRITERION-REFERENCED TEST DEVELOPMENT
Technical and Legal Guidelines for Corporate Training
3rd Edition

Sharon A. Shrock
William C. Coscarelli
List of Figures, Tables, and Sidebars  xxiii

Introduction: A Little Knowledge Is Dangerous  1
    Why Test?  1
    Why Read This Book?  2
    A Confusing State of Affairs  3
        Misleading Familiarity  3
        Inaccessible Technology  4
        Procedural Confusion  4
    Testing and Kirkpatrick's Levels of Evaluation  5
    Certification in the Corporate World  7
    Corporate Testing Enters the New Millennium  10
    What Is to Come...  11

PART I: BACKGROUND: THE FUNDAMENTALS  13

1  Test Theory  15
    What Is Testing?  15
    What Does a Test Score Mean?  17
    Reliability and Validity: A Primer  18
        Reliability  18
            Equivalence Reliability  19
            Test-Retest Reliability  19
            Inter-Rater Reliability  19
        Validity  20
            Face Validity  23
            Content Validity  23
            Concurrent Validity  23
            Predictive Validity  24
    Concluding Comment  24
2  Types of Tests  25
    Criterion-Referenced Versus Norm-Referenced Tests  25
        Frequency Distributions  25
        Criterion-Referenced Test Interpretation  28
    Six Purposes for Tests in Training Settings  30
    Three Methods of Test Construction (One of Which You Should Never Use)  32
        Topic-Based Test Construction  32
        Statistically Based Test Construction  33
        Objectives-Based Test Construction  34

PART II: OVERVIEW: THE CRTD MODEL AND PROCESS  37

3  The CRTD Model and Process  39
    Relationship to the Instructional Design Process  39
    The CRTD Process  43
        Plan Documentation  44
        Analyze Job Content  44
        Establish Content Validity of Objectives  46
        Create Items  46
            Create Cognitive Items  46
            Create Rating Instruments  47
        Establish Content Validity of Items and Instruments  47
        Conduct Initial Test Pilot  47
        Perform Item Analysis  48
            Difficulty Index  48
            Distractor Pattern  48
            Point-Biserial  48
        Create Parallel Forms or Item Banks  49
        Establish Cut-Off Scores  49
            Informed Judgment  50
            Angoff  50
            Contrasting Groups  50
        Determine Reliability  50
            Determine Reliability of Cognitive Tests  50
                Equivalence Reliability  51
                Test-Retest Reliability  51
            Determine Reliability of Performance Tests  52
        Report Scores  52
    Summary  53

PART III: THE CRTD PROCESS: PLANNING AND CREATING THE TEST  55

4  Plan Documentation  57
    Why Document?  57
    What to Document  63
    The Documentation  64

5  Analyze Job Content  75
    Job Analysis  75
        Job Analysis Models  77
        Summary of the Job Analysis Process  78
    DACUM  79
    Hierarchies  87
        Hierarchical Analysis of Tasks  87
    Matching the Hierarchy to the Type of Test  88
        Prerequisite Test  89
        Entry Test  89
        Diagnostic Test  89
        Posttest  89
        Equivalency Test  90
        Certification Test  90
    Using Learning Task Analysis to Validate a Hierarchy  91
        Bloom's Original Taxonomy  91
            Knowledge Level  92
            Comprehension Level  93
            Application Level  93
            Analysis Level  93
            Synthesis Level  93
            Evaluation Level  94
        Using Bloom's Original Taxonomy to Validate a Hierarchy  94
        Bloom's Revised Taxonomy  95
        Gagne's Learned Capabilities  96
            Intellectual Skills  96
            Cognitive Strategies  97
            Verbal Information  97
            Motor Skill  97
            Attitudes  97
        Using Gagne's Intellectual Skills to Validate a Hierarchy  97
        Merrill's Component Design Theory  98
            The Task Dimension  99
            Types of Learning  99
        Using Merrill's Component Design Theory to Validate a Hierarchy  99
        Data-Based Methods for Hierarchy Validation  100
    Who Killed Cock Robin?  102

6  Establish Content Validity of Objectives  105
    Overview of the Process  105
    The Role of Objectives in Item Writing  106
    Characteristics of Good Objectives  107
        Behavior Component  107
        Conditions Component  108
        Standards Component  108
    A Word from the Legal Department About Objectives  109
    The Certification Suite  109
    Certification Levels in the Suite  110
        Level A: Real World  110
        Level B: High-Fidelity Simulation  111
        Level C: Scenarios  111
        Quasi-Certification  112
            Level D: Memorization  112
            Level E: Attendance  112
            Level F: Affiliation  113
    How to Use the Certification Suite  113
        Finding a Common Understanding  113
        Making a Professional Decision  114
            The correct level to match the job  114
            The operationally correct level  114
            The consequences of lower fidelity  115
    Converting Job-Task Statements to Objectives  116
    In Conclusion  119

7  Create Cognitive Items  121
    What Are Cognitive Items?  121
    Classification Schemes for Objectives  122
        Bloom's Cognitive Classifications  123
    Types of Test Items  129
        Newer Computer-Based Item Types  129
        The Six Most Common Item Types  130
            True/False Items  131
            Matching Items  132
            Multiple-Choice Items  132
            Fill-In Items  147
            Short Answer Items  147
            Essay Items  148
    The Key to Writing Items That Match Jobs  149
        The Single Most Useful Improvement You Can Make in Test Development  149
        Intensional Versus Extensional Items  150
        Show Versus Tell  152
    The Certification Suite  155
    Guidelines for Writing Test Items  158
        Guidelines for Writing the Most Common Item Types  159
    How Many Items Should Be on a Test?  166
        Test Reliability and Test Length  166
        Criticality of Decisions and Test Length  167
        Resources and Test Length  168
        Domain Size of Objectives and Test Length  168
        Homogeneity of Objectives and Test Length  169
        Research on Test Length  170
        Summary of Determinants of Test Length  170
    A Cookbook for the SME  172
    Deciding Among Scoring Systems  174
        Hand Scoring  175
        Optical Scanning  175
        Computer-Based Testing  176
        Computerized Adaptive Testing  180

8  Create Rating Instruments  183
    What Are Performance Tests?  183
    Product Versus Process in Performance Testing  187
    Four Types of Rating Scales for Use in Performance Tests (Two of Which You Should Never Use)  187
        Numerical Scales  188
        Descriptive Scales  188
        Behaviorally Anchored Rating Scales  188
        Checklists  190
    Open Skill Testing  192

9  Establish Content Validity of Items and Instruments  195
    The Process  195
    Establishing Content Validity: The Single Most Important Step  196
        Face Validity  196
        Content Validity  197
    Two Other Types of Validity  202
        Concurrent Validity  202
        Predictive Validity  208
    Summary Comment About Validity  209

10  Conduct Initial Test Pilot  211
    Why Pilot a Test?  211
    Six Steps in the Pilot Process  212
        Determine the Sample  212
        Orient the Participants  213
        Give the Test  214
        Analyze the Test  214
        Interview the Test-Takers  215
        Synthesize the Results  216
    Preparing to Collect Pilot Test Data  217
    Before You Administer the Test  217
        Sequencing Test Items  217
        Test Directions  218
        Test Readability Levels  219
            Lexile Measure  220
        Formatting the Test  220
        Setting Time Limits  221
        Power, Speed, and Organizational Culture  222
    When You Administer the Test  222
        Physical Factors  222
        Psychological Factors  222
        Giving and Monitoring the Test  223
    Special Considerations for Performance Tests  225
    Honesty and Integrity in Testing  231
    Security During the Training-Testing Sequence  234
    Organization-Wide Policies Regarding Test Security  236

11  Statistical Pilot  241
    Standard Deviation and Test Distributions  241
        The Meaning of Standard Deviation  241
        The Five Most Common Test Distributions  244
        Problems with Standard Deviations and Mastery Distributions  247
    Item Statistics and Item Analysis  248
        Item Statistics  248
            Difficulty Index  248
            P-Value  249
            Distractor Pattern  249
            Point-Biserial Correlation  250
        Item Analysis for Criterion-Referenced Tests  251
            The Upper-Lower Index  253
            Phi  255
        Choosing Item Statistics and Item Analysis Techniques  255
    Garbage In-Garbage Out  257

12  Parallel Forms  259
    Paper-and-Pencil Tests  260
    Computerized Item Banks  262
    Reusable Learning Objects  264

13  Cut-Off Scores  265
    Determining the Standard for Mastery  265
    The Outcomes of a Criterion-Referenced Test  266
    The Necessity of Human Judgment in Setting a Cut-Off Score  267
    Consequences of Misclassification  267
        Stakeholders  268
        Reusability  268
        Performance Data  268
    Three Procedures for Setting the Cut-Off Score  269
        The Issue of Substitutability  269
        Informed Judgment  270
        A Conjectural Approach, the Angoff Method  272
        Contrasting Groups Method  278
    Borderline Decisions  282
        The Meaning of Standard Error of Measurement  282
        Reducing Misclassification Errors at the Borderline  284
    Problems with Correction-for-Guessing  285
    The Problem of the Saltatory Cut-Off Score  287

14  Reliability of Cognitive Tests  289
    The Concepts of Reliability, Validity, and Correlation  289
        Correlation  290
    Types of Reliability  293
    Single-Test-Administration Reliability Techniques  294
        Internal Consistency  294
        Squared-Error Loss  296
        Threshold-Loss  296
    Calculating Reliability for Single-Test-Administration Techniques  297
        Livingston's Coefficient kappa (k²)  297
        The Index S  297
    Outcomes of Using the Single-Test-Administration Reliability Techniques  298
    Two-Test-Administration Reliability Techniques  299
        Equivalence Reliability  299
        Test-Retest Reliability  300
    Calculating Reliability for Two-Test-Administration Techniques  301
        The Phi Coefficient  302
            Description of Phi  302
            Calculating Phi  302
            How High Should Phi Be?  304
        The Agreement Coefficient  306
            Description of the Agreement Coefficient  306
            Calculating the Agreement Coefficient  307
            How High Should the Agreement Coefficient Be?  308
        The Kappa Coefficient  308
            Description of Kappa  308
            Calculating the Kappa Coefficient  309
            How High Should the Kappa Coefficient Be?  311
        Comparison of φ, p₀, and κ  313
    The Logistics of Establishing Test Reliability  314
        Choosing Items  314
        Sample Test-Takers  315
        Testing Conditions  316
    Recommendations for Choosing a Reliability Technique  316
    Summary Comments  317

15  Reliability of Performance Tests  319
    Reliability and Validity of Performance Tests  319
    Types of Rating Errors  320
        Error of Standards  320
        Halo Error  321
        Logic Error  321
        Similarity Error  321
        Central Tendency Error  321
        Leniency Error  322
    Inter-Rater Reliability  322
        Calculating and Interpreting Kappa (κ)  323
        Calculating and Interpreting Phi (φ)  335
    Repeated Performance and Consecutive Success  344
    Procedures for Training Raters  347
    What If a Rater Passes Everyone Regardless of Performance?  349
        What Should You Do?  352
    What If You Get a High Percentage of Agreement Among Raters But a Negative Phi Coefficient?  353

16  Report Scores  357
    CRT Versus NRT Reporting  358
    Summing Subscores  358
    What Should You Report to a Manager?  361
    Is There a Legal Reason to Archive the Tests?  362
    A Final Thought About Testing and Teaching  362

PART IV: LEGAL ISSUES IN CRITERION-REFERENCED TESTING  365

17  Criterion-Referenced Testing and Employment Selection Laws  367
    What Do We Mean by Employment Selection Laws?  368
    Who May Bring a Claim?  368
    A Short History of the Uniform Guidelines on Employee Selection Procedures  370
        Purpose and Scope  371
        Legal Challenges to Testing and the Uniform Guidelines  373
        Reasonable Reconsideration  376
        In Conclusion  376
    Balancing CRTs with Employment Discrimination Laws  376
    Watch Out for Blanket Exclusions in the Name of Business Necessity  378
    Adverse Impact, the Bottom Line, and Affirmative Action  380
        Adverse Impact  380
        The Bottom Line  383
        Affirmative Action  385
    Record-Keeping of Adverse Impact and Job-Relatedness of Tests  387
    Accommodating Test-Takers with Special Needs  387
        Testing, Assessment, and Evaluation for Disabled Candidates  390
    Test Validation Criteria: General Guidelines  394
    Test Validation: A Step-by-Step Guide  397
        1. Obtain Professional Guidance  397
        2. Select a Legally Acceptable Validation Strategy for Your Particular Test  397
        3. Understand and Employ Standards for Content-Valid Tests  398
        4. Evaluate the Overall Test Circumstances to Assure Equality of Opportunity  399
    Keys to Maintaining Effective and Legally Defensible Documentation  400
        Why Document?  400
        What Is Documentation?  401
        Why Is Documentation an Ally in Defending Against Claims?  401
        How Is Documentation Used?  402
            Compliance Documentation  402
            Documentation to Avoid Regulatory Penalties or Lawsuits  404
        Use of Documentation in Court  404
            Documentation to Refresh Memory  404
            Documentation to Attack Credibility  404
        Disclosure and Production of Documentation  405
        Pay Attention to Document Retention Policies and Protocols  407
        Use Effective Word Management in Your Documentation  409
        Use Objective Terms to Describe Events and Compliance  412
        Avoid Inflammatory and Off-the-Cuff Commentary  412
        Develop and Enforce Effective Document Retention Policies  413
        Make Sure Your Documentation Is Complete  414
        Make Sure Your Documentation Is Capable of "Authentication"  415
    In Conclusion  415
    Is Your Criterion-Referenced Testing Legally Defensible? A Checklist  416
    A Final Thought  419

Epilogue: CRTD as Organizational Transformation  421
References  425
Index  433
About the Authors  453