Considerations for Legal Defensibility in Standard Setting
Portland, Oregon, September 16, 2016

1 Have you been involved in standard setting for your industry?
  A. Yes
  B. No
Test your iClicker and answer the question above.

Considerations for legal defensibility in standard setting

Overview
- Methods
- Validity evidence
- Fairness
- Questions credentialing organizations should ask their vendors
- Best practices to maintain legal defensibility

2 Passing scores

The purpose of setting passing scores is to ensure that qualified examinees are successful and unqualified examinees are unsuccessful. How much is enough?

What is standard setting?
- Defines the performance standard
- Process used to establish one or more cut scores
- Cut scores provide a basis for using and interpreting test results = validity evidence

[Figure: score distribution showing failing examinees to the left and passing examinees to the right of the passing point, where the cut score is set (e.g., 70%)]

What standard setting methods have you participated in?
  A. Angoff
  B. Bookmark
  C. Contrasting Groups or Borderline Group
  D. Hofstee
  E. None

3 Methods for setting standards

Relative (normative)
- Ranks examinees; the standard reflects group performance (e.g., the top 75% of examinees pass)

Absolute (criterion)
- Test-centered methods (e.g., Angoff method)
- Examinee-centered methods (e.g., Contrasting Groups, Borderline methods)
- Examinees must meet a specified criterion (e.g., correctly answer 70% of questions)

Compromise
- Judges consider both the passing score and the passing (or failing) rate (e.g., Hofstee method)
- A sketch of the Angoff and Hofstee calculations follows this slide

Key standards: Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)
- Clear documentation of process (5.21, 7.4)
- Selection and qualification of panelists (1.9, 5.22)
- Presentation of normative data (e.g., item difficulty) and impact information (score distributions for criterion groups) (5.23)
- Report standard errors of measurement for each cut score (2.14, 12.18)
- The standard is determined by a careful analysis and judgment of credential-worthy performance; it is not adjusted to control the number of persons passing (11.16)

Evaluating standard setting
- Three types of evidence discussed in the literature (Kane, 1994; 2001): procedural, internal, external
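To make the two families of calculation concrete, here is a minimal sketch, not the presenters' procedure: the panelist ratings, score distribution, and Hofstee bounds are all invented for illustration.

```python
# Minimal sketch of two cut-score calculations named above (Angoff and
# Hofstee). All numbers are hypothetical; real studies use trained panels.
import numpy as np

# Angoff: each panelist rates, per item, the probability that a minimally
# competent candidate answers correctly; the cut score is the sum of the
# mean ratings across panelists.
ratings = np.array([              # rows = panelists, columns = items
    [0.70, 0.60, 0.85, 0.50],
    [0.65, 0.55, 0.90, 0.45],
    [0.75, 0.60, 0.80, 0.55],
])
angoff_cut = ratings.mean(axis=0).sum()   # expected raw score of a borderline candidate

# Hofstee: judges bound the acceptable cut score [c_min, c_max] and fail
# rate [f_min, f_max]; the compromise cut score is where the line from
# (c_min, f_max) to (c_max, f_min) crosses the observed fail-rate curve.
scores = np.random.default_rng(0).normal(70, 10, 1000)   # hypothetical percent scores
c_min, c_max = 60.0, 75.0
f_min, f_max = 0.10, 0.40

def fail_rate(cut):
    return float(np.mean(scores < cut))   # observed failing proportion at this cut

def hofstee_line(cut):
    return f_max + (f_min - f_max) * (cut - c_min) / (c_max - c_min)

grid = np.linspace(c_min, c_max, 301)     # scan for the crossing point
hofstee_cut = grid[np.argmin([abs(fail_rate(c) - hofstee_line(c)) for c in grid])]

print(f"Angoff cut: {angoff_cut:.2f} of {ratings.shape[1]} items")
print(f"Hofstee cut: {hofstee_cut:.1f}%")
```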

4 Procedural validity evidence

Method is defensible
- Clearly documented
- Psychometrically sound procedures
- Appropriate for test purpose
- Panelist recruitment, selection, and qualifications
- PLDs (performance level descriptors)
- Training
- Summary data
- Cut score recommendations
- Evaluation of the standard setting

Internal validity evidence
- Consistency of results
- Standard error of the recommended cut score
- Decreased variability between rounds on the recommended cut score
- Looking for convergence, not necessarily consensus; a sketch of this check follows this slide

External validity evidence
- Requires an external criterion
- Comparison of performance standards with other sources of information about examinee proficiency
- Split-panel design in standard setting
- Comparison to historical pass rates
- Policy
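As an illustration of the internal evidence described above, the following minimal sketch uses invented panelist recommendations: it checks whether variability shrinks between rounds and reports the standard error of the mean cut score.

```python
# Minimal sketch of an internal-evidence check: does variability in
# panelists' recommended cut scores shrink between rounds? Data are invented.
import statistics

round1 = [68.0, 74.5, 62.0, 71.0, 65.5, 77.0]   # initial recommendations
round2 = [69.0, 71.5, 67.0, 70.5, 68.0, 72.0]   # after discussion and impact data

for name, recs in (("Round 1", round1), ("Round 2", round2)):
    mean = statistics.mean(recs)
    sd = statistics.stdev(recs)
    se = sd / len(recs) ** 0.5                  # standard error of the mean cut score
    print(f"{name}: cut = {mean:.1f}, SD = {sd:.2f}, SE = {se:.2f}")
# A smaller Round 2 SD indicates convergence (not necessarily consensus).
```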

5 What is the greatest risk to fairness in your program?
  A. Treatment during the testing process
  B. Measurement bias
  C. Access to the construct being measured
  D. Validity of individual test scores

Background of Gulino v. BOE of NYC
- Educator credentialing examination program in New York: Liberal Arts and Science Test (LAST), Content Area, Pedagogy
- Issuance of temporary licenses
- Started with an adverse impact claim for the LAST-2, extended to the ALST
- Burden shifts to the defendant to demonstrate job relatedness
- Re-characterization of content for the ALST
- Consequences of decisions: passing was needed to be eligible for a permanent license, affecting future earnings and benefits

Defining fairness in testing
Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)
- Treatment during the testing process
- Lack of measurement bias
- Access to the construct being measured
- Validity of individual test score interpretations and uses
- The legal and societal context will be country/jurisdiction-specific

6 U.S. context for fairness

- 14th Amendment of the U.S. Constitution: equal protection for individuals
- Titles VI and VII of the Civil Rights Act of 1964: extended protection in employment practices to groups
- Equal Employment Opportunity Commission (EEOC Guidelines, 1978)

Adverse impact
- Substantial differences in the rates of selection by protected class (e.g., race, religion, sex, age)
- Operationally interpreted via the 4/5ths (80%) rule; a sketch of this calculation follows this slide

Application to standard setting
- Placement of the passing score influences interpretations of adverse impact
- Representation of stakeholder groups on the panel
- Evaluation of impact data by protected class
- Legal interpretation of differential pass rates as indicative of potential bias
- Differential item functioning as an alternative

What is your program's primary strategy for evaluating fairness?
  A. Differences in pass rates
  B. Judgmental bias review
  C. Consistency of administration
  D. Complaints from examinees
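The 4/5ths comparison is simple arithmetic. The following minimal sketch uses invented pass counts and group labels to show the flag the EEOC Guidelines describe.

```python
# Minimal sketch of the 4/5ths (80%) rule with invented pass counts: a
# group's selection rate below 80% of the highest group's rate is flagged.
passed = {"Group A": 450, "Group B": 300}
tested = {"Group A": 500, "Group B": 460}

rates = {g: passed[g] / tested[g] for g in tested}
highest = max(rates.values())

for group, rate in sorted(rates.items()):
    ratio = rate / highest
    flag = "potential adverse impact" if ratio < 0.80 else "within the 4/5ths rule"
    print(f"{group}: pass rate {rate:.1%}, ratio {ratio:.2f} -> {flag}")
```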

7 Implications

- Fairness is not easily defined; it intersects with cultural and value systems
- Legal precedent can require poor practice
- Reliance on demographic data for things such as sampling and differential item functioning
- Impact on future policy and practice

Other than when a new job analysis has been conducted, has your organization conducted a standard setting study to validate or change the passing point?
  A. Yes
  B. No

Trends among licensure/certification programs
- Most professions have moved away from setting standards by section and now use integrated content (cost, ease of understanding, time)
- Most groups use a single study to set standards; the risk is always that the group chosen may not be representative

8 Compared to elementary and secondary education

No Child Left Behind / Every Student Succeeds Act
- Uses three levels of achievement: Basic, Proficient, and Advanced
- Needs two cut scores (see the sketch after this slide)

Teacher testing
- Praxis uses dual groups; each group has specific tasks that are dependent
- Each panel makes its own recommendation of a passing score
- Multiple groups of stakeholders are used to reach a conclusion

Ways in which decisions can be made
Standards by directive or authority
- Easy to communicate; sets a policy based on history, tradition, or precedent (e.g., 70%, 75%)
- May be unfair because it does not consider the properties of the test, the characteristics of the candidate population, or the level of competency needed
- Preferable to link decisions to criteria for acceptable practice
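As a minimal illustration, with hypothetical cut scores, of why three achievement levels imply two cut scores: each cut is simply the boundary between adjacent levels.

```python
# Minimal sketch: three achievement levels require two cut scores, since
# each cut score is a boundary between adjacent levels. Cuts are invented.
import bisect

cuts = [55, 75]                               # hypothetical Basic/Proficient and Proficient/Advanced cuts
levels = ["Basic", "Proficient", "Advanced"]

def classify(score):
    return levels[bisect.bisect_right(cuts, score)]

for s in (42, 60, 80):
    print(f"score {s} -> {classify(s)}")
```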

9 What should credentialing organizations ask their vendors?

1. What types of methodologies can be used?
2. What is the organization's experience with these different methodologies?
3. What will be documented in the final report?
   - Description of the procedures used
   - How competence was defined
   - How SMEs were trained
   - How SME representativeness of the profession is documented
   - Data from the study and findings
   - Recommendations

Standard setting is more than a psychometric exercise
- It has financial, legal, and political ramifications
- Decision makers are usually given a range of options derived from the data and select the passing point

What do accreditation standards say?
NCCA Standard #17: A certification program must perform and document a standard setting study that relates performance on the examination to proficiency, so that the program can set a passing score appropriate for the certification.

10 How to communicate the standard set

- Explain why a passing score is necessary and what decisions will be made
- Publicize some information about the process to build buy-in from stakeholders

Questions to ask
- Were all the judges qualified?
- Were they a representative group?
- Did they understand their tasks?
- Did they have enough time to complete all the tasks?

Importance of documentation
It is critical to document each step of the process. Include:
- Resumes of panel members
- How the panel was trained
- The definition of minimal competence
- A description of the process used
- The ratings assigned
- The data provided
- The decisions made

11 Speaker contact information

- Michaela Geddes
- Natasha Parfyonova
- Chad Buckendahl
- Linda Waters