WEBINAR SUMMARY
Assessor Management/Inter-Rater Reliability (Classroom/Group Observations)
January 2014
Early Learning Challenge Technical Assistance

This document provides an overview of assessor management/inter-rater reliability systems as well as information about approaches used in Pennsylvania, Arizona, and Georgia.

OVERVIEW OF ASSESSOR MANAGEMENT / INTER-RATER RELIABILITY SYSTEMS FOR CLASSROOM/GROUP OBSERVATIONS

According to the U.S. Department of Education and the U.S. Department of Health and Human Services, a comprehensive assessment system is a coordinated and comprehensive system of multiple assessments, each of which is valid and reliable for its specified purpose and for the population with which it will be used. Key components of a comprehensive assessment system for early childhood include screening measures, formative and summative assessments, measures of environmental quality, and measures of the quality of adult-child interactions. A comprehensive assessment system includes observational assessments of classrooms and groups as key components, but not the sole components, of the broader system. Classroom and group observational assessments are used for multiple purposes, including caregiver and teacher professional development, program quality improvement, monitoring and accountability, and research. These observational assessments are widely used as a critical component of a quality rating and improvement system (QRIS). Although several classroom and group observation assessments exist, the two most frequently used to support a QRIS are the Environment Rating Scales (ERS) and the Classroom Assessment Scoring System (CLASS).

"According to directors, the assessment process is often stressful for staff, but if designed well, the process can lead to an increased sense of pride, professionalism, and teamwork." - A Count for Quality: Child Care Center Directors on Rating and Improvement Systems, by the National Women's Law Center & CLASP

Benefits of assessor management systems include system accountability, consistent training and interpretation, increased inter-rater reliability, and increased stakeholder buy-in to the QRIS. Observations also benefit program staff, who may experience increased job performance and satisfaction.

This ELC TA resource is based on a webinar held on January 10, 2014, sponsored by ELC TA.

Webinar Presenters:
- Rose Manganell, Program Quality Assessment Supervisor, Pennsylvania Key
- Katie Romero, Assessment Program Manager, Arizona Quality First
- Brooke Travis, Director, Arizona Quality First
- Laura Johns, Georgia Department of Early Care and Learning
- Bentley Ponder, Georgia Department of Early Care and Learning

Moderators:
- Michelle Thomas, State Support Team, ELC TA
- Denise Mauzy, State Support Team, ELC TA
- Kenley Branscome, State Support Team, ELC TA

The Early Learning Challenge Technical Assistance (ELC TA) program is run through a contract from the U.S. Department of Education in partnership with the U.S. Department of Health and Human Services, Administration for Children and Families. The content in this resource does not necessarily reflect the position or policy of the U.S. Department of Education or the U.S. Department of Health and Human Services, nor does mention or visual representation of trade names, commercial products, or organizations imply endorsement by the federal government.
The ELC TA Program provides and facilitates responsive, timely, and high-quality technical assistance that supports each Race to the Top Early Learning Challenge (RTT-ELC) grantee's implementation of its RTT-ELC projects. ELC TA is administered by AEM Corp. in partnership with ICF International. For more information, visit

As shown in figure 1, there are three essential areas for assessor management systems:

- Standards for Assessors, such as minimum levels of education and experience; reliability requirements and standards; expectations for assessors, such as being unbiased and free of conflicts of interest; and best practices for assessment
- Training and Administrative Support, such as written policies and practices, ongoing updates to guide processes, ongoing communications training and support, and consensus building across teams on using the instruments and conducting observations
- Quality Assurance, including data collection and reporting practices, inter-rater reliability practices, and appeals processes and committees

Figure 1. Essential areas for assessor management systems

THE FEATURED STATES: PENNSYLVANIA, ARIZONA, AND GEORGIA

The remainder of this document focuses on the approach that three states (Pennsylvania, Arizona, and Georgia) are taking to address the three essential areas for assessor management systems: Standards for Assessors, Training and Administrative Support, and Quality Assurance. Pennsylvania, Arizona, and Georgia represent a range of maturity and approaches to assessor management systems. These featured systems have been in place for anywhere from 2 to 12 years. At least 20 percent of the programs in each state are currently being assessed, with a wide range in the number of programs per state. There is also variation in the frequency of inter-rater reliability visits and in the number of assessors and anchors retained. (Anchors are expert assessors who anchor the interpretation and use of the scales to a common standard.) Detailed information about the assessor management systems in each of the three states is available in the document titled State Comparison: Assessor Management Systems, located at

STANDARDS FOR ASSESSORS

Featured State: Pennsylvania

Pennsylvania Keys to Quality (see figure 2) is a system of supports for Keystone STARS (Pennsylvania's QRIS) and other programs. Established in 2002, the system has 16 assessors and 3 anchors, who use the ERS. For more information about the assessor management system in Pennsylvania, see the document titled State Comparison: Assessor Management Systems, located at

Two important aspects of the development and maintenance of assessor standards in Pennsylvania are the use of graduated assessor levels and inter-rater reliability checks.

Figure 2. Logo of Pennsylvania's early learning system of supports, Keys to Quality

Graduated Assessor Levels

In Pennsylvania's assessor management system, anchors (known in Pennsylvania as supervisors) oversee and work with assessors to help them maintain reliability on the ERS tools. Recently, Pennsylvania added a peer reviewer designation to create a graduated assessor management system. In this new structure, three of the 16 assessors also serve as peer reviewers and help the anchors/supervisors review ERS reports. Each month, peer reviewers conduct approximately eight assessments and read reports from two to three other assessors. Pennsylvania's graduated system of assessor levels is depicted in figure 3.

Inter-Rater Reliability Checks

New assessors shadow an experienced assessor until they are ready to conduct a reliability check with supervisors and peers. A new assessor must have five successful reliability visits per scale; Pennsylvania uses all four ERS scales. Every three months, the new assessor must have a check on each scale. New assessors become experienced assessors after about 12 to 15 months, after which they must have a check on each scale at least every six months (a scheduling sketch based on these minimums follows this section). Assessors often exceed the minimum required frequency of reliability checks, and checks occur between every variation of staff member: assessor-assessor checks, peer reviewer-assessor checks, anchor/supervisor-assessor checks, and anchor/supervisor-peer reviewer checks. These checks are also structured to occur both within and outside of each individual's assigned region.

Reliability is enhanced in Pennsylvania in many ways. The staff makes use of the following information sources:
- PA Position Statements, which ensure that ERS assessments are not in conflict with any regulations;
- Consensus Documents, which deliver important information;
- Good Summary Report documents, which show how to write a good summary report;
- regional and statewide meetings, as well as informal discussions with one another; and
- Reliability Reports from the ERS Portal, which let assessors monitor their progress.

Figure 3. Graduated assessor levels in Pennsylvania
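Pennsylvania's check frequencies can be expressed as a simple scheduling rule. The following Python sketch is illustrative only: the 3-month and 6-month minimums come from the webinar, but the scale names, data model, and helper function are assumptions.

```python
from datetime import date, timedelta

# Pennsylvania's stated minimums: new assessors need a check on each
# ERS scale every 3 months; experienced assessors (after roughly
# 12 to 15 months) every 6 months. The four ERS instrument names and
# the data model below are assumed for illustration.
SCALES = ["ECERS", "ITERS", "FCCERS", "SACERS"]
CHECK_INTERVAL_DAYS = {"new": 90, "experienced": 180}

def overdue_scales(last_check: dict, level: str, today: date) -> list:
    """List the scales whose minimum check frequency has been exceeded."""
    interval = timedelta(days=CHECK_INTERVAL_DAYS[level])
    return [scale for scale, checked in last_check.items()
            if checked + interval < today]

# Example: a new assessor last checked on every scale on January 10, 2014.
last = {scale: date(2014, 1, 10) for scale in SCALES}
print(overdue_scales(last, "new", today=date(2014, 5, 1)))
# -> all four scales print as overdue (more than 90 days have passed)
```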

TRAINING AND ADMINISTRATIVE SUPPORT

Featured State: Arizona

Multiple agencies, including at least one subcontractor, implement the assessor management system under the Arizona contract (see figure 4). Established in 2009, Arizona's assessor management system has 45 assessors and 13 anchors, who use the ERS, CLASS (PreK and Toddler), and Points Scale assessments. The anchors include seven lead assessors and six assessment supervisors. For more information about the assessor management system in Arizona, see the document titled State Comparison: Assessor Management Systems, located at

Figure 4. Logos of three organizations in Arizona that implement the assessor management system: Quality First, Southwest Human Development, and the Association for Supportive Child Care

Training for Assessors

The training process for new assessors takes approximately 12 weeks. Training topics include an overview of Arizona's system, Quality First; training on using the assessment tools; and reliability assessments conducted on-site until new assessors are 85 percent reliable or higher, on average, across all the assessment tools (a sample agreement calculation follows this section). Assessors also attend ongoing training sessions at least quarterly. These sessions are customized to the needs of the team; one session, for example, addressed ensuring consistency in reports written by two different agencies. Assessors have also received specialized training on the new playground safety guidelines, including how the guidelines relate to the ERS assessments. In addition to the training supports mentioned above, technical assistance for assessors is provided by the developers of the ERS and CLASS assessment tools.

Administrative Support for Assessors

Individual team meetings and statewide meetings are each held once per month. To increase and maintain inter-rater reliability on the ERS and CLASS, all administrative staff (lead assessors, assessment supervisors, and assessment report reviewers) attend monthly consistency meetings. At these meetings, attendees address issues and look for patterns; topics under discussion become part of the agenda for the next statewide all-assessor meeting. At the statewide meetings, staff detail tool protocols and assessment procedures.
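The 85 percent threshold implies a concrete agreement calculation. For the ERS, inter-rater reliability is often reported as the percentage of items on which two observers' scores fall within one point of each other; the webinar does not specify Arizona's exact criterion, so the within-one-point rule and the function below are assumptions, offered as a sketch.

```python
def percent_agreement(assessor, anchor, tolerance=1):
    """Percentage of items on which two observers' scores differ by at
    most `tolerance` points (within-one-point agreement is a common
    ERS reliability criterion, assumed here)."""
    if len(assessor) != len(anchor):
        raise ValueError("score lists must cover the same items")
    agree = sum(abs(a - b) <= tolerance for a, b in zip(assessor, anchor))
    return 100.0 * agree / len(assessor)

# Toy example with six items (real ERS scales have far more):
trainee_scores = [5, 4, 6, 3, 7, 5]
anchor_scores  = [5, 5, 6, 2, 7, 3]
print(f"{percent_agreement(trainee_scores, anchor_scores):.1f}%")
# -> 83.3%, just under Arizona's 85 percent training threshold
```

Averaging this figure over all of an assessor's tools would yield the "85 percent reliable or higher, on average, for all the assessment tools" number the webinar describes.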

Reliability is checked for every assessor, with all assessment tools, on every tenth assessment. To ensure consistency across both agencies in the state, assessors do not go out with the same anchor every time; instead, assessors' reliability is continually checked across the different agencies. An important resource available to assessors is the Arizona Assessment Operations Manual. A document containing excerpts from this resource is available for download from the Public Domain Clearinghouse (see the document titled Shared Resources on Assessor Management & Inter-Rater Reliability, located at

QUALITY ASSURANCE

Featured State: Georgia

Quality Rated (see figure 5) is the name of Georgia's QRIS. Georgia's assessor management system has 12 assessors (including 2 portfolio assessors) and 2 anchors (including 1 portfolio anchor), who use the ERS and CLASS (CLASS is used for PreK and is not used for the QRIS). Georgia also has contracted assessors. For more information about the assessor management system in Georgia, see the document titled State Comparison: Assessor Management Systems, located at

Figure 5. Logo of Georgia's QRIS, Quality Rated

Structure and Support

Two assessors focus solely on the review of portfolio documentation, and 10 assessors focus on one primary ERS tool and one secondary ERS tool. Environment Rating Scales Institute (ERSI) staff provide additional direct support to Georgia. For instance, ERSI staff conduct assessments of family child care providers whose first language is Spanish, assist with the 90-day assessment window schedule, and provide reliability checks with anchors. State anchors provide reliability checks with all the assessors, and all reports are reviewed by ERSI staff. Assessors must maintain an 85-percent reliability standard, while anchors must maintain a 90-percent reliability standard with ERSI staff (a monitoring sketch based on these standards follows this section). Two certified playground assessors conduct professional development sessions with assessor staff to ensure that everyone understands playground safety assessment. In addition, six days of professional development are scheduled every year; during this time, assessors work on updating a variety of materials that are then posted on the website.

Data Collection for Quality Improvement

For internal data collection, a different inter-rater reliability process is used for each of three sectors: PreK, licensing staff, and nutrition assessors. These processes were recently revised to reflect more realistic expectations, which benefited the Quality Rated program. The purpose of internal data collection is to drive continuous quality improvement by analyzing the data and making adjustments where needed. For example, one possibility currently being considered is simplifying the assessor report while remaining true to the lens assessors look through when making observations. Any change made to the report will be based on what the data say about reliability and other considerations.
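Georgia's two-tier standard lends itself to a simple monitoring rule: flag anyone whose latest check falls below the floor for their role. The sketch below is hypothetical in its data model and field names; only the 85 and 90 percent floors come from the webinar.

```python
# Georgia's stated floors: 85 percent for assessors, 90 percent for
# anchors (checked against ERSI staff). Record structure is hypothetical.
RELIABILITY_FLOOR = {"assessor": 85.0, "anchor": 90.0}

def flag_below_standard(checks):
    """Return the names of staff whose most recent reliability check
    fell below the floor for their role."""
    return [c["name"] for c in checks
            if c["agreement_pct"] < RELIABILITY_FLOOR[c["role"]]]

checks = [
    {"name": "Assessor A", "role": "assessor", "agreement_pct": 88.0},
    {"name": "Assessor B", "role": "assessor", "agreement_pct": 82.5},
    {"name": "Anchor C",   "role": "anchor",   "agreement_pct": 89.0},
]
print(flag_below_standard(checks))  # -> ['Assessor B', 'Anchor C']
```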

External data are also collected as part of the Quality Rated program. For example, a survey is given to providers to measure their engagement with and expectations for the program. It is important to be able to show providers that the scores they receive are not related to the assessors themselves but to the conditions and events being observed. Because of inter-rater reliability standards, it should not matter which assessor completes the observation; the score should be reliable in each case. The program also holds focus groups among providers and gives providers information on how reports are constructed and used.

Challenges Faced

Some of the challenges discussed during the question-and-answer session included the following:
- It is difficult to minimize the stress on assessors, though it is possible to reduce it somewhat.
- It is important to give assessors time to learn and grow through discussing their ratings with other assessors, but it is also important to maintain the integrity of the data and to put in place procedures that prevent data from being altered.
- The Quality Rated program uses the anchor assessor model instead of the consensus model. The consensus model can seem more intuitive to some people, but the anchor assessor model works better for the program.
- There have been many ERS clarifications, and the process for communicating this information must be functional and efficient. Otherwise, some assessors may not receive the same information as others, or may interpret it differently.

RESOURCES

- Assessor Management and Inter-Rater Reliability Webinar Slides
- ELC TA Team Members
- Pennsylvania Early Learning Keys to Quality (coordinated by the Berks County Intermediate Unit)
- Shared Resources on Assessor Management & Inter-Rater Reliability
- State Comparison: Assessor Management Systems