The Situational Judgement Test in Selection: A Medical Application


Abstract

This poster describes the development of a Situational Judgement Test (SJT) to select applicants for training in General Practice in the UK. The new test is used to assess 8,000 applicants per annum. Issues concerning best practice and a future research agenda are explored through a case study approach.

Press Paragraph

This paper explores best practice in the use of SJTs in selection. The criterion-related validity of SJTs in predicting work performance criteria is well established (Chan & Schmitt, 2005; McDaniel et al., 2001), and the use of SJTs in large scale selection is undergoing a revival. From a practical perspective, however, the business case has yet to be presented. We present a case study of an SJT developed to select applicants for training in General Practice in the UK. The new test is used to assess 8,000 applicants per annum. Issues concerning best practice and a future research agenda are presented.

Introduction

The use of Situational Judgement Tests (SJTs) in large scale selection is undergoing a revival. Part of the reason is that SJTs have high face and content validity, and recent studies have shown SJTs to produce significant incremental validity (Chan & Schmitt, 2005; McDaniel et al., 2001). In addition, by focusing less on cognitive aspects of performance, they tend to show smaller group differences than other selection measures. This paper explores best practice in the development and use of SJTs in a large scale selection process.

In establishing best practice use of SJTs, there are many issues to be examined. For example, although there is a substantial research literature exploring criterion-related validity, there is less exploring construct validity (i.e. what these tests are measuring). Other key factors also require exploration, such as how candidate instructions are presented and their effect on response processes, fairness issues, susceptibility to faking, and so on. Many current research studies are based on data collected from incumbent groups, who may differ in their response patterns from candidates. From a practical perspective, the business case has yet to be presented: a research agenda must explore the relative utility and cost-effectiveness of SJTs. To explore these issues, we present a case study of an SJT developed to select medical doctors for training in General Practice (GP) in the UK National Health Service (NHS).

Context

There are several critical features of the host organisation relevant to this study:
(1) The selection process occurs in the context of significant organisational change, the largest since the inception of the NHS. A competency-based training pathway has been introduced, replacing the previous house officer positions for junior doctors with two years of Foundation training.
(2) The selection is a coordinated single national process, run in parallel by the regional Deaneries responsible for the GP training programmes.
(3) This is a very high profile, high stakes selection process. There is a high degree of public interest, as public funds are invested in training GPs. Incompetent doctors cannot be allowed to proceed, and the assessments must be fair and transparent.
(4) The applicants themselves are articulate, have a strong professional organisation to support them, are above average in cognitive ability, and are very committed to their profession. They have a strong need to succeed in this selection; if they fail, they cannot apply to another employer in the UK. Independent coaching programmes exist whose sole purpose is to help applicants negotiate their way through the selection process.

Current selection process and selection tools

There are approximately 8,000 applicants for training in GP in the UK per annum. These are doctors who are already registered with the General Medical Council (GMC, the regulatory body for medical doctors) as competent to practise. GP training is a specialist training, typically of three years, similar in structure to the hospital-based training for other medical specialisms. A new competency-based selection system has been developed and validated (Patterson et al., 2001, 2005). The selection procedure is based on a validated competency model (Patterson et al., 2000), used as the national framework to guide selection criteria for each of the three stages:

1. Long-listing: via a national online application form, through which applicants who do not meet basic criteria (e.g. registration with the GMC) are rejected.
2. Short-listing: candidates respond to seven white space competency questions on the application form (see Patterson et al., 2000), with 250-word responses allowed. Each form is double-blind marked to ensure reliability. Candidates also complete a clinical knowledge test.
3. Assessment centre: comprising various work-relevant simulations, including a group challenge, a written exercise and a simulated consultation with a patient.

Business Case

The shortlisting process was redesigned for three key reasons. First, some candidates were found to have purchased their responses to the competency questions, and it was difficult to verify who had actually completed the application form. Second, a cost analysis showed assessor time spent on scoring to be approximately £250,000 per annum. Third, the existing clinical knowledge test was to be phased out owing to a change in policy for accreditation and selection to medical specialisms. By developing a new machine-marked SJT, sat under test conditions, we aimed to:
1. Provide sufficient information against the competency criteria for an effective sift.
2. Provide a relatively cost-efficient process to deal with large numbers of candidates (especially by reducing assessor time).
3. Be fair and defensible.
4. Be less susceptible to cheating or faking.

Design Process

Four competency domains were targeted in the new SJT (replacing the original white space competency questions) for shortlisting purposes: (i) Problem Solving; (ii) Coping with Pressure; (iii) Professional Integrity; (iv) Empathy. An item banking approach was used to allow multiple forms of the test to be produced. The development and validation of one of the four domains (Problem Solving), carried out in early 2006, is presented here for illustration; data for the remaining three domains, currently in advanced stages of development for use in February 2007, will be presented in the final poster. Questions were developed using scenarios typical of a GP consultation which required candidates to apply a problem solving approach to their clinical knowledge. Thus, rather than being asked to identify which disease a symptom related to, or how a drug was used, candidates dealt with patient-related scenarios and had to select responses reflecting a diagnostic process or a management strategy for the patient.

Method

Item Writers: A group of subject matter experts (N = 16), also trained in assessment, was convened to develop questions for each of the competencies. Three psychologists provided psychometric expertise and developed a test specification indicating content areas, question types/response formats, etc., to ensure consistency of presentation of questions.

Response Format Design: Various multiple choice designs were tested, including single best answer questions and extended matching questions.
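To make the machine-marking concrete, the following is a minimal sketch, in Python, of how keyed single-best-answer items drawn from an item bank might be scored automatically per competency domain. It is an illustration only, not the operational system: the item identifiers, keys and form composition are invented for the example.

# Minimal sketch of machine-marked scoring for single-best-answer SJT items.
# All item IDs, keys and domain labels below are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: str   # e.g. "PS-001" (hypothetical identifier)
    domain: str    # one of the four competency domains
    key: str       # keyed best answer, e.g. "B"

# A test form is an ordered selection of items from the bank, so several
# parallel forms can be assembled from the same pool.
FORM_A = [
    Item("PS-001", "Problem Solving", "B"),
    Item("PS-002", "Problem Solving", "D"),
    Item("EM-001", "Empathy", "A"),
]

def score_form(form: list[Item], responses: dict[str, str]) -> dict[str, int]:
    """Return per-domain totals: one mark per response matching the key."""
    totals: dict[str, int] = {}
    for item in form:
        totals.setdefault(item.domain, 0)
        if responses.get(item.item_id) == item.key:
            totals[item.domain] += 1
    return totals

# Example: one candidate's answers (PS-002 is answered incorrectly).
print(score_form(FORM_A, {"PS-001": "B", "PS-002": "A", "EM-001": "A"}))
# -> {'Problem Solving': 1, 'Empathy': 1}

Because scoring reduces to comparing responses against stored keys, per-domain sift scores for thousands of candidates can be produced with minimal assessor time, which is the core of the business case above.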

Item Review Process: All items were reviewed by multiple members of the item writing group. After revisions, six trial test versions were constructed and administered to applicants together with the standard clinical knowledge test and the original competency questions. Each test form was completed by over 600 candidates.

Psychometric Analyses: Item analyses examined the performance of the items. Factor analysis was used to examine the structure of the item set; standard item and distractor analyses were conducted; and a Mantel-Haenszel procedure was used to examine differential item functioning (DIF) by gender and place of training. An ANOVA approach was used to identify differential functioning by candidate age. On the basis of the results, items were selected for inclusion in the bank, placed in a pool for rewriting, or rejected. Item writing for the other competencies (Empathy, Integrity and Coping with Pressure) followed a similar process.

Validation: Data collection for validation against later-stage assessment data is currently underway. This includes ratings by experienced assessors at the assessment centre against the competency areas measured by the SJT.

Results

For the Problem Solving domain, 289 items were drafted, of which 216 were selected for inclusion in trial versions of the SJT. In total, 5,078 candidates completed one of these trial forms together with the operational knowledge test; individual forms were completed by between 630 and 954 candidates. The difference in performance on the knowledge test between the best and worst performing groups was just over one third of a standard deviation, small enough to consider the results of the different groups comparable. A small number of items were included in more than one test version to allow calibration of item difficulties across versions.

Factor analysis showed that only a single common factor could be found across all the items; however, this explained only a small part of the variance of the items. That is, items for this more knowledge-based competency carry a lot of individual variance. The mean item facility was 0.67, with a range from 0 to …. Because this was only one of four competencies on the basis of which the sift is made, a relatively low cut-off could be used, and this level of difficulty was therefore generally appropriate. Figure 1 shows the distribution of item facility across all items.

***INSERT FIGURE 1 HERE***

Across all versions, item partials (the correlation between item score and operational test score) ranged from … to 0.46, with a mean of …. Figure 2 shows the distribution of item partials across all versions. 62 items were classified as good (item partials greater than 0.25) and 69 as moderate (item partials between … and 0.25); lower item partials indicated items to be classified as poor. Overall, 61% of items trialled had good or moderate psychometric properties.

***INSERT FIGURE 2 HERE***

A number of items were flagged for differential item functioning. These were reviewed, and items were dropped where bias in content unrelated to the measurement requirements was detected. Further results will be presented in the poster, exploring criterion and construct validity, candidate reactions and faking issues.
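For readers unfamiliar with these statistics, the sketch below shows how the reported quantities can be computed from a 0/1 item-response matrix: item facility, the item partial against the operational knowledge test, the good/moderate classification used above, a Mantel-Haenszel common odds ratio for DIF screening, and a crude anchor-item comparison of difficulty across forms. This is a minimal illustration, not the study's analysis code: the simulated data, the score banding used as the DIF stratifier, and the lower bound of the moderate band (the value is missing in the text) are our assumptions.

# Minimal item-analysis sketch: facility, item partials, MH DIF, anchoring.
# Data and several thresholds are illustrative, not taken from the study.
import numpy as np

def facility(item: np.ndarray) -> float:
    """Item facility: proportion of candidates answering correctly (0/1)."""
    return float(item.mean())

def item_partial(item: np.ndarray, criterion: np.ndarray) -> float:
    """Correlation between item score and an external criterion score
    (the operational knowledge test); a point-biserial for 0/1 items."""
    return float(np.corrcoef(item, criterion)[0, 1])

def classify(partial: float, moderate_floor: float = 0.15) -> str:
    """good > 0.25 as in the text; the moderate band's lower bound is
    missing from the text, so 0.15 is an assumed placeholder."""
    if partial > 0.25:
        return "good"
    return "moderate" if partial > moderate_floor else "poor"

def mantel_haenszel_or(item, group, strata) -> float:
    """Mantel-Haenszel common odds ratio for DIF screening: 2x2 tables
    (group x correct/incorrect) pooled over ability strata. Values far
    from 1 flag an item for content review."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum((group[m] == 0) & (item[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))  # focal, incorrect
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    return num / den if den else float("nan")

def anchor_shift(fac_a: dict, fac_b: dict) -> float:
    """Mean facility difference on anchor items shared by two forms:
    a crude calibration of item difficulty across test versions."""
    shared = fac_a.keys() & fac_b.keys()
    return float(np.mean([fac_a[i] - fac_b[i] for i in shared]))

# Toy demonstration on simulated candidates (not study data).
rng = np.random.default_rng(0)
item = rng.integers(0, 2, 600)                   # one 0/1-scored item
knowledge = 10 * item + rng.normal(50, 10, 600)  # operational test score
group = rng.integers(0, 2, 600)                  # e.g. place of training
bands = np.digitize(knowledge, np.quantile(knowledge, [0.25, 0.5, 0.75]))
print(facility(item), classify(item_partial(item, knowledge)),
      round(mantel_haenszel_or(item, group, bands), 2))

In practice each function would be applied column-wise across all items in a form, with flagged items routed to the review pool described above.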

Discussion and Conclusions

The use of an SJT for shortlisting purposes in this context has provided increased utility for the organisation. The item bank developed is of sufficient size to create operational tests for assessing the Problem Solving competency going forward, and a group of assessors has acquired item-writing skills that can be used to develop further items to maintain the bank. The combination of psychometric skills with the expertise of the subject matter experts allowed the development of a psychometrically sound, fit-for-purpose assessment. The test has clear face and content validity and, for the organisation, the ongoing costs of shortlisting are significantly reduced. Future research must examine the criterion-related validity of the test in predicting work performance criteria.

Having used a detailed case study to explore the issues, we will discuss best practice use of SJTs. Specifically, we will respond to seven key areas:
1. Content validity: Are some competency domains (e.g. integrity, empathy) more readily assessed using an SJT than with other selection tools?
2. Construct validity: What is the relationship between scores on the SJT and dimensions of personality?
3. Test format and instructions: How should candidate instructions be presented, and what effect does this have on the response process?
4. Fairness issues: How do we minimise adverse impact in the design process?
5. Susceptibility to faking: What are the issues regarding social desirability?
6. Cost-effectiveness: What is the net gain in dollars of using an SJT over competency questions?
7. Fairness and candidate reactions: In this occupational group, do candidates perceive this as a fair and legitimate selection tool?

A summary of the evidence from both research and practice will be used to present a future research agenda, with recommendations for best practice use of SJTs in large scale selection.

References

Chan, D. & Schmitt, N. (2005). Situational judgment tests. In N. Anderson, A. Evers & O. Voskuijl (Eds.), Blackwell handbook of selection (pp. …). Oxford: Blackwell.
McDaniel, M.A., Morgeson, F.P., Finnegan, E.B., Campion, M.A. & Braverman, E.P. (2001). Use of situational judgment tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86(4), ….
Patterson, F., Ferguson, E., Lane, P., Farrell, K., Martlew, J. & Wells, A. (2000). A competency model for general practice: implications for selection, training and development. British Journal of General Practice, 50, ….
Patterson, F., Ferguson, E., Norfolk, T. & Lane, P. (2005). A new selection system to recruit general practice registrars: preliminary findings from a validation study. British Medical Journal, 330, ….
Patterson, F., Lane, P., Ferguson, E. & Norfolk, T. (2001). Competency based selection system for general practice registrars. BMJ Career Focus, 323, S….

FIGURE 1: Distribution of item facility across all items (x-axis: facility; y-axis: frequency)
FIGURE 2: Distribution of item partials across all items