IBM Workforce Science. IBM Kenexa Ability Series Computerized Adaptive Tests (IKASCAT) Technical Manual
IBM Workforce Science
IBM Kenexa Ability Series Computerized Adaptive Tests (IKASCAT) Technical Manual
Version UK/Europe
Release Date: October 2014
Copyright IBM Corporation. All rights reserved.
Table of Contents

Chapter 1: What is IKASCAT?
  1.1 Introduction
Chapter 2: Assessment Content
  2.1 Assessment Components
    2.1.1 Logical Reasoning Test
    2.1.2 Numerical Reasoning Test
    2.1.3 Verbal Reasoning Test
Chapter 3: IKASCAT Utilizes CAT Technology
Chapter 4: Development of IKASCAT CAT System
  4.1 CAT System Design and Development
  4.2 CAT Content Development
  4.3 CAT Implementation and Maintenance
Chapter 5: Why Use Psychometric Tests?
  5.1 Why Use Cognitive Ability Tests?
  5.1.2 Generalized Validity of Ability Tests
Chapter 6: CAT and How It Is Used in IKASCAT
  6.1 What is CAT?
  The Advantages of Using CAT
  What is Item Response Theory (IRT)?
  Item and Test Information
  Parameter Estimation
  Theta Estimation
  Item Parameter Estimation
  Building Appropriate CAT Strategies
  Starting rule for selecting the first item
  Item selection algorithm
  Item scoring and updating ability procedure
  Constraints on item selection
  Stopping Rule
Chapter 7: Administration, Scoring & Reporting
  7.1 Administration
  7.2 Scoring
  7.3 Reporting
Chapter 8: Summary Stats and Group Differences
  'Norming' and Norms Groups Available
  Logical Reasoning Test
  Numerical Reasoning Test
  Verbal Reasoning Norms
  Group Differences
  Setting cut-off scores
  Group Differences: LRT
  Group Differences: NRT
  Group Differences: VRT
Chapter 9: Reliability
Chapter 10: Validity
  Defining Validity
  Criterion Validation Studies
  What Do Employers Get From Using IKASCAT?
  What do employers get from the LRT?
  What do employers get from the NRT?
  What do employers get from the VRT?
Chapter 11: Equating IKASCAT to Infinity Series
  Can a PBT or CBT (static form) be equated to CAT?
  Framework of Score Linking
  Purpose of Equating
  Linking Design
  Data Collection Design for Equating
  Classical Equating Methods
  IRT Equating Methods
  Establishing relationship between IKASCAT and Infinity Series
Chapter 12: Validation Studies (LRT, NRT, VRT)
Chapter 13: References
Chapter 1: What is IKASCAT?

1.1 Introduction
One of the main interests in the field of occupational psychology lies in the area of recruitment and selection, and the identification of factors which can predict successful occupational performance. Researchers have compared possible predictors of job performance, such as biographical data, references, educational level, college grades, interviews, ability tests and personality questionnaires, and the general consensus of the research is that the best predictor of occupational performance is cognitive ability (Schmidt & Hunter, 1998; Gottfredson, 2002). IBM's Kenexa Ability Series Computerized Adaptive Test (IKASCAT) is a suite of assessments that measures three of the major components of cognitive ability (Logical Reasoning, Numerical Reasoning, and Verbal Reasoning). The IKASCAT utilizes computerized adaptive testing (CAT), which adapts to a test taker's responses, presenting test takers with items that most closely reflect their ability and calculating their ability in the most accurate and secure manner available. IKASCAT measures cognitive abilities that are important predictors of job performance and training success. Schmidt and Hunter's (1998) review of over 85 years of research into personnel selection identified tests of cognitive ability as the best predictors of job performance and training success. IKASCAT measures such cognitive abilities, assessing both deductive reasoning skills (using verbal and numerical formats) and inductive reasoning skills (using abstract/logical reasoning formats) for use in work-related settings. There are two distinct parts to IKASCAT: the Assessment Content (i.e. the assessments themselves, including all of the questions asked) and the CAT system (i.e. the administration and scoring system that delivers the questions and produces scores based on the test taker's answers). Both of these are described in some detail later in this document.
This technical manual has been written for users of the IKASCAT, and provides the following:
- Descriptions of the assessments themselves
- The rationale behind IBM developing CAT systems for assessments
- An explanation of what CAT is
- A summary of the development of the CAT system
- Statistical details on IKASCAT
- Information on administration, scoring and reporting
- Examples of reports produced
Chapter 2: Assessment Content

The questions (or items) used in IKASCAT were written and reviewed by a team of occupational psychologists and psychometricians with combined test development expertise in excess of 100 years, and from a range of English-speaking countries (including the UK, Ireland, the US, New Zealand, Singapore and South Africa). IBM Workforce Science has developed computer-administered psychometric tests for the last 25 years, and online ability assessments produced by IBM psychologists are used with tens of millions of test takers annually. The development work started formally in January 2012 and continued until the final pilot studies in June. The assessment went through several iterations until the final pilot study, the results of which provide some of the technical and statistical information in this manual.

General design criteria have been applied in developing this assessment. The most important of these general criteria is the quality of the questions asked (the items). Great care was taken in choosing the format, structure, and appearance of the items. All items have been checked, reviewed, modified if necessary, trialled and re-trialled. All items have been reviewed for issues of legality, particularly concerning diversity or disability, and to ensure that local idioms are avoided and offence is not caused by any of the questions asked. The core stems or stimuli used (the information on which the items are based) need to allow test takers to show their ability to draw conclusions, make deductions and infer logically from the information provided. Multiple choice questions were developed: test takers need to choose the correct answer from a range of possible options, and in each case one and only one of the possible options is correct. Items (for the NRT and VRT in particular) cover a wide range of subject matters. Different IRT scoring methods were used, including Rasch scoring, 2PL and 3PL models.
2.1 Assessment Components
The IKASCAT is composed of three assessments: Logical Reasoning (LRT), Numerical Reasoning (NRT) and Verbal Reasoning (VRT). These are designed for use in an unproctored internet (or online) testing context, in both CAT and non-CAT contexts (IRT scoring or traditional scoring).
2.1.1 Logical Reasoning Test
Logical reasoning is the ability to analyse situations, identify patterns and relationships that underpin these situations, and derive or extrapolate from these. This is a necessary condition for all logical problem-solving situations, particularly those requiring scientific, mathematical, engineering or financial problem solving. The Logical Reasoning Test (LRT) is designed to provide a fair, objective, rapid and practical measure of inductive reasoning. It measures a person's skill in evaluating the patterns and trends in information, without reference to written text or numerical data. The LRT has been developed to be a culture-fair assessment, useful in multi-cultural, multi-racial or multiple-language contexts. Inductive reasoning is the process of reasoning from specific premises or observations to reach a general conclusion or overall rule. Deductive reasoning denotes the process of reasoning from a set of general premises to reach a logically valid conclusion. Deductive inferences draw out conclusions that are implicit in the given information, whereas inductive inferences add information in order to draw a conclusion. The information in the LRT questions comes in the form of abstract forms or shapes that have been changed or modified across a series of stages. One of these stages is missing, and candidates need to analyse the information to choose which one of a series of options would complete the series logically.

The LRT does require the test taker to:
- Attend to the information available (i.e. the characteristics of the forms and shapes)
- Identify the relationships, patterns and trends in the information
- Derive a set of rules that can support the relationship
- Apply these rules to correctly identify the required answer

The LRT does not require the candidate to:
- Use prior knowledge or have knowledge of a particular subject or area
- Have learned or acquired a particular skill
- Be a speaker of a particular language

2.1.2 Numerical Reasoning Test
The Numerical Reasoning Test (NRT) is a test of deductive reasoning, one of the major components of fluid intelligence, a concept originally identified by Raymond Cattell (1971). Numerical reasoning is the ability to evaluate numerical information critically, understand patterns and trends in data, and draw logically valid inferences from the information presented. The NRT is designed to provide a fair, objective, rapid and practical measure of deductive reasoning, using numerical information.
The content of this test is representative of numerical information likely to be encountered within a business context, thus providing wide applicability across a range of professional and managerial selection, development and recruitment activities. Managerial and professional roles inherently require employees to deal frequently with complex numerical data, for example in financial planning, market analysis and problem-solving situations. The NRT was therefore designed to assess this level of numerical reasoning ability. Questions needed to:
- Be easy to read and assess
- Present information in the simplest format possible
- Include realistic scenarios
- Use real data sets (simplified and modified for use in assessment)
- Involve simple arithmetic operations such as addition, subtraction, multiplication and division
- Involve the use of whole numbers (integers), decimals and fractions
- Involve the use of ratios and percentages
- Present information in the form of charts, graphs and tables (often a combination of these)

The NRT does require the test taker to:
- Evaluate numerical information critically
- Understand patterns and trends in the data presented
- Carry out simple computational analysis in order to come to the correct conclusions

The NRT does not require the candidate to:
- Have prior knowledge of the numerical content in the stimuli
- Apply complex formulae
- Have knowledge of complex mathematical methods

2.1.3 Verbal Reasoning Test
The Verbal Reasoning Test (VRT) is designed to provide a fair, objective, rapid and practical measure of deductive reasoning, using written information. It measures a person's ability to critically evaluate information presented in a written verbal format. In addition to understanding written communication, the VRT also encompasses the ability to understand complex discussions and other verbal interactions.
Many jobs involve working with verbal information, and verbal comprehension forms a core component of almost all professional and managerial roles. The VRT offers a high-level assessment of the verbal reasoning processes that people use almost daily when analysing and evaluating the detailed content of reports and other business documentation, produced by themselves, by colleagues or by outside agencies. In many organisations, verbal reasoning skills are key to the effective dissemination of business information, upwards and downwards, right across the workforce.

Most of the items in the VRT include a number of short passages of text followed by statements based on the information given in the passage. Candidates are asked to indicate whether the statements are true or false, or whether it is not possible to say either way. In answering these questions, candidates use only the information given in the passage and should not try to answer them in the light of any more detailed knowledge that they personally may have. Test developers needed to:
- Make passage length as short as possible (around 120 words)
- Take into account general reading speed
- Avoid grammatical or vocabulary complications
- Ensure that the information in the passage was factually correct
- Ensure that the information in the passage was not controversial
- Ensure that the information in the passage was not emotionally affective (i.e. something people may react to emotionally)
- Develop passages that were similar to short articles found on websites, in newspapers or magazines

The VRT does require the test taker to:
- Analyze and critically evaluate verbal information
- Understand complex arguments or positions in written communication
- Draw appropriate inferences from complex written information

The VRT does not require the test taker to:
- Have prior knowledge of the factual content in the passages
- Have technical knowledge of grammar
- Spot errors in the spelling of unfamiliar words
- Show knowledge of acquired specialist vocabulary
Chapter 3: IKASCAT utilizes CAT Technology

IKASCAT utilizes CAT technology in order to provide test users such as hiring managers with the most efficient, effective and accurate method of assessing cognitive ability. IBM has invested millions in developing a bespoke CAT system because the psychometric testing literature shows that CAT has a range of significant advantages over conventional online testing. These advantages include:
- Shorter test length (more than 50% fewer questions required)
- Shorter test duration (between 30% and 50% saving in time required)
- Greater measurement accuracy and test reliability
- Increased test taker motivation
- Improved test taker experience
- Increased test effectiveness (better at differentiating between candidates)
- Greater test security (particularly important with unsupervised testing)
- Greater scope for enhancement and updating

These advantages are elaborated on, and fully referenced, in Chapter 6 of this document. Other considerations involve the use of online assessments with the diversity of candidates expected. Fixed-length, timed ability tests are the most commonly used format outside North America, and fixed, timed versions of ability tests show larger differences between disabled and non-disabled candidates than untimed assessments (REFERENCE NEEDED). IBM presented a paper at the BPS Division of Occupational Psychology Conference 2014 (Keeley, S., & Parkes, J., 2014a) which showed that adjustments in test time (i.e. increasing the time allowed) reduced differences between disabled and non-disabled candidates but did not remove them, as some disabled candidates still timed out even when given extra time.
Accordingly, being untimed, CAT tests have additional advantages:
- Candidate performance is maximized
- They deal better with the adjustments required by disabled candidates (no need to add additional time, as the assessments are untimed)

The IKASCAT utilizes computerized adaptive testing, so item administration is tailored to the ability of each individual test taker. Each test is likely to have a unique combination of items; items are drawn from an item bank (or database) containing a large number of individual items and their psychometric
characteristics (e.g. item difficulty). Tests are constructed based on a number of criteria, the most important of which is the test taker's performance during the test itself. The items presented are selected based on how the test taker has answered previous questions. If the test taker answers correctly, a more difficult item is administered; if the test taker answers incorrectly, an easier item is administered. The test adapts itself to the test taker's ability. Accordingly, lower-ability test takers will be presented with easier questions than higher-ability test takers. This means that test takers may have answered the same number or percentage of questions correctly, but the higher-ability test takers will score better because they have answered more difficult questions.

The psychometric models behind IKASCAT are item response theory (IRT) models for both dichotomously scored items (i.e. scored 0 and 1) and polytomously scored items (i.e. scored in more than two categories), for a variety of possible item types and formats. In particular, the IRT models available for the IKASCATs are the three-parameter logistic (3PL) model, the two-parameter logistic (2PL) model and the one-parameter logistic (1PL) model, or Rasch dichotomous measurement model. The IRT models adopted for the development of the IKASCATs are important building blocks that enable candidates' performances on the cognitive ability assessments to be scored in real time and made comparable. IBM's CAT system built around the IRT models is the most advanced CAT system in the industry, with its signature components (Item Banker, CAT engine, CAT delivery and CAT management system) hosted on the Assess on the Cloud platform. IBM began its pre-production process for both the CAT system development and the content development based on the test specification or blueprint. The following chapter explains how this CAT system was developed and what it actually entails.
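To make the IRT models named in this chapter concrete: the 3PL model expresses the probability of a correct response as a function of a test taker's ability (theta) and three item parameters (discrimination a, difficulty b, and guessing c); the 2PL and 1PL/Rasch models are special cases. The sketch below is illustrative only, not IKASCAT's implementation, and the parameter values are invented for the example:

```python
import math

def p_correct(theta, a, b, c=0.0):
    """3PL item response function: probability that a test taker with
    ability `theta` answers an item with parameters (a, b, c) correctly.
    Setting c = 0 gives the 2PL model; additionally fixing a across
    items gives the 1PL/Rasch model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b, a 2PL item is answered correctly half the time;
# a 3PL item with guessing parameter c sits at c + (1 - c) / 2.
print(round(p_correct(0.0, a=1.2, b=0.0), 3))         # 2PL special case
print(round(p_correct(0.0, a=1.2, b=0.0, c=0.2), 3))  # 3PL with guessing
```

This is the building block an adaptive engine uses both to score responses and to decide which item would be most informative next.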
Chapter 4: Development of IKASCAT CAT System

Based on the most popular psychometric models, IKASCAT was developed in three phases: system design and development, content development, and implementation/maintenance. These are shown in Figure 1 below.

Figure 1. Phases of Development for the IKASCATs

4.1 CAT System Design and Development
During phase one (CAT System Design and Development), the CAT system was designed to accommodate dichotomous and polytomous IRT models and popular item types (e.g. multiple choice, rating scale, forced choice). It accommodates both unproctored and proctored internet-based testing (IBT), as well as multiple languages. A psychometric design and programming guideline was
produced to guide development of a CAT system, based on optimal conditions identified via Monte Carlo simulation studies. A large team of experts in programming and psychometrics developed, and conducted quality control checks on, the programming code from spring 2012 through spring. As a result, a series of improvements was made to the system to enhance its usability and scoring accuracy. Further improvements have been made to CAT's scoring and effectiveness since the initial CAT system development phase.

The CAT system consists of an item banking system, a test engine, and a test management and delivery system. The item banking system (or banker) stores item content and the psychometric properties associated with each item (or question). The test engine module reads the psychometric characteristics of items from the item banker, administers items adaptively and estimates the ability for each content domain. The engine also records, processes and stores all item response data, item records and ability estimates. The test management and delivery system takes in candidate registration information from the applicant tracking system (ATS) and controls administration, allowing unlimited access to CAT via the Internet around the globe. It also produces final scores, such as raw or scale scores, and reports the results (item responses, ability estimates and psychometric item characteristics) to end users, internally and externally. The CAT delivery and management module is integrated with IBM's signature assessment platform, Assess on the Cloud. CAT administration and score reporting follows the standard procedural order of Assess: authoring and publishing CATs into Assess, scheduling, delivery and score reporting (see Figure 2 below).

Figure 2.
CAT Management and Delivery via Assess
[The figure shows the Assess workflow: Item Banking (test creation/customization; Master Catalog; Custom Catalog); Authoring; Scheduling (standalone scheduling, or scheduling via integration with 2x Solutions (2xB, ATS)); Reporting (standard reports, custom reports, summary statistics, test and item analysis); Delivery (online, mobile, print/scan; on-demand via integration with 2x solutions (2xB, ATS)).]
4.2 CAT Content Development
IBM Workforce Science has developed computer-based tests for the last 25 years. Drawing on this extensive test development experience and expertise, more than 20 I-O psychologists and content experts, as well as psychometricians, were involved in the content development process for IKASCAT. The full-cycle development process is presented below:
- Collected and reviewed item content and characteristics of existing cognitive ability assessments; as the test will be used globally, each item was reviewed to ensure cultural sensitivity across multiple languages.
- Identified the item type/style/format for use in CAT.
- Recruited item writers from a range of global geographic regions and cultures, including many English-speaking countries (UK, Ireland, US, South Africa, Australia, and New Zealand) as well as China, Pakistan, Hong Kong, Singapore, France, and Germany.
- Conducted item writing training sessions via web conferences to ensure consistency.
- Wrote new items.
- Conducted bias and sensitivity reviews to ensure that new items were free of bias.
- Assembled standalone pretesting (field testing, item tryout or item trial) given the psychometric conditions documented in the psychometric design: IRT model, sample size and demographics, data collection design, number/percentage of items covering each content section or domain, multiple form assembly, test publishing, testing window, test administration, delivery platform, data collection, and item linking and calibration.
- Performed final psychometric data review and final content review.
- Identified operational items and built the initial item pool for each subject (domain).
- Conducted simulation studies with the approved operational items to find the optimal conditions for building operational CATs.
- Planned new item writing and standalone pretesting, or embedded pretesting in live CAT, depending on the pool size.
Standalone pretesting, with multiple pretest forms assembled, was necessary to build up the initial item pool, since not all participants in the pretesting can see all items in a given test form. Two popular approaches to building a final item pool/bank (a process known as item linking) are using common items that are included between two adjacent pretesting forms or across all pretesting forms, or having a common group (or sample) of participants take all pretesting forms. Both of these approaches were used in building the final item pool/bank for the IKASCAT assessments; the former approach is known as common item linking, and the latter as common person linking.

4.3 CAT Implementation and Maintenance
In building an item pool and measurement scale for use in an adaptive test, it is critical to determine procedures for identifying items that do not perform well. Poor items should be removed from the pool as soon as they are identified; otherwise, they introduce bias into the ability estimates. It is necessary to evaluate item performance at intervals determined by job candidate volume, to see whether items are performing as the target functions require. It is possible for the difficulty of items to drift, or change, over time: sometimes items drift to be easier, other times harder. It is important to evaluate items for drift on an annual basis, and to update item parameter estimates when needed. At specified points in the test life cycle, item pools are refreshed to ensure model fit and to conform to specified security provisions. The current item refreshment plan is primarily concerned with replacing items that have been overexposed with new items. Further expansion of the banked items is underway, with new items being trialled and included in the item pool on an ongoing basis.
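One simple way to flag drifting items, in the spirit of the monitoring described above, is to compare an item's observed proportion correct against the proportion its calibrated parameters predict for the respondents who saw it. The sketch below is an illustrative approach, not IBM's procedure; it uses the Rasch model for simplicity, and the flagging threshold is an assumption:

```python
import math

def rasch_p(theta, b):
    """Rasch (1PL) probability of a correct response for ability theta
    on an item with calibrated difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def flag_drift(item_b, respondent_thetas, responses, threshold=0.10):
    """Flag an item whose observed proportion correct differs from the
    model-predicted proportion by more than `threshold` (assumed cut-off).
    `responses` are the 0/1 scores of the same respondents."""
    expected = sum(rasch_p(t, item_b) for t in respondent_thetas) / len(responses)
    observed = sum(responses) / len(responses)
    return abs(observed - expected) > threshold, observed, expected

# An item calibrated at b = 0 that average-ability respondents answer
# correctly far more often than predicted has likely drifted easier
# (e.g. through overexposure).
drifted, obs, exp = flag_drift(0.0, [0.0] * 10, [1] * 9 + [0])
print(drifted, obs, round(exp, 2))
```

In practice a significance test (and far larger samples) would replace the fixed threshold, but the comparison of observed versus model-expected performance is the core of any drift check.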
Chapter 5: Why use Psychometric Tests?

The term psychometric means mental measurement. Consequently, psychometric tests are devices that measure psychological characteristics such as intelligence, personality, or the ability to perform a particular task. One major benefit of psychometric tests is that they are designed as systematic and standardised methods of measurement. In practice this means that the questions asked are consistent for every person who completes the test, the instructions they are given are consistent, and the conditions under which they complete the test should be controlled and as standardized as possible. With standardized practices, we are able to compare the results from tests taken at different times and in different places. Test developers also put in place systems for scoring their tests (for almost all Kenexa assessments this is computerized), enabling us to score and interpret the results in a consistent way. Another characteristic of psychometric tests is that they are designed to obtain a snapshot or sample of a person's ability or characteristics upon which we can make an assessment. An alternative would be, for example, to observe a person continuously in order to assess their ability, but this would be impractical. Designers of psychometric tests aim to ensure that the information obtained by assessing a sample of a person's ability can be reliably used to make an assessment of their ability in general. In order to make sense of the information obtained from a psychometric test, a person's results are often compared with those from a relevant group or population. For example, a person's results on a graduate ability test will be compared with the scores of a graduate population. Similarly, a person's results on a work-based personality questionnaire will be compared with those from a working population. Psychometric tests can be divided into those that assess maximum performance and those that assess typical performance.
Tests that assess maximum performance are designed to determine how well a person performs at their best. These types of test may be timed, with everyone given exactly the same amount of time to complete them, and they typically have right and wrong answers. Tests that assess maximum performance include ability tests and attainment tests. These are often timed, but the IKASCAT assessments are usually untimed, with no strict limit on the amount of time allowed for completing the test (although guidelines are often provided).

5.1 Why Use Cognitive Ability Tests?
Cognitive ability is one of the most studied constructs in psychology, with over 100 years of research behind it. Almost from the outset, work on the understanding of cognitive abilities has been conducted from an applied standpoint. For example, Alfred Binet, considered to be the developer of the first intelligence test, constructed measurements to understand the potential of children to benefit from educational instruction. This resulted in the first recognised test of mental ability being published in 1905 (Binet, 1905). This tradition of applied research has continued, particularly in the areas of education and personnel selection.
Measures of cognitive ability have always been recognised in the academic literature as the best general predictors of job performance, and are among the cheapest and most cost-effective methods to implement. The US Office of Personnel Management state on their website that cognitive ability tests are used because they are 'among the least expensive measures to administer and the most valid for the greatest variety of jobs'.1 As with many areas of psychology, there is no single agreed definition of what cognitive ability is. In his influential book on the structure of human abilities, Carroll (1993) argues that abilities need to be understood in the context of a specific task, with a cognitive task being any task in which correct or appropriate processing of mental information is critical to successful performance. Cognitive ability is 'any class of cognitive activity that concerns some class of cognitive tasks, so defined' (Carroll, 1993, p. 10). Carroll's model is particularly helpful as it not only provides a far-ranging map of intelligence, but also allows individual tests to be placed within this structure. As Carroll's model shows, at the level of Stratum I sit tests of specific abilities. Stratum II clusters these into broad families of tests, on the basis of factor-analytic research. For example, performance across sequential reasoning, induction and quantitative reasoning tests is assumed to be related to the underlying influence of fluid intelligence. In turn, performance on all tests is assumed to be influenced by a person's general intelligence, which forms Stratum III of Carroll's model. From the perspective of test development, it is important to recognise that most psychometric tests can exist only at Stratum I. Strata II and III of the model are abstractions hypothesised from the statistical analysis of test results and are never directly observed.
However, the weight of empirical research strongly suggests that these abstractions do have psychological reality (Carroll, 1993).

Figure 3. Carroll's Three-Stratum Model of Intelligence

1 Retrieved from apps.opm.gov/adt/content.aspx?page=2-02
5.1.2 Generalized Validity of Ability Tests
A number of major studies are often invoked to support the use of cognitive ability tests such as the Logical, Numerical and Verbal Reasoning tests included in IKASCAT. In 1998, Schmidt and Hunter reviewed over 85 years of research into personnel selection. This extensive synthesis of the literature identified tests of general mental ability (GMA)2 as the single best predictor of job performance and success on job-related training courses. Outtz's study (2002) showed significant correlations between cognitive ability tests and measures of job performance across a large range of jobs and roles. Ree et al. (1994) investigated the role of general cognitive ability and specific abilities or knowledge as predictors of work sample job performance criteria in seven jobs for US Air Force enlistees. Analyses revealed that cognitive ability was the best predictor of all criteria, and that specific abilities or knowledge added a statistically significant but smaller amount to predictive efficiency. These results are consistent with previous military studies, such as Army Project A. Schmidt and Hunter's major meta-analytical study (2004) presented extensive evidence that cognitive ability predicts both occupational level attainment and performance within one's chosen occupation, and does so better than any other ability, trait, or disposition, and considerably better than job experience. Other work, much of it involving meta-analysis, has further supported the validity of GMA in the prediction of job performance. Bertua, Anderson and Salgado (2005) examined the literature on criterion validity, and largely replicated previous work. Tests of GMA were seen to predict job performance (0.48) and training success (0.50). Validity was again seen to vary among occupations, ranging from 0.74 for professional roles to 0.32 for clerical roles. Bertua et al.'s work also studied different types of ability tests.
All test types studied had substantial validity. For measures of job performance, across 20 different samples (n = 3,410), numerical ability tests showed an operational validity of 0.42 and a 90% credibility value of 0.26. For measures of training success, across 46 different samples (n = 15,925), numerical ability tests showed an operational validity of 0.54 and a 90% credibility value of 0.43. For measures of job performance, across 14 different samples (n = 3,464), verbal ability tests showed a slightly lower operational validity of 0.39 and a 90% credibility value of 0.20. For training success, across 33 different samples (n = 12,679), verbal ability tests showed an operational validity of 0.49 and a 90% credibility value of 0.36. In each case, the positive credibility values indicate that the validity of these ability tests can be generalized across samples and settings.

2 General mental ability is the term frequently used in literature that summarises the results from research using a range of cognitive ability tests. Variations in the content and style of the tests are acknowledged. However, the positive manifold demonstrated by such tests, which implies an underlying construct influencing performance across different tests, is used to justify considering them all as assessments of the construct of general mental ability.
Chapter 6: CAT and how it is used in IKASCAT

6.1 What is CAT?

A Computerized Adaptive Test (CAT) is a test, administered by computer, which dynamically adjusts itself to the cognitive ability level of each test taker during the course of administration. CAT normally describes a test delivery method, as compared with conventional paper-and-pencil based testing (PBT). In a conventional PBT test of ability, every person takes the same fixed-form test, regardless of the item characteristics for a given level of ability. Typically, a conventional ability PBT presents items that measure mid-ability candidates well. This introduces more measurement error for those at the extreme levels of ability. In other words, it is wasteful if the hardest items are administered to candidates with the lowest ability level, or if the easiest items are administered to candidates with the highest ability level. Bored high-ability persons are likely to respond carelessly, and frustrated low-ability persons are more likely to respond in a random manner, so more errors of measurement of ability are introduced.

CAT creates and delivers a customized test for each respondent using computers (increasingly online), aiming to measure various psychological constructs such as ability, achievement, attitude and personality traits in the most efficient and effective way. CAT successively selects questions so as to maximize the precision of the test based on what is known about the candidate from previous questions. From the candidate's perspective, the difficulty of the exam seems to tailor itself to his or her level of ability. For example, if a candidate performs well on an item of intermediate difficulty, he or she will then be presented with a more difficult question; if the candidate performs poorly, an easier question will follow.
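The adaptive principle just described can be sketched as a deliberately simplified loop. This is only an illustration (the starting point and fixed step size are hypothetical); an operational CAT selects items by maximizing IRT information, as described later in this chapter.

```python
def next_difficulty(current, answered_correctly, step=0.5):
    """Simple 'staircase' rule: move up the difficulty scale after a
    correct response, down after an incorrect one."""
    return current + step if answered_correctly else current - step

# A candidate starts at intermediate difficulty (0.0 on a standardized
# scale), answers two items correctly, then misses the third.
difficulty = 0.0
for correct in [True, True, False]:
    difficulty = next_difficulty(difficulty, correct)

print(difficulty)  # 0.5  (0.0 -> 0.5 -> 1.0 -> 0.5)
```

Even this toy rule shows the key behaviour: the sequence of difficulties homes in on the level where the candidate's answers switch between right and wrong.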
Compared to static multiple-choice tests, where everyone is required to take a fixed set of items regardless of their ability (or construct) levels, CAT requires fewer test items to arrive at equally precise measures.

6.2 The Advantages of using CAT

Among many known advantages, efficiency and control of measurement precision are prominent. CATs are more efficient than conventional tests delivered via PBT (or non-CAT Internet-based testing, IBT). The test length for examinees can be reduced by 50% or more (i.e., the variable-length feature of CAT). A properly designed CAT can measure every examinee with the same degree of precision, which is not true of conventional PBT or non-CAT IBT. Figure 4 shows that the standard error of measurement remains similar, and low, across the full range of ability.

Figure 4. Degree of Precision: Conditional Standard Error of Measurement across Ability Estimates (Thetas)
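The flat conditional standard error shown in Figure 4 follows from every candidate receiving items matched to his or her own ability. A minimal numerical sketch, using the 2PL information function defined in Section 6.3 with hypothetical item parameters; the "adaptive" test is idealized here as a set of items centred exactly on the candidate's ability:

```python
import math

def p_2pl(theta, a, b, D=1.7):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def info_2pl(theta, a, b, D=1.7):
    """2PL item information: D^2 * a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b, D)
    return (D * a) ** 2 * p * (1.0 - p)

def csem(theta, items):
    """Conditional standard error = 1 / sqrt(total test information)."""
    return 1.0 / math.sqrt(sum(info_2pl(theta, a, b) for a, b in items))

offsets = (-0.5, -0.25, 0.0, 0.25, 0.5)
# A fixed test clusters item difficulties around average ability (0.0).
fixed_test = [(1.0, d) for d in offsets]

for theta in (-2.0, 0.0, 2.0):
    # Idealized adaptive test: difficulties centred on this candidate's theta.
    adaptive_test = [(1.0, theta + d) for d in offsets]
    print(theta, round(csem(theta, fixed_test), 2),
          round(csem(theta, adaptive_test), 2))
```

The adaptive column is constant (about 0.55 with these parameters) at every ability level, while the fixed test's error grows sharply for candidates far from the middle of its difficulty range, mirroring the pattern in Figure 4.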
There are many additional advantages recorded in the literature with regard to CAT (Linacre, 2000; Rudner, 1998).

Test takers receive tests that are tailored to their actual ability level. This means that test takers are not given a series of irrelevant questions which are either too easy (and therefore do not tell us the highest level of performance for this test taker) or too difficult (and therefore only tell us that their highest level of performance is lower than this). Because CAT assessments adapt to the actual performance of the test taker, their approximate ability level is more quickly identified, and more specific questions can then be administered to enable more accurate identification of the test taker's actual ability level.

The adaptive nature of these assessments means that CAT tests are shorter in duration (around 50% shorter in terms of time, and up to 65% shorter in terms of questions presented). These CAT tests are most likely to be administered unproctored, but both on-site and off-site testing time will be reduced.

Overall, CAT tests are much more accurate (i.e. more reliable) than conventional static cognitive ability tests, or even tests in which items are administered randomly from a large item bank (Grelle, Dainis, & Hurst, 2009). The Kenexa Ability Series CAT tests use a minimum reliability of 0.8 for each test; some will be well in excess of this.

Test security is also increased by the use of CAT. Item exposure is reduced because fewer questions are administered. By comparison with Kenexa's non-CAT versions of these assessments, this might reduce the number of questions presented from 20 items for a fixed NRT test to 8 items or fewer for a CAT version. This means that each candidate sees fewer questions, and only sees questions which equate to their ability level. The methods used to score CAT assessments also mean that efforts to access a large number of items can be thwarted.
A maximum number of items per administration is set, and test sessions may time out if excessive time is taken over the test as a whole or over individual items. If test items do become over-exposed or compromised (through cheating or piracy), these items can be deleted from the item bank without affecting the integrity of the whole item bank. One of the advantages of the CAT methodology is that items can be deleted, and new items easily added, to the total item bank. Replacement and alternative items are constantly being trialled and added to the item banks for these assessments.

Despite the sophistication and complexity of CAT scoring, scores for test takers are immediately available. This is because the ability level (represented here by a score, or theta value) must be recalculated after every question in order to select the next question administered.

Another possible unexpected advantage is increased motivation. Linacre (2000) mentions increases in the motivation of candidates during CAT testing sessions. During the assessment, test takers might feel discouraged if the items are too difficult or, on the other hand, might lose interest if the items are too easy. As CAT assessments adapt themselves to a test taker's ability level, this enables the test taker to achieve their most accurate and highest score possible. The shorter test time is also likely to improve the test taker experience by reducing the chances of test fatigue, which should result in a reduction in drop-out rates, i.e. the number of test takers who leave the assessment unfinished.

6.3 What is Item Response Theory (IRT)?

Item response theory (IRT) is an important advance in the technology of psychometrics that provides benefits to tests and their stakeholders, including individualized score precision, better characterization of the concept of measurement error, and the possibility of CAT. The calculation of CAT scores is founded on the principles of IRT models.
As suggested, IRT consists of several families of mathematical models, including dichotomous, polytomous, and multidimensional models. This manual focuses primarily on dichotomous models, which are appropriate for data with two score categories, typically right and wrong or correct and incorrect, where the item type is multiple choice with three to five response options/alternatives, depending on the item domain area.

In dichotomous IRT, we assume that the relationship between a person and his or her response to an item can be explained by a specific mathematical function called the item response function (IRF). There are several commonly used models. One of these is the three-parameter logistic model (3PLM), which models the probability of a person j with a given ability θ_j (Greek letter theta) correctly responding to an item i as (Hambleton & Swaminathan, 1985):

P(X_i = 1 \mid \theta_j) = c_i + (1 - c_i) \frac{\exp[D a_i (\theta_j - b_i)]}{1 + \exp[D a_i (\theta_j - b_i)]} \quad (1)
where a_i is the item discrimination parameter, or slope; b_i is the item difficulty, or location parameter (or threshold); c_i is the lower asymptote, or pseudo-guessing parameter; and D is a scaling constant equal to 1.7 or 1.0.

Figure 5 illustrates an IRF for the 3PLM. The difficulty (0.0) is the inflection point of the IRF projected onto the ability continuum, where the probability of a correct response to this item is 0.6 (i.e., the midpoint after taking into consideration the pseudo-guessing parameter). The discrimination parameter (1.5) is the slope of the IRF, indicating the strength of an item for discriminating among persons with different levels of ability. The degree of item discrimination is related to precision; that is, a more discriminating item adds more information to the measurement, and thus increases the precision of the ability estimate. The pseudo-guessing parameter (0.2) introduces a non-zero lower bound to the model; it represents the probability of a low-ability person correctly responding to an item, presumably by chance.

Figure 5. Item Response Function for a Dichotomously Scored Item

The model can be simplified into two other commonly used dichotomous IRT models. The two-parameter logistic model (2PLM) assumes that there is no guessing (c_i = 0.0) and only utilizes the difficulty and discrimination parameters. It is therefore appropriate when guessing would not play an important role in assessment. The one-parameter logistic model (1PLM) makes the further assumption that all items have a discrimination parameter of 1.0, and therefore differ only with respect to difficulty. The 1PLM is
mathematically equivalent to the Rasch model, although the users of the two models differ in philosophy. Figure 6 presents IRFs for three example items under the 1PL, 2PL, and 3PL models. Note that all IRFs for the 1PLM are parallel to one another and do not intersect. This demonstrates the objective measurement property of the 1PLM, whereby there is no interaction between items and ability: the probability (P) of a correct response to harder items will always be lower than the probability for easier items. This is not always the case for the 2PLM and 3PLM, as evidenced by Figure 6, because the slopes (i.e., the discrimination parameters) differ in these two models, whereas the slopes for the 1PLM are equal.

Figure 6. IRFs for Dichotomous IRT Models. 1PLM: b = -1.0, 0.5, 2.0. 2PLM: (a, b) = (0.5, -1.0), (1.5, 0.5), (1.0, 2.0). 3PLM: (a, b, c) = (0.5, -1.0, 0.2), (1.5, 0.5, 0.3), (1.0, 2.0, 0.4). (x-axes: Ability (theta).)

The above models assume that the item responses are a function of only a single latent trait (unidimensionality) and that a person's item response is solely determined by his/her location on the latent continuum and not by his/her responses to other items (local or conditional independence). One way to support the claim that a test is unidimensional is to show model-data fit (or data-model fit in the Rasch dichotomous model); item-level fit can also be checked. Another way is to compare the model IRF against the empirical IRF. The model IRF can be conceptualized similarly to a standard linear or logistic regression line: it is simply a model-based function that is fit to a particular set of data. This is illustrated in Figure 7, which provides plots of empirical and model IRFs. An empirical IRF can be constructed by classifying persons according to ability and computing the proportion correct within
each ability category. The model IRF attempts to model the curve for the correct response for, in effect, an infinite number of such groups on a continuous distribution.

Figure 7. Empirical and Model IRFs. (a) Good fit between an empirical and model-based IRF. (b) Poor fit between an empirical and model-based IRF, suggesting the need for a 3PL. (x-axes: Ability (theta).)

Item and Test Information

An important concept in IRT for the purposes of test development and adaptive testing is information. Broadly defined, information is an index of the increase in measurement precision (or decrease in uncertainty). Like the IRF, it is a continuous function across θ, as an item can provide more information at certain levels. This is because information is primarily a function of the slope of an IRF; at levels of θ where the IRF has little slope, and therefore little differentiating power, the item provides little information. An item provides the most information where the slope of the IRF is steepest. For example, a very difficult multiple choice item will differentiate amongst top persons, but provide no differentiation amongst below-average persons; virtually all of the latter would respond incorrectly or be forced to guess. The information function for the 3PL is specifically defined as (Embretson & Reise, 2000):

I_i(\theta) = D^2 a_i^2 \frac{1 - P_i}{P_i} \left( \frac{P_i - c_i}{1 - c_i} \right)^2 \quad (2)

which simplifies to D^2 a_i^2 P_i (1 - P_i) for the 2PL and D^2 P_i (1 - P_i) for the 1PL models. While information is maximized at b_i for the one- and two-parameter models, for the 3PLM it is maximized at (Lord, 1980):

\theta_{\max} = b_i + \frac{1}{D a_i} \ln \left[ \frac{1 + \sqrt{1 + 8 c_i}}{2} \right] \quad (3)
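As a sketch, equations (1) to (3) can be checked numerically. The parameter values below are taken from the Figure 5 example item (a = 1.5, b = 0.0, c = 0.2); the function names are illustrative only.

```python
import math

D = 1.7  # scaling constant

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response, equation (1)."""
    z = math.exp(D * a * (theta - b))
    return c + (1.0 - c) * z / (1.0 + z)

def info_3pl(theta, a, b, c):
    """3PL item information, equation (2)."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def theta_max_info(a, b, c):
    """Ability level at which a 3PL item is most informative, equation (3)."""
    return b + (1.0 / (D * a)) * math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0)

a, b, c = 1.5, 0.0, 0.2  # the Figure 5 item
print(round(p_3pl(b, a, b, c), 2))  # 0.6 at theta = b, as described in the text

# With c > 0, information peaks slightly above b; equation (3) locates the peak.
t_star = theta_max_info(a, b, c)
assert info_3pl(t_star, a, b, c) >= info_3pl(t_star - 0.01, a, b, c)
assert info_3pl(t_star, a, b, c) >= info_3pl(t_star + 0.01, a, b, c)
```

For this item the peak falls at roughly θ = 0.10 rather than exactly at b, which is one reason 3PL-based item selection does not simply match item difficulty to the ability estimate.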
Each item has its own item information function (IIF) that differs based on the item parameters. Consider the example items in Table 1.

Table 1. Example item parameters

Item | a | b | c

Item 1 is a relatively easy item, with b = -2.00, while Item 4 is more difficult, with b =. The IRFs for these items are shown in the following figure.

Figure 8. IRFs for Example Items

The IIFs for the same items are shown below. Note that each item has more information (y-axis) where its IRF in the figure above has more slope. Item 1 has the highest discrimination value, and therefore has the highest peak in its IIF.

Figure 9. IIFs for Example Items
The figure above illustrates one of the core concepts of adaptive testing. A CAT typically works by constructing a table of values representing that graph and looking for the items that are most informative at a given ability level. For example, if a person's ability estimate is -2.00, then Item 1 is the most appropriate item for them, as it provides easily the most information around that ability estimate.

IIFs are useful in the test construction process because they can be summed across all items to produce the test information function (TIF). The TIF provides an index of expected (model-based) measurement precision as a function of θ, since the TIF and the standard error of measurement (SEM) conditional on ability (CSEM) are inversely related, such that:

\mathrm{CSEM}(\theta) = \frac{1}{\sqrt{\sum_{i=1}^{n} I_i(\theta)}} \quad (4)

A test intended for a pass/fail decision with a single cut-off score can be built to have a TIF that peaks near that cut-off score, and thus a high amount of precision there. A test that contains several decision points across θ can be built with a TIF that is high across a wider range. The use of the TIF and CSEM in test and item bank design is discussed in detail later.

Parameter Estimation

In IRT, both items and persons are characterized by parameters. Item parameters include a, b, and c, while the person parameter is the ability level θ (theta). These parameters are estimated from a set of item response data. Estimation of the item and person parameters is mutually dependent. That is, item parameters are used to calculate person θ estimates, which are in turn necessary to estimate item parameters. For this reason, the process of calibrating data with IRT is iterative, and
Introducing Introducing Assessment Consultant Introducing the WISC V Spanish, a culturally and linguistically valid test of cognitive ability in Spanish for use with Spanish-speaking children ages 6:0
More informationThe Changing Singaporean Graduate
Startfolie The Changing Singaporean Graduate The impact of demographic and economic trends how this impacts you who select David Barrett Managing Director cut-e 1 cut-e talent Solutions Services Overview
More informationEquivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests
Equivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests Q-interactive Technical Report 4 Mark H. Daniel, PhD Senior Scientist for Research Innovation
More informationWOMBAT-CS. Candidate's Manual Electronic Edition. Version 6. Aero Innovation inc.
WOMBAT-CS Version 6 Candidate's Manual Electronic Edition Aero Innovation inc. www.aero.ca Familiarization with WOMBAT-CS Candidate's Manual This manual should be read attentively by the candidate before
More informationMarketing Plan Handbook
Tennessee FFA Association Marketing Plan Handbook 2017-2021 TENNESSEE FFA ASSOCIATION MARKETING PLAN HANDBOOK 2017 2021 2 Purpose The Tennessee FFA State Marketing Plan Career Development Event is designed
More informationExaminer s report F5 Performance Management December 2017
Examiner s report F5 Performance Management December 2017 General comments The F5 Performance Management exam is offered in both computer-based (CBE) and paper formats. The structure is the same in both
More informationHow Differential Item Functioning Analysis (DIF) Can Increase the Fairness and Accuracy of Your Assessments
How Differential Item Functioning Analysis (DIF) Can Increase the Fairness and Accuracy of Your Assessments Nikki Eatchel, SVP of Assessment David Grinham, SVP International Assessment Solutions Scantron
More informationCONSTRUCTING A STANDARDIZED TEST
Proceedings of the 2 nd SULE IC 2016, FKIP, Unsri, Palembang October 7 th 9 th, 2016 CONSTRUCTING A STANDARDIZED TEST SOFENDI English Education Study Program Sriwijaya University Palembang, e-mail: sofendi@yahoo.com
More informationAnalyzing Language & Literacy using the WMLS-R
Analyzing Language & Literacy using the WMLS-R CRISTINA HUNTER ERIC WILLIAMSON TWIN Academy 2017 Part 1: What the assessment tells us about language acquisition? How can it be used to inform teaching and
More informationAnnual Employer Survey : Employers satisfaction with DWP performance against Departmental Strategic Objective 7
Department for Work and Pensions Research Report No 635 Annual Employer Survey 2008 09: Employers satisfaction with DWP performance against Departmental Strategic Objective 7 Jan Shury, Lorna Adams, Alistair
More informationThe Technological Edge: Unproctored Employment Testing in Large Organizations
The Technological Edge: Unproctored Employment Testing in Large Organizations Presented by Jasmin Loi Human Resources Services Manager Erik Collier Human Resources Analyst 31 st ANNUAL IPMAAC CONFERENCE
More informationCROWN FINANCIAL MINISTRIES
RESEARCH AND DEVELOPMENT TECHNICAL SUMMARY for Career Direct I. TECHNICAL INFORMATION ON THE Career Direct PERSONALITY SECTION The Personality Section of the Career Direct Report is a personality inventory
More informationData Collection Instrument. By Temtim Assefa
Data Collection Instrument Design By Temtim Assefa Instruments Instruments are tools that are used to measure variables There are different types of instruments Questionnaire Structured interview Observation
More informationUnderstanding Your GACE Scores
Understanding Your GACE Scores October 2017 Georgia educator certification is governed by the Georgia Professional Standards Commission (GaPSC). The assessments required for educator certification are
More informationThinking about competence (this is you)
CPD In today s working environment, anyone who values their career must be prepared to continually add to their skills, whether it be formally through a learning programme, or informally through experience
More informationInformation and Practice Leaflet
Information and Practice Leaflet Verbal and Numerical Reasoning Tests Why are tests used? Ability or aptitude tests are increasingly being used in the world of work to assess the key skills relevant to
More informationAssessment Center Report
Assessment Center Report Candidate Name: Title: Department: Assessment Date: Presented to Company/Department Purpose As of the Assessment Center Service requested by (Company Name) to identify potential
More informationCANDIDATE FEEDBACK REPORT KATHERINE ADAMS
CANDIDATE FEEDBACK REPORT KATHERINE ADAMS Report Date: 24 Aug 2016 Position: Example Position Client/Company: ABC Company Assessments Included Report Interpretation Module Assessment Date Results Valid
More informationLaunchPad psychometric assessment system An overview
LaunchPad psychometric assessment system An overview P ERCEPT RESOURCE MANAGEMENT INDEX LAUNCHPAD OVERVIEW...1 LaunchPad s outstanding value proposition...1 THE FEATURES AND FUNCTIONS OF LAUNCHPAD...2
More informationThe effective recruitment and selection practices of organizations in the financial sector operating in the Slovak republic
The effective recruitment and selection practices of organizations in the financial sector operating in the Slovak republic Ľuba Tomčíková University of Prešov in Prešov Department of management Ul. 17
More informationCRITERION- REFERENCED TEST DEVELOPMENT
t>feiffer~ CRITERION- REFERENCED TEST DEVELOPMENT TECHNICAL AND LEGAL GUIDELINES FOR CORPORATE TRAINING 3rd Edition Sharon A. Shrock William C. Coscarelli BICBNTBNNIAL Bl C NTBN NI A L List of Figures,
More informationWatson-Glaser Critical Thinking Appraisal III (US)
Watson-Glaser Critical Thinking Appraisal III (US) Profile Report Candidate Name: Organization: Pearson Sample Corporation Date of Testing: 21-11-2017 (dd-mm-yyy) 21-11-2017 Page 1 of 5 Watson Glaser III
More informationRobotic Process Automation. Reducing process costs, increasing speed and improving accuracy Process automation with a virtual workforce
Robotic Process Automation Reducing process costs, increasing speed and improving accuracy Process automation with a virtual workforce What is Robotic Process Automation (RPA)? Advanced macros? Robots...
More informationSTUDY SUBJECTS TAUGHT IN ENGLISH FOR EXCHANGE STUDENTS SPRING SEMESTER 2017/2018
STUDY SUBJECTS TAUGHT IN ENGLISH FOR EXCHANGE STUDENTS SPRING SEMESTER 2017/2018 1-3 YEAR Study programme: INTERNATIONAL BUSINESS Credits Description of study subject (ECTS) Subject International Business
More informationBefore You Start Modelling
Chapter 2 Before You Start Modelling This chapter looks at the issues you need to consider before starting to model with ARIS. Of particular importance is the need to define your objectives and viewpoint.
More informationA Quality Assurance Framework for Knowledge Services Supporting NHSScotland
Knowledge Services B. Resources A1. Analysis Staff E. Enabling A3.1 Monitoring Leadership A3. Measurable impact on health service Innovation and Planning C. User Support A Quality Assurance Framework for
More informationLinking Current and Future Score Scales for the AICPA Uniform CPA Exam i
Linking Current and Future Score Scales for the AICPA Uniform CPA Exam i Technical Report August 4, 2009 W0902 Wendy Lam University of Massachusetts Amherst Copyright 2007 by American Institute of Certified
More informationReliability & Validity
Request for Proposal Reliability & Validity Nathan A. Thompson Ph.D. Whitepaper-September, 2013 6053 Hudson Road, Suite 345 St. Paul, MN 55125 USA P a g e 1 To begin a discussion of reliability and validity,
More informationabc GCE 2005 January Series Mark Scheme Economics ECN2/1 & ECN2/2 The National Economy
GCE 2005 January Series abc Mark Scheme Economics ECN2/1 & ECN2/2 The National Economy Mark schemes are prepared by the Principal Examiner and considered, together with the relevant questions, by a panel
More informationProgram Assessment. University of Cincinnati School of Social Work Master of Social Work Program. August 2013
University of Cincinnati School of Social Work Master of Social Work Program Program Assessment August 01 Submitted to the College of Allied Health Sciences University of Cincinnati 1 University of Cincinnati
More informationGetting Started with OptQuest
Getting Started with OptQuest What OptQuest does Futura Apartments model example Portfolio Allocation model example Defining decision variables in Crystal Ball Running OptQuest Specifying decision variable
More informationStudent Workbook. Designing A Pay Structure TOTAL REWARDS. Student Workbook. STUDENT WORKBOOK Designing A Pay Structure. By Lisa A. Burke, Ph.D.
Case Study and Integrated Application Exercises By Lisa A. Burke, Ph.D., SPHR Student Workbook Student Workbook TOTAL REWARDS 2008 SHRM Lisa Burke, Ph.D., SPHR 45 46 2008 SHRM Lisa Burke, Ph.D., SPHR INSTRUCTOR
More informationSTATISTICAL TECHNIQUES. Data Analysis and Modelling
STATISTICAL TECHNIQUES Data Analysis and Modelling DATA ANALYSIS & MODELLING Data collection and presentation Many of us probably some of the methods involved in collecting raw data. Once the data has
More informationDIPLOMA OF HUMAN RESOURCES MANAGEMENT-BSB50615 Study Support materials for Manage recruitment selection and induction processes BSBHRM506
DIPLOMA OF HUMAN RESOURCES MANAGEMENT-BSB50615 Study Support materials for Manage recruitment selection and induction processes BSBHRM506 STUDENT HANDOUT This unit describes the performance outcomes, skills
More informationHTS Report. d2-r. Test of Attention Revised. Technical Report. Another Sample ID Date 14/04/2016. Hogrefe Verlag, Göttingen
d2-r Test of Attention Revised Technical Report HTS Report ID 467-500 Date 14/04/2016 d2-r Overview 2 / 16 OVERVIEW Structure of this report Narrative Introduction Verbal interpretation of standardised
More informationInfluence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic
Influence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic Brian E. Clauser, Kathleen Mazor, and Ronald K. Hambleton University
More informationTest and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III
Test and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III Throughout his career, Wechsler emphasized that factors other than intellectual ability are involved in
More informationStaffing Organizations (2nd Canadian Edition) Heneman et al. - Test Bank
Chapter 08 1. External selection refers to the assessment and evaluation of external job applicants. 2. Understanding the legal issues of assessment methods is necessary. 3. Cost should not be used to
More informationTDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.
Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide
More informationNear-Balanced Incomplete Block Designs with An Application to Poster Competitions
Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania
More informationIdentifying the people to grow your organisation and deliver business results
Innovative talent assessment solutions for Retail & Hospitality Identifying the people to grow your organisation and deliver business results Meeting the key retail & hospitality talent challenges Company
More informationThe 360-Degree Assessment:
WHITE PAPER WHITE PAPER The : A Tool That Can Help Your Organization Maximize Human Potential CPS HR Consulting 241 Lathrop Way Sacramento, CA 95815 t: 916.263.3600 f: 916.263.3520 www.cpshr.us INTRODUCTION
More informationTraining Watson: How I/O Psychology, Data Science, and Engineering integrate to produce responsible AI in HR.
Training Watson: How I/O Psychology, Data Science, and Engineering integrate to produce responsible AI in HR. Stefan Liesche, IBM Distinguished Engineer - Watson Talent Architecture Nigel Guenole, IO Psychologist
More information