IBM Workforce Science. IBM Kenexa Ability Series Computerized Adaptive Tests (IKASCAT) Technical Manual


IBM Workforce Science

IBM Kenexa Ability Series Computerized Adaptive Tests (IKASCAT) Technical Manual

Version UK/Europe
Release Date: October 2014

Copyright IBM Corporation. All rights reserved.

Table of Contents

Chapter 1: What is IKASCAT?
  1.1 Introduction
Chapter 2: Assessment Content
  2.1 Assessment Components
    2.1.1 Logical Reasoning Test
    2.1.2 Numerical Reasoning Test
    2.1.3 Verbal Reasoning Test
Chapter 3: IKASCAT Utilizes CAT Technology
Chapter 4: Development of IKASCAT CAT System
  4.1 CAT System Design and Development
  4.2 CAT Content Development
  4.3 CAT Implementation and Maintenance
Chapter 5: Why Use Psychometric Tests?
  5.1 Why Use Cognitive Ability Tests?
  5.1.2 Generalized Validity of Ability Tests
Chapter 6: CAT and How It Is Used in IKASCAT
  6.1 What is CAT?
  6.2 The Advantages of Using CAT
  6.3 What is Item Response Theory (IRT)?
    Item and Test Information
    Parameter Estimation
    Theta Estimation
    Item Parameter Estimation
  Building Appropriate CAT Strategies
    Starting rule for selecting the first item
    Item selection algorithm
    Item scoring and updating ability procedure
    Constraints on item selection
    Stopping rule
Chapter 7: Administration, Scoring & Reporting
  7.1 Administration
  7.2 Scoring

  7.3 Reporting
Chapter 8: Summary Stats and Group Differences
  'Norming' and Norms Groups Available
    Logical Reasoning Test
    Numerical Reasoning Test
    Verbal Reasoning Norms
  Group Differences
    Setting cut-off scores
    Group Differences: LRT
    Group Differences: NRT
    Group Differences: VRT
Chapter 9: Reliability
Chapter 10: Validity
  Defining Validity
  Criterion Validation Studies
  What Do Employers Get From Using IKASCAT?
    What do employers get from the LRT?
    What do employers get from the NRT?
    What do employers get from the VRT?
Chapter 11: Equating IKASCAT to Infinity Series
  Can a PBT or CBT (static form) be equated to CAT?
  Framework of Score Linking
    Purpose of Equating
    Linking Design
    Data Collection Design for Equating
    Classical Equating Methods
    IRT Equating Methods
  Establishing the relationship between IKASCAT and Infinity Series
Chapter 12: Validation Studies (LRT, NRT, VRT)
Chapter 13: References

Chapter 1: What is IKASCAT?

1.1 Introduction

One of the main interests in the field of occupational psychology lies in the area of recruitment and selection, and the identification of factors which can predict successful occupational performance. Researchers have compared possible predictors of job performance, such as biographical data, references, educational level, college grades, interviews, ability tests and personality questionnaires, and the general consensus of the research is that the best predictor of occupational performance is cognitive ability (Schmidt & Hunter, 1998; Gottfredson, 2002).

IBM's Kenexa Ability Series Computerized Adaptive Test (IKASCAT) is a suite of assessments that measure three of the major components of cognitive ability: Logical Reasoning, Numerical Reasoning and Verbal Reasoning. IKASCAT utilizes computerized adaptive testing (CAT), which adapts to a test taker's responses, presenting items that most closely match the test taker's ability and estimating that ability using the most accurate and secure method available.

IKASCAT measures cognitive abilities that are important predictors of job performance and training success. Schmidt and Hunter's (1998) review of over 85 years of research into personnel selection identified tests of cognitive ability as the best predictors of job performance and training success. IKASCAT measures such cognitive abilities, assessing both deductive reasoning skills (using verbal and numerical formats) and inductive reasoning skills (using abstract/logical reasoning formats) for use in work-related settings.

There are two distinct parts to IKASCAT: the Assessment Content (i.e. the assessments themselves, including all of the questions asked) and the CAT system (i.e. the administration and scoring system that delivers the questions and produces scores based on the test taker's answers). Both of these are described in some detail later in this document.

This technical manual has been written for users of IKASCAT, and provides the following:

- Descriptions of the assessments themselves
- The rationale behind IBM developing CAT systems for assessments
- An explanation of what CAT is
- A summary of the development of the CAT system
- Statistical details on IKASCAT
- Information on administration, scoring and reporting
- Examples of reports produced

Chapter 2: Assessment Content

The questions (or items) used in IKASCAT were written and reviewed by a team of occupational psychologists and psychometricians with combined test development expertise in excess of 100 years, drawn from a range of English-speaking countries (including the UK, Ireland, the US, New Zealand, Singapore and South Africa). IBM Workforce Science has developed computer-administered psychometric tests for the last 25 years, and online ability assessments produced by IBM psychologists are used with tens of millions of test takers annually. Development work began formally in January 2012 and continued until the final pilot studies in June. The assessment went through several iterations up to the final pilot study, the results of which provide some of the technical and statistical information in this manual.

General design criteria were applied in developing this assessment. The most important of these is the quality of the questions asked (the items). Great care was taken in choosing the format, structure and appearance of the items. All items have been checked, reviewed, modified if necessary, trialled and re-trialled. All items have been reviewed for issues of legality, particularly concerning diversity or disability, and to ensure that local idioms are avoided and offence is not caused by any of the questions asked.

The core stems or stimuli used (the information on which the items are based) need to allow test takers to show their ability to draw conclusions, make deductions and infer logically from the information provided. Multiple-choice questions were developed: test takers need to choose the correct answer from a range of possible options, and in each case one and only one of the possible options is correct. Items (for the NRT and VRT in particular) cover a wide range of subject matter. Different IRT scoring methods were used, including the Rasch, 2PL and 3PL models.

2.1 Assessment Components

IKASCAT is composed of three assessments: Logical Reasoning (LRT), Numerical Reasoning (NRT) and Verbal Reasoning (VRT). These are designed for use in an unproctored internet (or online) testing context, in both CAT and non-CAT contexts (IRT scoring or traditional scoring).

2.1.1 Logical Reasoning Test

Logical reasoning is the ability to analyse situations, identify the patterns and relationships that underpin those situations, and derive or extrapolate from them. This is a necessary condition for all logical problem solving, particularly in situations requiring scientific, mathematical, engineering or financial problem solving. The Logical Reasoning Test (LRT) is designed to provide a fair, objective, rapid and practical measure of inductive reasoning. It measures a person's skill in evaluating the patterns and trends in information, without reference to written text or numerical data. The LRT has been developed to be a culture-fair assessment, useful in multi-cultural, multi-racial or multiple-language contexts.

Inductive reasoning is the process of reasoning from specific premises or observations to reach a general conclusion or overall rule. Deductive reasoning denotes the process of reasoning from a set of general premises to reach a logically valid conclusion. Deductive inferences draw out conclusions that are implicit in the given information, whereas inductive inferences add information in order to draw a conclusion.

The information in the LRT questions comes in the form of abstract forms or shapes that have been changed or modified across a series of stages. One of these stages is missing, and candidates need to analyse the information to choose which one of a series of options would complete the series logically.

The LRT does require the test taker to:

- Attend to the information available (i.e. the characteristics of the forms and shapes)
- Identify the relationships, patterns and trends in the information
- Derive a set of rules that can support the relationship
- Apply these rules to correctly identify the required answer

The LRT does not require the candidate to:

- Use prior knowledge or have knowledge of a particular subject or area
- Have learned or acquired a particular skill
- Be a speaker of a particular language

2.1.2 Numerical Reasoning Test

The Numerical Reasoning Test (NRT) is a test of deductive reasoning, one of the major components of fluid intelligence, a concept originally identified by Raymond Cattell (1971). Numerical reasoning is the ability to evaluate numerical information critically, understand patterns and trends in data, and draw logically valid inferences from the information presented. The NRT is designed to provide a fair, objective, rapid and practical measure of deductive reasoning, using numerical information.

The content of this test is representative of numerical information likely to be encountered within a business context, thus providing wide applicability across a range of professional and managerial selection, development and recruitment activities. Managerial and professional roles inherently require employees to deal frequently with complex numerical data, for example in financial planning, market analysis and problem-solving situations. The NRT was therefore designed to assess this level of numerical reasoning ability. Questions needed to:

- Be easy to read and assess
- Present information in the simplest format possible
- Include realistic scenarios
- Use real data sets (simplified and modified for use in assessment)
- Involve simple arithmetic operations such as addition, subtraction, multiplication and division
- Involve the use of whole numbers (integers), decimals and fractions
- Involve the use of ratios and percentages
- Present information in the form of charts, graphs and tables (often a combination of these)

The NRT does require the test taker to:

- Evaluate numerical information critically
- Understand patterns and trends in the data presented
- Carry out simple computational analysis in order to come to the correct conclusions

The NRT does not require the candidate to:

- Have prior knowledge of the numerical content in the stimuli
- Apply complex formulae
- Have knowledge of complex mathematical methods

2.1.3 Verbal Reasoning Test

The Verbal Reasoning Test (VRT) is designed to provide a fair, objective, rapid and practical measure of deductive reasoning, using written information. It measures a person's ability to critically evaluate information presented in a written verbal format. In addition to understanding written communication, the VRT also encompasses the ability to understand complex discussions and other verbal interactions.

Many jobs involve working with verbal information, and verbal comprehension forms a core component of almost all professional and managerial roles. The VRT offers a high-level assessment of the verbal reasoning processes that people use almost daily when analysing and evaluating the detailed content of reports and other business documentation, produced by themselves, by colleagues or by

outside agencies. In many organisations, verbal reasoning skills are key to the effective dissemination of business information, upwards and downwards, right across the workforce.

Most of the items in the VRT comprise a short passage of text followed by statements based on the information given in the passage. Candidates are asked to indicate whether each statement is true or false, or whether it is not possible to say either way. In answering these questions, candidates should use only the information given in the passage and should not try to answer them in the light of any more detailed knowledge that they personally may have. Test developers needed to:

- Make passage length as short as possible (around 120 words)
- Take into account general reading speed
- Avoid grammatical or vocabulary complications
- Ensure that the information in the passage was factually correct
- Ensure that the information in the passage was not controversial
- Ensure that the information in the passage was not emotionally affective (i.e. likely to provoke an emotional reaction)
- Develop passages that were similar to short articles found on websites, in newspapers or in magazines

The VRT does require the test taker to:

- Analyze and critically evaluate verbal information
- Understand complex arguments or positions in written communication
- Draw appropriate inferences from complex written information

The VRT does not require the test taker to:

- Have prior knowledge of the factual content in the passages
- Have technical knowledge of grammar
- Spot errors in the spelling of unfamiliar words
- Show knowledge of acquired specialist vocabulary

Chapter 3: IKASCAT Utilizes CAT Technology

IKASCAT utilizes CAT technology in order to provide test users such as hiring managers with the most efficient, effective and accurate method of assessing cognitive ability. IBM has invested millions in developing a bespoke CAT system because the psychometric testing literature shows that CAT has a range of significant advantages over conventional online testing. These advantages include:

- Shorter test length (more than 50% fewer questions required)
- Shorter test duration (between 30% and 50% saving in time required)
- Greater measurement accuracy and test reliability
- Increased test taker motivation
- Improved test taker experience
- Increased test effectiveness (better at differentiating between candidates)
- Greater test security (particularly important with unsupervised testing)
- Greater scope for enhancement and updating

These advantages are elaborated on, and fully referenced, later in this document.

Other considerations involve the use of online assessments with the diversity of candidates expected. Fixed-length, timed ability tests are the most commonly used outside of North America. Fixed, timed versions of ability tests show larger differences between disabled and non-disabled candidates than untimed assessments (REFERENCE NEEDED). IBM presented a paper at the BPS Division of Occupational Psychology Conference 2014 (Keeley, S., & Parkes, J., 2014a) which showed that adjustments in test time (i.e. increasing the time allowed) had the effect of reducing differences between disabled and non-disabled candidates but not removing them, as some disabled candidates still timed out even when given extra time. Accordingly, because they are untimed, CAT tests have the following additional advantages:

- Candidate performance is maximized
- Adjustments required by disabled candidates are easier to accommodate (there is no need to add additional time, as the assessments are untimed)

IKASCAT utilizes computerized adaptive testing, so item administration is tailored to the ability of each individual test taker. Each test is likely to have a unique combination of items; items are drawn from an item bank (or database) containing a large number of individual items and their psychometric

characteristics (e.g. item difficulty). Tests are constructed based on a number of criteria, the most important of which is the test taker's performance during the test itself. The items presented are selected based on how the test taker has answered previous questions. If the test taker answers correctly, a more difficult item is administered; if the test taker answers incorrectly, an easier item is administered. The test adapts itself to the test taker's ability. Accordingly, lower-ability test takers will be presented with easier questions than higher-ability test takers. This means that two test takers may have got the same number or percentage of questions correct, but the higher-ability test taker will score better, having answered more difficult questions.

The psychometric models behind IKASCAT are item response theory (IRT) models for both dichotomously scored items (i.e. scored 0 and 1) and polytomously scored items (i.e. scored in more than two categories), for a variety of possible item types and formats. In particular, the IRT models available for the IKASCATs are the three-parameter logistic (3PL) model, the two-parameter logistic (2PL) model and the one-parameter logistic (1PL) model, or the Rasch dichotomous measurement model. The IRT models adopted for the development of the IKASCATs are important building blocks that enable candidates' performances on the cognitive ability assessments to be scored in real time and made comparable. IBM's CAT system built around these IRT models is the most advanced CAT system in the industry, with its signature components (Item Banker, CAT engine, CAT delivery and CAT management system) hosted on the Assess on the Cloud platform. IBM began its pre-production process for both the CAT system development and the content development based on the test specification, or blueprint. The following chapter explains how this CAT system was developed and what it actually entails.

Chapter 4: Development of IKASCAT CAT System

Based on the most popular psychometric models, IKASCAT was developed in three phases: system design and development, content development, and implementation/maintenance. These are shown in Figure 1 below.

Figure 1. Phases of Development for the IKASCATs

4.1 CAT System Design and Development

During phase one (CAT System Design and Development), the CAT system was designed to accommodate dichotomous and polytomous IRT models and popular item types (e.g. multiple choice, rating scale, forced choice). It accommodates both unproctored and proctored internet-based testing (IBT), as well as multiple languages. A psychometric design and programming guideline was

produced to guide development of the CAT system, based on optimal conditions identified via Monte Carlo simulation studies. A large team of experts in programming and psychometrics was involved in developing and conducting quality control checks on the programming code from spring 2012 onwards. As a result, a series of improvements was made to the system to enhance its usability and scoring accuracy. Further improvements have been made to the CAT system's scoring and effectiveness since the initial development phase.

The CAT system consists of three modules: an item banking system, a test engine, and a test management and delivery system. The item banking system (or banker) stores item content and the psychometric properties associated with each item (or question). The test engine module reads in the psychometric characteristics of items from the item banker, administers items adaptively and estimates the ability for each content domain. The engine also records, processes and stores all item response data, item records and ability estimates. The test management and delivery system takes in the candidate registration information from the applicant tracking system (ATS) and controls administration, allowing unlimited access to CAT via the Internet around the globe. It also produces final scores, such as raw or scale scores, and reports the results (item responses, ability estimates and psychometric item characteristics) to the end users, internally and externally.

The CAT delivery and management module is integrated with IBM's signature assessment platform, Assess on the Cloud. CAT administration and score reporting follow the standard procedural order of Assess: authoring and publishing CATs into Assess, scheduling, delivery and score reporting (see Figure 2 below).

Figure 2. CAT Management and Delivery via Assess. The figure shows five stages: Item Banking (Master Catalog, Custom Catalog); Authoring (test creation/customization); Scheduling (standalone scheduling, or scheduling via integration with 2x Solutions (2xB, ATS)); Delivery (online, mobile, print/scan; on-demand via integration with 2x solutions (2xB, ATS)); and Reporting (standard reports, custom reports, summary statistics, test and item analysis).

4.2 CAT Content Development

IBM Workforce Science has developed computer-based tests for the last 25 years. Drawing on this extensive test development experience and expertise, more than 20 I-O psychologists and content experts, as well as psychometricians, were involved in the content development process for IKASCAT. The full-cycle development process is presented below.

- Collected and reviewed item content and characteristics of existing cognitive ability assessments. As the test will be used globally, each item was reviewed to ensure cultural sensitivity across multiple languages.
- Identified the item type/style/format for use in CAT.
- Recruited item writers from a range of global geographic regions and cultures (these included many English-speaking countries (UK, Ireland, US, South Africa, Australia, and New Zealand) as well as China, Pakistan, Hong Kong, Singapore, France, and Germany).
- Conducted item writing training sessions via web conferences to ensure consistency.
- Wrote new items.
- Conducted bias and sensitivity reviews to ensure that new items were free of bias.
- Assembled standalone pretesting (field testing, item tryout or item trial) under the psychometric conditions documented in the psychometric design: IRT model, sample size and demographics, data collection design, number/percentage of items covering each content section or domain, multiple form assembly, test publishing, testing window, test administration, delivery platform, data collection, and item linking and calibration.
- Performed final psychometric data review and final content review.
- Identified operational items and built the initial item pool for each subject (domain).
- Conducted simulation studies with the approved operational items to find the optimal conditions for building operational CATs.
- Planned new item writing and standalone pretesting or embedded pretesting in live CAT, depending on the pool size.

Standalone pretesting with multiple assembled pretest forms was necessary to build up the initial item pool, since not all participants in the pretesting can see all items in a given test form. Two popular approaches to building a final item pool/bank (also known as item linking) are using common items that are included between two adjacent pretesting forms or across all pretesting forms, or having a common group (or sample) of participants take all pretesting forms. Both of these approaches were used in building the final item pool/bank for the IKASCAT assessments; the former approach is known as common-item linking, and the latter as common-person linking.
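The common-item approach can be illustrated with a mean/sigma transformation, one standard way of placing difficulty estimates from two pretest forms onto a single scale. The sketch below is a generic illustration under invented anchor values, not IBM's documented linking procedure, and the function name `to_form_a_scale` is hypothetical.

```python
# Mean/sigma common-item linking: place Form B's difficulty estimates onto
# Form A's scale using items that appear on both forms. Illustrative only;
# the values below are invented, and the operational procedure may differ.
from statistics import mean, stdev

# Difficulty (b) estimates for the anchor items, calibrated separately per form.
anchors_form_a = [-1.20, -0.40, 0.30, 1.10]   # hypothetical values
anchors_form_b = [-0.90, -0.15, 0.60, 1.35]   # same items, on Form B's scale

# Linear transformation: b_on_A = slope * b_on_B + intercept
slope = stdev(anchors_form_a) / stdev(anchors_form_b)
intercept = mean(anchors_form_a) - slope * mean(anchors_form_b)

def to_form_a_scale(b_form_b):
    """Re-express a Form B difficulty estimate on Form A's scale."""
    return slope * b_form_b + intercept

# Any non-anchor Form B item can now join the common item pool:
print(round(to_form_a_scale(0.80), 3))
```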

4.3 CAT Implementation and Maintenance

In building an item pool and measurement scale for use in an adaptive test, it is critical to determine procedures for identifying items that do not perform well. Poor items should be removed from the pool as soon as they are identified; otherwise they introduce bias into the ability estimates. It is probably necessary to evaluate item performance at intervals determined by candidate volume, to see if items are performing as the target functions require.

It is possible that the difficulty of items drifts or changes over time. Sometimes items drift to be easier; other times they drift to be harder. It is important to evaluate items for drift on an annual basis and, when needed, to update item parameter estimates. At specified points in the test life cycle, item pools are refreshed to ensure model fit and to conform to specified security provisions. The current item refreshment plan is primarily concerned with replacing overexposed items with new items. Further expansion of the banked items is underway, with new items being trialled and included in the item pool on an ongoing basis.
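One simple way to screen for the drift described above is to re-estimate each item's difficulty from recent responses and flag items whose estimates have moved beyond a tolerance. The sketch below is an illustration of that idea only; the item IDs, parameter values and the 0.5-logit threshold are invented, not IBM's operational criteria.

```python
# Screen an item pool for difficulty drift by comparing the banked difficulty
# with a re-estimate from recent responses. Threshold and data are assumptions
# made for illustration.

def flag_drifting_items(banked_b, recent_b, tolerance=0.5):
    """Return items whose difficulty moved more than `tolerance` logits.

    banked_b, recent_b: dicts mapping item ID -> difficulty estimate.
    """
    flagged = []
    for item_id, old_b in banked_b.items():
        new_b = recent_b.get(item_id)
        if new_b is not None and abs(new_b - old_b) > tolerance:
            flagged.append((item_id, old_b, new_b))
    return flagged

# Hypothetical annual drift review:
banked = {"NRT-014": 0.30, "NRT-022": -1.10, "NRT-031": 1.45}
recent = {"NRT-014": 0.35, "NRT-022": -0.25, "NRT-031": 1.50}
for item_id, old_b, new_b in flag_drifting_items(banked, recent):
    print(f"{item_id}: b drifted from {old_b:+.2f} to {new_b:+.2f}")  # flags NRT-022
```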

Chapter 5: Why Use Psychometric Tests?

The term psychometric means mental measurement. Consequently, psychometric tests are devices that measure psychological characteristics such as intelligence, personality, or ability to perform a particular task. One major benefit of psychometric tests is that they are designed as systematic and standardised methods of measurement. In practice this means that the questions asked are consistent for every person that completes the test, the instructions they are given are consistent, and the conditions under which they complete the test should be controlled and as standardized as possible. With standardized practices, we are able to compare the results from tests taken at different times and in different places. Test developers also put in place systems for scoring their tests (for almost all Kenexa assessments this is computerized), enabling us to score and interpret the results in a consistent way.

Another characteristic of psychometric tests is that they are designed to obtain a snapshot or sample of a person's ability or characteristics upon which we can make an assessment. An alternative would be, for example, to observe a person continuously in order to assess their ability, but this would be impractical. Designers of psychometric tests aim to ensure that the information we obtain by assessing a sample of a person's ability can be reliably used to make an assessment of their ability in general.

In order to make sense of the information obtained from a psychometric test, a person's results are often compared with those from a relevant group or population. For example, a person's results on a graduate ability test will be compared with the scores of a graduate population. Similarly, a person's results on a work-based personality questionnaire will be compared with those from a working population.

Psychometric tests can be divided into those that assess maximum performance and those that assess typical performance. Tests that assess maximum performance are designed to determine how well a person performs at their best. These types of test may be timed, with everyone given exactly the same amount of time to complete them, and they typically have right and wrong answers. Tests that assess maximum performance include ability tests and attainment tests. These are often timed, but the IKASCAT assessments are usually untimed, with no strict limit on the amount of time allowed for completing the test (although guidelines are often provided).

5.1 Why Use Cognitive Ability Tests?

Cognitive ability is one of the most studied constructs in psychology, with over 100 years of research behind it. Almost from the outset, work on the understanding of cognitive abilities has been conducted from an applied standpoint. For example, Alfred Binet, considered to be the developer of the first intelligence test, constructed measurements to understand the potential of children to benefit from educational instruction. This resulted in the first recognised test of mental ability being published in 1905 (Binet, 1905). This tradition of applied research has continued, particularly in the areas of education and personnel selection.

Measures of cognitive ability have long been recognised in the academic literature as the best general predictors of job performance, and they are among the cheapest and most cost-effective methods to implement. The US Office of Personnel Management states on its website that "Cognitive ability tests are used because they are among the least expensive measures to administer and the most valid for the greatest variety of jobs."¹

As with many areas of psychology, there is no single agreed definition of what cognitive ability is. In his influential book on the structure of human abilities, Carroll (1993) argues that abilities need to be understood in the context of a specific task, with a cognitive task being any task in which correct or appropriate processing of mental information is critical to successful performance. Cognitive ability is "any class of cognitive activity that concerns some class of cognitive tasks, so defined" (Carroll, 1993, p. 10). Carroll's model is particularly helpful because it not only provides a far-ranging map of intelligence, but also allows individual tests to be placed within this structure.

As Carroll's model shows, at the level of Stratum I sit tests of specific abilities. Stratum II clusters these into broad families of tests, on the basis of factor-analytic research. For example, performance across sequential reasoning, induction and quantitative reasoning tests is assumed to be related to the underlying influence of fluid intelligence. In turn, performance on all tests is assumed to be influenced by a person's general intelligence, which forms Stratum III of Carroll's model.

From the perspective of test development, it is important to recognise that most psychometric tests can exist only at Stratum I. Strata II and III of the model are abstractions hypothesised from the statistical analysis of test results and are never directly observed. However, the weight of empirical research strongly suggests that these abstractions do have psychological reality (Carroll, 1993).

Figure 3. Carroll's Three-Stratum Model of Intelligence

1 Retrieved from apps.opm.gov/adt/content.aspx?page=2-02

5.1.2 Generalized Validity of Ability Tests

A number of major studies are often invoked to support the use of cognitive ability tests such as the Logical, Numerical and Verbal Reasoning tests included in IKASCAT. In 1998, Schmidt and Hunter reviewed over 85 years of research into personnel selection. This extensive synthesis of the literature identified tests of general mental ability (GMA)² as the single best predictor of job performance and success on job-related training courses. Outtz's (2002) study showed significant correlations between cognitive ability tests and measures of job performance across a large range of jobs and roles.

Ree et al. (1994) investigated the role of general cognitive ability and specific abilities or knowledge as predictors of work sample job performance criteria in seven jobs for US Air Force enlistees. Analyses revealed that cognitive ability was the best predictor of all criteria, and that specific abilities or knowledge added a statistically significant but smaller amount to predictive efficiency. These results are consistent with previous military studies, such as Army Project A. Schmidt and Hunter's major meta-analytical study (2004) presented extensive evidence that cognitive ability predicts both occupational level attainment and performance within one's chosen occupation, and does so better than any other ability, trait, or disposition, and considerably better than job experience.

Other work, much of it involving meta-analysis, has further supported the validity of GMA in the prediction of job performance. Bertua, Anderson and Salgado (2005) examined the literature on criterion validity, and largely replicated previous work. Tests of GMA were seen to predict job performance (0.48) and training success (0.50). Validity was again seen to vary among occupations, ranging from 0.74 for professional roles to 0.32 for clerical roles.

Bertua et al.'s work also studied different types of ability tests, all of which had substantial validity. For measures of job performance, across 20 different samples (n = 3,410), numerical ability tests showed an operational validity of 0.42 and a 90% credibility value of 0.26; for measures of training success, across 46 different samples (n = 15,925), they showed an operational validity of 0.54 and a 90% credibility value of 0.43, indicating that the validity of numerical ability tests can be generalized across samples and settings. For measures of job performance, across 14 different samples (n = 3,464), verbal ability tests showed slightly lower operational validities of 0.39 and a 90% credibility value of 0.20; for training success, across 33 different samples (n = 12,679), they showed an operational validity of 0.49 and a 90% credibility value of 0.36, indicating that the validity of verbal ability tests can likewise be generalized across samples and settings.

2 General mental ability is the term frequently used in literature that summarises the results from research using a range of cognitive ability tests. Variations in the content and style of the tests are acknowledged. However, the positive manifold demonstrated by such tests, which implies an underlying construct influencing performance across different tests, is used to justify considering them all as assessments of the construct of general mental ability.

Chapter 6: CAT and How It Is Used in IKASCAT

6.1 What is CAT?

A Computerized Adaptive Test (CAT) is a test, administered by computer, which dynamically adjusts itself to the cognitive ability level of each test taker during the course of administration. CAT normally describes a test delivery method, as compared with conventional paper-and-pencil based testing (PBT). In a conventional PBT test of ability, every person takes the same fixed-form test, regardless of the item characteristics for a given level of ability. Typically, a conventional ability PBT presents items that measure mid-ability candidates well. This means more measurement error is introduced for those at extreme levels of ability. In other words, it is wasteful if the hardest items are administered to candidates with the lowest ability level, or if the easiest items are administered to candidates with the highest ability level. Bored high-ability persons are likely to respond carelessly, and frustrated low-ability persons are more likely to respond in a random manner; thus more measurement error is introduced into the ability estimates.

CAT creates and delivers a customized test for each respondent using computers (increasingly online), aiming to measure various psychological constructs such as ability, achievement, attitude and personality traits in the most efficient and effective way. CAT successively selects questions so as to maximize the precision of the test based on what is known about the candidate from previous questions. From the candidate's perspective, the difficulty of the exam seems to tailor itself to his or her level of ability. For example, if a candidate performs well on an item of intermediate difficulty, he will then be presented with a more difficult question; if he performs poorly, he will be presented with an easier question. Compared to static multiple-choice tests, where everyone is required to take a fixed set of items regardless of their ability (or construct) levels, CAT requires fewer test items to arrive at equally precise measures.

6.2 The Advantages of Using CAT

Among many known advantages, efficiency and control of measurement precision are prominent. CATs are more efficient than conventional tests delivered via PBT (and non-CAT IBT). The test length for examinees can be reduced by 50% or more (a feature of variable-length CAT). A properly designed CAT can measure every examinee with the same degree of precision, which is not true of conventional PBT or non-CAT IBT. Figure 4 shows that the standard error of measurement is similar across the full range of ability, even at very low levels.

Figure 4. Degree of Precision: Conditional Standard Error of Measurement across Ability Estimates (Thetas)
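The adaptive cycle described in Section 6.1 (a correct answer leads to a harder item, an incorrect answer to an easier one) can be sketched in a few lines of Python. This is a minimal illustration under invented assumptions (the item bank, step sizes and stopping rule are all made up for the example), not the IKASCAT engine, which selects items by their IRT information as described later in this chapter.

```python
# Minimal sketch of the adaptive cycle: a correct answer raises the target
# difficulty, an incorrect answer lowers it. Item bank, step size and stopping
# rule are invented for illustration only.

def run_adaptive_sketch(bank, answer_item, max_items=10):
    """bank: list of item difficulties; answer_item(difficulty) -> bool."""
    target = 0.0                      # start near the middle of the scale
    step = 1.0                        # how far to move after each response
    remaining = list(range(len(bank)))
    administered = []
    while remaining and len(administered) < max_items:
        # administer the unused item whose difficulty is closest to the target
        idx = min(remaining, key=lambda i: abs(bank[i] - target))
        remaining.remove(idx)
        administered.append(idx)
        if answer_item(bank[idx]):
            target += step            # correct: aim harder
        else:
            target -= step            # incorrect: aim easier
        step = max(step * 0.7, 0.2)   # narrow in as evidence accumulates
    return target, [bank[i] for i in administered]

# Example: a candidate who answers correctly whenever the item is below 1.0.
bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
theta_hat, seen = run_adaptive_sketch(bank, lambda b: b < 1.0)
print(round(theta_hat, 2), seen)
```

A real CAT replaces the fixed step with an IRT ability estimate updated after every response, as described in the sections that follow.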

There are many additional advantages recorded in the literature with regard to CAT (Linacre, 2000; Rudner, 1998). Test takers receive tests that are tailored to their actual ability level. This means that test takers are not given a series of irrelevant questions which are either too easy (and therefore do not tell us the highest level of performance for this test taker) or too difficult (and therefore only tell us that their highest level of performance is lower than this). Because CAT assessments adapt to the actual performance of the test taker, the test taker's approximate ability level is identified more quickly, and more specific questions can then be administered to enable more accurate identification of the test taker's actual ability level.

The adaptive nature of these assessments means that CAT tests are shorter in duration (around 50% shorter in terms of time, and up to 65% shorter in terms of questions presented). These CAT tests are most likely to be administered unproctored, but both on-site and off-site testing time will be reduced. Overall, CAT tests are much more accurate (i.e. more reliable) than conventional static cognitive ability tests, or even tests in which items are administered randomly from a large item bank (Grelle, Dainis, & Hurst, 2009). The Kenexa Ability Test CAT series uses a minimum reliability equivalent of 0.8 for each test; some tests will be well in excess of this.

Test security is also increased by the use of CAT. Item exposure is reduced because fewer questions are administered. By comparison with Kenexa's non-CAT versions of these assessments, this might reduce the number of questions presented from 20 items for a fixed NRT test to 8 items or fewer for a CAT version. This means that each candidate sees fewer questions, and only sees questions which match their ability level. The methods used to score CAT assessments also mean that efforts to access a large number of items can be thwarted. A maximum number of items per administration is

set, and test sessions may time out if excessive time is taken over the test as a whole or over individual items. If test items do become overexposed or compromised (through cheating or piracy), these items can be deleted from the item bank without the integrity of the whole item bank being affected. One of the advantages of the CAT methodology is that items can be deleted and new items can easily be added to the total item bank. Replacement and alternative items are constantly being trialled and added to the item banks for these assessments.

Despite the sophistication and complexity of CAT scoring, scores for test takers are immediately available. This is because the ability level (represented by a particular score, or theta value in this case) needs to be calculated after every question, in order to select the next question administered.

Another possible unexpected advantage is increased motivation. Linacre (2000) mentions increases in the motivation of candidates during CAT testing sessions. During an assessment, test takers might feel discouraged if the items are too difficult or, on the other hand, might lose interest if the items are too easy. As CAT assessments adapt themselves to a test taker's ability level, this enables the test taker to achieve their most accurate and highest score possible. The shorter test time is also likely to improve the test taker experience by reducing the chances of test fatigue, which should result in a reduction in drop-out rates, i.e. the number of test takers who leave the assessment unfinished.

6.3 What is Item Response Theory (IRT)?

Item response theory (IRT) is an important advance in the technology of psychometrics that provides benefits to tests and their stakeholders, including individualized score precision, better characterization of the concept of measurement error, and the possibility of CAT. The calculation of CAT scores is founded on the principles of IRT models. As suggested, IRT consists of several families of mathematical models, including dichotomous, polytomous, and multidimensional models. This manual focuses primarily on dichotomous models, which are appropriate for data that have two scored data points, typically right and wrong or correct and incorrect, where the item type is multiple choice with three to five response options/alternatives, depending on the item domain area.

In dichotomous IRT we assume that the relationship between a person and the response to an item can be explained by a specific mathematical function called the item response function (IRF). Several models are commonly used. One of these is the three-parameter logistic model (3PLM), which models the probability of a person j with a given ability θ_j (Greek letter theta) correctly responding to an item i as (Hambleton & Swaminathan, 1985):

$$P(X_i = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp[D a_i(\theta_j - b_i)]}{1 + \exp[D a_i(\theta_j - b_i)]} \tag{1}$$

where $a_i$ is the item discrimination parameter (or slope), $b_i$ is the item difficulty or location parameter (or threshold), $c_i$ is the lower asymptote, or pseudo-guessing parameter, and $D$ is a scaling constant equal to 1.7 or 1.0.

Figure 5 illustrates an IRF for the 3PLM. The difficulty (0.0) is the inflection point of the IRF projected onto the ability continuum, where the probability of a correct response to this item is 0.6 (i.e., the midpoint after taking into consideration the pseudo-guessing parameter). The discrimination parameter (1.5) is the slope of the IRF, indicating the strength of an item for discriminating among persons with different levels of ability. The degree of item discrimination is related to precision; that is, a more discriminating item adds more information to the measurement, and thus increases the precision of the ability estimate. The pseudo-guessing parameter (0.2) introduces a non-zero lower bound to the model; it represents the probability of a lower-ability person correctly responding to an item, presumably by chance.

Figure 5. Item Response Function for a Dichotomously Scored Item
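Equation (1) can be computed directly, as the following sketch shows. The example parameter values are those from the Figure 5 description (a = 1.5, b = 0.0, c = 0.2), with the scaling constant D = 1.7 assumed.

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model (equation 1).

    Setting c=0 gives the 2PL; setting c=0 and a=1 gives the 1PL/Rasch form.
    """
    z = D * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# Item from the Figure 5 description: a = 1.5, b = 0.0, c = 0.2.
# At theta equal to the difficulty, P = c + (1 - c)/2 = 0.6, as the text notes.
print(round(irf_3pl(theta=0.0, a=1.5, b=0.0, c=0.2), 2))   # 0.6
print(round(irf_3pl(theta=-2.0, a=1.5, b=0.0, c=0.2), 2))  # near the c floor
```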

The model can be simplified into two other commonly used dichotomous IRT models. The two-parameter logistic model (2PLM) assumes that there is no guessing (c_i = 0.0) and only utilizes the difficulty and discrimination parameters. It is therefore appropriate when guessing would not play an important role in assessment. The one-parameter logistic model (1PLM) makes the further assumption that all items have a discrimination parameter of 1.0, and therefore differ only with respect to difficulty. The 1PLM is mathematically equivalent to the Rasch model, although users of the two models adopt different philosophies.

Figure 6 presents IRFs for three exemplary items under each of the 1PL, 2PL, and 3PL models. Note that all IRFs for the 1PLM are parallel to one another and do not intersect. This demonstrates the objective measurement property of the 1PLM, whereby there is no interaction between items and ability: the probability (P) of a correct response to a harder item will always be lower than the probability for an easier item. This is not always the case for the 2PLM and 3PLM, as evidenced by Figure 6, because the slopes (i.e., the discrimination parameters) differ in these two models, whereas the slopes for the 1PLM are equal.

Figure 6. IRFs for Dichotomous IRT Models. The panels show three items per model, each plotted against ability (theta): 1PLM (b = -1.0; b = 0.5; b = 2.0), 2PLM (a = 0.5, b = -1.0; a = 1.5, b = 0.5; a = 1.0, b = 2.0) and 3PLM (a = 0.5, b = -1.0, c = 0.2; a = 1.5, b = 0.5, c = 0.3; a = 1.0, b = 2.0, c = 0.4).

The above models assume that the item responses are a function of only one latent trait (unidimensionality) and that a person's item response is solely determined by his/her location on the latent continuum and not by his/her responses to other items (local or conditional independence). One approach to claiming that the test is unidimensional is to show model-data fit (or data-model fit in the Rasch dichotomous model). Item-level fit can also be checked. Another way is to compare the model IRF against the empirical IRF. The model IRF can be conceptualized similarly to a standard linear or logistic regression line: it is simply a model-based function that is fit to a particular set of data. This is illustrated in Figure 7, which provides plots of empirical and model IRFs. An empirical IRF can be constructed by classifying persons according to ability and computing the proportion-correct within each ability category.

The model IRF attempts to model the curve for the correct response, but for an infinite number of groups on a continuous distribution.

Figure 7. Empirical and Model IRFs. Panel (a): good fit between an empirical and a model-based IRF. Panel (b): poor fit between an empirical and a model-based IRF, suggesting the need for a 3PL. Both panels plot probability against ability (theta).

Item and Test Information

An important concept in IRT for the purposes of test development and adaptive testing is information. Broadly defined, information is an index of the increase in measurement precision (or decrease in uncertainty). Like the IRF, it is a continuous function across θ, as an item can provide more information at certain levels. This is because information is primarily a function of the slope of an IRF; at levels of θ where the IRF has little slope, and therefore little differentiating power, the item provides little information. An item provides the most information where the slope of the IRF is steepest. For example, a very difficult multiple-choice item will differentiate among top persons, but provide no differentiation among below-average persons; virtually all of the latter would respond incorrectly or be forced to guess.

The information function for the 3PL is specifically defined as (Embretson & Reise, 2000):

$$I_i(\theta) = \frac{D^2 a_i^2 (1 - P_i)}{P_i} \left[\frac{(P_i - c_i)^2}{(1 - c_i)^2}\right] \tag{2}$$

which simplifies to $D^2 a_i^2 P_i (1 - P_i)$ for the 2PL and $D^2 P_i (1 - P_i)$ for the 1PL model. While information is maximized at $b_i$ for the one- and two-parameter models, for the 3PLM it is maximized at (Lord, 1980):

$$\theta_{\max} = b_i + \frac{1}{D a_i} \ln\left[\frac{1 + \sqrt{1 + 8 c_i}}{2}\right] \tag{3}$$
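Equations (2) and (3) translate directly into code. The following sketch builds on the 3PL response function from the previous example; the item parameters used here are illustrative assumptions.

```python
import math

def irf_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (equation 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.7):
    """Item information under the 3PL (equation 2); c=0 reduces to the 2PL form."""
    p = irf_3pl(theta, a, b, c, D)
    return (D**2 * a**2) * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

def theta_of_max_info(a, b, c, D=1.7):
    """Ability level at which a 3PL item is most informative (equation 3)."""
    return b + (1.0 / (D * a)) * math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0)

# Hypothetical item: a = 1.5, b = 0.5, c = 0.2. With c > 0, the information
# peak sits slightly above the difficulty b, as equation (3) predicts.
peak = theta_of_max_info(a=1.5, b=0.5, c=0.2)
print(round(peak, 3), round(info_3pl(peak, a=1.5, b=0.5, c=0.2), 3))
```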

Each item has its own item information function (IIF) that differs based on the item parameters. Consider the example items in Table 1.

Table 1. Example item parameters

Item | a | b | c

Item 1 is a relatively easy item, with b = -2.00, while Item 4 is more difficult. The IRFs for these items are shown in Figure 8.

Figure 8. IRFs for Example Items

The IIFs for the same items are shown in Figure 9. Note that each item has more information (y-axis) where its IRF in Figure 8 has more slope. Item 1 has the highest discrimination value, and therefore has the highest peak in its IIF.

Figure 9. IIFs for Example Items
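The paragraph below describes CAT item selection as looking up a table of information values; that table can be sketched as follows. The four items' (a, b, c) parameters are hypothetical stand-ins for Table 1, and the final lines anticipate the test information function and equation (4) introduced below.

```python
import math

# Self-contained repeat of the 3PL functions from the previous sketches.
def irf_3pl(theta, a, b, c, D=1.7):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.7):
    p = irf_3pl(theta, a, b, c, D)
    return (D**2 * a**2) * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

# Hypothetical (a, b, c) stand-ins for Table 1: Item 1 is easy (b = -2.00)
# and highly discriminating; Item 4 is hard.
items = {1: (1.8, -2.00, 0.2), 2: (1.2, -0.75, 0.2),
         3: (1.0,  0.75, 0.2), 4: (0.9,  2.00, 0.2)}
grid = [g / 10.0 for g in range(-40, 41)]          # theta from -4.0 to +4.0

# The item-selection "table": each item's information at each grid point.
table = {i: [info_3pl(t, *p) for t in grid] for i, p in items.items()}

# At an ability estimate of -2.00, the easy Item 1 is the most informative.
k = grid.index(-2.0)
print(max(table, key=lambda i: table[i][k]))       # -> 1

# Summing the IIFs gives the test information function (TIF); its reciprocal
# square root is the conditional standard error of measurement (equation 4).
tif = sum(table[i][k] for i in items)
print(round(1.0 / math.sqrt(tif), 3))              # CSEM at theta = -2.0
```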

The IIF plot in Figure 9 captures one of the core concepts of adaptive testing. A CAT typically works by constructing a table of values representing that graph, and looking up the items that are most informative at a given ability level. For example, if a person's ability estimate is -2.00, then Item 1 is the most appropriate item for them, as it provides the most information around that ability estimate.

IIFs are useful in the test construction process because they can be summed across all items to produce the test information function (TIF). The TIF provides an index of expected (model-based) measurement precision as a function of θ, since the TIF and the standard error of measurement (SEM) conditional on ability (CSEM) are inversely related, such that:

$$\mathrm{CSEM}(\theta) = \frac{1}{\sqrt{\sum_{i=1}^{n} I_i(\theta)}} \tag{4}$$

A test intended for a pass/fail decision with a single cut-off score can be built to have a TIF that is peaked near that cut-off score, giving a high amount of precision there. A test that contains several decision points across θ can be built with a TIF that is high across a wider range. The concepts of using the TIF and CSEM in test and item bank design are discussed in detail later.

Parameter Estimation

In IRT, both items and persons are characterized by parameters. Item parameters include a, b, and c, while the person parameter is the ability level θ (theta). These parameters are estimated from a set of item response data. Estimation of the item and person parameters is interdependent: item parameters are used to calculate person θ estimates, which are in turn necessary to estimate item parameters. For this reason, the process of calibrating data with IRT is iterative.


More information

Personnel Psychology Centre: Recent Achievements and Future Challenges

Personnel Psychology Centre: Recent Achievements and Future Challenges Personnel Psychology Centre: Recent Achievements and Future Challenges PRESENTATION TO THE EUROPEAN ASSOCIATION OF TEST PUBLISHERS SEPTEMBER 2016 The Public Service Commission (PSC) Independent agency

More information

Joe Sample. Total Administration Time: C6wPgCYJK. Candidate ID: Sample Distributor. Organization:

Joe Sample. Total Administration Time: C6wPgCYJK. Candidate ID:   Sample Distributor. Organization: Joe Sample Date and Time Started: Date and Time Completed: Total Administration Time: 9/28/2016 10:28 AM 9/28/2016 10:36 AM 7 minutes Candidate ID: Email: C6wPgCYJK sample@psymetricsworld.com Organization:

More information

Welcome to Psytech International s. inaugural quarterly newsletter. With the. launch of our new brand, our global expansion,

Welcome to Psytech International s. inaugural quarterly newsletter. With the. launch of our new brand, our global expansion, ISSUE 01 April June 2009 MAY 2009 Testing Times this issue Innovations P.1 15FQ+ Model of Personality P.2 WHAT WE STAND FOR: Think Global: We recognise that Psychometrics is a global industry and strive

More information

Equating and Scaling for Examination Programs

Equating and Scaling for Examination Programs Equating and Scaling for Examination Programs The process of scaling is used to report scores from equated examinations. When an examination is administered with multiple forms like the NBCOT OTC Examination,

More information

2017 PMF Application Guide

2017 PMF Application Guide 10 general steps that you should follow as you prepare, work on, and complete the PMF application The timeline for this cycle is as follows: Friday, November 18, 2016 Application for the PMF Class of 2017

More information

Chapter Standardization and Derivation of Scores

Chapter Standardization and Derivation of Scores 19 3 Chapter Standardization and Derivation of Scores This chapter presents the sampling and standardization procedures used to create the normative scores for the UNIT. The demographic characteristics

More information

Ante s parents have requested a cognitive and emotional assessment so that Ante can work towards fulfilling his true potential.

Ante s parents have requested a cognitive and emotional assessment so that Ante can work towards fulfilling his true potential. 55 South Street Strathfield 2135 0417 277 124 Name: Ante Orlovic Date Of Birth: 5/6/2001 Date Assessed: 27/5/2013 Reason for Referral: Test Administered: Cognitive Assessment Wechsler Intelligence Scale

More information

Psychometrics and Assessment Tools Provided by Azure Consulting

Psychometrics and Assessment Tools Provided by Azure Consulting Psychometrics and Assessment Tools Provided by Azure Consulting Contents Page 1. Occupational Personality Questionnaire (OPQ) 3 2. Team Management Profile (TMP) 4 3. Myers Briggs Type Indicator (MBTI)

More information

Chapter 12. Sample Surveys. Copyright 2010 Pearson Education, Inc.

Chapter 12. Sample Surveys. Copyright 2010 Pearson Education, Inc. Chapter 12 Sample Surveys Copyright 2010 Pearson Education, Inc. Background We have learned ways to display, describe, and summarize data, but have been limited to examining the particular batch of data

More information

Overview of Statistics used in QbD Throughout the Product Lifecycle

Overview of Statistics used in QbD Throughout the Product Lifecycle Overview of Statistics used in QbD Throughout the Product Lifecycle August 2014 The Windshire Group, LLC Comprehensive CMC Consulting Presentation format and purpose Method name What it is used for and/or

More information

An Integer Programming Approach to Item Bank Design

An Integer Programming Approach to Item Bank Design An Integer Programming Approach to Item Bank Design Wim J. van der Linden and Bernard P. Veldkamp, University of Twente Lynda M. Reese, Law School Admission Council An integer programming approach to item

More information

The Lexile Framework as an Approach for Reading Measurement and Success

The Lexile Framework as an Approach for Reading Measurement and Success The Lexile Framework as an Approach for Reading Measurement and Success By Colleen Lennon and Hal Burdick Revised 08/05/2014 Original Publication 04/01/2004 VISIT WWW.METAMETRICSINC.COM FOR MORE INFORMATION

More information

Introduction. Products and Services

Introduction. Products and Services List 2018 Introduction Psysoft is an occupational psychology consultancy providing services to clients throughout the UK to support their selection and people development projects. We specialise in running

More information

SAGE Publications. Reliability. Achieving consistency in research is as complicated as it is in everyday life. We may often

SAGE Publications. Reliability. Achieving consistency in research is as complicated as it is in everyday life. We may often C H A P T E R 4 Reliability Achieving consistency in research is as complicated as it is in everyday life. We may often have the expectation that most things we plan for on a daily basis are actually going

More information

INTERPRETATIVE REPORT

INTERPRETATIVE REPORT Laura Borgogni, Laura Petitta, Silvia Dello Russo, Andrea Mastrorilli INTERPRETATIVE REPORT Name: Gender: Age: Education: Profession: Role: Years worked: People managed: female 30 postgraduate degree (year

More information

Better assessment, brighter future. What are the steppingstones for developing a test?

Better assessment, brighter future. What are the steppingstones for developing a test? The four steps are: Step 1: Test purpose Defining the test objective Defining the test design Step 2: Construction Item creation Pre-testing Step 3: Assembly Item selection Test assembly Step 4: Reporting

More information

Estimating Reliabilities of

Estimating Reliabilities of Estimating Reliabilities of Computerized Adaptive Tests D. R. Divgi Center for Naval Analyses This paper presents two methods for estimating the reliability of a computerized adaptive test (CAT) without

More information

Competency Frameworks as a foundation for successful Talent Management. part of our We think series

Competency Frameworks as a foundation for successful Talent Management. part of our We think series Competency Frameworks as a foundation for successful part of our We think series Contents Contents 2 Introduction 3 If only they solved all of our problems 3 What tools and techniques can we use to help

More information

Determining the accuracy of item parameter standard error of estimates in BILOG-MG 3

Determining the accuracy of item parameter standard error of estimates in BILOG-MG 3 University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Public Access Theses and Dissertations from the College of Education and Human Sciences Education and Human Sciences, College

More information

Benin Indicator Survey Data Set

Benin Indicator Survey Data Set Benin Indicator Survey Data Set 1. Introduction 1.1. This document provides additional information for the Indicator Survey data collected in Benin from 18 th May and 30 th September 2009 as part of the

More information

Unit QUAN Session 6. Introduction to Acceptance Sampling

Unit QUAN Session 6. Introduction to Acceptance Sampling Unit QUAN Session 6 Introduction to Acceptance Sampling MSc Strategic Quality Management Quantitative methods - Unit QUAN INTRODUCTION TO ACCEPTANCE SAMPLING Aims of Session To introduce the basic statistical

More information

Upon leaving the programme you can expect one of the following awards, depending on your level of achievement as outlined below:

Upon leaving the programme you can expect one of the following awards, depending on your level of achievement as outlined below: PROGRAMME SPECIFICATION POSTGRADUATE PROGRAMMES KEY FACTS Programme name Financial Economics Award MSc School School of Arts and Social Sciences Department or equivalent Department of Economics Programme

More information

NDIT TM Numerical Data Interpretation Test. FAQs

NDIT TM Numerical Data Interpretation Test. FAQs NDIT TM Numerical Data Interpretation Test FAQs NDIT TM Numerical Data Interpretation Test Frequently Asked Questions About NDIT What does the Numerical Data Interpretation Test measure? NDIT assesses

More information

Crowe Critical Appraisal Tool (CCAT) User Guide

Crowe Critical Appraisal Tool (CCAT) User Guide Crowe Critical Appraisal Tool (CCAT) User Guide Version 1.4 (19 November 2013) Use with the CCAT Form version 1.4 only Michael Crowe, PhD michael.crowe@my.jcu.edu.au This work is licensed under the Creative

More information

Psychological Testing: A Test Taker s Guide. Downloaded from

Psychological Testing: A Test Taker s Guide. Downloaded from Psychological Testing: A Test Taker s Guide Contents Introduction...5 Section 1: General information about test-taking...5 What are psychological tests?...6 Where can I find out more about particular tests?...6

More information

Test Bank Business Intelligence and Analytics Systems for Decision Support 10th Edition Sharda

Test Bank Business Intelligence and Analytics Systems for Decision Support 10th Edition Sharda Test Bank Business Intelligence and Analytics Systems for Decision Support 10th Edition Sharda Instant download and all Business Intelligence and Analytics Systems for Decision Support 10th Edition Sharda

More information

Woodcock Reading Mastery Test Revised (WRM)Academic and Reading Skills

Woodcock Reading Mastery Test Revised (WRM)Academic and Reading Skills Woodcock Reading Mastery Test Revised (WRM)Academic and Reading Skills PaTTANLiteracy Project for Students who are Deaf or Hard of Hearing A Guide for Proper Test Administration Kindergarten, Grades 1,

More information

Introducing WISC-V Spanish Anise Flowers, Ph.D.

Introducing WISC-V Spanish Anise Flowers, Ph.D. Introducing Introducing Assessment Consultant Introducing the WISC V Spanish, a culturally and linguistically valid test of cognitive ability in Spanish for use with Spanish-speaking children ages 6:0

More information

The Changing Singaporean Graduate

The Changing Singaporean Graduate Startfolie The Changing Singaporean Graduate The impact of demographic and economic trends how this impacts you who select David Barrett Managing Director cut-e 1 cut-e talent Solutions Services Overview

More information

Equivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests

Equivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests Equivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests Q-interactive Technical Report 4 Mark H. Daniel, PhD Senior Scientist for Research Innovation

More information

WOMBAT-CS. Candidate's Manual Electronic Edition. Version 6. Aero Innovation inc.

WOMBAT-CS. Candidate's Manual Electronic Edition. Version 6. Aero Innovation inc. WOMBAT-CS Version 6 Candidate's Manual Electronic Edition Aero Innovation inc. www.aero.ca Familiarization with WOMBAT-CS Candidate's Manual This manual should be read attentively by the candidate before

More information

Marketing Plan Handbook

Marketing Plan Handbook Tennessee FFA Association Marketing Plan Handbook 2017-2021 TENNESSEE FFA ASSOCIATION MARKETING PLAN HANDBOOK 2017 2021 2 Purpose The Tennessee FFA State Marketing Plan Career Development Event is designed

More information

Examiner s report F5 Performance Management December 2017

Examiner s report F5 Performance Management December 2017 Examiner s report F5 Performance Management December 2017 General comments The F5 Performance Management exam is offered in both computer-based (CBE) and paper formats. The structure is the same in both

More information

How Differential Item Functioning Analysis (DIF) Can Increase the Fairness and Accuracy of Your Assessments

How Differential Item Functioning Analysis (DIF) Can Increase the Fairness and Accuracy of Your Assessments How Differential Item Functioning Analysis (DIF) Can Increase the Fairness and Accuracy of Your Assessments Nikki Eatchel, SVP of Assessment David Grinham, SVP International Assessment Solutions Scantron

More information

CONSTRUCTING A STANDARDIZED TEST

CONSTRUCTING A STANDARDIZED TEST Proceedings of the 2 nd SULE IC 2016, FKIP, Unsri, Palembang October 7 th 9 th, 2016 CONSTRUCTING A STANDARDIZED TEST SOFENDI English Education Study Program Sriwijaya University Palembang, e-mail: sofendi@yahoo.com

More information

Analyzing Language & Literacy using the WMLS-R

Analyzing Language & Literacy using the WMLS-R Analyzing Language & Literacy using the WMLS-R CRISTINA HUNTER ERIC WILLIAMSON TWIN Academy 2017 Part 1: What the assessment tells us about language acquisition? How can it be used to inform teaching and

More information

Annual Employer Survey : Employers satisfaction with DWP performance against Departmental Strategic Objective 7

Annual Employer Survey : Employers satisfaction with DWP performance against Departmental Strategic Objective 7 Department for Work and Pensions Research Report No 635 Annual Employer Survey 2008 09: Employers satisfaction with DWP performance against Departmental Strategic Objective 7 Jan Shury, Lorna Adams, Alistair

More information

The Technological Edge: Unproctored Employment Testing in Large Organizations

The Technological Edge: Unproctored Employment Testing in Large Organizations The Technological Edge: Unproctored Employment Testing in Large Organizations Presented by Jasmin Loi Human Resources Services Manager Erik Collier Human Resources Analyst 31 st ANNUAL IPMAAC CONFERENCE

More information

CROWN FINANCIAL MINISTRIES

CROWN FINANCIAL MINISTRIES RESEARCH AND DEVELOPMENT TECHNICAL SUMMARY for Career Direct I. TECHNICAL INFORMATION ON THE Career Direct PERSONALITY SECTION The Personality Section of the Career Direct Report is a personality inventory

More information

Data Collection Instrument. By Temtim Assefa

Data Collection Instrument. By Temtim Assefa Data Collection Instrument Design By Temtim Assefa Instruments Instruments are tools that are used to measure variables There are different types of instruments Questionnaire Structured interview Observation

More information

Understanding Your GACE Scores

Understanding Your GACE Scores Understanding Your GACE Scores October 2017 Georgia educator certification is governed by the Georgia Professional Standards Commission (GaPSC). The assessments required for educator certification are

More information

Thinking about competence (this is you)

Thinking about competence (this is you) CPD In today s working environment, anyone who values their career must be prepared to continually add to their skills, whether it be formally through a learning programme, or informally through experience

More information

Information and Practice Leaflet

Information and Practice Leaflet Information and Practice Leaflet Verbal and Numerical Reasoning Tests Why are tests used? Ability or aptitude tests are increasingly being used in the world of work to assess the key skills relevant to

More information

Assessment Center Report

Assessment Center Report Assessment Center Report Candidate Name: Title: Department: Assessment Date: Presented to Company/Department Purpose As of the Assessment Center Service requested by (Company Name) to identify potential

More information

CANDIDATE FEEDBACK REPORT KATHERINE ADAMS

CANDIDATE FEEDBACK REPORT KATHERINE ADAMS CANDIDATE FEEDBACK REPORT KATHERINE ADAMS Report Date: 24 Aug 2016 Position: Example Position Client/Company: ABC Company Assessments Included Report Interpretation Module Assessment Date Results Valid

More information

LaunchPad psychometric assessment system An overview

LaunchPad psychometric assessment system An overview LaunchPad psychometric assessment system An overview P ERCEPT RESOURCE MANAGEMENT INDEX LAUNCHPAD OVERVIEW...1 LaunchPad s outstanding value proposition...1 THE FEATURES AND FUNCTIONS OF LAUNCHPAD...2

More information

The effective recruitment and selection practices of organizations in the financial sector operating in the Slovak republic

The effective recruitment and selection practices of organizations in the financial sector operating in the Slovak republic The effective recruitment and selection practices of organizations in the financial sector operating in the Slovak republic Ľuba Tomčíková University of Prešov in Prešov Department of management Ul. 17

More information

CRITERION- REFERENCED TEST DEVELOPMENT

CRITERION- REFERENCED TEST DEVELOPMENT t>feiffer~ CRITERION- REFERENCED TEST DEVELOPMENT TECHNICAL AND LEGAL GUIDELINES FOR CORPORATE TRAINING 3rd Edition Sharon A. Shrock William C. Coscarelli BICBNTBNNIAL Bl C NTBN NI A L List of Figures,

More information

Watson-Glaser Critical Thinking Appraisal III (US)

Watson-Glaser Critical Thinking Appraisal III (US) Watson-Glaser Critical Thinking Appraisal III (US) Profile Report Candidate Name: Organization: Pearson Sample Corporation Date of Testing: 21-11-2017 (dd-mm-yyy) 21-11-2017 Page 1 of 5 Watson Glaser III

More information

Robotic Process Automation. Reducing process costs, increasing speed and improving accuracy Process automation with a virtual workforce

Robotic Process Automation. Reducing process costs, increasing speed and improving accuracy Process automation with a virtual workforce Robotic Process Automation Reducing process costs, increasing speed and improving accuracy Process automation with a virtual workforce What is Robotic Process Automation (RPA)? Advanced macros? Robots...

More information

STUDY SUBJECTS TAUGHT IN ENGLISH FOR EXCHANGE STUDENTS SPRING SEMESTER 2017/2018

STUDY SUBJECTS TAUGHT IN ENGLISH FOR EXCHANGE STUDENTS SPRING SEMESTER 2017/2018 STUDY SUBJECTS TAUGHT IN ENGLISH FOR EXCHANGE STUDENTS SPRING SEMESTER 2017/2018 1-3 YEAR Study programme: INTERNATIONAL BUSINESS Credits Description of study subject (ECTS) Subject International Business

More information

Before You Start Modelling

Before You Start Modelling Chapter 2 Before You Start Modelling This chapter looks at the issues you need to consider before starting to model with ARIS. Of particular importance is the need to define your objectives and viewpoint.

More information

A Quality Assurance Framework for Knowledge Services Supporting NHSScotland

A Quality Assurance Framework for Knowledge Services Supporting NHSScotland Knowledge Services B. Resources A1. Analysis Staff E. Enabling A3.1 Monitoring Leadership A3. Measurable impact on health service Innovation and Planning C. User Support A Quality Assurance Framework for

More information

Linking Current and Future Score Scales for the AICPA Uniform CPA Exam i

Linking Current and Future Score Scales for the AICPA Uniform CPA Exam i Linking Current and Future Score Scales for the AICPA Uniform CPA Exam i Technical Report August 4, 2009 W0902 Wendy Lam University of Massachusetts Amherst Copyright 2007 by American Institute of Certified

More information

Reliability & Validity

Reliability & Validity Request for Proposal Reliability & Validity Nathan A. Thompson Ph.D. Whitepaper-September, 2013 6053 Hudson Road, Suite 345 St. Paul, MN 55125 USA P a g e 1 To begin a discussion of reliability and validity,

More information

abc GCE 2005 January Series Mark Scheme Economics ECN2/1 & ECN2/2 The National Economy

abc GCE 2005 January Series Mark Scheme Economics ECN2/1 & ECN2/2 The National Economy GCE 2005 January Series abc Mark Scheme Economics ECN2/1 & ECN2/2 The National Economy Mark schemes are prepared by the Principal Examiner and considered, together with the relevant questions, by a panel

More information

Program Assessment. University of Cincinnati School of Social Work Master of Social Work Program. August 2013

Program Assessment. University of Cincinnati School of Social Work Master of Social Work Program. August 2013 University of Cincinnati School of Social Work Master of Social Work Program Program Assessment August 01 Submitted to the College of Allied Health Sciences University of Cincinnati 1 University of Cincinnati

More information

Getting Started with OptQuest

Getting Started with OptQuest Getting Started with OptQuest What OptQuest does Futura Apartments model example Portfolio Allocation model example Defining decision variables in Crystal Ball Running OptQuest Specifying decision variable

More information

Student Workbook. Designing A Pay Structure TOTAL REWARDS. Student Workbook. STUDENT WORKBOOK Designing A Pay Structure. By Lisa A. Burke, Ph.D.

Student Workbook. Designing A Pay Structure TOTAL REWARDS. Student Workbook. STUDENT WORKBOOK Designing A Pay Structure. By Lisa A. Burke, Ph.D. Case Study and Integrated Application Exercises By Lisa A. Burke, Ph.D., SPHR Student Workbook Student Workbook TOTAL REWARDS 2008 SHRM Lisa Burke, Ph.D., SPHR 45 46 2008 SHRM Lisa Burke, Ph.D., SPHR INSTRUCTOR

More information

STATISTICAL TECHNIQUES. Data Analysis and Modelling

STATISTICAL TECHNIQUES. Data Analysis and Modelling STATISTICAL TECHNIQUES Data Analysis and Modelling DATA ANALYSIS & MODELLING Data collection and presentation Many of us probably some of the methods involved in collecting raw data. Once the data has

More information

DIPLOMA OF HUMAN RESOURCES MANAGEMENT-BSB50615 Study Support materials for Manage recruitment selection and induction processes BSBHRM506

DIPLOMA OF HUMAN RESOURCES MANAGEMENT-BSB50615 Study Support materials for Manage recruitment selection and induction processes BSBHRM506 DIPLOMA OF HUMAN RESOURCES MANAGEMENT-BSB50615 Study Support materials for Manage recruitment selection and induction processes BSBHRM506 STUDENT HANDOUT This unit describes the performance outcomes, skills

More information

HTS Report. d2-r. Test of Attention Revised. Technical Report. Another Sample ID Date 14/04/2016. Hogrefe Verlag, Göttingen

HTS Report. d2-r. Test of Attention Revised. Technical Report. Another Sample ID Date 14/04/2016. Hogrefe Verlag, Göttingen d2-r Test of Attention Revised Technical Report HTS Report ID 467-500 Date 14/04/2016 d2-r Overview 2 / 16 OVERVIEW Structure of this report Narrative Introduction Verbal interpretation of standardised

More information

Influence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic

Influence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic Influence of the Criterion Variable on the Identification of Differentially Functioning Test Items Using the Mantel-Haenszel Statistic Brian E. Clauser, Kathleen Mazor, and Ronald K. Hambleton University

More information

Test and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III

Test and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III Test and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III Throughout his career, Wechsler emphasized that factors other than intellectual ability are involved in

More information

Staffing Organizations (2nd Canadian Edition) Heneman et al. - Test Bank

Staffing Organizations (2nd Canadian Edition) Heneman et al. - Test Bank Chapter 08 1. External selection refers to the assessment and evaluation of external job applicants. 2. Understanding the legal issues of assessment methods is necessary. 3. Cost should not be used to

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania

More information

Identifying the people to grow your organisation and deliver business results

Identifying the people to grow your organisation and deliver business results Innovative talent assessment solutions for Retail & Hospitality Identifying the people to grow your organisation and deliver business results Meeting the key retail & hospitality talent challenges Company

More information

The 360-Degree Assessment:

The 360-Degree Assessment: WHITE PAPER WHITE PAPER The : A Tool That Can Help Your Organization Maximize Human Potential CPS HR Consulting 241 Lathrop Way Sacramento, CA 95815 t: 916.263.3600 f: 916.263.3520 www.cpshr.us INTRODUCTION

More information

Training Watson: How I/O Psychology, Data Science, and Engineering integrate to produce responsible AI in HR.

Training Watson: How I/O Psychology, Data Science, and Engineering integrate to produce responsible AI in HR. Training Watson: How I/O Psychology, Data Science, and Engineering integrate to produce responsible AI in HR. Stefan Liesche, IBM Distinguished Engineer - Watson Talent Architecture Nigel Guenole, IO Psychologist

More information