Applying the Principles of Item and Test Analysis

Alexina Melton

1 Applying the Principles of Item and Test Analysis 2012 Users Conference New Orleans March 20-23

2 Session objectives
  Define various Classical Test Theory item and test statistics
  Identify poorly performing items and tests using item and test analysis reports
  Use item analysis reports to guide item revisions to improve your assessments


3 A global network of professional services firms.
  150,000 staff members across the network. 35,000 are in the USA.
  The development of our people is a top priority.
  PwC is in Training Magazine's Hall of Fame and is the only company to have been awarded #1 in the Top 125 for 3 years in a row.
  We have a highly mobile and virtual workforce.

4 Introduction to Test Theory Slide 4

5 What is test theory?
  Mathematical concepts to help answer your questions about assessment quality:
    Does the test measure one thing or multiple things?
    How should the test be scored?
    How precisely does the assessment measure the knowledge?
    Are any items influenced by factors other than what you are trying to measure (a.k.a. bias or "irrelevant variance")?
    Can we use alternative, equivalent items to test the same thing?

6 Assessments are measurement scales!
  Measurement is the assignment of numbers to an attribute according to a rule.
  Most physical measurement scales we take for granted:
    Temperature
    Weight
    Volume

7 Assessments have a limit
  Assessment scales have a limited range. They have what is called a floor (0%) and a ceiling (100%).
    0% (floor): No Knowledge, New Hire, Novice
    100% (ceiling): Expert Knowledge, Experienced Hire, Master
  Test questions measure only part of the possible range of knowledge.

8 Assessments can use different scales
  Different metrics can be used for the same thing. We assign the meaning to the numbers. Think of the differences between Celsius and Fahrenheit.
  [Table: results on a 10-item test, from "Passed all items" through "Passed 9 items" ... "Passed 1 item" down to "Failed all items", expressed on four scales: Number Correct, Square, Square Root, and Log-odds. "Failed all items" is 0 on the first three scales; the log-odds entries for "Passed all items" and "Failed all items" are shown as "?".]
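The same raw result can be put on each of the four scales named on this slide with a few lines of Python. This is only a sketch: it assumes a 10-item test, and the function name `rescale` is made up for illustration. It also shows why the log-odds scale has no finite value when every item is passed or every item is failed.

```python
import math

def rescale(num_correct, num_items=10):
    """Express one raw score on the four scales named on the slide."""
    if 0 < num_correct < num_items:
        # Log-odds of success: ln(items passed / items failed).
        log_odds = math.log(num_correct / (num_items - num_correct))
    else:
        # All passed or all failed: the odds are infinite or zero,
        # so there is no finite log-odds value to report.
        log_odds = None
    return {
        "number_correct": num_correct,
        "square": num_correct ** 2,
        "square_root": math.sqrt(num_correct),
        "log_odds": log_odds,
    }

# Walk from "passed all items" down to "failed all items".
for k in range(10, -1, -1):
    print(rescale(k))
```

All four columns keep the examinees in the same order; only the spacing between adjacent scores changes.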

9 Is your assessment a norm-referenced scale OR a criterion-referenced scale?
  A norm-referenced assessment measures people against a defined population. How do you measure up against other people?
    Typical shape: normal distribution of scores between 0% and 100%.
  A criterion-referenced assessment measures people against a defined domain of knowledge. Have you mastered the material in a domain of knowledge?
    Typical shape: negatively skewed distribution of scores between 0% and 100%.

10 Classical Test Theory is
  Based on the assumption that all the test items measure the same concept.
  Concerned mainly with the overall test score.
  Used to improve the quality of the test score.
  Observed test score = true score + error
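A small simulation can make the classical model concrete: each observed score is a true score plus random error, and reliability is the share of observed-score variance that comes from true scores. This is a sketch under assumed values (normally distributed true scores and errors; the function name and all parameters are hypothetical), not a description of any particular scoring system.

```python
import random
import statistics

def simulate_reliability(n_examinees=5000, true_sd=4.0, error_sd=2.0, seed=0):
    """Simulate observed = true + error and return var(true) / var(observed)."""
    rng = random.Random(seed)
    # Hypothetical true scores centered on 20 points.
    true_scores = [rng.gauss(20.0, true_sd) for _ in range(n_examinees)]
    # Each observed score adds independent random error to the true score.
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.variance(true_scores) / statistics.variance(observed)

# In theory this ratio is 16 / (16 + 4) = 0.8 for the default settings;
# the simulated value lands close to that.
print(round(simulate_reliability(), 2))
```

Raising `error_sd` while holding `true_sd` fixed pushes the ratio down, which is the sense in which more measurement error means lower reliability.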

11 Reliability & Validity
  Reliability
    Are test scores consistent across factors which should not influence the score (time, versions, environment, etc.)?
    Are test items all measuring the same thing?
  Validity
    Are you measuring what you intended to measure?
    Is the test score being used appropriately?
  We will be addressing mainly reliability.

12 Reliability Measures
  In general:
    There are many of them.
    They usually take the form of a correlation with a range from 0 to 1.
    Closer to 1 is better.
    Below .5 is unacceptable.
  Some measure consistency across different factors:
    Test-retest reliability
    Alternative forms reliability
  Others measure internal consistency (Are you measuring a single topic well?):
    Split-half reliability = split the test in half and correlate the 2 scores
    Cronbach's alpha (α) = the average over all possible split-half combinations
    Most appropriate for norm-referenced tests
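Both internal-consistency measures can be sketched in plain Python. The code below assumes items scored 0/1, uses an odd/even item split as one possible way to halve the test, and computes alpha with the standard variance formula. The `responses` matrix and the function names are made up for illustration.

```python
import statistics

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either list has no variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def split_half(responses):
    """Correlate the score on odd-numbered items with the score on even-numbered items."""
    odd_half = [sum(row[0::2]) for row in responses]
    even_half = [sum(row[1::2]) for row in responses]
    return pearson(odd_half, even_half)

def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    item_vars = [statistics.variance([row[i] for row in responses]) for i in range(k)]
    total_var = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 examinees (rows) by 4 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(split_half(responses), cronbach_alpha(responses))
```

A real analysis would use far more than five examinees; the tiny matrix is only there so the arithmetic can be checked by hand.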

13 Test Analysis Slide 13

14 Test Analysis: First examine your sample
  Do you have enough people in your sample to have confidence in the statistics?
    You can get by with fewer people for pilot testing in low- to moderate-stakes assessments.
    Ideally you want 100 people for a solid analysis.
  Do you have the expected distribution of scores for your testing program?
    Norm-referenced = normal distribution
    Criterion-referenced = negatively skewed distribution

15 Test Analysis: Score distribution
  Several statistics will help you understand your distribution:
    Mean: The mathematical average of all scores. The mean can be misleading if your distribution is skewed.
    Median: The middle value. You should use the median when you have a skewed distribution.
    Mode: The most common value in the distribution.
    Skew: Tells you how evenly scores are distributed around the mean.
      Negative skew: more values are higher than the mean
      Positive skew: more values are lower than the mean
      Zero skew: scores are evenly distributed around the mean
    Kurtosis: A measure of the peakedness of the distribution.
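These statistics are easy to compute yourself. The sketch below uses only the Python standard library and the moment-based definitions of skew and excess kurtosis (one common convention; reporting tools may apply small-sample corrections and give slightly different numbers). The `scores` list is invented for illustration.

```python
import statistics

def describe(scores):
    """Return mean, median, mode, skew and excess kurtosis for a list of scores."""
    n = len(scores)
    mean = statistics.fmean(scores)
    # Population central moments (divide by n, no small-sample correction).
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    return {
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "skew": m3 / m2 ** 1.5,
        # Subtracting 3 makes a normal distribution score 0.
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Hypothetical raw scores with a tail of low scorers (negative skew).
scores = [12, 18, 20, 21, 22, 22, 22, 23, 24, 25]
print(describe(scores))
```

In this made-up sample the mean sits below the median and the skew is negative, which is the pattern described above: most values are higher than the mean.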

16 Test Analysis: Histogram
  If you don't like all those numbers, just look at the histogram!
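If your reporting tool does not draw the histogram for you, a rough text version takes only a few lines. This sketch assumes whole-number raw scores and a bin width of 5 points; both are arbitrary choices, and `text_histogram` is a hypothetical helper name.

```python
from collections import Counter

def text_histogram(scores, bin_width=5):
    """Return one line per score bin, with one '#' per examinee in that bin."""
    # Map each score to the lower edge of its bin, then count per bin.
    bins = Counter((s // bin_width) * bin_width for s in scores)
    lines = []
    for start in sorted(bins):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * bins[start]}")
    return lines

# Hypothetical raw scores.
for line in text_histogram([12, 18, 20, 21, 22, 22, 22, 23, 24, 25]):
    print(line)
```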

17 Test Analysis: The Numbers
  These statistics correspond to the previous slide histogram.
  Statistic              Value
  Number of Examinees    193
  Mean                   21
  Median                 22
  Mode                   22
  Skew
  Kurtosis

18 Test Analysis: Overall Difficulty
  Next check your floor and ceiling to ensure the test is targeted to the population's ability.
  Is your test too hard?
    What was the minimum score?
    For a multiple choice test how many people are scoring around 25%?
  Is your test too easy?
    What was the maximum score?
    How many people are scoring 95% or higher?
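The floor and ceiling questions above can be answered with a quick pass over the percent scores. In this sketch, "around 25%" is interpreted as "at or below 30%" (chance level plus a 5-point margin); that margin is an assumption, as are the function and parameter names.

```python
def floor_ceiling_report(percent_scores, chance=25.0, ceiling=95.0, margin=5.0):
    """Summarize how many examinees score near chance level or near the ceiling."""
    n = len(percent_scores)
    # Examinees at or below chance level plus a small margin.
    near_chance = sum(1 for s in percent_scores if s <= chance + margin)
    # Examinees at or above the ceiling threshold.
    at_ceiling = sum(1 for s in percent_scores if s >= ceiling)
    return {
        "min": min(percent_scores),
        "max": max(percent_scores),
        "share_near_chance": near_chance / n,
        "share_at_ceiling": at_ceiling / n,
    }

# Hypothetical percent scores for five examinees.
print(floor_ceiling_report([20, 30, 50, 96, 100]))
```

A large share near chance suggests the test is too hard for this group; a large share at the ceiling suggests it is too easy.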

19 Test Analysis: Standard Error of Measurement (SEM)
  Shows the average amount of measurement error around a test score.
  Use it to create confidence intervals around a test score. The true score is likely to fall within the range.
  SEM is best used with norm-referenced tests.
  Example: SEM = 2, passing score = 14. 68% confident that the true score is between 12 and [value missing]
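The slide does not show how SEM is calculated. A common Classical Test Theory formula is SEM = SD × sqrt(1 − reliability), and a band of ±1 SEM around an observed score gives roughly 68% confidence (±1.96 SEM gives roughly 95%). The sketch below assumes that formula; the function names are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), a common CTT formula (assumed here)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(score, sem, z=1.0):
    """Band of +/- z SEMs around an observed score (z=1 is about 68%)."""
    return (score - z * sem, score + z * sem)

# With SEM = 2 and an observed score of 14 (assumed to sit at the
# slide's passing score), the 68% band is 14 - 2 to 14 + 2.
print(confidence_band(14, 2))
```

An examinee whose band straddles the passing score could plausibly have a true score on either side of it, which is worth remembering when a pass/fail decision is close.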

20 Test Analysis: Internal Consistency
  Cronbach's Alpha (α)    Internal consistency
  α ≥ .9                  Excellent
  .9 > α ≥ .8             Good
  .8 > α ≥ .7             Acceptable
  .7 > α ≥ .6             Questionable
  .6 > α ≥ .5             Poor
  α < .5                  Unacceptable
  Cronbach's Alpha is the most popular reliability measure.
  A very high reliability may indicate you have redundant items.
  Topic or sub-scores should also be reliable.
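The rule-of-thumb bands on this slide translate directly into a lookup. This sketch assumes each band includes its lower bound (so exactly .9 counts as Excellent); `interpret_alpha` is a hypothetical helper name.

```python
def interpret_alpha(alpha):
    """Map a Cronbach's alpha value to the slide's rule-of-thumb label."""
    # Each band is assumed to include its lower bound.
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.8:
        return "Good"
    if alpha >= 0.7:
        return "Acceptable"
    if alpha >= 0.6:
        return "Questionable"
    if alpha >= 0.5:
        return "Poor"
    return "Unacceptable"

for value in (0.95, 0.85, 0.72, 0.45):
    print(value, interpret_alpha(value))
```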

21 Improving Reliability
  Have a consistent, controlled testing environment and provide clear instructions.
  Have a large group of examinees with a broad range of ability.
    If everyone is of equal ability they will score about the same on the test, and thus the reliability index will be low.
  Use objectively scored test items (multiple choice items).
    Items like essays, which require scoring by the teacher, tend to have lower reliability due to the additional error introduced by teacher judgment.

22 Improving Reliability
  Increase the test length
    In general, more items means less error for the test (Google "Spearman-Brown formula").
  Use only quality items
    Develop items using best practices to reduce lucky guessing.
    Delete or edit items which are too easy, too difficult, or otherwise do not help to differentiate those with knowledge.
    See Developing and Validating Multiple-Choice Test Items by Thomas M. Haladyna q=multiple%20choice&pg=pp1#v=twopage&q&f=false
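The Spearman-Brown formula mentioned above predicts the reliability of a test whose length is changed by some factor, assuming the added items are of the same quality as the existing ones. A minimal sketch (the function name is hypothetical):

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Formula: k * r / (1 + (k - 1) * r), where r is the current reliability
    and k is the length factor (2 = twice as many comparable items).
    """
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)

# Doubling a test with reliability .60 is predicted to give 1.2 / 1.6 = .75.
print(spearman_brown(0.6, 2))
```

The gains flatten out as the test gets longer, so doubling an already long test buys much less than doubling a short one.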

23 Does this look like a good test?
  Statistic                              Value
  Number of examinees                    90
  Minimum achieved score                 48%
  Maximum achieved score                 95%
  Test reliability (Cronbach's Alpha)
  Mean                                   80.72%
  Median                                 81.25%
  Mode                                   81.25%
  Standard deviation                     9.53%
  Standard error of measurement          4.31%
  Skew
  Kurtosis

24 Does this look like a bad test?
  Statistic                              Value
  Number of examinees                    193
  Minimum achieved score                 10%
  Maximum achieved score                 31%
  Test reliability (Cronbach's Alpha)
  Mean                                   24.39%
  Median                                 25.29%
  Mode                                   25.29%
  Standard deviation                     3.18%
  Standard error of measurement          4.16%
  Skew
  Kurtosis
  This test used an item banking approach with random selection of items.
  Beware: only use basic test analysis reports when all examinees get exactly the same items.
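When a report gives both the standard deviation and the standard error of measurement, the reliability they imply can be back-calculated. This is a sketch only: it assumes the report used the common relation SEM = SD × sqrt(1 − reliability), which the slides do not state, and `implied_reliability` is a hypothetical helper name.

```python
def implied_reliability(sd, sem):
    """Reliability implied by SD and SEM, assuming SEM = SD * sqrt(1 - reliability)."""
    return 1.0 - (sem / sd) ** 2

# Figures from the "good test" slide: SD 9.53%, SEM 4.31%.
print(round(implied_reliability(9.53, 4.31), 2))
# Figures from the "bad test" slide: SD 3.18%, SEM 4.16%.
print(round(implied_reliability(3.18, 4.16), 2))
```

Under that assumption the first test's figures imply a reliability of roughly .80. The second test's SEM is larger than its standard deviation, which gives a negative implied value. That is a sign the basic statistics are not meaningful for this test, in line with the warning about randomly selected items.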

25 Item Analysis Slide 25

26 Anatomy of an item
  Stem: The actual question being asked
  Options: All choices
  Key: Correct answer
  Distractors: The wrong answer choices
  Feedback: Explanatory remediation as to why that particular answer choice is incorrect.
  Example item
    Stem: What Code section deals with the taxable gain to a corporation when a corporation distributes property to a shareholder?
    A. 301 (distractor). Feedback: Incorrect. This section deals with the character of the amount received by a shareholder from a corporation.
    B. 311 (key). Feedback: Correct.
    C. 312 (distractor). Feedback: Incorrect. This section deals with the effect on earnings and profits of a transaction.
    D. 334 (distractor). Feedback: Incorrect. This section deals with the basis of property received in a liquidation.

27 Item statistics: Definitions
  Number of examinees: The number of examinees answering the question.
    Aim to get at least 30 responses; however, 100 is ideal.
  P-Value: The percentage of examinees who chose the correct answer.
    For norm-referenced tests you'll want a wide range of P-values.
    For criterion-referenced tests you'll want more P-values around your passing score.
  Distractor Percentage: The percentage of examinees who chose a wrong answer.
    If only 0-5% of examinees choose a distractor, consider replacing it with a more attractive distractor.
    All wrong answers should be common mistakes a novice would make.
  Item-Total Correlation OR Discrimination: A correlation between picking an option and the total score on the test.
    The theory is that if you get the question right you should be scoring higher on the test than those who got the question wrong.
    The correct answer should have a strong positive correlation with the total test score (if that question is measuring the same thing as the other questions on the test).
    Item-total correlation is influenced by P-value, so expect lower values on very hard or very easy items. A P-value of 0 or 1 will always give you a zero correlation.
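The statistics defined above can be computed for a single item from two lists: the option each examinee picked and each examinee's total test score. The sketch below uses the uncorrected total score (the item itself is included in the total) and reports a correlation of zero when an option was picked by nobody or by everybody. The data and the function names are hypothetical.

```python
import statistics

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either list has no variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def item_analysis(choices, total_scores, key, options="ABCD"):
    """Per-option count, percentage, and option-total correlation for one item."""
    n = len(choices)
    report = {}
    for opt in options:
        # 1 if the examinee picked this option, else 0.
        picked = [1 if c == opt else 0 for c in choices]
        report[opt] = {
            "count": sum(picked),
            "percent": 100.0 * sum(picked) / n,  # P-value for the key, distractor % otherwise
            "item_total_r": pearson(picked, total_scores),
            "is_key": opt == key,
        }
    return report

# Hypothetical responses to one item and the same examinees' total scores.
choices = ["B", "B", "A", "B", "C", "B"]
total_scores = [9, 8, 4, 7, 3, 6]
for option, stats in item_analysis(choices, total_scores, key="B").items():
    print(option, stats)
```

In this made-up data the key (B) correlates positively with total score and the chosen distractors correlate negatively, which is the healthy pattern. Option D is never picked, so it would be a candidate for replacement.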

28 Item Statistics: Some Guidelines
  Items too hard or too easy?
    P-Value
    < 50%         Very Difficult
    51% - 64%     Difficult
    65% - 75%     Good
    76% - 94%     Easy
    95% - 100%    Very Easy
  Items tricky or confusing?
    Discrimination
    < .20         Low
    .20 - .30     Moderate
    > .30         Good
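These guidelines can be applied automatically when screening a large item bank. In this sketch the band edges follow the slide, with two assumptions spelled out: values that fall between two listed bands (for example exactly 50%, or 75.5%) are placed in the lower band, and a discrimination from .20 to .30 inclusive is treated as Moderate. The function names are hypothetical.

```python
def difficulty_label(p_percent):
    """Label an item's P-value (percent correct) using the slide's bands."""
    # Values between listed bands fall into the lower band (an assumption).
    if p_percent >= 95:
        return "Very Easy"
    if p_percent >= 76:
        return "Easy"
    if p_percent >= 65:
        return "Good"
    if p_percent >= 51:
        return "Difficult"
    return "Very Difficult"

def discrimination_label(r):
    """Label an item-total correlation using the slide's bands."""
    if r > 0.30:
        return "Good"
    if r >= 0.20:  # .20 to .30 inclusive is assumed to be Moderate
        return "Moderate"
    return "Low"

# An item answered correctly by everyone, and one answered correctly by 83%.
print(difficulty_label(100), difficulty_label(83))
print(discrimination_label(0.25))
```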

29 Example #1 - What would you do with this item?
  Stat                      A     B     C     D      Total
  # Examinees
  P-Value/Distractor %      0%    0%    0%    100%   100%
  Item-total Correlation
  Correct answer

30 Example #1 Too Easy
  Which of the following most accurately presents the reconciliation of Partners' Capital found in Schedule M-2 of Form 1065?
  Original
    A Beginning Capital Accounts minus Distributions equals ending capital accounts.
    B Beginning Equity minus Distributions plus stock buy-backs equals ending equity.
    C Beginning Retained Earnings plus current year income minus dividends equals ending retained earnings.
    D Beginning Capital plus current year net income plus capital contributions minus distributions equals ending capital.
  Revised
    A BOY Capital plus Guaranteed Payments plus/minus CY Net Income minus Distributions equals EOY Capital.
    B BOY Capital plus Distributions plus/minus CY Net Income equals EOY Capital.
    C BOY Capital minus Capital Contributions plus/minus CY Income equals EOY Capital.
    D BOY Capital plus current year net income plus Capital Contributions minus Distributions equals EOY Capital.

31 Example #1 - Revised
  Stat                      A     B     C     D      Total
  # Examinees
  P-Value/Distractor %      6%    5%    7%    83%    100%
  Item-total Correlation
  Correct answer

32 Example #2 - What could be wrong with this item?
  Stat                      A     B     C     D      Total
  # Examinees
  P-Value/Distractor %      70%   4%    22%   4%     100%
  Item-total Correlation
  Correct answer

33 Example #2 Confusing the Learner
  Which of the following is the best definition of a Book Tax Return?
  Original
    A A business tax return that reflects taxable income according to financial statement rules instead of tax rules.
    B A business return prepared using federal tax rules in calculating taxable income.
    C A business return prepared using tax rules to calculate taxable income but reflects a book basis balance sheet.
    D A business return prepared using international accounting standards.
  Revised
    A A return that reflects taxable income according to financial statement rules instead of tax rules.
    B A return prepared using federal tax rules in calculating taxable income.
    C A return prepared using tax rules to calculate taxable income and a GAAP balance sheet.
    D A return prepared using international accounting standards.

34 Example #2 - Revised
  Stat                      A     B     C     D      Total
  # Examinees
  P-Value/Distractor %      57%   10%   32%   1%     100%
  Item-total Correlation
  Correct answer

35 Example #3 - What could be wrong with this item?
  Stat                      A     B     C     D      Total
  # Examinees
  P-Value/Distractor %      78%   9%    13%   0%     100%
  Item-total Correlation
  Correct answer

36 Example #3 Bad Item Format
  Original
    Dalton Enterprises, Inc sold investment assets this year. Will Form 4797 be required?
    A No, because the assets were capital assets and reported on Schedule D
    B No, because the assets qualified as "involuntary conversion" instead of a sale
    C Yes, because all asset sales must be recorded on this form
    D Yes, because the sale was a tax free sale
  Revised
    Dalton Enterprises, Inc sold investment assets this year. Will Form 4797 or Schedule D be required?
    A Because the assets were capital assets, Schedule D will be filed.
    B If the assets were sold at a loss and used in the business, Schedule D will be filed.
    C If the assets were sold at a gain, Form 4797 will be filed.
    D Since investment assets are tax exempt, Form 4797 is filed.

37 Example #3 - Revised
  Stat                      A     B     C     D      Total
  # Examinees               86    5     4     9      104
  P-Value/Distractor %      83%   5%    4%    9%     100%
  Item-total Correlation
  Correct answer

38 Additional Resources
  National Council on Measurement in Education free articles:
    Understanding Reliability
    Standard error of measurement: 6E9DDC581EE47F88/showMeta/0/
  Practical Assessment, Research & Evaluation:
    Writing Multiple Choice Items
    Basic Item Analysis

39 Contact Information You can find me on LinkedIn Slide 39