Hypothesis Testing (Chapter 21)

Jury analogy: H0 = innocent.

                         H0 true (innocent)                 H0 false (not innocent)
Test says "accept H0"    True accept (1 − α)                False accept, "Type II error" (β)
(jury: innocent)         innocent, jury says innocent       not innocent, jury says innocent
Test says "reject H0"    False reject, "Type I error" (α)   True reject, "power" (1 − β)
(jury: guilty)           innocent, jury says guilty         not innocent, jury says guilty

Hypothesis Testing

* True accept: probability 1 − α.
* False accept: Type II error (β). This value is critical because we want a high rate of true rejections, which equals 1 − β.
* False reject: Type I error (α). In experimental design, α is often set at 0.05 (expect a false rejection 5% of the time when H0 is true).
* True reject: probability 1 − β ("power"). We usually want power of 90% or 95%, which means β can be only 5% to 10%.
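The four outcomes above can be checked by simulation. The sketch below is only an illustration under assumptions that are not in the slides: a one-sided z-test of H0: μ = 0 with known σ = 1, a sample size of 25, and a hypothetical true mean of 0.5 when H0 is false. The function name `rejection_rate` is made up for this example.

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean, n=25, alpha=0.05, trials=20000, seed=1):
    """Fraction of simulated samples in which a one-sided z-test rejects H0: mu = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # reject when z > z_crit
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        z = sample_mean / (1.0 / n ** 0.5)  # sigma = 1 is assumed known
        if z > z_crit:
            rejections += 1
    return rejections / trials


# H0 true: every rejection is a false reject (Type I error).
type_1_rate = rejection_rate(true_mean=0.0)
# H0 false (hypothetical true mean 0.5): every rejection is a true reject.
power = rejection_rate(true_mean=0.5)

print(f"estimated alpha = {type_1_rate:.3f}")
print(f"estimated power = {power:.3f}, estimated beta = {1 - power:.3f}")
```

With H0 true, the rejection rate should land near the chosen α. With H0 false, the rejection rate estimates the power, and 1 minus that rate estimates β.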


Hypothesis Testing

At a pharmaceutical company, a new drug has been developed that should reduce cholesterol much more than the company's current drug on the market. Is this true?

H0: New drug has the same effect on cholesterol as current drug.
HA: New drug reduces cholesterol more than current drug.

Type I Error: We declare the new drug is more effective than the current one when it isn't.
Type II Error: We declare the new drug is not more effective when it really is.

Which error is more serious? It depends upon the situation. In rejecting H0, we are claiming the new drug is better than the standard. Here, a Type I error means we are claiming the new drug is better when it really isn't. A Type II error is serious because we have missed something that is better.

Decision rule: We have 2 samples and must decide in the face of uncertainty. Choose a test statistic, T, and a decision rule: "We reject H0 if T is too large." How large is too large? Pick a probability for Type I error, α (usually 0.05 or smaller), and use it to determine how large is too large.
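As a rough sketch of how α fixes "how large is too large": if we assume (this is not stated in the slides) that T follows a standard normal distribution when H0 is true, the cutoff is the value that T exceeds with probability α. The helper names `critical_value` and `decide` are illustrative only.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Cutoff c with P(T > c) = alpha, assuming T is standard normal under H0."""
    return NormalDist().inv_cdf(1 - alpha)


def decide(t, alpha=0.05):
    """Apply the rule 'reject H0 if T is too large' at significance level alpha."""
    return "reject H0" if t > critical_value(alpha) else "fail to reject H0"


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 if T > {critical_value(alpha):.3f}")

# The same observed statistic can lead to different decisions at different alpha.
print(decide(2.0))
print(decide(2.0, alpha=0.01))
```

A smaller α pushes the cutoff further out, so the same observed T may no longer count as "too large".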

Example: A potato chip producer and its main supplier agree that each shipment of potatoes must meet certain quality standards. If the producer determines that more than 8% of the potatoes have "blemishes", the truck will be sent away to get another load of potatoes from the supplier. Otherwise, the entire truckload will be used to make potato chips. To make the decision, a supervisor will inspect a random sample of potatoes from the shipment. The producer will then perform a significance test using the hypotheses

H0: p = 0.08
HA: p > 0.08

where p = actual proportion of potatoes with blemishes in a given truckload.

Type I error: Producer concludes p > 0.08 when the actual proportion is 0.08 or less.
Consequence:
* Producer sends a truckload of acceptable potatoes away.
* Supplier may lose revenue.
* Producer must wait for another shipment of potatoes before producing the next batch of potato chips.

Type II error: Producer does not send the truck away when more than 8% of the potatoes in the shipment have blemishes.
Consequence:
* Producer uses the truckload of potatoes to make potato chips.
* More chips will be made with blemished potatoes.
* Customers may be upset and switch to another brand, which means lost revenue for the producer.
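The producer's test could be carried out along the following lines. This is a minimal sketch using the normal approximation for a one-sample proportion test. The sample figures (47 blemished potatoes out of 500 inspected) and α = 0.05 are hypothetical values chosen for illustration, since the slides give no data.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_proportion_test(successes, n, p0):
    """z statistic and upper-tail P-value for H0: p = p0 against HA: p > p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical inspection: 47 blemished potatoes in a random sample of 500.
z, p_value = one_sided_proportion_test(successes=47, n=500, p0=0.08)
alpha = 0.05  # assumed significance level
send_truck_away = p_value < alpha

print(f"z = {z:.2f}, P-value = {p_value:.3f}, send truck away: {send_truck_away}")
```

With these made-up numbers the sample proportion (9.4%) is above 8%, but not by enough to reject H0 at the 5% level, so the truckload would be used.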

Hypothesis Testing: Power

The alpha level is selected suitably low, often at 0.05.
* This means there would be only a 5% chance of falsely rejecting H0 and concluding that a difference exists when in fact there is no difference (Type I error).
** Type I error: reject H0 when it is correct.
** Type II error: fail to reject H0 when it is wrong.

One way to protect against Type I error is to reduce the α level to 0.01.
* This means there will be only a 1% chance of rejecting a true H0.

A change in α will also affect the Type II error, in the opposite direction.
** Decreasing α from 0.05 to 0.01 increases the chance of a Type II error.
** This makes it harder to reject H0.

Choosing the α level is a judgment call.
* Drug studies: may be set at 0.01 or even lower.
* Clinical/diagnostic studies: commonly set at 0.05 to 0.01.
* Lab method validation studies: 0.05 to 0.01.
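The trade-off between α and β can be made concrete with a small calculation. The sketch below assumes a one-sided z-test in which the true mean lies a fixed number of standard errors above the null value. The `shift` of 2.5 standard errors is a hypothetical effect size, not a figure from the slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_2_error(alpha, shift):
    """beta for a one-sided z-test when the truth is `shift` standard errors above H0."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # rejection cutoff set by alpha
    return STD_NORMAL.cdf(z_crit - shift)   # chance the statistic stays below the cutoff


shift = 2.5  # hypothetical effect size, in standard-error units
for alpha in (0.05, 0.01):
    beta = type_2_error(alpha, shift)
    print(f"alpha = {alpha:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering α moves the cutoff outward, so a real effect of the same size is missed more often.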

Hypothesis Testing

The test statistic, through its P-value, measures how likely we are to get sample data like ours given that the null hypothesis is true.
* If the probability of getting such a sample is extremely low, then we are inclined to reject H0 in favor of HA.
* If the probability isn't too small, then H0 might well be true and we would have to say, "fail to reject".

To be objective, we must choose the significance level before evaluating the test statistic.
* The 5% level functions as our level of reasonable doubt.
* If we used 1%, then it would be harder to reject H0; the evidence would have to be stronger.

Note: Simply because you've rejected H0 does not mean you have proved HA.
* Example: Rejecting the H0 that all swans are black by finding one white swan is not the same as proving that all swans are white.
* Rather, rejecting H0 may provide some supporting evidence for HA, but it does not provide complete evidence for anything.

The Power of a Statistical Test

The power of a test against a specific alternative is the probability that the test will reject H0 at a chosen significance level α when the specified alternative value of the parameter is true.
* We can just as easily describe the test by giving the probability of making a Type II error (β).
* The power of a test to detect a specific alternative is the probability of reaching the right conclusion when that alternative is true.

The power of a test against a specific alternative value of the parameter is a number between 0 and 1.
* A power close to 0 means the test has almost no chance of detecting that H0 is false.
* A power near 1 means the test is very likely to reject H0 in favor of HA when H0 is false.

The significance level of a test is the probability of reaching the wrong conclusion when H0 is true.

Questions to answer in order to decide how many observations are needed:
1. Significance level: how much protection do we want against a Type I error, that is, getting a significant result from our sample when H0 is actually true?
2. Practical importance: how large a difference between the hypothesized parameter value and the actual parameter value is important in practice? (The chip producer feels it's important to detect a shipment with 11% blemished potatoes, a difference of 3% from the hypothesized value of p = 0.08.)
3. Power: how confident do we want to be that our study will detect a difference of the size we think is important?
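For the chip producer, the three questions can be explored numerically. The sketch below uses the normal approximation to estimate the power of the test of H0: p = 0.08 against the alternative p = 0.11 at several sample sizes. The significance level α = 0.05 and the sample sizes tried are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_proportion_test(p_true, n, p0=0.08, alpha=0.05):
    """Approximate power of the one-sided test of H0: p = p0 when the truth is p_true."""
    se0 = sqrt(p0 * (1 - p0) / n)
    cutoff = p0 + STD_NORMAL.inv_cdf(1 - alpha) * se0  # reject H0 if p_hat > cutoff
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - STD_NORMAL.cdf((cutoff - p_true) / se_true)


# Hypothetical sample sizes; the alternative p = 0.11 comes from the example.
for n in (100, 250, 500, 1000):
    print(f"n = {n:4d}: power against p = 0.11 is {power_proportion_test(0.11, n):.3f}")
```

When `p_true` equals the null value 0.08, the same function returns α, which matches the statement that the significance level is the probability of the wrong conclusion when H0 is true.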

The Power of a Statistical Test

Example: Can a 6-month exercise program increase the total body bone mineral content (TBBMC) of young women? The researchers would like to perform a test of

H0: μ = 0
HA: μ > 0

where μ is the true mean % change in TBBMC due to the exercise program.

To decide how many subjects they should include in their study, the researchers begin by answering the 3 questions.
1. Significance level: α = 0.05 gives enough protection against declaring that the exercise program increases bone mineral content when it really doesn't (Type I error).
2. Practical importance: a mean increase in TBBMC of 1% would be considered important.
3. Power: the researchers want a probability of at least 0.9 that a test at the chosen significance level will reject H0: μ = 0 when the truth is μ = 1.

Can increase power by:
1. Increasing the size of the sample (to reduce the probability of a Type II error) when the significance level remains fixed.
2. Using a higher significance level (say, α = 0.10 instead of 0.05), which means increasing the risk of a Type I error.
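The researchers' three answers translate into a sample size once a standard deviation is specified. The slides do not give one, so the sketch below assumes a hypothetical known σ = 2 (in % change) and a one-sided z-test. The formula n = ((z_α + z_β)σ/δ)² is the standard one for that setting.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
SIGMA = 2.0  # hypothetical standard deviation of % change in TBBMC (not in the slides)
DELTA = 1.0  # mean increase considered important (from the example)


def required_n(alpha=0.05, power=0.9, sigma=SIGMA, delta=DELTA):
    """Smallest n for a one-sided z-test to reach the target power at mu = delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


def power_at(n, alpha=0.05, sigma=SIGMA, delta=DELTA):
    """Power of the one-sided z-test of H0: mu = 0 when the truth is mu = delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - delta * sqrt(n) / sigma)


n_needed = required_n()
print(f"subjects needed: {n_needed}, power there: {power_at(n_needed):.3f}")
# Both routes to more power listed above: a larger sample or a higher alpha.
print(f"n = 25, alpha = 0.05: power = {power_at(25):.3f}")
print(f"n = 50, alpha = 0.05: power = {power_at(50):.3f}")
print(f"n = 25, alpha = 0.10: power = {power_at(25, alpha=0.10):.3f}")
```

A different assumed σ changes the required sample size in proportion to σ², so the number printed here is only as good as that assumption.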


HW p. 500; 12, 20, 21 ANS