Statistics 511 Additional Materials

Confidence intervals for the difference of means of two independent populations, $\mu_1 - \mu_2$

Previously, we focused on a single population and parameters calculated from that population. Often we want to compare two populations. In this section, we compare the means of two populations; specifically, we consider the difference between the means of two populations.

The comparison we will make here is between independent populations. Independent populations are distinct and unrelated to each other. For example, we might be interested in the iron levels in the blood of two different species of elephants. We take a sample from each population and compare the sample means.

Another common situation is for the two populations to be similar but for each population to receive a different treatment. The comparison is then made on a measurement related to the treatments given. For example, one fourth-grade class at Welch Elementary might be shown a DVD about volcanoes, while a second fourth-grade class at Welch Elementary reads an article about volcanoes. Both groups would then take the same test about volcanoes, and we could compare the two groups to see whether there is a difference between the means of the two populations.

We are often interested in whether or not there is a difference between the means of the two populations. Remember that, because of sampling variability, a difference in the sample means does not necessarily imply that the population means are different. To account for this variability, we use a confidence interval. As described below, we can create a confidence interval for the difference of the means of the two populations. If the two populations have the same mean, then the difference of the means is 0 (zero). For example, call the mean of the first population $\mu_1$ and the mean of the second population $\mu_2$. If $\mu_1 = \mu_2$, then $\mu_1 - \mu_2 = 0$.
Consequently, when we consider the confidence intervals, we are interested in whether or not 0 (zero) is inside the confidence interval. If zero is inside the confidence interval, then zero is a plausible value for $\mu_1 - \mu_2$, and we would conclude that there may be no significant difference between the means of the two populations.

Confidence interval for the difference of two independent population means, $\mu_1 - \mu_2$ (Small Samples)

With two independent populations, we have two different samples from two different populations, so we need notation to distinguish them. From the first population, we will have a sample of size $n_1$. The sample mean of those $n_1$ observations will be $\bar X_1$ and the sample standard deviation will be $s_1$. For the first population, we will refer to the population mean as $\mu_1$ and the population standard deviation as $\sigma_1$. From the second population, we will have a sample of size $n_2$. The sample mean of those $n_2$ observations will be $\bar X_2$ and the sample standard deviation will be $s_2$. For the second population, we will refer to the population mean as $\mu_2$ and the population standard deviation as $\sigma_2$.

The following $(1-\alpha)\times 100\%$ CI for the difference of independent means can be used when

1. $n_1 < 30$ and $n_2 < 30$,
2. $\sigma_1 = \sigma_2$, and
3. each population has a Normal distribution.

$$(\bar X_1 - \bar X_2) \pm t_{(n_1+n_2-2,\ \alpha/2)}\, s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}, \qquad \text{where} \quad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$$

$s_p$, called the pooled standard deviation, is a type of average of the standard deviations from the two samples: $s_p^2$ is the average of $s_1^2$ and $s_2^2$ weighted by their degrees of freedom, $n_1 - 1$ and $n_2 - 1$. It is necessary to calculate $s_p$ before you can complete the calculation of the confidence interval. Note that the estimated standard error of $\bar X_1 - \bar X_2$ is $s_{\bar X_1 - \bar X_2} = s_p\sqrt{1/n_1 + 1/n_2}$.

Example: Suppose we want to construct a 90% CI for the difference of independent population means, and suppose that $\bar X_1 = 49.37$, $s_1 = 4.89$, $n_1 = 25$, $\bar X_2 = 52.13$, $s_2 = 5.38$, $n_2 = 16$. First,

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{(25-1)(4.89)^2 + (16-1)(5.38)^2}{25+16-2}} = \sqrt{\frac{1008.06}{39}} \approx 5.084$$

Then, with $\alpha = 0.10$ and $n_1 + n_2 - 2 = 39$ degrees of freedom,

$$(\bar X_1 - \bar X_2) \pm t_{(39,\ 0.05)}\, s_p\sqrt{\frac{1}{25}+\frac{1}{16}} = (49.37 - 52.13) \pm 1.6849 \times 5.084 \times 0.3202 = -2.76 \pm 2.74 = (-5.50,\ -0.02)$$

So we are 90% confident that $\mu_1 - \mu_2$ is between $-5.50$ and $-0.02$. Because zero lies (just) outside this interval, the data suggest that the two population means differ.

Confidence interval for the difference of independent population means, $\mu_1 - \mu_2$ (Large Samples)

When we have two large samples (each sample has at least 30 observations), we can use the following formula:

$$(\bar X_1 - \bar X_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Example: Consumer Digest is investigating the lifetimes of calculator batteries. They find that the mean lifetime of 45 Everuse batteries is ___ hours and the sample standard deviation is ___ hours. For ComVac, the mean lifetime of 50 batteries is ___ hours and the sample standard deviation is ___ hours. Construct a 95% confidence interval for the difference in mean lifetimes of these two batteries.

We have two distinct sets of batteries. Each battery in one population is unrelated to any battery in the other population, so they are independent populations. Call the Everuse batteries population 1 and the ComVac batteries population 2. Both $n_1 = 45$ and $n_2 = 50$ are more than 30. Consequently, we can use the large-sample formula to construct the confidence interval:

$$(\bar X_1 - \bar X_2) \pm z_{0.025}\sqrt{\frac{s_1^2}{45} + \frac{s_2^2}{50}}$$

$$= 5.194 \pm 1.96\sqrt{\frac{s_1^2}{45} + \frac{s_2^2}{50}} = 5.194 \pm 5.642 = (-0.448,\ 10.836)$$

We are 95% confident that the difference in mean lifetimes between Everuse and ComVac calculator batteries, $\mu_1 - \mu_2$, is between $-0.448$ and $10.836$ hours. Note that the difference in the sample means was 5.194 hours; however, since zero is inside the confidence interval, 0 is a plausible value for the difference between the population means. This is because of the variability present from sample to sample. Because of this variability, we cannot conclude, at the 95% confidence level, that there is a significant difference between the means of these two populations.

Some notes on confidence intervals:

1. This topic is the first one that develops ideas that are genuinely statistical, and for most people it is a new way of thinking. It implies that a point estimate of a parameter is likely not the actual value of the parameter, which forces us to acknowledge the variability from sample to sample. We must recognize that there is sampling variability in any estimate, which includes almost every statistic reported in the media and in research.

2. The reason for using a CI is the presence of variability from one sample to another. Each sample is different, so each sample gives us a different value for a statistic. The width of a confidence interval indicates how much variability there is in the sample it was derived from. Another way to think of this is that the smaller the variability in the sample, the more precise the information we have about the location of the mean.

3. Three factors influence the width of a confidence interval:
   - The sample size $n$ (or $n_1$ and $n_2$): as $n$ increases, the width of the CI decreases.
   - The confidence level $1-\alpha$: as the confidence level increases, the width of the CI increases.
   - The sample standard deviation $s$: the bigger $s$ is, the wider the CI is.

4. As mentioned in the previous note, the sample size is the number of observations in a sample. It is possible to determine the minimum sample size required to estimate a population parameter with a specified precision at a given confidence level. This is discussed during Stat 511 lectures.

5. If the assumptions for a particular confidence interval are not met, the stated confidence level is likely incorrect. In almost all cases this means that the true confidence level is lower than it should be. That is, if we make a 95% confidence interval but not all of its assumptions are met, then the true confidence level will be less (often much less) than 95%.
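The two interval formulas and note 5 can be checked numerically. The following is a minimal Python sketch (Python 3.9+, standard library only) and is not part of the original materials; the function names `pooled_sd`, `pooled_t_ci`, `large_sample_z_ci` and `pooled_coverage` are hypothetical. Because the standard library has no t distribution, the t critical values are assumed to come from a t table and are passed in by hand. The sketch reproduces the 90% small-sample example and then, for note 5, estimates by simulation how often nominal 95% pooled intervals contain the true difference, both when the assumptions hold and in one illustrative setting chosen here where $\sigma_1 \ne \sigma_2$ ($n_1 = 5, \sigma_1 = 4$ and $n_2 = 25, \sigma_2 = 1$).

```python
import math
import random
from statistics import NormalDist


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled standard deviation s_p; only meaningful if sigma_1 = sigma_2."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_ci(x1: float, s1: float, n1: int,
                x2: float, s2: float, n2: int,
                t_crit: float) -> tuple[float, float]:
    """Small-sample CI for mu_1 - mu_2 using the pooled standard deviation.

    t_crit is t_(n1+n2-2, alpha/2) taken from a t table (assumption: the
    Python standard library has no t distribution to compute it).
    """
    se = pooled_sd(n1, s1, n2, s2) * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se


def large_sample_z_ci(x1: float, s1: float, n1: int,
                      x2: float, s2: float, n2: int,
                      conf: float = 0.95) -> tuple[float, float]:
    """Large-sample CI for mu_1 - mu_2 (both samples of size 30 or more)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_(alpha/2); about 1.96 for 95%
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se


def pooled_coverage(n1: int, sigma1: float, n2: int, sigma2: float,
                    t_crit: float, reps: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated pooled-t intervals that contain mu_1 - mu_2.

    Both populations are Normal with mean 0, so the true difference is 0.
    """
    rng = random.Random(seed)

    def mean_sd(xs: list[float]) -> tuple[float, float]:
        m = sum(xs) / len(xs)
        return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    hits = 0
    for _ in range(reps):
        m1, sd1 = mean_sd([rng.gauss(0, sigma1) for _ in range(n1)])
        m2, sd2 = mean_sd([rng.gauss(0, sigma2) for _ in range(n2)])
        lo, hi = pooled_t_ci(m1, sd1, n1, m2, sd2, n2, t_crit)
        hits += lo <= 0 <= hi  # True counts as 1
    return hits / reps


# Small-sample example from the notes: 90% CI with 25 + 16 - 2 = 39 df.
lo, hi = pooled_t_ci(49.37, 4.89, 25, 52.13, 5.38, 16, t_crit=1.6849)
print(f"s_p = {pooled_sd(25, 4.89, 16, 5.38):.3f}")  # s_p = 5.084
print(f"90% CI: ({lo:.2f}, {hi:.2f})")               # 90% CI: (-5.50, -0.02)

# Note 5: nominal 95% pooled intervals, assumptions met vs. sigma_1 != sigma_2.
cov_equal = pooled_coverage(10, 1.0, 10, 1.0, t_crit=2.1009)   # t_(18, 0.025)
cov_unequal = pooled_coverage(5, 4.0, 25, 1.0, t_crit=2.0484)  # t_(28, 0.025)
print(f"coverage, equal sigmas:   {cov_equal:.3f}")
print(f"coverage, unequal sigmas: {cov_unequal:.3f}")
```

With equal standard deviations the simulated coverage should be close to 0.95; in the unequal setting, where the smaller sample comes from the more variable population, it should fall well below 0.95, in line with note 5. `large_sample_z_ci` takes the same summary statistics as the large-sample formula (for example, the battery data with $n_1 = 45$ and $n_2 = 50$).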