(a) What is the sample size for this poll? The sample size for this poll is 4316 + 8986 + 1182 = 14,484 people.


1. A researcher for Consumer Reports is interested in comparing the effectiveness of two dandruff shampoos. One possibility is to have a simple randomized design that assigns shampoo A to ten subjects and shampoo B to another ten subjects. An alternative is a matched pairs design that assigns shampoo A to ten subjects in the first week, and then assigns shampoo B to the same ten subjects in the second week.

(a) Compared to the simple randomized design, what is an advantage and a disadvantage of this matched pairs design?

The major advantage of the matched pairs design over the simple randomized design is that it eliminates many confounding variables, since both treatment groups consist of the same subjects. The major disadvantage of this matched pairs design is that the order of the shampoos is a confounding variable that can influence the subject's response, and this order is not balanced in this design (i.e., shampoo B is always administered second).

(b) How could the disadvantage be rectified?

The disadvantage could be rectified by randomizing the order of the shampoos for each subject by a coin toss, which will help to balance the number of subjects that get shampoo A first vs. shampoo B first. Alternatively, you could administer shampoo A to one side of the scalp and shampoo B to the other side, and randomize which side of the scalp receives which shampoo.

2. Moderate use of alcohol is associated with better health. Some studies suggest that drinking wine rather than beer or spirits confers added health benefits.

(a) Explain the difference between an observational study and an experiment to compare people who drink wine with people who drink beer.

For an observational study, we would classify each subject as either a wine drinker or a beer drinker (without directly controlling what each subject drinks). For an experiment, on the other hand, we would assign each subject to drink either wine or beer. In either case, we would then observe the health of the subjects over time.

(b) Suggest some characteristics of wine drinkers that might benefit their health. In an observational study, these characteristics are confounded with drinking wine.

Wine drinkers might be wealthier (since wine is typically more expensive than beer), be better educated, and have different dietary habits. (Any reasonable suggestion is acceptable here.)

3. The Excite Poll can be found online at poll.excite.com. The question appears on the screen, and you simply click buttons to vote "Yes," "No," "Not sure," or "Don't care." On February 17, 2004, the question was "Do you think that beer advertisements are targeted towards minors?" In all, 4316 (29%) said "Yes," another 8986 (62%) said "No," and the remaining 1182 were not sure or didn't care.

(a) What is the sample size for this poll?

The sample size for this poll is 4316 + 8986 + 1182 = 14,484 people.
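The arithmetic in 3(a) can be checked with a few lines of Python. This is only a minimal sketch: the three counts come from the problem statement, while the variable names are our own.

```python
# Response counts as reported in the problem statement.
counts = {
    "Yes": 4316,
    "No": 8986,
    "Not sure / Don't care": 1182,
}

# The sample size is the total number of responses.
sample_size = sum(counts.values())

# Share of each answer, as a percentage of all responses.
shares = {answer: 100 * n / sample_size for answer, n in counts.items()}

print("Sample size:", sample_size)
for answer, pct in shares.items():
    print(f"{answer}: {pct:.1f}%")
```

Printing the shares to one decimal place makes it easy to compare them with the whole-number percentages quoted in the problem.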

(b) That's a much larger sample than standard sample surveys. In spite of this, we can't trust the result to give good information about any clearly defined population. Why?

First, the sample consists only of those people who went to excite.com and felt strongly enough to respond. Hence, we do not obtain any information about people who do not visit excite.com or did not feel strongly enough to respond. In addition, in this voluntary response sample, it is possible that an individual could respond multiple times.

4. Coin tossing can illustrate the idea of a sampling distribution. The population is all outcomes (heads or tails) we would get if we tossed a coin forever. The parameter p is the proportion of heads in this population. We suspect that p is close to 0.5. That is, we think the coin will show about one-half heads in the long run. The sample is the outcomes of 20 tosses, and the statistic p̂ is the proportion of heads in these 20 tosses (the number of heads obtained divided by 20).

(a) Toss a coin 20 times and record the value of p̂.

(Values will vary.) Our 20 tosses gave 8 heads, so p̂ = 8/20 = 0.4.

(b) Repeat this sampling process 10 times. Make a histogram of the 10 values of p̂. You are constructing the sampling distribution of p̂. Is the center of this distribution close to 0.5?

The 10 values we obtained in our trials were {0.4, 0.35, 0.6, 0.5, 0.6, 0.45, 0.4, , 0.35, 0.55} (but results will vary).

[Histogram of the 10 values of p̂]

The center of this distribution is around 0.45, so it is reasonably close to 0.5 (given the small number of trials we ran).

5. Here are the percents of women among students seeking various graduate and professional degrees in the academic year:

Degree                                 Percent female
Master's in business administration
Master's in education
Other master of arts
Other master of science
Doctorate in education
Other PhD degree
Medicine (MD)
Law
Theology

(a) Explain clearly why we cannot use a pie chart to display these data.

The most obvious reason why a pie chart cannot be used here is that the percentages do not add up to 100% (which is required for a pie chart). Each percentage refers to a different group of students, so the values are not parts of a single whole.

(b) Make a bar graph of the data. (Comparisons are easier if you order the bars by height.)

[Bar graph of percent female by degree, bars ordered by height: M.Ed., PhD Ed., M.A., Other PhD, M.S., Law, MD, MBA, Theol.]

As we can see, women make up the largest percentage of students seeking graduate degrees in education (and a small percentage of students seeking a graduate degree in theology).

6. Burning fuels in power plants or motor vehicles emits carbon dioxide (CO2), which contributes to global warming. The following table displays CO2 emissions per person from countries with population at least 20 million.

Carbon dioxide emissions, metric tons per person

Country  CO2     Country  CO2     Country  CO2

Algeria         Italy           Poland
Argentina       Iran            Romania
Australia       Iraq            Russia
Bangladesh      Japan           Saudi Arabia
Brazil          Kenya           South Africa
Canada          Korea, North    Spain
China           Korea, South    Sudan
Colombia        Malaysia        Tanzania
Congo           Mexico          Thailand
Egypt           Morocco         Turkey
Ethiopia        Myanmar         Ukraine
France          Nepal           United Kingdom
Germany         Nigeria         United States
Ghana           Pakistan        Uzbekistan
India           Peru            Venezuela
Indonesia       Philippines     Vietnam

(a) Why do you think we choose to measure emissions per person rather than total CO2 emissions for each country?

Total CO2 emissions for each country would be influenced by the population of that country, so countries with larger populations would tend to have higher total CO2 emissions. To remove the effect of population size, we divide the total emissions by the population, so the average per person is (or should be) uninfluenced by the size of the population.

(b) Display the data in a graph. Describe the shape, center, and spread of the distribution. Which countries are outliers?

Here is a histogram of the emissions per person (which allows us to see how the data are distributed):

[Histogram of CO2 emissions per person; vertical axis: Count]

As we can see, the data are heavily skewed to the right, and the peak is (probably) around 1. The center is somewhere between 2 and 3 (approximately). The data are fairly spread out (certainly, we would not describe this distribution as "tight"). Officially, only the U.S. qualifies as an outlier, but we would probably also describe Canada and Australia as extremely large values.
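The solution does not say which rule makes an outlier "official." One common convention, which we assume here, is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. The sketch below illustrates that rule in Python. The function name `iqr_outliers` is our own, and the input values are hypothetical right-skewed numbers chosen for illustration. They are not the emissions figures from the table.

```python
import statistics


def iqr_outliers(values):
    """Return (low_fence, high_fence, outliers) under the 1.5 x IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return low_fence, high_fence, outliers


# Hypothetical right-skewed values, NOT the actual CO2 data.
illustrative = [0.1, 0.2, 0.3, 0.5, 0.9, 1.0, 1.2, 2.5, 3.0, 4.0, 6.0, 20.0]

low, high, flagged = iqr_outliers(illustrative)
print("Fences:", low, high)
print("Flagged as outliers:", flagged)
```

Quartile conventions differ slightly between textbooks and software, so the fences computed this way may not match a hand calculation exactly. This also shows why a large value can look extreme on a histogram and still fall inside the upper fence, as with Canada and Australia above.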