AP Statistics: Chapter 4.1, Pages Summarized by Lauren Chambers, Logan Wagner, Jack Thompson

2 Inference for Sampling (pg 220) Inference - the process of drawing conclusions about a population on the basis of sample data. - A sample only estimates the truth about a population. It is unlikely that the results from a random sample are exactly the same as they would be for the entire population. - The reason we rely on random sampling is to eliminate bias in selecting samples from the list of available individuals.

3 Inference and Random Sampling (pg 220) Inference from convenience samples and voluntary response samples would be misleading because these samples are biased. Instead, rely on random sampling to draw inferences. Reasons to rely on random sampling for inference: 1. Bias is eliminated when selecting individuals. 2. The laws of probability allow trustworthy inference about the population. a. The results of random sampling don't change haphazardly from sample to sample; they vary in a predictable pattern.

4 Inference and Random Sampling Cont. (pg ) The results of random samples come with a margin of error. The margin of error tells us the largest difference we would reasonably expect between the estimate from the sample and the actual value for the population. In other words, it tells us how much sampling variability to expect. Important: - Larger random samples give better information about the population than smaller samples, because their results vary less from sample to sample (they have a smaller margin of error).
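A quick simulation can make the point about sample size concrete. The Python sketch below assumes a hypothetical population in which 60% of individuals would answer "yes" (the value 0.6 and the function names are illustrative assumptions, not from the textbook). It draws many random samples of size 100 and of size 1000 and measures how much the sample proportions spread out.

```python
import random
import statistics

# Hypothetical population proportion, assumed only for illustration.
TRUE_P = 0.6

def sample_proportion(n: int, rng: random.Random) -> float:
    """Draw a random sample of size n and return the proportion who answer 'yes'."""
    yes = sum(1 for _ in range(n) if rng.random() < TRUE_P)
    return yes / n

def spread_of_estimates(n: int, reps: int = 1000, seed: int = 1) -> float:
    """Standard deviation of the sample proportion across many repeated samples of size n."""
    rng = random.Random(seed)
    estimates = [sample_proportion(n, rng) for _ in range(reps)]
    return statistics.stdev(estimates)

for n in (100, 1000):
    print(f"n = {n:5d}: sample proportions vary with SD of about {spread_of_estimates(n):.3f}")
```

With these settings the spread should come out near 0.05 for n = 100 and near 0.015 for n = 1000, so the larger samples land much closer to the true 60% on average.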

5 Sampling Errors (pg 221) Sampling errors are mistakes made in the process of taking a sample that could lead to inaccurate information about the population. They can occur through: 1. Bad sampling methods (voluntary response or convenience samples; these can easily be avoided) 2. Undercoverage a. Occurs when some groups in the population are left out of the process of choosing a sample. A sampling frame is the list of individuals from which the sample is drawn, but a frame that lists every individual in the population is hard to come by.

6 Question 27: Page 229 Sampling Frame Ideally, the sampling frame in a sample survey should list every individual in the population, but in practice, this is often difficult. Suppose that a sample of households in a community is selected at random from the telephone directory. Explain how this sampling method results in undercoverage that could lead to bias.

7 Question 27 Answer This sampling method could lead to bias because individuals who can't afford a phone, who choose not to have a phone, or who do not want their number published will not be included in this survey. This is an example of undercoverage. It may lead to bias if the variable being measured is related to whether or not people are listed in the telephone directory.
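The following Python sketch illustrates how a telephone-directory frame can bias an estimate. Everything in it is a hypothetical assumption for illustration: a population of 10,000 households in which 80% are listed, and listed households tend to have a higher value of the measured variable (mean 50) than unlisted ones (mean 30). Repeated simple random samples drawn only from the directory miss the population mean in a consistent direction, while samples drawn from the full population do not.

```python
import random
import statistics

def build_population(seed: int = 7) -> list[tuple[bool, float]]:
    """Hypothetical population of 10,000 households as (is_listed, value) pairs.

    Assumption for illustration only: 80% are listed in the directory, and
    listed households have a higher average value than unlisted ones.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(10_000):
        listed = rng.random() < 0.8
        mean = 50 if listed else 30
        population.append((listed, rng.gauss(mean, 10)))
    return population

def average_of_srs(frame: list[tuple[bool, float]], n: int, rng: random.Random) -> float:
    """Mean of the variable in a simple random sample of size n drawn from the frame."""
    return statistics.mean(value for _, value in rng.sample(frame, n))

population = build_population()
frame = [h for h in population if h[0]]  # directory frame: unlisted households are left out
rng = random.Random(42)

true_mean = statistics.mean(v for _, v in population)
frame_estimates = [average_of_srs(frame, 200, rng) for _ in range(500)]
full_estimates = [average_of_srs(population, 200, rng) for _ in range(500)]

print(f"population mean:                  {true_mean:.1f}")
print(f"average estimate, full population: {statistics.mean(full_estimates):.1f}")
print(f"average estimate, directory only:  {statistics.mean(frame_estimates):.1f}")
```

Drawing more or larger samples from the directory does not fix the problem: undercoverage is a flaw in the frame, not a lack of data.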

8 Nonsampling Errors (pg 222) Nonsampling errors are produced by factors other than those associated with sample selection. One of the biggest sources of bias in sample surveys is nonresponse. - Nonresponse occurs when an individual chosen for the sample cannot be contacted or refuses to participate, and nonresponse rates for surveys often exceed 50%. Caution! - Nonresponse can occur only after a sample has been selected. In a voluntary response sample, everyone has chosen to participate, so there is no nonresponse. Do NOT misuse the term "voluntary response" to explain why individuals don't respond in a survey.

9 Example: The ACS, GSS, and Opinion Polls The Census Bureau's American Community Survey has a very low refusal rate (1%) and a very low overall nonresponse rate (2.5%) because participation is mandatory for households selected for the survey. The University of Chicago's General Social Survey is a very important social science survey that has a 30% nonresponse rate. The Pew Research Center for the People and the Press used a careful random digit dialing survey that reached 76% of households, yet was only able to complete 38%, so it had a nonresponse rate of 73%.

10 Question 31: Page 229 Nonresponse A survey of drivers began by randomly sampling all listed residential telephone numbers in the United States. Of 45,956 calls to these numbers, 5029 were completed. The goal of the survey was to estimate how far people drive, on average, per day. A) What was the rate of nonresponse for this sample? B) Explain how nonresponse can lead to bias in this survey. Be sure to give the direction of the bias.

11 Question 31 Answer A) The rate of nonresponse is (45,956 - 5,029)/45,956 = 40,927/45,956 ≈ 0.8906, so the survey had a nonresponse rate of 89.06%. B) People who are rarely away from their homes are more likely to answer their house phone. Because they are at home more often, they probably drive less per day than people who are often away. As a result, this sample will likely yield a lower average driving distance per day than the overall population actually drives, so the bias is toward underestimating.
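The arithmetic in part A can be written as a small Python helper. The function name nonresponse_rate is a hypothetical choice for illustration; it simply computes (calls attempted - calls completed) / calls attempted.

```python
def nonresponse_rate(attempted: int, completed: int) -> float:
    """Fraction of the chosen sample that did not respond: (attempted - completed) / attempted."""
    if attempted <= 0 or not 0 <= completed <= attempted:
        raise ValueError("need attempted > 0 and 0 <= completed <= attempted")
    return (attempted - completed) / attempted

# Numbers from Question 31: 45,956 calls, 5029 completed.
rate = nonresponse_rate(45_956, 5_029)
print(f"nonresponse rate: {rate:.2%}")  # prints "nonresponse rate: 89.06%"
```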

12 Nonsampling Errors Cont. (pg 223) Response bias is another source of nonsampling error. It occurs when there is a systematic pattern of incorrect responses. - Individuals may purposely respond incorrectly to make themselves look better. - Individuals may respond incorrectly due to faulty memory. - Individuals' responses to certain questions may be influenced by the race or gender of the interviewer. Good interviewing technique can reduce response bias.

13 Nonsampling Errors Cont. (pg 224) The most important influence on the answers given to a sample survey is the wording of the questions. Confusing or leading questions can introduce strong bias, and changes in wording can greatly change a survey's outcome. The order in which the questions are asked also matters. Caution! Don't trust the results of a sample survey until you have read the exact questions asked.

14 Example: How Do Americans Feel about Illegal Immigrants? The survey questions "Should illegal immigrants be prosecuted and deported for being in the U.S. illegally, or shouldn't they?" and "Should illegal immigrants who have worked in the United States for two years be given a chance to keep their jobs and eventually apply for legal status?" give very different impressions of attitudes toward illegal immigrants and produce very different survey responses.

15 Question 33 A survey of drivers began by randomly sampling all listed residential telephone numbers in the United States. Of 45,956 calls, 5029 were completed. The goal of the survey was to estimate how far people drive on average per day. Using this sample, the investigators then chose an SRS of 880 of these drivers to answer questions on driving habits. One question was: "Recalling the last ten traffic lights you drove through, how many of them were red when you entered the intersection?" Of the 880 respondents, 171 admitted that at least one was red. A practical problem with this question is that people may not respond truthfully. What is the likely direction of the bias: do you think more or fewer than 171 of the 880 respondents really ran a red light, and why?

16 Question 33 Answer More than 171 of the 880 respondents probably really ran a red light, because admitting to running a red light makes people look bad. Some respondents likely lied to make themselves look better, so the reported count of 171 is probably too low.
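A simple model shows why the true count should be higher than 171. As a hypothetical assumption, suppose each respondent who really ran a red light denies it with some probability p_deny, and nobody falsely admits to it. Then the expected reported count is the true count times (1 - p_deny), so the true count is roughly 171 / (1 - p_deny), which exceeds 171 for any p_deny above 0. The p_deny values below are illustrative only, not estimates from the survey.

```python
REPORTED = 171  # respondents who admitted at least one red light (from Question 33)
N = 880         # size of the SRS (from Question 33)

def implied_true_count(reported: int, p_deny: float) -> float:
    """Rough true count if each actual red-light runner denies it with probability p_deny.

    Hypothetical model: only runners misreport, and they do so by denying;
    expected reported = true * (1 - p_deny), so true is about reported / (1 - p_deny).
    """
    if not 0 <= p_deny < 1:
        raise ValueError("p_deny must be in [0, 1)")
    return reported / (1 - p_deny)

for p_deny in (0.0, 0.1, 0.25):  # illustrative values only
    estimate = implied_true_count(REPORTED, p_deny)
    print(f"p_deny = {p_deny:.2f}: about {estimate:.0f} of {N} ({estimate / N:.1%})")
```

Any amount of denial pushes the implied count above 171, which is the direction of bias the answer describes.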

17 Question 35 Comment on each of the following potential survey questions. Is the question clear? Is it slanted toward a desired response?

18 Question 35 Answers a) The first sentence is very frightening, and it offers no evidence to back up its claims. A small percentage of most populations, not just cell phone users, develop brain cancer. The wording will likely increase the number of respondents who favor a warning on cell phones. b) This question states only the positive consequences of having a national system of health insurance, which will likely increase the number of respondents who favor one. c) The question is slanted toward favoring economic incentives, which will likely increase the number of responses in favor of them.

19 Page 229: Question #29 Baseball Tickets Suppose you want to know the average amount of money spent by the fans attending opening day for the Cleveland Indians baseball season. You get permission from the team's management to conduct a survey at the stadium, but they will not allow you to bother the fans in the club seating or box seats (the most expensive seating). Using a computer, you randomly select 500 seats from the rest of the stadium and ask the fans in those seats how much they spend that day. A) Provide a reason why this survey might yield a biased result. B) Explain whether the reason you provided in (A) is a sampling error or a nonsampling error.

20 Question 29 Answer A) This survey may yield a biased result because the sampling frame does not include the fans sitting in the most expensive seats. Those who can afford the most expensive seating are much more likely to spend more money throughout the day, so the survey will likely underestimate the average amount spent. B) This is a sampling error because it is a mistake made in the process of taking a sample: the sampling frame does not accurately represent the entire population, which is an example of undercoverage.

21 Check Your Understanding 1. Each of the following is a source of error in a sample survey. Label each as sampling error or nonsampling error, and explain your answers. A) The telephone directory is used as a sampling frame. B) The person cannot be contacted in five calls. C) Interviewers choose people walking by on the sidewalk. 2. A survey paid for by makers of disposable diapers found that 84% of the sample opposed banning disposable diapers. Here is the actual question: "It is estimated that disposable diapers account for less than 2% of the trash in today's landfills. In contrast, beverage containers, third-class mail, and yard waste are estimated to account for about 21% of the trash in landfills. Given this, in your opinion, would it be fair to ban disposable diapers?" Explain how the wording of the question could result in bias. Be sure to specify the direction of the bias.

22 1. A) Sampling error - this is an example of undercoverage, because those who cannot afford a phone, who choose not to have a phone, or who are not listed will not be included in the sample. B) Nonsampling error - this is an example of nonresponse. C) Sampling error - this is an example of a convenience sample, a bad sampling method that can result in bias. 2. The wording of this question could result in bias because: The question presents statistics that could easily mislead someone. It combines three items (beverage containers, third-class mail, and yard waste) to reach a total of 21%, while comparing them against disposable diapers alone. It also pressures the respondent toward the answer it implies is obvious by asking, "Given this, in your opinion, would it be fair to ban disposable diapers?" The bias is toward opposing the ban, so the 84% figure likely overstates opposition.