Take a Sample, Please


Connecting Algebra 2 to Advanced Placement* Mathematics: A Resource and Strategy Guide
New: 05/15/10

Objective: Students will be introduced to some of the basic concepts of sampling distributions in statistics.

Connections to Previous Learning: Students should be familiar with generating random numbers on a calculator, calculating measures of central tendency, and drawing histograms and boxplots.

Connections to AP*: AP Statistics Topics: Data Gathering and Simulation; Graphical Displays; Distributions: Measures of Center, Variability, and Shape

Materials: Student Activity pages, graphing calculators

Teacher Notes:

This lesson provides an experimental approach to an introduction of the Central Limit Theorem, an important concept in AP Statistics. Simply stated, this theorem deals with the distribution of sample data and graphical displays of that data, and it is used in inferential statistics to predict information about the entire population. The Central Limit Theorem and the resulting sampling distribution connect the real world to the imaginary model of the simulation and enable us to generalize about the entire population. The concepts are presented in simplified language, without the accompanying formulas or standard measures and symbols. The goal of the lesson is to have students experience concretely some of the underlying principles of inferential statistics.

The lesson should be completed as a whole-class activity, and each student needs a graphing calculator. As students generate random samples of the given data for questions 2 and 3, the teacher should monitor the students' progress in order to determine when a sufficient number of samples has been collected by the class as a whole. A specific number of samples is not required, nor must each student in the class produce the same number of samples.

The teacher should determine ahead of time an approximate number of samples desired (20, 30, 50, and 100 are all appropriate numbers, depending on the class size, the time available for the activity, etc.). Direct students to begin recording their data on a classroom display calculator or in a large table displayed in the room as the total number of samples begins to approach the goal. This process allows for individual differences in proficiency with the calculator. Be sure to emphasize that students need not feel pressured to generate a specific number of samples. Accuracy of calculation is much more important, and teamwork will provide the class with a reasonably large set of sample data.

*Advanced Placement and AP are registered trademarks of the College Entrance Examination Board. The College Board was not involved in the production of this product.

Copyright 2009 Laying the Foundation, Inc. Dallas, TX. All rights reserved.

When analyzing the histograms constructed from the sample mean data in question 3, teachers should encourage students to notice that the shape of the distribution looks more like a normal distribution (the graph is centered at the mean and the shape is approximately symmetrical) as the sample size increases from 5 to 10. Another important characteristic is that the spread of the sample means should decrease as the sample size increases. With both sample sizes, the mean of the sample means should be fairly close to the population mean.

Students may ask about the purpose of collecting samples to predict a value that is already known. This question is addressed in the opening paragraph of the activity. If necessary, explain to students that we are convincing ourselves of the validity of these concepts by using a population of known and relatively small size. In later mathematics courses, specifically AP Statistics, we will be able to comfortably apply these concepts to more complex situations.

If students have not previously constructed histograms and boxplots, see the following lessons: Bar Graphs and Histograms and Box-and-Whisker Plots from the Middle Grades guide, and Comparing Boxplots from the Algebra 2 guide.
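For teachers who want to preview the expected behavior before class, the following is a minimal simulation sketch, not part of the original lesson. It assumes a small, hypothetical right-skewed population (the values are invented for illustration and are not the lesson's salary data), and the helper name sample_means is likewise an assumption. It draws many samples of size 5 and size 10 with replacement and compares the spread of the resulting sample means.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples samples with replacement and return the mean of each."""
    return [
        statistics.mean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical right-skewed population (in millions); NOT the lesson's data.
population = [6.3, 7.0, 7.5, 7.8, 8.2, 8.5, 8.7, 9.0, 9.5, 10.0,
              11.2, 12.8, 14.1, 18.7]

rng = random.Random(2009)  # fixed seed so the run is repeatable
means_5 = sample_means(population, 5, 2000, rng)
means_10 = sample_means(population, 10, 2000, rng)

population_mean = statistics.mean(population)
spread_5 = statistics.pstdev(means_5)    # spread of the size-5 sample means
spread_10 = statistics.pstdev(means_10)  # spread of the size-10 sample means

print(f"population mean: {population_mean:.2f}")
print(f"size 5:  mean of sample means {statistics.mean(means_5):.2f}, spread {spread_5:.2f}")
print(f"size 10: mean of sample means {statistics.mean(means_10):.2f}, spread {spread_10:.2f}")
```

With this many samples, the spread for size 10 should come out noticeably smaller than the spread for size 5, while both means of sample means stay near the population mean. A class collecting only 20 to 100 samples will see the same tendency, though less cleanly.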

Take a Sample, Please

Suppose you have been given the task of computing the average age (the arithmetic mean) of everyone who lives in California. Actually collecting the ages of each Californian, adding them together, and dividing by the size of the population would be virtually impossible. Even if you could collect all of the ages, the data would have changed by the time you finished: births and deaths would have occurred, and at least some people would have celebrated birthdays, making them another year older.

Statisticians face this conundrum constantly. Thankfully, they can rely on some basic concepts of inferential statistics so that research in the social sciences and marketing, opinion polling, and the evaluation of new medicines can be conducted. The following activities are designed to demonstrate some of these basic statistical concepts. The methods for applying these concepts to market research or opinion polling will be left to future mathematics courses. The goal in these activities is to experience the development of the concepts with populations of a very manageable size so that you can infer their validity when used in more daunting situations.

The following concepts will be explored:

Concept #1: For a fixed sample size, the mean of all possible sample means is equal to the mean of the population.

Concept #2: The mean of the sample means of a randomly selected subset of all possible samples of a fixed size provides a good approximation of the mean of the population.

Concept #3: The Central Limit Theorem (in simplified terms) says that, regardless of the shape of the distribution of the original population, as the sample size increases, the distribution of sample means will approach the shape of a normal distribution. When the center of the graph is located at the mean and the shape is approximately symmetrical, the shape is described as normal. Additionally, as the sample size increases, the spread of the distribution of the sample means will decrease, while the mean of the sample means remains remarkably close to the population mean.

1. Concept #1: For a fixed sample size, the mean of all possible sample means is equal to the mean of the population.

Begin with a very small population of four quiz scores: 72, 80, 88, 98.

a) What is the mean of the four quiz scores?

b) In how many ways can you select 1 score and then a 2nd score, if you are allowed to select the same score more than once and if the same two scores listed in a different order count as a different sample? In other words, how many 2-score samples of the 4 scores, with replacement, are possible? Three of the possible samples are {72,80}, {80,72}, and {72,72}.

c) List all of the samples in the table below and calculate the mean of each sample.

[Blank table for recording each sample's scores and its mean]

d) Calculate the mean of these sample means. How does this answer confirm Concept #1?
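As a check on the hand-built table in question 1, the enumeration can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the lesson; the variable names are assumptions. It treats samples as ordered pairs drawn with replacement, exactly as part b) describes.

```python
from itertools import product

scores = [72, 80, 88, 98]
population_mean = sum(scores) / len(scores)

# Ordered samples with replacement: (72, 80) and (80, 72) count separately.
samples = list(product(scores, repeat=2))
sample_means = [sum(s) / len(s) for s in samples]
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(samples))          # 16
print(population_mean)       # 84.5
print(mean_of_sample_means)  # 84.5
```

Because every score appears equally often in each position across the 16 samples, the mean of the sample means matches the population mean exactly, which is what Concept #1 states.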

2. Concept #2: The mean of the sample means of a randomly selected subset of all possible samples of a fixed size provides a good approximation of the mean of the population.

a) How many 3-score samples of the 4 quiz scores, with replacement, are possible? In other words, in how many ways can you select one score, then a 2nd score, and then a 3rd score, if you are allowed to select the same score more than once and if the same three scores listed in a different order count as a different sample?

b) Rather than listing all the 3-score samples, collect a randomly selected subset of all the possible samples. To begin, number the quiz scores.

#1: 72
#2: 80
#3: 88
#4: 98

To randomly select 3 of the 4 scores for a sample, use a calculator's random number generator. Steps for the TI-83/84 are shown. The random integer command is located in MATH, PRB, 5:randInt(

The parameters for the command are:
randInt(smallest integer allowed, largest integer allowed, number of integers to generate)

The command randInt(1, 4, 3) will generate three independent random integers from 1 to 4, which will in turn identify the quiz scores for a particular sample. For example, if the calculator returns {4, 1, 3}, the sample is 98, 72, 88 and the sample mean would be (98 + 72 + 88)/3 = 86.

If the calculator returns {4, 2, 3}, what is the sample mean?

c) Collect random 3-score samples of the quiz data. Record the scores, not the random numbers generated by the calculator, and the mean of each sample in the table. Continue collecting samples until your teacher directs you to stop.

[Blank table for recording each sample's scores and its mean]

d) Combine your data with that of the other members of your class. Calculate the mean of the combined sample means. How close is this answer to the actual mean of the four quiz scores? How does this activity confirm Concept #2? Explain how this activity could be applied to determining the average age of the population of California.
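The calculator procedure in question 2 can also be mimicked in software. The sketch below is an assumption-laden stand-in for the TI-83/84 steps, not the lesson's method: the helper names are hypothetical, Python's random.Random.randint plays the role of the calculator's random integer command, and 50 samples with an arbitrary fixed seed stand in for a class's combined data.

```python
import random

scores = [72, 80, 88, 98]  # score #1 through score #4

def sample_mean_from_indices(indices):
    """Translate 1-based score numbers into scores and average them."""
    chosen = [scores[i - 1] for i in indices]
    return sum(chosen) / len(chosen)

def random_sample_indices(rng, size=3):
    """Mimic the calculator command: independent integers from 1 to 4."""
    return [rng.randint(1, 4) for _ in range(size)]

# The worked example from the text: {4, 1, 3} selects 98, 72, 88.
print(sample_mean_from_indices([4, 1, 3]))  # 86.0

rng = random.Random(5)  # arbitrary fixed seed so the run is repeatable
class_means = [
    sample_mean_from_indices(random_sample_indices(rng)) for _ in range(50)
]
mean_of_means = sum(class_means) / len(class_means)
print(round(mean_of_means, 2))  # compare with the population mean of 84.5
```

Changing the seed or the number of samples gives a different mean of sample means each time, which mirrors the way different classes will land on slightly different values that are all reasonably close to 84.5.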

3. Concept #3: The Central Limit Theorem (in simplified terms) says that, regardless of the shape of the distribution of the original population, the distribution of sample means will approach the shape of a normal distribution as the sample size increases. Additionally, as the sample size increases, the spread of the distribution of the sample means will decrease, while the mean of the sample means remains remarkably close to the population mean.

To help visualize Concept #3, work with a population that is larger than the four quiz scores. The table below lists the salary for the highest paid player on each National Football League team as reported by the team for the 2008 season.

a) For ease in working with the large numbers, code the data in millions to the nearest tenth of a million. For instance, Dallas' Terrell Owens would have a coded salary of $8.7 million.

Team              Player                       2008 Salary   Coded Data   Salary #
Arizona           Larry Fitzgerald (WR)        6,999,574                  1
Atlanta           John Abraham (DE)            8,506,720                  2
Baltimore         Chris McAlister (CB)         10,907,082                 3
Buffalo           Aaron Schobel (DE)           8,729,795                  4
Carolina          Julius Peppers (DE)          14,137,500                 5
Chicago           Charles Tillman (CB)         8,216,666                  6
Cincinnati        Carson Palmer (QB)           13,980,000                 7
Cleveland         Joe Thomas (OL)              9,460,000                  8
Dallas            Terrell Owens (WR)           8,666,668                  9
Denver            Champ Bailey (CB)            12,690,                    10
Detroit           Roy Williams (WR)            6,292,                     11
Green Bay         Brett Favre (QB)             12,800,                    12
Houston           Andre Johnson (WR)           8,704,                     13
Indianapolis      Peyton Manning (QB)          18,700,                    14
Kansas City       Patrick Surtain (CB)         8,380,                     15
Miami             Jason Taylor (DE)            10,025,                    16
Minnesota         Bernard Berrian (WR)         9,538,                     17
New England       Tom Brady (QB)               14,626,                    18
New Orleans       Drew Brees (QB)              9,000,                     19
New York Giants   Eli Manning (QB)             12,916,                    20
New York Jets     Dewayne Robertson (DT)       11,191,                    21
San Diego         LaDainian Tomlinson (RB)     7,822,                     22
San Francisco     Alex Smith (QB)              9,916,                     23
Seattle           Matt Hasselbeck (QB)         9,950,                     24
St Louis          Torry Holt (WR)              9,204,                     25
Tampa Bay         Jeff Faine (OL)              7,000,                     26
Tennessee         Keith Bulluck (LB)           7,864,                     27
Washington        Shawn Springs (CB)           7,483,                     28
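The coding step in part a) amounts to dividing by one million and rounding to one decimal place. A minimal sketch, assuming a hypothetical helper named code_salary and using three of the salaries that are listed in full:

```python
def code_salary(dollars):
    """Express a salary in millions, rounded to the nearest tenth."""
    return round(dollars / 1_000_000, 1)

# Three salaries that appear in full in the table.
salaries = {
    "Larry Fitzgerald": 6_999_574,
    "John Abraham": 8_506_720,
    "Terrell Owens": 8_666_668,
}

for player, dollars in salaries.items():
    print(f"{player}: {code_salary(dollars)}")
# Larry Fitzgerald: 7.0
# John Abraham: 8.5
# Terrell Owens: 8.7
```

The Terrell Owens result agrees with the $8.7 million example given in part a).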

b) Create a histogram of the data, with bin width of 1 million, on the grid marked Population Data on the Results Page at the end of the activity.

c) Describe the shape and any unusual features of the histogram.

d) Calculate the mean of the data, record it below and in the blank on the Results Page, and mark its location with a vertical dotted line on the histogram.

e) Calculate and list the 5-number summary for this data, and then construct a box-and-whisker plot above the histogram on the graph at the end of the activity. Explain how the box-and-whisker plot adds to your understanding of the distribution of the population.

f) Begin to collect samples of size 5, using the calculator's random number generator to generate five random integers between 1 and 28 to indicate which salaries are included in the sample. Calculate the mean of each sample. Continue the process until instructed to record your data on the class calculator.

[Blank table for recording each sample's salaries and its mean]

g) Combine your data with that of the other members of your class. Create a histogram of the combined data, with bin width of 1 million, on the grid marked Samples of Size 5 on the Results Page at the end of the activity. It may be necessary to adjust the scale on the vertical axis, depending on the number of combined data items.

h) Calculate the mean of this data, record it below and in the blank on the Results Page, and mark its location with a vertical dotted line on the histogram.
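For part e), a 5-number summary can be computed with a short helper. The sketch below is illustrative only: the function name is an assumption, and it uses one common textbook rule for quartiles (the medians of the lower and upper halves, leaving out the middle value when the count is odd), which may or may not match a particular calculator. The data are the coded values of the nine salaries that appear in full in the salary table, so the output is not the answer for the whole population.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + (n % 2):]  # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Coded salaries (millions) for the nine teams listed with full figures;
# an illustrative subset, not the full population of 28.
coded = [7.0, 8.5, 10.9, 8.7, 14.1, 8.2, 14.0, 9.5, 8.7]

minimum, q1, med, q3, maximum = five_number_summary(coded)
print(minimum, round(q1, 2), med, round(q3, 2), maximum)
```

Comparing the distance from Q1 to the median with the distance from the median to Q3 gives a quick numerical hint about skew before the boxplot is drawn.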

i) How does the shape of this histogram compare with the shape of the histogram of the population? Compare the mean of the sample means with the mean of the population.

j) Begin to collect samples of size 10, using the calculator's random number generator to generate ten random integers between 1 and 28 to indicate which salaries are included in the sample. Calculate the mean of each sample. Continue the process until instructed to record your data on the class calculator.

[Blank table for recording each sample's salaries and its mean]

k) Create a histogram of the combined data, with bin width of 1 million, on the grid marked Samples of Size 10 on the Results Page at the end of the activity. It may be necessary to adjust the scale on the vertical axis, depending on the number of combined data items.

l) Calculate the mean of this data, record it below and in the blank on the Results Page, and mark its location with a vertical dotted line on the histogram.

m) How does the shape of this histogram compare with the shape of the histogram of the population? Compare the mean of the sample means with the mean of the population. Describe the changes that you observe in the histograms as the sample size increases.

4. Consider the histograms shown. The data includes 50 samples of each size. Use the graphs to answer the questions.

[Histograms: Samples of Size 5, Samples of Size 7, Samples of Size 10, Samples of Size 12]

a) Describe the effects of the increase of sample size on the spread or range of the data.

b) As the sample size increases, does the shape appear more symmetrical and is the mean located closer to the center of the graph? In other words, does the shape appear more normal? Explain.

c) Describe the effect of the increase in sample size on the mean.

d) The histogram in question 3b and the boxplot in question 3e show Peyton Manning's salary as an outlier. As the sample sizes increase, why is this outlier not a factor in the shape and spread of the graph?

e) How does this activity confirm Concept #3? How does this activity apply to determining the average age of the population of California?

Results Page

[Blank grids with a space to record each mean: Population Data, Samples of Size 5, Samples of Size 10]

Answers

1. a) 84.5

b) 4 x 4 = 16

c)
Scores   Sample Mean      Scores   Sample Mean
72, 72   72               88, 72   80
72, 80   76               88, 80   84
72, 88   80               88, 88   88
72, 98   85               88, 98   93
80, 72   76               98, 72   85
80, 80   80               98, 80   89
80, 88   84               98, 88   93
80, 98   89               98, 98   98

d) (sum of the 16 sample means)/16 = 1352/16 = 84.5
The mean of the sample means for samples of size 2 is 84.5, which is the same as the mean of all four scores.

2. a) 4 x 4 x 4 = 64

b) (98 + 80 + 88)/3 = 266/3, or approximately 88.67

c) Students do not all need to generate the same number of samples. Each student will record results in his or her own table. Sample results are shown.

72,80, ,80, ,80, ,88, ,72, ,72, ,98, ,72, ,98, ,80, ,88, ,98, ,88,88 88

d) Students should have combined their data with the class and determined the mean of the sample means. The mean of the sample means should be relatively close to the actual mean of 84.5. This activity confirms Concept #2 because the mean of the sample means of the 3-score samples was (student value of means), which is very close to the mean of the population, 84.5. To determine the average age of the population of California, one could collect random samples of age data and use the mean of the sample means to estimate the average age of the population.

3. a) [Table: the salary table from question 3a with the Coded Data column completed]

b) [Histogram of the population data]

c) Skewed right, with gaps between 13 and 14 million and between 15 and 18 million. There appears to be a possible outlier between 18 and 19 million.

d) The mean of the salaries is approximately

e) x̄ = ; min = 6.3; Q1 = 8.3; med = 9.35; Q3 = ; max = 18.7

The boxplot shows that the mean is to the right of the median, confirming that the data is skewed right. Peyton Manning's high salary is shown as an outlier, and the upper 50% of the data has a much greater spread than the lower 50%. The boxplot also illustrates that the 2nd 25% of the data, between Q1 and the median, is concentrated in a range of only about $1 million.

f) Students will collect 5-salary samples. Sample results are shown.

Salaries: 14.6, 9.0, 9.9, 7.8, , 12.8, 9.5, 12.9, , 10.0, 18.7, 12.7, , 9.2, 8.5, 12.8, , 6.3, 7.0, 12.9, , 14.1, 14.1, 8.5, , 18.7, 12.8, 7.0, , 7.9, 14.6, 10.0,

g) Students will combine class results and draw a histogram. A sample result for the 5-salary histogram from combined data is shown.

[Histogram: Samples of Size 5]

h) Students should list the mean of the class data. For the sample data, x̄ =

i) Students will compare the Samples of Size 5 histogram to the Population Data histogram. For the sample data, this histogram is less skewed right than the histogram of the population. The mean of the sample means is relatively close to the population mean.

j) Students will collect 10-salary samples. Sample results are shown.

Salaries: 14.0, 14.0, 9.5, 7.9, 7.5, 14.6, 9.0, 9.9, 7.8, , 7.9, 8.7, 11.2, 12.8, 9.5, 12.9, 8.7, , 8.4, 8.7, 12.9, 10.0, 10.0, 14.0, 18.7, 12.7, ,14.1, 14.1, 9.2,12.7, 10.9, 9.2, 8.5, 12.8, , 7.0,10.0, 14.1, 14.1, 7.9, 6.3, 7.0, 12.9,

k) Students will combine class results and draw a histogram. Sample results for the 10-salary histogram from combined data are shown.

[Histogram: Samples of Size 10]

l) Students should list the mean of the class data. For the sample data, x̄ =

m) Students will compare the Samples of Size 10 histogram to the Population Data histogram. For the sample data provided, the 10-salary histogram is not skewed right, and its mean is reasonably close to the mean of the population.

n) As the sample size increases, the distribution of the sample means becomes more symmetrical and less skewed. Also, the range of the sample means decreases. In both cases, averaging several salaries limits the influence of the extremely large or small salaries. For larger sample sizes, an individual sample mean tends to be closer to the mean of the population, so the larger the sample size used when collecting age data, the closer the sample mean is likely to be to the mean age of the population of California.

4. a) As the sample size increases, the spread of the data decreases significantly from the original histogram.

b) As the sample size increases, the shape of the graph becomes more symmetrical and the mean appears to be closer to the center of the graph.

c) The mean does not change significantly as the sample size increases.

d) Since we are calculating each mean with a larger number of salaries, the effect of the outlier on any one sample mean is not significant.

e) This activity demonstrates that as the sample size increases, the shape of the graph becomes more symmetrical. The mean of the sample means stays close to the population mean, and the spread of the graph decreases with an increase in sample size. To determine the mean age of the population of California, I could use what I have learned to collect random samples of ages and analyze the sample means.