Math 1 Variable Manipulation Part 8 Working with Data


INTERPRETING DATA USING NUMBER LINE PLOTS

Data can be represented in various visual forms, including dot plots, histograms, and box plots. Suppose Bob scores the following points in a season of basketball games: 8, 15, 10, 10, 10, 15, 7, 8, 10, 9, 12, 11, 11, 13, 7, 8, 9, 9, 8, 10, 11, 14, 11, 10, 9, 12, 14, 14, 12, 13, 5, 13, 9, 11, 12, 13, 10, 8, 7, 8. His scores can be represented in the following three ways:

CREATING A DOT PLOT

A dot plot is easy to create. Place one dot for each data point above the appropriate value on the number line. Data points with the same value stack on top of each other. Since Bob's data has six points with the value 8, six dots are stacked over the number 8 on the number line.

CREATING A HISTOGRAM

To create a histogram:
1. Determine the minimum and maximum values of the data set.
2. Determine the length of each interval. (Note: The interval will generally be given.)
3. Count how many values fall within each interval.
4. Graph the data.
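The counting behind both plots can be sketched in Python. This is a minimal illustration, not part of the original worksheet: the variable names are invented here, the dot plot is printed sideways (one row per value instead of one stack per value), and the interval width of 2 starting at the minimum is an assumption, since the worksheet's own intervals are not stated.

```python
from collections import Counter

# Bob's points per game, copied from the worksheet.
points = [8, 15, 10, 10, 10, 15, 7, 8, 10, 9, 12, 11, 11, 13, 7, 8, 9, 9, 8, 10,
          11, 14, 11, 10, 9, 12, 14, 14, 12, 13, 5, 13, 9, 11, 12, 13, 10, 8, 7, 8]

# Dot plot: one mark per data point, grouped by value.
dots = Counter(points)
for value in range(min(points), max(points) + 1):
    print(f"{value:>2} | {'*' * dots[value]}")

# Histogram: count how many values fall within each interval.
# ASSUMPTION: intervals of width 2 starting at the minimum (5-6, 7-8, ...).
width = 2
low = min(points)
histogram = {}
for p in points:
    start = low + ((p - low) // width) * width
    label = f"{start}-{start + width - 1}"
    histogram[label] = histogram.get(label, 0) + 1

# Print the intervals in increasing order of their starting value.
for label in sorted(histogram, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>5}: {histogram[label]}")
```

Changing `width` shows how the choice of interval affects the shape of the histogram, while the dot plot always shows every individual value.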

CREATING A BOX-AND-WHISKER PLOT

To create a box-and-whisker plot, you need five values: the minimum, the first (lower) quartile, the median, the third (upper) quartile, and the maximum. You also need to check for outliers. An outlier is a value that lies more than 1.5 times the interquartile range below the first quartile or above the third quartile (the interquartile range is explained below).
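The five values and the outlier check can be sketched as follows. This is an illustration under stated assumptions: the function name `five_number_summary` is invented here, and the quartiles are computed as the medians of the lower and upper halves of the ordered data. Other textbooks and software use slightly different quartile rules, so their results may differ a little.

```python
from statistics import median

# Bob's points per game, copied from the worksheet.
points = [8, 15, 10, 10, 10, 15, 7, 8, 10, 9, 12, 11, 11, 13, 7, 8, 9, 9, 8, 10,
          11, 14, 11, 10, 9, 12, 14, 14, 12, 13, 5, 13, 9, 11, 12, 13, 10, 8, 7, 8]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    ASSUMPTION: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data; with an odd count the middle value is left out of both.
    """
    s = sorted(data)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2:]
    return s[0], median(lower), median(s), median(upper), s[-1]


minimum, q1, med, q3, maximum = five_number_summary(points)
iqr = q3 - q1

# Outlier fences: 1.5 times the IQR below Q1 and above Q3.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [p for p in points if p < low_fence or p > high_fence]

print("min, Q1, median, Q3, max:", minimum, q1, med, q3, maximum)
print("IQR:", iqr, "fences:", low_fence, high_fence, "outliers:", outliers)
```

For Bob's data every value lies between the two fences, so the whiskers would extend all the way to the minimum and maximum.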

INTERPRET AND COMPARE TWO DATA SETS

Use the shape of the data graphed in a number line plot to compare two or more data sets using center (mean, median) and/or spread (range, interquartile range, standard deviation).

Center: Common measures of center are the median and the mean.

Spread: The spread of a distribution refers to the variability of the data. If the data cluster around a single central value, the spread is smaller. The further the data points fall from the center, the greater the spread or variability of the set.

MEASURES OF SPREAD

Range: The range is the difference between the largest and smallest values in a set of data. In our example of Bob's points scored, the range is 15 - 5 = 10: the maximum value of 15 minus the minimum value of 5. Note: The range of a function is different from the range of data.

Interquartile range: The interquartile range (IQR) is a measure of variability based on dividing a data set into quartiles. Quartiles divide an ordered data set into four equal parts. The values that separate the parts are called the first, second, and third quartiles, and they are denoted by Q1, Q2 (median), and Q3. Q1 is the middle value in the lower half of the ordered data. Q2 is the median value in the set. Q3 is the middle value in the upper half of the ordered data. The interquartile range is equal to Q3 minus Q1. The IQR of Bob's points scored is Q3 - Q1 = 12 - 8.5 = 3.5.

*Another way to think about it: The interquartile range is the distance between the 75th and 25th percentiles. Essentially, it is the range of the middle 50% of the data.
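A short sketch of the center and spread measures named above, applied to Bob's data. The helper name `describe` is invented for this illustration, and the use of the population standard deviation (`pstdev`) is an assumption, because the worksheet does not say which standard deviation formula it intends.

```python
from statistics import mean, median, pstdev

# Bob's points per game, copied from the worksheet.
points = [8, 15, 10, 10, 10, 15, 7, 8, 10, 9, 12, 11, 11, 13, 7, 8, 9, 9, 8, 10,
          11, 14, 11, 10, 9, 12, 14, 14, 12, 13, 5, 13, 9, 11, 12, 13, 10, 8, 7, 8]


def describe(data):
    """Return common measures of center and spread for one data set."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        # ASSUMPTION: population standard deviation.
        "std_dev": pstdev(data),
    }


summary = describe(points)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Calling `describe` on a second data set and comparing the two results side by side is one way to compare center and spread numerically before looking at the plots.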

SHAPE

The shape of a distribution is described by symmetry, number of peaks, skewness, or uniformity.

Symmetry: A symmetric distribution can be divided at the center so that each half is a mirror image of the other.

Number of peaks: Distributions can have few or many peaks. Distributions with one clear peak are called unimodal, and distributions with two clear peaks are called bimodal. Symmetric unimodal distributions are often described as bell-shaped.

Skewness: Some distributions have many more data points on one side of a graph than the other. Distributions with a tail (low levels of frequency) on the right, toward the higher values, are said to be skewed right; distributions with a tail on the left, toward the lower values, are said to be skewed left.

Uniformity: When the data points in a set are equally spread across the range of the distribution, the distribution is called a uniform distribution. A uniform distribution has no clear peaks.

Comparing Mean and Median

If the graph is left skewed, the mean is typically less than the median. If the graph is right skewed, the mean is typically greater than the median. If the graph is symmetric, the mean and the median are about the same.
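The relationship between skewness and the mean and median can be checked on small examples. The three data sets below are hypothetical, made up for this sketch, and the helper name `compare_center` is likewise invented.

```python
from statistics import mean, median


def compare_center(data):
    """Say whether the mean is above, below, or equal to the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "mean > median"
    if m < med:
        return "mean < median"
    return "mean = median"


# HYPOTHETICAL data sets, chosen only to illustrate each shape.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail toward high values
left_skewed = [1, 7, 8, 8, 8, 9, 9, 10]    # long tail toward low values
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]    # mirror image around 3

for name, data in [("right skewed", right_skewed),
                   ("left skewed", left_skewed),
                   ("symmetric", symmetric)]:
    print(f"{name}: {compare_center(data)}")
```

The extreme value in the tail pulls the mean toward it, while the median stays with the bulk of the data.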

Unusual Features

Sometimes, statisticians refer to unusual features in a set of data. The two unusual features are gaps and outliers.

Example: Bob's points per game: 8, 15, 10, 10, 10, 15, 7, 8, 10, 9, 12, 11, 11, 13, 7, 8, 9, 9, 8, 10, 11, 14, 11, 10, 9, 12, 14, 14, 12, 13, 5, 13, 9, 11, 12, 13, 10, 8, 7, 8. Create a histogram of this data and summarize the shape and features of Bob's points.

Solution: The shape is fairly symmetric. There are no gaps or outliers, which indicates that most of the data values are close to the center of the graph.

Comparing Distributions

Common graphical displays such as dot plots and box plots can be effective tools for comparing data from two or more populations.

Sample Questions:

Use the following student test score data. Interpret, represent, and analyze the data according to the instructions.

Class A Test Scores: 51, 45, 45, 45, 33, 51, 48, 36, 48, 51, 27, 51, 36, 48, 51, 39, 51, 39, 30, 51, 39, 51, 42, 48, 33, 51, 48, 42, 45, 51, 21, 39, 51

Class B Test Scores: 48, 51, 48, 24, 48, 51, 48, 48, 51, 18, 48, 51, 48, 45, 21, 30, 36, 48, 45, 51, 36, 39, 30, 45, 33, 45, 27,

1. Find the three measures of central tendency (mean, median, and mode) and the lower and upper quartiles for the data.
2. Create a dot/line plot.
3. Create a histogram with 8 intervals beginning with the interval

4. Create a box-and-whisker plot.
5. Analyze: Which class did better overall? How can you tell?
6. Analyze: Which measurement best helps you to evaluate which class did better?
7. Compare: Which class has a higher average score?
8. Compare: Write a comparison about student performance in the 2 classes. Use any of the information from the table above that will help you compare.
9. Compare: What effects did outliers have on the data? If outliers were removed, how would it change the overall averages?

10. Table of candies in a bag
   a. Create a dot plot of the data in the table.
   b. Find the mean and the median.
   c. Identify the data as symmetric, left skewed, right skewed, or other.
11. Identify each distribution as symmetric, left skewed, right skewed, or other.

12. For each histogram/dot plot, describe the distribution of the data (shape: symmetrical, skewed left, skewed right, uniform, or other), and determine whether the mean is greater than, less than, or about the same as the median.
13. Use the graph to the right to answer the following questions.
   a. Draw a curve that follows the data distribution.
   b. This graph is an example of a ______ distribution.
   c. What can be said about the mean and median of this data set?

TWO-WAY FREQUENCY TABLES

Variables can be classified as categorical (qualitative) or numerical (quantitative).

Categorical: Categorical variables take on values that are names or labels. Examples include the color of a ball (e.g., red, green, blue), gender (male or female), and year in school (freshman, sophomore, junior, senior). These data cannot be averaged or represented by a scatter plot because they have no numerical meaning.

Numerical: Numerical or quantitative variables represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population is a quantitative variable. Other examples: scores on a set of tests, height and weight, temperature at the top of each hour.

Example: The temperature at the park over a 12-hour period was: 60, 64, 66, 71, 75, 77, 78, 80, 78, 77, 73, 65. Can you find the average temperature over the 12-hour period?

Solution: Yes, you can. This is quantitative (numerical) data.

Example: Once a week, Maria has to fill her car up with gas. She records the day of the week each time she fills her car for three months: Monday, Friday, Tuesday, Thursday, Monday, Wednesday, Monday, Tuesday, Monday, Wednesday, Tuesday, and Thursday. Can you find the average day of the week that Maria had to fill her car with gas?

Solution: No, this is categorical data. However, we can see that Monday was the day on which she most often filled her car in this data set, i.e., we can look at frequencies.

Sample Questions:

Use the following scenario for the following two questions. School grade is a categorical variable even though it is represented by a number. Age can be either categorical or numerical: ages have a specific order that is generally important quantitatively, but they can also be used categorically as a sorting value. You need to be careful when classifying numbers as categorical or quantitative data.

Here are the ages of a group of students: 12, 15, 11, 14, 12, 11, 15, 13, 12, 12, 11, 14,

14. Find the average age of the students.
15. Suppose 11-year-olds are in 6th grade, 12-year-olds are in 7th grade, 13-year-olds are in 8th grade, 14-year-olds are in 9th grade, and 15-year-olds are in 10th grade. Find the average grade for the set of data.
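The two worked examples earlier in this section (the park temperatures and Maria's fill-up days) can be sketched in code to show the practical difference between the two kinds of data: numerical values can be averaged, while categorical values can only be counted. The variable names are invented for this illustration.

```python
from collections import Counter
from statistics import mean

# Numerical data: hourly temperatures at the park (from the example).
temperatures = [60, 64, 66, 71, 75, 77, 78, 80, 78, 77, 73, 65]
average_temperature = mean(temperatures)
print("Average temperature:", average_temperature)

# Categorical data: the days Maria filled her car (from the example).
fill_days = ["Monday", "Friday", "Tuesday", "Thursday", "Monday", "Wednesday",
             "Monday", "Tuesday", "Monday", "Wednesday", "Tuesday", "Thursday"]

# An average of day names is meaningless, but frequencies are not.
day_counts = Counter(fill_days)
most_common_day, times = day_counts.most_common(1)[0]
print("Frequencies:", dict(day_counts))
print("Most frequent day:", most_common_day, "with", times, "fill-ups")
```

Passing `fill_days` to `mean` would raise an error, which mirrors the point of the example: there is no average day of the week.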

TWO-WAY FREQUENCY TABLES

A two-way frequency table is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies. The table summarizes the data and shows how often each value occurs.

The two-way frequency table below shows the favorite leisure activities for 20 men and 30 women using frequency counts.

The two-way frequency table below shows the favorite leisure activities for 20 men and 30 women using relative frequencies. Relative frequencies are generally written as decimals or percentages. A relative frequency is the ratio of the observed number of a particular event to the total number of events, and it is often taken as an estimate of probability. You can use the relative frequency to estimate how often a value may occur in the future. (Each number in the table has been divided by the total number of people, or 50.)

Two-way tables can show relative frequencies for the whole table, for rows, or for columns. The tables below show relative frequencies for rows and columns.

Each type of relative frequency table makes a different contribution to understanding the relationship between gender and preferences for leisure activities. For example, the Relative Frequency for Rows table most clearly shows the probability that each gender will prefer a particular leisure activity. For instance, it is easy to see that the probability that a man will prefer movies is 40% and that the probability that a woman will prefer movies is 27%, and so on.

The Relative Frequency for Columns table uses the same counts but divides each one by the total for its activity, so it shows how the people who chose each activity split by gender. For example, of all those who selected movies, half were women and half were men. But of those who selected dance, 89% were women and only 11% were men.
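The three kinds of relative frequency described above differ only in what each count is divided by. The sketch below makes that concrete. The tables themselves are not reproduced in this transcription, so the counts used here are an assumption: they were chosen to agree with the group sizes (20 men, 30 women) and the percentages quoted in the text, and the category name "sports" is also assumed.

```python
# ASSUMED counts, consistent with the percentages quoted in the text;
# the worksheet's own table is not available here.
counts = {
    "men":   {"dance": 2,  "sports": 10, "movies": 8},
    "women": {"dance": 16, "sports": 6,  "movies": 8},
}

# Marginal totals for rows, columns, and the whole table.
row_totals = {group: sum(row.values()) for group, row in counts.items()}
col_totals = {}
for row in counts.values():
    for activity, n in row.items():
        col_totals[activity] = col_totals.get(activity, 0) + n
grand_total = sum(row_totals.values())

# Divide each count by the grand total, its row total, or its column total.
whole = {g: {a: n / grand_total for a, n in row.items()}
         for g, row in counts.items()}
by_row = {g: {a: n / row_totals[g] for a, n in row.items()}
          for g, row in counts.items()}
by_col = {g: {a: n / col_totals[a] for a, n in row.items()}
          for g, row in counts.items()}

print(f"P(movies | man)   = {by_row['men']['movies']:.0%}")
print(f"P(movies | woman) = {by_row['women']['movies']:.0%}")
print(f"Women among dance fans = {by_col['women']['dance']:.0%}")
print(f"Men who prefer movies, out of everyone = {whole['men']['movies']:.0%}")
```

Each row of `by_row` sums to 1, each column of `by_col` sums to 1, and all entries of `whole` together sum to 1, which is a quick way to check a hand-built table.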

Example: A bank teller splits transactions into two categories: deposits and withdrawals.
a. Design a table that he could use to show how many transactions are deposits and how many are withdrawals.
b. In one shift, he has 72 transactions. Of those, males make 12 deposits and 30 withdrawals, while females make 20 deposits. Put these figures in your table and complete the missing portion of the table.

Solution: First make a two-way frequency table and add the known values.

          Deposits   Withdrawals   Totals
Male         12          30
Female       20
Totals                               72

Then use the known information to calculate the missing values.
12 deposits + 30 withdrawals = 42 total transactions by males
72 total transactions - 42 transactions by males = 30 transactions by females
30 total transactions by females - 20 deposits = 10 withdrawals by females
12 deposits by males + 20 deposits by females = 32 total deposits
30 withdrawals by males + 10 withdrawals by females = 40 total withdrawals
32 total deposits + 40 total withdrawals = 72 total transactions (which matches the total given)

Place all calculated values in the table.

          Deposits   Withdrawals   Totals
Male         12          30          42
Female       20          10          30
Totals       32          40          72

Sample Questions:

16. Heather has a dance studio that offers classes in both contemporary and hip-hop dance.
   a. Design a table that will show the number of female and male dancers who take contemporary or hip-hop classes.
   b. She has 38 female hip-hop dancers and 43 male hip-hop dancers. Heather has a total of 200 dancers enrolled in classes, with 60 of them being male. Put these figures in your table and complete the missing portion of the table.
   c. How many contemporary male dancers are enrolled in her studio?
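The arithmetic in the bank teller example can be sketched step by step in code. The variable names are invented for this illustration; the four starting numbers are the ones given in the example.

```python
# Known values from the bank teller example.
male_deposits = 12
male_withdrawals = 30
female_deposits = 20
total_transactions = 72

# Fill in the missing cells in the same order as the worked solution.
male_total = male_deposits + male_withdrawals          # row total for males
female_total = total_transactions - male_total         # remaining transactions
female_withdrawals = female_total - female_deposits    # missing joint cell
deposit_total = male_deposits + female_deposits        # column total
withdrawal_total = male_withdrawals + female_withdrawals

# Check: the column totals must add up to the grand total.
assert deposit_total + withdrawal_total == total_transactions

print(f"{'':8}{'Deposits':>10}{'Withdrawals':>13}{'Totals':>8}")
print(f"{'Male':8}{male_deposits:>10}{male_withdrawals:>13}{male_total:>8}")
print(f"{'Female':8}{female_deposits:>10}{female_withdrawals:>13}{female_total:>8}")
print(f"{'Totals':8}{deposit_total:>10}{withdrawal_total:>13}{total_transactions:>8}")
```

The final check is the same one the worked solution performs by hand: if the row totals and the column totals do not both add up to the grand total, one of the cells is wrong.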

17. Sarah is worried about how much garbage she creates each week. She decides to look at how many items she could recycle instead in three weeks' time.
   a. Design a table to show the number of cans, glass bottles, and newspapers she recycled over the last three weeks.
   b. Sarah recycled 5 cans in the 1st week, 3 in the second, and 4 in the last week. She recycled 6 glass bottles every week and 1 newspaper in the last week. In the 1st two weeks she recycled 2 and then 3 newspapers. Put these numbers in your table and complete any missing portions.
18. Mr. Smith splits pupils who did not do their homework into two categories: first timers and second(+) timers.
   a. Design a table to show how many boys and how many girls did not do their homework.
   b. In one month, 36 girls and 12 boys did not do their homework for the first time. Twelve girls and 30 boys did not do their homework for the second time. Put these figures in your table.
19. Complete the two-way table for Hollywood Junior High's eating habits.
   a. Of the total males, what percentage do not eat breakfast regularly?
   b. Of the total people who eat breakfast regularly, what percentage of them are males?

Analyze Frequency Tables (Joint and Marginal)

Because entries in the table are frequency counts, the table is a frequency table. Entries in the "Total" row and "Total" column are called marginal frequencies or the marginal distribution. Entries in the body of the table are called joint frequencies.

Example: Copy and complete the two-way table for Hollywood Junior High's eating habits. Then answer the following questions.
a. How many females eat breakfast regularly? (a joint frequency)
b. How many females were included in the survey? (a marginal frequency)
c. How many females eat breakfast out of the total number of females? (a conditional relative frequency)
d. How many people were included in this survey? (a marginal frequency)
e. How many males do not eat breakfast regularly? (a joint frequency)
f. How many males and females do not eat breakfast regularly? (a marginal frequency)
g. How many males do not eat breakfast out of the total number of people who do not eat breakfast? (a conditional relative frequency)
h. Do more females eat breakfast or do more males eat breakfast? (a comparison of joint frequencies)
i. Which group of people eats breakfast more regularly?
j. Which group of people does not eat breakfast regularly?

Solution: First use the data given to calculate the unknown values.
300 total eat breakfast regularly - 110 females eat breakfast regularly = 190 males eat breakfast regularly
130 males do not eat breakfast regularly + 165 females do not eat breakfast regularly = 295 total people do not eat breakfast regularly
300 total people eat breakfast regularly + 295 people do not eat breakfast regularly = 595 total people
190 males eat breakfast regularly + 130 males do not eat breakfast regularly = 320 total males
320 total males + 275 total females = 595 total people (which is the same as calculated above!)

a. 110
b. 275
c. 110/275 = 40%
d. 595
e. 130
f. 295
g. 130/295 = 44%
h. More males eat breakfast than females
i. Males eat breakfast more regularly
j. Females more often do not eat breakfast regularly
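The three kinds of frequency in this example can be sketched from the completed breakfast table. The dictionary layout and variable names are invented for this illustration; the counts are the ones used in the solution (190 and 130 for males, 110 and 165 for females).

```python
# Joint frequencies from the completed Hollywood Junior High table.
table = {
    "male":   {"eats": 190, "does_not": 130},
    "female": {"eats": 110, "does_not": 165},
}

# Joint frequency: a single cell in the body of the table.
females_who_eat = table["female"]["eats"]

# Marginal frequencies: row totals, column totals, and the grand total.
total_females = sum(table["female"].values())
total_males = sum(table["male"].values())
total_eat = sum(row["eats"] for row in table.values())
total_do_not = sum(row["does_not"] for row in table.values())
grand_total = total_males + total_females

# Conditional relative frequencies: a joint frequency divided by the
# marginal frequency of the group it is conditioned on.
females_eat_given_female = females_who_eat / total_females
male_given_does_not_eat = table["male"]["does_not"] / total_do_not

print("Females who eat breakfast (joint):", females_who_eat)
print("Total females (marginal):", total_females)
print(f"Share of females who eat breakfast: {females_eat_given_female:.0%}")
print(f"Share of non-eaters who are male: {male_given_does_not_eat:.0%}")
```

The denominator is what distinguishes the two conditional frequencies: the first divides by the number of females, the second by the number of people who do not eat breakfast.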

Sample Questions:

20. Below is a table showing men's and women's preferences of activities. Use this table to answer the following questions.
   a. Looking at just the total columns (marginal frequencies), what can we conclude about the activities?
      i. Dance is more interesting.
      ii. They have roughly equal appeal.
      iii. Sports is the least chosen activity.
      iv. TV is the preferred activity.
   b. Looking at the joint frequencies, we see that women show a strong preference for which activity?
   c. Looking at the joint frequencies, we see that men show a strong preference for which activity?
21. Below is a table for Jersey High's transportation survey results.
   a. Identify one joint frequency from this table and describe its meaning.
   b. Identify one marginal frequency from this table and describe its meaning.
   c. Write two conclusions that you can make from this frequency table.

Relative, Joint, and Marginal Frequencies

22. The frequency table below shows the results of a survey that Carla took. She asked 40 randomly selected people what their favorite food was to eat at a baseball game. The three choices were hotdogs, hamburgers, or pizza. Convert this table into a relative frequency table that uses decimals as well as percentages.

   Preferred Food at the Ball Game
                Hotdogs   Hamburgers   Pizza   Total
   Frequency      18          12         10      40

   a. Divide the numbers in the frequency table by the total to obtain relative frequencies as decimals. Record the results in the table below.

   Preferred Food at the Ball Game
                         Hotdogs        Hamburgers   Pizza   Total
   Relative Frequency    18/40 = 0.45

   b. Write the decimals as percentages in the table below.

   Preferred Food at the Ball Game
                         Hotdogs   Hamburgers   Pizza   Total
   Relative Frequency    45%

   c. How can you check to see if you have accurately converted frequencies to relative frequencies?
   d. Explain why the number in the total column of a relative frequency table is always 1 or 100%.
   e. What does the data tell us about the most preferred food to eat at a baseball game?
23. For her survey, Carla also recorded the age of each person. The results are shown in the two-way frequency table below. Each entry is the frequency of people who prefer a certain food and are in a certain age group.
   a. Fill in the missing marginal frequencies (the entries in the total row and total column).
   b. Highlight the joint frequencies (entries in the body of the table).
   c. Find the grand total, which is the sum of the row totals as well as the sum of the column totals. Write the grand total in the lower-right corner of the table (the intersection of the total column and the total row).
   d. In terms of Carla's survey, what does the grand total represent?
   e. What does the data tell us about the preference of food for children at a baseball game?
   f. How does this compare to adults?
   g. Among all age groups, what food would you say is most preferred?

24. Make a relative frequency table for the rows and columns.
   a. What is the probability that a child will choose pizza?
   b. What is the probability that an adult will choose a hamburger?
   c. What percentage of adults prefer hotdogs?
   d. What percentage of teenagers prefer pizza?
25. Make a relative frequency table by calculating the relative frequency of the marginal and joint frequencies compared to the grand total. Write your relative frequencies as decimals and percentages.
   a. Highlight the joint relative frequencies in the table (relative frequencies in the body of the table).
   b. What is the probability of randomly choosing a person with food at a baseball game and that person being a child who prefers hamburgers?
   c. What is the probability of randomly choosing a person with food at a baseball game and that person being a teenager who prefers pizza?
   d. What is the probability of randomly choosing a person with food at a baseball game and that person being an adult who prefers pizza?
   e. What percentage of the people that Carla asked were adults?
   f. What percentage of the people that Carla asked were children?
   g. What is the probability that a person will choose a hamburger at a baseball game?
   h. What is the probability that a person will choose pizza at a baseball game?
   i. What food is a person most likely to choose at a baseball game?

Answer Key

1.
                    Class A      Class B
   Mean             1437/33 =    /28 =
   Mode
   Lower Quartile   636/17 =     /14 =
   Median
   Upper Quartile   897/17 =     /14 =

5. Class A, because the mean, lower quartile, and upper quartile are all higher than Class B's.
6. Various answers
7. Class A
8. Various answers
9. If low outliers were removed, the averages would be higher.

10. a.
10. b. Mean = 1350/54 = 25; Median =
10. c. Symmetric
11. a. Symmetric
11. b. Left skewed
11. c. Right skewed
11. d. Right skewed (optional: with gaps)
12. a. Right skewed with mean greater than median
12. b. Uniform with mean about the same as median
12. c. Right skewed with mean greater than median
12. d. Left skewed with mean less than median
12. e. Uniform with mean about the same as median
12. f. Right skewed with mean greater than median
12. g. Right skewed with mean greater than median, or symmetrical with outliers and mean about the same as median
13. a. Curve should follow data
13. b. Symmetrical
13. c. The mean and median are about the same
14. This is possible because the data used in this way are quantitative. The average is
15. This is NOT possible; now the data are categorical. You can't find the average of categorical data.
16. a.
             Contemporary   Hip Hop   Total
   Male
   Female
   Total
16. b.
             Contemporary   Hip Hop   Total
   Male           17           43       60
   Female        102           38      140
   Total         119           81      200
16. c. 17 contemporary male dancers (119 contemporary dancers in total)

17. a.
                   Week 1   Week 2   Week 3   Total
   Cans
   Glass Bottles
   News
   Total
17. b.
            Cans   Glass Bottles   News   Total
   Week 1     5          6           2      13
   Week 2     3          6           3      12
   Week 3     4          6           1      11
   Total     12         18           6      36
18. a.
                       Girl   Boy   Total
   First Timers
   Second (+) Timers
   Total
18. b.
           First Timers   Second (+) Timers   Total
   Girl         36               12             48
   Boy          12               30             42
   Total        48               42             90
19.
                                    Male   Female   Total
   Eat Breakfast Regularly           190     110      300
   Do Not Eat Breakfast Regularly    130     165      295
   Totals                            320     275      595
19. a. 130/320 = 41%
19. b. 190/300 = 63%
20. a. They have roughly equal appeal
20. b. Dance
20. c. Sports
21. a. Answers vary. One possible: More males cycle than use any other mode of transportation.
21. b. Answers vary. One possible: Most people walk to school at Jersey High.
21. c. Answers vary. Some possible: Riding the bus is the least popular mode of transportation surveyed, and females prefer walking to Jersey High.

22. a.
   Preferred Food at the Ball Game
                         Hotdogs        Hamburgers     Pizza          Total
   Relative Frequency    18/40 = 0.45   12/40 = 0.30   10/40 = 0.25   40/40 = 1
22. b.
   Preferred Food at the Ball Game
                         Hotdogs   Hamburgers   Pizza   Total
   Relative Frequency    45%       30%          25%     100%
22. c. Add up the relative frequencies and see if they add up to 1 (or 100%).
22. d. When numbers are divided by the total, the result is the fractional portion of the total, or the number relative to the total. When all the parts are added together, they make up 100%, or all of the total.
22. e. Hotdogs are the most preferred item. They are chosen almost half (45%) of the time.
23. a.
               Hotdogs   Hamburgers   Pizza   Total
   Child          8          1          2       11
   Teenager       5          3          5       13
   Adult          5          8          3       16
   Total         18         12         10       40
23. b. The same table, with the nine entries in the body highlighted.
23. c. 40
23. d. Total of all people surveyed and total of all responses
23. e. Children prefer hotdogs
23. f. Adults prefer hamburgers, but like hot dogs too.
23. g. Hot dogs
24. Relative Frequency by Rows
               Hotdogs        Hamburgers     Pizza          Total
   Child       8/11 = 0.73    1/11 = 0.09    2/11 = 0.18    11/11 = 1
   Teenager    5/13 = 0.38    3/13 = 0.23    5/13 = 0.38    13/13 = 1
   Adult       5/16 = 0.31    8/16 = 0.50    3/16 = 0.19    16/16 = 1
   Total       18/40 = 0.45   12/40 = 0.30   10/40 = 0.25   40/40 = 1

   Relative Frequency by Columns
               Hotdogs        Hamburgers     Pizza          Total
   Child       8/18 = 0.44    1/12 = 0.08    2/10 = 0.20    11/40 = 0.28
   Teenager    5/18 = 0.28    3/12 = 0.25    5/10 = 0.50    13/40 = 0.33
   Adult       5/18 = 0.28    8/12 = 0.67    3/10 = 0.30    16/40 = 0.4
   Total       18/18 = 1      12/12 = 1      10/10 = 1      40/40 = 1

24. a. 18%
24. b. 50%
24. c. 31%
24. d. 38%
25. Relative Frequency Table
               Hotdogs        Hamburgers     Pizza          Total
   Child       8/40 = 0.2     1/40 = 0.03    2/40 = 0.05    11/40 = 0.28
   Teenager    5/40 = 0.13    3/40 = 0.08    5/40 = 0.13    13/40 = 0.33
   Adult       5/40 = 0.13    8/40 = 0.2     3/40 = 0.08    16/40 = 0.40
   Total       18/40 = 0.45   12/40 = 0.30   10/40 = 0.25   40/40 = 1
25. a. The same table, with the nine entries in the body highlighted.
25. b. 3%
25. c. 13%
25. d. 8%
25. e. 40%
25. f. 28%
25. g. 30%
25. h. 25%
25. i. Hotdogs