Unit 1 Analyzing One-Variable Data

Size: px
Start display at page:

Download "Unit 1 Analyzing One-Variable Data"

Transcription

1 Unit 1 Analyzing One-Variable Data So what is statistics? Statistics is the science and art of,, and from data. Statistical problem-solving process : Clarify the research problem and ask one or more valid statistics questions. : Design and carry out an appropriate plan to collect the data. : Use appropriate graphical and numerical methods to analyze the data. : Draw conclusions based on the data analysis. Be sure to answer the research question(s)! A. Classifying and Summarizing Data are objects described by a set of data. They may be people, but they may also be animals or things. A is any characteristic of an individual. They can take different values for different individuals. o A variable places an individual into one of several groups or categories. o A variable takes numerical values for which arithmetic operations such as adding and averaging make sense. Example 1: Label the data table below with the new vocabulary learned. The of a variable tells us what values the variable takes and how often it takes these variables. Ways to summarize distribution: A shows the number of individuals having each data value. A relative frequency table shows the proportion or percent of individuals having each data value.

2 Example 2: Who wants to party? One of Mr. Tyson s statistics classes was recently asked if they would prefer a pasta party, a pizza party, or a donut party. Here are their responses, which surprised their instructor: Pizza Pizza Pasta Pasta Pasta Donut Donut Pizza Pasta Pasta Pasta Pasta Pasta Pizza Donut Donut Pasta Pasta Pasta Pasta Pasta Pizza Pizza Pizza Pasta Pasta Donut Pasta Pasta Pasta Summarize the distribution of preferred party with a frequency table and a relative frequency table. B. Displaying Categorical Data A shows each category as a bar. The heights of the bars show the category frequencies or relative frequencies. A shows each category as a slice of the pie. The areas of the slices are proportional to the category frequencies or relative frequencies. Example 3: Who wants to party with frequency? Below is a frequency table of the preferred party for the 30 students in Mr.Tyson s statistics class. Make a bar graph and pie chart to display the data. Describe what you see. Preferred Frequency party Donut 5 Pasta 18 Pizza 7 Total 30

3 Things to look out for: - Scale - Misleading Pictographs - Must have keys! C. Displaying Quantitative Data A shows each data value as a dot above its location on a number line. Example 4: How much would you pay for fun Knoebels Amusement Park in Elysburg, Pennsylvania, has earned acclaim for being an affordable, family-friendly entertainment venue. Knoebels does not charge for general admission or parking, but they do charge customers for each ride they take. How much do the rides cost at Knoebels? The table below shows the cost for each ride in a sample of 22 rides in a recent year. Make a dotplot of these data. Name Cost Name Cost Merry Mixer $1.50 Looper $1.75 Italian Trapeze $1.50 Flying Turns $3.00 Satellite $1.50 Flyer $1.50 Galleon $1.50 The Haunted Mansion $1.75 Whipper $1.25 StratosFear $2.00 Skooters $1.75 Twister $2.50 Ribbit $1.25 Cosmotron $1.75 Roundup $1.50 Paratrooper $1.50 Paradrop $1.25 Downdraft $1.50 The Phoenix $2.50 Rockin' Tug $1.25 Gasoline Alley $1.75 Sklooosh! $1.75 How to Describe the Distribution of a Quantitative Variable --- Remember your SOCS! give the lowest and highest value in the data set are there any values that stand out as unusual? An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph. what is the approximate average value of the data (only an estimation) does the graph show symmetry, or is it skewed in one direction (see below) A shows each data value separated into two parts: a stem, which consists of all but the final digit, and a leaf, the final digit. The stems are ordered from least to greatest and arranged in a vertical column. The leaves are arranged in increasing order out from the appropriate stems.

4 Example 5: Masses of fun? The table shows the masses in grams of 17 unwrapped Snickers Fun Size bars taken from one bag. Make a stemplot of these data. Snickers Fun Size Mass You can use a back-to-back stemplot with common stems to compare the distribution of a quantitative variable in two groups. The leaves on each side are placed in order leading out from the common stem. Example 6: How many shoes are too many shoes? How many pairs of shoes does a typical teenager own? To find out, a group of statistics students surveyed separate random samples of 20 female students and 20 male students from their large high school. Then they recorded the number of pairs of shoes that each person owned. Here are the data. 1. Make a stemplot of the female data. Do not split stems. 2. Describe the shape of the distribution. 3. Explain why we should split stems for the male data. A shows each interval as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval. Follow the following steps when making one by hand: 1. Choose equal-width intervals that span the data. Five intervals is a good minimum. 2. Make a table that shows the frequency or relative frequency of data values in each interval. 3. Draw and label the axes. 4. Scale the axes. 5. Draw bars

5 Example 7 : What can we learn from the past? Ancient Egyptian embalming techniques and the climate in Egypt can preserve bodies of the dead remarkably well. Researchers made measurements of 30 Egyptian male skulls that date from about 4000 BCE. One variable they recorded was the maximum width of the skull in millimeters, shown in the table. Make a frequency histogram of these data. Skull Width Example 8: Which is the stronger picker-upper? In commercials for Bounty paper towels, the manufacturer claims that they are the quicker picker-upper, but are they also the stronger picker upper? A random sample of 30 Bounty paper towels and a random sample of 30 generic paper towels and measured their strength when wet. To do this, each paper towel was uniformly soaked with 4 ounces of water and counted how many quarters each paper towel could hold until ripping, alternating brands. Compare the distributions as revealed in the following histograms.

6 D. Measuring Center The is the midpoint of a distribution, the number such that about half the observations are smaller and about half are larger. To find the median, arrange the data values from smallest to largest. If the number n of data values is odd, the median is the middle value in the ordered list. If the number n of data values is even, the median is the average of the two middle values in the ordered list. The x (pronounced x-bar ) of a quantitative data set is the average of all n data values. To find the mean, add all the values and divide by n. That is, x = sum of data values number of data values Example 9: Did you skip a beat? A statistics instructor asked 15 students to record their heart rate while they were sitting at their desks. Here are their heart rates in beats per minute (bpm). Find the median and mean. Interpret this value in context ****Note the mean is not resilient but the median is. A measure of center (or variability) is resistant if it isn t influenced by unusually large or unusually small values in a distribution.**** The mean is pulled in the direction of the long tail in a skewed distribution. So which one do we use to measure the center? - If a distribution of quantitative data is roughly symmetric and has no outliers? - If the distribution is strongly skewed or has outliers?

7 E. Measuring Variability The of a distribution is the distance between the minimum value and the maximum value. That is, range = maximum minimum The of a distribution divide the ordered data set into four groups having roughly the same number of values. To find the quartiles, arrange the data values from smallest to largest and find the median. The is the median of the data values that are to the left of the median in the ordered list. The is the median of the data values that are to the right of the median in the ordered list. The is the distance between the first and third quartiles of a distribution. In symbols, IQR = Q3 Q1 Example 10: An offensive explosion? During the 2013 NFL regular season, the Denver Broncos had an offensive explosion, setting a new NFL record for total points in a single season up to that time. Over the course of 16 games, the Broncos scored 606 points. The number of points the Broncos scored each game is given below along with a dotplot Find the range of the distribution. Find the IQR The measures the typical distance of the values in a distribution from the mean. To find the standard deviation sx of a quantitative data set with n values: 1. Find the mean of the distribution. 2. Calculate the deviation of each value from the mean: deviation = value mean. 3. Square each of the deviations. 4. Add all the squared deviations, divide by n 1, and take the square root. If the values in a data set are given by x 1, x 2,..., x n, we can rewrite the formula for calculating the standard deviation as

8 Example 11: How close are your friends? Eleven high school students were asked how many close friends they have. Here are their responses, along with a dotplot: Calculate the standard deviation. Interpret this value in context. 1. Find the mean of the distribution. 2. Calculate the deviation of each value from the mean: deviation = value mean. 3. Square each of the deviations. Value xi Deviation from mean x i x Squared deviation x x 2 i 4. Add all the squared deviations, divide by n 1, and take the square root. Things to know about standard deviation: s x is always to 0. s x = 0 only when there is no variability, that is, when all values in a distribution are the same. Larger values of s x indicate from the mean of a distribution. s x is not. The use of squared deviations makes s x even more sensitive than x to extreme values in a distribution. s x measures variation about the.

9 F. Summarizing Quantitative Data: Box plots and Outliers Besides serving as a measure of variability, the interquartile range (IQR) is used as a ruler for identifying. How to Identify Outliers: the 1.5 X IQR Rule Call an observation an outlier if it falls more than 1.5 IQR above the third quartile or more than 1.5 IQR below the first quartile. That is, Low Outliers < Q1 1.5 IQR High Outliers > Q IQR Example 12: Home run king? Here are data on the number of home runs that Bonds hit in each of his 21 complete seasons. Identify any outliers in the distribution. Show your work It is important to identify outliers in a distribution for several reasons: 1. They might be inaccurate data values. 2. They can indicate a remarkable occurrence. 3. They can heavily influence the values of some summary statistics. The of a distribution of quantitative data consists of the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum. A is a visual representation of the five-number summary Example 13: How big are the large fries? Ryan and Brent were curious about the amount of french fries they would get in a large order from their favorite fast-food restaurant, Burger King. They went to several different Burger King locations over a series of days and ordered a total of 14 large fries. The weight of each order (in grams) is shown here (a) Make a boxplot to display the data. (b) According to a nutrition website, Burger King s large fries weigh 160 grams, on average. Ryan and Brent suspect that their local Burger King restaurants may be skimping on fries. Does the graph in part (a) support their suspicion? Explain.

10 ***Warning! boxplots don t show gaps, clusters, or peaks.*** Example 14: Which company makes a better tablet? In a recent year, Consumer Reports rated many tablet computers for overall performance and quality. Based on several variables, they gave each tablet an overall rating, where higher scores indicate better ratings. The overall ratings of the tablets produced by Apple and Samsung are given below: Apple Samsung Parallel boxplots of the data and numerical summaries are shown below. x s x Min Q 1 Median Q 3 Max IQR Apple Samsung Compare the overall rating distributions for Apple and Samsung.

11 G. Describing Location in a Distribution An individual s percentile is the percent of values in a distribution less than the individual s data value. Example 15: Which salads are at McDonald s? The dotplot below shows the number of calories in McDonald s salads in a recent year. (a) Find the percentile for the Premium Bacon Ranch Salad with Grilled Chicken, which contains 230 calories. (b) The Premium Bacon Ranch Salad (without chicken) is at the 18 th percentile of the distribution. Interpret this value in context. How many calories does the Premium Bacon Ranch Salad contain? A plots a point corresponding to the cumulative relative frequency in each interval at the smallest value of the next interval, starting with a point at a height of 0% at the smallest value of the first interval. Consecutive points are then connected with a line segment to form the graph. Example 16: How fast can you run? As part of a student project, students were asked to sprint 50 yards. Their times were recorded and the cumulative relative frequency graph of the sprint times is shown. (a) One student ran the 50 yards in 8 seconds. Is a sprint time of 8 seconds unusually slow? (b) Estimate and interpret the 25th percentile of the distribution.

12 The for an individual value in a distribution tells us how many standard deviations from the mean the value falls, and in what direction. To find the standardized score (z-score), compute z = Example 17: Are Caimans affected by pesticides? The spectacled caiman is a crocodilian reptile that lives in Central and South America. Researchers recorded the mass (in kilograms) of fourteen caimans. The data are shown below, along with a dotplot and summary statistics. Find the standardized score (zscore) for the caiman that has a mass of 15 kg. Interpret this value in context. Mass Variable n Mean StDev Minimum Q1 Median Q3 Maximum Mass We often standardize observed values to express them on a common scale. For example, we might compare the heights of two children of different ages or genders by calculating their z-scores. At age 7, Jordan is 51 in. tall. Her height puts her at a z-score of 1. That is, Jordan is 1 standard deviation above the mean height of 7-year-old girls. Zayne s height at age 9 is 54 in. His corresponding z-score is 0.5. In other words, Zayne is one-half standard deviation above the mean height of 9-year-old boys. So Jordan is taller relative to girls her age than Zayne is relative to boys his age. The standardized heights tell us where each child stands (pun intended!) in the distribution for his or her age group.