CHAPTER 10. Graphs, Good and Bad

DISPLAYING DATA
The first part of this course dealt with the production of data, through random sampling and randomized comparative experiments. This particular unit focuses on good ways to summarize and organize data.

DATA TABLES
Who did you vote for in the 2008 presidential election? One way to organize the responses for all Americans is to create a data table. Good data tables should contain the following things:
- A clear main heading
- Clearly labeled variables
- Rates (percentages or proportions), used either instead of counts or to supplement them

EXAMPLE 10.1
Votes in 2008 Presidential Election

Candidate          Number of votes   Percentage
Barack Obama            69,456,897       52.92%
John McCain             59,934,814       45.66%
Ralph Nader                738,475        0.56%
Bob Barr                   523,686        0.40%
Chuck Baldwin              199,314        0.15%
Cynthia McKinney           161,603        0.12%
Other                      242,539        0.18%
Total                  131,257,328      100%

Data tables show what values a variable takes and how often it takes these values. In other words, data tables present the distribution of a variable.
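The percentage column of Example 10.1 can be recomputed from the counts: each rate is a count divided by the total. A minimal Python sketch, using the vote counts from the table (the variable names are illustrative and not part of the slides):

```python
# Vote counts copied from Example 10.1.
votes = {
    "Barack Obama": 69_456_897,
    "John McCain": 59_934_814,
    "Ralph Nader": 738_475,
    "Bob Barr": 523_686,
    "Chuck Baldwin": 199_314,
    "Cynthia McKinney": 161_603,
    "Other": 242_539,
}

total = sum(votes.values())

# A rate is the count divided by the total, expressed here as a percentage.
percentages = {name: 100 * count / total for name, count in votes.items()}

for name, pct in percentages.items():
    print(f"{name:<18}{votes[name]:>12,}{pct:>8.2f}%")
print(f"{'Total':<18}{total:>12,}")
```

Because each percentage is rounded to two decimal places, the rounded values need not add up to exactly 100%.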

TYPES OF VARIABLES
Some variables place individuals into categories (like eye color or gender), while some variables have a meaningful numerical scale (like height, age, or exam score). There are two types of variables:
- A categorical variable places an individual into one of several categories.
- A quantitative variable takes numerical values for which arithmetic operations such as averaging make sense.

CATEGORICAL VARIABLES
Pie charts and bar graphs are good ways to show the distribution of a categorical variable. So we could summarize our presidential election data with either a pie chart or a bar graph.

PIE CHART
[Figure: pie chart, "Voters in 2008 Presidential Election". Slice labels, rounded to whole percents: Obama 53%, McCain 46%, Nader 1%, and 0% each for Barr, Baldwin, McKinney, and Other.]

BAR GRAPH
[Figure: bar graph, "Voters in 2008 Presidential Election". Horizontal axis: Candidate (Obama, McCain, Nader, Barr, Baldwin, McKinney, Other). Vertical axis: Number of Voters, 0 to 80,000,000.]
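The idea behind a bar graph is that each bar's length is proportional to its count, measured from zero. The following text-only sketch illustrates this with the counts from Example 10.1; the function name `bar_length` and the width of 50 characters are arbitrary choices made for illustration.

```python
# Vote counts copied from Example 10.1.
votes = {
    "Barack Obama": 69_456_897,
    "John McCain": 59_934_814,
    "Ralph Nader": 738_475,
    "Bob Barr": 523_686,
    "Chuck Baldwin": 199_314,
    "Cynthia McKinney": 161_603,
    "Other": 242_539,
}

MAX_WIDTH = 50  # characters used for the longest bar (an arbitrary choice)
largest = max(votes.values())


def bar_length(count):
    """Length of a bar proportional to the count, on a scale that starts at zero."""
    return round(MAX_WIDTH * count / largest)


for name, count in votes.items():
    print(f"{name:<18}|{'#' * bar_length(count)} {count:,}")
```

On this scale the bars for the minor candidates are at most one character long, which matches how small those categories look in the charts.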

PICTOGRAMS
Pictograms are another method of displaying the distribution of a categorical variable. What is a problem with this graphic?

PICTOGRAMS
Here are two charts that display the same information: ownership among certain types of pets.
Pictograms are often misleading because they misrepresent the differences between values of the categorical variable. The artists who produce pictograms often sacrifice the accuracy of the data so that they can avoid distorting the pictures being used.
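One common source of this distortion is scaling a picture in both width and height. If one value is twice another and both dimensions of the picture are doubled to keep its proportions, the picture's area grows by a factor of four. A small sketch of that arithmetic (the function name and the example ratio are illustrative assumptions, not taken from the slides):

```python
def apparent_ratio(value_ratio):
    """Area ratio seen by the viewer when a picture's width and height
    are both scaled by value_ratio."""
    return value_ratio ** 2


# Hypothetical example: one category is twice as large as another.
true_ratio = 2
print(f"True ratio: {true_ratio}, ratio of picture areas: {apparent_ratio(true_ratio)}")
```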

LINE GRAPHS
Line graphs are used to display how a quantitative variable changes over time. A line graph of a variable plots each observation against the time at which it was measured. We always put time on the horizontal axis (x-axis) and the variable on the vertical axis (y-axis). We then connect each data point to display the change over time.

EXAMPLE 10.2
For any line graph, we want to look for an overall pattern and any striking deviations from that pattern. What is the overall pattern? Are there any striking deviations from that pattern?
[Figure: line graph, "Sales of New Trucks". Horizontal axis: Year, 1981 to 2001. Vertical axis: Count, 0 to 10,000,000.]

SEASONAL VARIATION
Some line graphs display what is known as seasonal variation: a pattern that repeats itself at regular time intervals. Series of regular measurements over time are often seasonally adjusted. This means that the expected seasonal variation is removed before the data are published.
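As a rough illustration of what seasonal adjustment means, the sketch below estimates a seasonal effect for each quarter as that quarter's average deviation from the overall mean, then subtracts it. The quarterly numbers are made up for illustration, and agencies that publish adjusted data use considerably more elaborate procedures; this is only a simplified additive sketch.

```python
from statistics import mean

# Hypothetical quarterly measurements for three years (made-up numbers).
# Each inner list is one year: Q1, Q2, Q3, Q4.
series = [
    [100, 120, 150, 110],
    [110, 130, 160, 120],
    [120, 140, 170, 130],
]

overall_mean = mean(value for year in series for value in year)

# Seasonal effect of a quarter: its average across years minus the overall mean.
seasonal_effect = [
    mean(year[q] for year in series) - overall_mean for q in range(4)
]

# Seasonally adjusted series: remove the expected seasonal effect from each value.
adjusted = [
    [value - seasonal_effect[q] for q, value in enumerate(year)]
    for year in series
]

for year, row in zip(series, adjusted):
    print(year, "->", row)
```

In this made-up series the third-quarter spike disappears after adjustment, and only the year-to-year increase remains.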

EXAMPLE 10.3
Notice that the line graph has seasonal variation. We see that every year there is a spike in airline passengers. The overall trend here is an increase in airline passengers.

MISREPRESENTING DATA
The most common way to misrepresent data in a line graph is through the choice of scale. Notice that with this scale, the number of unmarried couples appears to increase rather slowly over time.
[Figure: line graph, "Unmarried Couples". Horizontal axis: Year, 1977 to 2003. Vertical axis: Unmarried Couples (thousands), 0 to 10,000.]

MISREPRESENTING DATA
However, when we switch scales for the same data, we might be inclined to draw a different conclusion. While this line graph still shows an increasing trend, it looks much more dramatic than the previous line graph.
[Figure: line graph, "Unmarried Couples". Horizontal axis: Year. Vertical axis: Unmarried Couples (thousands), 0 to 5,000.]
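The effect of the axis choice can be put in numbers: the same rise in the data fills a larger share of the plot when the vertical axis covers a shorter range. The sketch below compares the two axis ranges used in these graphs (0 to 10,000 and 0 to 5,000). The start and end values are hypothetical placeholders, since the slides do not list the underlying data.

```python
def share_of_axis(start, end, axis_min, axis_max):
    """Fraction of the vertical axis height covered by the change from start to end."""
    return (end - start) / (axis_max - axis_min)


# Hypothetical values (in thousands); the actual data are not given in the slides.
start_value, end_value = 2000, 4500

wide = share_of_axis(start_value, end_value, 0, 10_000)
narrow = share_of_axis(start_value, end_value, 0, 5_000)

print(f"Axis 0 to 10,000: rise fills {wide:.0%} of the plot height")
print(f"Axis 0 to 5,000:  rise fills {narrow:.0%} of the plot height")
```

Halving the axis range doubles the visual height of the same increase, which is why the second graph looks more dramatic even though the data are unchanged.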

MAKING GOOD GRAPHS
- Title, Label, Scale: Make sure labels and legends describe variables and their measurement units. Be careful with the scales used.
- Make the data stand out: We want to ensure that the data, rather than any background art or labels, catch the viewer's attention. Avoid pictograms and be careful when choosing scales. Avoid 3D effects or other graphics that might confuse people.

REMINDERS
Chapter 10 homework is posted online and due Friday.