Online Student Guide Scatter Diagrams

OpusWorks 2016, All Rights Reserved

Table of Contents

LEARNING OBJECTIVES
INTRODUCTION
UNIVARIATE AND BIVARIATE DATA
CORRELATION
POSITIVE OR NEGATIVE CORRELATION
NEGATIVE CORRELATION
NO CORRELATION
RELATIVE STRENGTH
TYPES OF CORRELATION
BUILDING A SCATTER DIAGRAM
COLLECT DATA
ARRANGE DATA
MAKE CHART
PLOT THE DATA
SELECTING AN ACTION
ERRORS IN ANALYSIS
STRATIFICATION
APPLYING STRATIFICATION
CAUSE AND EFFECT
CORRELATION DOES NOT IMPLY CAUSE AND EFFECT
EXTRAPOLATION
ANALYSIS TECHNIQUES: SUMMARY

by OpusWorks. All rights reserved. Version 5.5 April, 2016

Terms of Use
This guide can only be used by those with a paid license to the corresponding course in the e-learning curriculum produced and distributed by OpusWorks. No part of this Student Guide may be altered, reproduced, stored, or transmitted in any form by any means without the prior written permission of OpusWorks.

Trademarks
All terms mentioned in this guide that are known to be trademarks or service marks have been appropriately capitalized.

Comments
Please address any questions or comments to your distributor or to OpusWorks at info@opusworks.com.

Learning Objectives

Upon completion of this course, the student will be able to:
- Determine whether two variables plotted on a Scatter Diagram appear to be correlated, and to what degree
- Build a Scatter Diagram
- Avoid errors in analyzing Scatter Diagrams
- Use stratification to further explore the relationship between variables

Introduction

Univariate and Bivariate Data

With a few exceptions, your analysis up to this point has focused mainly on data for one variable. In this example, a second variable, Advertising Expenses, was measured to see how Sales, plotted on the vertical axis, might be related to Advertising Expenses, plotted on the horizontal axis. This plot requires that the data be collected in pairs, one value for each variable. Like a histogram, which gives us a picture of the variation in the data for one variable, a Scatter Diagram gives us a picture of the relationship between two variables. From this plot, it appears there is an association between Advertising Expenses and Sales.

Correlation

In general terms, correlation refers to any kind of association or interdependence between two sets of data or variables. Correlation is shown graphically with Scatter Diagrams, also known as Scatter Plots. As spending on Advertising Expenses increases, Sales increase. When both variables change in the same direction, we call this a positive relationship or positive correlation.

Positive or Negative Correlation

Correlation is an indication of direction. Like a needle on a compass that lines up in a North and South direction, we use this Correlation Compass to tell us whether there is a positive or negative correlation between the variables. When the compass is placed at the center of the data, the needle of the compass represents the best line we can draw through the data. The direction the needle points indicates whether the correlation is positive or negative. In this case, the direction is positive.

Negative Correlation

But not all variables will have a positive correlation between them. Look at this example. As you drag the pointer on this graph from point A to point B for the X variable, "Mortgage Rates", observe what happens to the values for "Housing Starts", plotted on the Y axis. As "Mortgage Rates" increase, what happens to the corresponding values for "Housing Starts"?
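To make the idea of direction concrete, here is a minimal sketch that reads the direction from the slope of a line fitted through the data. It assumes the "best line" is the ordinary least-squares line, which the guide does not state, and every data value below is invented purely for illustration.

```python
def slope_of_best_line(xs, ys):
    """Slope of the least-squares line through paired (x, y) data.

    Assumption: the "best line" is taken to be the least-squares fit.
    Requires at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def direction(xs, ys):
    """Report which way the fitted line tilts."""
    slope = slope_of_best_line(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"  # an exactly flat line; rare with real measurements


# Hypothetical paired data, not taken from the guide.
advertising = [10, 20, 30, 40, 50]
sales = [120, 150, 170, 210, 240]

mortgage_rates = [3.0, 4.0, 5.0, 6.0, 7.0]
housing_starts = [900, 820, 700, 640, 510]

print(direction(advertising, sales))             # positive
print(direction(mortgage_rates, housing_starts)) # negative
```

The sign of the slope plays the role of the compass needle: it says which way the line tilts, but nothing about how tightly the points follow it.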

No Correlation

By building and examining a Scatter Diagram, you can form a general impression of whether two variables are positively or negatively correlated. Of course, the variables may not be correlated at all. When no relationship exists between the variables, the Scatter Diagram will appear as random dots scattered about with no apparent pattern. For large values of the X variable, the value of the Y variable could be large or it could be small. Similarly, small values of the X variable could be associated with small values of the Y variable or with large values of the Y variable. When this occurs, we say there is no correlation between the two variables. The best line representing this data is a horizontal line.

Relative Strength

Besides indicating whether a relationship between two variables is positive or negative, a Scatter Diagram may be used to classify the relative strength of the relationship as weak or strong. The strength of the relationship between two variables is determined visually by how tightly the points on the Scatter Diagram cluster around the line drawn through them. Visually, the Scatter Diagram is an excellent tool to provide initial insight into a possible relationship between two variables.
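The guide judges strength by eye. As a numerical companion, the sketch below uses the Pearson correlation coefficient r, which the guide itself does not introduce: values near +1 or -1 correspond to points clustered tightly around a line, and values near 0 to a loose cloud. The cut-offs of 0.7 and 0.3 and all data values are assumptions chosen for illustration, not standards.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data.

    Requires some variation in both xs and ys (otherwise it divides by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong=0.7, weak=0.3):
    """Label r using hypothetical cut-offs (assumed, not from the guide)."""
    if abs(r) < weak:
        return "little or no correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    sign = "positive" if r > 0 else "negative"
    return f"{strength} {sign}"


# Hypothetical data: the same X values with a tight and a loose Y pattern.
xs = [1, 2, 3, 4, 5, 6]
tight_ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
loose_ys = [5, 2, 7, 4, 9, 6]

print(describe(pearson_r(xs, tight_ys)))  # strong positive
print(describe(pearson_r(xs, loose_ys)))  # weak positive
```

A number like this supplements the picture rather than replacing it: the Scatter Diagram still shows patterns, such as curves or subgroups, that a single coefficient hides.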

Types of Correlation

Displayed here are various Scatter Diagrams showing examples of the different types of correlation between two variables.

Building a Scatter Diagram

You have learned what a Scatter Diagram looks like and how to use it to visually identify the possible relationship between two variables. Now, let's learn how to build a Scatter Diagram. To build a Scatter Diagram, we will use a six-step process: Collect Data, Arrange Data, Make Chart, Plot Data, Understand Chart, and Select Action. To remember the six steps, think of the word "CAMPUS". The letters of the word CAMPUS represent the six steps.

Collect Data

In the first step, we need to collect data. Data for the two variables will be collected in pairs. Each pair of numbers will represent the values of the two variables at a given point in time and will become a single point on the Scatter Diagram.

Arrange Data

Arranging the data is not necessary for plotting, but it helps to visualize the relationship and reduces errors. Arrange the data, keeping the pairs intact, from the smallest value to the largest value of the X, or independent, variable. During the sort, the integrity of the original pairs, as well as of other associated data, is maintained. This task is easily accomplished using a computer. With the values of the X variable sorted in increasing order, scan the values of the Y variable to see if they are increasing or decreasing. This data appears to imply a negative relationship.

Make Chart

Step three is making the chart. Label the horizontal and vertical axes, select the appropriate scales, and put titles on the chart.
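The Arrange Data step described above can be sketched in a few lines. Each observation is kept as one record so that sorting by X cannot separate a pair or its associated data. The record layout and all values here are hypothetical, chosen only to illustrate the step.

```python
# Hypothetical paired observations: (x_value, y_value, associated_label).
observations = [
    (6.5, 610, "Mar"),
    (3.5, 880, "Jan"),
    (5.0, 720, "May"),
    (4.0, 830, "Feb"),
    (7.0, 540, "Apr"),
]

# Sort whole records by the X value so each pair stays intact.
arranged = sorted(observations, key=lambda row: row[0])

# Scan the Y values in X order and count upward and downward steps.
ys = [row[1] for row in arranged]
steps = [later - earlier for earlier, later in zip(ys, ys[1:])]
rising = sum(1 for step in steps if step > 0)
falling = sum(1 for step in steps if step < 0)

for x_value, y_value, label in arranged:
    print(x_value, y_value, label)
print("rising steps:", rising, "falling steps:", falling)
```

With these made-up values every step is downward, which is the kind of pattern that hints at a negative relationship before anything is plotted.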

Plot the Data

Starting with the first pair of numbers, we plot the first point on the graph. Proceeding with each successive pair, another point is added to the graph. This continues until all the points are plotted. Of course, using a computer, this happens almost instantaneously. With a Scatter Diagram, we are interested in the pattern formed by all the points, not the order in which the points were plotted.

Selecting an Action

The final step in the process is for the team to select a course of action based upon the interpretation of the Scatter Diagram. After all, one important use of a Scatter Diagram is to see whether the possible relationship implied in the data is true or not. Actions taken may include further exploratory study of the data or a more in-depth analysis of the problem to see if the variables really are related. This could include designing an experiment to confirm the relationship or including more variables in the study.

Errors in Analysis

Scatter Diagrams are a powerful tool for investigating relationships between variables, but all too often, due to a lack of understanding, they are not interpreted correctly. Some of the common mistakes are:
- Failing to recognize that a relationship exists.
- Thinking correlation implies cause and effect.
- Extrapolating or predicting outside the bounds of the data.

Stratification

Look at this Scatter Diagram. Based upon preliminary analysis, a rush to judgment might dismiss a potential relationship between the variables. By using a technique called stratification, a potential relationship may be discovered. Stratifying the data is done by identifying related subgroups. Of course, there must be some commonality in the groupings.

Applying Stratification

This Scatter Diagram is a plot of productivity versus hours of overtime for all three shifts together. But when you look at each shift separately, you might see something different. These points belong to first shift. Let's plot them separately. Similarly, these points are from second shift. And finally, these points are from third shift. Quite a different picture now! It looks like the more overtime third shift works, the less productive they are.

Cause and Effect

One of the most commonly committed errors when using Scatter Diagrams is to think correlation implies that a cause and effect relationship exists between the two variables. This is not always true. Let's look at just one example.

Correlation Does Not Imply Cause and Effect

Reducing ice cream sales probably would have no effect on the number of lifeguard rescues required. Ice cream sales and the number of rescues reported are correlated, but one is not the cause of the other. There is a hidden, or third, variable that both of these variables are related to, and that variable is temperature. More people are at the beach during summer when temperatures rise. Temperature rises, ice cream sales increase, and the number of lifeguard rescues also increases. It's that simple. So remember these words of wisdom: correlation does not necessarily imply a cause and effect relationship.

Extrapolation

Throughout this lesson on Scatter Diagrams, we have assumed that the points cluster around a straight line. That is commonly referred to as a linear relationship. But there are times when the true relationship between the variables is nonlinear. One common error when using data for prediction is the tendency to extrapolate. To extrapolate means to extend beyond the realm of what is known: to try to predict outside the range of the data plotted on the Scatter Diagram. This example will show the danger very clearly.

Analysis Techniques: Summary

Here is a summary of the techniques and pitfalls just discussed. The first tip is to extrapolate with caution and verify. With the MPH example, you should collect some data at 60, 70, and 80 MPH to verify your prediction. Watch for hidden variables. Remember the ice cream example. Stratification may help. This refers to our example with Productivity and Overtime. Once learned and practiced, the techniques we just discussed will become a routine part of your analysis.
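As an illustration of the stratification tip, the sketch below computes a correlation for all shifts together and then for each shift separately. The Pearson coefficient is used as a numerical stand-in for the visual judgment the guide describes, and every record is invented: only the third-shift pattern (more overtime, lower productivity) comes from the guide, while the first-shift and second-shift patterns are assumptions.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data (needs variation in both)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (shift, overtime_hours, productivity).
records = [
    ("first", 2, 78), ("first", 4, 80), ("first", 6, 79), ("first", 8, 81),
    ("second", 2, 60), ("second", 4, 66), ("second", 6, 71), ("second", 8, 77),
    ("third", 2, 95), ("third", 4, 88), ("third", 6, 80), ("third", 8, 73),
]

# All shifts together: the opposing patterns largely cancel out.
overall = pearson_r([r[1] for r in records], [r[2] for r in records])

# Stratify: group the pairs by shift, then analyze each group on its own.
groups = defaultdict(list)
for shift, overtime, productivity in records:
    groups[shift].append((overtime, productivity))

by_shift = {
    shift: pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
    for shift, pairs in groups.items()
}

print(f"all shifts: {overall:.2f}")
for shift, r in by_shift.items():
    print(f"{shift} shift: {r:.2f}")
```

With these made-up numbers the combined correlation is close to zero, while the third shift alone shows a strong negative relationship, which is the situation the Productivity and Overtime example describes.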