CS130 Software Tools. Fall 2010 Regression and Excel


1 Software Tools: Regression and Excel

2 Regression Analysis (Part 1) Regression Analysis: The statistical crystal ball. Regression analysis is a form of statistical analysis used for forecasting. We will return to it in more depth when we reach the SPSS section. Regression analysis estimates the relationship between variables so that a particular variable can be predicted from one or more other variables. To do this, we need to fit functions to data. Let's talk about why forecasting this way is both important and error-prone.

3 Regression Analysis (Part 1) Trendlines are used to display trends in data graphically and to analyze prediction problems. Behind each trendline is an equation that compares the actual data points with a model line and measures the differences between them. In other words, we try to draw the line that best fits the data. By using regression analysis, you can extend a trendline in a chart beyond the actual data to predict future values. The line should be placed so that the total distance, or variation, between the data points and the line is minimized; the usual least-squares method minimizes the sum of the squared vertical distances.
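To make "minimizing the distance" concrete, here is a minimal Python sketch, assuming the least-squares criterion and using made-up data points (the numbers and function names are illustrative, not from the course files). It computes the best-fit line and checks that nudging the slope makes the total squared error larger.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

best_slope, best_intercept = least_squares_line(xs, ys)
best_error = sum_squared_residuals(xs, ys, best_slope, best_intercept)

# Any other line, such as one with a slightly steeper slope, fits worse.
nudged_error = sum_squared_residuals(xs, ys, best_slope + 0.1, best_intercept)
print(best_slope, best_intercept, best_error, nudged_error)
```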

4 Regression Analysis (Part 1) There are many types of regression models; the most common is linear regression. In linear regression, we try to find the straight line that best fits our data. We first plot our data using Excel's XY (scatter) chart. We then add a trendline to the chart and use its equation to predict future values for our data.
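The same idea can be sketched outside Excel. The Python below is an illustration only: the monthly exchange-rate values are hypothetical, and `fit_line` and `predict` are names invented for this example. It fits a straight line and then evaluates the line's equation at a future x value, which is what extending a trendline amounts to.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope, intercept):
    """Evaluate the trendline equation y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical data: month number and an exchange rate for that month.
months = [1, 2, 3, 4, 5, 6]
rates = [0.80, 0.82, 0.83, 0.85, 0.88, 0.89]

slope, intercept = fit_line(months, rates)

# Extend the trendline two months past the last observation.
forecast_month_8 = predict(8, slope, intercept)
print(f"y = {slope:.4f}x + {intercept:.4f}; month 8 forecast: {forecast_month_8:.3f}")
```

The further the forecast lies beyond the observed data, the less it should be trusted, since nothing guarantees the trend will continue.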

5 Regression Analysis (Part 1) Figure: A trendline fitted to AUD/USD foreign exchange rate data.

6 Regression Analysis (Part 1) The chart above depicts the price ratio of the Australian Dollar to the US Dollar over a period of time. The ratio of these two currencies means quite a bit to a foreign exchange trader. The upward trend means that the value of the Australian dollar, and perhaps the strength of the Australian economy, is increasing relative to the US Dollar. The trader can then watch the trendline and decide when to sell any holdings in AUD. So, let's get some sample data and try to analyze it using these techniques. This process is often called data visualization, or at least a very simple version of it. Trendlines provide insight into the meaning behind the data; they are an "interpretation" of the data points.

7 Economic Indicators Economic indicators are important sources of statistical data for economics, social science, and finance. Find the International Trade and Goods reports, which tell us how much of a trade deficit we are accruing. We can now do some analysis of our own using this data file (in Excel). Try It => Grab the data file from Turing and place it on your desktop. Open it in Excel.

8 Economic Indicators To understand the nature of your data and use the regression analysis tools, you must first have data; in Excel, you then work from the Chart Wizard. The detailed steps are: Enter, load, or import the data in an Excel worksheet and select the data you want to plot. Click on the Chart Wizard. Choose an XY (scatter) plot, a line chart, or another chart type.

9 Economic Indicators Check that the data range is correct. Enter the titles and labels. Click on the chart, then select CHART from the menu bar and ADD TRENDLINE from that menu. From the dialog that appears, select the type of function that you would like to use for your model. In this example we will use the default, which is LINEAR REGRESSION. To have Excel display the equation of the regression line and the R-squared value, click on the OPTIONS tab within the Add Trendline dialog, check these two options, and then press OK. You should be rewarded with a graph, an equation, and an R-squared value.

10 Linear Regression R-Squared Value (R²) The R-squared value, formally the coefficient of determination (and sometimes loosely called the regression coefficient), is an indicator that ranges in value from 0 to 1 and reveals how closely the estimated values for the trendline correspond to your actual data. A trendline is most reliable or predictive when its R-squared value is at or near 1. Why might that be?
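One way to see why a value near 1 signals a good fit is to compute R-squared by hand. The sketch below assumes the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares; the data values and the function name `r_squared` are made up for illustration.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


# Hypothetical observations.
actual = [2.0, 4.0, 6.0, 8.0]

# A trendline that passes through every point leaves no residual error.
perfect_fit = [2.0, 4.0, 6.0, 8.0]

# A flat line at the mean explains none of the variation in the data.
flat_fit = [5.0, 5.0, 5.0, 5.0]

r2_perfect = r_squared(actual, perfect_fit)
r2_flat = r_squared(actual, flat_fit)
print(r2_perfect, r2_flat)
```

The closer the trendline's estimates are to the actual points, the smaller the residual sum of squares and the closer R-squared gets to 1.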

11 Linear Relationship (figure; source: visualstatistics.net)

12 Non-Linear Regression What happens if the data you have collected does not obey a linear relationship, meaning that the natural effects between the important variables do not follow the y = mx + b relationship? Often, these relationships are nonlinear, and we need a different type of curve to fit the data. What does it mean for the data we have collected to be nonlinear? Excel provides us with different types of nonlinear functions that we can use to fit data. These functions include polynomial, exponential, logarithmic, and power.
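As an illustration of fitting one of these nonlinear types, the Python sketch below fits an exponential curve y = a·e^(bx) by one common approach: take the natural logarithm of y, which turns the model into the straight line ln(y) = ln(a) + bx, and fit that line by least squares. This is an assumption made for the example, not a statement of how any particular tool computes its trendline, and the data are synthetic. It requires every y value to be positive.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by a least-squares line through (x, ln y)."""
    log_ys = [math.log(y) for y in ys]  # requires y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                            # slope of the log-transformed line
    a = math.exp(mean_log_y - b * mean_x)    # intercept is ln(a), so undo the log
    return a, b


# Synthetic data generated from y = 2 * exp(0.5 * x), for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

a, b = fit_exponential(xs, ys)
print(f"y = {a:.3f} * exp({b:.3f} * x)")
```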

13 Non-Linear Regression Solving Exponential and Logarithmic Equations Some nonlinear data turns out, upon analysis, to follow an exponential or logarithmic relationship. This means that although the relationship is not linear, the variables still relate to each other in a predictable way. Linear relationship => My spouse comes home from work pretty mad and continues to grow more and more angry. Because we are linear, the minute he walks through the door (the start of our data collection), I am equally mad, and the madder he gets, the madder I get, even though I am not sure what he is mad about.

14 Non-Linear Regression Solving Exponential and Logarithmic Equations Recall that to solve an equation of the form $y = ae^{bx}$ for $x$ (where $a$ and $b$ are nonzero constants), you first divide by $a$ to obtain $y/a = e^{bx}$. Next, take the natural logarithm of each side to obtain $\ln(y/a) = bx$. Dividing by $b$ yields $x = (1/b)\ln(y/a)$. Similarly, to solve an equation of the form $y = a\ln(bx)$ for $x$, you again divide by $a$ to obtain $y/a = \ln(bx)$. Next, exponentiate each side to obtain $e^{y/a} = bx$. Dividing by $b$ yields $x = (1/b)e^{y/a}$.
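The two solutions can be checked numerically. The sketch below is a small Python illustration with arbitrarily chosen constants (a = 3, b = 0.5, x = 4) and function names invented for this example. It computes y from a known x and then recovers x using each formula.

```python
import math


def solve_exponential(y, a, b):
    """Solve y = a * exp(b * x) for x: x = (1/b) * ln(y / a)."""
    return math.log(y / a) / b


def solve_logarithmic(y, a, b):
    """Solve y = a * ln(b * x) for x: x = (1/b) * exp(y / a)."""
    return math.exp(y / a) / b


# Arbitrary constants and a known x, chosen only to test the formulas.
a, b, x = 3.0, 0.5, 4.0

y_exp = a * math.exp(b * x)                   # forward: exponential model
x_from_exp = solve_exponential(y_exp, a, b)   # backward: should recover x

y_log = a * math.log(b * x)                   # forward: logarithmic model
x_from_log = solve_logarithmic(y_log, a, b)   # backward: should recover x

print(x_from_exp, x_from_log)
```

This is how a fitted exponential or logarithmic trendline can be used in reverse: given a target y, it tells you the x at which the model reaches it.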

15 In-class exercise 5