Forecasting Introduction
Version 1.7
Dr. Ron Tibben-Lembke
Sept. 3, 2006

This introduction will cover basic forecasting methods, how to set the parameters of those methods, and how to measure forecast accuracy. We will use the following terminology:

$F_i$: Forecast of demand in period $i$.
$D_i$: Actual demand in period $i$.
$F_{t,i}$: Forecast of what the demand will be in period $i$, made at time $t$.

1 Forecast Error

We will measure how well a forecast performs with measures of forecast error that you've probably seen before. Define $\delta$ (delta) as the difference between the forecast and the actual demand:

$$\delta_i = F_i - D_i.$$

We can perform a variety of calculations on this number to get a feel for how well our forecast method is performing:

$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |\delta_i|, \qquad \text{MSE} = \frac{1}{n}\sum_{i=1}^{n} \delta_i^2, \qquad \text{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{|\delta_i|}{D_i},$$

$$\text{RSFE} = \sum_{i=1}^{n} \delta_i, \qquad \text{TS} = \frac{\text{RSFE}}{\text{MAD}}.$$

MAD is the Mean Absolute Deviation, which tells us the average of the absolute values of the errors. MSE is the Mean Squared Error, which is the average of the squared errors. MAPE is the Mean Absolute Percentage Error: it takes the absolute error of each forecast, divides it by the demand to get the error as a percentage of the demand, and then averages these percentage errors. RSFE is the running sum of forecast errors: instead of taking absolute values, it lets the positive and negative errors cancel each other out, if that's what happens. Finally, the Tracking Signal (TS) divides the RSFE by the MAD.

MSE is not as widely used. The MAD gives us a picture of the average size of the error: on average we are off by, say, ten units, sometimes high, sometimes low. The RSFE tells us whether our forecast is biased, consistently too high or consistently too low; because $\delta_i = F_i - D_i$, a positive RSFE means we have been forecasting too high. The analogy I like to make is to archery. The MAD tells us that we miss the bullseye by 10 inches, on average, but it doesn't tell us which way we need to correct. The RSFE tells us that, cumulatively, we have missed the target 100 inches to the right: maybe sometimes we miss to the left, but in general we miss to the right a lot more than to the left, so we should correct by aiming a bit more to the left.

If we aren't consistently shooting in the wrong direction, the RSFE should stay close to zero, sometimes positive, sometimes negative. If it becomes a large positive or negative number, we need to correct our forecast. But how big does it have to get before we should do something about it? That's where the TS comes in. If the RSFE gets to be as big as, say, 5 times the MAD, we need to fix something. So divide the RSFE by the MAD, and that's the TS. If the TS gets close to 5 (either positive or negative), we need to re-evaluate our forecasting method.

2 No Trend or Seasonality

If there is no trend or seasonality to your demand, then every day should be pretty much like any other day.

2.1 Naive Method

With the naive method, your forecast for any day is that it will be exactly like what happened the day before:

$$F_t = D_{t-1}.$$

This seems so simple that it could never work very well, but it does surprisingly well in a lot of different circumstances. If demand goes up and down randomly a little bit each day, though, the naive method is at a bit of a disadvantage: if demand is a little below average one day, it is likely to be closer to the usual, or maybe a little above the usual, the next day, yet the naive forecast assumes the low value will repeat.

2.2 Simple Average

With a simple average, our forecast for the next day is just the long-term average of all of the sales data we have:

$$F_t = \frac{\text{sum of all } t-1 \text{ past demands}}{t-1} = \frac{1}{t-1}\sum_{i=1}^{t-1} D_i.$$

In other words, average all of the demand information you have.
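The error measures of Section 1 and the two forecasts just described take only a few lines of Python. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, the demand list is the first eight periods of the Problem 3 data at the end of these notes, and both methods are assumed to start forecasting in period 2, since each needs at least one past demand.

```python
def naive_forecasts(demand):
    """Naive method, F_t = D_{t-1}: forecasts for periods 2..n (demand[0] is period 1)."""
    return list(demand[:-1])


def simple_average_forecasts(demand):
    """Simple average, F_t = (D_1 + ... + D_{t-1}) / (t - 1): forecasts for periods 2..n."""
    forecasts = []
    running_total = 0.0
    for count in range(1, len(demand)):
        running_total += demand[count - 1]
        forecasts.append(running_total / count)
    return forecasts


def error_metrics(forecasts, actuals):
    """MAD, MSE, MAPE, RSFE and TS for paired forecasts and actual demands."""
    deltas = [f - d for f, d in zip(forecasts, actuals)]  # delta_i = F_i - D_i
    n = len(deltas)
    mad = sum(abs(e) for e in deltas) / n
    mse = sum(e * e for e in deltas) / n
    mape = sum(abs(e) / d for e, d in zip(deltas, actuals)) / n  # a fraction; x100 for percent
    rsfe = sum(deltas)  # positive means we have been forecasting too high
    ts = rsfe / mad if mad > 0 else 0.0  # TS is undefined when every error is 0
    return {"MAD": mad, "MSE": mse, "MAPE": mape, "RSFE": rsfe, "TS": ts}


if __name__ == "__main__":
    demand = [100, 98, 105, 102, 103, 105, 108, 115]  # first 8 periods of the Problem 3 data
    actuals = demand[1:]  # both methods start forecasting in period 2
    for name, method in [("naive", naive_forecasts), ("simple average", simple_average_forecasts)]:
        metrics = error_metrics(method(demand), actuals)
        print(name, {key: round(value, 2) for key, value in metrics.items()})
```

For the naive forecasts on these eight periods, the RSFE works out to -15 and the TS to -4.2: demand is rising, so the naive forecast keeps landing below it, and the tracking signal is already approaching the -5 level.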
The argument that can be made in favor of this method is that the more data you get, the closer your estimate of the true average gets to the real thing. To compare it to baseball, if a player gets 2 hits in 5 at-bats, that would be a 0.400 batting average. But to really know whether a player is capable of hitting 0.400, we need to see how well the player does over a whole season. By the time we've seen the player in action for a whole season, we really know how good the player is. The same argument applies to the simple average: the more sales numbers you get, the better your estimate fits reality.

2.3 Moving Average

In theory, the simple average works wonderfully for demand that has no growth over time, but in reality almost every business sees some increase or decrease over time. Because you are averaging some numbers that eventually become really old, these old numbers pull the average toward where demand used to be (down, if demand has been growing), and they should not be considered representative of our current situation. For that reason, we should throw out some of the oldest data.

In a moving average forecast, we decide that we are only going to take an average of the $n$ most recent data points. We might use the last 4 months, or 3 years, whatever number we feel comfortable with. Our forecast for period $t$ is given by:

$$F_t = \frac{\text{sum of the last } n \text{ demands}}{n} = \frac{1}{n}\sum_{i=t-n}^{t-1} D_i.$$

If there truly is no trend to the demand, this won't work quite as well as the simple average, but if there is any actual trend to the demand, the moving average will work better than the simple average, because it follows the trend more closely.

2.4 Weighted Moving Average

By switching to the moving average, we have solved the simple average's problem of considering too much data. But there could still be a problem in that all of the data we are considering gets an equal amount of weight. In some cases, people argue that the most recent information should get the most consideration in our calculation, and that older information should not get as much. A way to fix this is to give each demand point a different amount of weight: more weight to the most recent data points, and less weight to older points.

If we are using $m$ periods in our calculation, we will give a weight of $b_j$ to each period. The oldest data point gets a weight of $b_1$, and the most recent gets a weight of $b_m$. Usually, people set the weights so that $b_1 \le b_2 \le b_3 \le \dots \le b_m$. If the weights do not sum to 1.0, the result won't really be an average. So to easily make the forecast a proper average, we multiply each data point by its weight and then divide by the sum of the weights:

$$F_t = \frac{\sum_{i=1}^{m} b_{1+m-i}\, D_{t-i}}{\sum_{i=1}^{m} b_i}.$$

Although the formula looks complicated, the idea is still relatively simple: if you are including $m$ periods in your calculation, multiply each data point by its weight, from $b_1$ for the oldest to $b_m$ for the newest.
Add them up, and divide by the sum of the $b_i$'s.

2.5 Exponential Smoothing

A more accurate name for this method would be "exponentially weighted moving average." The idea is that we build on the weighted moving average method, but instead of having to choose a weight $b_i$ for each period, we only have to choose one parameter, α (alpha), which is a number between 0 and 1. The equation for this forecast is also easy to write:

$$F_t = \alpha D_{t-1} + (1-\alpha) F_{t-1}.$$

We can also write the same thing as:

$$F_t = \alpha (D_{t-1} - F_{t-1}) + F_{t-1}.$$
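The three averaging forecasts of Sections 2.3 through 2.5 can be sketched as short Python functions. This is an illustrative sketch rather than the author's method: the function names are hypothetical, each function returns the forecast for the period right after the demands it is given, and exponential smoothing needs a starting forecast `f0`, which the notes do not specify, so it is left to the caller. The last function writes the same smoothing forecast as an explicit weighted sum of past demands, which is the "exponential" view derived in the paragraphs that follow.

```python
def moving_average(history, n):
    """Forecast for the next period: the average of the n most recent demands."""
    if len(history) < n:
        raise ValueError("need at least n past demands")
    return sum(history[-n:]) / n


def weighted_moving_average(history, weights):
    """weights = [b_1, ..., b_m]: b_1 for the oldest of the last m demands, b_m for the newest."""
    m = len(weights)
    if len(history) < m:
        raise ValueError("need at least m past demands")
    recent = history[-m:]  # oldest first, so recent[j] pairs with weights[j]
    return sum(b * d for b, d in zip(weights, recent)) / sum(weights)


def exponential_smoothing(history, alpha, f0):
    """Apply F_t = alpha*D_{t-1} + (1 - alpha)*F_{t-1} once per demand, starting from forecast f0."""
    forecast = f0
    for d in history:
        forecast = alpha * d + (1 - alpha) * forecast
    return forecast


def exponential_weights(history, alpha, f0):
    """The same forecast as a weighted sum: weight alpha*(1-alpha)**age on each past demand."""
    total = (1 - alpha) ** len(history) * f0  # weight left over on the starting forecast
    for age, d in enumerate(reversed(history)):  # age 0 is the most recent demand
        total += alpha * (1 - alpha) ** age * d
    return total


if __name__ == "__main__":
    demand = [100, 98, 105, 102, 103, 105, 108, 115]  # first 8 periods of the Problem 3 data
    print(moving_average(demand, 3))
    print(weighted_moving_average(demand, [1, 2, 3]))
    recursive = exponential_smoothing(demand, alpha=0.2, f0=100)
    print(recursive, exponential_weights(demand, alpha=0.2, f0=100))
```

The two smoothing functions should print the same number (up to rounding), and setting `alpha=1` makes `exponential_smoothing` return the last demand, which is the naive method.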

As you can see, it is very easy to use. All you need to decide ahead of time is what value to use for α; then you just need to know the most recent demand and the most recent forecast, and that's all you need to make a new forecast. If α = 0, the forecast never changes, and if α = 1, we just have the naive method. Usual values are around 0.1 to 0.3.

The name comes from the fact that we are smoothing the numbers, getting a new estimate each time by modifying last period's number. The exponential part is a little trickier to explain, but here goes. If we make a forecast for period 10, we would write:

$$F_{10} = \alpha D_9 + (1-\alpha) F_9.$$

But we could ask ourselves: last month, when we made the forecast for period 9, how did we do it? We used this formula:

$$F_9 = \alpha D_8 + (1-\alpha) F_8.$$

So if we look at where the forecast for period 10 really comes from and substitute this expression in for the $F_9$ part, we can write it like this:

$$F_{10} = \alpha D_9 + (1-\alpha)\left[\alpha D_8 + (1-\alpha) F_8\right] = \alpha D_9 + \alpha(1-\alpha) D_8 + (1-\alpha)^2 F_8.$$

But where did $F_8$ come from? Again, the same formula:

$$F_8 = \alpha D_7 + (1-\alpha) F_7.$$

Substituting that in and rearranging, we get:

$$F_{10} = \alpha D_9 + \alpha(1-\alpha) D_8 + \alpha(1-\alpha)^2 D_7 + (1-\alpha)^3 F_7.$$

We could play this game a few more times and get:

$$F_{10} = \alpha D_9 + \alpha(1-\alpha) D_8 + \alpha(1-\alpha)^2 D_7 + \alpha(1-\alpha)^3 D_6 + \alpha(1-\alpha)^4 D_5 + \cdots$$

What you see, then, is that the new forecast we so easily make for period 10 is really a weighted sum of all the past demand figures. Each demand number gets a different amount of weight, so what this really is is a weighted moving average. But how do the weights change? Since α is between 0 and 1, (1 − α) must also be a number between 0 and 1, and multiplying a positive number by a number between 0 and 1 always makes it smaller. (Try it.) Each step further into the past multiplies the weight by another factor of (1 − α), so the weights get smaller exponentially, hence the name.

3 What is Exponential Smoothing?

The basic idea in exponential smoothing is that we take an average of our old estimate of some quantity and some new information about that quantity. In exponential smoothing, we are assuming that there is no growth, no trend to the data. So every period, we are just making a new estimate of the intercept:

$$F_t = \alpha D_{t-1} + (1-\alpha) F_{t-1}.$$

The new demand gives us another data point about what the intercept might be, and the old forecast is our old estimate of the intercept. We can (and we are going to) use this idea to update estimates of other things, like a trend. Although the formula will look different, the idea is the same:

$$\text{New Estimate} = \alpha \times \text{New Information} + (1-\alpha) \times \text{Old Estimate}.$$

4 Setting Parameters

How should we choose the best values for α or for the weights $b_i$? How do we make our forecasting methods get the best possible results? The idea is simple: try different parameter values until you get the MAD (or MSE, or whatever you want to focus on) as small as possible. You could do this by hand, but a spreadsheet or some other computer tool is really the only way to do it quickly.

However, there is a risk that by doing this you will be cooking your method to fit the past data perfectly, and that doesn't mean it will work anywhere near as well in the future. Continuing with baseball analogies, if you could throw the exact same pitch 100 times (I could just stand there and rewind the ball as if it were a video), I ought to eventually be able to hit a home run. But that doesn't mean I'm going to hit a home run the next time, when you get to throw whatever you want.

To get around that, a more trusted method is to use one part of your data, probably the first half or two-thirds, for tweaking the parameters. Then look at how the method performs on the remainder of your data. This gives a more accurate picture of how it would do once you gave it some new data.

5 Forecasting with a Trend

When our demand has a trend, there are two main methods that we can use.

5.1 Linear Regression

I assume that you are all familiar with linear regression from your statistics classes. Basically, we assume that there is a linear relationship between one output (dependent) variable, Y, and the input (independent) variable, X.
In our case, the independent variable will be time, $t$, and we think that demand is generally growing over time. We do a linear regression to get a formula like this:

$$Y(t) = a + bt.$$

For any time value $t$, we put it into the equation and get a straight-line forecast of the demand for that period. How well the line approximates the data is summarized by $R^2$, which can be interpreted as the fraction (often quoted as a percentage) of the variation in Y that is explained by its linear relationship with X.

To do a linear regression in Excel, there are four ways you could go about it. I present them in order from the one I like least to the one I think is best.

First, you could dust off your statistics book and type in the formulas from it. That sounds like a lot of work, and there are lots of opportunities to make a mistake when typing in those big formulas. But if you get it all in correctly, the spreadsheet will update automatically when you enter new data points.

Second, you could use the Data Analysis ToolPak. You should find it under Tools > Data Analysis. If it is not there, go to Tools > Add-Ins, and in that dialog box, check the box by Analysis ToolPak. If that does not appear in the dialog box, you need to get out your CDs and install that part of Excel. After you go to Tools > Data Analysis and choose Regression, a dialog box comes up where you tell it which cells are the X's and which are the Y's. Tell it where to put the output, but be careful: if you put it on the sheet you are working on, the following 18 rows will get written over with the output. In that output, look at where the Intercept row and the Coefficients column intersect; that is the intercept. The slope is in the row below that, the X Variable row, in the same Coefficients column. R-squared is in the second row of numbers, labeled R Square. One problem with this method is that when you add a new data point to the spreadsheet, you have to go back to Tools > Data Analysis every time to re-run the regression; the spreadsheet can't update automatically.

Third, another way to get the intercept and slope is to create a graph of the data. Right-click on the data line and select Add Trendline. In that dialog box, add a linear trendline; under Options, you can have the equation of the trendline displayed on the graph, and also R-squared. The trouble with doing the regression in the graph is that you can't use the numbers that appear in the graph in any calculations.

Finally, the best way to do the regression is to use the SLOPE and INTERCEPT functions. Note that Excel expects the y values first: SLOPE(y values, x values) gives you the slope, and INTERCEPT(y values, x values) gives the intercept. To find R-squared, use RSQ(y values, x values).
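Outside Excel, the three numbers that SLOPE, INTERCEPT and RSQ return can be computed directly from the least-squares formulas. The sketch below is a hypothetical Python helper (`linear_trend` is not from the notes); it assumes the periods are numbered 1, 2, ..., n and uses only built-in Python, with comments marking which Excel function each quantity should correspond to.

```python
def linear_trend(demand):
    """Least-squares line Y(t) = a + b*t for periods t = 1..n; returns (a, b, r_squared)."""
    n = len(demand)
    periods = range(1, n + 1)
    t_mean = sum(periods) / n
    y_mean = sum(demand) / n
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(periods, demand))
    s_tt = sum((t - t_mean) ** 2 for t in periods)
    s_yy = sum((y - y_mean) ** 2 for y in demand)  # zero if demand is constant (R^2 undefined)
    b = s_ty / s_tt  # should match SLOPE(y values, t values)
    a = y_mean - b * t_mean  # should match INTERCEPT(y values, t values)
    r_squared = s_ty ** 2 / (s_tt * s_yy)  # should match RSQ(y values, t values)
    return a, b, r_squared


if __name__ == "__main__":
    demand = [50, 65, 72, 69, 78, 65, 78, 84, 79, 64,
              89, 84, 88, 94, 83, 84, 91, 104, 100, 103]  # Problem 4 data
    a, b, r_squared = linear_trend(demand)
    print(f"Y(t) = {a:.3f} + {b:.3f} t, R^2 = {r_squared:.3f}")
    print("straight-line forecast for period 21:", a + b * 21)
```

Like the SLOPE/INTERCEPT approach, this recomputes everything from the raw data, so appending a new demand and calling it again updates the line.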
5.2 Double Exponential Smoothing

The main problem with linear regression is that it gives all the demand points equal weight when fitting a line. Really, we would like it to try hardest to fit the most recent data points, and not worry quite so much about fitting the oldest ones. Linear regression cannot do that. However, remember that exponential smoothing had exactly that type of behavior: it gives the most weight to the most recent data. There is a way to adapt exponential smoothing to work with a trend, also known as Holt's Method. We will define the following terms:

$S_i$: our estimate of the level, or intercept, at time $i$.
$T_i$: our estimate of the trend in period $i$.
$\mathit{TAF}_{i+1} = S_i + T_i$: the trend-adjusted forecast, our estimate of what the demand will actually be in period $i+1$.

In each period, we will revise our estimate of the level and our estimate of the trend. To do that, we will use two different smoothing constants, α and β. Like α, β is usually a number between 0.1 and 0.3.

1. At the end of a period, compute a new intercept:

$$S_i = \mathit{TAF}_i + \alpha(D_i - \mathit{TAF}_i).$$

We can also write this as:

$$S_i = \alpha D_i + (1-\alpha)\,\mathit{TAF}_i.$$

2. Compute a new, smoothed estimate of the trend:

$$T_i = T_{i-1} + \beta(S_i - \mathit{TAF}_i).$$

3. Use these two new terms to predict the demand for period $i+1$:

$$\mathit{TAF}_{i+1} = S_i + T_i.$$

If you want a forecast for a period further ahead, just add on another period's worth of growth for each additional period:

$$\mathit{TAF}_{i+n} = S_i + n\,T_i.$$

5.3 Is this Right?

When I first saw the equation for the trend, I thought: why doesn't it look more like the equation for the intercept? I expected something like α times the new information, plus (1 − α) times the previous estimate. It seemed to me that it should really look like

$$T_i = \beta(S_i - S_{i-1}) + (1-\beta)\,T_{i-1}.$$

This follows the same scheme that the $S_i$ formula follows. $(S_i - S_{i-1})$ shows us how much the intercept has changed recently, so it is the new information. We give it the weight β, and give the old estimate of the trend the weight (1 − β). If we take this and multiply out the parentheses (using the distributive property), we get:

$$T_i = \beta S_i - \beta S_{i-1} + T_{i-1} - \beta T_{i-1}.$$

We can rearrange this to be

$$T_i = T_{i-1} + \beta S_i - \beta(S_{i-1} + T_{i-1}).$$

Notice that the last part of this, $S_{i-1} + T_{i-1}$, is just $\mathit{TAF}_i = S_{i-1} + T_{i-1}$. If we write it that way, we get

$$T_i = T_{i-1} + \beta S_i - \beta\,\mathit{TAF}_i.$$

If we put the last two terms together, we get the original equation:

$$T_i = T_{i-1} + \beta(S_i - \mathit{TAF}_i).$$

So even though the two equations look different, they are equivalent.

6 Problems

1. Explain the difference between bias and deviation.
2. Explain the difference between MAD and MSE.
3. Using the Problem 3 data below, create a forecast for each period using the following methods:

   (a) 3-period moving average
   (b) Weighted moving average with 5 periods; you choose the weights
   (c) Exponential smoothing: compute its MAD, and play around with α to try to get the MAD as small as you can.
   (d) Naïve method
   (e) Plot the demands and all of the forecasts on a graph. Which method seemed to work the best?
4. Using the Problem 4 data below, create a forecast for each period.
   (a) Create the forecasts using linear regression. What is the $R^2$ value? What is the MAD of your forecasts?
   (b) Create the forecast for each period using double exponential smoothing. Use α = 0.2, β = 0.15, and use 55 as the initial intercept and 4 as the initial slope. What is the MAD of your forecasts?

Period   Problem 3 demand   Problem 4 demand
1        100                50
2        98                 65
3        105                72
4        102                69
5        103                78
6        105                65
7        108                78
8        115                84
9        124                79
10       120                64
11       115                89
12       119                84
13       126                88
14       132                94
15       145                83
16       129                84
17       135                91
18       142                104
19       134                100
20       154                103
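Problem 3(c) asks you to play with α to make the MAD small, and Section 4 warns against tuning on all of the data. A minimal Python sketch combining the two ideas follows: it picks α from a grid using the MAD on roughly the first two-thirds of the Problem 3 data, then reports the MAD on the remaining periods. The 0.05 grid, the two-thirds split, and using the first demand as the first forecast are illustrative assumptions, not choices made in the notes; the function names are hypothetical.

```python
PROBLEM_3_DEMAND = [100, 98, 105, 102, 103, 105, 108, 115, 124, 120,
                    115, 119, 126, 132, 145, 129, 135, 142, 134, 154]


def ses_forecasts(demand, alpha):
    """Exponential-smoothing forecasts for periods 2..n, assuming the period-1 forecast equals D_1."""
    forecast = demand[0]
    forecasts = []
    for d in demand[:-1]:
        forecast = alpha * d + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


def mad(forecasts, actuals):
    """Mean absolute deviation of paired forecasts and actual demands."""
    return sum(abs(f - d) for f, d in zip(forecasts, actuals)) / len(forecasts)


def tune_alpha(demand, train_fraction=2 / 3, step=0.05):
    """Choose alpha by MAD on the first part of the data, then report MAD on the held-out rest."""
    actuals = demand[1:]  # forecasts start in period 2
    split = int(len(actuals) * train_fraction)
    candidates = [round(step * k, 10) for k in range(1, round(1 / step) + 1)]

    def training_mad(alpha):
        return mad(ses_forecasts(demand, alpha)[:split], actuals[:split])

    best_alpha = min(candidates, key=training_mad)  # the first of any ties wins
    holdout_mad = mad(ses_forecasts(demand, best_alpha)[split:], actuals[split:])
    return best_alpha, holdout_mad


if __name__ == "__main__":
    alpha, holdout = tune_alpha(PROBLEM_3_DEMAND)
    print(f"alpha chosen on the first two-thirds: {alpha}; MAD on the rest: {holdout:.2f}")
```

The held-out MAD is the number to trust when comparing methods; the training MAD will usually look better because α was picked to make it small.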
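For Problem 4(b), Holt's method from Section 5.2 can be sketched in Python as a starting point. This is a minimal sketch with hypothetical names, and it assumes the initial intercept and slope (55 and 4) describe period 0, so the first forecast is $\mathit{TAF}_1 = 55 + 4 = 59$; the notes do not spell out this timing convention, so adjust it if your setup differs. Inside the loop, the trend is also computed in the Section 5.3 form, $\beta(S_i - S_{i-1}) + (1-\beta)T_{i-1}$, and checked against the Section 5.2 form as a numerical confirmation that the two are equivalent.

```python
import math

PROBLEM_4_DEMAND = [50, 65, 72, 69, 78, 65, 78, 84, 79, 64,
                    89, 84, 88, 94, 83, 84, 91, 104, 100, 103]


def holt_forecasts(demand, alpha, beta, s0, t0):
    """Trend-adjusted forecasts TAF_1..TAF_{n+1} by Holt's method.

    Assumes s0 and t0 are the level and trend at period 0, so TAF_1 = s0 + t0.
    """
    level, trend = s0, t0
    taf = level + trend  # TAF_1
    forecasts = [taf]
    for d in demand:
        new_level = taf + alpha * (d - taf)  # step 1: S_i
        new_trend = trend + beta * (new_level - taf)  # step 2: T_i, Section 5.2 form
        # Section 5.3 form, using the previous level S_{i-1} and trend T_{i-1}
        alt_trend = beta * (new_level - level) + (1 - beta) * trend
        assert math.isclose(new_trend, alt_trend, abs_tol=1e-9)
        level, trend = new_level, new_trend
        taf = level + trend  # step 3: TAF_{i+1} = S_i + T_i
        forecasts.append(taf)
    return forecasts


if __name__ == "__main__":
    forecasts = holt_forecasts(PROBLEM_4_DEMAND, alpha=0.2, beta=0.15, s0=55, t0=4)
    errors = [abs(f - d) for f, d in zip(forecasts, PROBLEM_4_DEMAND)]  # pairs TAF_i with D_i
    print(f"MAD: {sum(errors) / len(errors):.2f}")
    print(f"forecast for period 21: {forecasts[-1]:.2f}")
```

With these inputs, the first period gives $S_1 = 59 + 0.2(50 - 59) = 57.2$ and $T_1 = 4 + 0.15(57.2 - 59) = 3.73$, so $\mathit{TAF}_2 = 60.93$, which is a quick way to check a spreadsheet version against this sketch.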