
CHAPTER 5
Quantitative Forecasting Methods Using Time Series Data

In Chapter 3, we discussed the two broad classes of quantitative methods: time series methods and causal methods. Time series methods are techniques built on the premise that future demand will mimic the pattern(s) of past demand. They rely on the identification of patterns (i.e., trend, seasonality, and/or cyclical) within the past demand history of the items being forecasted and assume those patterns will continue into the future. The basic premise of causal methods is that future demand for a particular product is closely associated with (or related to) changes in some other variable(s). For example, changes in demand can be associated with variations in price, advertising, sales promotions, and merchandising, as well as economic and other related factors. Therefore, once the nature of that association is quantified, it can be used to forecast demand. Another key attribute of causal modeling is the ability to shape demand using

what-if analysis, utilizing the parameter estimates or elasticities associated with the causal factors to predict changes in demand as a result of varying the levels of the relationship variables. By changing price, say, from $1.34 to $1.40, you can determine what the impact will be on demand for that particular brand or product.

In this chapter we discuss these statistical methods in more detail from a practical application standpoint using the beverage data set. When it comes to statistical modeling and forecasting, most textbooks and academic teaching focus on validating the equations rather than on practical applications. In many cases, this is like teaching someone how to drive a car. You do not need to know how to build the car; you just need to know how to drive it from point A to point B. Doing this requires learning how to start the car by turning on the ignition, how to accelerate and stop using the accelerator pedal and brakes, and how to turn properly using the steering wheel, accelerator pedal, and brakes simultaneously. It also requires learning the laws of the given state regarding signage, right-of-way at a four-way stop, and other driving rules and regulations. You may know how to drive a car, but you also need to understand how to read the gauges and recognize when to add gas, oil, and other fluids when there are warning signs. Finally, you must know the rules of the road (i.e., have a license) in order to operate a car properly; otherwise, you can end up in an accident. Although most people can learn how to drive a car, they still require a license indicating they understand the rules of the road and have been tested on a basic understanding of how to drive alongside other drivers. Similarly, ongoing training in statistical forecasting should be required for all modelers and forecast planners. Many of my academic friends and colleagues will most likely disagree, but given the advances in statistical software, most people can learn how to apply advanced statistical methods without learning the underlying algorithms. The goal of this chapter is to teach practitioners how to apply statistical methods to sense demand signals, shape demand, and forecast demand for their products without focusing on the validation of the underlying algorithms. We also focus only on those methods that are most practical for demand-driven forecasting.

We are not in any way insinuating that other proven statistical methods cannot be used for demand-driven forecasting. We are simply focusing on the most common methods that have been used successfully in practice during my years of experience as a forecast practitioner.

UNDERSTANDING THE MODEL-FITTING PROCESS

Given that demand for a product is on a time scale, we can be standing at a certain point in time, which may not necessarily be at the beginning or the end of the demand history. This is called an observation or reference point in time. From that reference point, we can look backward over the past demand history and forward into the future. When selecting a forecasting method, we fit the model to a known data set and obtain fitted values. A critical outcome of fitting the model to a known data set is that it allows the calculation of fitted errors, or a measure of the goodness of fit, for that particular model. The output is a new set of demand periods that can be examined, and as new demand points become available, we can measure forecast error. This can be illustrated as:

Available past demand history (actuals): Y_{t-n+1}, ..., Y_{t-2}, Y_{t-1}, Y_t (latest demand point)
New fitted model values: F_{t-n+1}, ..., F_{t-2}, F_{t-1}, F_t
Forecast periods: F_{t+1}, F_{t+2}, F_{t+3}, ...
Fitted errors: (Y_{t-n+1} - F_{t-n+1}), ..., (Y_{t-1} - F_{t-1}), (Y_t - F_t)
Forecast errors: (Y_{t+1} - F_{t+1}), (Y_{t+2} - F_{t+2}), (Y_{t+3} - F_{t+3})

Once we have collected actual demand history for a product at a point in time (reference point), we choose a model to fit to the demand history. Then we compare this new estimated (fitted)

historical demand produced by the model to the actual (known) demand, thus allowing the calculation of the error between the fitted demand and the known demand. We do the same for future forecast values once we have actual demand to compare to the forecasted values. Note that a good (or low) fitted model error does not necessarily mean you will get a good (or accurate) future forecast. It only indicates that the model can predict past historical demand well. However, chances are that if a model can predict past demand, it will most likely predict future demand with a similar error.

The process for evaluating a forecasting methodology is important in determining which model is best to apply given the data set being forecast. This nine-step process is a proven strategy for evaluating and choosing the appropriate quantitative forecasting method:

1. Identify a time series or data set (in this case, demand history for a product).
2. Divide the data set into two parts, the in-sample set and the out-of-sample set.
3. Choose the quantitative forecasting method.
4. Using the in-sample data set, run the model to get the fitted results.
5. Use the forecasting method to create a forecast.
6. Compare the forecast against the out-of-sample data set.
7. Evaluate the results to determine how well the model forecast fits the demand history.
8. If the model is chosen, combine the in-sample and out-of-sample data sets.
9. Reinitiate the model with all the data (both data sets) and create a forecast.

Step 1: Identify a time series or data set. Choosing and collecting the appropriate data set is important. The data set (or demand history) should be the most accurate reflection of true demand. The most appropriate data set is point-of-sale (POS) data or syndicated scanner data. A good substitute is customer order history. If neither is available, then shipment history is recommended.

The amount of demand history (or number of data points) is also very important. A minimum of three years of history is recommended, as you need three seasonal cycles to determine whether any seasonality is associated with the data set. This is required for both weekly and monthly data (36 monthly periods or 156 weekly periods). Ideally, three to five years of demand history is preferred.

Step 2: Divide the data set into two parts, the in-sample set and the out-of-sample set. The demand history (or time series) is divided into two separate data sets: (1) the in-sample data set and (2) the out-of-sample data set. This allows for the evaluation of the forecasting method being deployed.

Step 3: Choose the quantitative forecasting method. Select the forecasting method from an available list of methods. There is no best method; the best method depends on the data, the purpose, the organizational environment, and the perspective of the modeler. The market, products, goals, and constraints should also be considered when selecting a forecasting method.

Steps 4, 5, 6, 7: Using the in-sample data set, run the model to get the fitted results. Use the forecasting method to create a forecast. Compare the forecast against the out-of-sample data set. Evaluate the results to determine how well the model forecast performs. We fit the model to the in-sample data set, which should be a minimum of three years, and forecast out against the out-of-sample data set, which should be the most recent year of actual demand, or at least several months (or weeks). For example, if you have demand history by month for 2010, 2011, 2012, and 2013, you should use the 2010, 2011, and 2012 monthly demand as the in-sample data set to fit the model, and 2013 as the out-of-sample data set to compare the forecasts from your model with the actual demand for those monthly (or weekly) periods. You can use mean absolute percentage error (MAPE), mean absolute deviation (MAD), or other forecasting performance metrics.
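To make Steps 4 through 7 concrete, here is a minimal sketch of the in-sample/out-of-sample evaluation in Python. The synthetic monthly series, the column layout, and the simple seasonal-naive forecasting method are illustrative assumptions, not the book's beverage data or any specific forecasting package.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly demand: 2010-2012 in-sample, 2013 out-of-sample.
idx = pd.date_range("2010-01-01", periods=48, freq="MS")
demand = pd.Series(
    1000 + 10 * np.arange(48) + 150 * np.sin(2 * np.pi * np.arange(48) / 12)
    + np.random.default_rng(42).normal(0, 40, 48),
    index=idx,
)

in_sample = demand[:"2012-12-01"]   # fit the model on these periods
out_sample = demand["2013-01-01":]  # hold these periods back for evaluation

# Illustrative forecasting method: seasonal naive (repeat the last 12 months).
forecast = pd.Series(in_sample.iloc[-12:].to_numpy(), index=out_sample.index)

# Compare the forecast against the out-of-sample actuals.
errors = out_sample - forecast
mad = errors.abs().mean()                          # mean absolute deviation
mape = (errors.abs() / out_sample).mean() * 100    # mean absolute percentage error
print(f"MAD = {mad:,.1f} units, MAPE = {mape:.1f}%")
```

Once a method wins this comparison, Steps 8 and 9 simply refit it on the full history before generating the forward forecast.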

Step 8: If the model is chosen, combine the in-sample and out-of-sample data sets. Once you have selected the forecasting method, add the out-of-sample periods to the in-sample periods and refit the forecasting model to all the demand history (the entire data set).

Step 9: Reinitiate the model with all the data (both data sets) and create a forecast. Finally, generate a forecast for the chosen number of unknown future periods based on the model estimates using the entire demand history.

Steps 4, 5, 6, and 7 are iterative phases in the process. You evaluate each candidate forecasting method based on the results of the out-of-sample forecast and the actual demand for those periods.

INTRODUCTION TO QUANTITATIVE TIME SERIES METHODS

Most quantitative forecasting methods are based on the premise that when an underlying pattern exists in the historical demand for a product, that pattern can be identified and predicted separately from any randomness. Most time series methods use smoothing (or averaging) to eliminate randomness so the pattern can be projected into the future and eventually used as a forecast for demand. In many cases, the pattern can be decomposed into additional patterns that identify several components within the time series, such as trend, seasonality, cycles, and randomness. Doing this also provides a better understanding of the behavior of the time series, which helps improve the accuracy of the forecasts. Most time series methods focus primarily on trend/cycle and seasonality to predict the underlying patterns within the demand history of the product being forecasted. The seasonal factor relates to periodic fluctuations associated with weeks, months, holidays, and other consistent factors that repeat in the same period every year. Seasonality can also be related to sales promotions and marketing events that occur in the same week or month every year. The trend/cycle components can be separated or combined, depending on the time series method being deployed, and represent the longer-term changes in the

level of the time series. Most time series methods consider the trend/cycle as one component. Time series methods assume that the demand for a product is made up of these components:

Demand = Pattern + Unexplained error = f(Trend/cycle + Seasonality + Unexplained error)

where f = function of

In addition to the two components (trend/cycle and seasonality), there is also unexplained error, or randomness, present in the patterns. The unexplained error is the difference between the combined trend/cycle and seasonal patterns and the actual demand. It is also called the irregular component, as it represents the unexplainable patterns left over, or irregular demand.

There are several different approaches to identifying and measuring the trend/cycle and seasonal components using time series methods. In all cases, the goal is to isolate, separate, and remove the trend/cycle and then identify and measure the seasonal component. Any residual left over is considered randomness or unexplained error. Although unexplained error cannot be predicted, it can be identified. Given my experience as a practitioner, this approach works fairly well, but only for products that have a stable trend over time, are highly seasonal in nature, and have little sales and marketing activity associated with them. Products that fall into this category are normally harvest brands that are in the mature stage of their product life cycle.

The key approach to time series methods involves smoothing the original demand history. Although many of the time series techniques date back to the early 1900s, they were updated and given more statistical precision during the late 1950s and early 1960s. Today, the most popular time series technique is Winters' three-parameter exponential smoothing. Autoregressive integrated moving average (ARIMA) models are gaining some headway in use as software makes it easier to deploy more advanced time series methods. This is good news because ARIMA models normally outperform exponential smoothing methods in head-to-head competition, according to studies conducted by the International Institute of Forecasters (the M1- through M3-Competitions).

The universally accepted statistical form for a time series decomposition approach can be represented as:

Y_t = f(T_t, S_t, E_t)

where
Y_t = time series value for actual demand at period t
T_t = trend/cycle component at period t
S_t = seasonal component at period t
E_t = irregular or randomness component at period t

Two different statistical forms can be deployed using a time series decomposition approach: additive and multiplicative. The most common approach is the additive form:

Y_t = T_t + S_t + E_t  (additive form)

Using the additive form, the trend/cycle, seasonal, and irregular components are simply added together to create the fitted series. The alternative time series decomposition form is multiplicative:

Y_t = T_t × S_t × E_t  (multiplicative form)

In this case, the trend/cycle, seasonal, and irregular components are multiplied together to create the fitted series. Additive models are more appropriate if the magnitude of the seasonality does not vary with the level of the series. However, if the seasonality does fluctuate with the magnitude of the data (i.e., the fluctuations increase and decrease proportionally with increases and decreases in the demand for the product), then multiplicative models are more appropriate. When forecasting demand for products in the consumer packaged goods (CPG) industry, empirical findings have indicated that multiplicative time series decomposition methods are more useful, as most product seasonality does vary with increases in demand.

Figure 5.1 shows the two key components of the beverage demand data set along with the unexplained error, or irregular component. In this case, there appears to be a slight trend in the data but no real cycles. The seasonal component does appear to be additive, not increasing or decreasing with increases and decreases in demand. The seasonal pattern is the same from the start to the end of the data series. If the seasonality were multiplicative, it would vary from the start to the end of the data series.
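As a rough illustration of the additive and multiplicative forms, the sketch below decomposes a synthetic weekly series using an open-source routine. The synthetic data and the use of statsmodels' seasonal_decompose are assumptions for illustration, not the beverage data or the software discussed in the book.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic weekly demand with a mild trend and a repeating 52-week seasonal pattern.
weeks = pd.date_range("1999-01-02", periods=156, freq="W-SAT")
t = np.arange(156)
demand = pd.Series(
    8000 + 5 * t + 1500 * np.sin(2 * np.pi * t / 52)
    + np.random.default_rng(0).normal(0, 300, 156),
    index=weeks,
)

# Additive form: Y_t = T_t + S_t + E_t
additive = seasonal_decompose(demand, model="additive", period=52)

# Multiplicative form: Y_t = T_t * S_t * E_t
multiplicative = seasonal_decompose(demand, model="multiplicative", period=52)

# Each decomposition exposes the trend/cycle, seasonal, and irregular components.
print(additive.trend.dropna().head())
print(additive.seasonal.head())
print(additive.resid.dropna().head())
```

Plotting these components is what produces graphs like Figures 5.1 and 5.2 in typical forecasting software.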

Figure 5.2 further indicates that the weekly seasonal cycles do not vary much week by week each year and are in fact almost identical from year to year (cycle to cycle), particularly in the second half of the year. There is some variation in the first half of the year, which can possibly be related to other nonseasonal factors, such as sales promotions and marketing events, as well as the seasonal swings associated with the Easter holiday.

Figure 5.1 Beverage Data Time Series Components (series, trend-cycle, seasonal, and irregular component plots for VOLUME)

Figure 5.2 Beverage Data Time Series Seasonal Component Cycles (seasonal cycle plot for VOLUME)

The seasonal volumes in the first half of the year also indicate higher volumes associated with the first few weeks of January and the middle weeks of May. This beverage product happens to be a premium product that is taken to holiday parties as a host gift, which may be why there are high-volume spikes in the first week of the year related to the New Year's holiday. Figure 5.3 shows the weekly adjusted seasonality and compares it to the original seasonality; it indicates several abnormal volume spikes in January 2000, July 2000, January 2001, June 2001, and January 2002. These spikes are most likely attributable to other factors, such as sales promotions and/or marketing events.

Figure 5.3 Beverage Data Time Series Seasonally Adjusted Component Cycles (original versus seasonally adjusted series plot for VOLUME)

Seasonally adjusted time series are easily calculated. When calculating additive seasonally adjusted demand, you simply subtract the seasonal component from actual demand, leaving the trend/cycle and irregular components. The statistical formulation is written as:

Y_t - S_t = T_t + E_t

In the case of multiplicative seasonally adjusted demand, the demand data are divided by the seasonal component to create the seasonally adjusted demand. The seasonally adjusted demand reflects the data series after all seasonal variations have been removed from the original series. All these decomposition graphs are generated as standard output in most demand forecasting software solutions.
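Following the formulas above, a seasonally adjusted series can be computed directly from a decomposition. This short sketch reuses the same kind of synthetic series and statsmodels call as the earlier example; it is an illustration, not output from the book's forecasting software.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

weeks = pd.date_range("1999-01-02", periods=156, freq="W-SAT")
t = np.arange(156)
demand = pd.Series(8000 + 5 * t + 1500 * np.sin(2 * np.pi * t / 52)
                   + np.random.default_rng(0).normal(0, 300, 156), index=weeks)

decomp = seasonal_decompose(demand, model="additive", period=52)

# Additive seasonal adjustment: Y_t - S_t = T_t + E_t
seasonally_adjusted = demand - decomp.seasonal

# For a multiplicative decomposition you would divide instead: Y_t / S_t = T_t * E_t
print(seasonally_adjusted.head())
```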

When you add all these components (trend/cycle, seasonal, and irregular) together, you get the original demand data series. These decomposition graphs are helpful in visualizing the key components that time series methods use to decompose the historical demand for a product to create forecasts of future demand. Although decomposing the time series data is helpful, I have found that decomposition alone does not always work well, particularly when there is a lot of unexplained error or randomness in the historical demand for a product. In most cases we need to utilize more robust methods that can incorporate causal factors or relationship variables to explain away the unexplained error. Shaping demand also requires more causal factors, as trend and seasonality alone cannot provide all the necessary information to shape demand.

QUANTITATIVE TIME SERIES METHODS

The methods examined throughout the remainder of this chapter focus on techniques that use past demand history and then apply mathematical models to extrapolate the trends, cycles, seasonality, and other factors that influence demand into the future. The assumption behind all these techniques is that the activities responsible for influencing the past will continue to influence the future. When forecasting short-term demand horizons, this is often a valid assumption, but in many cases it can fall short when creating medium- and long-term forecasts. The assumption with most statistical forecasting methods is that the further out you attempt to forecast, the less certain you should be of the forecast. The stability of the environment is the key factor in determining whether trend, cycle, and seasonal extrapolations are appropriate forecasting methods.

There are many mathematical methods for forecasting trends, cycles, and seasonality. Choosing an appropriate model for a particular demand forecasting application depends on the historical data. The study of the historical data, known as exploratory data analysis, identifies the trends, cycles, and seasonality, as well as other factors in the data, so that appropriate models can be selected and applied. The most common mathematical models involve various forms of weighted smoothing methods. Another type of model is known as

decomposition. This technique mathematically separates the historical data into trend, cycle, seasonal, and irregular (or random) components. ARIMA models, such as adaptive filtering and Box-Jenkins analysis, constitute a third class of mathematical model, while simple linear regression and multiple regression constitute a fourth. The common feature of these mathematical models is that historical data are the only criteria for producing a forecast. You might think that if two people use the same model with the same data, the forecasts will also be the same, but this is not necessarily the case. Mathematical models involve smoothing constants, coefficients, and other parameters that must be decided by the modeler. To a large degree, the choice of these parameters determines the demand forecast.

It is popular today to diminish the value of mathematical extrapolation. In fact, some well-known and highly regarded forecasting gurus stress that judgmental forecasting methods are superior to mathematical models. However, in many forecasting situations, computer-generated forecasts are more feasible. For example, large manufacturing companies often forecast thousands of items each month in a business hierarchy; it is difficult or simply not feasible to use judgmental forecasting methods in this kind of situation.

MOVING AVERAGING

When there is no seasonality associated with demand, only the trend/cycle and irregular components can be estimated. In these situations, the trend/cycle component can be estimated by using a smoothing technique to reduce, or smooth, the random variations. A range of smoothing techniques can be deployed, including simple moving averaging, double moving averaging, weighted moving averaging, and centered moving averaging. Moving averaging techniques provide a simple method for smoothing past demand history. These decomposition components are the basic underlying foundation of almost all time series methods. Later in the chapter we use moving averaging in conjunction with trend/cycle and seasonality to model demand history and use it to predict future demand.

The principle behind moving averaging is that demand observations (weekly/monthly periods) that are close to one another are also likely to be similar in value. So taking the average of nearby historical periods will provide a good estimate of the trend/cycle for that particular period. The result is a smoothed trend/cycle component that has eliminated some of the randomness. The moving averaging procedure creates a new average as each new observation (or actual demand) becomes available by dropping the oldest actual demand period and including the newest actual demand period.

The key to moving averaging is determining how many periods to include. For example, using the average of three periods to calculate the trend/cycle is called a moving average (MA) of order 3, or 3 MA. Table 5.1 shows the last three years of weekly actual demand for weeks 1 through 13 (January through March) of the beverage data set. If the trend/cycle moving average were being calculated for week 4 in year 1, the estimated demand would include weeks 1, 2, and 3 of year 1. The formulation is:

T_4 = (W_1 + W_2 + W_3) / 3

where W = weekly actual demand

Table 5.2 illustrates how the 3 MA and 5 MA can be applied to each of the first 13 weeks of the first year of the beverage demand data. Note that there is no estimate for trend/cycle at weeks 1 through 3 due to the unavailability of the weeks prior to week 1. The number of periods included in a moving average affects the degree of smoothing within the estimate: the more weeks included in the moving average, the more smoothed the fitted values and the one-period-ahead forecast. The 5 MA smoothing, in turn, is simply the average of the actual demand for periods 1 through 5. The formulation is:

T_t = (W_1 + W_2 + W_3 + W_4 + W_5) / 5 = 8,232

Table 5.1 Beverage Data Set (Weekly Demand): actual weekly demand for weeks 1 through 13 of years 1, 2, and 3

Table 5.2 Beverage Data Set (Weekly Moving Average): weekly actual demand for weeks 1 through 13 alongside the 3-week moving average (3 MA) and 5-week moving average (5 MA) fitted values, with a fitted error analysis for each, covering the number of test periods (10 for the 3 MA, 8 for the 5 MA), mean error (ME), mean absolute error (MAE), and mean absolute percentage error (MAPE)

Generally speaking, a moving average forecast of order k, or MA(k), can be written as:

F_{t+1} = (1/k) Σ_{i = t-k+1}^{t} Y_i

where k = number of periods in the moving average
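A minimal sketch of how the 3 MA and 5 MA fitted values, the one-period-ahead MA(k) forecast, and the fitted error analysis in Table 5.2 could be computed with pandas. The demand numbers here are placeholders, not the actual beverage values from the table.

```python
import pandas as pd

# Placeholder weekly demand for 13 weeks (illustrative values only).
demand = pd.Series([8200, 8500, 8100, 8400, 8900, 8700, 8300,
                    8600, 9100, 8800, 8500, 8700, 9000],
                   index=range(1, 14), name="actual")

# k MA fitted value for week t = average of the k prior weeks (t-k .. t-1),
# so the 3 MA has no estimate for weeks 1-3 (10 test periods remain)
# and the 5 MA has none for weeks 1-5 (8 test periods remain).
ma3 = demand.rolling(window=3).mean().shift(1)
ma5 = demand.rolling(window=5).mean().shift(1)

# One-period-ahead MA(k) forecast: the average of the latest k actuals.
forecast_3ma = demand.iloc[-3:].mean()
forecast_5ma = demand.iloc[-5:].mean()

# Fitted error analysis (ME, MAE, MAPE) over the weeks that have a fitted value.
err3 = (demand - ma3).dropna()
me, mae = err3.mean(), err3.abs().mean()
mape = (err3.abs() / demand.loc[err3.index]).mean() * 100
print(f"3 MA forecast {forecast_3ma:,.0f}; ME {me:,.1f}, MAE {mae:,.1f}, MAPE {mape:.1f}%")
```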

To avoid any confusion, MA(k) indicates the moving average forecast of order k, and k MA indicates a moving average fitted (smoothed) value of order k. A moving average of order k has two distinct characteristics:

1. It deals only with the latest k periods of known historical demand, or observations.
2. The number of data points in each moving average does not change as time goes on.

Moving averages also have disadvantages. They require more storage because all of the k latest actual historical demand points must be stored, not just the average; this has become a moot point as computer storage capacity has increased substantially over the past 10 years and storage costs have gone down significantly. They cannot handle trend or seasonality very well, although they usually do better than a simple mean. They can predict only one period ahead with any degree of accuracy; predictions tend to fall apart two or more periods into the future. Finally, the forecast analyst or planner must choose the number of periods (or k value) in the moving average, which can be difficult when trying to find the optimal value. The two extreme values are k = 1 and k = n.

Note that the more historical demand points (or observations) included in the moving average, the more smoothed the effect on the fitted data set and the one-period-ahead forecast. This is illustrated in Table 5.2 by comparing the most recent 3 MA and 5 MA fitted demand and one-period-ahead forecasts. In Figure 5.4, you can clearly see that the 3 MA is much more reactive to the spikes in the data, or less smoothed, than the 5 MA. Choosing an inappropriate smoothing length can have dramatic effects on predictions. Determining the optimal length of a moving average is difficult but important. The standard rule of thumb is that the larger the number of periods in the moving average, the more randomness is removed from the trend/cycle component. However,

it also means the trend/cycle component is more smoothed and does not pick up critical fluctuations in the demand history. It also requires a longer demand history (or data set), which may not be available. In other words, the longer the length of the moving average, the more terms and information may be lost in the process of averaging.

Figure 5.4 Beverage Data Time Series with 3 MA and 5 MA Smoothing (forecast plots for VOLUME)

A moving average of order 1, MA(1), where the last known demand point (Y_t) is taken as the forecast for the next demand period (F_{t+1} = Y_t), is an example in which last week's demand becomes next week's forecast. This is also known as the naive forecast, as it assumes the next period will be the same as the current period. The forecast analyst or planner would have to be pretty naive to think that last week's demand will be the same as next week's demand. However, you may be surprised at how accurate a naive forecast can be. In fact, you should use the naive forecast as the benchmark when comparing other quantitative methods. In other words, if more sophisticated methods cannot outperform the naive method, why are you using them? By contrast, an MA(n) forecast is simply the mean (or average) of the entire demand history (or all observations).
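To illustrate the benchmarking idea, here is a hedged sketch that compares a naive MA(1) forecast with a 3-period moving average over a holdout window; the series and the 13-week split are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up weekly demand: 104 weeks of history, last 13 weeks used for comparison.
rng = np.random.default_rng(7)
y = pd.Series(9000 + 20 * np.arange(104) + rng.normal(0, 400, 104))
holdout = y.iloc[-13:]

def mape(actual, forecast):
    return (abs(actual - forecast) / actual).mean() * 100

# Naive MA(1): each week's forecast is simply the previous week's actual.
naive_forecast = y.shift(1).iloc[-13:]

# MA(3): each week's forecast is the average of the three previous actuals.
ma3_forecast = y.rolling(3).mean().shift(1).iloc[-13:]

print(f"Naive MAPE: {mape(holdout, naive_forecast):.1f}%")
print(f"MA(3) MAPE: {mape(holdout, ma3_forecast):.1f}%")
# If MA(3) cannot beat the naive benchmark, there is little reason to use it.
```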

In summary, a moving average forecast, or MA(k), requires k data points (periods of demand history) to be stored at any given time. If k is small (say, 3 or 5), the storage requirements are minimal. However, if you have several hundred thousand data series (or stock-keeping units [SKUs]), this could become a problem, although with today's data storage capabilities and improved processing, the size of the computer storage required has become less of an issue. The main issue with moving averaging is that it can forecast accurately only one or two periods ahead. It tends to smooth the forecasts by removing fluctuations in demand that may be important (e.g., sales promotions, marketing events, or economic activities). As a result, this quantitative method is not used very often; exponential smoothing methods are generally superior to moving averaging. Finally, if there is a sudden shift in demand, the moving average is unable to catch up to the change in a reasonable amount of time.

In my experience, I have never found moving averaging alone to be useful for demand-driven forecasting. The main reason is that moving averages tend to smooth the forecast too much. The objective of demand-driven forecasting is to predict unconstrained demand as accurately as possible, including the peaks and valleys that resonate in true demand. When we smooth the forecasts, we normally overlook the peaks and valleys. However, I have found that moving averaging used in conjunction with other components, such as seasonality and causal factors, tends to work well. This will become apparent in the next two chapters, which introduce more sophisticated methods that can be used to sense demand signals and shape future demand.

EXPONENTIAL SMOOTHING

Thus far, we have discussed time series forecasting methods that handle random (or unexplained) error using a structured process that assumes the mean (or average) is a useful statistic for forecasting future demand. However, in many cases the time series data contain an upward or downward trend, seasonal effects associated with the time of year, and other factors. When trend and seasonal effects are strong in the demand history of a product, moving averaging is no

longer useful in capturing the patterns in the data set. A variety of smoothing methods were created to address this problem and improve on moving averaging for predicting the next demand period. These methods are known as exponential smoothing models, and they require that particular parameter values be defined and adjusted within a range from 0 to 1 to determine the weights applied to past demand history. Three well-known exponential smoothing (ES) methods are widely used in most software packages and solutions:

1. Single exponential smoothing
2. Holt's two-parameter method
3. Holt-Winters three-parameter method

Although all these methods are available in most demand forecasting solutions, Winters' three-parameter exponential smoothing is the most widely used, based on benchmarking surveys conducted by several forecasting trade organizations. In my experience, these three ES methods work best when sensing demand signals with limited parameters and data. They work very well for identifying and predicting trend/cycle, seasonality, and unexplained error. These ES methods can also be classified as additive or multiplicative, meaning the trend/cycle and seasonal components can be either added together or multiplied together. In addition, the trend/cycle component can be linear or damped. A damped trend is a linear trend that diminishes (up/down) over time. Damped trends seem to work well when sensing demand signals, as most product trends do not continue linearly into the future forever. At some point, they tend to trail off or accelerate.
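For orientation, here is a brief sketch of how these three variants (plus a damped trend) might be fit with the open-source statsmodels library. The library choice, the synthetic series, and the parameter names (which follow recent statsmodels versions) are assumptions for illustration, not the software discussed in the book.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

# Synthetic monthly demand with trend and yearly seasonality.
idx = pd.date_range("2010-01-01", periods=48, freq="MS")
t = np.arange(48)
y = pd.Series(1000 + 8 * t + 120 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(1).normal(0, 30, 48), index=idx)

# 1. Single exponential smoothing (level only).
ses_fit = SimpleExpSmoothing(y).fit()

# 2. Holt's two-parameter method (level plus linear trend).
holt_fit = ExponentialSmoothing(y, trend="add").fit()

# 3. Holt-Winters three-parameter method (level, trend, and seasonality),
#    here with an additive damped trend and additive seasonality.
hw_fit = ExponentialSmoothing(y, trend="add", damped_trend=True,
                              seasonal="add", seasonal_periods=12).fit()

print(hw_fit.forecast(12))  # 12-month-ahead forecast
```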

SINGLE EXPONENTIAL SMOOTHING

The most practical extension to the moving average method is using weighted moving averaging to forecast future demand. The simple moving average method discussed so far in this chapter uses the mean (or average) of the past k observations to create a one-period-ahead forecast, which implies equal weights for all k data points. However, in many cases the most recent demand history or observations provide the best indication of future demand, so it makes sense to create a weighting scheme that applies decreasing weights as the observations get older. In other words, give more weight to the most recent observations or demand periods.

As we discussed earlier, the future demand forecast is denoted F_t. When a new actual demand period Y_t becomes available, we can measure the forecast error, Y_t - F_t. The single exponential smoothing (SES) method essentially takes the forecast for the previous demand period and adjusts it using the forecast error to make the forecast for the next period:

F_{t+1} = F_t + α(Y_t - F_t)

where α is a constant between 0 and 1.

Each new forecast is simply the old forecast plus an adjustment for the error that occurred in the last forecast. An α close to 1 produces a substantial adjustment, making the forecasts more sensitive to swings in past historical demand based on the previous period's error. The closer the α value is to 1, the more reactive the future forecast will be to past demand. When the α value is close to 0, the forecast includes very little adjustment, making it less sensitive to past swings in demand. In this case, the future demand forecasts will be very smoothed, not reflecting any prior swings in demand. These forecasts will always trail any trend or change in past demand, since this method can adjust the next forecast based only on some percentage of the most recent error observed from the prior demand period. To adjust for this deficiency, there needs to be a process that allows the past error to be used to correct the next forecast in the opposite direction. This has to be a self-correcting process that works on the same principle as an automatic pilot in an airplane, adjusting the error until it is corrected and we reach equilibrium. As simple as this principle appears, it plays a key role in improving the SES model. When applied appropriately, it can be used to develop a self-adjusting process that corrects for forecasting error automatically.
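A compact sketch of the SES recursion above. The starting value (initializing the first forecast to the first actual) and the demand numbers are common illustrative choices, not values prescribed by the book.

```python
def single_exponential_smoothing(y, alpha):
    """One-step-ahead SES forecasts: F[t+1] = F[t] + alpha * (Y[t] - F[t])."""
    forecasts = [y[0]]  # initialize the first forecast with the first actual
    for actual in y[:-1]:
        prev_forecast = forecasts[-1]
        forecasts.append(prev_forecast + alpha * (actual - prev_forecast))
    return forecasts

# Illustrative weekly demand and a smoothing constant of 0.2.
demand = [8200, 8500, 8100, 8400, 8900, 8700, 8300, 8600, 9100]
fitted = single_exponential_smoothing(demand, alpha=0.2)
next_forecast = fitted[-1] + 0.2 * (demand[-1] - fitted[-1])
print([round(f) for f in fitted], round(next_forecast))
```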

With this approach, we can rewrite the equation as:

F_{t+1} = αY_t + (1 - α)F_t = αY_t + α(1 - α)Y_{t-1} + (1 - α)^2 F_{t-1}

If we repeat this process by replacing F_{t-1} with all of its components, F_{t-2} with its components, and so on, the result is:

F_{t+1} = αY_t + α(1 - α)Y_{t-1} + α(1 - α)^2 Y_{t-2} + α(1 - α)^3 Y_{t-3} + ... + α(1 - α)^{t-1} Y_1 + (1 - α)^t F_1

In other words, F_{t+1} is actually a weighted moving average of all past demand periods, where α can be 0.2, 0.4, 0.6, 0.8, or any number between 0 and 1. Suppose we choose α = 0.2; the weights applied to successive past demand periods are then 0.2, (0.2)(0.8), (0.2)(0.8)^2, and so on, decreasing as the observations get older.

Two key factors are associated with this equation: (1) the weights for all the past periods sum approximately to 1; and (2) if you plot the weights, you can see that they decrease exponentially. This is the reason for the name exponential smoothing.

Figure 5.5 illustrates the SES method using the beverage data with smoothing constants α = 0.2 and α = 0.6. As you can see from the results, the choice of α has considerable impact on the forecast for the week of January 1, 2002, and on the fitted MAPE for the weeks of January 16, 1999, through May 25, 2002, as the fitted MAPE differs noticeably between the two smoothing constants. In addition, the forecast for the week of January 1, 2002, is considerably different: 11,134 (α = 0.2) versus 12,748 (α = 0.6). Although the weekly fitted MAPE over the 176 weeks of historical demand is lower for the model using α = 0.2, the ME (sum of errors/n) is lower for the model using α = 0.6, suggesting that the weekly variance in the α = 0.6 model may be picking up more of the peaks and valleys associated with the historical demand. In other words, it is more sensitive to swings in demand, reducing the week-to-week error. It is also clear from Figure 5.5 that a large value of α (0.6) gives less smoothing in the forecast, whereas a smaller value of α (0.2) provides more smoothing.
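As a quick numeric check of the two properties just mentioned, this small standalone sketch computes the weights α(1 - α)^i for α = 0.2; it is an illustration, not output reproduced from the book.

```python
alpha = 0.2
n_periods = 20

# Weight applied to Y_{t-i} in the expanded SES equation: alpha * (1 - alpha) ** i
weights = [alpha * (1 - alpha) ** i for i in range(n_periods)]

for i, w in enumerate(weights[:5]):
    print(f"weight on Y_(t-{i}): {w:.4f}")   # 0.2000, 0.1600, 0.1280, ...

# The weights decrease exponentially and their sum approaches 1 as more periods are included.
print(f"sum of first {n_periods} weights: {sum(weights):.4f}")
```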

This final equation is the general form used for exponential smoothing methods. In this equation the forecast (F_{t+1}) is based on weighting the most recent demand history (Y_t) with a weight α and weighting the most recent forecast (F_t) with a weight of 1 - α. This form also substantially reduces any storage problems, because there is no longer a need to store all the past historical demand data or subsets of it, as in the case of the moving average. Only the most recent actual demand, the most recent forecast, and a value for α require storage. Therefore, SES is much more attractive when forecasting demand for a large number of SKUs.

There are some problems with SES. The main issue with this method is trying to find the optimal value for α. This is usually done by trial and error using a test data set and a performance metric such as mean squared error (MSE) or MAPE. Each MAPE is compared to find the value of α that gives the minimum (smallest) MAPE. This generally requires only a few trials, as the value can be approximated simply by comparing a few MAPEs and α values. However, depending on the error measurement you choose, the results can be quite different. For example, if we chose ME using the beverage data example in Figure 5.5, we most likely would have chosen α = 0.6 over α = 0.2, although the MAPE was lower for the model using α = 0.2. The good news is that most forecasting software packages automatically optimize the α value based on the error metric or criterion you choose.

Figure 5.5 Beverage Data Time Series Using Single Exponential Smoothing with Smoothing α = 0.2 and 0.6 (forecast plots for VOLUME)
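Here is a hedged sketch of the trial-and-error search for α described above, hand-rolling the SES recursion and scoring each candidate by its fitted MAPE. The candidate grid and demand values are illustrative, not the optimization routine of any particular software package.

```python
import numpy as np

def ses_fitted(y, alpha):
    """One-step-ahead SES fitted values, initialized with the first actual."""
    fitted = [y[0]]
    for actual in y[:-1]:
        fitted.append(fitted[-1] + alpha * (actual - fitted[-1]))
    return np.array(fitted)

def fitted_mape(y, alpha):
    y = np.asarray(y, dtype=float)
    return np.mean(np.abs(y - ses_fitted(y, alpha)) / y) * 100

# Illustrative demand history (placeholder values).
demand = [8200, 8500, 8100, 8400, 8900, 8700, 8300, 8600, 9100, 8800, 8500, 9200]

# Trial and error over a grid of candidate smoothing constants.
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
best_alpha = min(candidates, key=lambda a: fitted_mape(demand, a))
print(f"alpha minimizing fitted MAPE: {best_alpha} "
      f"({fitted_mape(demand, best_alpha):.2f}%)")
```

Swapping MAPE for ME or MSE in the scoring function can change which α wins, which is the point the beverage example above makes.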

Another problem with SES is that it assumes the forecast horizon is just one period ahead. As a result, longer-range forecasts are normally flat. A flat forecast is used because it works best for data that have no trend, seasonality, or other underlying patterns. Figure 5.6 illustrates a longer-range forecast for the beverage data using an SES method with α = 0.2. As you can see, the longer-range forecast over a 52-week horizon is flat at 11,134 units. In this case, the MAPE will be much higher as actual demand occurs into the future. As a result, SES may not be a good choice for forecasting demand even for those brands and products segmented into the harvest brands quadrant, as most products have at least a trend associated with their past demand history. However, if you have products with short demand history that is virtually random, SES is most likely the best quantitative method to deploy for those particular brands or products.

Figure 5.6 Beverage Data Time Series Using Single Exponential Smoothing with Smoothing α = 0.2 and a 52-Week-Ahead Forecast

HOLT'S TWO-PARAMETER METHOD

In 1957, Charles C. Holt expanded single exponential smoothing to include a linear trend component, enabling the ability to forecast data with trends. Holt's two-parameter exponential smoothing uses two α