Decision 411: Class 11


1 Decision 411: Class 11. Topics: ARIMA models with regressors; forecasting new products; review of models: what to use and when.

2 ARIMA models with regressors. By adding regressors to ARIMA models, you can combine the power of multiple regression and ARIMA: the ability to include causal variables (e.g., promotion effects) in a time series model; the ability to fit the optimal time series model to the residuals of a regression model ("regression with ARIMA errors"); and the ability to fit trend-stationary as well as difference-stationary models.

3 Simplest case: ARIMA(1,0,0) + regressor = regression with AR(1) errors. The ARIMA(1,0,0) forecasting equation is Ŷ_t = μ + φ_1 Y_{t−1}. When a regressor X is added, it becomes Ŷ_t = μ + φ_1 Y_{t−1} + β(X_t − φ_1 X_{t−1}), which is equivalent to Ŷ_t − βX_t = μ + φ_1 (Y_{t−1} − βX_{t−1}). The same AR transformation is also applied to X! I.e., the regression errors Y_t − βX_t are assumed to be an AR(1) process.
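
The lecture fits these models in Statgraphics; as an illustrative aside, here is a minimal sketch of the same ARIMA(1,0,0)-plus-regressor structure in Python's statsmodels. The series and the parameter values (mu = 10, beta = 2, phi = 0.8) are simulated placeholders, not data from the slides.

```python
# A sketch, not from the slides: ARIMA(1,0,0) + regressor, i.e., regression
# with AR(1) errors, fitted with statsmodels' SARIMAX. All data are simulated.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                     # regressor X
e = np.zeros(n)
for t in range(1, n):                      # AR(1) errors with phi = 0.8
    e[t] = 0.8 * e[t - 1] + rng.normal(scale=0.5)
y = 10.0 + 2.0 * x + e                     # Y = mu + beta*X + AR(1) noise

fit = SARIMAX(y, exog=x, order=(1, 0, 0), trend='c').fit(disp=False)
print(fit.params)                          # intercept, beta, phi, sigma2
```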

4 Regression with AR(1) errors. This is the workhorse model of econometrics, since most regressions of econometric variables have positively autocorrelated errors (DW << 2) with an autoregressive signature. The AR(1) error process is essentially a proxy for the effects of other, unmodeled variables whose effects are slowly changing in time. The Cochrane-Orcutt transformation option in multiple regression fits the AR(1) error model. Equivalently, you can fit an ARIMA(1,0,0) model with (up to 4) regressors.
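
For comparison (again, not from the slides), statsmodels' GLSAR performs an iterative feasible-GLS fit in the spirit of the Cochrane-Orcutt transformation; a sketch under the same simulated setup as above:

```python
# A Cochrane-Orcutt-style fit via statsmodels' GLSAR (iterative feasible GLS
# with AR(1) errors). Data are simulated placeholders, as in the sketch above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal(scale=0.5)
y = 10.0 + 2.0 * x + e

model = sm.GLSAR(y, sm.add_constant(x), rho=1)   # rho=1 -> AR(1) error model
res = model.iterative_fit(maxiter=10)            # alternate OLS and rho updates
print(model.rho, res.params)                     # estimated phi; mu and beta
```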

5 More general example. Consider an ARIMA(1,1,1) model*, whose forecasting equation is ŷ_t = μ + φ_1 y_{t−1} − θ_1 e_{t−1}, where y_t = Y_t − Y_{t−1}. (*This model is used here for purposes of illustration because it includes all ARIMA components.)

6 Example, continued. If a regressor X is added, the new equation is ŷ_t = μ + φ_1 y_{t−1} − θ_1 e_{t−1} + β(x_t − φ_1 x_{t−1}), where x_t = X_t − X_{t−1}. X is differenced in the same way as Y, and the same AR transformation is also applied. Equivalently: ŷ_t − βx_t = μ + φ_1 (y_{t−1} − βx_{t−1}) − θ_1 e_{t−1}, i.e., the regression errors Y_t − βX_t are assumed to be an ARIMA(1,1,1) process.
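
As a further aside (not in the slides), the same regression-with-ARIMA(1,1,1)-errors structure can be sketched in statsmodels, which applies the differencing to Y and X together when d = 1; the series below are simulated placeholders:

```python
# A sketch of regression with ARIMA(1,1,1) errors: SARIMAX treats the
# regression residuals Y - beta*X as the ARIMA process, matching the
# equations above. Simulated placeholder data.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(1)
n = 300
x = np.cumsum(rng.normal(size=n))             # random-walk regressor X
u = np.cumsum(rng.normal(scale=0.5, size=n))  # difference-stationary errors
y = 2.0 * x + u                               # Y = beta*X + I(1) noise

fit = SARIMAX(y, exog=x, order=(1, 1, 1)).fit(disp=False)
print(fit.params)                             # beta, phi_1, theta_1, sigma2
```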

7 What's the logic of this approach? By applying the same differencing and AR transformations to both Y and X, the ARIMA model is effectively fitted to Y − βX, i.e., Y controlled for the effect of X. This is the correct thing to do if X has the same degree of nonstationarity as Y and if its effect on Y is contemporaneous. It assumes that the errors of the regression model Y_t = βX_t + ε_t are an ARIMA process.

8 Reprise: trend-stationarity vs. difference-stationarity. Most naturally occurring time series in business and economics are nonstationary by virtue of being trended and/or random-walkish, but some are more non-stationary than others. Trend-stationary means that a series can be stationarized merely by detrending. Difference-stationary means that it can only be stationarized by differencing (i.e., it is some kind of random walk, hence more unpredictable). There are two corresponding types of regression models for nonstationary data.
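
Not covered in the slides, but one common way to probe this distinction is a unit-root test; here is a sketch using statsmodels' augmented Dickey-Fuller test on a simulated trend-stationary series (the trend slope and noise level are made up):

```python
# ADF test sketch: regression='ct' includes a constant and a linear trend,
# so rejecting the unit-root null suggests trend-stationarity rather than
# difference-stationarity. Data are simulated.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
t = np.arange(200)
y = 5.0 + 0.3 * t + rng.normal(scale=2.0, size=200)  # trend + stationary noise

stat, pvalue, *_ = adfuller(y, regression='ct')      # null: unit root
print(f"ADF stat = {stat:.2f}, p = {pvalue:.4f}")    # small p -> trend-stationary
```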

9 Trend-stationary model. Assumption: Y and X are trend-reverting, and their deviations from their respective trend lines are correlated. Model: first regress Y on X and the time index by fitting an ARIMA(0,0,0) model with Y as the input variable and X and Time as regressors. Then identify the numbers of AR and/or MA terms needed to explain the autocorrelation in the residuals, ending up with ARIMA(p,0,q) + 2 regressors.

10 Difference-stationary model. Assumption: Y and X are random walks, and their respective steps are correlated. Model: first regress DIFF(Y) on DIFF(X) by fitting an ARIMA(0,1,0) model with Y as the input variable and X as a single regressor. Then identify the numbers of AR and/or MA terms needed to explain the autocorrelation in the residuals, ending up with ARIMA(p,1,q) + 1 regressor. Both types of models can be fitted as ARIMA+regressors, as sketched below.
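
Both recipes can also be sketched in statsmodels. The housing-starts-style series below are simulated placeholders, not the lecture's data, and the orders (1,0,0) and (0,1,2) are just examples of the p and q terms one might end up with:

```python
# Sketch of the two recipes: trend-stationary = ARIMA(p,0,q) + X + Time;
# difference-stationary = ARIMA(p,1,q) + X, where d=1 differences Y and X
# together in the regression-with-ARIMA-errors formulation.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
n = 120
rate = 7.0 + np.cumsum(rng.normal(scale=0.1, size=n))     # placeholder "mortgage rate"
houses = 1200 - 60 * rate + rng.normal(scale=40, size=n)  # placeholder "housing starts"
time_index = np.arange(n)

trend_fit = SARIMAX(houses, exog=np.column_stack([rate, time_index]),
                    order=(1, 0, 0), trend='c').fit(disp=False)
diff_fit = SARIMAX(houses, exog=rate, order=(0, 1, 2)).fit(disp=False)
print(trend_fit.aic, diff_fit.aic)         # compare on a common criterion
```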

11 Example: housing starts vs. mortgage rates. [Time series plot of HousesSAAR and MortgageRate.] Housing starts (1000s SAAR) and mortgage rates (%) appear to be negatively related. Let's consider models for predicting housing starts from mortgage rates lagged by 1 month.

12 Prelude to a trend-stationary model: scatterplot of detrended variables. [Plot of HOUSESdetrend vs. lag1mortgagedetrend.] A trend-stationary model looks for a linear relationship between the detrended variables (r = −.74 here).

13 Here is the regression of HousesSAAR on lag(mortgagerate,1) and Time. (The time index serves to detrend both variables.) The regressors are highly significant, but the DW stat is only 0.33, and the lag-1 residual autocorrelation is 0.83! The residual-vs-time plot looks bad, as expected. The lagged mortgage rate coefficient of −75 suggests 75,000 fewer annual starts per 1-point rate increase in the long run.

14 Instead of adding lagged variables as regressors (our previous approach to autocorrelated errors), let's turn on the Cochrane-Orcutt transformation option, which fits an AR(1) model to the errors.

15 With an estimated AR(1) coefficient of 0.849, the residuals now look much better, and the standard error has been reduced by nearly 50% (down to 74). The plots of (studentized) residuals and predictions look MUCH better. Note that the mortgage rate coefficient is now around −53 (presumably more correct).

16 What happened to R-squared?! Q: Why is R-squared now only 10%, despite the dramatic reduction in the standard error? A: In this regression model the dependent variable is no longer considered to be HousesSAAR. Instead it is really HousesSAAR − 0.849·lag(HousesSAAR,1). This variable has a much lower variance than HousesSAAR, because much of the original variance is explained merely by the AR(1) transformation. Because there is now less variance to be explained by the regressors, a 10% R-squared in this model is actually better than 56% in the original model.

17 Now let's fit the same model in the Forecasting procedure as an ARIMA(1,0,0) model + 2 regressors. The estimated AR(1) coefficient is the same as in the Cochrane-Orcutt model, and the regression coefficients are also the same (apart from minor variations due to slightly different nonlinear estimation algorithms).

18 Residual ACF and PACF are good, but not perfect. We could consider adding more ARIMA terms, e.g., AR(2) or AR(3).

19 Going up to AR(3), we get a slightly lower RMSE (70 instead of 74) and a higher estimated effect of lagged mortgage rates (−63 instead of −53), but at the cost of additional model complexity.

20 Prelude to a difference-stationary model: scatterplot of differenced variables. [Plot of diff(HousesSAAR) vs. diff(lag(mortgagerate,1)).] A difference-stationary model looks for a linear relationship between the differenced variables (r = −.18 here).

21 The regression of the differenced variables is a (0,1,0) model with HousesSAAR as the input and lag(mortgagerate,1) as the single regressor. One order of differencing is applied to both the input variable and the regressor(s) when d=1. The time index has been dropped because it becomes irrelevant when a difference is used: the trend is now represented by the mean in the model.

22 The residual ACF has two negative spikes: an MA(2) signature.

23 After setting MA=2, we have a (0,1,2) model + 1 regressor. The estimated coefficient of lag(mortgagerate,1) is larger than in the trend-stationary model, but it plays a slightly different role in this model: the predicted change from the previous month's annual rate is −69,000 per 1-point rise in the mortgage rate from the previous month. The MA(2) term is significant, and the MA coefficients add up to less than 0.5. The mean is not significant, and the constant term should probably be removed anyway.

24 Residuals again look fine.

25 Here are comparisons of the various trend-stationary and difference-stationary models (recall that B = the Cochrane-Orcutt model). The regressors do not reduce the errors by much! (A random walk w/drift yields RMSE = 78.) Most of the work is done by the time series components of the models. On the basis of these results, it's hard to choose between the (fine-tuned) trend-stationary and difference-stationary models, although trend-stationary models tend to work well in practice.

26 Another example: convenience store sales in a university town. Variables: TOTSALES (total sales at 3 stores); USESSION (dummy for university in session); HOMEGAME (dummy for home football games). 326 daily values from 1/1/99 to 11/22/99. [Time series plot of TOTSALES.]

27 A regression model with the 2 dummy variables (only) yields highly nonstationary residuals. The series evidently underwent a step change and also has local autocorrelation patterns.

28 After adding a 1st difference to the ARIMA model, slight autocorrelation remains at lag 2. This is a random walk + regressors model.

29 After adding an MA(2) factor, the autocorrelation is eliminated and the error stats are slightly improved. This is essentially a simple exponential smoothing model plus regressors.

30 Conclusions. ARIMA models with regressors provide a flexible tool for fitting regression models to nonstationary time series data. The same order of differencing and the same AR transformations are automatically applied to all variables. AR and MA terms allow for fine-tuning to eliminate residual autocorrelation.

31 Forecasting new technologies and products. The problem: how can you forecast when you don't have much (or any) historical data for the variable of interest? The solution: base the forecast on some other type(s) of data.

32 Methods for new product forecasting. 1. Conduct marketing experiments: intentions to purchase based on product characteristics; experimental purchasing behavior; test markets.

33 Methods for new product forecasting. 2. Poll the experts: ask 10 (+/−) independent experts for their estimates of market size, market penetration, etc., and take the median.

34 Methods for new product forecasting. 3. Search for analogous data: try to obtain historical data for products with similar characteristics and/or customer bases; try to find out what assumptions and/or models have been used by other forecasters in similar applications.

35 Methods for new product forecasting. 4. Use a diffusion-of-innovation / life-cycle / growth-curve model. The models we have discussed so far have either assumed constant growth (zero, linear, or exponential) or else randomly varying growth. New products and/or technologies often follow a classic S-shaped growth curve characterized by linear or exponential early growth and subsequent market saturation.

36 Growth curve models. There are many different S-curve or growth-curve models, originally popularized in the 1960s: logistic curves, Gompertz curves, exponential curves, etc. The S-curve model in Statgraphics is a simple exponential formula: exp(a + b/t). The best-known growth curve model is the Bass diffusion model (Bass 1969).

37 Typical growth curve. [Plot of NewProductSales with fitted values and forecasts: an initial period of exponential growth (increasing trend), an inflection point at which the trend stops increasing, then slower growth as the market begins to saturate.] Some curves level off; others may eventually turn downward due to product obsolescence and/or loss of market share.

38 Bass model in terms of market fractions
f(t) / [1 − F(t)] = p + q·F(t)
where:
f(t) = additional fraction of total market potential that is captured at time t
F(t) = cumulative total fraction that has been captured up until time t
f(t)/[1 − F(t)] = fraction of the remaining market that is captured at time t
p = coefficient of innovation (external influence)
q = coefficient of imitation (internal influence)

39 Interpretation of the Bass model. p represents a mass-media effect, while q represents a word-of-mouth effect. At time t, mass media influences a fraction p of the remaining market to adopt, while word of mouth influences a fraction q·F(t) of the remaining market to adopt. The word-of-mouth effect grows in proportion to the fraction of potential customers who have already adopted; hence it implies exponential early growth.

40 Bass model in terms of unit sales
n_t = dN_t/dt = p(m − N_t) + q(N_t/m)(m − N_t)
where:
n_t = sales/adoption rate at time t
N_t = cumulative sales up to time t
m = market potential
m − N_t = number of potential customers who have not purchased yet
N_t/m = fraction of the potential market already captured
p = coefficient of innovation (external influence)
q = coefficient of imitation (internal influence)
(N_t/m = F(t) and n_t/m = f(t) on the previous slide.)

41 Bass model, continued. The inflection point in the growth curve occurs at time t* = (1/(p+q))·ln(q/p). As the market nears saturation, the rate of new adoption as a fraction of the remaining market, f(t)/[1 − F(t)], approaches p + q.

42 Fitting the Bass model to data
Implied growth curve formula (exact): N_t = m(1 − exp(−(p+q)t)) / (1 + (q/p)·exp(−(p+q)t))
Difference equation (approximate): n_{t+1} = N_{t+1} − N_t = pm + (q − p)N_t − (q/m)N_t²
The difference equation can be used to estimate p, q, and m by linear regression, although it is better to fit the exact equation by nonlinear least squares and/or Bayesian methods. The catch: to get reliable parameter estimates, it's best if you are already past the inflection point!
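
Not from the slides: a sketch of fitting the exact Bass formula by nonlinear least squares with scipy's curve_fit, on data simulated from made-up parameters (m = 100,000, p = 0.03, q = 0.38); the inflection-point formula from the previous slide is evaluated at the end.

```python
# Bass model fit by nonlinear least squares on the exact cumulative formula.
# All data and starting values are simulated placeholders.
import numpy as np
from scipy.optimize import curve_fit

def bass_cumulative(t, m, p, q):
    """Exact Bass formula: N(t) = m(1 - e^{-(p+q)t}) / (1 + (q/p)e^{-(p+q)t})."""
    e = np.exp(-(p + q) * t)
    return m * (1 - e) / (1 + (q / p) * e)

rng = np.random.default_rng(4)
t = np.arange(1, 21, dtype=float)
N = bass_cumulative(t, 100_000, 0.03, 0.38) + rng.normal(scale=500, size=t.size)

(m_hat, p_hat, q_hat), _ = curve_fit(bass_cumulative, t, N,
                                     p0=[50_000, 0.01, 0.1], maxfev=10_000)
t_star = np.log(q_hat / p_hat) / (p_hat + q_hat)   # inflection point t*
print(m_hat, p_hat, q_hat, t_star)
```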

43 Forecasting from the Bass model. Extrapolation of a curve fitted to a few data points is always dangerous! It is especially dangerous to try to forecast the inflection point and/or the market potential from very early growth data. In practice, it's best to try to estimate m, p, and q by independent methods (e.g., survey data analysis, expert opinion, analogous data). The parameters of choice are often m, p+q, and first-year sales, from which p and q can be backed out.

44 Representative values of p and q (Sultan et al. 1990). [Table of innovation coefficient p, imitation coefficient q, and inflection point for: electronic fuel injectors, hybrid corn, cellular telephones, DRAMs, record players, floppy drives, color TV, McDonalds fast food, steam irons, B&W TV, clothes dryers, air conditioners, water softeners, motels, electric blankets, and all-product averages; the numeric entries did not survive transcription.]

45 Cumulative adoption. [Plot of cumulative adoption, 0-100%, versus years for hybrid corn, cellular telephones, record players, color TV, McDonalds fast food, steam irons, B&W TV, clothes dryers, air conditioners, water softeners, motels, and electric blankets.]

46 Extensions of the Bass model. Repeat sales (additional parameters for the probability that an adopter will become a regular customer and for the rate of sales to a regular customer). Time-varying parameters (market size, innovation coefficient, etc.). Marketing-mix variables (p, q, and m could be functions of price, advertising, promotions, etc.).

47 References
"New Product Diffusion Models in Marketing: A Review and Directions for Research," by Mahajan, Muller, and Bass, Journal of Marketing, v. 54, no. 1, January 1990.
"A Meta-Analysis of Applications of Diffusion Models," by Sultan, Farley, and Lehmann, Journal of Marketing Research, v. 27, no. 1, February 1990.
"Reflections on 'A Meta-Analysis of Applications of Diffusion Models'," by Sultan, Farley, and Lehmann, Journal of Marketing Research, v. 33, no. 2, May 1996.
"New Product Forecasting," by Jeffrey Morrison (part 1 of 3).

48 Review of everything we've covered, or: what to use, and when.

49 DATA TRANSFORMATIONS
1. Deflation by CPI or another price index
Properties: Converts data from nominal dollars (or other currency) to constant dollars; usually helps to stabilize variance.
When to use: When data are measured in nominal dollars (or other currency) and you want to explicitly show the effect of inflation, i.e., uncover real growth.
Points to keep in mind: To generate a true forecast for the future in nominal terms, you will need to make an explicit forecast of the future value of the price index, i.e., you will need to forecast the inflation rate (but this is easy if you're in a period of steady inflation).

50 2. Natural logarithm
Properties: Converts multiplicative patterns to additive patterns and/or linearizes exponential growth; differences of logged data are approximate percentage differences; stabilizes the variance of data with compound growth, regardless of whether deflation is also used; when the dependent variable is logged, the model attempts to minimize squared percentage error rather than squared error in original units.
When to use: When compound growth is not due to inflation (e.g., when data are not measured in currency); when you do not need to separate inflation from real growth; when the data distribution is positive and highly skewed (e.g., exponential or log-normal); when variables are multiplicatively related; when you want to estimate elasticities.
Points to keep in mind: Logging is not the same as deflating: it linearizes growth but does not remove a general upward trend; if logged data still have a consistent upward trend, you should use a model that includes a trend factor (e.g., random walk with drift, ARIMA, linear exponential smoothing).

51 3. First difference
Properties: Converts levels to changes.
When to use: When you need to stationarize a series with a strong trend and/or random-walk behavior (often useful when fitting regression models to time series data, but not required).
Points to keep in mind: Differencing is an explicit option in ARIMA modeling and is implicitly a part of random walk and exponential smoothing models; therefore you would not manually difference the input variable (using the DIFF function) when specifying the model type as random walk, exponential smoothing, or ARIMA. The first difference of LOG(Y) is approximately the percentage change in Y for changes on the order of less than 10%.
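
A quick numeric check of that last point (a made-up four-value series, not from the lecture):

```python
# For small changes, the first difference of LOG(Y) approximates the
# fractional (percentage) change in Y.
import numpy as np

y = np.array([100.0, 103.0, 105.0, 101.0])
log_diff = np.diff(np.log(y))        # diff of logs
pct_change = np.diff(y) / y[:-1]     # exact fractional change
print(np.round(log_diff, 4))         # [ 0.0296  0.0192 -0.0388]
print(np.round(pct_change, 4))       # [ 0.03    0.0194 -0.0381]
```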

52 4. Seasonal difference
Properties: Converts levels to seasonal changes.
When to use: When you need to remove the gross features of seasonality from a strongly seasonal series without going to the trouble of estimating seasonal indices.
Points to keep in mind: Seasonal differencing is an explicit option in ARIMA modeling; you MUST include a seasonal difference (as a modeling option, not an SDIFF transformation of the input variable) if the seasonal pattern is consistent and you wish it to be maintained in long-term forecasts.

53 5. Seasonal adjustment
Properties: Removes a constant seasonal pattern from a series (either multiplicative or additive); fancier versions (Census X-12) allow time-varying indices.
When to use: When you wish to separate out the seasonal component of a series and then fit what's left with a nonseasonal model (regression, smoothing, or trend line); normally use the multiplicative version unless the data have been logged.
Points to keep in mind: Adds a lot of parameters to the model, one for each season of the year. (In Statgraphics, the seasonal indices are not explicitly shown in the output of the Forecasting procedure; you must separately run the Seasonal Decomposition procedure to display them.)

54 FORECASTING MODELS
1. Random walk
Properties: Predicts that next period equals this period (perhaps plus a constant); a.k.a. the ARIMA(0,1,0) model.
When to use: As a baseline against which to compare more elaborate models; when applied to logged data, it is a geometric random walk, the default model for stock market data.
Points to keep in mind: The plot of forecasts looks exactly like a plot of the data, except lagged by one period (and shifted slightly up or down if a drift term is included); long-term forecasts follow a straight line (horizontal if no growth term is included); confidence intervals for long-term forecasts widen according to a square-root law (sideways-parabola shape); logically equivalent to a MEAN model fitted to DIFF(Y).
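
A sketch (not in the slides) of the random-walk-with-drift forecast and its square-root-law interval widening, on a simulated series:

```python
# Random walk with drift: forecasts follow a straight line from the last
# value, and ~95% limits widen like sqrt(h), the sideways-parabola shape.
import numpy as np

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(loc=0.2, scale=1.0, size=100))  # simulated RW w/drift

steps = np.diff(y)
drift, sigma = steps.mean(), steps.std(ddof=1)
h = np.arange(1, 13)                      # forecast horizons 1..12
forecast = y[-1] + drift * h              # straight line from the last value
half_width = 2 * sigma * np.sqrt(h)       # square-root-law interval half-widths
print(forecast[-1], half_width[-1])
```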

55 2. Linear trend
Properties: Regression of Y on the time index.
When to use: Rarely the best model for forecasting; use only when you have few data points, lots of noise, and no obvious pattern in the data other than a trend. Can be used in conjunction with seasonal adjustment, but if you have enough data to seasonally adjust, you probably should use another model!
Points to keep in mind: Forecasts follow a straight line whose slope equals the average trend over the whole estimation period but whose intercept is anchored somewhere in the past; short-term forecasts therefore may miss badly, and confidence intervals for long-term forecasts are usually not reliable. Very sensitive to the amount of past data used to fit the trend line; other models that extrapolate a linear trend into the future (random walk with drift, linear exponential smoothing, ARIMA models with 1 difference w/constant or 2 differences w/o constant) often do a better job by re-anchoring the trend line on recent data.

56 3. Simple moving average
Properties: Simple (equally weighted) average of recent data; the average age of the data in the forecast (the amount by which forecasts lag behind turning points) is (k+1)/2 for a k-term moving average.
When to use: When data are in short supply and/or highly irregular.
Points to keep in mind: Primitive, but relatively robust against outliers and messy data; long-term forecasts are a horizontal line extrapolated from the most recent average. A long-term trend can be incorporated via fixed-rate deflation at an assumed interest rate. There is no theoretical basis for confidence limits for forecasts more than 1 period ahead; SG just shows constant-width limits based on an assumption that the data-generating process is stationary (no trend, etc.).

57 4. Simple exponential smoothing
Properties: Exponentially weighted moving average of recent data; the average age of the data in the forecast (the amount by which forecasts lag behind turning points) is 1/alpha; same as an ARIMA(0,1,1) model without constant.
When to use: When data are nonseasonal (or deseasonalized) and display a time-varying mean without a consistent trend; when many series must be forecast in parallel.
Points to keep in mind: Long-term forecasts are a horizontal line extrapolated from the most recent smoothed value; same as a random walk model without drift if alpha = 0.9999; forecasts get smoother and slower to respond to turning points as alpha approaches zero; confidence intervals widen less rapidly than in the random walk model. A long-term trend can be incorporated via fixed-rate deflation at an assumed interest rate or by fitting an ARIMA(0,1,1) model with constant.
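
As an aside, a minimal statsmodels sketch of simple exponential smoothing on a simulated series with a time-varying mean; alpha is estimated by optimization:

```python
# Simple exponential smoothing: long-term forecasts are a horizontal line
# extrapolated from the last smoothed value. Data are simulated placeholders.
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(6)
level = np.cumsum(rng.normal(scale=0.5, size=200))   # slowly wandering mean
y = level + rng.normal(size=200)

fit = SimpleExpSmoothing(y).fit()            # alpha chosen by optimization
print(fit.params['smoothing_level'])         # estimated alpha
print(fit.forecast(12))                      # 12 identical (horizontal) forecasts
```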

58 5. Linear exponential smoothing
Properties: Assumes a time-varying linear trend as well as a time-varying level (Brown's uses 1 parameter; Holt's uses separate smoothing parameters for level and trend); essentially an ARIMA(0,2,2) model without constant; damped-trend versions are also available.
When to use: When data are nonseasonal (or deseasonalized) and display time-varying local trends; when data are smoother in appearance, i.e., less noisy, than would be well fitted by simple exponential smoothing.
Points to keep in mind: Long-term forecasts follow a straight line whose slope is the estimated local trend at the end of the series; confidence intervals for long-term forecasts widen rapidly because the model assumes that the future is VERY uncertain due to time-varying trends; often does not outperform simple exponential smoothing, even for data with trends, because extrapolation of time-varying trends is risky.

59 6. Winters' seasonal smoothing
Properties: Assumes time-varying level, trend, and seasonal indices (either multiplicative or additive seasonality).
When to use: When data are trended and seasonal and you wish to decompose them into local level/trend/seasonal factors; normally use the multiplicative version unless the data are logged.
Points to keep in mind: Initialization of the seasonal indices and joint estimation of three smoothing parameters are sometimes tricky; watch to see that the parameter estimates converge and that the forecasts and confidence intervals look reasonable. A popular choice for automatic forecasting because it does a little of everything, but it has a lot of parameters and sometimes overfits the data or is unstable.
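
A statsmodels sketch of Winters-type smoothing (not from the lecture; the monthly series, the additive seasonality, and seasonal_periods = 12 are all assumptions for illustration):

```python
# Holt-Winters (Winters') smoothing with additive trend and seasonality on a
# simulated monthly series; use seasonal='mul' for the multiplicative version.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(7)
t = np.arange(120)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=2, size=120)

fit = ExponentialSmoothing(y, trend='add', seasonal='add',
                           seasonal_periods=12).fit()
print(fit.forecast(12))                  # one seasonal cycle of forecasts
```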

60 7. Multiple regression
Properties: A general linear forecasting equation involving several variables.
When to use: When data are correlated with other explanatory or causal variables (e.g., price, advertising, promotions, interest rates, indicators of general economic activity, etc.); when your objective is perhaps not only to forecast, but to measure the impact of various factors on the dependent variable for purposes of decision making or hypothesis testing (e.g., for determining optimal values of control variables or doing bang-for-the-buck comparisons).
The key is to choose the right variables and the right transformations of those variables to justify the assumption of a linear model. Useful transformations may include logging, deflating, lagging, differencing, seasonal adjustment, taking reciprocals or ratios, etc., but don't try them blindly: transformations should have an intuitive explanation and/or be strongly suggested by patterns in the data.

61 Regression, continued
Points to keep in mind: Forecasts cannot be extrapolated into the future unless and until values are available for the independent variables; for this reason the independent variables must often be lagged by one or more periods. But when only lagged variables are used, a regression model may fail to outperform a time series model that relies only on the history of the dependent variable.
R-squared is not the bottom line: regressions of nonstationary variables often have high R-squared, but this does not necessarily indicate a good model! The standard error of the regression (RMSE) is the best single stat to focus on, although it can only be trusted if the model passes the various diagnostic tests of its assumptions.
Beware of over-fitting the data by including too many regressors for the amount of data at hand. (Think: how many data points per coefficient?) As a reality check, it is good practice to validate the model by testing it on hold-out data and by comparing its performance to a random walk model or other time series model.

62 Regression, continued
When fitting regression models to time series data, it often helps to include lags of the dependent and independent variables as additional regressors, and/or to stationarize the dependent variable. (Suggestion: try 1 or 2 lags first; don't rush to difference the variables. Beware of overdifferencing or including high-order lags (>2) without good reason.)
Including a time index variable as a regressor is equivalent to detrending all the variables before fitting the model, i.e., it is a trend-stationary model. Corrections for autocorrelated errors (Cochrane-Orcutt or ARIMA+regressors) are other options for time series models.
Automatic model selection techniques such as stepwise regression and all-possible-regressions are available, but beware of overfitting: pre-screen the variables for relevance and rank models on the basis of an error measure that penalizes complexity (e.g., C_p or BIC). Also, remember that YOU, not the computer, are ultimately responsible for the model!

63 8. ARIMA
Properties: A general class of models that includes random walk, random trend, seasonal and non-seasonal exponential smoothing, and autoregressive models. Forecasts for the stationarized dependent variable are a linear function of lags of the dependent variable and/or lags of the errors.
When to use: When data are relatively plentiful (4 seasons or more) and can be satisfactorily stationarized by differencing and other mathematical transformations; when it is not necessary to explicitly separate out the seasonal component (if any) in the form of seasonal indices.

64 ARIMA, continued
Points to keep in mind: ARIMA models are designed to explain all autocorrelation in the original time series, and a systematic procedure exists for identifying the best ARIMA model for any given time series. Features of ARIMA and multiple regression models can be combined in a natural way. ARIMA models often provide a good fit to highly aggregated, well-behaved data; they may perform relatively less well on disaggregated, irregular, and/or sparse data.
Regressors can be added to ARIMA models. The resulting model is really a regression model with an ARIMA error process instead of independent errors; this is often useful as a proxy for the effects of other, unmodeled variables. The simplest ARIMA+regressor model is the ARIMA(1,0,0) model plus regressors, also known as regression with AR(1) errors. This model can also be fitted using the Cochrane-Orcutt transformation option in regression.


66 Reporting of forecasts
Forecasts usually should be accompanied by plots (showing how the forecasts extend from the recent data) and credible confidence intervals. Confidence intervals need not always be 95% (± 2 standard errors): sometimes a 50% interval (± 2/3 standard error) or an 80% interval (± 1.3 standard errors) will be more meaningful for decision making.
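
The interval multipliers quoted above can be checked against normal quantiles (a quick sketch, assuming approximately normal forecast errors):

```python
# Normal quantiles behind the quoted multipliers: 50% -> ~0.67 s.e.,
# 80% -> ~1.28 s.e., 95% -> ~1.96 ("about 2") s.e.
from scipy.stats import norm

for level in (0.50, 0.80, 0.95):
    z = norm.ppf(0.5 + level / 2)        # two-sided interval half-width in s.e.
    print(f"{level:.0%} interval: +/- {z:.2f} standard errors")
```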

67 A final word
Remember that all models are based on assumptions about the patterns in the data: how much past data is relevant, what is the nature of the trend or volatility, what are the key drivers, etc. What assumptions are you comfortable with, based on your knowledge of the data and the results of your analysis? What assumptions would make sense to your boss or client?