
Bo Sjo, 2011-11-10 (Updated)

ARIMA LAB
ECONOMIC TIME SERIES MODELING
FORECAST: Swedish Private Consumption, version 1.1

Send in a written report to bosjo@liu.se before Wednesday November 25, 2012.

1. Introduction

1.1 About the Lab and Data

This lab will introduce you to some basic time series techniques, focusing on how to build an ARIMA model for forecasting. The variable to be forecasted in this lab is quarterly private consumption in Sweden. The (raw) data is stored in BNP_CONS.xls.

Read in the data. Inform EViews that it is quarterly data starting at 1993:1 and ending 2012:4. You must do this when you read in the data. In EViews you need to add additional observations to make room for the forecasts, and seasonal variables to be used in the forecasts. To add observations to the sample, do this: Make sure the Workfile is highlighted and the active window (you might have to close windows that display data series). Click on Proc and then Structure/Resize Current Page. In the following window you can reset the ending date of the sample.

For the variable cons, form the log level and the first, second and third log differences.

1.2 Seasonality in EViews

You will also need to generate seasonal dummies for this lab. Click Genr on the Workfile menu and enter the equation

s2 = @seas(2)

This creates a seasonal dummy that has value 1 for the second quarter of each year. Create s3 and s4 using the same method. You can also create seasonal dummies in the command window; see also EViews Help under "Seasonal dummy variable".

Finally, run the first log difference against the seasonal dummies to create a deseasonalized series:

DLx = constant + d2*s2 + d3*s3 + d4*s4 + e

In EViews the residual (e) is saved as resid in the Workfile. You might want to rename that series, using the Genr window, if you intend to keep it.
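If you want to reproduce the data preparation outside EViews, the sketch below does roughly the same thing in Python with pandas and statsmodels. These tools, the dataframe df and the column name cons are assumptions of the illustration, not part of the lab; adjust the file layout to the actual spreadsheet.

    # Minimal sketch of the data preparation steps, assuming BNP_CONS.xls
    # has a column named "cons" with quarterly private consumption.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    raw = pd.read_excel("BNP_CONS.xls")              # sheet layout is an assumption
    idx = pd.period_range("1993Q1", "2012Q4", freq="Q")
    df = pd.DataFrame({"cons": raw["cons"].values[:len(idx)]}, index=idx)

    df["lcons"]   = np.log(df["cons"])               # log level
    df["dlcons"]  = df["lcons"].diff()               # first log difference
    df["d2lcons"] = df["dlcons"].diff()              # second log difference
    df["d3lcons"] = df["d2lcons"].diff()             # third log difference

    # Seasonal dummies: s2, s3, s4 equal 1 in quarters 2, 3 and 4 (like @seas in EViews)
    for q in (2, 3, 4):
        df[f"s{q}"] = (df.index.quarter == q).astype(int)

    # Deseasonalize the first log difference; the residual plays the role of resid in EViews
    fit = smf.ols("dlcons ~ s2 + s3 + s4", data=df).fit()
    df["dlcons_sa"] = fit.resid                      # deseasonalized series
    print(fit.params)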

If you want to see all seasonal effects for, say, the first log difference of a series (DLx), then calculate the mean of DLx (mean_dlx) and create a new mean-adjusted variable as DLx_m = DLx - mean_dlx. (In many ARIMA regression programs you are given the option to model the de-seasonalized, mean-adjusted series as an alternative to the non-mean-adjusted series.) Next run the regression

DLx_m = d1*s1 + d2*s2 + d3*s3 + d4*s4 + e

The coefficients d1, d2, ... represent the seasonal effects. They can be used to display the seasonal pattern, and can be added back to a predicted de-seasonalized series to make it seasonal again.

2. ARIMA Modeling

2.1 Intro and Identification

The model to be found is an ARIMA model of the form

(1 - L)^d x_t = a(L) x_{t-1} + b(L) ε_t

where (1 - L) is the difference operator, here with d = 1, and a(L) and b(L) represent the autoregressive (AR) and moving average (MA) lag structures, respectively. The term ε_t is a white noise stochastic process. (Fractional integration, and fractional differencing, means that d can take non-integer values. If -0.5 <= d < 0.5 the series is still an integrated series but long-run stationary, and if 0.5 <= d < 1.5 the series is integrated and non-stationary.)

The idea is to use the autocorrelation function (ACF) and the partial autocorrelation function (PACF) to identify the order of the process, and then to estimate it. In this lab you are asked to estimate a number of AR and MA models to learn how ARIMA modelling works in practice.

IMPORTANT: Since we want to do forecasts it is a good idea to extend the sample so that it includes the forecast horizon as well. Thus, you need to expand the sample with 8 observations at the end. These additional observations should be missing. Typically, you will need a log transformation, and first and second differences of the log. (In this case you should also take the third difference, for reasons explained later.)

Building an ARIMA model, or any dynamic econometric time series model, involves the following steps:

1) Identification
2) Estimation
3) Testing
4) Re-estimation

until a well-defined statistical model is found. (These steps represent the Box-Jenkins approach to time series modelling. You are supposed to explain how and why we use these steps in some detail.)

In the Identification step you should do the following. Study the graphs of the data in log level and in differences. First, determine whether the series is stationary or non-stationary: graph the series in level, in first differences and in second differences, and of course graph the ACFs and PACFs. Then find the approximate lengths of the ACFs and PACFs; these will help you to determine the number of lags used in the estimated models.
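As an illustration of the identification step, the following sketch continues the hypothetical Python setup above and graphs each series together with its ACF and PACF; the slow ACF decay described below is what signals an integrated series.

    # Sketch: plot the log level and its differences with their ACF and PACF.
    # Assumes the dataframe df built in the earlier sketch.
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    for name in ["lcons", "dlcons", "d2lcons"]:
        series = df[name].dropna()
        fig, axes = plt.subplots(1, 3, figsize=(12, 3))
        series.plot(ax=axes[0], title=name)                   # series over time
        plot_acf(series, lags=20, ax=axes[1])                 # slow decay suggests integration
        plot_pacf(series, lags=20, ax=axes[2], method="ywm")  # cut-off hints at the AR order
        fig.tight_layout()
    plt.show()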

You will find a huge amount of seasonality here! Remove the seasonality in the simplest way: run a regression of, for example, DLcons against the seasonal dummies and analyse the residual with the correlogram. Now you can see things more clearly.

Compare the series in level with the first, second and third differences. Determine when the series becomes stationary. Use the graph module to graph the ACF and the PACF. An integrated series has an ACF that starts around unity (or 0.99 to 0.80) and dies out slowly, while the PACF cuts off after the first lags. It is sometimes easy to see when you have over-differenced the data: over-differencing is spotted when the differenced series starts to display a moving average process. Differencing is a form of temporal aggregation of the data, and temporal aggregation typically gives rise to an MA process.

Second, look for structural breaks in the data; this can be important depending on the series. The effect of structural breaks, as well as of large outliers, is typically that the AR lag structure becomes relatively long. Adjusting the sample to avoid including (pre-)historical periods representing a different economic policy regime is always a good idea in a univariate model. Also look for outliers and possible shifts in the series that might need dummy variables. (There are not many breaks in Swedish private consumption data.)

2.2 Identification after deciding on differencing and seasonality

After finding d in ARIMA(p,d,q), the next step is to identify p and q, or at least get some idea about them. To our help we have the following scheme:

                 ACF                  PACF
AR(p)            Tails off            Cuts off at lag p
MA(q)            Cuts off at lag q    Tails off
ARMA(p, q)       Tails off            Tails off

If the series is integrated of order one, I(1), this is indicated by a very slow decay in the ACF and by a first partial autocorrelation coefficient close to unity. Seasonal effects typically show up as spikes at the seasonal frequencies. Of course, a white noise process has no significant ACF or PACF.
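If you want to convince yourself of the scheme, a small simulation sketch (again in Python, outside the lab's EViews workflow) generates an AR(1) and an MA(1) and prints their first ACF and PACF values: the AR ACF tails off while its PACF cuts off at lag 1, and vice versa for the MA.

    # Sketch illustrating the identification scheme on simulated data.
    import numpy as np
    from statsmodels.tsa.arima_process import ArmaProcess
    from statsmodels.tsa.stattools import acf, pacf

    np.random.seed(0)
    ar1 = ArmaProcess(ar=[1, -0.7], ma=[1]).generate_sample(nsample=500)  # AR(1), coef 0.7
    ma1 = ArmaProcess(ar=[1], ma=[1, 0.7]).generate_sample(nsample=500)   # MA(1), coef 0.7

    print("AR(1): ACF", np.round(acf(ar1, nlags=4), 2), "PACF", np.round(pacf(ar1, nlags=4), 2))
    print("MA(1): ACF", np.round(acf(ma1, nlags=4), 2), "PACF", np.round(pacf(ma1, nlags=4), 2))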

3. Identifying, Building, Estimating and Testing ARIMA Models

Start with the log of consumption in levels. Is the log of the series stationary? How many times do you need to difference it to achieve stationarity? This is first of all a matter of judgment (in a forthcoming lab you will learn to back up your judgment with statistical tests).

After deciding on the appropriate differencing, and having formed some initial conclusions about the appropriate process and the order of p and q, continue with Estimation. Our aim is to find the exact order of the AR and MA processes for a stationary variable. Thus, decide on the order of integration (differencing) and continue with ARMA.

We have three criteria for selecting the final model and lag order:

1) The estimated residual should display no significant autocorrelation (very important). Use the Box-Ljung (portmanteau) test, and look at the ACFs and PACFs of the residual, perhaps in combination with some other test of autocorrelation if one is available.
2) The final model should have the lowest possible residual variance among all models with no autocorrelation.
3) The model should be parsimonious and easy to understand. (We do not want long lag structures, or the inclusion of significant lags beyond what can be meaningful from an economic perspective.) Use your judgment.

Whether there is autocorrelation in the residual can be judged by inspecting the ACF and the PACF of the residual, in combination with the test above. The standard test for ARMA models is the portmanteau (Box-Ljung) test. The test builds on the assumption that, for a zero-mean white noise process, the estimated ACFs are asymptotically normally distributed. Take the square of each ACF and sum the squares; under the null of white noise this (suitably weighted) sum is a chi-square distributed random variable that should not be significantly different from zero. The test makes some additional adjustments for degrees of freedom and small sample corrections. (Look up the test in any textbook and confirm this!)

Start by estimating a number of AR(p) models, second estimate a number of MA(q) models, third estimate a number of ARMA(p, q) models, and finally pick the best model. Start with a high lag order, around 8. Estimate the model up to 2010:q4 and save the last two years of observations for your forecasts.

First, estimate AR models from AR(8) down to AR(0). For each estimated model write down the test for autocorrelation (Q-test) and the value of AIC; it is sufficient to look at AIC here. Pick the best AR model according to the three design criteria above. Second, estimate MA models from MA(8) down to MA(0) and pick the best MA model according to the three design criteria above. Third, try combinations of ARMA(p,q) models and pick the best ARMA model according to the three design criteria above. Finally, pick the best of the three models above.
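A rough counterpart of this search, continuing the hypothetical Python setup from the earlier sketches, is given below. The estimation sample and the 18-lag Q-test follow the text; the column names and the use of statsmodels are assumptions of the illustration.

    # Sketch of the AR(p) search: estimate AR(8) down to AR(0) on the first log
    # difference with seasonal dummies, recording AIC and the Ljung-Box Q-test.
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    est  = df.loc[:pd.Period("2010Q4", freq="Q")].dropna(subset=["dlcons"])
    exog = est[["s2", "s3", "s4"]]

    rows = []
    for p in range(8, -1, -1):
        res = ARIMA(est["dlcons"], exog=exog, order=(p, 0, 0)).fit()
        lb  = acorr_ljungbox(res.resid, lags=[18], model_df=p, return_df=True)
        rows.append({"p": p, "AIC": res.aic, "Q(18) prob": lb["lb_pvalue"].iloc[0]})

    print(pd.DataFrame(rows))
    # Repeat the loop with order=(0, 0, q) for MA(q) and order=(p, 0, q) for ARMA(p, q),
    # then pick the model with white-noise residuals and the lowest AIC.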

How to model ARMA models in EViews: In the Workfile window select the variable to be modelled. Choose Object and then Equation. In the Equation window an AR(2) with quarterly dummies for dx is set up as

Dx c ar(1) ar(2) s2 s3 s4

and an ARMA(1,1) as

Dx c ar(1) ma(1) s2 s3 s4

and so on. The portmanteau test (Q-test) is found in the Equation window output: choose View and then Residual Diagnostics, and select 18 lags for the test.

In the results window, confirm that the model has converged. This is important for MA models. The output shows the estimated parameters and their t-values. The t-values can guide you, but they are only indicative as long as we have not confirmed that there is no autocorrelation in the residual process. There are also other information criteria, reported by other modules of the program; look under Help, Akaike, for more info. The AIC criterion was developed for AR(p) models but works acceptably for other dynamic models as well.

Remember that you are looking for a model with (in order) i) white noise residuals (no autocorrelation in the residuals), ii) low residual variance (low information criteria), and iii) few parameters and a specification that is understandable and makes common sense (a parsimonious model). Thus, among all possible AR(p), MA(q) and ARIMA(p,d,q) models, select the models with white noise error terms. (In the context of ARIMA models, we identify white noise as no significant autocorrelation in the estimated residual process.) Among those with white noise errors, pick the model with the lowest information criteria.

Different people might find different models, given the same data set, depending on dummies, seasonal adjustment and other choices. The interesting question is: how do you motivate your final model? You can compare a model with seasonal dummies with a model where cons has been seasonally adjusted. Is your model consistent with the ACFs and PACFs you saw in the beginning? Go back to the graphs of the ACFs and PACFs: is the information in the graphs in line with the model you picked?
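For comparison, the two EViews specifications above correspond roughly to the following calls in the hypothetical Python setup used earlier, with the seasonal dummies entering as exogenous regressors. This is a sketch, not the lab's required workflow.

    # Rough counterparts of "Dx c ar(1) ar(2) s2 s3 s4" and "Dx c ar(1) ma(1) s2 s3 s4".
    # Assumes est and exog from the previous sketch.
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    ar2  = ARIMA(est["dlcons"], exog=exog, order=(2, 0, 0)).fit()   # AR(2) plus dummies
    arma = ARIMA(est["dlcons"], exog=exog, order=(1, 0, 1)).fit()   # ARMA(1,1) plus dummies

    print(arma.summary())                                              # parameters, t-values, AIC
    print(acorr_ljungbox(arma.resid, lags=[18], model_df=2, return_df=True))  # Q-test, 18 lags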

At this stage you should have found a final model and can turn to generating forecasts.

5. FORECASTING CONSUMPTION

Finally, produce forecasts up to 8 quarters ahead from your selected model. Try also to calculate the implied change in the level of consumption in SEK. The program will print the forecasts and their standard errors. EViews produces forecasts provided that the Workfile sample includes the dates for the forecasts and that the exogenous variables (such as the seasonals) have observations over the forecast horizon. One can produce forecasts with seasonal effects in them, or forecasts without seasonals; in the latter case you can add the seasonality afterwards.

You need to get confidence intervals. These are typically shown in the graph of the forecasts, but you need to calculate them numerically as well. The rule of thumb is to construct confidence intervals as the forecast +/- 2 x the standard error. You can do this easily for the forecasts you generate.

If you model DLx_t you get forecasts of DLx_{t+i}. To construct levels you have to recreate them according to Lx_{t+1} = Lx_t + DLx_{t+1}, Lx_{t+2} = Lx_{t+1} + DLx_{t+2}, and so on. From Lx_t it is possible to take the antilog to get X in its original form. (A sketch of these calculations is given below, after Section 6.)

6. LEARNING OBJECTIVES AND LAB INSTRUCTIONS

MOTIVATE YOUR FINAL CHOICE OF MODEL. SHOW AND COMMENT ON YOUR FORECASTS. PRESENT THE FORECASTS IN DIFFERENT WAYS, SO THAT THEY MAKE SENSE FOR THOSE INTERESTED IN RECEIVING THEM. GIVEN THAT YOU USE ARIMA, CONVINCE THE RECEIVER OF THE FORECASTS THAT THEY ARE AS ACCURATE AS THEY CAN BE. COMMENT ON THE WEAKNESSES OF THE MODEL.

In the end, time series modeling is about choices that you make and must defend. What is needed?

- A final ARIMA model.
- Motivations for choosing this model.
- Forecasts for the next year as described above.
- Confidence intervals for the forecasts.
- A graph comparing the seasonal and the seasonally adjusted series.
- A comparison with actual consumption growth during the last years.

Remember, this is a real life experiment. Time series modeling is about judgment. Experienced forecasters will always look at the estimates and consider whether they are reasonable or not. If not, a common tactic is to adjust the constant in the model, the point from which you start your forecasts.
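Returning to the forecast calculations in Section 5: the sketch below, continuing the hypothetical Python setup from the earlier sections, produces 8-quarter forecasts, rough +/- 2 standard error bands, and rebuilds the level of consumption from the forecast log differences. The object arma is a stand-in for whatever final model you actually chose.

    # Sketch of the forecasting step: 8 quarters ahead, +/- 2 SE bands, and levels.
    # Assumes df, est, exog and the fitted model arma from the earlier sketches.
    import numpy as np
    import pandas as pd

    best   = arma                                   # stand-in for your final model
    future = pd.period_range("2011Q1", periods=8, freq="Q")
    exog_f = pd.DataFrame({f"s{q}": (future.quarter == q).astype(int) for q in (2, 3, 4)},
                          index=future)

    fc     = best.get_forecast(steps=8, exog=exog_f)
    dl_hat = fc.predicted_mean                      # forecast first log differences
    se     = fc.se_mean
    bands  = pd.DataFrame({"lower": dl_hat - 2 * se, "upper": dl_hat + 2 * se})

    # Rebuild the log level: Lx_{t+1} = Lx_t + DLx_{t+1}, etc., then take the antilog.
    last_l   = df.loc[pd.Period("2010Q4", freq="Q"), "lcons"]
    l_hat    = last_l + dl_hat.cumsum()
    cons_hat = np.exp(l_hat)                        # forecast level of consumption
    print(pd.DataFrame({"DLcons": dl_hat, "cons level": cons_hat}).join(bands))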

There is no totally correct answer. Hence, the reader should be convinced after reading your report that your model is as good as it can get. And you should not show 100 different estimates of 100 different models, or 100 graphs; consider which basic figures a reader might want to look at. The point here is that you should be able to see the final model from the original ACF and PACF functions calculated from the first differences.

Do you think your forecast is bad? Don't worry. This is, after all, only a single-equation model. From here we can turn to transfer functions, rational distributed lag models, VAR models, Error Correction Models and Vector Error Correction Models (in order of improvement). Finally, all forecasters will in the end use their judgment and adjust the constant in the equation to get better forecasts, or at least adjust the forecast range.

7. Notes on using EViews

The manuals for EViews are stored in the same subdirectory as the program on the hard drive of the computer. Remember that EViews doesn't like capital letters.

For the first difference write, again in the Generate window, DLx = Lx - Lx(-1). Study the graphs of Lx and DLx (and maybe DDLx) to learn about non-stationarity and outliers. Do not use the EViews facility for calculating per cent differences; we want the first (natural) log difference because it is a close approximation to the per cent change and represents continuous compounding.

If you double-click on a series, you can, under View, find Descriptive Statistics and Tests and Correlogram. The correlogram gives you the estimated ACFs and PACFs in combination with the Box-Ljung test (Q) for autocorrelation. Use the correlogram to identify the possible ARIMA process. Notice that the correlogram can also be reached in other ways in the program, such as Quick and Series Statistics.

An AR model can be estimated in several ways. In the saved workfile go to Object and New Object. Define an Equation and give it a name. Formulate your model. For an AR(p) model write: dlx ar(1) ar(2) ... ar(p). The ar(.) terms tell EViews that it is an AR model you want; writing ma(1) gives the MA(1) model/coefficient. Start with longer AR models and see how the models, the information criteria, the white noise test and the significant lags change as the model is reduced. Continue with MA models in the same way, and finally try ARMA models.
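For readers working outside EViews, an EViews-style correlogram table (ACF, PACF and the Box-Ljung Q with its probability at each lag) can be put together as in the sketch below, again under the assumptions of the earlier Python setup.

    # Sketch of a correlogram table for the first log difference.
    # Assumes the dataframe df from the earlier sketches.
    import pandas as pd
    from statsmodels.tsa.stattools import acf, pacf
    from statsmodels.stats.diagnostic import acorr_ljungbox

    x    = df["dlcons"].dropna()
    lags = 18
    lb   = acorr_ljungbox(x, lags=lags, return_df=True)
    corrgram = pd.DataFrame({
        "ACF":  acf(x, nlags=lags)[1:],     # drop lag 0
        "PACF": pacf(x, nlags=lags)[1:],
        "Q":    lb["lb_stat"].values,
        "prob": lb["lb_pvalue"].values,
    }, index=range(1, lags + 1))
    print(corrgram.round(3))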

Notice that, in this case, (1) means the first lag, (2) means the second lag, and so on. As long as you don't close the window it remains open, and you can just click Estimate to change the equation. Start with a long lag structure and work down. For each estimated model, gather information about the information criteria (AIC will do) and the Q-stat. The Q-stat should not be significant at any lag beyond p. (Learn about the AIC and the Q-stat, and notice how they are formulated.)

Test whether there is autocorrelation. In the output window, under View, find Residual Diagnostics and the Q-test. Of course, there are no prob. values for the first p lags, since they are already eaten by the lags. The program suggests the number of lags to use, so use that. If you look around you can also find the roots (= latent = inverted) of the process to study the dynamics, in particular whether any roots are left close to the unit circle, indicating a unit root.

Using these criteria, find the best fitting AR model. Next, do the same for MA models; these are formulated as dlx ma(1) ma(2) etc. Again, find the best MA model using the same criteria. Next, try a combination of AR and MA terms to find the best ARMA representation. Finally, pick the best of the three final models.

Appendix - for other data: IMPROVING THE MODEL (?)

A1. Dummies

The graphs of the series, and most of all the residual in the ARIMA model, may indicate outliers. These models are sensitive to outliers. Graph the residual (under the test menu) and look for big outliers: save the residuals and look at the numbers and the graph.

How to search for outliers? First you might, of course, inspect the estimated residual in a graph. But you can also ask the program to list extreme residual values. The simplest way of doing this is to run an OLS regression of the variable of interest against a constant and inspect the residual. Usually the model and the tests will improve after including a few dummies. Sometimes it is possible to get white noise, normally distributed residuals after removing some extreme outliers. Be careful though: adding dummies might just hide the poor fit of the model instead of improving it. Investigate the residual from your estimated model through the ACF and PACF.
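A sketch of this outlier search in the hypothetical Python setup used earlier is given below; the 3-standard-deviation cut-off is an illustrative choice, not a rule from the lab.

    # Sketch: regress the series on a constant, list the largest standardized
    # residuals, and build impulse dummies for them. Assumes df from earlier.
    import numpy as np
    import statsmodels.api as sm

    y   = df["dlcons"].dropna()
    ols = sm.OLS(y, np.ones(len(y))).fit()          # constant-only regression
    std_resid = ols.resid / ols.resid.std()
    outliers  = std_resid[np.abs(std_resid) > 3]    # candidate outlier dates
    print(outliers)

    for date in outliers.index:                     # one impulse dummy per outlier
        df[f"dum_{date}"] = (df.index == date).astype(int)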

A.2 Perform Seasonal Adjustment - try to model deseasonalized data

First, use the ACF and the PACF to identify seasonal effects in the stationary series. For this you need to do a log transformation and take differences, that is, if you conclude that the series is non-stationary. Typically, seasonality is indicated through significant ACFs at the seasonal lags.

In the lab using Russian data you might skip adjusting for seasonal effects. It does not mean that they are not there, only that the sample is too short to identify them. Again, look at the graphs! And look at section A1 if you think dummies are needed to deal with outliers.

Seasonal effects can be dealt with through:

- Seasonal dummies in the regression (centered or non-centered, see the model formulation menu). Any textbook in regression analysis explains the use of seasonal dummies.
- Seasonal differencing.
- The X12 program.

Seasonal differencing is recommended by Box & Jenkins, but it might be too crude on the data: you might impose a seasonal unit root on the process, which might not be a valid transformation. (If other methods and models don't work well, it might be worth coming back to seasonal differencing. Though, if you are careful, you should test for seasonal unit roots before using seasonal differencing.)

Seasonal (impulse) dummies are a standard tool. Centered seasonal dummies are better since they leave the constant intact. Centered seasonal dummies with quarterly data include three dummy variables in the regression with values that sum to zero. Centered seasonals are often the better solution. (Observe that it is recommended that centered seasonals be used on a sample with complete years, or with exactly the same number of observations for each seasonal dummy.)

The X12 program is a black box of all sorts of transformations that remove seasonal effects. It is an ad hoc procedure, but it works o.k. and it is used by many official departments that publish seasonally adjusted data. If you don't like black boxes, then don't use it: the program will do things with your data that you cannot control. (In principle, X12 will not affect unit roots. Thus, it is possible, but not recommended, to apply the technique first and then test for unit roots on the deseasonalized data.) Please remember, though, that the X12 procedure will remove degrees of freedom from your sample even though the program returns the same number of observations. In effect you are estimating 12 seasonal variables in monthly data and 4 seasonal variables in quarterly data.
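The two non-X12 devices above can be illustrated in the hypothetical Python setup used earlier: centered quarterly dummies built as the ordinary dummy minus 1/4 (so they sum to zero over complete years), and a seasonal (fourth) difference of the log level. This is a sketch of the standard constructions, under the same assumptions as before.

    # Sketch: centered seasonal dummies and seasonal differencing. Assumes df from earlier.
    for q in (2, 3, 4):
        df[f"cs{q}"] = (df.index.quarter == q).astype(float) - 0.25   # 0.75 / -0.25 pattern

    df["d4lcons"] = df["lcons"].diff(4)            # seasonal difference: Lx_t - Lx_{t-4}
    print(df[["cs2", "cs3", "cs4"]].head(8).sum()) # sums to zero over two complete years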

A.3 More on X12 and Seasonal Dummies

X12 or seasonal dummies, or no seasonal effects? One might choose between using X12 and seasonal dummies in the model. Centered seasonal dummies should be the first choice; if they do not work well one might try X12. Centered seasonal dummies have the advantage that they sum to zero, so their inclusion in the model will not affect the estimated constant term. In time series modeling this can be important, since the constant reflects the average growth rate in the sample.

You can test whether the seasonal dummies are significant. Under the test menu choose the exclusion test and exclude the dummies. The outcome is an F-test of H0: the seasonal dummies have zero parameters.

The X12 program will produce a huge amount of output (if we ask for it), but we are only interested in a series the program calls D11. The latter is the seasonally adjusted series (make sure D11 is marked in the Graphic Analysis window). Next, the program shows a graph with the original series and the adjusted series, and you can judge whether there is a difference or not. Return to the X12arima window and save the series under Test, Store in Database. In the following window mark Seasonally Adjusted (D11); D11 is the name given to the series by the X12 procedure. The outcome of using X12 is not always predictable: sometimes you get a de-seasonalized series which is easy to model, leading to a nice parsimonious model, and sometimes it leads to complex models with long lag lengths.
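As a complement, the exclusion test for the seasonal dummies mentioned above can be reproduced as a joint F-test; the sketch below again uses the hypothetical Python setup from the earlier sections.

    # Sketch of the exclusion test: F-test of H0 that the seasonal dummies have
    # zero coefficients in the regression for the first log difference.
    # Assumes df from the earlier sketches.
    import statsmodels.formula.api as smf

    fit = smf.ols("dlcons ~ s2 + s3 + s4", data=df).fit()
    print(fit.f_test("s2 = 0, s3 = 0, s4 = 0"))    # small p-value: keep the dummies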