EURO-INDICATORS WORKING GROUP FLASH ESTIMATES 6 TH MEETING JULY 2002 EUROSTAT A6 DOC 105/02


EURO-INDICATORS WORKING GROUP
6TH MEETING JULY 2002
EUROSTAT A6 DOC 105/02
FLASH ESTIMATES
ITEM II-10 ON THE AGENDA OF THE MEETING OF THE WORKING GROUP ON EURO-INDICATORS

Summary of Employment Nowcasting: A report submitted to EUROSTAT by the National Institute of Economic and Social Research

1 Outline

The report describes the results of a study of how well forecasting models, assessed using a variety of econometric testing procedures, can produce short-term forecasts of EU employment data. The important innovation considered is to pre-test potential forecasting models econometrically before using them to forecast, as a way of identifying good forecasting models. Thus tests of significance, of functional form, for serial correlation, for ARCH effects, for residual normality and for structural stability are applied to potential models. These tests are used to reject forecasting models. The remaining models are then selected using an information criterion. We also include forecast direction-of-change tests and seasonal dummies, and investigate the power of intercept correction and of a priori restrictions on the regression equation coefficients. We reject models, for example, which do not have correctly signed coefficients on explanatory variables.

Short-term forecasts are of paramount importance for the early provision of data. It is an important goal for EUROSTAT to produce a variety of data in a consistent and timely manner which can be used effectively by both member states and policymaking bodies. This can be approached using the econometric models, statistical tests and simple forecasting procedures outlined here. In general we find that this approach is a clear improvement on simple extrapolative procedures.

The report considers employee employment data for an aggregate of seven European countries for six categories of employment. The countries are Germany, France, Italy (the largest countries of the Eurozone), Belgium, Spain, Austria and Finland. The report establishes the usefulness of the proposed methods on this very representative data.
The models are augmented by indicator variables that are available on a timely basis. The econometric models estimated are vector autoregressions with the lag length determined by an information criterion. In our assessment we split

the sample into two periods. The first period is used solely for estimation purposes. The second, or forecast period, provides data on which we assess the performance of the different models using recursive techniques. Apart from the matter of data revisions, this indicates how each model is likely to perform in real time. For each forecast model we store the information provided by the various tests, which enables us to post-process the models using a qualitative assessment based on those diagnostics. The models that survive the various tests are compared on the basis of root mean square forecast errors, statistics which provide a succinct summary of performance. For each class of model we show the best, median and worst performing models which survive the testing procedures, applied both singly and in combination, for three forecast horizons. For each employment category there are models with and without seasonal dummies and with and without two types of intercept correction. Thus there are six potential models for each employment series.

The remainder of this summary paper is as follows. In section 2 we discuss the modelling and testing framework. Section 3 describes the data and selected empirical results, to see if the testing procedures eliminate the worst models and retain the best. Without this the econometric tests could still require considerable intervention to be useful. We are able to determine a pool of models for consideration and choose the best on an a priori criterion to produce good forecasts. In section 4 we give the conclusions.

2 Models, tests and intercept adjustments

The overall approach is to look at a number of regression models, from which are selected `surviving models' that pass the various econometric tests. In the report we tabulate extensive results, with the Root Mean Square Forecast Error (RMSFE) evaluated recursively for all models that pass the relevant criteria, for one, two and three steps ahead.
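The recursive evaluation described above can be illustrated with a minimal sketch. It assumes a simple AR(1) fitted by ordinary least squares to a growth-rate series; the function names `fit_ar1`, `forecast_ar1` and `recursive_rmsfe` are hypothetical and are not taken from the report, which uses S-BIC lag selection and VARs rather than a fixed AR(1).

```python
import math


def fit_ar1(y):
    """OLS fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx if sxx > 0 else 0.0  # constant series: fall back to the mean
    return mz - phi * mx, phi


def forecast_ar1(c, phi, last, h):
    """Iterate the fitted equation h steps ahead from the last observation."""
    f = last
    for _ in range(h):
        f = c + phi * f
    return f


def recursive_rmsfe(y, n_eval, h=1):
    """Expanding-window RMSFE over the last n_eval observations.

    At each forecast origin the model is re-estimated on all data
    available up to that point, mimicking real-time use.
    """
    errors = []
    for origin in range(len(y) - n_eval, len(y) - h + 1):
        train = y[:origin]                    # data known at the origin
        c, phi = fit_ar1(train)               # recursive re-estimation
        f = forecast_ar1(c, phi, train[-1], h)
        errors.append(y[origin + h - 1] - f)  # h-step-ahead forecast error
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Illustrative data only: 36 quarterly growth rates, 10 held back for evaluation.
growth = [0.4, 0.1, 0.5, 0.2] * 9
for h in (1, 2, 3):
    print(h, round(recursive_rmsfe(growth, n_eval=10, h=h), 4))
```

With 36 observations and an evaluation period of 10, each horizon h yields 10 - h + 1 forecast errors; other conventions for aligning the horizons are possible.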
We also list them for three different ways of applying the criteria. These are over the shortest estimation period, the longest estimation period and over all estimation periods. This last one is implemented such that if a test is ever failed then the model is rejected, and is therefore very conservative. For full details of the tests see Pesaran and Pesaran (1991) or Greene (1997) or the references given. The categories of models combined with tests are:

• An autoregressive model with automatic lag length selection.
This model is used as a reference point against which to compare other more complex models. We assume that the change in the series rather than the level of the series is governed by an autoregressive process. We choose the relevant lag length using the Schwarz-Bayesian information criterion (S-BIC).

• All possible VARs evaluated without additional testing.
We look at all possible VARs which can be estimated using all the indicators.

This model jointly models a vector of time series and is a natural extension of the simple autoregressive forecasting model. We will use this as our main tool for modelling employment using indicator variables. It can be estimated by ordinary least squares and easily used for forecasting purposes. All possible combinations of the various indicators are considered in the VARs. As with the simple AR, the S-BIC is used to choose the lag length automatically. The forecasts are based both on the latest available data and on the latest estimates of the parameters. Thus all models are recursively re-estimated as the forecasts are made.¹

• Models where the R² for the employment equation exceeds 0.4, an arbitrarily chosen level consistent with a reasonably good fit for a model in growth rates.
Given that the number of regressors can potentially change and not all lags need to be significant, individual t-tests may be of little help. The squared correlation coefficient may be useful, and it might allow us to identify when there has been over-fitting.

• Models which pass an F test of overall significance.
Equally useful is the regression F-statistic, which tests the joint significance of the slope coefficients.

• Models which pass an LM test for first order serial correlation.
We implemented a general test for serial correlation, where serial correlation of up to order p can be tested using an LM procedure. This is similar to the RESET test, in that effectively the lagged values of the residuals are tested for joint significance in the regression.

• Models which pass the RESET test of functional form.
The RESET test of functional form adds powers of the regressors as potential other regressors, giving an easy-to-interpret test. We implement the modified LM test (Harvey 1990, Chapter 5), where powers of the fitted values are added as potential other regressors, which has the benefit of reducing the number of added regressors.

• Models which pass the Jarque-Bera normality test.
Normality of regression residuals is often difficult to determine. The Jarque-Bera test is really a test of the third and fourth moments. Spectacular failures are often caused by single outliers. However, this statistic is easy to calculate and appropriately distributed.

• Models which pass the Engle (1982) LM test for ARCH.
Engle (1982) suggested that volatility in a time series might be important for certain behaviour, violating the usual classical error assumptions. Resulting models are inefficiently estimated. The proposed LM test for these effects was implemented.

¹ This means that we have a sequence of possible tests to consider as well.

• Models which pass the Hansen (1992) test for a stable constant term, and

• Models which pass the Hansen (1992) test for parameter constancy.
Two methods of testing for gradual change, proposed by Brown, Durbin, and Evans (1975), are the cumulative sum of residuals (CUSUM) and of squared residuals (CUSUMSQ) tests. The CUSUM test is essentially a test of the constancy of the intercept and the CUSUMSQ a test for instability in the regression error variance. We adopt the approach of Hansen (1992), which generalises the approach to individual parameters and nests both of these by allowing tests of each coefficient and the error variance, separately and jointly.

• Models for which the estimated coefficients pass a priori restrictions.
These were chosen to be sign restrictions on coefficients. For example, a model with an estimated negative effect for a variable whose effect should theoretically be positive is rejected.

• Models which pass Pesaran and Timmermann's (1992) directional change test.
This is a non-parametric test of association between the forecast and the realised observations. Under the null hypothesis that the outturn and the forecast are distributed independently, this test has a standard normal distribution in large samples. We implemented this test independently of the other econometric tests because it differs from them significantly: it requires out-of-sample information to enable it to be used.

• Models which pass a variety of these tests jointly.

We have also amended the forecast values by `intercept correction', a common method used to adjust forecasts. The idea is that recent forecast residuals can be extremely informative about short-term forecast errors. Clements and Hendry (1999) argue that intercept correction can also help `robustify' a forecast model against structural breaks. We implemented intercept correction in two ways.
Firstly, we adjust the intercept by the value of the residual in the last period and secondly by the average value over the last four periods. The latter is perhaps a better measure of how much the equation has gone off-track on average, despite the recursive re-estimation of the model. Note that a well specified econometric model may exhibit no biases. We combine intercept correction with the econometric testing procedures, and with the coefficient restrictions imposed.

3 Empirical Results

3.1 Data

The focus of the applied work is to forecast aggregate Eurozone employment data. We forecast aggregates for seven individual countries for which there was

sufficient data, for a variety of categories. These countries are Germany, Italy, France, Belgium, Spain, Austria and Finland. The data are quarterly total employee employment from New Cronos, EUROSTAT, based on ESA95 National Accounts. For France we have the shortest span of quarterly data, from 1991Q1 to 1999Q4, so that we have 36 observations. This dictates the maximum length of time series available to us when we take into account the span of the indicator series we are using. When evaluating our out-of-sample forecasts we have a period of 10 observations. The maximum lag length we use for our VAR models is two, and the actual lag length is chosen on the basis of the S-BIC.

The series are aggregated to form seven combined employment series for the Eurozone countries considered. We have six disaggregate classifications. These are: agriculture, total industry, construction, wholesale, financial and public employees. Each is forecast as a growth rate, because pre-testing the series for stationarity indicated a unit root in several of them.

To build regression models we use a set of indicator variables. Four of the indicators used are Eurostat supplied ones for production expectation, production tendencies, new orders and order books. Where these are monthly indicators we have used the mid-quarter month. These series are combined, on the basis of OECD output ratios, to form a single indicator series. It should be noted that for some of the employment categories these are not particularly relevant indicators, and better ones should clearly be sought. It is also the case that the Eurostat supplied data for Austria are rather short for some of these indicators, and thus the Austrian data are not always included in the weighted average of the survey measures. The weights have then been adjusted accordingly. We have an additional indicator in the form of the short run interest rate.
Again, the series are combined, on the basis of OECD output ratios, to form a single interest rate series relevant to the aggregated Euro 7 employment series.

The coefficient restrictions we consider are therefore that the survey-based indicator variables should always have positive signs for the most recent lag, and that the interest rate should always have a negative sign. These are tested jointly across all the possible VARs. Note that models which have only a single indicator, for example, test only the coefficient on that indicator. The more variables are included, the more restrictions are tested, and as the number of restrictions increases so does the likelihood of a test being failed. More complex restrictions can be straightforwardly incorporated in our programs as desired by the investigator.

In the report we considered the use of German data (for the relevant employment category) as a potential regressor in each candidate model. This is done not on the basis of timeliness (as we have no information on this) but rather because we consider Germany to be a very influential economy in the Eurozone. There was little benefit from this indicator.
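As a rough illustration of the sign restrictions just described, the sketch below checks the estimated coefficients of one candidate model. The labels I1 to I5 follow the notes to Table 3; the function name, the dictionary layout and the decision to treat an exactly zero coefficient as a failure are assumptions made for this example only.

```python
# Expected signs: survey indicators positive, interest rate negative.
# I1 production expectation, I2 production tendencies, I3 new orders,
# I4 order books, I5 interest rate (labels as in the notes to Table 3).
EXPECTED_SIGNS = {"I1": +1, "I2": +1, "I3": +1, "I4": +1, "I5": -1}


def passes_sign_restrictions(coefs, expected=EXPECTED_SIGNS):
    """Return True if every included indicator has the expected sign.

    `coefs` maps an indicator label to its estimated coefficient
    (assumed here to be the coefficient on the most recent lag).
    Only indicators actually present in the model are tested, so a
    model with a single indicator faces a single restriction.
    """
    for name, value in coefs.items():
        sign = expected.get(name)
        if sign is None:
            continue              # no restriction on this regressor
        if value * sign <= 0:     # wrong sign (zero counted as a failure)
            return False
    return True


# Hypothetical coefficient estimates, for illustration only.
print(passes_sign_restrictions({"I2": 0.30, "I5": -0.10}))  # True
print(passes_sign_restrictions({"I2": 0.30, "I5": 0.05}))   # False
```

Because the restrictions are applied jointly, a model with more indicators has more ways to fail, which matches the remark above that the chance of rejection rises with the number of variables.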

3.2 Forecast Results

We estimate six distinct models: a benchmark model, a model intercept corrected using the last residual and a model intercept corrected using the average of the last four residuals, each of which is estimated with and without seasonal dummies. We can get an idea of the basic performance of these separate approaches by examining the univariate results. Table 1 concentrates on the RMSFE from the AR(p) models.

Table 1: RMSFE of AR(p), one step ahead, Total employment
(columns: No intercept, Intercept, Annual intercept; rows: No Seasonal, Seasonal)

As we move to a seasonal model with quarterly dummies and use an annually averaged intercept correction we find that the RMSFE actually falls. We find that the intercept correction which merely adds the last period's residual does not perform particularly well, and indeed worsens the forecast error. Even before we extend the models to include indicator variables, the overall contribution of the seasonals and the intercept adjustments can be seen as very important to producing good forecasts, although the intercept adjustments only make for a marginal improvement once seasonals are included.

3.3 VAR results for total employment

For the VAR results with the testing procedures we outline selected results here. We begin with total employment. In Table 2 we give the RMSFE associated with the lowest value of S-BIC for the surviving models in each category of test. If the same models survive, then the same final forecast model is picked out. In Table 3 we give two preferred model forms and the initial coefficient estimates for each one.² These are for the model associated with passing the first five econometric tests (with S-BIC of ) and for all tests and restrictions (with S-BIC of ). In terms of the procedure for choosing a final model we note that the test for serial correlation actually eliminates the best forecasting model, but then imposing coefficient restrictions improves the unadjusted forecasts marginally.
This is still not enough to beat the AR model. However, the model with the four-quarter intercept adjustment which passes all tests, including the coefficient restrictions, is the best model on show, although the restrictions on their own would pick the same model. Note that a lower S-BIC is available for a forecasting model which passes the tests but has a perversely signed, although small, coefficient. The interest rate is an important indicator, appearing in both selected models, as does the survey measure of production tendency.

² Remember the coefficients are updated recursively during the forecasting process. The reported results are only for the model with the lowest initial S-BIC.
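The selection step discussed in this section, rejecting models that fail the required tests and then choosing the survivor with the lowest S-BIC, can be sketched as follows. The record layout, the test labels and the numerical values are all hypothetical and chosen only to illustrate the mechanics.

```python
def select_model(candidates, required_tests):
    """Pick the surviving model with the lowest S-BIC.

    Each candidate is a dict with keys "name", "sbic" and "passed",
    where "passed" maps a test label to True or False. A model
    survives only if it passes every test in `required_tests`;
    a test with no recorded result is treated as failed.
    Returns None when no model survives.
    """
    survivors = [
        m for m in candidates
        if all(m["passed"].get(t, False) for t in required_tests)
    ]
    if not survivors:
        return None
    return min(survivors, key=lambda m: m["sbic"])


# Hypothetical candidates; the S-BIC values are invented for illustration.
candidates = [
    {"name": "VAR(E, I5)", "sbic": -9.1,
     "passed": {"sc": False, "reset": True, "restricts": True}},
    {"name": "VAR(E, I2, I5)", "sbic": -8.7,
     "passed": {"sc": True, "reset": True, "restricts": True}},
    {"name": "VAR(E, I3)", "sbic": -8.9,
     "passed": {"sc": True, "reset": True, "restricts": False}},
]

print(select_model(candidates, ["sc", "reset"])["name"])               # VAR(E, I3)
print(select_model(candidates, ["sc", "reset", "restricts"])["name"])  # VAR(E, I2, I5)
```

The example mirrors the pattern reported above: the model with the lowest S-BIC overall can be eliminated by a single test, and adding the sign restrictions can change which survivor is chosen.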

Table 2: RMSFE: Total employment, seasonals included
(columns: Test, No Int, 1 Quart., 4 Quart., S-BIC; rows: AR(p), followed by a Short sample block and a Long sample block, each with rows R² > 0.4, F sig, χ² sc, χ² reset, χ² norm, χ² arch, L i, L c, Restricts, Tests 1-5, All tests)

Notes: AR(p) are the results for the AR with lag length chosen by S-BIC. Subsequent results are for VARs surviving the following tests, with the model chosen by S-BIC: F sig, F test of significance; χ² sc, serial correlation; χ² reset, RESET test; χ² norm, normality test; χ² arch, LM test for ARCH; L i, Hansen's (1992) test for intercept constancy; L c, Hansen's (1992) test for parameter constancy; Restricts, sign restrictions imposed; Tests 1-5, F sig, χ² sc, χ² reset, χ² norm and χ² arch passed jointly; All tests, Tests 1-5 and L i, L c and Restricts passed jointly. Short sample: tests at start of evaluation period. Long sample: tests at end of evaluation period.

Table 3: Models: Total employment, seasonals included
(for each of the two chosen models: the S-BIC; coefficients on the first and second lags, X t-1 and X t-2, of E, I1, I2, I3, I4 and I5; and coefficients on Q1, Q2, Q3 and C)

Notes: S-BIC is the value for the chosen model, Q1-Q3 the quarterly dummies, C the constant, X t-i the coefficient on the i-th lag of variable X. Indicator variables are: I1 production expectation, I2 production tendencies, I3 new orders, I4 order books and I5 the interest rate. E is the dependent variable.

For aggregate employment we can see how the value of automatic intercept adjustments can be exploited. Without them the econometric forecasts are seldom as good as those of the AR models, and although the AR forecasts can also be improved by intercept adjustment, they do not improve by as much as those of the econometric model. Our preferred forecasting model, with the intercept correction, is very promising.

3.4 VAR results for sectoral employment

We briefly discuss the results obtained for the different employment categories, without complete details, to give a flavour of the generality of the results obtained above.

To begin with, the forecast results for agricultural employment are not good. This seems mainly a problem with timing rather than the direction of change. As with total employment, the main improvements in the forecasting performance are from the tests applied jointly, particularly the use of restrictions and the serial correlation test. In this case there is much less use for intercept correction, although it does little harm.

For total industry our econometric approach does considerably better than the AR model, both with and without intercept correction. Models which satisfy the econometric criteria or parameter restrictions individually are slightly worse than the AR models, but can just beat the AR forecast with intercept correction. However, jointly applying all the tests gives us the best model.
Indeed, the imposition of parameter restrictions gives a much more satisfactory model all round, with sensible autoregressive coefficients of a good size, and a role for the interest rate and the production tendency indicator variable. We might expect that the chosen set of indicators would be most useful for this category of employment, and so it proves.

Construction is the worst result for the econometric model chosen using

our rules. The structural stability tests give us the `least worst' outcome, but this is a marginal improvement over the `all tests passed' model. With annual intercept adjustments this is the best of the econometric models, but a long way from competing with a simple seasonal AR. Our procedures do robustify against the worst possible outcomes and can therefore be seen as an important check. The apparent extreme volatility of the series must have some impact on its forecastability, and perhaps the use of specific indicators is called for.

For wholesale employment, there is little variation in the results from the econometric models. However, they almost all beat the AR model. Both the four quarter and the one quarter intercept corrections are of marginal use. However, if we were to choose a forecast model based on the long sample, then the (mostly) unusual tests are of help. The chosen models exhibit a great deal of variation between the ones that do not satisfy the coefficient restrictions and those that do, with switches between the significant variables. However, the (correctly signed) interest rate is always retained.

A clear winner emerges in the forecasting models for the financial sector. Annual intercept adjustments and `all tests passed' yield the best forecasting model based on the S-BIC criterion quite easily. Again, there is a marked difference between models which have the correct signs and those that do not, but interest rates always figure. The best model makes use of the production tendency variable.

Finally, public employees is another category where the approach can do little more than guard against poor models. The more tests the models pass, the better the forecast performance, but again the AR dominates. There is little variation, but the restricted coefficients give us a dominant econometric model. However, this cannot beat the AR when the same intercept corrections are applied.
Indeed, there is some evidence that the one quarter correction is best, but this would be inadvisable in most circumstances.

4 Conclusions

The main results can be summarised as follows. The report discusses appropriate forecasting models for each category of employment considered. When we consider the use of econometric testing procedures, seasonal dummies and intercept correction for constructing forecasting models we find the following:

• We find consistent evidence that our multivariate models lead to improvements over the benchmark AR(p) approach, although this does not always stand up for the worst VAR forecast.

• We find that the seasonal dummies are important for all models and can quite often result in substantial reductions in RMSFE.

• On the aggregate results we find that an intercept correction based on a four period average dominates all other approaches, although this result was not always borne out for the disaggregate data. Nevertheless, the best choice would typically be the average of the last four residuals.

• Successfully passing the test for serial correlation did appear to improve forecast performance.

• The Pesaran-Timmermann direction-of-change tests only occasionally improve the three-step-ahead residuals. However, they fail to do this for the annual intercept correction.

• The a priori coefficient restrictions play a significant role in determining good models.

This latter conclusion indicates that all forecast results are conditional on the indicators used and effort should be expended on determining good indicators.

To decide on final forecasting models the report develops a selection strategy where surviving models are chosen on the basis of an information criterion. It is these which are used to forecast. On the basis of this strategy we can draw the following additional conclusions:

• Forecasters should use intercept adjustments, in particular four-quarter ones. These seldom do harm and can be extremely useful.

• It remains true that applying as many tests as possible seems to be a useful way of identifying a good model, but the absence of serial correlation and stability are most important.

• A priori coefficient restrictions almost always improve a model. Forecasters should therefore think carefully about the form these should take.

It is important to emphasise that these results are specific to the forecasting problem we have considered, and more diverse datasets and indicators are needed before we can safely conclude that this approach will always bear fruit. However, we consider these results to be extremely promising.
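The two intercept corrections compared in this summary, adding the last residual or adding the average of the last four residuals to the model forecast, can be sketched as below. The function name is hypothetical, and the residual is assumed to be defined as the realised value minus the model's fitted or forecast value.

```python
def intercept_correct(forecast, residuals, window=4):
    """Add the mean of the most recent `window` residuals to a forecast.

    window=1 gives the one-quarter correction (last residual only);
    window=4 gives the four-quarter (annual average) correction.
    Residuals are assumed to be realised minus fitted values, ordered
    from oldest to newest.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = residuals[-window:]
    if not recent:
        return forecast  # nothing to correct with
    return forecast + sum(recent) / len(recent)


# Hypothetical numbers, for illustration only.
model_forecast = 1.0
past_residuals = [0.1, 0.2, 0.3, 0.4]
print(intercept_correct(model_forecast, past_residuals, window=1))
print(intercept_correct(model_forecast, past_residuals, window=4))
```

Averaging over four quarters damps the effect of a single unusual residual, which is one way to read the finding that the one-quarter correction is the less reliable of the two.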

References

Brown, R., J. Durbin, and J. Evans (1975): "Techniques for Testing the Constancy of Regression Relationships Over Time," Journal of the Royal Statistical Society, Series B, 37.

Clements, M. P., and D. F. Hendry (1999): Forecasting Non-stationary Economic Time Series. The MIT Press, London, England.

Engle, R. F. (1982): "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, 50(4).

Greene, W. H. (1997): Econometric Analysis. McGraw Hill, third edn.

Hansen, B. E. (1992): "Testing for Parameter Instability in Linear Models," Journal of Policy Modeling, 14.

Harvey, A. C. (1990): The Econometric Analysis of Time Series. Philip Allan: London, second edn.

Pesaran, M., and B. Pesaran (1991): Microfit 3.0: An Interactive Econometric Software Package. Oxford University Press: Oxford.

Pesaran, M., and A. Timmermann (1992): "A Simple Non-Parametric Test of Predictive Performance," Journal of Business and Economic Statistics, 10.