Methods and estimation techniques of euro area GDP flash at T+30 days: preliminary reflections


(preliminary and incomplete)

Filippo Moauro
ISTAT - Direzione centrale di Contabilità Nazionale, Via A. Depretis 74/B, Roma, Italy
moauro@istat.it, Tel.

EUROSTAT task force GDP Flash at T+30 days - Working group Methods and estimation techniques
Contribution to the second meeting, Lisbon, 9 December 2013

Abstract. The document presents preliminary reflections on methods and estimation techniques for a euro area GDP flash at T+30 days. The main challenges of the exercise are presented, together with two case studies and a proposal of possible modelling strategies. Forecasting problems are discussed, as well as some compilation issues and chain linking. Some concluding remarks close the document.

1. Introduction

Quarterly GDP data are probably the most relevant economic short-term statistics produced by the European Statistical System. Their release is coordinated by Eurostat and, from September 2014, when the new ESA 2010 data transmission programme enters into force, a calendar of releases at 45, 60 and 90 days after the end of the reference quarter will become effective. Recent discussions at national and international level involving statistical institutes, users and practitioners have revealed a clear need to anticipate the GDP flash from a delay of 45 days to one of 30 days. The reasons are several: timely information is requested by policy makers and stakeholders; the need for an efficient early-warning system has been exacerbated by the economic crisis; moreover, a GDP flash at T+30 would align the European calendar with that of the US, where the first GDP estimate is released about 30 days after the end of the reference quarter.

Eurostat's purpose is to compile the euro area and European Union GDP flash following the so-called direct method, i.e. using member states' estimates and imputing missing countries in a second step. Since the final target is euro area and European Union GDP, their compilation requires data at member state level. The focus is on seasonally adjusted quarterly growth rates of volume measures. For their compilation, a revised version of the handbook on quarterly national accounts has recently been released by Eurostat (2010).

A GDP flash at T+30 days poses a problem of missing data for the last observation, as well as the need for modelling solutions that efficiently combine the scarce available information with forecasts. The aim of this first paper is to touch on the issues around methods and estimation techniques for an open discussion at the next meeting of the task force, with particular emphasis on: a description of the main challenges of the exercise (section 2); a first classification of possible situations of missing data

together with a real example, in a first case study (section 3); proposals of modelling strategy (section 4); forecasting with a second example (section 5); compilation issues and chain linking (section 6); and some preliminary concluding remarks (section 7).

2. Main challenges

The main challenges of the T+30 estimation exercise for GDP concern the availability of short-term indicators, in particular the lack of:

(i) the last one or two months of quarter T for short-term business statistics and other relevant monthly indicators, such as those for industrial activities and external trade;
(ii) the last quarter for market service activities, which represent the most relevant part of the European Union economies. In this respect the current European Regulations covering most production and turnover short-term statistics are a limiting factor since, according to them, NSIs are required to transmit these indicators between T+40 and T+60 days;
(iii) the last quarter of non-market figures, based on information coming from the Treasury;
(iv) the last quarter of data from the household budget survey, when used to produce the GDP estimate from the demand side or, indirectly, some components from the supply side.

Forecasting techniques are therefore required, and their choice is itself an issue. In the following sections I will try to present the problem for a given component of GDP, with a selection of forecasting methods in use among statistical offices.

Let y_t, t = 1, ..., T, be a quarterly time series of a GDP component, and let us track the typical situations a statistical agency faces when engaged in estimating it within 30 days. At least two dimensions of the problem can be taken into account: the method used in the current production of quarterly accounts, namely direct or indirect; and the availability of information in the last quarter T, which may be full, partial or absent. Six situations can then be identified:

1) y_t is directly estimated until quarter T-1 and a related series x_t measuring the same phenomenon is available until quarter T, which allows an alternative indirect estimate. Very rare, but it can occur for external trade components and some energy products. Here y_t is directly used to compile quarterly accounts and a related series is also available: at T+30 days y_T is unobserved whereas x_T is available, which allows an alternative model-based estimate;
2) y_t is indirectly estimated until quarter T-1 using an extrapolation method based on a related series x_t, which is only partially available. This occurs, for example, when x_t is used through a quarterly disaggregation method and the information from this indicator covers only one or two months of the last quarter T. It is the typical case of value added for a manufacturing industry: here annual data are regressed on the related quarterly production index. At T+30 days y_T is unobserved, x_T is only partially observed and has to be extrapolated before it can be used in the usual regression for the value added estimation;
3) y_t is directly estimated and x_t is only partially available, mirroring the situation of point 2);
4) y_t is indirectly estimated and x_t is available over the whole sample, including time T. This is a straightforward situation where y_T can be computed in the ordinary way;
5) y_t is indirectly estimated and no related series is available. This could occur for service components of GDP;
6) y_t is directly estimated and no related series is available. The straightforward situation in which y_T itself is available under a direct method of estimation should also be considered.
Many other situations can be discussed, for instance when a related series is available but it measures a phenomenon merely correlated with y_t. I refer especially to employment data or price statistics.

For example, according to the current regulation, both producer and consumer price indexes are available at T+30 days and their usefulness could be investigated; the same holds for employment, when released in time.

3. A first case study

An example based on Italian seasonally adjusted data is provided in Figure 1, which fits situations 2) and 3) well. Here the Italian industrial value added is shown together with the industrial production index (IPI) over the sample from 1990q1 to 2013q3. Data are presented in levels (upper-left panel), log-differences (upper-right panel) and log seasonal differences (lower-left panel), and together with the confidence indicator in levels over the monthly span January 2001 to October 2013 (lower-right panel). A clear correlation between value added and the IPI emerges, apart from the presence of a couple of level shifts in the series. Considering that Italy compiles quarterly value added by disaggregating annual data with the quarterly IPI through a regression, level shifts find a natural solution in the model specification. More problematic is the extrapolation of the last quarters, when the two series drift apart.

Figure 1: Italian industrial value added, production index and confidence indicator: sample 1990q1-2013q3

Move now to Table 1 and consider the estimation of industrial value added for 2013q3 at the end of October: the available information for 2013q3 consists of IPI data for July and August, and of the confidence indicator up to September. Then, assuming the information provided by this latter indicator to be reliable, we can follow a two-step approach: (i) model the IPI and the confidence indicator together to obtain an estimate of the IPI for September, and (ii) run the usual regression of annual value added on the quarterly IPI to produce an estimate of value added for 2013q3. This strategy is the natural extension of the current estimation of quarterly accounts for member states adopting indirect methods (see Bloem et al., 2001) and temporal disaggregation methods.
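To fix ideas, here is a minimal sketch of step (i), assuming monthly arrays in which the confidence indicator extends one month beyond the IPI; the simple OLS bridge on changes and all names are illustrative assumptions, not the regression actually used in production.

```python
import numpy as np
import statsmodels.api as sm

def extrapolate_ipi(ipi, conf):
    """Step (i): estimate the missing last month of the IPI from the
    confidence indicator, here via a simple OLS bridge on changes.
    `ipi` holds the observed monthly IPI levels; `conf` holds the
    confidence indicator and extends one month further than `ipi`."""
    d_ipi = np.diff(np.log(ipi))           # observed IPI log-changes
    d_conf = np.diff(conf[:len(ipi)])      # aligned confidence changes
    fit = sm.OLS(d_ipi, sm.add_constant(d_conf)).fit()
    d_last = conf[len(ipi)] - conf[len(ipi) - 1]
    d_hat = fit.predict(np.array([[1.0, d_last]]))[0]
    return float(ipi[-1] * np.exp(d_hat))  # extrapolated IPI level

# Step (ii) then aggregates the completed monthly IPI to quarterly frequency
# and plugs it into the usual regression-based disaggregation of annual
# value added (not shown).
```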

An alternative is to work on the last available release of quarterly data and to approach the problem as a forecasting exercise. A solution could then be to identify similar regressions of value added on the IPI at the same quarterly frequency, with the advantage of opening the choice to a wider spectrum of specifications: modelling the data in levels, or in the growth rates of the original data if these produce more accurate forecasts. Moreover, this alternative is the more suitable one in case quarterly accounts are produced according to a direct approach. Further methods move towards single-frequency or mixed-frequency multivariate setups.

Table 1: Italian industrial value added, production index and confidence indicator, quarterly and monthly data in 2012 and 2013 (columns: date, then levels and growth rates of industrial value added, of industrial production and of the confidence indicator; quarterly rows from 2012q1 to 2013q2, monthly rows for July, August and September 2013)

4. Modelling strategy

The time series literature has proposed several modelling strategies for the problem of flash estimation. The aim here is to limit the attention to popular methods that have found wide application among statistical agencies and practitioners. A first major distinction is between models able to handle explanatory variables and pure time series models, where forecasts are produced using only current and past observations of the series under analysis. As seen in the example above, a flash estimate is most of the time a composite forecasting exercise where both the target and the explanatory variables need to be forecast.

A first method is then given by the so-called bridge modelling approach. It consists of the following 3-step procedure, sketched in code below: (1) the indicator is predicted by means of a univariate technique, or using the information coming from an alternative indicator; (2) it is then aggregated to the same time span as the target variable; (3) the target variable is regressed on the aggregated indicator and the estimated coefficients are used to obtain a forecast of the target. Current practice in the last step of a bridge model is to adopt dynamic linear regressions, in which the target or dependent variable is explained by current and past values of the related series and by past values of the dependent variable itself. Data can be modelled in their original levels or subject to a transformation such as logarithms, differences or log-differences. This approach was popularized by Hendry and Mizon (1978) under the name of autoregressive distributed lag (ADL) models. A methodology for selecting ADL models according to a general-to-specific approach has also become common among practitioners, for its simplicity and for the diffusion of computer routines for automatic model selection. Furthermore, ADL models find an extension to temporal disaggregation thanks to the work by Proietti (2006).
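As an illustration of the three steps, a stylized sketch follows; the AR lag order, the three-month averaging and the alignment assumptions (one missing month, a monthly sample spanning exactly one quarter more than the quarterly target) are mine, not prescriptions from the text.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg

def bridge_forecast(indicator_m, target_q, missing_months=1, ar_lags=3):
    """Bridge model in three steps: (1) extend the monthly indicator with
    a univariate AR forecast, (2) aggregate it to quarterly averages,
    (3) regress the quarterly target on the indicator and read off the
    forecast of the last quarter. Assumes the completed monthly sample
    covers exactly len(target_q) + 1 quarters."""
    ar_fit = AutoReg(indicator_m, lags=ar_lags).fit()            # step (1)
    extended = np.concatenate([indicator_m,
                               ar_fit.forecast(missing_months)])
    ind_q = extended.reshape(-1, 3).mean(axis=1)                 # step (2)
    fit = sm.OLS(target_q, sm.add_constant(ind_q[:-1])).fit()    # step (3)
    return float(fit.predict(np.array([[1.0, ind_q[-1]]]))[0])
```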

The classical approach of pure time series models revolves around the class of autoregressive moving average (ARMA) models for stationary time series and the wider class of autoregressive integrated moving average (ARIMA) models for non-stationary time series, where stationarity is gained by differencing the original data. Model selection is based on the Box and Jenkins (1976) methodology. An alternative is given by structural time series (STS) models, where the series is expressed in terms of components of interest like trend, cycle, seasonal and irregular terms; see Harvey (1989) for a full treatment of STS models. ARIMA models have the advantage of parsimony in the representation of time series dynamics; by contrast, STS models are designed according to the analyst's desire to see given components in the form describing the data evolution. Modern techniques make use of the Kalman filter for the statistical treatment of both ARIMA and STS models, and several software routines are available for their estimation.

Both ARIMA and STS models are univariate, and their possible extensions move towards multivariate settings, in which both the target and the explanatory variables are treated as a cross-section of time series. Following Harvey (1989, p. 429), under a multivariate setup it is assumed that the different series are not subject to any cause-and-effect relationship between them; however, they are subject to the same overall environment, such as the prevailing business climate, and so a multivariate model will seek to link them together. Therefore, the main advantage of multivariate models is that they overcome the assumption of exogeneity of the explanatory variables implicit in regression methods. Furthermore, such models often provide more useful information on the dynamic properties of the series and produce more accurate forecasts. By contrast, their statistical treatment is more complicated, since the number of unknown parameters to be estimated increases rapidly with the number of series treated together, and their identifiability might become problematic.

The multivariate extension of ARIMA models is given by vector-ARIMA models and, for its simplicity of statistical treatment, by the restricted class of vector autoregressive (VAR) models; a minimal VAR sketch is given below. In parallel, the extension of univariate STS models to multivariate settings leads to seemingly unrelated time series equation (SUTSE) models. In this latter context dynamic factor analysis should also be mentioned: under this approach, given a representation of a cross-section of time series into components (trend, cycle, ...), there might exist a specification in which certain components are in common. In other words, a reduced number of these components is informative for the entire set, simplifying the model specification. When common factors are found in the trend, the model is co-integrated.
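For concreteness, here is a minimal VAR sketch under the assumption of stationary (e.g. differenced) data; the lag selection by AIC is just one possible choice.

```python
import numpy as np
from statsmodels.tsa.api import VAR

def var_one_step(y, maxlags=4):
    """One-step-ahead forecast from a VAR fitted by least squares.
    `y` is a T x k array of stationary series, e.g. log-differences of a
    GDP component and of its related indicators, modelled jointly."""
    fit = VAR(y).fit(maxlags=maxlags, ic="aic")   # lag order chosen by AIC
    return fit.forecast(y[-fit.k_ar:], steps=1)   # returns a 1 x k array
```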
Very interesting for the problem of flash estimation is the extension of multivariate settings to models handling mixed-frequency data. As discussed in the previous sections, in a real-world problem data are often available at different time spans (in the example above data were monthly and quarterly), and casting them together in a unique setting can simplify their statistical treatment and forecasting. In particular, adopting multivariate models with mixed-frequency data, the multi-step procedure of bridge models is overcome and the flash estimate finds a solution in one step: the missing data at the end of the sample (the so-called ragged-edge problem) are estimated together, in one step, once the unknown model parameters have been obtained; a sketch based on a mixed-frequency factor model is given below. For recent contributions see, amongst others, the works by Kuzin et al. (2009), Clements and Galvao (2008) and Banbura et al. (2013). Temporal disaggregation has also been treated by the literature in the context of mixed-frequency multivariate time series models: see the contributions by Frale et al. (2010, 2011) for the construction of a euro area monthly indicator of economic activity based on factor models, and by Moauro and Savio (2005) and Moauro (2013) for applications to employment based on SUTSE models. In this case estimates of quarterly data are a by-product, obtained by temporally aggregating the monthly estimates.
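One way to sketch this in code is through the mixed-frequency dynamic factor model available in recent versions of statsmodels (DynamicFactorMQ, version 0.12 or later); this is offered purely as an illustration of the one-step ragged-edge treatment, not as the specific model of the papers cited above, and the single-factor specification is an assumption.

```python
import statsmodels.api as sm

def nowcast_mixed_frequency(monthly, quarterly):
    """Cast monthly and quarterly series in a single dynamic factor model:
    the Kalman smoother handles the NaNs at the ragged edge, so the flash
    estimate is obtained in one step once the parameters are estimated.
    `monthly` and `quarterly` are pandas DataFrames with period indexes;
    the quarterly frame may contain the target GDP component."""
    model = sm.tsa.DynamicFactorMQ(monthly, endog_quarterly=quarterly,
                                   factors=1, factor_orders=1)
    results = model.fit()          # EM estimation by default
    return results.predict()       # smoothed values, ragged edge filled
```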

This list of methods is of course not exhaustive and several other modelling strategies are possible; among others, the nonlinear class of state-dependent models (SDM), or switching regime models (SRM), introduced by Priestley (1980). Here the dynamic paths are governed by a set of autoregressive parameters, a set of moving average parameters and a local intercept, each of them dependent on past information. For a general outline of nonlinear models the references are the classics Priestley (1988), Tong (1990) and Granger and Teräsvirta (1993).

5. Forecasting

As discussed in the previous section, a model-based forecasting exercise involves two main problems: the selection of a model class and its specification, also in terms of the explanatory variables to be included. The goal is to produce estimates with good statistical properties, like low ex-ante forecast errors (e.g. measured by the mean square error associated with the forecast) or low ex-post forecast errors, i.e. errors with respect to subsequent estimates. For statistical agencies involved in flash estimates, keeping control of the latter measure is the most relevant aspect of the exercise, as it is immediately linked to the revisions of subsequent releases.

The main challenge of any forecasting exercise is the stability of the selected model over time: a model selected at a certain date, especially if it involves the use of an explanatory variable, could show stable estimated parameters in subsequent repetitions, without any guarantee that it continues to be valid after some time. The consequence is that, without continuous maintenance of a model settled at the start of the exercise, there is the risk of producing inaccurate forecasts after some time, whose quality is lower than those produced by pure time series models.

All model settings (ARIMA models, STS models, ...) have their own model selection procedures to help the analyst choose the best specification. Good practice is also to run a rolling forecasting exercise (inside the sample) in order to assess the stability of the estimated model parameters and to compute a set of forecasts over the required horizon. Afterwards, some summary error statistics (like mean absolute or mean square errors, MAE and MSE respectively) are computed to rank both model settings and their specifications; a sketch of such a rolling exercise is given after equation (1) below. In the presence of a detailed release database a real-time exercise could also be implemented.

Let us now present a forecasting exercise in practice. For continuity with the example of Figure 1, I refer to the need of forecasting the Italian value added for services. Figure 2 presents these data together with the industrial value added in levels, log-differences and seasonal log-differences. The differences in their evolution are evident: in particular, the wider cyclical variability of the industrial data with respect to services emerges over the whole sample, as well as a more marked level shift at end 2008 in the former sector. As an example I fitted to the service value added an ADL(1,1) model with the industrial data as explanatory indicator. Data are modelled in their logarithms.
An ADL(1,1) model is such that:

y_t = μ + φ y_{t-1} + β_0 x_t + β_1 x_{t-1} + ε_t,   ε_t ~ NID(0, σ²);   (1)

in other words, service value added y_t is a linear function of a constant μ, of its own one-period lagged value y_{t-1}, with regression coefficient φ, and of the industrial data at lag 0 (x_t) and at lag 1 (x_{t-1}), with coefficients β_0 and β_1 respectively; finally, ε_t is a white noise with zero mean and variance σ². The unknown parameters of model (1) are μ, φ, β_0, β_1 and σ², whose estimates can be obtained for instance through ordinary least squares. The results of these estimates are shown in Table 2.
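A minimal sketch of how (1) can be estimated by OLS and evaluated through the rolling one-step exercise suggested in section 5; the expanding-window start and the assumption that x is already available (or previously extrapolated) at the forecast date are illustrative choices of mine.

```python
import numpy as np
import statsmodels.api as sm

def adl11_design(y, x):
    """Regressors of equation (1): constant, y_{t-1}, x_t, x_{t-1};
    `y` and `x` are the two series in logs."""
    Y = y[1:]
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:], x[:-1]])
    return Y, X

def rolling_one_step_evaluation(y, x, first=40):
    """Expanding-window one-step-ahead forecast errors, summarized by the
    MAE and MSE of section 5. Assumes x is observed one quarter beyond
    each estimation window."""
    errors = []
    for t in range(first, len(y) - 1):
        Y, X = adl11_design(y[:t + 1], x[:t + 1])
        beta = sm.OLS(Y, X).fit().params
        x_row = np.array([1.0, y[t], x[t + 1], x[t]])  # regressors at t+1
        errors.append(y[t + 1] - x_row @ beta)
    e = np.array(errors)
    return np.abs(e).mean(), (e ** 2).mean()           # MAE, MSE
```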

Figure 2: Italian value added for service and industrial activities: sample 1990q1-2013q3

The fit of the model is very good, considering that the R² statistic is equal to 0.99 and all the coefficients are significant apart from the constant term μ; moreover, the combination of their values and signs leads to the conclusion that the dynamic service-industry relation of value added is close to a static regression on the growth rates of the original data. In fact, the data are taken in logs, the estimated coefficient φ of y_{t-1} is almost 1, and the coefficients of industrial value added at lags 0 and 1 are similar in absolute value, with the former positive and the latter negative. In other words, the model could be simplified without any apparent loss of fit.

Table 2: Estimated coefficients of the ADL(1,1) model of Italian service value added versus industrial value added; model in log-levels of the original data over the sample 1990q1-2013q3 (columns: estimated value, standard error, t-statistic)

One-step-ahead forecasts of service value added are then straightforward using the ADL(1,1) model of equation (1).

6. Compilation issues and chain linking

Volume QNA data are compiled according to chain-linking methods, which poses a problem of loss of cross-sectional additivity when summing up elementary components into totals. Elementary GDP volume components are referred to a fixed base year, and current practice among member states is to chain-link these data according to the annual-overlap approach described in Bloem et al. (2001, chapter IX); only Austria adopts the one-quarter-overlap approach. If the fixed base year approach were adopted, no consistency problem would arise, as the last quarter estimates would be perfectly additive. For aggregating elementary volume components, the following multistep approach is therefore suggested: (1) elementary volume GDP components, together with their forecasts, are de-chained and expressed in terms of previous year prices; (2) the data at previous year prices are summed up to obtain GDP at previous year prices; (3) GDP at previous year prices is chain-linked, which allows growth rates to be computed properly.

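A stylized sketch of the re-chaining step under the annual overlap, assuming each component has already been de-chained and summed so that `pyp[y]` holds the four quarters of year y at the average prices of year y-1, and `prev_year_avg[y]` the quarterly average of year y-1 at its own average prices; this array layout is an assumption for illustration, not a prescribed data structure.

```python
import numpy as np

def chain_link_annual_overlap(pyp, prev_year_avg):
    """Annual-overlap chain linking of an aggregate obtained by summing
    de-chained components. pyp[y] is a length-4 array with the quarters
    of year y at average prices of year y-1; prev_year_avg[y] is the
    quarterly average of year y-1 at its own average prices."""
    index = 1.0        # cumulated annual volume index (first year = 1)
    chained = []
    for y in range(len(pyp)):
        chained.append(pyp[y] / prev_year_avg[y] * index)  # link quarters
        index *= pyp[y].mean() / prev_year_avg[y]          # annual update
    return np.concatenate(chained)
```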
7. Short conclusions

This first discussion on methods and estimation techniques is not exhaustive. However, a first set of possible scenarios has been defined to frame the problem of the GDP flash estimate according to the method followed in the compilation of quarterly accounts and to the lack of available information. The next steps should complete the scenarios and define common guidelines. The document also presents two case studies with Italian data. The goal is to present a graphical analysis of a GDP component accompanied by explanatory variables, and the role of data transformations. The second example also makes a hypothetical estimate of the service component of value added, strongly characterized by a lack of information, by means of a dynamic regression on the industrial component, for which related information is available. Both examples are indicative, since they refer to macro components of GDP. By contrast, effort should go towards investigating the role of the level of detail at which quarterly accounts are compiled, and whether the available split helps in producing accurate forecasts. The proposals for modelling strategies could be completed or delimited in some way, not excluding the options provided by forecast combination.

References

Banbura M., Giannone D., Modugno M. and Reichlin L. (2013) Now-casting and the real-time data flow, Working Paper Series no. 1564, July 2013, European Central Bank;
Bloem A. M., Dippelsman R. J. and Mæhle N. Ø. (2001) Quarterly National Accounts Manual: Concepts, Data Sources and Compilation, International Monetary Fund;
Box G. E. P. and Jenkins G. M. (1976) Time Series Analysis: Forecasting and Control, revised edition, San Francisco, Holden Day;
Clements M. P. and Galvao A. B. (2008) Macroeconomic Forecasting with Mixed-Frequency Data, Journal of Business & Economic Statistics, vol. 26;
Eurostat (2010) Handbook on quarterly national accounts, Office for Official Publications of the European Communities, Luxembourg;
Frale C., Marcellino M., Mazzi G. and Proietti T. (2010) Survey Data as Coincident or Leading Indicators, Journal of Forecasting;
Frale C., Marcellino M., Mazzi G. and Proietti T. (2011) EUROMIND: A Monthly Indicator of the Euro Area Economic Conditions, Journal of the Royal Statistical Society, Series A;
Granger C. W. J. and Teräsvirta T. (1993) Modelling Nonlinear Economic Relationships, Oxford University Press;
Harvey A. C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press;

Hendry D. F. and Mizon G. E. (1978) Serial correlation as a convenient simplification, not a nuisance: A comment on a study of the demand for money by the Bank of England, Economic Journal, 88;
Kuzin V., Marcellino M. and Schumacher C. (2009) MIDAS versus mixed-frequency VAR: nowcasting GDP in the euro area, Discussion Paper Series 1: Economic Studies, no. 07/2009, Deutsche Bundesbank;
Moauro F. and Savio G. (2005) Temporal disaggregation using multivariate structural time series models, Econometrics Journal, 8;
Moauro F. (2013) Monthly employment indicators of the euro area and larger member states: real-time analysis of indirect estimates, Journal of Forecasting, forthcoming;
Priestley M. B. (1980) State dependent models: a general approach to nonlinear time series analysis, Journal of Time Series Analysis, 1;
Priestley M. B. (1988) Non-linear and Non-stationary Time Series Analysis, Academic Press;
Proietti T. (2006) Temporal disaggregation by state space methods: Dynamic regression methods revisited, Econometrics Journal, 9;
Tong H. (1990) Non-linear Time Series: A Dynamical System Approach, Oxford University Press.