Comment

Lutz Kilian
Department of Economics, University of Michigan, Ann Arbor, MI 48109-1220

Frank Diebold's personal reflections about the history of the DM test remind us that this test was originally designed to compare the accuracy of model-free forecasts such as judgmental forecasts generated by experts, forecasts implied by financial markets, survey forecasts, or forecasts based on prediction markets. The test is used routinely in applied work. For example, Baumeister and Kilian (2012) use the DM test to compare oil price forecasts based on the prices of oil futures contracts against the no-change forecast.

Much of the econometric literature that builds on Diebold and Mariano (1995), in contrast, has been preoccupied with testing the validity of predictive models in pseudo out-of-sample environments. In this more recent literature the concern is not actually the forecasting ability of the models in question. Rather, the focus is on testing the null hypothesis that there is no predictive relationship from one variable to another in population. Testing for the existence of a predictive relationship in population is viewed as an indirect test of all economic models that imply such a predictive relationship. A case in point is studies of the predictive power of monetary fundamentals for the exchange rate (e.g., Mark 1995). Although this testing problem may seem similar to that in Diebold and Mariano (1995) at first sight, it is conceptually quite different from the original motivation for the DM test. As a result, numerous changes have been proposed in the way the test statistic is constructed and in how its distribution is approximated.

In a linear regression model, testing for predictability in population comes down to testing the null hypothesis of zero slopes, which can be assessed using standard in-sample t- or Wald tests. Alternatively, the same null hypothesis of zero slopes can be tested based on recursive or rolling estimates of the loss in fit associated with generating pseudo out-of-sample predictions from the restricted rather than the unrestricted model. Many empirical studies, including Mark (1995), implement both tests. Under standard assumptions, it follows immediately that pseudo out-of-sample tests have the same asymptotic size as, but lower power than, in-sample tests of the null hypothesis of no predictability in population, which raises the question of why anyone would want to use such tests. While perhaps obvious, this point has nevertheless generated extensive debate. The power advantages of in-sample tests of predictability were first formally established in Inoue and Kilian (2004). Recent work by Hansen and Timmermann (2013) elaborates on the same point. Less obviously, these asymptotic power advantages can be shown to generalize to comparisons of models subject to data mining, serial correlation in the errors, and even certain forms of structural breaks (see Inoue and Kilian 2004).

WHERE DID THE LITERATURE GO OFF TRACK?

In recent years, there has been increased recognition of the fact that tests of population predictability designed to test the validity of predictive models are not suitable for evaluating the accuracy of forecasts. The difference is best illustrated within the context of a predictive regression with coefficients that are modelled as local to zero. The local asymptotics here serve as a device to capture our inability to detect nonzero regression coefficients with any degree of reliability. Consider the data generating process y_t = θ + ε_t, where θ = b/T^{1/2} and ε_t ~ NID(0,1). The Pitman drift parameter b cannot be estimated consistently. We restrict attention to one-step-ahead forecasts. One is the restricted forecast ŷ_{t+1} = 0; the other is the unrestricted forecast ŷ_{t+1} = θ̂_t, where θ̂_t is the recursively obtained least-squares estimate of θ.
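To fix ideas, a minimal sketch in Python follows (this is not part of the original comment; the values of T, R, and b are illustrative choices, and the recursive least-squares estimate of θ reduces to the recursive sample mean in this constant-only model):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 480          # sample size (illustrative)
R = 120          # initial recursive estimation window (illustrative)
b = 1.0          # Pitman drift; theta = b / sqrt(T)
theta = b / np.sqrt(T)

# Data generating process: y_t = theta + eps_t, eps_t ~ NID(0,1)
y = theta + rng.standard_normal(T)

# One-step-ahead forecast errors for t = R+1, ..., T
e_restricted = []    # restricted forecast: yhat = 0
e_unrestricted = []  # unrestricted forecast: yhat = theta_hat_t
for t in range(R, T):
    theta_hat = y[:t].mean()   # recursive least-squares estimate of theta
    e_restricted.append(y[t])
    e_unrestricted.append(y[t] - theta_hat)

print("MSPE restricted:  ", np.mean(np.square(e_restricted)))
print("MSPE unrestricted:", np.mean(np.square(e_unrestricted)))
```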

This example is akin to the problem of choosing between a random walk with drift and without drift in generating forecasts of the exchange rate. It is useful to compare the asymptotic MSPEs of these two forecasts. The MSPE can be expressed as the sum of the forecast variance and the squared forecast bias. The restricted forecast has zero variance by construction for all values of b, but is biased away from the optimal forecast by θ, so its asymptotic MSPE in suitably scaled units is b². The unrestricted forecast, in contrast, has zero bias, but a constant variance for all b, which can be normalized to unity without loss of generality. As Figure 1 illustrates, the MSPEs of the two forecasts are equal for b = 1. This means that for values of b < 1, the restricted forecast actually is more accurate than the unrestricted forecast, although the restricted forecast is based on a model that is invalid in population, given that b ≠ 0. This observation illustrates that there is a fundamental difference between the objective of selecting the true model and the objective of selecting the model with the lowest MSPE (also see Inoue and Kilian 2006).
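The algebra behind Figure 1 is straightforward. In scaled units in which the asymptotic variance of the unrestricted forecast equals one (a reconstruction from the definitions above, not reproduced from the original text):

```latex
% MSPE = forecast variance + squared forecast bias
\begin{align*}
\text{restricted } (\hat{y}_{t+1} = 0):\quad
  & \text{MSPE} = 0 + \bigl(T^{1/2}\theta\bigr)^{2} = b^{2}, \\
\text{unrestricted } (\hat{y}_{t+1} = \hat{\theta}_{t}):\quad
  & \text{MSPE} = 1 + 0 = 1, \\
\text{equal MSPEs:}\quad
  & b^{2} = 1 \iff b = 1 \quad (b \ge 0).
\end{align*}
```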

Traditional tests of the null hypothesis of no predictability (referred to as old school WCM tests by Frank Diebold) correspond to tests of H0: θ = 0, which is equivalent to testing H0: b = 0. It has been common for proponents of such old school tests to advertise them as tests of equal forecast accuracy. This language is misleading. As Figure 1 shows, testing equal forecast accuracy under quadratic loss corresponds to testing H0: b = 1 in this example. An immediate implication of Figure 1 is that the critical values of conventional pseudo out-of-sample tests of equal forecast accuracy are too low if the objective is to compare the forecast accuracy of the restricted and the unrestricted forecast. As a result, these tests invariably reject the null of equal forecast accuracy too often in favor of the unrestricted forecast. They suffer from size distortions even asymptotically.

This point was first made in Inoue and Kilian (2004) and has become increasingly accepted in recent years (e.g., Giacomini and White 2006; Clark and McCracken 2012). It applies not only to all pseudo out-of-sample tests of predictability published as of 2004, but also to the more recently developed alternative test of Clark and West (2007). Although the latter test embodies an explicit adjustment for finite-sample parameter estimation uncertainty, it is based on the same H0: b = 0 as more traditional tests of no predictability and hence is not suitable for evaluating the null of equal MSPEs.

BACK TO THE ROOTS

Frank Diebold makes the case that we should return to the roots of this literature and abandon old school WCM tests in favor of the original DM test, to the extent that we are interested in testing the null hypothesis of equal MSPEs. Applying the DM test relies on Assumption DM, which states that the loss differential has to be covariance stationary for the DM test to have an asymptotic N(0,1) distribution. Frank Diebold suggests that, as long as we carefully test that assumption and verify that it holds at least approximately, the DM test should replace the old school WCM tests in practice. He acknowledges that in some cases there are alternative tests of the null of equal out-of-sample MSPEs, such as Clark and McCracken (2011, 2012), which he refers to as new school WCM tests, but he considers these tests too complicated to be worth considering in practice.

It is useful to examine this proposal within the context of our local-to-zero predictive model. Table 1 investigates the size of the DM test based on the N(0,1) asymptotic approximation. We focus on practically relevant sample sizes. In each case, we choose b in the data generating process such that the MSPEs of the restricted and the unrestricted model are equal. Under our assumptions, this null hypothesis implies θ = 1/T^{1/2}. We focus on recursive estimation of the unrestricted model. The initial recursive estimation window consists of the first R sample observations, R < T. We explore two alternative asymptotic thought experiments. In the first case, R ∈ {15, 30, 45} is fixed with respect to the sample size. In the second case, R = λT with λ ∈ {0.25, 0.50, 0.75}.
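The mechanics of such a size experiment can be sketched as follows (Python; an illustrative reconstruction rather than the original code — in particular, the one-sided 5% critical value and the lag-0 variance estimator for the one-step-ahead loss differential are assumptions about the design):

```python
import numpy as np

rng = np.random.default_rng(1)

def dm_size(T, R, n_trials=5000, crit=1.645):
    """Rejection rate of a one-sided nominal 5% DM test when b = 1,
    i.e., when the restricted and unrestricted MSPEs are equal."""
    theta = 1.0 / np.sqrt(T)   # b = 1 under the null of equal MSPEs
    rejections = 0
    for _ in range(n_trials):
        y = theta + rng.standard_normal(T)
        # recursive sample means, computed in one pass
        theta_hat = np.cumsum(y)[R - 1:T - 1] / np.arange(R, T)
        e_r = y[R:]                # restricted forecast errors (yhat = 0)
        e_u = y[R:] - theta_hat    # unrestricted forecast errors
        d = e_r**2 - e_u**2        # loss differential
        dm = d.mean() / np.sqrt(d.var(ddof=1) / d.size)
        rejections += dm > crit    # reject in favor of the unrestricted model
    return rejections / n_trials

for T in (60, 120, 240, 480):
    print(T, round(100 * dm_size(T, R=T // 4), 1))  # R = 0.25T
```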

Table 1 shows that, regardless of the asymptotic thought experiment, the effective size of the DM test may be lower or higher than the nominal size, depending on R and T. In most cases, the DM test is conservative in that its empirical size is below the nominal size. This is an interesting contrast to traditional tests of H0: b = 0, whose size invariably exceeds the nominal size when testing the null of equal MSPEs. When the empirical size of the DM test exceeds its nominal size, the size distortions are modest and vanish for larger T. Table 1 also illustrates that in practice R must be large relative to T for the test to have reasonably accurate size. Otherwise the DM test may become extremely conservative.

This evidence does not mean that the finite-sample distribution of the DM test statistic is well approximated by a N(0,1) distribution. Figure 2 illustrates that even for T = 480 the density of the DM test statistic is far from N(0,1). This finding is consistent with the theoretical analysis of the limiting distribution of the DM test statistic in the local-to-zero framework in Clark and McCracken (2011, p. 8). Nevertheless, the right tail of the empirical distribution is reasonably close to that of the N(0,1) asymptotic approximation for large R/T, which explains the fairly accurate size in Table 1 for R = 0.75T.

The failure of the N(0,1) approximation in this example suggests a violation of Assumption DM. This fact raises the question of whether one would have been able to detect this problem by plotting and analyzing the recursive loss differential, as suggested by Frank Diebold. The answer for our data generating process is sometimes yes, but often not. While some draws of the loss differential in the Monte Carlo study show clear signs of nonstationarity even without formal tests, many others do not. Figure 3 shows one such example. This finding casts doubt on our ability to verify Assumption DM in practice.
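Plots of this kind are easy to generate. The sketch below (Python with matplotlib; it reuses the hypothetical design above with b = 1, and the split-sample variance comparison at the end is an addition for illustration, not part of the original analysis) produces the two panels of Figure 3 for a single simulated draw:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

T, R = 480, 120                   # R = 0.25T, as in Figure 3
theta = 1.0 / np.sqrt(T)          # b = 1: equal MSPEs in population
y = theta + rng.standard_normal(T)

theta_hat = np.cumsum(y)[R - 1:T - 1] / np.arange(R, T)
d = y[R:]**2 - (y[R:] - theta_hat)**2   # recursive loss differential

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(d)
ax1.set_title("Recursive loss differential")
ax2.plot((d - d.mean())**2)
ax2.set_title("Squared demeaned recursive loss differential")
ax2.set_xlabel("Evaluation period")
plt.tight_layout()
plt.show()

# Crude stability check: compare the loss-differential variance across halves.
h = d.size // 2
print("variance ratio (2nd half / 1st half):", d[h:].var() / d[:h].var())
```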

Leaving aside the question of the validity of the N(0,1) approximation in this context, the fact that the DM test tends to be conservative for typical sample sizes raises concerns that it may have low power. It therefore is important to compare the DM test to the bootstrap-based tests of the same null hypothesis of equal MSPEs developed in Clark and McCracken (2011). The latter new school WCM tests are designed for nested forecasting models and are motivated by the same local-to-zero framework we relied on in our example. They can be applied to estimates of the MSPE based on rolling or recursive regressions. Simulation results presented in Clark and McCracken (2011) suggest that their alternative tests have size similar to the DM test in practice, facilitating the comparison. Clark and McCracken's bootstrap-based MSE-t test in several examples appears to have slightly higher power than the DM test, but the power advantages are not uniform across all data generating processes. Larger and more systematic power gains are obtained for their bootstrap MSE-F test, however. This evidence suggests that there remains a need for alternative tests of the null hypothesis of equal MSPEs such as the MSE-F test in Clark and McCracken (2011). While such tests so far are not available for all situations of interest in practice, their further development seems worthwhile. There is also an alternative test of equal forecast accuracy proposed by Giacomini and White (2006), involving a somewhat different specification of the null hypothesis, which is designed specifically for pseudo out-of-sample forecasts based on rolling regressions.

An interesting implication of closely related work in Clark and McCracken (2012) is that even when one is testing the null hypothesis of equal MSPEs of forecasts from nested models (as opposed to the traditional null hypothesis of no predictability), in-sample tests have higher power than the corresponding pseudo out-of-sample tests in Clark and McCracken (2011). This finding closely mirrors the earlier results in Inoue and Kilian (2004) for tests of the null of no predictability. It renews the question raised in Inoue and Kilian (2004) of why anyone would care to conduct pseudo out-of-sample inference about forecasts as opposed to in-sample inference. Frank Diebold is keenly aware of this question and draws attention to the importance of assessing and understanding the historical evolution of the accuracy of forecasting models, as opposed to their MSPE. A case in point is the analysis in Baumeister, Kilian and Lee (2013), who show that the accuracy of forecasts with low MSPEs may not be stable over time. Such analysis does not necessarily require formal tests, however, and existing tests for assessing stability are not designed to handle the real-time data constraints in many economic time series. Nor do these tests allow for iterated as opposed to direct forecasts.

THE PROBLEM OF SELECTING FORECASTING MODELS

This leaves the related, but distinct, question of forecasting model selection. It is important to keep in mind that tests of equal forecast accuracy are not designed to select among alternative parametric forecasting models. This distinction is sometimes blurred in discussions of tests of equal predictive ability. Forecasting model selection involves the ranking of candidate forecasting models based on their performance. There are well-established methods for selecting the forecasting model with the lowest out-of-sample MSPE, provided that the number of candidate models is small relative to the sample size, that we restrict attention to direct forecasts, and that there is no structural change in the out-of-sample period.

It is important to stress that consistent model selection in general does not require the true model to be included among the forecasting models. For example, Inoue and Kilian (2006) prove that suitably specified information criteria based on the full sample will select the forecasting model with the lowest out-of-sample MSPE even when all candidate models are misspecified.
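To illustrate the flavor of full-sample information-criterion ranking, consider the following generic SIC-style sketch (Python; the specific criteria and penalty terms analyzed by Inoue and Kilian (2006) differ, and the adjustable penalty weight here is only a hypothetical stand-in for the heavier penalties discussed below):

```python
import numpy as np

def sic_rank(y, max_lag=4, penalty_weight=1.0):
    """Rank AR(p) forecasting models, p = 0, ..., max_lag, by a SIC-style
    criterion computed on the full sample:
        log(sigma2_hat) + penalty_weight * p * log(n) / n.
    penalty_weight = 1 gives the usual SIC."""
    T = len(y)
    n = T - max_lag
    scores = {}
    for p in range(max_lag + 1):
        # regress y_t on a constant and p lags, t = max_lag, ..., T-1
        Y = y[max_lag:]
        X = np.column_stack([np.ones(n)] +
                            [y[max_lag - j:T - j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        sigma2 = np.mean((Y - X @ beta)**2)
        scores[p] = np.log(sigma2) + penalty_weight * p * np.log(n) / n
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(3)
y = np.empty(500)
y[0] = rng.standard_normal()
for t in range(1, 500):           # simulate an AR(1) with coefficient 0.5
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

best_p, scores = sic_rank(y)
print("selected lag order:", best_p)
```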

In contrast, ranking models by their rolling or recursive MSPE will, with positive probability, select forecasts that inflate the out-of-sample MSPE under conventional asymptotic approximations. Frank Diebold's discussion is correct that PLS effectively equals the SIC under the assumption that R does not depend on T, but the latter nonstandard asymptotic thought experiment does not provide good approximations in finite samples, as shown in Inoue and Kilian (2006). Moreover, consistently selecting the forecasting model with the lowest MSPE may require larger penalty terms in the information criterion than are embodied in conventional criteria such as the SIC and AIC.

The availability of these results does not mean that the problem of selecting forecasting models is resolved. First, asymptotic results may be of limited use in small samples. Second, information criteria are not designed for selecting among a large-dimensional set of forecasting models. Their asymptotic validity breaks down once we examine larger sets of candidate models. Third, standard information criteria are unable to handle iterated forecasts. Fourth, Inoue and Kilian (2006) prove that no forecasting model selection method in general remains valid in the presence of unforeseen structural changes in the out-of-sample period. The problem has usually been dealt with in the existing literature simply by restricting attention to structural changes in the past, while abstracting from structural changes in the future. Finally, there are no theoretical results on how to select among forecasting methods that involve model selection at each stage of recursive or rolling regressions. The latter situation is quite common in applied work; a sketch of the mechanics of such a staged procedure follows below.
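The following minimal sketch (Python; a hypothetical design that reuses the two models from our local-to-zero example, not a procedure endorsed by any of the papers cited) illustrates what re-running model selection at every recursive origin looks like:

```python
import numpy as np

rng = np.random.default_rng(4)

T, R = 480, 120
theta = 1.0 / np.sqrt(T)          # b = 1
y = theta + rng.standard_normal(T)

def select_model(window):
    """SIC-style choice between the restricted model (mean zero, no
    parameters) and the unrestricted model (estimated mean, one parameter)."""
    n = len(window)
    sic_restricted = np.log(np.mean(window**2))
    sic_unrestricted = np.log(np.var(window)) + np.log(n) / n
    return "unrestricted" if sic_unrestricted < sic_restricted else "restricted"

errors = []
for t in range(R, T):
    window = y[:t]
    model = select_model(window)   # selection repeated at every origin
    yhat = window.mean() if model == "unrestricted" else 0.0
    errors.append(y[t] - yhat)

print("MSPE of the staged procedure:", np.mean(np.square(errors)))
```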

CONCLUSION

The continued interest in the question of how to evaluate the accuracy of forecasts shows that Diebold and Mariano's (1995) key insights are still very timely, even twenty years later. Although the subsequent literature in some cases followed directions not intended by the authors, there is a growing consensus on how to think about this problem and which pitfalls must be avoided by applied users. The fact that in many applications in-sample tests have proved superior to simulated out-of-sample tests does not mean that there are no situations in which we care about out-of-sample inference. Important examples include inference about forecasts from real-time data and inference about iterated forecasts. Unfortunately, however, existing tests of equal out-of-sample accuracy are not designed to handle these and other interesting extensions. This fact suggests that much work remains to be done for econometricians going forward.

ADDITIONAL REFERENCES

Baumeister, C., and L. Kilian (2012), "Real-Time Forecasts of the Real Price of Oil," Journal of Business and Economic Statistics, 30, 326-336.

Baumeister, C., Kilian, L., and T.K. Lee (2013), "Are There Gains from Pooling Real-Time Oil Price Forecasts?" mimeo, University of Michigan.

Clark, T.E., and M.W. McCracken (2012), "In-Sample Tests of Predictive Ability: A New Approach," Journal of Econometrics, 170, 1-14.

Clark, T.E., and K.D. West (2007), "Approximately Normal Tests for Equal Predictive Accuracy in Nested Models," Journal of Econometrics, 138, 291-311.

Inoue, A., and L. Kilian (2004), "Bagging Time Series Models," CEPR Discussion Paper No. 4333.

Mark, N.C. (1995), "Exchange Rates and Fundamentals: Evidence on Long-Horizon Predictability," American Economic Review, 85, 201-218.

Table 1: Size of Nominal 5% DM Test under Local-to-Zero Asymptotics

            T = 60        T = 120       T = 240       T = 480
R           5%    10%     5%    10%     5%    10%     5%    10%
15          1.8   3.9     0.8   2.1     0.3   1.0     0.2   0.7
30          3.7   7.0     1.5   3.5     0.6   1.9     0.4   1.0
45          7.0   10.3    2.3   4.9     1.0   2.5     0.5   1.4
0.25T       1.8   3.9     1.4   3.5     1.4   3.2     1.0   3.0
0.50T       3.7   7.0     3.0   6.4     2.6   5.5     2.3   5.2
0.75T       7.0   10.3    5.3   9.4     4.2   8.0     3.7   7.3

NOTES: All results are based on 5,000 trials under the null hypothesis of equal MSPEs in population. The DM test statistic is based on recursive regressions. R denotes the length of the initial recursive sample and T the sample size.

Figure 1: Asymptotic MSPEs under Local-to-Zero Asymptotics

[Line plot of the asymptotic MSPEs of the restricted and the unrestricted forecast against the Pitman drift b, for b between 0 and 2.]

NOTES: Adapted from Inoue and Kilian (2004). b denotes the Pitman drift term. The variance has been normalized to 1 without loss of generality. The asymptotic MSPEs are equal at b = 1, not at b = 0, so a test of H0: b = 0 is not a test of the null hypothesis of equal MSPEs.

Figure 2: Gaussian Kernel Density Estimates of the Null Distribution of the DM Test Statistic in the Local-to-Zero Model, T = 480

[Kernel density estimates of the DM statistic for R = 0.25T and R = 0.75T, plotted against the N(0,1) density; horizontal axis: DM statistic.]

NOTES: Estimates based on recursive forecast error sequences with an initial window of R observations. All results are based on 5,000 trials.

Figure 3: Assessing Assumption DM

[One simulated draw of the recursive loss differential (upper panel) and the squared demeaned recursive loss differential (lower panel), plotted over the evaluation period.]

NOTES: Loss differential obtained from a random sample of length T − R with T = 480 and R = 0.25T.