Targeted Growth Rates for Long-Horizon Crude Oil Price Forecasts

Stephen Snudden
Queen's University, Department of Economics
snudden@econ.queensu.ca
July 2017

This paper proposes growth rate transformations with targeted lag selection to improve long-horizon forecast accuracy. The method targets the lower frequencies of the data that correspond to the respective forecast horizons. The method is applied to models of the real price of crude oil. Targeted growth rates can significantly improve forecast precision at horizons of up to five years. For the real price of crude oil, the method can achieve, at horizons of up to five years, the degree of accuracy that has previously been achieved only at shorter horizons.

JEL classification: C1, C53, Q43
Keywords: Forecasting and Prediction Methods, Oil Prices, Filters, Spectral Analysis

The existing oil price forecasting literature has provided evidence that models of the real price of oil outperform the no-change forecast at short horizons (Alquist et al., 2013; Baumeister and Kilian, 2012; Baumeister and Kilian, 2014; Baumeister and Kilian, 2015, among others). These studies focus on horizons of less than two years and were not intended to forecast at longer horizons. Much less is known about how to forecast the real price of crude oil at longer horizons. This paper focuses on extending the forecast success of individual models of the real price of oil to horizons of up to five years by proposing the method of targeted growth rate transformations.

Longer-term forecasts are of central interest for investment decisions and for public policy institutions. For example, an oil producer deciding whether to invest in drilling, or an airline deciding whether to purchase a new fleet of aircraft, will care about the payoff over the lifetime of the investment. Many policy institutions produce forecasts for longer horizons to inform policy decisions. While some studies have focused on horizons beyond five years (Bernard et al., 2017), less is known about model forecast performance for the real oil price at horizons between two and five years. This paper fills this gap.

This paper proposes the method of targeted growth rate filtering, a modification of the standard forecasting method. Lags in growth rate transformations are chosen to target lower frequencies. The method removes high frequencies and emphasizes select low frequencies that correspond to the respective forecast horizons. When applied to forecasting models of the real price of crude oil, the method significantly improves recursive mean squared prediction error (MSPE) ratios and directional accuracy at horizons of up to five years. The improvements in forecast performance are robust whether the real price of oil is modeled in log levels or in differences, across sub-samples, and for alternative oil price series. Employing this method can achieve at longer horizons the degree of accuracy that has previously been achieved only at shorter horizons.

The analysis begins by considering simple univariate benchmark models for forecasting the real price of crude oil at horizons of up to five years. This addresses the open question of which benchmarks are appropriate at longer horizons, and whether simple models can provide better benchmarks than the no-change forecast. The evidence suggests that exponential smoothing and backward averages of the real price of oil can robustly outperform no-change forecasts for horizons beyond one year. A no-change forecast based on the average real oil price over the last year provides a simple alternative rule for forecasting the real price of oil at horizons beyond one year, and works particularly well at the two- and three-year horizons.

Targeted growth rate transformations are then applied to univariate models for forecasting the real price of crude oil. These include univariate autoregressive and autoregressive fractionally integrated models, as well as the low-frequency forecasting technique proposed by Müller and Watson (2016). Univariate autoregressive models have been well documented to outperform no-change forecasts at horizons of up to six months (Alquist et al., 2013; Baumeister and Kilian, 2012), and they serve as an intuitive way to illustrate how targeted growth rates modify the standard univariate Box-Jenkins method (Box et al., 2008). Applying targeted growth rate transformations to these methods improves out-of-sample forecast performance at longer horizons compared to models that rely on period-over-period growth rates. Univariate autoregressive models with targeted growth rate transformations can consistently outperform no-change forecasts for horizons of up to three years.

The method of targeted growth rates is also applied to multivariate forecasting with vector autoregressive (VAR) models. Monthly VAR models of the real price of crude oil have been shown by Alquist et al. (2013), Baumeister and Kilian (2014), and Baumeister and Kilian (2015), among others, to produce forecasts that can beat the no-change forecast of the real price of crude oil for up to one year. VAR models with targeted growth rate transformations can consistently outperform VAR models that rely on period-over-period growth rates in out-of-sample forecast performance at longer horizons.

Global real activity and global crude oil inventories are neither directly observed nor well measured. Alternative series have been examined for forecasts at short horizons by Baumeister and Kilian (2012, 2014), among others.
This is the first paper to extend this analysis to longer horizons by systematically investigating alternative global real activity and crude oil inventory variables in VAR model forecasts for horizons of up to five years. When targeted growth rates are applied, world industrial production and Kilian's global real activity index (Kilian, 2009) can produce comparable forecasts at longer horizons. Moreover, U.S. crude oil and petroleum inventories are found to produce superior forecasts at longer horizons. The extra predictive power of U.S. crude oil inventory series is consistent with the improved short-horizon forecast performance first established by Baumeister et al. (2015) and used at shorter horizons by Baumeister et al. (2014) and Baumeister and Kilian (2015).

This paper introduces targeted growth rate transformations and is intentionally focused. For example, it ignores real-time data constraints, which have been shown to be crucial in forecasting the real price of oil (see, e.g., Alquist et al., 2013; Baumeister et al., 2016). Moreover, none of the models allow for stochastic volatility (see, e.g., Baumeister et al., 2018). Similarly, no attempt is made to study forecast combinations (see, e.g., Baumeister et al., 2014; Baumeister and Kilian, 2015). Finally, the analysis focuses on monthly data and forecast horizons; no quarterly models or forecasts are discussed (see, e.g., Baumeister and Kilian, 2014, 2015). These extensions are promising avenues for further research on targeted growth rate transformations.

The paper is structured as follows. Section 2 introduces the method of targeted growth rate transformations for forecasting using spectral analysis. Section 3 analyzes these methods when applied to univariate real oil price forecasts. Section 4 extends the analysis to vector autoregression forecasts of the market for crude oil and examines robustness to alternative oil price series. Section 5 concludes.

2. Growth Rate Filter

Growth rate transformations are commonly applied in time series econometrics. For example, simple period-over-period growth rates are applied to achieve stationarity. Higher lags in growth rates are often used to produce more intuitive scales. For example, macroeconomic data are often announced as year-over-year or quarter-on-quarter growth rates because of their intuitive format. This section provides the intuition behind targeted growth rate transformations and explains why departing from the standard first lag may be desirable for forecasting.

Let $Y$ be a covariance-stationary series with absolutely summable autocovariances. Let $s_Y(\omega)$ be the population spectrum and $g_Y(\kappa)$ the autocovariance generating function of $Y$, where

$$ s_Y(\omega) = (2\pi)^{-1} g_Y(e^{-i\omega}), \tag{1} $$

$\omega \in [0, \pi]$ is the frequency, and $\kappa$ is a complex scalar. Let $X$ be a transformation of $Y$ given by $X = h(L)Y$, with $\sum_j |h_j| < \infty$. The autocovariance generating function of $X$ can be calculated from that of $Y$ as

$$ g_X(\kappa) = h(\kappa)\, h(\kappa^{-1})\, g_Y(\kappa). \tag{2} $$

The population spectrum of $X$ is

$$ s_X(\omega) = (2\pi)^{-1} h(e^{-i\omega})\, h(e^{i\omega})\, g_Y(e^{-i\omega}). \tag{3} $$

Hence the population spectrum of $X$ is related to that of $Y$ by

$$ s_X(\omega) = h(e^{-i\omega})\, h(e^{i\omega})\, s_Y(\omega). \tag{4} $$
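As a numerical illustration of equations (2) to (4), the sketch below computes the autocovariances of a filtered series directly from the filter weights, using the time-domain counterpart of equation (2), $\gamma_X(\tau) = \sum_j \sum_k h_j h_k \gamma_Y(\tau + k - j)$, and then checks that the implied spectrum equals $h(e^{-i\omega})h(e^{i\omega})s_Y(\omega)$. It assumes, for illustration only, that $Y$ is unit-variance white noise and that the filter is the $(1 - L^3)$ difference. The function names are hypothetical.

```python
import math

def filtered_autocov(h, gamma_y, tau):
    """Autocovariance at lag tau of X_t = sum_j h[j] * Y_{t-j}:
    gamma_X(tau) = sum_j sum_k h[j] * h[k] * gamma_Y(tau + k - j)."""
    return sum(h[j] * h[k] * gamma_y(tau + k - j)
               for j in range(len(h)) for k in range(len(h)))

def spectrum_from_autocov(gamma, omega, max_lag):
    """s(omega) = (2*pi)^{-1} [gamma(0) + 2 * sum_{k=1}^{max_lag} gamma(k) cos(omega * k)],
    valid when the autocovariances are zero beyond max_lag."""
    total = gamma(0) + 2.0 * sum(gamma(k) * math.cos(omega * k) for k in range(1, max_lag + 1))
    return total / (2.0 * math.pi)

def white_noise_autocov(k):
    """Illustrative assumption: Y is white noise with unit variance."""
    return 1.0 if k == 0 else 0.0

h = [1.0, 0.0, 0.0, -1.0]  # coefficients of the filter 1 - L^3

def gamma_x(k):
    return filtered_autocov(h, white_noise_autocov, k)

for omega in (0.0, math.pi / 3, 2 * math.pi / 3, math.pi):
    s_x = spectrum_from_autocov(gamma_x, omega, max_lag=len(h) - 1)
    s_y = spectrum_from_autocov(white_noise_autocov, omega, max_lag=0)
    gain = 2.0 - 2.0 * math.cos(3 * omega)  # h(e^{-i w}) h(e^{i w}) for h(L) = 1 - L^3
    print(f"omega={omega:.3f}  s_X={s_x:.4f}  gain*s_Y={gain * s_y:.4f}")
```

The two printed columns agree at every frequency, which is the content of equation (4) for this particular filter.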

Equation (4) shows that applying the $h(L)$ filter to $Y$ is equivalent to multiplying the spectrum of $Y$ by $h(e^{-i\omega})h(e^{i\omega})$. The original series must be covariance stationary so that $s_Y(\omega)$ exists; otherwise, the population spectrum is not finite at frequency zero.

The growth rate of $Y$ can be approximated by applying the first-difference filter to the log of $Y$. In that case $h(L) = 1 - L$, and $h(e^{-i\omega})h(e^{i\omega})$ is given by $(1 - e^{-i\omega})(1 - e^{i\omega}) = 2 - 2\cos(\omega)$. Define the operator $L^Z$ such that $L^Z x_t = x_{t-Z}$ for $Z \in \mathbb{R}$. More generally, transforming the logged variable $Y$ by differencing at the $Z$-th lag, $(1 - L^Z)$, to produce the spectrum of $X$ is equivalent to filtering the spectrum of $Y$ by

$$ h(e^{-iZ\omega})\, h(e^{iZ\omega}) = 2 - 2\cos(Z\omega). \tag{5} $$

Figure 1 graphs the growth rate filter applied to the spectral density of $Y$ using $Z = 1, 2, 3$ and $6$. The frequency $\omega$ is converted into the period of a cyclical function, $T$, using $T = 2\pi/\omega$. The growth rate filter using the first lag, $Z = 1$, preserves high frequencies, with its maximum at $\omega = \pi$, but removes the lowest frequencies, with its minimum at $\omega = 0$. Hence, the first-difference filter is a form of high-pass filter. Applying growth rates at the third lag (q-o-q growth for monthly data) places zero weight at frequencies $\omega \in \{0, 2\pi/3\}$ and maximum weight at frequencies $\omega \in \{\pi/3, \pi\}$. Thus, for a monthly time series, q-o-q growth rates preserve cycles two and six periods in length.

The derivation of the growth rate filter shows how, with $Z = 1$, lower frequencies are filtered away: cycles longer than six months receive a weight below one, and this weight falls toward zero as the cycle lengthens (for example, $2 - 2\cos(\pi/6) \approx 0.27$ for a twelve-month cycle). In addition, the filter not only preserves the highest frequencies but amplifies them, with a maximum weight of four. Since the high frequencies are both preserved and amplified, they dominate the variance of the transformed series.
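The weights in equation (5) are easy to tabulate. The sketch below is a minimal illustration, not the paper's code: it evaluates $2 - 2\cos(Z\omega)$ at several monthly cycle lengths for the lags plotted in Figure 1. The helper names are hypothetical.

```python
import math

def growth_filter_gain(omega: float, z: float) -> float:
    """Weight 2 - 2cos(z * omega) that the (1 - L^Z) filter places on frequency omega, equation (5)."""
    return 2.0 - 2.0 * math.cos(z * omega)

def period(omega: float) -> float:
    """Cycle length in months implied by a frequency in radians per month, T = 2*pi/omega."""
    return 2.0 * math.pi / omega

# Weights placed on cycles of 2, 3, 6, 12 and 24 months by lags Z = 1, 2, 3, 6.
for z in (1, 2, 3, 6):
    weights = {T: round(growth_filter_gain(2 * math.pi / T, z), 3) for T in (2, 3, 6, 12, 24)}
    print(f"Z={z}: {weights}")
```

For $Z = 3$ the weight is 4 on the 2- and 6-month cycles and 0 on the 3-month cycle, the q-o-q pattern described above; for $Z = 1$ the 12- and 24-month cycles receive weights of roughly 0.27 and 0.07.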
Because the $Z = 1$ filter attenuates cycles longer than six months while amplifying the highest frequencies, a model built using data transformed with $Z = 1$ should not be expected to capture cycles longer than six months or to forecast well at longer horizons in small samples. Conversely, growth rates using $Z > 1$ emphasize lower frequencies and may generate better forecasts at the corresponding horizons. For example, year-on-year and quarter-on-quarter growth rate transformations of monthly data preserve select lower frequencies. While such growth rates are special cases, this paper proposes choosing the lag so that the filter's weight is maximized at the frequency of interest: the lag in the growth rate filter is selected to maximize information at the frequencies that correspond to the respective forecast horizon. The method of targeted growth rate transformations can be defined as follows: target the lag in the growth rate, $Z$, to maximize the variance at frequencies related to the targeted horizon, $H$.

In practice, this implies that a forecaster estimates a separate model for each forecast horizon $H$, using data transformed with the $Z$ that maximizes the cyclical information available for that horizon. Despite the non-linearity of the spectral filter, the method can be summarized by a simple rule. Let $H, C \in \mathbb{N}^+$ and $Z \in \mathbb{R}^+$, so that the frequency corresponding to a cycle of length $H/C$, namely $\omega^* = 2\pi C/H$, is targeted. Then the optimal lag in the growth rate filter, which places the first maximum of $2 - 2\cos(Z\omega)$ at $Z\omega^* = \pi$, is $Z = H/(2C)$. For example, to maximize the cyclical information available at forecast horizon $H$ with $C = 1$, the rule is to select $Z_{C=1} = H/2$.

Targeted growth rate transformations have desirable properties for forecasting. Growth rates are backward-looking filters with no end-point bias. The method is easily implemented, and the level of a series can always be recovered. Moreover, the filter not only maximizes select lower frequencies but also excludes select higher frequencies. For integer values of $Z > 1$, the filter has $Z + 1$ extrema on $[0, \pi]$, located at $\omega = k\pi/Z$ for $k = 0, \dots, Z$. Hence, longer lags imply that zero weights are applied more often. The inclusion of some higher frequencies is advantageous, as model estimates are more stable in small samples. In contrast, applying a low-pass or band-pass filter to a series prior to estimation can also target low frequencies, but it is known to result in over-fitting and introduces end-point bias (see Azevedo and Pereira, 2013). The filtered series depends on the filter as well as on the spectral density of the original series.

The choice of the cycle to target with $Z$ depends on the available sample size and the forecast horizon. Specifically, modeling lower frequencies is difficult in small samples, and eliminating some higher frequencies allows the model to isolate the lower frequencies in estimation. Let us now explore the use of $C$.
One issue with low-frequency econometrics in small samples is that filtering the data to preserve only lower frequencies may exacerbate the small-sample problem, since a fixed sample contains fewer complete low-frequency cycles. Hence, in small samples, if the forecast horizon is too long or the sample too short, the forecaster may choose to target a frequency whose cycle is a fraction of the forecast horizon. In this case, using $C > 1$ may be more appropriate, as it allows more of the lower frequencies to be preserved, while attempting to forecast the horizon using the $C$-th cycle of the forecast. For example, the rule suggests transforming the data using $Z_{C=2} = H/4$ when targeting cycles half the length of the forecast horizon ($C = 2$). In this case, the second cycle of the forecast corresponds to the desired forecast horizon $H$. We explore the use of $C$ when applied to the real price of crude oil, but in general $C$ will depend on the sample size and on the information available at the frequencies that correspond to the respective forecast horizons. While there are no formal tests for choosing $C$, the ultimate criterion is forecast performance.
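The lag-selection rule $Z = H/(2C)$ can be sketched in a few lines. This is an illustrative helper under the assumptions stated in the text (monthly data, $H$ in months, $C$ a positive integer); the function names are hypothetical. The second function confirms that the chosen lag puts the maximum weight of 4 on the targeted cycle of length $H/C$.

```python
import math

def targeted_lag(horizon: int, c: int = 1) -> float:
    """Lag Z = H / (2C) that puts the maximum growth-rate filter weight on cycles of length H / C."""
    if horizon < 1 or c < 1:
        raise ValueError("horizon H and cycle fraction C must be positive integers")
    return horizon / (2 * c)

def gain_at_target(horizon: int, c: int = 1) -> float:
    """Filter weight 2 - 2cos(Z * omega) at the targeted frequency omega = 2*pi*C/H."""
    z = targeted_lag(horizon, c)
    omega = 2 * math.pi * c / horizon
    return 2.0 - 2.0 * math.cos(z * omega)

# Lags implied for monthly horizons of one to five years and C = 1, 2, 3.
for h in (12, 24, 36, 48, 60):
    print(h, [targeted_lag(h, c) for c in (1, 2, 3)])
```

For horizons that are multiples of 12 months and $C \in \{1, 2, 3\}$ the rule always returns an integer lag (for example, $Z = 12, 6, 4$ for $H = 24$). Other combinations can give a fractional $Z$, which would have to be rounded or otherwise handled; the text does not specify that choice.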

Up to this point, the derived filter was that of the log-difference growth rate transformation. However, growth rates can be calculated exactly as a percent change, $g_t = (y_t - y_{t-1})/y_{t-1}$, or approximated using log differences, $\tilde{g}_t = (1 - L)\ln(y_t)$. Since

$$ (1 - L)\ln y_t = \ln(1 + g_t) = g_t - \tfrac{1}{2} g_t^2 + \tfrac{1}{3} g_t^3 - \dots, $$

it follows that $\tilde{g}_t = g_t + e_t$, where $e_t = \sum_{n=2}^{\infty} (-1)^{n+1} g_t^n / n$. As $e_t$ is of a lower order of magnitude than $g_t$, the log difference is a close approximation for small values of $g_t$. Percent change is a non-linear filter, so it is not possible to derive exact simple rules for targeted transformations as for the log-difference filter. However, the rules for targeted growth rate transformations derived above for the log-difference growth rate are a first-order approximation for the percent-change growth rate filter.

This point is important, since two factors make the approximation less precise. Consider when the two growth rate filters differ. First, the magnitude of $e_t$ increases with the magnitude of the growth rate, so that if $\partial |g|/\partial Z > 0$ then $\partial |e|/\partial Z > 0$. This implies that taking larger lags when log-differencing creates a larger discrepancy between the two methods. Therefore, the rules for targeted filtering may only be a good approximation at low values of $Z$ when applied to exact percent changes. Second, suppose we want to calculate dynamic forecasts $\ln\hat{y}_{T+h|T}$ from estimated growth rates $\hat{g}_{T+i|T}$. Then, in general,

$$ \ln\hat{y}_{T+h|T} = \ln y_T + \sum_{i=1}^{h} \left( \hat{g}_{T+i|T} + \hat{e}_{T+i|T} \right). $$

Thus, the approximation error between the two methods can compound when forecasting at long horizons. For these reasons, this paper focuses on forecast horizons of up to five years using monthly data, as forecast performance is less consistent beyond five years.

A downside of using log differences instead of percent changes for forecasts at longer horizons is that, if a constant is estimated in sample where none exists in population, it accumulates linearly in the log forecast. Thus, at long horizons the forecast can exhibit large spurious trends, and the forecast log level may even fall below zero.
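To illustrate the first factor, the sketch below compares exact $Z$-lag percent changes with $Z$-lag log differences on a hypothetical series growing 1 percent per month. It uses the relation $(1 - L^Z)\ln y_t = g_t + e_t$ with $e_t = \ln(1 + g_t) - g_t$. The series and the helper names are assumptions made for illustration only.

```python
import math

def pct_change(y, z):
    """Exact Z-lag growth rate g_t = (y_t - y_{t-Z}) / y_{t-Z}."""
    return [(y[t] - y[t - z]) / y[t - z] for t in range(z, len(y))]

def log_diff(y, z):
    """Log-difference approximation (1 - L^Z) ln y_t."""
    return [math.log(y[t]) - math.log(y[t - z]) for t in range(z, len(y))]

def approx_error(g):
    """e = ln(1 + g) - g, so that the log difference equals g + e."""
    return math.log1p(g) - g

def approx_error_series(g, terms=50):
    """Truncated series sum_{n=2}^{terms} (-1)^(n+1) g^n / n, valid for |g| < 1."""
    return sum((-1) ** (n + 1) * g ** n / n for n in range(2, terms + 1))

y = [100.0 * 1.01 ** t for t in range(60)]  # hypothetical series growing 1% per month
for z in (1, 3, 12, 24):
    g = pct_change(y, z)
    d = log_diff(y, z)
    worst = max(abs(a - b) for a, b in zip(d, g))
    print(f"Z={z:2d}: max |log diff - pct change| = {worst:.5f}")
```

Because the $Z$-lag growth rate of this series rises with $Z$, the gap between the two transformations widens as the lag lengthens.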
Because a spuriously estimated constant accumulates linearly in log-difference forecasts, there is some advantage to calculating growth rates using percent changes rather than log differences at long forecast horizons in small samples. This paper thus proceeds using exact (percent-change) growth rate filtering.

3. Univariate Forecasts

This section applies targeted growth rate transformations to univariate forecast models of the real price of crude oil. The real U.S. dollar price of oil is measured as the monthly nominal U.S. refiners' acquisition cost of crude oil imports from the U.S. Energy Information Administration (EIA), deflated by the U.S. consumer price level from the Federal Reserve (henceforth, the real oil price). Alternative oil price series are examined in the next section. Figure 2 graphs the log-level real price of crude oil as well as the filtered real oil price series with $Z = 1$ and $Z = 6$.

Dynamic, recursive, out-of-sample forecasts are evaluated from 1992.1 to 2015.10. All models are estimated using data beginning in 1974.1. The forecast criteria reported include the recursive MSPE, expressed as a ratio relative to the no-change forecast. Success ratios quantify the accuracy of the forecast direction and represent the fraction of times the forecast correctly predicts the direction of change in the real price of oil. All forecast criteria are evaluated for the real level of the price of oil. The p-values of the success ratios are calculated following Pesaran and Timmermann (2009). The p-values of the recursive MSPE ratios are calculated following Diebold and Mariano (1995). An important caveat is that the Diebold and Mariano (1995) test is not valid in the presence of parameter estimation uncertainty, and no alternative test is available that could be used instead. The test is nevertheless reported with this caveat in mind.

3.1 Benchmarks

The analysis begins by considering a number of simple benchmark models for forecasting the real price of crude oil at longer horizons. The aim is to answer the open question of whether other types of simple models can provide better benchmarks than the standard no-change benchmark used at short horizons. At longer forecast horizons, mean reversion may be a more plausible assumption than a no-change forecast. For this reason, two simple mean-reverting models are examined and compared to the no-change forecast: exponential smoothing and backward-moving means.

The no-change forecast is motivated by the random-walk model. Forecasts are generated using $\hat{R}^{oil}_{T+H|T} = R^{oil}_T$, where $\hat{R}^{oil}_{T+H|T}$ is the forecast of the level of the real price of crude oil at the $H$-step-ahead forecast horizon.

Recursive exponential smoothing has been employed with relative success for the real price of gasoline by Baumeister et al. (2017) and is suitable for series without a trend.
This model converts the observed log-real level of the series, $r^{oil}_t$, into a smoothed series $\bar{r}^{oil}_t$ and uses the smoothed series as the forecast for all future horizons. That is, $\hat{r}^{oil}_{T+H|T} = \bar{r}^{oil}_T$ for all $H$. As for all forecasts, $\hat{r}^{oil}_{T+H|T}$ is converted to levels, $\hat{R}^{oil}_{T+H|T}$, by exponentiating when calculating forecast criteria. The smoothed series is constructed recursively from

$$ \bar{r}^{oil}_t = \alpha\, r^{oil}_t + (1 - \alpha)\, \bar{r}^{oil}_{t-1}, \quad t = 2, \dots, T, \tag{6} $$

where $\alpha \in [0, 1]$ is the smoothing parameter. The smaller $\alpha$, the smoother the smoothed oil price series. The degree of backward-looking smoothing, $\alpha$, is searched from 0.05 to 0.95 in increments of 0.05. An $\alpha$ of 0.3 is found to produce the lowest recursive MSPE ratios and the highest success ratios at horizons between one and five years. This is similar to the values commonly used for macroeconomic time series and is exactly the value employed in Baumeister et al. (2017). The relative performance of this model is reasonably robust to changes in this parameter.

This paper introduces the model of backward-moving means to forecast the real price of crude oil. The method is similar to recursive exponential smoothing but places equal weights on past observations. In particular, the forecast is generated using $\hat{r}^{oil}_{T+H|T} = \bar{r}^{oil}_T = P^{-1} \sum_{p=0}^{P-1} r^{oil}_{T-p}$. For example, if $P$ equals 24, the average of the real oil price over the last two years is used as the forecast for all future horizons. The value of $P$ is searched from 3 months up to the end of the available data in three-month increments. The results suggest that $P = 12$ produces the lowest recursive MSPE ratios and the highest success ratios for forecasts between one and five years.

The forecast performance of the simple benchmark models is presented in Table 1. The real price of oil forecasts from the exponential smoothing and backward-moving mean models can outperform no-change forecasts in both recursive MSPE ratios and directional accuracy for horizons between one and three years. For the recursive MSPE ratios, the models outperform the no-change forecast for horizons of up to five years. Interestingly, the backward-moving mean model with $P = 12$ outperforms the exponential smoothing model with $\alpha = 0.3$ at all horizons. This suggests that a no-change forecast of the mean real oil price over the last year outperforms other simple methods, such as the standard no-change forecast, at forecast horizons between one and three years.

In addition to evaluating forecast performance over the full evaluation sample, the evolution of forecast performance is presented in Figure 4. Ideally, the models would robustly outperform the no-change forecast not just by the end of the sample but consistently over the full sample. The first 30 periods are dropped to allow the law of large numbers to take effect.
Neither the exponential smoothing model nor the backward-moving mean model consistently outperforms the no-change forecast on both criteria over the full sample at any forecast horizon. In particular, the run-up in the real price of oil in the mid-2000s causes the no-change forecast to outperform both models at horizons of up to three years. Moreover, despite performing better by the end of the sample, the forecast from the backward-moving mean model with $P = 12$ is more volatile than that from the exponential smoothing model with $\alpha = 0.3$ over the period considered. Despite these flaws, the simple models outperform the no-change forecast most of the time at the two- and three-year horizons, especially in directional accuracy. This suggests a simple rule: use the mean of the real price over the last year, rather than this month's price, to forecast the real price of crude oil at the two- and three-year horizons.
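The two benchmark rules and the two evaluation criteria of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the smoothing recursion of equation (6) is initialized at the first observation (the text does not state the initialization), forecasts and actuals are real levels, and a zero predicted or realized change is counted as a miss in the success ratio. All function names are hypothetical, and no Pesaran-Timmermann or Diebold-Mariano p-values are computed.

```python
import math

def exp_smoothing_forecast(log_prices, alpha=0.3):
    """Flat forecast r_bar_T from the recursion in equation (6).
    Initialising the recursion at the first observation is an assumption."""
    smoothed = log_prices[0]
    for r in log_prices[1:]:
        smoothed = alpha * r + (1.0 - alpha) * smoothed
    return smoothed

def backward_mean_forecast(log_prices, p=12):
    """Flat forecast equal to the average of the last P log prices."""
    window = log_prices[-p:]
    return sum(window) / len(window)

def recursive_mspe_ratios(actual, model_fc, nochange_fc):
    """MSPE of the model relative to the no-change forecast, accumulated over the evaluation sample."""
    ratios, num, den = [], 0.0, 0.0
    for a, f, n in zip(actual, model_fc, nochange_fc):
        num += (a - f) ** 2
        den += (a - n) ** 2
        ratios.append(num / den if den > 0 else float("nan"))
    return ratios

def success_ratio(actual, model_fc, origin):
    """Share of forecasts whose predicted change from the forecast origin has the sign of the realised change."""
    hits = [(f - o) * (a - o) > 0 for a, f, o in zip(actual, model_fc, origin)]
    return sum(hits) / len(hits)

# Hypothetical example: a short log-price history and its two flat forecasts converted to levels.
history = [math.log(p) for p in (60, 62, 65, 63, 70, 72, 68, 66, 64, 67, 69, 71, 74)]
print(math.exp(exp_smoothing_forecast(history)), math.exp(backward_mean_forecast(history)))
```

Because the no-change forecast equals the level at the forecast origin, the same `origin` values serve as `nochange_fc` in `recursive_mspe_ratios` and as the reference point in `success_ratio`.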

3.2 ARI and ARFI

Univariate autoregressive (AR) models have been well documented to outperform no-change forecasts at horizons of up to six months (Alquist et al., 2013; Baumeister and Kilian, 2012) and serve as an intuitive way to illustrate how targeted growth rates modify the standard univariate Box-Jenkins method (Box et al., 2008). Let $y$ be a stationary process, $\phi_p$ denote the $p$-th autoregressive parameter, $c$ a constant, and $\varepsilon_t$ the error. The AR representation takes the form

$$ y_t = c + \sum_{p=1}^{P} \phi_p\, y_{t-p} + \varepsilon_t. \tag{7} $$

At shorter horizons, AR models estimated in log-real levels are preferred (Alquist et al., 2013). At longer horizons, it is an open question whether the AR model estimated in log-real levels or in growth rates generates better forecasts. For comparison, the analysis includes AR models estimated in log-real levels and in period-over-period growth rates.

Figure 3 illustrates the sample parametric spectral density of the AR model estimated in growth rates with $Z = 1$ and with $Z$ chosen to target frequencies corresponding to the two-year horizon, with $C \in \{1, 2, 3\}$. The model is estimated using 24 lags. The parametric sample density is similar for models estimated with 12 or 18 lags, or when the lag order is selected using the AIC criterion (Akaike, 1974). The model is estimated using maximum likelihood, although estimates obtained using least squares are similar. The parametric spectral density can be calculated as

$$ f(\omega; \phi, \sigma^2_\varepsilon, \gamma_0) = \frac{\sigma^2_\varepsilon}{2\pi\gamma_0} \left| 1 - \phi_1 e^{-i\omega} - \phi_2 e^{-i2\omega} - \dots - \phi_P e^{-iP\omega} \right|^{-2}, \tag{8} $$

where $\omega \in [0, \pi]$, $\gamma_0$ is the variance of the variable, and $\sigma^2_\varepsilon$ is the variance of the error; see Box et al. (2008). The area under the sample spectral density represents the portion of the variance of the series attributable to cycles of different frequencies, $\omega$. The sample spectral densities show that targeted growth rates increase the portion of the variance attributable to cycles 24 periods in length.
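Equation (8) and the variance-share interpretation can be sketched as follows. The AR coefficients, error variance, and $\gamma_0$ are passed in directly. For a fitted model one could plug in estimated coefficients and the sample variance of the transformed series for $\gamma_0$; that choice is an assumption here. The AR(1) values in the example are hypothetical, and the function names are illustrative.

```python
import cmath
import math

def ar_spectral_density(omega, phis, sigma2, gamma0):
    """Normalised AR(P) spectral density of equation (8):
    sigma2 / (2*pi*gamma0) * |1 - sum_p phi_p exp(-i p omega)|^{-2}."""
    poly = 1 - sum(phi * cmath.exp(-1j * (p + 1) * omega) for p, phi in enumerate(phis))
    return sigma2 / (2 * math.pi * gamma0) / abs(poly) ** 2

def variance_share(phis, sigma2, gamma0, omega_lo, omega_hi, n=2000):
    """Share of the variance due to frequencies in [omega_lo, omega_hi], with 0 <= omega_lo < omega_hi <= pi.
    The density is symmetric in omega, so the band and its mirror image are both counted (factor 2).
    The integral is approximated with the midpoint rule on n sub-intervals."""
    width = (omega_hi - omega_lo) / n
    area = sum(ar_spectral_density(omega_lo + (k + 0.5) * width, phis, sigma2, gamma0) for k in range(n))
    return 2 * area * width

phis = [0.5]                      # hypothetical AR(1) coefficient
gamma0 = 1.0 / (1.0 - 0.5 ** 2)   # AR(1) variance when the error variance is 1
share = variance_share(phis, 1.0, gamma0, 2 * math.pi / 36, 2 * math.pi / 12)
print(f"share of variance in 12- to 36-month cycles: {share:.3f}")
```

With $\gamma_0$ set to the model-implied variance, the shares over $[0, \pi]$ sum to one. This is the sense in which the area under the density in Figure 3 measures the portion of the variance attributable to each cycle length.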
The increase in the variance share at the 24-month cycle suggests that a model estimated using data transformed by targeted growth rates is better able to isolate cycles two years in length. Forecast performance may be superior at horizons that correspond to the frequencies where the largest portion of the variance is explained.

Simple autoregressive fractionally integrated (ARFI) models are included as a comparison to the model estimated in log-real levels, since they may increase estimation efficiency for series with long memory. The ARFI model takes the form

$$ \hat{\rho}(L)(1 - L)^d y_t = \varepsilon_t, \tag{9} $$

where $d$ is the order of integration, estimated using maximum likelihood. The model is estimated in log-real levels, and one autoregressive lag is found to suffice.

AR models estimated with 12 lags are generally found to produce superior forecasts at shorter horizons compared to other fixed lag lengths or to lag selection using information criteria (Alquist et al., 2013; Baumeister and Kilian, 2012). However, which lag order is suitable for longer-horizon forecasting is an open question for two reasons. First, previous papers focused on short horizons, and the choice of lag may affect forecast performance at longer horizons. Second, the use of $Z > 1$ is new. For example, using period-over-period growth rates implies that one lag is lost compared to a model estimated in log-real levels. In contrast, when using targeted growth rates with $Z > 1$, the lower frequencies are emphasized, which results in more partial autocorrelation than with $Z = 1$. Given model uncertainty and the commonly employed benchmark of 12 lags, models estimated using $Z > 1$ are also estimated with 12 lags in the benchmark results. Results for models estimated with 18 or 24 lags, or with the lag order selected using the AIC criterion (Akaike, 1974), are qualitatively similar.

The method of targeted growth rate filters is applied to AR models. The method is directly comparable to $Z = 1$, using targeted filtration with $C \in \{1, 2, 3\}$ and $Z = H/(2C)$. The parameters of all models considered are estimated recursively in each time period, and the out-of-sample dynamic forecasts are evaluated against observed data for 1992.1 to 2015.10. The models are estimated using data from 1974.1. Forecasts in log-real levels are converted back to real levels by exponentiating.
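The fractional difference operator $(1 - L)^d$ in equation (9) has a simple binomial expansion, $(1 - L)^d = \sum_{k \ge 0} \pi_k L^k$, with $\pi_0 = 1$ and $\pi_k = \pi_{k-1}(k - 1 - d)/k$. The sketch below only applies this operator for a given $d$. It does not reproduce the maximum-likelihood estimation of $d$ and $\hat{\rho}(L)$ used in the paper. Truncating the expansion at the start of the sample, which treats pre-sample values as zero, is an assumption, and the function names are hypothetical.

```python
def frac_diff_weights(d, n):
    """First n coefficients pi_k of (1 - L)^d = sum_k pi_k L^k,
    using pi_0 = 1 and pi_k = pi_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def frac_diff(y, d):
    """Apply (1 - L)^d to y, truncating the expansion at the start of the sample
    (pre-sample values are treated as zero)."""
    w = frac_diff_weights(d, len(y))
    return [sum(w[k] * y[t - k] for k in range(t + 1)) for t in range(len(y))]

# d = 1 reproduces the first difference; 0 < d < 1 gives slowly decaying weights (long memory).
print(frac_diff_weights(0.4, 6))
```

With $d = 1$ the weights collapse to $1, -1, 0, 0, \dots$, so the operator nests the first difference, while fractional $d$ spreads weight over many past observations.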
Forecasts using growth rates, $X^{oil}_{t,Z} = 100\,(R^{oil}_t / R^{oil}_{t-Z} - 1)$, are mapped into real levels using the data available at the point of the forecast. That is,

$$ \hat{R}^{oil}_t = \left(1 + \hat{X}^{oil}_{t,Z}/100\right) \hat{R}^{oil}_{t-Z}, \tag{10} $$

where $\hat{R}^{oil}_{t-Z}$ uses historical data up to $T$ and recursively estimated forecasts thereafter.

Table 1 reports the forecasts from these models. Consistent with previous findings, the model estimated in log-real levels has lower MSPE ratios than the model in growth rates with $Z = 1$ at horizons of less than a year. The ARFI model is able to outperform both the AR model in log-real levels and the model with $Z = 1$, as well as the no-change forecast, for horizons of up to one year. However, as shown in Figure 4, the end-of-sample improvement in the MSPE ratios of the ARFI model at the one-year horizon comes at the cost of poor performance between 2004 and 2008. The ARFI forecasts also provide directional accuracy comparable to that of the AR models, although slightly lower than that of the model estimated with $Z = 1$. None of these methods is able to outperform the no-change forecast in both directional accuracy and MSPE ratios for horizons between two and five years.

In contrast, the models whose data were transformed using targeted growth rates produce improvements in success and MSPE ratios for horizons of up to four years. Targeted filtration with $C = 1$ produces superior forecasts compared to the AR and ARFI models at horizons between one and three years. At horizons of two years and beyond, values of $C$ greater than 1 are desirable, as the small sample makes isolating the lower frequencies more difficult. In particular, $C = 1$ has lower MSPE ratios in the first two years, while $C = 3$ outperforms beyond two years. Directional accuracy of the forecasts using targeted filtration is high at all horizons. Success ratios greater than 0.6 are found for horizons of one year and beyond for the model estimated with $C = 3$. These are large values compared to the empirical finance literature (Pesaran and Timmermann, 1995) and even slightly larger than the short-horizon success ratios of the multivariate modeling method of Baumeister and Kilian (2013).

Univariate autoregressive models with targeted growth rate transformations outperform no-change forecasts in both MSPE and success ratios for most of the sample up to the three-year-ahead horizon. In Figure 5, the model with $C = 1$ is shown for the first two years, and the model with $C = 3$ is shown for horizons of three years and beyond. Targeted transformations can consistently improve out-of-sample forecast performance at longer horizons compared to models that rely on period-over-period growth rates. Success ratios are consistently higher than 0.5, and the recursive MSPE ratios are more often below 1, up to the three-year-ahead horizon.
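The mapping in equation (10) from $Z$-lag growth rate forecasts back to levels can be sketched as below. The sketch assumes a history of at least $Z$ observed levels up to the forecast origin $T$ and a list of growth forecasts $\hat{X}^{oil}_{T+h,Z}$ for $h = 1, \dots, H$, in percent. How those growth forecasts are produced (one targeted model per horizon, as described above) is not reproduced, and the function names are hypothetical.

```python
def growth_rates(levels, z):
    """X_{t,Z} = 100 * (R_t / R_{t-Z} - 1) for every t with an observation Z periods earlier."""
    return [100.0 * (levels[t] / levels[t - z] - 1.0) for t in range(z, len(levels))]

def growth_forecasts_to_levels(history, growth_fc, z):
    """Map forecasts of X_{T+h,Z}, h = 1..H, into levels with equation (10):
    R_hat_{T+h} = (1 + X_hat_{T+h,Z} / 100) * R_hat_{T+h-Z},
    where R_hat_{T+h-Z} is observed data if T+h-Z <= T and an earlier forecast otherwise."""
    if len(history) < z:
        raise ValueError("need at least Z observations before the forecast origin")
    path = list(history)
    for x in growth_fc:
        path.append((1.0 + x / 100.0) * path[-z])  # path[-z] is the level dated T+h-Z
    return path[len(history):]

# Hypothetical example: flat history and a constant 10% year-over-year growth forecast.
print(growth_forecasts_to_levels([100.0] * 12, [10.0] * 13, 12)[-2:])
```

For the first $Z$ forecast steps the base $\hat{R}^{oil}_{T+h-Z}$ is observed data. After that, the recursion feeds earlier forecasts back in, which is why errors in the growth forecasts compound at longer horizons.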
3.3 Müller-Watson

The application of targeted growth rate transformations can be useful for any method in which growth rate transformations are present. This section examines the performance of the method when applied to a variant of the modeling techniques of Müller and Watson (2016). In particular, the real price of crude oil is forecast using a weighted average of trigonometric functions. Note that the model used to forecast the real price level is based on the specification in Müller and Watson (2015), not on an average growth rate as in Müller and Watson (2016). Specifically, the forecast $\hat{r}^{oil}_{T+h|T}$ is constructed using

$$ r^{oil}_t = \bar{r} + \Psi\!\left((t - 1/2)/T\right)' \hat{\beta} + \varepsilon_t, \tag{11} $$

where $\Psi_j(s) = \sqrt{2}\cos(js\pi)$, so that $\Psi_j(s)$ has period $2/j$ in $s$, that is, $2T/j$ in $t$. $\Psi(s) = (\Psi_1(s), \dots, \Psi_q(s))'$ is an $\mathbb{R}^q$-valued function, and $\Psi_T$ is the $T \times q$ matrix obtained by evaluating $\Psi$ at $s = (t - 1/2)/T$, $t = 1, \dots, T$. As in Müller and Watson (2016), $q = 12$, although variations are not found to affect the qualitative results.

Table 2 reports the forecasts from this model when it is estimated in log-real levels, with $Z = 1$, and with $C \in \{1, 3\}$. The model is able to achieve good directional accuracy and MSPE ratios below one at the one- and two-year horizons. At horizons greater than two years, the method does not fare well. Moreover, when the model is estimated with $Z = 1$, the MSPE ratios do not outperform the no-change forecast at any horizon, although the model still produces success ratios greater than 0.5. With targeted growth rate transformations, the MSPE ratios improve relative to the model with $Z = 1$, although they still do not outperform the no-change forecast. Interestingly, the success ratios are quite large for the model with $C = 1$, approaching 0.72 at the three-year-ahead horizon. The results suggest that targeted growth rate transformations improve accuracy compared to the model with period-over-period growth rates.

The failure of the Müller-Watson framework to produce low MSPE ratios by the end of the sample is mainly due to inconsistent forecast performance over time. During selected historical episodes the forecasts perform very poorly, even though the squared errors can be very low at other times in the sample. The performance of the forecasts is quite variable, albeit often with the correct sign for directional accuracy. As parameters are estimated recursively, the results may suggest overfitting in small samples. Alternative forms of information updating may be worth exploring but are left to future research.

4. VAR Forecasts

This section evaluates the forecast performance of targeted filtration when applied to vector autoregression (VAR) models.
The VAR takes the following form:

$$y_t = c + \sum_{p=1}^{P} B_p y_{t-p} + \alpha d_t + \varepsilon_t \qquad (12)$$

where $\varepsilon_t$ is a vector of innovations, $c$ is the vector of constants, $d_t$ is the vector of seasonal dummy variables and $\alpha$ is the corresponding coefficient matrix, $B_p$, $p = 1, \ldots, P$, are the matrices of autoregressive coefficients, and $y_t$ is the vector of endogenous variables. The multivariate analysis favors variables related to the economic fundamentals of the global crude oil market. In particular, the VAR models use series related to latent global crude oil supply, demand, and inventories. This would allow for structural analysis if additional restrictions were imposed following, for example, Kilian and Lee
(2014) and Kilian and Murphy (2014). That said, the forecasting models used in this paper are unrestricted, as the focus is on forecast performance. Some liberty is taken by allowing a U.S. petroleum inventory series that may be inappropriate for structural analysis but improves long-run forecast performance. The author notes where it is used and also presents results for the next-best inventory measure. Such series are included in the forecast exercise so that researchers can weigh for themselves the trade-off between forgoing a structural interpretation and gaining forecast precision.

Global crude oil production and the real price of crude oil are measured reliably at the monthly frequency. In contrast, several alternative measures of global real activity and of oil inventories exist. The multivariate analysis therefore begins with a systematic investigation of the forecast performance of these alternative measures. The purpose is to examine whether the measures currently used for forecasting up to the two-year horizon remain the preferred variables when forecasting at longer horizons.

4.1 Alternative Series

As with the univariate forecasts, the real price of oil is measured as the monthly nominal U.S. refiners' acquisition cost of crude oil imports deflated by the U.S. consumer price level. Forecasts of the real Brent price of crude oil are also considered in section 4.3. Global crude oil production is measured as international production of crude oil from the EIA, which includes lease condensate but excludes natural gas plant liquids. Five oil inventory series are systematically evaluated for their forecasting performance: the crude oil inventories of Kilian and Murphy (2014), U.S. crude oil inventories with and without the strategic petroleum reserve (SPR), U.S. petroleum inventories excluding the SPR, and an OECD crude oil inventory series.
The OECD crude inventory series uses the level of OECD crude inventories when available and backcasts the series before 1987.11 using the period-over-period growth rate of U.S. crude oil inventories excluding the SPR. The petroleum series is inconsistent with a structural interpretation of the global market for crude oil but is nonetheless included to evaluate forecast accuracy. In addition, two measures of global real economic activity are evaluated: world industrial production (WIP) from Global Data Services and Kilian's global real economic activity index (REA) (Kilian, 2009). A complete list of variables with their sources, date ranges, and summary statistics is given in Table 6 in the appendix.
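The backcasting step used to extend the OECD crude inventory series before 1987.11 can be sketched as follows. This is a minimal illustration, not the author's code: the function name `backcast_with_growth` and the toy numbers are hypothetical, and it assumes both series share a common monthly index, with the OECD level missing before index `first_obs`.

```python
def backcast_with_growth(level, proxy, first_obs):
    """Fill level[0:first_obs] backwards using the period-over-period growth
    of proxy, so that level[t] / level[t + 1] == proxy[t] / proxy[t + 1].

    level: list of floats with valid values from index first_obs onward
    proxy: list of floats (e.g. U.S. crude inventories ex SPR), same length
    """
    out = list(level)
    for t in range(first_obs - 1, -1, -1):
        # Divide the later level by the proxy's gross growth from t to t + 1.
        out[t] = out[t + 1] * proxy[t] / proxy[t + 1]
    return out


# Toy example: the proxy grows 10% per month, so the backcast level
# falls by the same factor at each step back in time.
proxy = [100.0, 110.0, 121.0, 133.1]
oecd = [None, None, 50.0, 55.0]
print(backcast_with_growth(oecd, proxy, first_obs=2))
```

Splicing on growth rates rather than on levels preserves the observed OECD level from 1987.11 onward, while the earlier months inherit only the proxy's month-to-month dynamics.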

The WIP series includes both advanced and emerging market economies. A country's measured industrial production (IP) is incorporated, with adjustments for its inclusion, once that country's data become available. For example, China's IP is included from 1992.1, when it began to be measured, Brazil's from 1985.1, and India's from 1971.1. Although not reported here, IP indexes for advanced economies, for developing economies, and an MER-weighted index were also compared, but the global series consistently had better forecast performance.

The real price of crude oil is estimated either in log-real levels or in percent change. The REA is always estimated in levels, and measures of oil production and inventories are always estimated in growth rates. Using growth rates rather than differences for inventories departs from the approach recommended by Kilian (2009), but it keeps the transformation consistent with the other variables and does not change the relative forecast performance. Forecasts are always converted into real levels before forecast performance is evaluated.

VAR models estimated with 12 lags are generally found to produce superior forecasts at shorter horizons compared with other fixed lag lengths or with lag lengths chosen by information criteria (Alquist et al., 2013; Baumeister and Kilian, 2012). Again, at longer horizons and with Z > 1, the appropriate lag length is an open question, so models estimated with 24 lags are also explored. With either lag order, the residuals pass the portmanteau (Q) test for white noise (Box and Pierce, 1970; Ljung and Box, 1978), whether the models are estimated in log-real levels or in percent change. To evaluate the forecast performance of the alternative series, ten models are compared: every combination of the five inventory series with the two measures of global real activity.
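To make equation (12) and the residual check concrete, the sketch below estimates a VAR(P) equation by equation with OLS, including a constant and monthly seasonal dummies, and computes a Ljung-Box Q statistic for each residual series. It is an illustration under stated assumptions, not the paper's code: it uses NumPy, the function names are hypothetical, one month's dummy is omitted to avoid collinearity with the constant, the first observation is assumed to fall in that omitted month, and the univariate Ljung-Box statistic is applied equation by equation as a simplification of a multivariate portmanteau test. Simulated data stand in for the oil-market series.

```python
import numpy as np


def fit_var(y, p, period=12):
    """OLS estimates of y_t = c + sum_{l=1}^{p} B_l y_{t-l} + alpha d_t + e_t.

    y: (T, k) array of (transformed) endogenous variables.
    Returns (coef, resid). coef has shape (1 + k*p + period - 1, k) and stacks,
    row-wise, the constant, each lag block (B_l transposed) and the dummy
    coefficients; resid has shape (T - p, k).
    """
    y = np.asarray(y, dtype=float)
    T, k = y.shape
    rows = []
    for t in range(p, T):
        lags = [y[t - l] for l in range(1, p + 1)]
        d = np.zeros(period - 1)
        month = t % period  # assumes observation 0 is in the omitted month
        if month > 0:
            d[month - 1] = 1.0
        rows.append(np.concatenate([[1.0], *lags, d]))
    X = np.array(rows)
    Y = y[p:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef, Y - X @ coef


def ljung_box_q(e, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{m} r_k^2 / (n - k) for one series."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    n = e.size
    denom = e @ e
    q = sum((e[k:] @ e[:-k] / denom) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * q


# Simulated bivariate VAR(1) as a stand-in for the oil-market data.
rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1], [0.0, 0.3]])
y = np.zeros((5000, 2))
for t in range(1, 5000):
    y[t] = np.array([1.0, 0.5]) + B @ y[t - 1] + 0.1 * rng.standard_normal(2)

coef, resid = fit_var(y, p=1)
print(np.round(coef[1:3].T, 2))  # estimated B_1, should be close to B
print([round(ljung_box_q(resid[:, j], 12), 1) for j in range(2)])
```

Under the white-noise null, each Q is compared with a chi-squared distribution with `max_lag` degrees of freedom; switching between 12 and 24 lags only changes the `p` argument.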
Recursive, dynamic, out-of-sample forecasts are conducted over the 1995.1–2015.10 evaluation period.¹ All models are estimated with samples beginning in 1974.1. A complete summary of forecast performance across lag lengths, series, and oil price transformations is available in the appendix.

¹ The evaluation period starts relatively late because Z is very large in a few cases. In those cases, too much of the sample is lost to the growth rate transformation, and forecasts are clearly nonsensical in the early part of the sample.

Consider first the case in which the real price of oil is estimated in log-real levels and all other variables except the REA are estimated in percent change with Z = 1. Among the inventory series, with either WIP or the REA, U.S. petroleum inventories, followed by U.S. crude oil inventories, produce the lowest MSPE ratios at horizons between one and two years. The success ratios slightly favor U.S. crude oil inventories including the SPR, followed closely by U.S. petroleum inventories, at the one- and two-year horizons. At the three- to five-year horizons, U.S. petroleum inventories have
similar success to the OECD crude oil inventories. The OECD crude oil inventory series outperforms the Kilian and Murphy (2014) crude oil inventory series in both MSPE ratios and directional accuracy, especially at the four- and five-year horizons. The extra predictive power of U.S. crude oil inventories at longer horizons, compared with the Kilian and Murphy (2014) series, is consistent with the improvement in predictive power for short-run forecasts first established by Baumeister et al. (2015) and used in Baumeister et al. (2014) and Baumeister and Kilian (2015). For every oil inventory series, the largest MSPE improvements at horizons between one and five years come from using WIP rather than the level of the REA. The improvement in the MSPE from using WIP sometimes comes at the cost of smaller success ratios than with the REA, especially at the four- and five-year horizons.

Forecast performance when all variables except the REA are estimated in growth rates with Z = 1 differs from that when the real price of crude oil is estimated in log-real levels. The MSPE ratios are substantially lower for the models estimated with WIP than for those with the REA, especially beyond the one-year-ahead horizon. With the REA, the success ratios are slightly higher in some cases at the four- and five-year horizons. The performance of the OECD oil inventories is generally consistent with the case in which the real price of oil is estimated in log-real levels. The U.S. crude oil and petroleum inventory series, with and without the SPR, are generally the best predictors, followed by the OECD inventory series. With WIP, the results are quite similar whether oil prices enter the VAR in log-real levels or in period-over-period growth rates.
The model with the oil price in log-real levels has lower MSPE ratios at horizons up to one year and at four and five years, but underperforms the model with real prices transformed with Z = 1 at the two- and three-year horizons. In contrast, when the forecasts use the level of the REA, the log-real price is clearly preferred. The superior performance of WIP at longer horizons may be an artifact of not using real-time data; as is often the case, forecast performance in real time may differ from that obtained with historical data. Nevertheless, the evidence suggests that alternative series provide unique information at select horizons and in select specifications. This is explored further below with targeted growth rates.

The relative performance of the series persists when the VAR model is estimated with 24 lags. Models with 12 lags generally outperform models with 24 lags at horizons up to one year, whereas models with 24 lags generally have lower MSPE ratios at all horizons between one and five years. The comparison is complicated by a general loss of directional accuracy at three years and beyond for the models with 24 lags, which creates a trade-off between MSPE ratios and success ratios at those horizons. Note that this trade-off has not been found in previous studies because of their
focus on shorter horizons. It is, however, confirmed by the univariate analysis of the previous section. Since the MSPE ratios and directional accuracy should ideally agree, this suggests that when forecasting at longer horizons, the VAR model with 24 lags may be preferable at the one- and two-year-ahead horizons and the VAR model with 12 lags at the three- to five-year horizons.

4.2 Targeted Growth Rates

The method of targeted growth rates is applied to VAR models in order to improve forecast performance at horizons between one and five years. The forecasting exercise is identical to that described in the previous section, except that targeted growth rates are now applied to all transformed series. We begin by analyzing the effect on forecast performance of applying targeted growth rates to the VAR model with WIP and U.S. crude oil inventories, using 12 lags. The filtered series depends on the filter as well as on the spectral density of the original series. Hence, the relative performance of the alternative series found above need not carry over once targeted transformations are employed.

Table 3 presents the forecast performance of the VAR models when the real price of crude oil is in log-real levels and when it is in growth rates; all other variables are transformed into growth rates, and the model uses U.S. crude oil inventories and WIP. When real oil prices are in log-real levels, using C = 1 improves both the MSPE ratios and the directional accuracy of the one- to three-year-ahead forecasts. Using C = 3 improves MSPE ratios at almost all horizons relative to the model using growth rates with Z = 1. Using C = 2 yields MSPE ratios and success ratios between those for C = 1 and C = 3, so these results are not reported in the tables for brevity.
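Because a Z-period growth rate is the linear filter $1 - L^Z$ applied to the log level, its effect on each frequency can be read from its gain, $|1 - e^{-i\omega Z}| = 2|\sin(\omega Z/2)|$. The sketch below computes a Z-period log growth rate, rebuilds level forecasts from forecast growth rates (the conversion to real levels described in section 4.1), and evaluates the gain. It is a hedged illustration: log differences stand in for the paper's percent changes, the helper names are hypothetical, and the mapping from C and the forecast horizon to Z is not restated in this section, so `z` is simply an input here.

```python
import math


def growth_rate(x, z):
    """Z-period log growth rate g_t = ln x_t - ln x_{t-z}, for t = z..len(x)-1."""
    return [math.log(x[t]) - math.log(x[t - z]) for t in range(z, len(x))]


def levels_from_growth(history, g_forecast, z):
    """Convert forecast growth rates into level forecasts recursively:
    ln x_{T+h} = ln x_{T+h-z} + g_{T+h}. Requires len(history) >= z."""
    logs = [math.log(v) for v in history]
    for g in g_forecast:
        logs.append(logs[-z] + g)  # logs[-z] is the log level z periods back
    return [math.exp(v) for v in logs[len(history):]]


def gain(omega, z):
    """Gain of the lag-z difference filter 1 - L^z at frequency omega (radians)."""
    return abs(2.0 * math.sin(omega * z / 2.0))


# A 12-month growth rate passes 24-month cycles with gain 2, attenuates very
# long cycles, and removes cycles whose period divides 12 months.
z = 12
for period in (240, 24, 12, 2):
    print(period, round(gain(2 * math.pi / period, z), 3))
```

With z = 1 the gain 2|sin(ω/2)| rises steadily from 0 at ω = 0 to 2 at ω = π, so period-over-period growth rates weight the highest frequencies most heavily; raising z moves the first peak down to ω = π/z, that is, to cycles of 2z months.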
Targeted growth rates are unable to improve directional accuracy at the four- and five-year-ahead horizons, despite the improvements in the MSPE ratios. The MSPE ratios and success ratios agree when the real price of oil is modeled in percent change. In particular, the model with C = 3 outperforms the model with Z = 1 in almost all periods, in both MSPE ratios and directional accuracy, at horizons of one year and beyond.

The performance of the targeted transformations improves further at the one- and two-year horizons when they are applied to VAR models estimated with 24 lags and oil prices in log-real levels (see Table 4). Although the MSPE ratios are generally lower than for the model with 12 lags, this comes at the cost of slightly lower success ratios at horizons of three years and beyond. The model with variables transformed using C = 3 improves both directional accuracy and the MSPE ratios relative to the VAR model with Z = 1. As noted previously, it may be preferable to use the VAR model with
24 lags at the one- and two-year horizons, but the model with 12 lags at the three-year horizon and beyond, so that the two forecast criteria agree. For most specifications, the U.S. petroleum inventory series produces superior forecasts when transformed with targeted growth rates, suggesting that it carries more information at the targeted lower frequencies. Again, although this may preclude a structural interpretation, Table 4 presents forecast performance using both U.S. crude oil and U.S. petroleum inventories. The MSPE ratios are lower for the model using U.S. petroleum inventories in almost all cases, with a trade-off of slightly lower success ratios, especially at longer horizons. For researchers seeking a structural interpretation, this large improvement in MSPE ratios must be weighed against the loss of that interpretation.

Ideally, VAR forecasts using targeted growth rates would outperform the traditional VAR models not just at the end of the sample but consistently over time. Moreover, it would be remarkable if an individual model could outperform the no-change forecast consistently over the entire sample. To this end, Figure 6 presents the evolution of the MSPE ratios and the success ratios of the baseline VAR at the one- to five-year horizons. The model uses WIP and U.S. petroleum inventories; the results using U.S. crude oil inventories give the same ranking of models and the same qualitative insights. The VAR model at the one- and two-year horizons employs 24 lags, and the models at the three- to five-year horizons employ 12 lags. The MSPE ratios and success ratios are reported by month, with the first 30 periods dropped so that the averages have enough observations to settle. The VAR model with crude oil prices estimated in log-real levels or with Z = 1 is unable to consistently outperform the no-change forecast in MSPE. Moreover, the success ratios are below 0.5 for most of the sample at the one- and two-year-ahead horizons.
This evidence is consistent with the finding that individual monthly VAR models have difficulty forecasting consistently well over all sample periods (Baumeister et al., 2014; Baumeister and Kilian, 2015). Interestingly, and unlike the univariate case, the two models perform similarly at horizons of up to two years, with both models' performance improving and deteriorating at similar times. When the real price of oil is estimated in log-real levels, transforming the other variables with C = 3 consistently outperforms using Z = 1 in both MSPE ratios and directional accuracy. At the four- and five-year horizons, the MSPE ratio improvements from targeted growth rates come at the cost of lower success ratios. Similar improvements hold when all variables are modeled in percent change. When targeted growth rates with C = 3 are used for all variables, the model consistently outperforms the model with Z = 1 up to the three-year horizon. At horizons of four years and beyond, the model with C = 3 outperforms the model with Z = 1 for most of the sample.
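The recursive MSPE ratios and success ratios reported in Figure 6 can be sketched as follows, using the definitions common in this literature: the MSPE ratio divides the model's MSPE by that of the no-change forecast, and the success ratio is the share of forecasts that predict the correct direction of change relative to the value at the forecast origin. This is a sketch, not the author's code: the function names are hypothetical, ties (no predicted or realized change) count as misses, and it assumes that dropping the first 30 periods means the cumulative ratios are computed from the start of the evaluation period but reported only from the 31st forecast onward.

```python
def mspe_ratio(actual, forecast, current):
    """MSPE of the model forecast divided by MSPE of the no-change forecast."""
    model = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    no_change = sum((a - c) ** 2 for a, c in zip(actual, current))
    return model / no_change


def success_ratio(actual, forecast, current):
    """Share of forecasts whose predicted change has the sign of the realized change."""
    hits = sum(1 for a, f, c in zip(actual, forecast, current) if (f - c) * (a - c) > 0)
    return hits / len(actual)


def recursive_ratios(actual, forecast, current, burn_in=30):
    """Cumulative (MSPE ratio, success ratio) after each forecast,
    reported only after the first burn_in evaluation points."""
    out = []
    for n in range(burn_in + 1, len(actual) + 1):
        a, f, c = actual[:n], forecast[:n], current[:n]
        out.append((mspe_ratio(a, f, c), success_ratio(a, f, c)))
    return out


# actual: realized real price h months ahead; current: price at the forecast origin,
# which is also the no-change forecast.
actual = [2.0, 0.0, 2.0]
forecast = [1.5, 1.5, 1.5]
current = [1.0, 1.0, 1.0]
print(recursive_ratios(actual, forecast, current, burn_in=1))
```

An MSPE ratio below one and a success ratio above 0.5 at a given month indicate that, up to that month, the model has beaten the no-change forecast on the respective criterion.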

The model estimated in percent change with targeted growth rates consistently outperforms the no-change forecast at horizons up to three years. The performance of the model exploiting targeted growth rates is particularly notable at the two-year-ahead horizon, with the MSPE ratio close to 0.8 and the success ratio close to 0.6 for most of the sample. This consistency is better than that observed at shorter forecast horizons and better than has previously been observed for any individual monthly VAR forecast.

The insight that data transformations are explicit filters is quite useful. Previous efforts to forecast the real price of oil have focused on log-real level or log-differenced data with Z = 1, so it should be expected that these models are suited to forecasting at shorter horizons. Moreover, other filters, including backward-looking moving-average filters, could also be beneficial for targeting select lower frequencies. For example, using data at a lower observation frequency, such as annual or quarterly data, is a form of moving-average filter. This may explain the evidence that time series models that explicitly model trends and are estimated on annual data can outperform random walk forecasts in the long run (see, for example, Bernard et al., 2017). It could also explain why VAR models estimated with quarterly data generally perform better at horizons up to two years (Baumeister and Kilian, 2014; 2015).

4.3 VAR Robustness

As shown in the previous sections, the forecast performance up to five years is similar to the performance previously found at shorter horizons by Alquist et al. (2013) and Baumeister and Kilian (2012). In these previous studies, the VAR model was not intended for forecasting beyond short horizons, except for use in forecast combinations. The improved performance at longer horizons found in this paper can be attributed to the targeted transformations as well as to the choice of variables.
We now employ the REA index of Kilian (2009) for VAR forecasts at longer horizons. The REA may be advantageous because it is better suited to real-time studies. Table 5 replicates the baseline VAR results using the level of the REA. When the price is modeled in log-real levels, the model using the level of the REA performs very similarly at the end of the sample to the model using WIP. Applying targeted growth rates to the oil production and inventory variables further improves forecast performance, producing low MSPE ratios and high success ratios up to the three-year horizon with C = 1. In contrast, when the real price of oil is modeled in percent change with the REA, forecast performance is poor at longer horizons. These results suggest that, when the real price of oil is modeled in log-real levels, the REA of Kilian (2009) closely matches the performance of WIP.