Forecasting Short-Term Timber Prices with Univariate ARIMA Models

Runsheng Yin, Warnell School of Forest Resources, The University of Georgia, Athens, GA 30602.

ABSTRACT: In this paper, we conduct timber price forecasts with univariate autoregressive-integrated-moving-average, or ARIMA, models employing the standard Box-Jenkins modeling strategy. Using quarterly price series from Timber Mart-South, we find that most of the selected pine pulpwood and sawtimber markets can be evaluated using ARIMA models, and that short-term forecasts, especially one-lead forecasts, are fairly accurate. We believe that forecasting future prices could aid timber producers and consumers alike in timing harvests, reducing uncertainty, and enhancing efficiency. South. J. Appl. For. 23(1), 1999.

In testing the hypothesis that timber markets are informationally efficient, several recent studies addressed the question of whether stumpage prices are predictable (Washburn and Binkley 1990 and 1993, Haight and Holmes 1991, Yin and Newman 1996). Although it remains unresolved whether the benefits derived from stumpage arbitrage will outweigh the corresponding transaction costs, these studies seem to agree that some future timber prices are predictable and that benefits can be generated from taking advantage of predicted prices. Thus, efforts could be made to pursue timber price prediction. Unfortunately, little work has been done so far to tackle the question of how to predict timber prices. This is the motivation for the current study.

One approach to obtaining price forecasts is to set up a structural econometric model, estimate its parameters from the available data, and use this model to predict future prices. Indeed, this is the approach that has been widely employed in timber market studies. The timber assessment market model (TAMM) used in the RPA assessment (Forest Service 1989) is a typical example in this regard.
However, because the focus of this type of model is often on long-term changes in price and consumption patterns, it is not well suited to predicting short-term cyclical and volatile shocks. Another related problem is its use of aggregate data, which makes it less useful in forecasting prices for specific areas.

An alternative approach that has proved quite successful, especially for short-term forecasting, is to use the past values of a particular price series to predict its future values. In this time-series analysis approach, it is assumed that the data are generated by a stochastic process that can be directly modeled (Box and Jenkins 1976). Intuitively, it may seem that by not using knowledge about the economic structure, we neglect information and thus make inefficient use of data. This would in fact be true if the observed data were generated precisely by the models economists have provided to explain economic phenomena. However, as Judge et al. (1988, p. 675) put it, "our information about the underlying sampling mechanism is generally incomplete, and thus economic and econometric models are at best rough approximations to reality." Thus, it should not be surprising that time-series models that use only the information from a set of observations on a single variable have in some instances provided forecasts that are superior to predictions from large-scale econometric models. The price prediction in this article adopts univariate autoregressive-integrated-moving-average (ARIMA) models estimated with the standard Box-Jenkins methodology.

NOTE: Runsheng Yin can be reached at (706) ; Fax: (706) ; rsyin@arches.uga.edu. Funding for the research was received from the Economics of Forest Protection and Management Working Unit of the Southern Research Station of the USDA Forest Service. The author thanks the reviewers and the Associate Editors for their comments, which have improved the paper. Manuscript received July 3, 1997, accepted March 22.
We first introduce the concepts and procedures involved in univariate ARIMA modeling. Then we discuss our data and present an illustrative example of price forecasting before we summarize our results. Some concluding remarks follow.

Concepts and Method

Notations

Consider a set of n time-sequenced observations on a single variable $(y_1, y_2, \ldots, y_n)$, which represent the realization of a particular ARIMA process. Although we do not have exact knowledge of this stochastic process, we can characterize its generating mechanism. Thus, an ARIMA model is an algebraic statement chosen in light of the available realization. We hope that a model which fits the available realization (the data) will also be a good representation of the unknown underlying generating mechanism.

An autoregressive moving-average process ARMA(p, q) can be written as

$y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \cdots + \theta_p y_{t-p} + a_t + \phi_1 a_{t-1} + \phi_2 a_{t-2} + \cdots + \phi_q a_{t-q}$   (1)

where p is the order of autoregression (AR), q is the order of moving-average (MA), the $a$'s are the random shocks, t indexes time periods, and the $\theta$'s and $\phi$'s are the coefficients to be estimated. For notational convenience, the above model is expressed as

$(1 - \theta_1 L - \theta_2 L^2 - \cdots - \theta_p L^p)\, y_t = (1 + \phi_1 L + \phi_2 L^2 + \cdots + \phi_q L^q)\, a_t$   (2)

where L is the lag operator, $L^p y_t = y_{t-p}$. Or more compactly

$\theta(L)\, y_t = \phi(L)\, a_t$   (3)

In reality, the means of many processes are not stationary, but the Box-Jenkins procedure only applies to stationary realizations, including those which can be made stationary by suitable transformation. To convert processes with changing mean values to stationary ones, differencing is often involved. If a stationary mean is obtained by differencing the data once, then we model the new series $x_t = y_t - y_{t-1} = (1 - L)\, y_t$. Similarly, if the data are differenced twice, then we model $z_t = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = (1 - L)^2 y_t$. Let d stand for the number of times a realization must be differenced to achieve a stationary mean; a differenced series is then integrated d times to return the data to the appropriate overall level. The letter "I" in the acronym ARIMA thus refers to this integration step, and it corresponds to the number of times the original series has been differenced. Therefore, a complete ARIMA(p, d, q) model takes the form of

$\theta(L)(1 - L)^d\, y_t = \phi(L)\, a_t$   (4)

It should be noted that nonstationarity in a series may come from sources other than a changing mean value, such as a changing variance. In this situation, we may impose a logarithmic transformation on the data (Pankratz 1983). If a series is both mean and variance nonstationary, then we may do both log transformation and differencing.
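As a small illustration of the differencing operator described above, the following Python sketch applies $(1 - L)$ once and twice to a short series and then integrates the first differences back to the original levels. The function names and the five price values are hypothetical and chosen only for illustration; they are not taken from the Timber Mart-South data.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - L) to `series` d times."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by y_t - y_{t-1}, losing one observation.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def integrate_once(diffs, first_level):
    """Undo one round of differencing, given the first value of the level series."""
    levels = [first_level]
    for step in diffs:
        levels.append(levels[-1] + step)
    return levels


# Hypothetical quarterly prices, used only to show the mechanics.
prices = [20.0, 21.5, 21.0, 23.0, 26.0]

x = difference(prices, d=1)   # first differences, (1 - L) y_t
z = difference(prices, d=2)   # second differences, (1 - L)^2 y_t
restored = integrate_once(x, prices[0])

print(x)         # [1.5, -0.5, 2.0, 3.0]
print(z)         # [-2.0, 2.5, 1.0]
print(restored)  # [20.0, 21.5, 21.0, 23.0, 26.0]
```

Each round of differencing shortens the series by one observation, which is why a model fitted to first differences of 80 quarterly prices works with 79 values.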
Properties

It can be seen from the above that after transformation (including differencing), we will be dealing with an ARMA(p, q) process. An ARMA(p, q) process must display two important properties, stationarity and invertibility, as they guarantee that there are no fundamental changes in the structure of the process that would render prediction difficult or impossible (Judge et al. 1988). Stationarity applies to the AR part of ARMA(p, q), and it implies that the AR coefficients must satisfy certain conditions. Invertibility applies to the MA part of ARMA(p, q), and it implies that the MA coefficients must satisfy certain conditions.

Because the detailed definitions and derivations of stationarity and invertibility can be found in almost every time-series textbook (see, for example, Hamilton 1994 and Judge et al. 1988), here we merely state the conditions indicated by stationarity and invertibility. Further, because cases with p > 2 or q > 2 do not occur frequently in practice, we only summarize the conditions for $p \le 2$, $q \le 2$.

Stationarity. For p = 0, we have either a pure MA model or a white-noise series. All pure MA models and white noise are stationary. For an AR(1) or ARMA(1, q) process, stationarity requires that $|\theta_1| < 1$. For an AR(2) or ARMA(2, q) process, stationarity requires that $|\theta_2| < 1$, $\theta_2 + \theta_1 < 1$, and $\theta_2 - \theta_1 < 1$; all three conditions must be satisfied at the same time.

Invertibility. Algebraically, the conditions for invertibility are identical to those for stationarity if $\theta_i$ is replaced with $\phi_i$. That is, for q = 0, we have either a pure AR model or a white-noise series. All pure AR models and white noise are invertible. For an MA(1) or ARMA(p, 1) process, invertibility requires that $|\phi_1| < 1$. For an MA(2) or ARMA(p, 2) process, invertibility requires that $|\phi_2| < 1$, $\phi_2 + \phi_1 < 1$, and $\phi_2 - \phi_1 < 1$; again, all three conditions must be satisfied at the same time. In practice, however, we do not know $\phi_i$, $\theta_i$.
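The low-order conditions stated above are simple inequalities, so they can be checked mechanically once coefficient values are available. The Python sketch below encodes them exactly as written in the text for orders 0, 1, and 2; the function name and the example coefficient values are hypothetical. The text gives the same algebraic form for the AR (stationarity) and MA (invertibility) coefficients, so one helper is used for both.

```python
def satisfies_low_order_conditions(coeffs):
    """Check the order-0, order-1, and order-2 conditions stated in the text.

    `coeffs` holds the coefficients of one part of the model (AR or MA),
    ordered by lag: [] for order 0, [c1] for order 1, [c1, c2] for order 2.
    """
    if len(coeffs) == 0:
        # No AR (or MA) part: the condition holds trivially.
        return True
    if len(coeffs) == 1:
        return abs(coeffs[0]) < 1
    if len(coeffs) == 2:
        c1, c2 = coeffs
        # All three inequalities must hold at the same time.
        return abs(c2) < 1 and (c2 + c1) < 1 and (c2 - c1) < 1
    raise ValueError("only orders 0, 1, and 2 are covered by the stated conditions")


# Hypothetical coefficient values, for illustration only.
print(satisfies_low_order_conditions([0.5]))        # True
print(satisfies_low_order_conditions([0.5, 0.3]))   # True
print(satisfies_low_order_conditions([0.5, 0.6]))   # False
```

Higher orders are deliberately rejected here, since the text only summarizes the conditions for orders up to two.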
Instead, we find their estimates, $\hat{\theta}_i$ and $\hat{\phi}_i$, and apply the above conditions to these estimates.

The Procedure

The univariate Box-Jenkins procedure for ARIMA modeling consists of three iterative stages: identification, estimation, and diagnostic checking.

Stage 1: Identification. Our goal is to find a good model, that is, a statistically adequate and parsimonious representation of the given realization. Since the statistical relationship between pairs of observations separated by various time spans $(y_t, y_{t+k})$, k = 1, 2, 3, ..., is reflected in the autocorrelation function (ACF) and partial autocorrelation function (PACF), we can use these devices as guides to choosing one or more ARIMA models that seem appropriate. So, the basic idea of model identification is to compare the ACF and PACF estimated from the available data with various theoretical ACFs and PACFs to find a match. We choose, as a tentative model, the ARIMA process whose theoretical ACF and PACF best match the estimated ACF and PACF.

For stationary AR processes, the theoretical ACFs taper off toward zero rather than cut off to zero, while their PACFs cut off to zero after lag p (the AR order). On the other hand, the theoretical ACFs of MA processes cut off to zero after lag q (the MA order), while their PACFs taper off toward zero. Stationary ARMA processes show a mixture of AR and MA features. Both the theoretical ACFs and PACFs of these mixed processes taper off (most time-series textbooks have a detailed illustration of theoretical ACFs and PACFs for various processes).

Stage 2: Estimation. After we have identified a model, we obtain precise estimates of its coefficients and examine these coefficients for stationarity and invertibility as well as statistical significance. If the estimated coefficients not only satisfy the properties of stationarity and invertibility but also differ significantly from zero, then we move on to the diagnostic-checking stage. Otherwise, if the estimated coefficients do not satisfy the properties of stationarity and invertibility, or they are simply insignificant, then the model is rejected and we go back to the identification stage to find another plausible model for the data. Even if we have succeeded in identifying and estimating a model, we must do diagnostic checking before we proceed to forecasting.

Stage 3: Diagnostic Checking. At the diagnostic-checking stage, we mainly examine the residuals of the estimated model to see if they are white noise. We do so by either calculating autocorrelation coefficients or conducting chi-square tests to various lags. If the residual autocorrelation coefficients are not significantly different from zero, then we can declare that we have successfully chosen an ARIMA model for the data series. If they are significantly different from zero, we have to return to the identification stage to tentatively select another model and repeat the whole process.

It is quite possible that we have more than one model to represent the data-generating mechanism. In this case, one way to discriminate among the models is to examine their residual structures. We would choose the one whose residuals look more like white noise. To be more precise, we should make our choice on the basis of the Akaike information criterion (AIC). Since models are developed for the purpose of prediction, another way is to pick the one with the most accurate forecasts.

Empirical Analysis

Data

Our data set is compiled from Timber Mart-South (TMS; Norris Foundation). It consists of quarterly price series of pine sawtimber and pulpwood stumpage for region 2 (Coastal Plain) in six southern states: South Carolina, Georgia, Florida, Alabama, Louisiana, and Mississippi.
Pine sawtimber and pulpwood price series are used since these are the key products in the South; region 2 of the various states is chosen because it has been largely geographically consistent throughout the time series. This helps ensure the integrity of our data. We select the six southern states because their region 2, stretching from the Mid-Atlantic down to the Gulf coast, has the most active timber transactions in the country.

It should be noted that TMS initially reported monthly stumpage and delivered prices for various categories of timber. In March 1988, it switched to reporting these timber prices on a quarterly basis. The change was made because of the heavy costs of monthly data collection on the one hand, and the lack of new price quotes on the other. In order to provide consistent information, TMS then converted monthly prices of the early years to quarterly ones by simply averaging the three monthly observations in a quarter. This treatment may seem questionable. But going through reports of the earlier years, we can easily find observations in neighboring months that remained unchanged. Moreover, a brief inspection of various series reveals that price variations were relatively small prior to the late 1980s. Therefore, we believe that the quarterly prices calculated for the early years can be used in our analysis on an equal footing. As such, each of the quarterly price series, covering 1976:4-1996:3, contains 80 observations. Also, it should be made clear that, given that our primary interest is price forecasting, we use nominal rather than real prices in this work.

[Figure 1. Pulpwood prices by quarter, region 2, Alabama. Horizontal axis: time (quarter).]

An Example

To illustrate the Box-Jenkins modeling procedure, we first present a concrete example using the pulpwood price series from region 2, Alabama. The analytic routine used is the ARIMA procedure in the SAS software package.
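The paper's estimates come from SAS. Purely to illustrate the kind of sample ACF that the identification stage relies on, here is a minimal Python sketch. The function names and the short price series are hypothetical. The band of two standard errors uses the simple white-noise approximation $2/\sqrt{n}$, which is an assumption of this sketch and not necessarily the formula used by the SAS routine.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of `series` around its mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelations are undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def two_standard_error_band(n):
    """Approximate two-standard-error band for white noise (assumed: 2 / sqrt(n))."""
    return 2.0 / math.sqrt(n)


# Hypothetical upward-trending prices, for illustration only.
levels = [20.0, 21.0, 23.0, 24.0, 26.0, 27.0, 29.0, 30.0]
diffs = [b - a for a, b in zip(levels, levels[1:])]

print(sample_acf(levels, 3))   # autocorrelations of the levels
print(sample_acf(diffs, 3))    # autocorrelations of the first differences
print(two_standard_error_band(len(diffs)))
```

Comparing the two printed lists shows the contrast used in identification: autocorrelations of a trending level series and of its first differences can look quite different.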
From inspection of the plotted data (Figure 1), we see that while the series trends upward over time, its variance seems to be stationary. So, differencing might be needed. However, we should examine the estimated ACF to determine the necessity of differencing. Figure 2 indicates that the autocorrelations of the undifferenced data decay very slowly; they do not cross the zero line even by the fifteenth lag. This supports our observation that the mean of the series is nonstationary, and first differencing (d = 1) is clearly needed. Figure 3, a plot of the first differences, does not show the upward trend of the original realization seen in Figure 1. Although Figure 3 also indicates that prices in the 1990s became a bit more volatile, the ACF and PACF for the first differences in Figure 4 do not suggest a log transformation.

[Figure 2. Autocorrelation function (ACF) for price levels. Horizontal axis: lag.]

[Figure 3. Pulpwood price differences, region 2, Alabama. Horizontal axis: time (quarter).]

The ACF moves toward zero quickly; only the autocorrelation at the first lag spikes across the range of two standard errors. Therefore, the first differences are stationary and they can be represented by an MA(1) model: the spike followed by a cutoff to zero in the estimated ACF suggests an MA model, and since the spike occurs at lag 1 we choose an MA model of order q = 1. The estimated PACF is consistent with an MA(1) model: pure MA models of order one are typically associated with PACFs that taper off toward zero starting at lag 1, and the PACF in Figure 4 roughly displays this behavior. But since sample ACFs, calculated from a limited number of observations, do not exactly match the theoretical ACF for any ARMA model, it is possible to have a sample ACF pattern similar to that of several different ARMA models. From this analysis, we tentatively select an ARIMA(0, 1, 1) model as

$(1 - L)\, y_t = (1 + \phi_1 L)\, a_t$   (5)

[Figure 4. ACF and PACF for price differences. Horizontal axis: lag.]

[Table 1. Estimation results for region 2, Alabama.]

Estimated results and the residual ACF are listed in Table 1. We used the conditional least squares estimation method. All indications are that the above model is satisfactory. The estimated MA coefficient is significant judging by its large t-value (4.39), and the invertibility condition $|\hat{\phi}_1| < 1$ is met. In this example, the constant term was insignificantly different from zero and thus excluded.
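To make the mechanics of equation (5) concrete, the sketch below computes conditional residuals for an MA(1) model of the first differences, picks the MA coefficient by a coarse grid search on the conditional sum of squares, and produces level forecasts. It is only an illustration under stated assumptions. The pre-sample shock is set to zero, the grid search stands in for the optimizer a statistical package would use, no constant term is included, and all names and numbers are hypothetical. It does not reproduce the SAS output in Table 1.

```python
def ma1_residuals(diffs, ma1):
    """Conditional residuals for x_t = a_t + ma1 * a_{t-1}, assuming a_0 = 0."""
    residuals = []
    previous_shock = 0.0
    for x in diffs:
        shock = x - ma1 * previous_shock
        residuals.append(shock)
        previous_shock = shock
    return residuals


def conditional_sum_of_squares(diffs, ma1):
    """Objective minimized by conditional least squares."""
    return sum(a * a for a in ma1_residuals(diffs, ma1))


def estimate_ma1(diffs):
    """Coarse grid search over (-1, 1); a stand-in for a proper optimizer."""
    candidates = [i / 100.0 for i in range(-99, 100)]
    return min(candidates, key=lambda m: conditional_sum_of_squares(diffs, m))


def forecast_levels(levels, ma1, horizon):
    """Level forecasts from an ARIMA(0,1,1) model without a constant term."""
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    last_shock = ma1_residuals(diffs, ma1)[-1]
    forecasts = []
    level = levels[-1]
    for lead in range(1, horizon + 1):
        # Only the one-lead forecast uses the last estimated shock;
        # beyond that, the forecast difference is zero.
        level += ma1 * last_shock if lead == 1 else 0.0
        forecasts.append(level)
    return forecasts


# Hypothetical price levels, for illustration only.
levels = [10.0, 11.0, 11.5, 11.25]
print(forecast_levels(levels, ma1=0.5, horizon=3))  # [11.125, 11.125, 11.125]
```

The flat path after the first lead reflects that, in this sketch, the MA term contributes only to the one-lead forecast.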
In addition to the parameter estimation, Table 1 lists (from top to bottom): (1) a set of goodness-of-fit statistics, which aid in comparing this model with others; (2) a check of the residual ACF for white noise; and (3) price forecasts for two periods and the corresponding lower and upper limits of the 95% confidence intervals for the forecasts. The chi-square test statistics indicate whether the residuals are uncorrelated (white noise) or contain additional information that might be utilized by a more complex model. The test statistics of our example indicate that we cannot reject white noise. Thus, we conclude that the estimated model is an adequate representation of the pulpwood price series in region 2, Alabama.

[Table 1 layout. Conditional least squares estimation: Parameter (MA), Estimate, SE, T value. Goodness-of-fit statistics: Variance, SE, AIC, No. of residuals. Autocorrelation check for residuals: To lag, Chi-square, DF, Prob. Price prediction: Time, Forecast, Lower 95%, Upper 95%.]

Comparing the first forecast figure ($29.80) to the actual figure ($30.44) for the fourth quarter of 1996, which became available when this analysis was completed, we find that the forecast, only about $0.64 lower, is quite good. Note, however, that because the effect of the MA term is lost after predicting for the first period, the forecasts converge to the mean value quickly. To further verify our model, we also did one-lead price predictions for 1996:1-3 by deleting the corresponding observations from estimation. The actual prices were $29.17, $27.39, and $29.55 respectively, whereas our predicted ones were $29.47, $28.28, and $[...]. It can be seen that all our forecasts differ by less than $1.00 from the actual prices.

Results for Other Markets

Estimations for other markets are reported in Table 2 for pulpwood price series, and Table 3 for sawtimber price series. Several points are worth mentioning.

Table 2. Estimated ARIMA models for selected pulpwood markets.

Series   Estimated model
SC2      ARIMA(4,1,0)
GA2      ARIMA(2,1,0)
FL2      ARIMA(1,1,3) x (0,1,1)4
LA2      ARIMA(6,1,2)
MS2      ARIMA(2,1,0)

NOTE: SC2 means region 2, South Carolina, and other markets are similarly noted; Pa and Pf represent actual and forecast prices for 1996:4.

First, all of the pulpwood price series are stationary but do not reduce to pure white noise after differencing. Therefore, these series can be modeled using ARIMA, and our one-lead price forecasts are generally good. In sawtimber markets, however, some series became either white noise (region 2 in South Carolina and Florida) or close to white noise (region 2 in Alabama) after differencing. In the former case, no ARIMA representation for a series can be formulated. The only prediction for future prices is the most recent actual price, a feature of white noise. In the latter case, the first differences are almost white noise, but it is still possible to find a significant MA coefficient at a longer lag.

Comparing the forecasts with the actual prices for those markets having an ARIMA representation, we find that the one-lead forecasts for 1996:3 are quite accurate. However, much of the accuracy is lost in the two-lead forecasts. This implies that we should focus on one-lead forecasts and that, if possible, our model should be updated every quarter.

Further, we see that many sawtimber markets have a similar behavior pattern, and three of the four predictable series can be characterized by an ARIMA(2,1,0), which is easy to identify and estimate.
In contrast, the pulpwood markets behave differently: only two of the five markets are represented by an ARIMA(2,1,0). The remaining three have either longer AR lags (region 2 in South Carolina and Louisiana) or seasonality (region 2 in Florida and South Carolina). Thus, no generalization can be made.

Discussion

We have shown how to predict timber prices using univariate ARIMA models. Overall, we are encouraged by the degree of approximation of our short-term price predictions. Previously, stumpage price forecasting, driven by the needs of public policy, was largely based on econometric models, which only give a long-term average projection. On the other hand, although forest analysts and consultants practice price prediction with time series, more often than not they simply draw a rough trend line to indicate the possible future market movement. Our work suggests that we may be able to do better in the short run by adopting the Box-Jenkins ARIMA modeling procedure. For the U.S. South, where the timber economy is important, forecasting future prices will aid timber producers and consumers alike in timing harvests, reducing uncertainty, smoothing market changes, and enhancing efficiency. Given that TMS has accumulated historical price information for the last twenty years, we anticipate that the interest in price forecasting will increase.

Our results suggest that sawtimber and pulpwood markets behave in different manners. While three of the four predictable sawtimber series can be represented by an ARIMA(2,1,0), only two of the five pulpwood markets are characterized similarly. This appears to indicate that markets for different grades are subject to different demand and supply forces.

Also, it can be argued that because univariate ARIMA models can only predict prices with a certain precision for a few leads, their usefulness is limited. Generally speaking, the more the leads, the worse the forecasts.
This is because for any stationary series, forecasts converge to the mean of the series. The convergence may be rapid or slow depending on the specific model structure. Forecasts from pure MA models converge more rapidly to the mean, since we quickly lose information about past estimated random shocks as we forecast further into the future. With pure AR or mixed models, we can "bootstrap" ourselves by using forecast values to replace observed values. From the business point of view, however, successful price forecasts for even one lead (quarter) ahead are a significant achievement, which will benefit various parties engaged in timber transactions. Instead of attempting to predict prices for longer terms, we should update our estimations and forecasts frequently as new data become available.

Table 3. Estimated ARIMA models for selected sawtimber markets.

Series   Estimated model   Pa        Pf
SC2      ARIMA(0,1,0)      287/315   N/A
GA2      ARIMA(2,1,0)      319/      /313
FL2      ARIMA(0,1,0)      227/242   N/A
AL2      ARIMA(0,1,7)      277/      /272
LA2      ARIMA(2,1,0)      225/      /241
MS2      ARIMA(2,1,0)      245/      /255

NOTE: SC2 means region 2, South Carolina, and other markets are similarly noted; Pa and Pf represent actual and forecast prices for 1996:3-4; N/A indicates not available.

Finally, we saw that seasonality appears in some of the pulpwood markets. Seasonality is well noted by the forest products industry (Miller Freeman Inc. 1994). During the winter, worsened logging conditions and limitations in inventory capacities can result in higher prices. But it seems that a seasonal price pattern is not present in all the markets nor in all the years. Future research should pay closer attention to this phenomenon. In addition, a plot of the price series suggests that many timber markets have become more volatile in recent years. This situation may indicate that the variances of these price series are becoming nonstationary, which calls for representing them alternatively, using autoregressive conditionally heteroscedastic, or ARCH, models (Hamilton 1994). Another research direction is to develop multivariate time-series models for price prediction, which involve variables and methods different from those employed in the usual econometric models of timber supply and demand.

Literature Cited

BOX, G.E.P., AND G.M. JENKINS. 1976. Time series analysis: Forecasting and control. Holden-Day Publisher, San Francisco, CA. 235 p.
HAIGHT, R.G., AND T.P. HOLMES. 1991. Stochastic price models and optimal tree cutting: Results for loblolly pine. Natur. Resour. Model. 5:
HAMILTON, J.D. 1994. Time series analysis. Princeton University Press, Princeton, NJ. 799 p.
JUDGE, G.G., R.C. HILL, W.E. GRIFFITHS, H. LUTKEPOHL, AND T.C. LEE. 1988. Introduction to the theory and practice of econometrics. Wiley, New York.
MILLER FREEMAN, INC. 1994. Pulp & paper factbook. Miller Freeman, San Francisco, CA. 469 p.
NORRIS FOUNDATION. Timber Mart-South. University of Georgia, Athens, GA.
PANKRATZ, A. 1983. Forecasting with univariate Box-Jenkins models: Concepts and cases. Wiley, New York. 562 p.
USDA FOREST SERVICE. 1989. An analysis of the timber situation in the United States: Gen. Tech. Rep. RM
WASHBURN, C.L., AND C.S. BINKLEY. 1990. Informational efficiency of markets for stumpage. Am. J. Agric. Econ. 72:
WASHBURN, C.L., AND C.S. BINKLEY. 1993. Informational efficiency of markets for stumpage: Reply. Am. J. Agric. Econ. 75:
YIN, R.S., AND D.H. NEWMAN. 1996. Are markets for stumpage informationally efficient? Can. J. For. Res. 26: