Better ACF and PACF plots, but no optimal linear prediction

Electronic Journal of Statistics
Vol. 0 (0000)
ISSN: 1935-7524
DOI: 10.1214/154957804100000000

Rob J Hyndman
Department of Econometrics & Business Statistics, Monash University, Clayton VIC 3800, Australia

Abstract: I propose new ACF and PACF plots based on the autocovariance estimators of McMurry and Politis. I also show that the forecasting methods they propose perform poorly compared to some relatively simple autoregression algorithms already available.

Keywords and phrases: autocorrelation, autoregression, data visualization, forecasting, serial correlation, time series graphics.

It is always interesting to see new approaches to old problems, and McMurry and Politis have provided some very stimulating ideas in their approach to autocovariance estimation and forecasting. I am particularly interested in the usefulness of their results for analysing and forecasting real time series.

There are three potential uses of their methods for data analysis that I would like to explore. First, the standard ACF plot could be replaced by a plot based on a tapered estimate of the autocovariance matrix. Second, a corresponding PACF plot could be obtained using the Durbin-Levinson recursions applied to a tapered estimate of the autocovariance matrix. Third, the proposed FSO or PSO predictors could be used for forecasting real time series (as in fact the authors do in Section 5.4). I will reflect on each of these in turn.

1. A better ACF plot

The standard ACF plot is notoriously unreliable for large lags, and so any improvement in these estimates would be welcomed by time series analysts. In addition, most software produces ACF plots based on poor graphical choices, making them more difficult than necessary to interpret. Consequently, I will propose a better ACF plot based on the tapered and banded autocovariance estimator of McMurry and Politis, and with improved graphical presentation.
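As a rough illustration of what tapering and banding mean here, the following Python sketch applies a trapezoidal taper to the sample autocorrelations, with the band chosen by a simple threshold rule. It is a minimal sketch under stated assumptions, not the code used for the figures: the function names are hypothetical, the data are centred before use, the defaults c = 2 and K = 5 are the values suggested by McMurry and Politis, and the eigenvalue thresholding step is omitted entirely.

```python
import math
import random


def trapezoid_kernel(x):
    """Flat-top taper: 1 on [-1, 1], linear decay to 0 at |x| = 2."""
    ax = abs(x)
    if ax <= 1.0:
        return 1.0
    if ax <= 2.0:
        return 2.0 - ax
    return 0.0


def sample_autocovariances(x):
    """Lag-s autocovariances n^{-1} * sum_t z_t z_{t+s} for s = 0..n-1.

    Assumption: the series is centred first (z = x - mean).
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    return [sum(z[t] * z[t + s] for t in range(n - s)) / n for s in range(n)]


def choose_band(gamma, c=2.0, K=5):
    """Smallest l >= 1 whose next K autocorrelations are all below the threshold."""
    n = len(gamma)
    threshold = c * math.sqrt(math.log10(n) / n)
    for l in range(1, n - K):
        if all(abs(gamma[l + k] / gamma[0]) < threshold for k in range(1, K + 1)):
            return l
    # Fallback when no lag qualifies (an assumption of this sketch).
    return max(1, n - K - 1)


def tapered_acf(x, c=2.0, K=5):
    """Tapered autocorrelations kappa(s / l) * gamma_s / gamma_0 and the band l."""
    gamma = sample_autocovariances(x)
    l = choose_band(gamma, c, K)
    rho = [trapezoid_kernel(s / l) * g / gamma[0] for s, g in enumerate(gamma)]
    return rho, l


if __name__ == "__main__":
    random.seed(1)
    series = [random.gauss(0.0, 1.0) for _ in range(200)]
    rho, band = tapered_acf(series)
    print("band l =", band, "first autocorrelations:", rho[:4])
```

Autocorrelations at lags up to l are left untouched, those between l and 2l are shrunk linearly, and everything from lag 2l onwards is set to zero, which is what removes the noisy large-lag estimates.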
We shall define the estimated ACF using a data-based choice of the banding parameter, and with eigenvalue thresholding to ensure positive definiteness. For the time series $\{X_1, \dots, X_n\}$, let $\hat\rho_s = \kappa(|s|/l)\,\breve\gamma_s / \breve\gamma_0$, where $s = 0, 1, 2, \dots, n-1$,

$$\breve\gamma_s = n^{-1} \sum_{t=1}^{n-|s|} X_t X_{t+|s|}, \qquad \kappa(x) = \begin{cases} 1 & \text{if } |x| \le 1 \\ 2 - |x| & \text{if } 1 < |x| \le 2 \\ 0 & \text{otherwise;} \end{cases}$$

and $l$ is the smallest positive integer such that $|\breve\gamma_{l+k} / \breve\gamma_0| < c(\log_{10} n / n)^{1/2}$ for $k = 1, \dots, K$. The corresponding autocorrelation matrix is $\hat R$ with $(i, j)$th element $\hat\rho_{|i-j|}$. We then use eigenvalue thresholding to define a new autocorrelation matrix $\tilde R = V D V'$, where $\hat R = V \Lambda V'$ is the eigendecomposition of $\hat R$ and $D$ is a diagonal matrix with $(i, i)$th element equal to the maximum of $\Lambda_{i,i}$ and $\epsilon / n^{\beta}$. Finally, we define

$$\tilde\rho_s = \frac{1}{(n-s)\,\bar d} \sum_{k=1}^{n-s} \tilde R_{k,k+s},$$

where $\bar d$ is the mean of the diagonals of $D$. Following the suggestions of McMurry and Politis, I use $c = 2$, $K = 5$, $\epsilon = 20$ and $\beta = 1$.

To demonstrate the new estimator, I have applied it in Figure 1 to the seasonally differenced US monthly housing sales data from 1973 to 1995 (Makridakis et al. 1998, Chapter 3). The plot on the left uses the standard estimator $\breve\rho_s = \breve\gamma_s / \breve\gamma_0$ and was obtained using the default settings for the acf() function in R (R Core Team 2014), except that I selected 100 lags. The plot on the right uses the estimator $\tilde\rho_s$ defined above.

Fig 1. Left: traditional ACF plot with 5% critical values at $\pm 1.96 n^{-1/2}$. Right: Proposed new ACF plot based on a tapered estimator of the autocorrelation with bootstrapped confidence intervals shown as gray shaded bars. Values significantly different from zero are shown as large solid points.

The blue lines in the left-hand panel show the 5% critical values at $\pm 1.96 n^{-1/2}$ under the null hypothesis of white noise. These are often misleading as we are usually interested in whether the autocorrelations are significantly different from zero under a null hypothesis that the data are from a stationary process (rather
than a white noise process). In particular, the long section of significant negative autocorrelations in the left-hand panel is probably not of particular interest, and I frequently have to tell my confused students to ignore such features. The horizontal axis is labelled "Lag", but the axis is marked in units equal to one year rather than in lag units. It is possible to override these defaults, but good software should have good default settings.

The right-hand panel demonstrates an alternative plot. The shaded bars show 95% bootstrapped confidence intervals based on the linear process bootstrap (McMurry & Politis 2010), obtained using the same autocovariance estimate that is plotted. Autocorrelations that are significantly different from zero are highlighted using large solid circles, while insignificant autocorrelations are shown using small open circles. The x-axis shows the number of lags with tick-marks at multiples of years. Finally, the pointless lag 0 autocorrelation has been omitted. This version of an ACF plot should be much easier for students to read and interpret correctly.

2. A better PACF plot

It is possible to obtain a corresponding estimate of the partial autocorrelation function using the Durbin-Levinson recursions (Morettin 1984) applied to the autocorrelation estimates $\{\tilde\rho_0, \tilde\rho_1, \dots, \tilde\rho_{n-1}\}$. Figure 2 shows the traditional and proposed PACF plots for the same housing sales data as shown in Figure 1.

Fig 2. Left: traditional PACF plot with 5% critical values at $\pm 1.96 n^{-1/2}$. Right: Proposed new PACF plot based on a tapered estimator of the autocorrelation with bootstrapped confidence intervals shown as gray shaded bars. Values significantly different from zero are shown as large solid points.

While the same improvements are evident,
the new plot obscures some potentially important information. In the left-hand panel, there are significant partial autocorrelations near lags 12, 24 and 36, indicating some seasonality in the data. Because they decline geometrically, this is suggestive of a seasonal MA(1) process. The tapering and shrinkage applied to the autocovariances has meant that the corresponding partial autocorrelations near lags 24 and 36 in the right-hand plot are insignificant. It appears that the parameter choices for $c$, $K$, $\epsilon$ and $\beta$ may need refinement, especially with seasonal data, to prevent important information being obscured.

In other examples (not shown here), the partial autocorrelations increased in size for very large lags (and even became greater than one in absolute value). These problems are due to insufficient shrinkage of the eigenvalues, and provide further indication that better selection of the values of $\epsilon$ and $\beta$ is required before these estimators could be routinely used in real data analysis.

3. Forecasting performance

One surprising aspect of the results presented by McMurry and Politis is that their proposed forecasting methods do relatively well in the simulations. I had expected that with $n$ observations, it would be impossible to satisfactorily forecast with an AR($p$) model where $p = O(n)$, but they have demonstrated otherwise. This is interesting, and deepens our understanding of the nature of the problem, but it does not help forecasters in practice.

Even for the simulations with low-order stationary AR and MA processes, the proposed methods never give much improvement, and are often worse than the benchmark AR and BG methods. In these ideal circumstances, one would expect the proposed methods to perform at their best. For the real (M3) data, the reported results in Table 5 show that their PSO-Sh-Shr method does slightly better than a simple AR approach (with an RMSE of 0.942 compared to 0.987).
The benchmark AR approach involves selecting the order by AIC, but it is not clear what method of estimation was used for the parameters. I tried to replicate these results using the ar() command in R with default settings (which employs Yule-Walker estimates and also uses the AIC to select the order), and I obtained an RMSE of 0.938. So it is very easy to beat the best of McMurry and Politis's methods on these same data. With the corresponding time-reversed data (Table 6), I obtained an RMSE of 1.031; again, better than any of the methods tested by McMurry and Politis, whose best result was 1.133 for the FSO-WN-Shr method.

I then tried using the auto.arima() function from the forecast package (Hyndman & Khandakar 2008) with full maximum likelihood estimation and AR order selection using the AICc (Hurvich & Tsai 1997). I restricted the models to purely autoregressive models in order to ensure comparability with the results of McMurry and Politis. The resulting RMSE was 0.871, substantially and significantly better than any of the results from McMurry and Politis. The corresponding result for the reversed series was 0.845.
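To make concrete what a Yule-Walker benchmark with AIC order selection involves, the following Python sketch solves the Yule-Walker equations for every order up to a maximum with the Levinson-Durbin recursion and then picks the order with the smallest AIC. It is a minimal sketch under stated assumptions and does not reproduce R's ar() or auto.arima(): the function names and the `p_max` argument are hypothetical, and the criterion n log(innovation variance) + 2p is one common form of the AIC.

```python
import math
import random


def sample_acov(x, max_lag):
    """Sample autocovariances at lags 0..max_lag (series centred, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    return [sum(z[t] * z[t + s] for t in range(n - s)) / n
            for s in range(max_lag + 1)]


def levinson_durbin(gamma, p_max):
    """Yule-Walker fits for orders 0..p_max.

    Returns (fits, pacf): fits[p] is (AR coefficients, innovation variance)
    for order p, and pacf[p - 1] is the lag-p partial autocorrelation.
    """
    phi = []
    var = gamma[0]
    fits = [([], var)]
    pacf = []
    for p in range(1, p_max + 1):
        acc = gamma[p] - sum(phi[j] * gamma[p - 1 - j] for j in range(p - 1))
        k = acc / var  # reflection coefficient = partial autocorrelation
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        var = var * (1.0 - k * k)
        pacf.append(k)
        fits.append((phi, var))
    return fits, pacf


def fit_ar_aic(x, p_max):
    """Pick the AR order minimising n * log(variance) + 2 * p (assumed AIC form)."""
    n = len(x)
    fits, pacf = levinson_durbin(sample_acov(x, p_max), p_max)
    aic = [n * math.log(var) + 2 * p for p, (_, var) in enumerate(fits)]
    order = min(range(p_max + 1), key=aic.__getitem__)
    phi, var = fits[order]
    return order, phi, var, pacf


def forecast_one_step(x, phi):
    """One-step-ahead forecast from the fitted AR coefficients."""
    mean = sum(x) / len(x)
    return mean + sum(c * (x[-1 - j] - mean) for j, c in enumerate(phi))


if __name__ == "__main__":
    random.seed(0)
    x = [0.0]
    for _ in range(499):
        x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))  # simulated AR(1)
    order, phi, var, _ = fit_ar_aic(x, p_max=10)
    print("order:", order, "coefficients:", phi)
    print("next value forecast:", forecast_one_step(x, phi))
```

The same recursion also yields the partial autocorrelations as a by-product. A value of `k` outside (-1, 1) can only occur when the input autocovariances are not positive definite.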

It seems that for forecasting purposes, the methods proposed by McMurry and Politis perform very poorly compared to the relatively simple algorithms already available. While their results appear to be very useful in estimating high-dimensional autocovariance matrices and autocorrelation functions, they do not provide optimal linear prediction as claimed.

References

Hurvich, C. & Tsai, C. (1997), Selection of a multistep linear predictor for short time series, Statistica Sinica 7, 395–406.

Hyndman, R. J. & Khandakar, Y. (2008), Automatic time series forecasting: the forecast package for R, Journal of Statistical Software 26(3), 1–22.

Makridakis, S. G., Wheelwright, S. C. & Hyndman, R. J. (1998), Forecasting: methods and applications, 3rd edn, John Wiley and Sons, New York.

McMurry, T. L. & Politis, D. N. (2010), Banded and tapered estimates for autocovariance matrices and the linear process bootstrap, J. Time Series Analysis 31(6), 471–482.

Morettin, P. A. (1984), The Levinson algorithm and its applications in time series analysis, International Statistical Review 52(1), 83–92.

R Core Team (2014), R: A Language and Environment for Statistical Computing, Vienna, Austria. URL: http://www.r-project.org