1 An evaluation of simple vs. complex selection rules for forecasting many time series Fotios Petropoulos Prof. Robert Fildes

2 Outline Introduction Research Questions Experimental set-up: methods, data & rules Empirical investigation Implications & limitations Conclusions & perspectives

3 Introduction Forecasters regularly face the question of choosing from a set of alternative forecasting methods. The forecasting methods usually considered are simple. Two distinct approaches have been proposed for dealing with this problem (Fildes, 1989): aggregate selection and individual selection. The main objective of the current research is to investigate the conditions under which individual model selection may be beneficial.

4 Why is individual method selection important? If individual selection could be done perfectly, the gains over the best ex-post aggregate selection would be substantial. Improvements in univariate forecasting accuracy are financially valuable. Results of various competitions: no single best method; wide divergence in comparative performance by series within homogeneous data sets; unexpected methods dominate. The alternative: to develop a new general method. Is this feasible?

5 Research Questions (1) Segmenting the data: Predictable/Unpredictable: is the performance of the non-seasonal Random Walk forecasting method better than the median performance of all other methods under investigation, as measured by Mean Absolute Error on the validation data? Trended/Non-Trended: Cox-Stuart test over a 12-period centred moving average. Seasonal/Non-Seasonal: Friedman's non-parametric test. RQ1. Is individual model selection more effective when applied to groups of time series with specific characteristics?
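
The three segmentation tests could be implemented along the following lines. This is a minimal sketch assuming monthly data and scipy; the significance levels, the handling of the centred moving average, and the direction of the predictability criterion are assumptions rather than details taken from the study.

```python
# Sketch of the three segmentation tests (assumed parameterisation).
import numpy as np
from scipy import stats

def is_seasonal(y, period=12, alpha=0.05):
    """Friedman's non-parametric test: months as treatments, years as blocks."""
    y = np.asarray(y, dtype=float)
    n_years = len(y) // period
    blocks = y[:n_years * period].reshape(n_years, period)
    _, p_value = stats.friedmanchisquare(*[blocks[:, m] for m in range(period)])
    return p_value < alpha

def is_trended(y, period=12, alpha=0.05):
    """Cox-Stuart test applied to a 12-period centred moving average."""
    y = np.asarray(y, dtype=float)
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period    # 2x12 centred MA
    cma = np.convolve(y, weights, mode="valid")
    half = len(cma) // 2
    diffs = cma[-half:] - cma[:half]                           # second half minus first half
    n_plus, n = int((diffs > 0).sum()), int((diffs != 0).sum())
    return stats.binomtest(n_plus, n, 0.5).pvalue < alpha

def is_predictable(random_walk_mae, other_methods_mae):
    """Predictable if the random walk is beaten by the median of the other methods."""
    return random_walk_mae > np.median(other_methods_mae)
```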

6 Research Questions (2) Number of models included in the pool of alternatives. Effectively a variant of over-fitting: the more models included, the higher the probability that the wrong model is chosen due to randomness in the data. Every possible combination of smaller pools, from two (2) up to twelve (12) methods, is also examined. For example, for a pool size of four (4), all 495 possible pools of methods are checked, 495 being the number of 4-combinations in a set of 12. RQ2. What are the effects on individual selection of including more methods in the pool under consideration?
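
Counting and enumerating the candidate pools is straightforward; a small sketch follows (the method names are placeholders, since the full list of twelve candidate methods is not given on this slide).

```python
# Sketch: enumerate every candidate pool of methods.
from itertools import combinations
from math import comb

methods = [f"method_{i}" for i in range(1, 13)]      # 12 candidates (placeholder labels)

pools_of_four = list(combinations(methods, 4))
assert len(pools_of_four) == comb(12, 4) == 495      # the 495 pools mentioned above

# All pool sizes from 2 up to 12:
all_pools = [pool for size in range(2, 13) for pool in combinations(methods, size)]
```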

7 Research Questions (3) Many of the methods included in typical extrapolative competitions produce similar forecasts. Selection between methods that produce similar forecasts is unlikely to prove valuable. Selecting among methods with low to medium correlated outputs and similar levels of accuracy is intuitively more promising. RQ3. Do pools of methods with low correlation, in terms of forecast error, provide better forecasting performance when individual selection rules are considered, compared to more highly correlated pools?
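
One way to operationalise this is to average the pairwise correlations of the methods' forecast errors within a pool; a sketch is below (the cut-off separating "low" from "high" correlation is not specified here and would be an assumption).

```python
# Sketch: average pairwise correlation of forecast errors within a pool of methods.
import numpy as np

def average_error_correlation(errors):
    """errors: array of shape (n_methods, n_forecasts) of out-of-sample errors."""
    corr = np.corrcoef(errors)                        # pairwise correlation matrix
    return corr[np.triu_indices_from(corr, k=1)].mean()

# Pools could then be labelled "low" or "high" correlation relative to a chosen cut-off.
```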

8 Research Questions (4) Individual selection would be unlikely to be beneficial when a single method is dominant. Identify sub-populations where a dominant method exists (or not): divide the data series into two groups in terms of the performance of one of the best methods. 1st group (dominant method): Theta's performance is in the top three. 2nd group (non-dominant method): Theta's performance is outside the top three. RQ4. Individual method selection is of most value when no dominant method is identified across the population.
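
A sketch of how the split could be computed from per-series validation errors; the function and variable names are illustrative, not from the study.

```python
# Sketch: assign a series to the "dominant" group if Theta ranks in the top three
# on the validation sample of that series.
import numpy as np

def is_dominant_group(validation_mae, theta_index):
    """validation_mae: 1-D array of validation MAEs, one entry per method."""
    rank = int(np.argsort(np.argsort(validation_mae))[theta_index]) + 1   # rank 1 = best
    return rank <= 3
```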

9 Research Questions (5) The performance of the different methods is also analysed for its stability. Stability for a specific series can be measured by the average (across time origins) Spearman's rank correlation coefficient. A series is defined as stable when its Spearman's rho falls in the top quartile of the data set. RQ5. Individual selection is only effective compared to aggregate selection when relative performance in the pool of methods under consideration is stable.
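
A sketch of the stability measure, assuming the ranking of the methods' errors is compared between successive rolling origins; the exact pairing of origins used in the study is an assumption.

```python
# Sketch: stability of the relative performance of methods across rolling origins.
import numpy as np
from scipy.stats import spearmanr

def average_rank_stability(errors_by_origin):
    """errors_by_origin: array (n_origins, n_methods), one error per method per origin."""
    rhos = [spearmanr(errors_by_origin[t], errors_by_origin[t + 1]).correlation
            for t in range(len(errors_by_origin) - 1)]
    return float(np.mean(rhos))

# A series would be labelled "stable" when this average falls in the top quartile of the data set.
```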

10 How to choose a best method? In-sample fit rules (MSE, AIC, BIC, etc.): make intuitive sense, but on their own have poor predictive properties (Pant & Starbuck, 1990). Out-of-sample rules: make even more sense: what has forecast the most accurately will forecast the most accurately. Complex selection rules: based on numerous data characteristics and combination approaches (for example, Collopy & Armstrong, 1992; Shah, 1997; Meade, 2000).

11 Selection Rules (individual selection) Rule 1. Use the method with the best fit, as measured by the minimum one-step-ahead in-sample Mean Squared Error. Rule 2. Use the method with the best out-of-sample one-step-ahead forecast error, in terms of Mean Absolute Percentage Error. Rule 3. Use the method with the best out-of-sample h-step-ahead forecast error, in terms of Mean Absolute Percentage Error, and apply this method to forecast just that lead time. Rule 4. Use the method with the best out-of-sample 1- to 18-steps-ahead forecast error, in terms of Mean Absolute Percentage Error, to forecast all lead times.
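
Rules 2-4 reduce to an arg-min over (subsets of) an array of absolute percentage errors; a minimal sketch, assuming the APEs have already been computed on the selection sample for every method, origin and horizon.

```python
# Sketch of Rules 2-4: pick the method with the lowest selection-sample MAPE.
# ape has shape (n_methods, n_origins, n_horizons), horizons 1..18 stored as indices 0..17.
import numpy as np

def select_rule2(ape):
    """Rule 2: best one-step-ahead out-of-sample MAPE."""
    return int(np.argmin(ape[:, :, 0].mean(axis=1)))

def select_rule3(ape, h):
    """Rule 3: best MAPE for lead time h only (h is 1-based)."""
    return int(np.argmin(ape[:, :, h - 1].mean(axis=1)))

def select_rule4(ape):
    """Rule 4: best MAPE averaged over all 1-18 steps ahead."""
    return int(np.argmin(ape.mean(axis=(1, 2))))
```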

12 Benchmarks The empirical results focus on comparing the performance of individual selection against two simple benchmarks: Aggregate selection: uses the single best method based on one-step-ahead out-of-sample performance on the validation sample. Combination: applies equal weights to each method included in the selection pool.
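
Both benchmarks are simple to express; a sketch under the same assumed error and forecast arrays as above.

```python
# Sketch of the two benchmarks.
import numpy as np

def combination_forecast(forecasts):
    """Equal-weight combination: forecasts has shape (n_methods, n_horizons)."""
    return forecasts.mean(axis=0)

def aggregate_selection(one_step_ape):
    """Single best method across ALL series on the validation sample.
    one_step_ape has shape (n_series, n_methods)."""
    return int(np.argmin(one_step_ape.mean(axis=0)))
```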

13 The set-up: data & methods M3 data, monthly frequency, 126 or more observations, n = 998 series. Each series is split into 48 observations for initialization, 42 for selection and 36 for evaluation (T1 and T2 mark the ends of the initialization and selection samples). Rolling forecasting: 18 forecasts ahead are calculated for each origin. Methods: Naïve, SES, Holt, Damped, Holt-Winters, Theta, ARIMA.
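
A sketch of the rolling-origin evaluation for a single series, using a naïve forecaster as a stand-in for the seven methods; the window size follows the slide, but the loop bounds and function name are a simplification.

```python
# Sketch of the rolling-origin evaluation for one series (naive forecaster as stand-in).
import numpy as np

def rolling_origin_ape(y, first_origin=48, horizon=18):
    """Return {origin: APE vector of length `horizon`} for a naive (random walk) forecast."""
    y = np.asarray(y, dtype=float)
    apes = {}
    for origin in range(first_origin, len(y) - horizon + 1):
        forecast = np.repeat(y[origin - 1], horizon)        # last observed value, repeated
        actual = y[origin:origin + horizon]
        apes[origin] = np.abs(actual - forecast) / np.abs(actual) * 100
    return apes
```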

14-22 Individual selection approaches: a graphical illustration [rolling-origin scheme]. The first 48 observations initialize the methods; the in-sample window then grows one observation at a time (48, 49, 50, ..., 89, 90 observations), with the selection-sample origins used to choose a method and the final 36 observations reserved for evaluation. The best method $m_i$ for series $i$ over all lead times is the one minimising past forecast performance over the selection sample:

$m_i = \arg\min_{\text{Method}} \mathrm{PFP}_{i,\text{Method}}$, where $\mathrm{PFP}_{i,\text{Method}} = \frac{1}{T_2 - T_1} \sum_{k = T_1 + 1}^{T_2} \mathrm{APE}_{\text{Method},k}$

23 Individual Selection vs. Aggregate Selection & Combination: % of cases improved when all series are considered. [Results table: % of cases where individual selection performed better than aggregate selection and than the combination, for each selection rule (Rules 1-4), by the number of methods in the selection pool and by the average correlation of the pool (low vs. high).]

24 Best Practices

Segment              Best Practice
Entire Data Set      Individual selection (Rule 4) applied to a high number of low-correlated methods
Predictable          Combination applied to a medium number of high-correlated methods
Unpredictable        Individual selection (Rule 4) applied to a high number of high-correlated methods
Trended              Individual selection (Rule 4) applied to a high number of high-correlated methods
Non-Trended          Combination applied to a medium number of high-correlated methods
Seasonal             Individual selection (Rule 3) applied to a medium number of high-correlated methods
Non-Seasonal         Combination applied to a medium number of high-correlated methods
Dominant method      Aggregate selection applied to a small number of high-correlated methods
Non-dominant method  Aggregate selection applied to a medium number of high-correlated methods
Stable               Individual selection (Rule 4) applied to a high number of low-correlated methods
Unstable             Combination applied to a high number of high-correlated methods

25 Accuracy improvements, median performance (MdAPE %)

Segment              Aggregate Selection  Combination  Individual Selection  Best Practice  DDamp (Benchmark)  Improvement
Entire Data Set      7.4%                 7.6%         7.2%                  7.1%           7.5%               4.8%
Predictable          6.5%                 6.9%         6.6%                  6.4%           6.6%               2.7%
Unpredictable        9.1%                 8.8%         8.7%                  8.4%           9.5%               11.4%
Trended              6.8%                 7.0%         6.8%                  6.7%           6.9%               2.4%
Non-Trended          14.9%                15.8%        15.2%                 14.6%          15.1%              3.3%
Seasonal             9.2%                 9.5%         9.0%                  8.9%           9.2%               3.0%
Non-Seasonal         4.5%                 4.4%         4.6%                  4.4%           4.5%               2.1%
Dominant Method      7.8%                 8.4%         7.9%                  8.0%           8.2%               2.5%
Non-dominant method  6.7%                 6.9%         6.8%                  6.6%           6.8%               3.0%
Stable               6.9%                 7.8%         6.8%                  6.6%           6.9%               4.1%
Unstable             7.6%                 7.6%         7.6%                  7.5%           8.1%               7.7%

26 Implications & Limitations Enhance the integrated automatic selection procedures. Best practices indicate a mix of simple, easy-to-apply approaches with more complicated selection schemes. Computationally intensive: requires a specialized system design. Limited data history available. Data coming from a specific industry might appear more homogeneous. Adjustments in the design may be needed to define useful segments.

27 Conclusions Segmenting the series enables identifying suitable sub-populations of data where the application of individual selection is more effective. Improvements from individual selection over aggregate selection are recorded when small to medium pools of methods are considered. Aggregate selection produces the best results when a single method displays dominant performance across a specific sample of series. Individual selection produces more accurate forecasts when series are identified as stable; however, for unstable series, the combination of methods is the most robust choice. The selection rule relying on forecast performance over all horizons (Rule 4) proved better, while simply relying on past in-sample performance over the fitted data proved inadequate. Next step: enhance the selection rules by using a large number of variables proposed in the literature to characterise a time series.

28 Thank you for your attention. Questions?