Stock Price Forecasting Using Exogenous Time Series and Combined Neural Networks

Manoel C. Amorim Neto, Victor M. O. Alves, Gustavo Tavares, Lenildo Aragão Junior, George D. C. Cavalcanti and Tsang Ing Ren

Abstract

Time series forecasting is useful in many research areas. Models that provide reliable predictions for financial time series may bring valuable profits to investors. An intelligent agent can be built on a suitable prediction model to trade in the stock market daily. Even an investor who is cautious about letting an automatic agent trade can use the prediction model as decision support. A methodology based on information obtained from exogenous series was used in combination with a neural network to predict stock series. Exogenous series were selected by analyzing their correlation with the stock series being predicted. In this way, the prediction uses not only the previous values of the series but also information external to the main series. Additionally, the best trained neural networks were combined to improve on the prediction capacity of single networks. To evaluate the proposed models, some known metrics were used plus a proposed one, Prediction in Direction and Accuracy (PDA), which rewards models that are both accurate and correct in trend. Using this novel metric, an evolutionary algorithm was applied to choose the best trained models in order to obtain better results. Experiments with the stock quotes of two of the most important Brazilian companies have shown the usefulness of the proposed prediction system for generating profits in investments.

I. INTRODUCTION

Time series are sets of observations of variables taken over a defined period of time. These observations may be discrete or continuous and they are taken at equal time intervals [1].
There are many research areas involving time series analysis, such as economics, physics, engineering, social sciences, computing, biology, medicine and meteorology. Perhaps the most common application of time series analysis is prediction. The prediction can be made using past observations of the series to be forecast or even other time series. These other series used to predict the main one are known as exogenous time series.

There are two types of models in time series prediction: linear and non-linear. A well-known linear method is ARIMA, proposed by Box and Jenkins [2]. Some examples of non-linear models are: bilinear, exponential autoregressive, threshold autoregressive, smooth transition autoregressive, autoregressive with time-dependent coefficients [3], autoregressive conditional heteroscedasticity (ARCH) and generalized autoregressive conditional heteroscedasticity (GARCH) [1], among other models.

Artificial neural networks (ANN) have been used successfully for time series prediction in recent years because of features such as universal function approximation, robustness and fault tolerance [4]. For these reasons, neural networks are considered useful for building prediction models for non-stationary time series [4]. Furthermore, ANNs handle noisy data well and are able to predict nonlinear systems such as the stock market, which is the system of interest here. Among the various ANN models, the most used in the literature is the multilayer perceptron (MLP) [5]. Radial basis function (RBF), wavelet-based and recurrent neural networks have also been applied with success [6].

Manoel C. Amorim Neto, Gustavo Tavares and Victor M. O. Alves are with the Facilit Technology Company, Brazil, {manoel,gustavotavares,victor}@facilit.com.br. Site: www.aistocktrend.com. George D. C. Cavalcanti and Tsang Ing Ren are with the Center of Informatics, Federal University of Pernambuco, Brazil, {gdcc,tir}@cin.ufpe.br. Site: www.cin.ufpe.br/~viisar.
The stock market is a complex system composed of many investors selling and buying financial products in the form of securities. Here, we are interested in predicting the stocks of the biggest Brazilian oil company, Petrobras, and of one of the biggest mining companies in the world, Vale do Rio Doce. The Petrobras stock is listed as PETR4 and the Vale do Rio Doce stock as VALE5. These time series were analyzed between the years 2003 and 2009.

In this paper, a comparison between two ANN models, MLP and RBF networks, both with and without exogenous time series, is presented. Additionally, we propose a novel performance metric to select the best trained models, which aims to maximize trend prediction and accuracy. The proposed metric was used to select the best trained networks to be combined in a combination machine.

This paper is organized as follows. Section II briefly describes the stock market and the exogenous time series used. Section III presents the performance metrics that were used and the novel metric. Section IV presents the proposed methods for combining neural networks in a combination machine. Section V describes the experiments and the results obtained. Finally, Section VI presents the conclusions and final remarks.

II. THE STOCK MARKET AND EXOGENOUS TIME SERIES

The main function of the capital market is the trade of stocks with the purpose of financing development, which in turn produces and nourishes the market itself. In this way, a third function is attributed to it: the market of its own sources of income [7]. The monetary market, as a whole, is important for economic development. However, as the economy and the market develop, the markets for sources of capital emerge, namely the stock market, the debt securities market and the real estate market.

Globalization is a trend that allows an intense interchange between countries. Consequently, it is common nowadays for the stock market of an emerging country like Brazil to attain increasing importance in the international scenario. Today the stock market is not only an important source of corporate finance but also a resource for individual capitalization. When investing in a portfolio, the investor wishes to obtain a large return in order to compensate for the associated risks. In other words, the objective is to minimize risk and maximize capital returns. Hence, a prediction method is most useful, and a neural network is well suited to this kind of optimization procedure.

Currently, the Brazilian stock market, which is known in the World Federation of Exchanges (WFE) as São Paulo SE, has global importance. Of the 51 exchanges monitored by the WFE, BOVESPA was in eighth position among the biggest stock markets in the world in terms of capitalization and stock values, in a ranking for developing countries. Two of the biggest companies in the BOVESPA stock market are the Petrobras oil company and Vale do Rio Doce, which makes them ideal stocks to analyze.

For a professional investor to understand the behavior of a stock, at least five series are necessary:
1) The highest price at which the stock was traded on a given day.
2) The lowest price at which the stock was traded on the same day.
3) The price of the first trade of the day: the opening price.
4) The price of the last trade of the day: the closing price.
5) The trading volume of the stock on the same day.

The closing price is the most important of these series, since most professional investors and financial institutions act on its value. In any time series forecasting method, the choice of input variables is an important step.
In this work, we are interested in predicting the stock quotes of PETR4 and VALE5. To predict these values, we used exogenous time series chosen through correlation analyses, as in previous work [8]. For Petrobras (PETR4), the exogenous time series used were: Dollar, IBOV, CLF, SY:PBR, DAX and SP500. The Dollar series is the quotation of the Brazilian Real against the United States Dollar. IBOV is the BOVESPA index quotation. CLF is the Crude Light Oil Future quotation. SY:PBR is the quotation of Brazilian Oil. DAX is the German stock market index. SP500 is the S&P 500 index. For Vale do Rio Doce (VALE5), the exogenous time series used were Dollar and IBOV. These series were chosen based on economic analyses [8].

III. PERFORMANCE MEASUREMENT OF PREDICTION MODELS

There are several metrics used to evaluate time series forecasting models. In this paper we employ five metrics that are commonly used in the literature: MSE, MAPE, POCID, THEIL (or NMSE) and ARV. Additionally, we use SLG, which was proposed by Amorim Neto [8], and a novel metric proposed in this work, named Prediction in Direction and Accuracy (PDA).

A simple measure of the accuracy of a forecasting model is the difference between the expected value and the output of the model. In Equation 1, $T_t$ is the expected value, $Y_t$ is the output of the forecasting model and $e_t$ is the calculated error, all at time $t$. This measure is the basis for the others.

$$e_t = T_t - Y_t \tag{1}$$

The performance metrics used in this work are briefly described here. For every metric, $T_t$ is the desired output of the forecasting model at time $t$, $Y_t$ is the output of the proposed model and $N$ is the total number of available patterns.

A. MSE (Mean Squared Error)

The Mean Squared Error is the best-known metric for evaluating the performance of forecasting models. It is defined as:

$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} (e_t)^2 \tag{2}$$

B. MAPE (Mean Absolute Percent Error)

The Mean Absolute Percent Error measures the accuracy of the model as a percentage. It is defined as:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{e_t}{Y_t} \right| \tag{3}$$

A lower value of MAPE indicates a better prediction method.

C. THEIL or NMSE (Normalized Mean Squared Error)

The Normalized Mean Squared Error relates the performance of the model to that of the random walk model. Equation 4 defines this value.

$$\mathrm{THEIL} = \frac{\sum_{t=1}^{N} (e_t)^2}{\sum_{t=1}^{N} (Y_t - Y_{t-1})^2} \tag{4}$$

The random walk model proposes that the future value of the time series is equal to its current value. When THEIL is equal to one, the proposed model is equivalent to the random walk model. If THEIL is lower than one, the proposed model is better than the random walk model. If THEIL is greater than one, the proposed model performs worse than the random walk model.
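To make Equations 1 to 4 concrete, the following is a minimal pure-Python sketch of the three error-based metrics. The function names and the toy series are hypothetical and not taken from the paper. Two points are assumptions: MAPE divides by the model output $Y_t$, as Equation 3 is written here, and the THEIL sums start at the second pattern because $Y_{t-1}$ does not exist for the first one.

```python
from typing import List, Sequence


def errors(target: Sequence[float], output: Sequence[float]) -> List[float]:
    """Equation 1: e_t = T_t - Y_t for every pattern."""
    return [t - y for t, y in zip(target, output)]


def mse(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 2: mean of the squared errors."""
    e = errors(target, output)
    return sum(x * x for x in e) / len(e)


def mape(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 3: mean absolute error relative to the model output Y_t.

    Dividing by Y_t (not T_t) follows the equation as written in the text.
    """
    e = errors(target, output)
    return sum(abs(x / y) for x, y in zip(e, output)) / len(e)


def theil(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 4: squared error relative to the squared output changes.

    Assumption: both sums run from the second pattern onward, since the
    lagged output Y_{t-1} is undefined for the first pattern.
    """
    num = sum((t - y) ** 2 for t, y in zip(target[1:], output[1:]))
    den = sum((output[i] - output[i - 1]) ** 2 for i in range(1, len(output)))
    return num / den


if __name__ == "__main__":
    # Toy closing prices (hypothetical values, for illustration only).
    expected = [10.0, 11.0, 12.0, 11.0]
    predicted = [10.0, 10.5, 12.5, 11.5]
    print("MSE  :", mse(expected, predicted))
    print("MAPE :", mape(expected, predicted))
    print("THEIL:", theil(expected, predicted))
```

In this toy case THEIL is well below one, which under the interpretation above would mean the model beats the random walk on these four points.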
D. POCID (Prediction On Change In Direction)

POCID is the percentage of time steps in which the model predicts the same trend as the expected values. This metric is defined by Equation 5.

$$\mathrm{POCID} = 100 \, \frac{\sum_{t=1}^{N} D_t}{N} \tag{5}$$

The value of $D_t$ is defined by Equation 6.

$$D_t = \begin{cases} 1, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

E. ARV (Average Relative Variance)

The Average Relative Variance relates the performance of the model to that of another model, which proposes that the future value of the time series is equal to the arithmetic mean of the past values. It is defined as:

$$\mathrm{ARV} = \frac{\sum_{t=1}^{N} (e_t)^2}{\sum_{t=1}^{N} (Y_t - \overline{T})^2} \tag{7}$$

where $\overline{T}$ is the arithmetic mean of the series. When ARV is equal to one, the proposed model is equivalent to the mean of past values. If ARV is lower than one, the proposed model is better than the mean of past values. If ARV is greater than one, the proposed model performs worse than the mean of past values.

F. SLG (Sum of Losses and Gains)

The SLG was proposed by Amorim Neto [8] and was inspired by POCID. It is defined as the mean of the losses and gains of the model. The SLG measurement is defined by Equation 8:

$$\mathrm{SLG} = \frac{1}{N} \sum_{t=1}^{N} L_t \tag{8}$$

In Equation 8, the value of $L_t$ is defined by Equation 9.

$$L_t = \begin{cases} +|T_t - T_{t-1}|, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ -|T_t - T_{t-1}|, & \text{otherwise.} \end{cases} \tag{9}$$

An SLG lower than zero indicates financial losses.

G. Prediction in Direction and Accuracy (PDA)

PDA is the novel metric proposed in this paper. Its objective is to favor forecasting models with better behavior in both trend and accuracy. This is achieved by maximizing POCID while minimizing the relative error. The accuracy of a model is measured against a maximum relative error. The model with the best trend behavior and the highest accuracy will have the highest PDA. In other words, a higher PDA implies a better model. PDA is an improvement on the SLG metric. It is described mathematically in Equations 10 and 11.

$$\mathrm{PDA} = \frac{1}{N} \sum_{t=1}^{N} G_t \tag{10}$$

where $G_t$ is defined in Equation 11:

$$G_t = \begin{cases} 1 - re_t, & \text{if } D_t = 1 \text{ and } re_t < \varepsilon, \\ 0, & \text{if } D_t = 1 \text{ and } re_t \geq \varepsilon, \\ -1 + re_t, & \text{if } D_t = 0 \text{ and } re_t < \varepsilon, \\ -1, & \text{if } D_t = 0 \text{ and } re_t \geq \varepsilon, \end{cases} \tag{11}$$

where $D_t$ is defined by Equation 12, $re_t = \left| e_t / T_t \right|$ and $\varepsilon = 0.02$. This constant is the maximum relative error accepted for a prediction. In this case, the maximum tolerance is a 2% error.

$$D_t = \begin{cases} 1, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{12}$$

If the model predicts the right direction ($D_t = 1$) and the relative error is lower than the maximum error, then $1 - re_t$ is added. If the model predicts the right direction ($D_t = 1$) but the relative error is greater than or equal to the maximum error, then nothing is added. If the model predicts the wrong direction ($D_t = 0$) and the relative error is lower than the maximum error, then $-1 + re_t$ is added. If the model predicts the wrong direction ($D_t = 0$) and the relative error is greater than or equal to the maximum error, then $-1$ is added. After the summation, the mean is calculated.

IV. NEURAL NETWORKS COMBINATION

A neural network is a stochastic mathematical model that aims to simulate the functionality of a biological network. It is formed by a set of connected neurons organized in layers. Each neuron can be considered a computational processing unit. There are several kinds of neural networks [4]. Multilayer Perceptron (MLP) and Radial Basis Function (RBF) networks were used in this paper. Training MLPs with exogenous time series improves the forecasting ability of the model, as demonstrated by Amorim Neto [8]. Additionally, a combination of MLPs trained with exogenous time series improves on the performance of a single MLP [8].

A combination of neural networks is an architecture that takes a set of trained models and merges their outputs, for the same input, into a single system. The combination architecture used here is depicted in Figure 1.
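Because the combination relies on PDA to rank candidate networks, it may help to see the direction-aware metrics of Section III (Equations 5, 6 and 8 to 12) as code. The following is a minimal pure-Python sketch, not the authors' implementation. The function names and the toy series are hypothetical. As an assumption, the means are taken over the patterns that have a previous value (all but the first), and the relative error is taken in absolute value.

```python
from typing import List, Sequence


def direction_hits(target: Sequence[float], output: Sequence[float]) -> List[int]:
    """D_t of Equations 6 and 12: 1 when target and output move the same way."""
    return [
        1 if (target[i] - target[i - 1]) * (output[i] - output[i - 1]) > 0 else 0
        for i in range(1, len(target))
    ]


def pocid(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 5: percentage of correctly predicted directions."""
    d = direction_hits(target, output)
    return 100.0 * sum(d) / len(d)


def slg(target: Sequence[float], output: Sequence[float]) -> float:
    """Equations 8 and 9: mean gain (right direction) or loss (wrong direction)."""
    d = direction_hits(target, output)
    moves = [abs(target[i] - target[i - 1]) for i in range(1, len(target))]
    return sum(m if hit else -m for m, hit in zip(moves, d)) / len(d)


def pda(target: Sequence[float], output: Sequence[float],
        max_rel_error: float = 0.02) -> float:
    """Equations 10 and 11: reward direction and accuracy together."""
    d = direction_hits(target, output)
    total = 0.0
    for i, hit in enumerate(d, start=1):
        rel_err = abs((target[i] - output[i]) / target[i])
        accurate = rel_err < max_rel_error
        if hit:
            # Right direction: credit only when the error is within tolerance.
            total += (1.0 - rel_err) if accurate else 0.0
        else:
            # Wrong direction: always penalized, slightly less when accurate.
            total += (-1.0 + rel_err) if accurate else -1.0
    return total / len(d)


if __name__ == "__main__":
    # Toy closing prices (hypothetical values, for illustration only).
    expected = [100.0, 102.0, 101.0, 103.0, 104.0]
    predicted = [100.0, 101.0, 101.5, 102.0, 110.0]
    print("POCID:", pocid(expected, predicted))
    print("SLG  :", slg(expected, predicted))
    print("PDA  :", pda(expected, predicted))
```

In the toy series the last prediction has the right direction but misses by more than 2%, so it counts toward POCID and SLG yet contributes nothing to PDA. That is the behavior that separates PDA from the purely trend-based metrics.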
This paper presents two ways of choosing the neural networks that make up the combination: (i) selection of the best networks through PDA, and (ii) selection through an evolutionary algorithm. The details are given in Section V.

V. EXPERIMENTS

This section describes all the experiments performed to evaluate the prediction metric and the methods described above.
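As a rough illustration of selection option (i) from Section IV, the sketch below ranks trained networks by their validation PDA, keeps the top N with N given by Equation 13 of Section V-B, and merges the outputs of the selected networks. All function and variable names are hypothetical. Averaging the outputs is an assumption made only for this example, since the text does not state how the combination merges them.

```python
import math
from typing import Dict, List, Sequence


def selection_size(num_trained: int) -> int:
    """Equation 13: N = round(log2(X)), with X the number of trained networks."""
    return round(math.log2(num_trained))


def select_best(validation_pda: Dict[str, float], n: int) -> List[str]:
    """Return the ids of the n networks with the highest validation PDA."""
    ranked = sorted(validation_pda, key=validation_pda.get, reverse=True)
    return ranked[:n]


def combine_mean(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the selected networks' outputs per time step.

    Assumption: a plain mean is used here; the combination rule in the
    paper may differ.
    """
    return [sum(step) / len(step) for step in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical validation scores and test outputs for three networks.
    scores = {"net_a": 0.21, "net_b": 0.28, "net_c": 0.25}
    outputs = {
        "net_a": [30.1, 30.4, 30.9],
        "net_b": [30.0, 30.6, 31.0],
        "net_c": [30.2, 30.5, 31.2],
    }
    chosen = select_best(scores, n=2)
    combined = combine_mean([outputs[name] for name in chosen])
    print("selected:", chosen)
    print("combined:", combined)
    print("N for 410 and 910 networks:", selection_size(410), selection_size(910))
```

Keeping the selection and the merge as separate functions makes it easy to swap the ranking metric or the merge rule without touching the rest.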
Fig. 1. Combination architecture used in this work.

A. Databases

Two databases were used to evaluate the proposed methods: the PETR4 stock quotes dataset and the VALE5 stock quotes dataset. In addition, all the exogenous time series described in Section II were used to complement the main series. The experiments were performed using two groups of datasets: datasets without exogenous time series and datasets with exogenous time series. Both PETR4 and VALE5 had the two dataset groups. Table I shows the stock quote distribution for each database without the exogenous time series and Table II shows the distribution with the exogenous time series. The lag notation refers to the time $t$ of the series: lag -1 corresponds to the stock quote on the previous day, lag -2 to the stock quote two days earlier, and lag 0 to the stock quote on the current day.

TABLE I
DISTRIBUTION OF STOCK QUOTES PER PATTERN IN THE DATABASES WITHOUT EXOGENOUS SERIES.

Stock            Lag     Stock            Lag
PETR4 close      -1      VALE5 close      -1
PETR4 close      -2      VALE5 close      -2
PETR4 close      -3      VALE5 close      -3

TABLE II
DISTRIBUTION OF STOCK QUOTES PER PATTERN IN THE DATABASES WITH EXOGENOUS SERIES.

Stock            Lag     Stock            Lag
PETR4 close      -1      VALE5 close      -1
PETR4 close      -2      VALE5 close      -2
PETR4 close      -3      VALE5 close      -3
PETR4 open       -1      VALE5 open       -1
PETR4 highest    -1      VALE5 highest    -1
PETR4 lowest     -1      VALE5 lowest     -1
Dollar close     -1      Dollar close     -1
DAX close        -1      IBOV close       -1
IBOV close       -1
CLF close        -1
SP500 close      -1
SY:PBR close     -1

B. Experimental Setup

Before the experiments, the databases were organized into three datasets: training, validation and test sets. The training set was used for the learning of the neural networks. The validation set was used for tuning some training parameters. Training was repeated while varying other parameters, such as the number of hidden neurons for the MLP and the spread of the RBF, resulting in a large number of experiments and trained neural networks. Of the 1,500 days of stock quotes in each database, 1,200 were used for training and validation and the last 300 days for testing. For each experiment, the training and validation sets were divided randomly, with 900 days for training and 300 for validation. The test set, however, remained the same.

The experiments were done as follows. First, two experiments were run using the main time series, PETR4 without exogenous series: one with MLP and another with RBF. The MLP was trained with the number of hidden neurons varied in [20 ... 60], ten times for each number, generating 410 trained MLPs. The RBF was trained with a variation in [10 ... 100], generating 910 trained RBFs. The validation set was used to choose the best network configuration, which was then evaluated on the test set. Afterwards, two more experiments were done, this time including the exogenous time series. Again, 410 trained MLPs and 910 trained RBFs were obtained. In the end, the $N$ best networks trained with exogenous series were chosen for combination according to the validation set, with $N$ given by Equation 13, where $X$ is the total number of trained networks.

$$N = \mathrm{round}(\log_2(X)) \tag{13}$$

With $X = 410$ MLPs this gives $N = 9$, and with $X = 910$ RBFs it gives $N = 10$.

There are seven metrics available for choosing the best networks to combine. There are many combination possibilities, and we used two approaches: (i) combining the best networks according to the PDA metric and (ii) using a genetic algorithm (GA) for this task, where the variable to be maximized was PDA. A genetic algorithm is an evolutionary technique that seeks an optimum through evolution, using operators such as mutation and crossover. In this method, each possible solution to the problem to be optimized is represented by a chromosome.

In general, the results without combination show that RBF performs better than MLP according to the proposed metric. In the experiments with combination for the PETR4 database, the genetic algorithm found the following metric combinations: POCID + THEIL + ARV + PDA for MLP and MSE + MAPE + THEIL + ARV + PDA for RBF.
For the VALE5 database, the GA found these metric combinations: POCID + PDA for MLP and MSE + MAPE + THEIL + ARV + PDA for RBF. Table III shows the combination results generated by the GA. In both databases, the MLP combination was better than the RBF combination according to the PDA metric. In fact, for both databases, the MLP combination achieved a better balance of accuracy and trend prediction than the RBF combination.

The results of combining the networks with the best PDA can be seen in Table IV. As with the GA combination, these results show that MLP is the best choice. For both databases, the MLP combination had the highest PDA value, indicating that it had better accuracy and trend prediction than the RBF combination. Even where RBF outperforms MLP on some metric, the two are very close on that metric.

TABLE III
BEST RESULTS WITH COMBINATION BY GENETIC ALGORITHM (X(σ)).

Metric    MLP                         RBF
MSE       0.61812 (0.0063709)         0.62212 (0.014382)
MAPE      0.015824 (7.8961e-005)      0.016028 (0.00015723)
POCID     81.4094 (0.17329)           81.5436 (0.27399)
SLG       0.70212 (0.010981)          0.70862 (0.003683)
THEIL     0.27291 (0.0033154)         0.28875 (0.0052983)
ARV       0.0021179 (2.153e-005)      0.0021358 (4.9668e-005)
PDA       0.28546 (0.0054852)         0.26451 (0.0044274)

MSE       0.71341 (0.021186)          0.6008 (0.017369)
MAPE      0.015662 (0.00017252)       0.015511 (0.00027798)
POCID     83.1104 (0.47951)           81.0033 (0.56406)
SLG       0.83871 (0.012075)          0.80091 (0.012646)
THEIL     0.33196 (0.0093126)         0.26268 (0.0055378)
ARV       0.0017654 (5.3089e-005)     0.0014873 (4.29e-005)
PDA       0.27251 (0.0097393)         0.23946 (0.012403)

VI. CONCLUSION

This paper presented a comparison between MLP and RBF neural networks using the PETR4 and VALE5 time series with and without exogenous data. It also introduced a new performance metric for selecting trained networks to combine in combination machines. Experiments were carried out to verify the usefulness of the proposed metric and the proposed combinations.
The experiments showed that: (i) without combination, RBF outperforms MLP in general; (ii) with combination, MLP improves in performance and overtakes RBF; (iii) the proposed novel metric is useful for network selection based on the main metrics for financial investments, and it is especially suitable as the objective of minimization/maximization algorithms such as a genetic algorithm.

TABLE IV
BEST RESULTS WITH COMBINATION BY THE BEST NETWORKS RANKED BY PDA METRIC (X(σ)).

Metric    MLP                         RBF
MSE       0.6056 (0.01061)            0.61335 (0.010417)
MAPE      0.015732 (0.00015777)       0.016285 (0.000224)
POCID     81.5436 (0.47457)           79.4295 (0.65319)
SLG       0.72031 (0.021801)          0.69428 (0.017808)
THEIL     0.26983 (0.0035511)         0.29633 (0.0038447)
ARV       0.0020787 (3.5254e-005)     0.002107 (3.6346e-005)
PDA       0.27865 (0.004667)          0.2365 (0.011121)

MSE       0.72078 (0.024093)          0.59518 (0.013024)
MAPE      0.015687 (0.00027368)       0.015478 (0.00025494)
POCID     82.6756 (0.41113)           80.9365 (0.49857)
SLG       0.83816 (0.010863)          0.80828 (0.010803)
THEIL     0.33828 (0.011696)          0.26519 (0.0069003)
ARV       0.0017837 (6.0259e-005)     0.0014727 (3.2269e-005)
PDA       0.26401 (0.0080416)         0.22843 (0.0074966)

The two proposed combination methods had similar gains in prediction. However, selection by the best PDA can be more suitable because genetic algorithms have a high computational cost. Also, the proposed metric is a natural evolution of SLG that aims to rank networks based on accuracy as well as trend, instead of trend exclusively. The results obtained showed some relation between this metric and other accuracy and trend metrics. In other words, when PDA increases, accuracy and trend prediction also improve.

REFERENCES

[1] Brockwell, P. J. and Davis, R. A. Introduction to Time Series and Forecasting. New York, USA: Springer Verlag, 1996.
[2] Chu, C.-W. and Zhang, G. P. A comparative study of linear and nonlinear models for aggregate retail sales forecasting. International Journal of Production Economics, pp. 217-23, 2003.
[3] De Gooijer, J. G. and Kumar, K. Some recent developments in non-linear time series modeling, testing and forecasting. International Journal of Forecasting, vol. 8, pp. 135-156, 1992.
[4] Haykin, S. Neural Networks: a Comprehensive Foundation. Second Edition. Prentice Hall, 1998.
[5] Charkha, P. R. Stock Prediction and Trend Prediction using Neural Network. First International Conference on Emerging Trends in Engineering and Technology, pp. 592-594, 2008.
[6] Ferreira, T. A. E., Vasconcelos, G. C. and Adeodato, P. J. L. A New Intelligent System Methodology for Time Series Forecasting with Artificial Neural Networks. Neural Processing Letters, vol. 28, pp. 113-129, 2008.
[7] Schumpeter, J. A. The Theory of Economic Development: An Inquiry into Profits, Capital, Credit, Interest, and the Business Cycle. Transaction Publishers, 1982.
[8] Amorim Neto, M. C., Cavalcanti, G. D. C. and Ren, T. I. Financial time series prediction using exogenous series and combined neural networks. International Joint Conference on Neural Networks, pp. 2578-2585, 2009.