Stock Market Prediction with Multiple Regression, Fuzzy Type-2 Clustering and Neural Networks

Similar documents
Available online at ScienceDirect. Procedia Computer Science 61 (2015 )

Forecasting Electricity Consumption with Neural Networks and Support Vector Regression

CHAPTER 3 FUZZY LOGIC BASED FRAMEWORK FOR SOFTWARE COST ESTIMATION

Fuzzy Logic based Short-Term Electricity Demand Forecast

Applying 2 k Factorial Design to assess the performance of ANN and SVM Methods for Forecasting Stationary and Non-stationary Time Series

Grouping of Retail Items by Using K-Means Clustering

Keywords: Breach width; Embankment dam; Uncertainty analysis, neuro-fuzzy.

Enterprise Transformation through Aspects and Levels: Zachman Bayesian Approach

A 2-tuple Fuzzy Linguistic RFM Model and Its Implementation

COMPUTATIONAL INTELLIGENCE FOR SUPPLY CHAIN MANAGEMENT AND DESIGN: ADVANCED METHODS

Available online at ScienceDirect. Procedia Computer Science 36 (2014 )

Water Treatment Plant Decision Making Using Rough Multiple Classifier Systems

Procedia - Social and Behavioral Sciences 109 ( 2014 )

CHAPTER VI FUZZY MODEL

Flood Forecasting using Adaptive Neuro-Fuzzy Inference System (ANFIS)

Evapotranspiration Prediction Using Adaptive Neuro-Fuzzy Inference System and Penman FAO 56 Equation for St. Johns, FL, USA

Time-of-use Short-term Load Prediction Model Based on Variable Step Size Optimized HBMO-LS-SVM Method in Day-head Market

Available online at ScienceDirect. Procedia Engineering 186 (2017 )

Effort Estimation in Information Systems Projects using Data Mining Techniques

ScienceDirect. Guidelines for Applying Statistical Quality Control Method to Monitor Autocorrelated Prcoesses

Wind Farm Power Prediction Based on Wind Speed and Power Curve Models

Quality of Data Set In Modeling Work: A Case Study in Urban Area for Different Inputs Using Fuzzy Approach

ScienceDirect. An Efficient CRM-Data Mining Framework for the Prediction of Customer Behaviour

K-means based cluster analysis of residential smart meter measurements

A Fuzzy Multiple Attribute Decision Making Model for Benefit-Cost Analysis with Qualitative and Quantitative Attributes

Design and Implementation of Office Automation System based on Web Service Framework and Data Mining Techniques. He Huang1, a

Using the Analytic Hierarchy Process in Long-Term Load Growth Forecast

THE ROLE OF NESTED DATA ON GDP FORECASTING ACCURACY USING FACTOR MODELS *

Applied Hybrid Grey Model to Forecast Seasonal Time Series

CONNECTING CORPORATE GOVERNANCE TO COMPANIES PERFORMANCE BY ARTIFICIAL NEURAL NETWORKS

What is Evolutionary Computation? Genetic Algorithms. Components of Evolutionary Computing. The Argument. When changes occur...

A Data-driven Prognostic Model for

Data Mining and Applications in Genomics

Fraud Detection for MCC Manipulation

Optimized Fuzzy Logic Based Framework for Effort Estimation in Software Development

Fuzzy Logic and Optimisation: A Case study in Supply Chain Management

A HYBRID MODERN AND CLASSICAL ALGORITHM FOR INDONESIAN ELECTRICITY DEMAND FORECASTING

State-of-charge estimation of lithium-ion batteries based on multiple filters method

Management Science Letters

Artificial Intelligence-Based Modeling and Control of Fluidized Bed Combustion

Development of an Intelligent On-Line Monitoring System Based on ANFIS Algorithm for Resistance Spot Welding Process

ScienceDirect. Public Transportation Service Evaluations Utilizing Seoul Transportation Card Data

ScienceDirect. Finite Element Simulation of a Mercantile Vessel Shipboard Under Working Conditions

A Project Cost Forecasting Method Based on Grey System Theory

A hybrid approach artificial bee colony optimization and k-means clustering for software cost estimation

A Novel Data Mining Algorithm based on BP Neural Network and Its Applications on Stock Price Prediction

Artificial Neural Network Analysis of the Solar-Assisted Heat Pumps Performance for Domestic Hot Water Production

Automated Negotiation System in the SCM with Trade-Off Algorithm

Benchmarking by an Integrated Data Envelopment Analysis-Artificial Neural Network Algorithm

OPTIMIZING SUPPLIER SELECTION USING ARTIFICIAL INTELLIGENCE TECHNIQUE IN A MANUFACTURING FIRM

FORECASTING OF ELECTRICITY CAPACITY PRICES ON EU RESERVE MARKET. Shaping our world

Eliminating Analogy Cost Estimation Errors Using Fuzzy Logic

Determining the Governance Controllability of Organizations in Supply Chain Management Using Fuzzy Expert System

The use of a fuzzy logic-based system in cost-volume-profit analysis under uncertainty

GA-ANFIS Expert System Prototype for Prediction of Dermatological Diseases

WATER QUALITY PREDICTION IN DISTRIBUTION SYSTEM USING CASCADE FEED FORWARD NEURAL NETWORK

The application of hidden markov model in building genetic regulatory network

Staff Performance Appraisal using Fuzzy Evaluation

Applying Quantitative Proxies for Qualitative Evaluation Indicators: Employee Satisfaction and Customer Satisfaction

Modeling a fuzzy system for assisting the customer targeting decisions in retail companies

Research Article Combining Diffusion and Grey Models Based on Evolutionary Optimization Algorithms to Forecast Motherboard Shipments

Energy Conservation and Power Consumption Analysis in China Based on Input-output Method

Analysis on Influencing Factors of Emergency based on System Engineering

A fuzzy inference expert system to support the decision of deploying a military naval unit to a mission

Research on Security Assessment Model of Water Supply System Based on Leakage Control

Organization of the repair and maintenance in road sector with ontologies and multi-agent systems

Measuring Environmental Impacts, Subjectivity, and Sustainability

arxiv: v1 [cs.lg] 26 Oct 2018

Comparative study on demand forecasting by using Autoregressive Integrated Moving Average (ARIMA) and Response Surface Methodology (RSM)

A Survey on Recommendation Techniques in E-Commerce

Hierarchy Based Reusability Assessment Model for Component Qualification in Reusable Verification Environment

Management Science Letters

FUZZY LOGIC APPROACH TO REACTIVE POWER CONTROL IN UTILITY SYSTEM

Research on Personal Credit Assessment Based. Neural Network-Logistic Regression Combination

Available online at ScienceDirect. Procedia Computer Science 83 (2016 )

A Softcomputing Knowledge Areas Model

Job shop flow time prediction using neural networks

BelVis PRO Enhancement Package (EHP)

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1

Modeling of Steam Turbine Combined Cycle Power Plant Based on Soft Computing

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

IMPROVED FRAMEWORK FOR MODELING MUNICIPAL RESIDENTIAL WATER CONSUMPTION ESTIMATION USING WAVELET-MAMDANI FUZZY APPROACH

Session 15 Business Intelligence: Data Mining and Data Warehousing

Available online at ScienceDirect. Procedia Manufacturing 11 (2017 ) Wan Chen Chiang, Chen Yang Cheng*

DOKUZ EYLUL UNIVERSITY FACULTY OF ENGINEERING OFFICE OF THE DEAN COURSE / MODULE / BLOCK DETAILS ACADEMIC YEAR / SEMESTER. Course Code: IND 4919

A Decision Support Method for Investment Preference Evaluation *

A Fuzzy Inference System Approach for Knowledge Management Tools Evaluation

PERFORMANCE APPRAISAL USING BASIC T- NORMS BY SOFTWARE FUZZY ALGORITHM

Index Terms: Customer Loyalty, SVM, Data mining, Classification, Gaussian kernel

Models in Engineering Glossary

Further study of control system for reservoir operation aided by neutral networks and fuzzy set theory

A Short-Term Bus Load Forecasting System

TEST CASE PRIORITIZATION USING FUZZY LOGIC BASED ON REQUIREMENT PRIORITIZING

Zombie Infection Warning System Based on Fuzzy Logic

Day-Ahead Price Forecasting of Electricity Market Using Neural Networks and Wavelet Transform

POS Data and Your Demand Forecast

Proposing a New Software Cost Estimation Model Based on Artificial Neural Networks

ADAPTIVE NEURO FUZZY INFERENCE SYSTEM FOR MONTHLY GROUNDWATER LEVEL PREDICTION IN AMARAVATHI RIVER MINOR BASIN

Investigation on the Role of Moisture Content, Clay and Environmental Conditions on Green Sand Mould Properties

Transcription:

Available online at www.sciencedirect.com Procedia Computer Science 6 (2011) 201 206 Complex Adaptive Systems, Volume 1 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science and Technology 2011- Chicago, IL Stock Market Prediction with Multiple Regression, Fuzzy Type-2 Clustering and Neural Networks David Enke a *, Manfred Grauer b, Nijat Mehdiyev b a Department of Finance and Operations Management, The University of Tulsa, Tulsa, OK 74104, USA b Department of Information Systems, University of Siegen, 57076, Siegen, Germany Abstract Stock market forecasting research offers many challenges and opportunities, with the forecasting of individual stocks or indexes focusing on forecasting either the level (value) of future market prices, or the direction of market price movement. A three-stage stock market prediction system is introduced in this article. In the first phase, Multiple Regression Analysis is applied to define the economic and financial variables which have a strong relationship with the output. In the second phase, Differential Evolution-based type-2 Fuzzy Clustering is implemented to create a prediction model. For the third phase, a Fuzzy type-2 Neural Network is used to perform the reasoning for future stock price prediction. The results of the network simulation show that the suggested model outperforms traditional models for forecasting stock market prices. 2011 Published by Elsevier B.V. Open access under CC BY-NC-ND license. Keywords: Stock Market Prediction; Forecasting; Multiple Regression Analysis; Fuzzy type-2 Clustering; Fuzzy Neural Networks; Hybrid Model; Differential Evolution Optimization 1. Introduction The goal of stock market prediction research is to seek a system which can predict stock price levels precisely. Taking into account the nonlinearities and discontinuities of the factors which are considered to impact stock markets, the selection process of a manageable amount of the financial and economic data is often viewed as a necessary initial stage of any stock market prediction model. Multiple Regression Analysis, Principal Component Analysis, Hurst Exponent Analysis, and Grey Relation Analysis can be used during this initial step [Ince and Trafalis, 2007; Hurst, 1951; Kung and Wen, 2007]. Given stock market model uncertainty, neuro-fuzzy techniques are viable candidates to capture stock market nonlinear relations, returning good forecasting results without prior knowledge of the input data statistical distribution [Atsalakis and Valavanis, 2009]. Atsalakis and Valavanis [2009] analyzed more than 100 related published articles that focus on artificial neural networks and neuro-fuzzy techniques derived and applied to forecasting stock markets. None of the reviewed authors of these articles used * Corresponding author. Tel.: 918-631-2218; fax: 918-631-2037. E-mail address: david-enke@utulsa.edu. 1877 0509 2011 Published by Elsevier Ltd. Open access under CC BY-NC-ND license. doi:10.1016/j.procs.2011.08.038

202 David Enke et al. / Procedia Computer Science 6 (2011) 201 206 Differential Evolution (DE) based Fuzzy type-2 Clustering or Fuzzy type-2 Inference Neural Networks in their models. The following will present a hybrid model that combines Multiple Regression Analysis, Fuzzy type-2 Clustering, and a Fuzzy type-2 Neural Network (See Figure 1). Figure 1: Proposed Hybrid Model The remainder of the paper is organized as follows: Section 2 describes the application of Multiple Regression Analysis to reduce the dimensionality of the variable set. The structure of the Differential Evolution Optimizationbased Fuzzy type-2 Clustering system that is implemented in this paper is then presented in Section 3. Section 4 discusses the architecture of the Fuzzy type-2 Neural Network used for predicting S&P 500 stock market prices. The experiments and empirical results of computer simulations are presented and described in Section 5. Finally, Section 6 provides a conclusion. 2. Multiple Regression Analysis Recent studies in stock market prediction suggest that there are many factors which are considered to be correlated with future stock market prices. Nonetheless, using too many financial and economical factors can overload the prediction system [Thawornwong and Enke, 2003; Hadavandi et al., 2010; Chang and Liu, 2008; Esfahanipour and Aghamiri, 2010]. As a result, one of the initial and most challenging steps of stock market prediction is determining the manageable amount of the input variables which have the strongest forecasting ability and can be used as inputs to a prediction system (the Fuzzy type-2 Neural Network in the proposed model). In order to understand which of the input variables are significant and driving the output, Multiple Regression Analysis is implemented in our hybrid model. The Multiple Regression Analysis is performed on 25 finance and economic variables to both reduce the dimensionality of the variable set and identify which variables have a strong relationship with the market price of the S&P 500 Index for the subsequent testing period. Example variables include past prices, volume, technical indicators, T-bill rates, certificate of deposit rates, credit ratings, producer/consumer price indexes, industrial production levels, and money supply levels. The input variables are collected on a monthly basis from January 1, 1980 to January 1, 2009, for a total of 361 months. The variables with inappropriate t-statistics and p-values are excluded from the list of inputs. According to the experiment results, the Multiple Regression Analysis method identified the 3-month T-bill (T-Bill3) rate, 3- month Certificate of Deposit (CDR3) rate, past S&P 500 (SP500) Index price level, past Money Supply (M1) level, recent Industrial Production (IP) reading, and the recent Producer Price Index (PPI) as significant and relevant variables in the regression model (with p-values less than 0.005). Analysis of the t-statistics and significance value of each variable suggested that these variables contain relevant information about the future stock prices. Given that the absolute t-statistics values of each variable was greater than 1 and the p-values (or significance values) are less than 0.005, the selected variables are believed to have strong forecasting ability. The regression analysis resulted in

David Enke et al. / Procedia Computer Science 6 (2011) 201 206 203 the following equation for predicting the S&P 500 Index price level for the following month given variation in the inputs: SP500 t+1 = -33.690 + (11.999*T-Bill3) + (1.365*IP) + (.059*M1) (.781*PPI) (9.124*CDR3)+(.955*SP500) (1) The relationship generated from the regression (Equation 1) indicates that the SP500 t+1 is a function of the intercept (-33.690) plus various coefficients multiplied times the relevant variables. For this formula, positive changes in T-Bill3 t, SP500 t, IP t 1, and M1 t 1 have positive effects on the prediction of the stock market level for the next month (SP500 t+1 ), while the positive changes of CD3 t and PP t 1 have negative effects. The R-squared value of the model is 0.994, implying that the equation explains 99.4 percent of the variation of the future stock market price. Table 1: Model Summary R R Squared Adjusted R Squared Std. Error of Estimate 0.997 0.994 0.994 35.604 3. Differential Evolution Optimization-based Fuzzy type-2 Clustering Different types of clustering analysis have been applied by various researchers in numerous fields of science. The agglomerative hierarchical clustering and the nonhierarchical clustering are the two main types of clustering techniques. The agglomerative hierarchical clustering methods, which are used as an explanatory statistical technique to determine the number of clusters of data sets, include Single, Complete, and Average linkage methods. These methods are appropriate for both qualitative and quantitative variables, whereas the Centroid and Wards methods are applied only for quantitative variables [Johnson and Wichern, 2002]. K-means, Fuzzy c-means, SOM neural networks and Differential Evolution-based Fuzzy Clustering are the main types of nonhierarchical clustering analysis [Aliev et al., 2011, Gardashova et al. 2010]. Some researchers have conducted studies to compare the effectiveness of different clustering techniques. The proposed model utilizes a Differential Evolution-based Fuzzy type-2 Clustering technique. Fuzzy clustering is a well-established paradigm used to generate the initial type-2 fuzzy If-Then model [Hwang and Chung-Hoon Rhee, 2007]. For the proposed model, Fuzzy type-2 Differential Evolution-based Clustering is used since it has been proven to produce results that better suit the application of type-2 If-Then rules [Aliev et al., 2011]. This removes the uncertainty in choosing the m parameter existing in Fuzzy c-means by suggesting a solution for a range of its values covering {1.4, 2.6}, a meaningful range for m. Adequate choice of m is very important as it plays a visible role in forming the shape of resulting fuzzy clusters [Aliev et al., 2011]. As the experiments have shown, using a type-2 Fuzzy Clustering method provides better location of the cluster centers, and subsequently results in a better fuzzy rule model. This in turn allows capturing more uncertainty, while delivering higher robustness against the imprecision of the data. The objective function is as follows (with n data vectors, P = {p 1, p 2,..., p n } inputs; prototype v j of the j th cluster generated by the fuzzy clustering; membership degree u ij of the i th data belonging to the j th cluster represented by the prototype v j ): n c n c m1 (1) m2 (2) m1 ij i j m2 ij i j i 1 j 1 i 1 j 1 J u p v min, J u p v min (2) subject to constraints: n 0 u i 1 ij n ( j 1, 2,..., c ) and c u ij j 1 1 ( i 1, 2,..., n )

204 David Enke et al. / Procedia Computer Science 6 (2011) 201 206 The vector v~ i is formed as: v [min( v, v ), max( v, v )] where Ind arg min v, v (3) (1) (2) (1) (2) (1) (2) i i Indi i Indi i j i j In the proposed approach, Differential Evolution [Price et al., 2005] is used for minimization of the objective function (Equation 1) since it acts as a global search algorithm and is expected to be more advantageous than standard Fuzzy c-means for the case of a large number of highly-dimensional data vectors. From this fuzzy model, the linguistic hedges approach [Aliev et al., 2011] can be used to derive the corresponding interpretable linguistic model. Fragments of the Fuzzy type-2 IF-THEN model(rules) discovered by fuzzy clustering are shown below: 1) IF S&P500 is about 1222 AND T-Bill3 is about 1.69 AND CDR3 is 2.96 AND PPI is about 136.93 AND M1 is about 498.56 AND IP is about 12.37 THEN S&P500 is about 1044... 7) IF S&P500 is about 878.79 AND T-Bill3 is about 5.19 AND CDR3 is 5.90 AND PPI is about 97.40 AND M1 is about 679.06 AND IP is about 50.49 THEN S&P500 is about 899.20 4. The Structure of Type-2 Fuzzy Inference Neural Network The structure of the proposed type-2 Fuzzy Inference Neural Network (T2FINN) [Aliev et al., 2011] is shown in Figure 2. Layer 1 maps inputs to type-2 fuzzy terms used in the rules. Layer 2 defines nodes representing the chosen rules. Each rule node performs the Min operation on the outputs (interval valued membership degrees) of the incoming links from the previous layer. Layer 3 consists of output term membership functions of type-1. Layer 4 computes the fuzzy output signal for the output variables. Layer 5 realizes the defuzzification using the Center-of- Gravity (COG) defuzzification method. Layer 1 Layer 2 Layer 3 Layer 4 Layer 5 Inputs Input M Fs Rules Output MFs Aggregation D efu zzific a tio n COG COG 5. Computer Simulation Figure 2: Structure of the type-2 Fuzzy Inference Neural Network The goal of the hybrid model is to determine the next stock price value by using the observed numerical data. The dataset used for testing contains 360 records. As determined during regression analysis, the input included the 3- month T-bill (T-Bill3) rate, 3-month Certificate of Deposit (CDR3) rate, past S&P 500 (SP500) Index level, past Money Supply (M1) level, recent Industrial Production (IP) reading, and the recent Producer Price Index (PPI). The output was the next S&P 500 Index value. For the simulation, the Differential Evolution-based Fuzzy type-2

David Enke et al. / Procedia Computer Science 6 (2011) 201 206 205 clustering model included seven clusters, max iteration =1000, exponent =2, and population size=200. Parameters of the type-2 neural network (that was initiated during the clustering procedure) are further adjusted by the Differential Evolution algorithm on the training series (80% of all data). Figure 3 provides examples of the produced type-2 membership functions. 1.2 1.2 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0-60 -40-20 0 20 40 60 80 0-100 -80-60 -40-20 0 20 40 60 Figure 3: Examples of Parameters of the Fuzzy type-2 Network after Training of the Data Series Performance of the type-2 neural network on both the training and testing data series is shown in Figure 4. 1.8 1.6 1.4 S&P (1:1000) 1.2 1 0.8 0.6 0.4 0.2 0 1 18 35 52 69 86 103 120 137 154 171 188 205 222 239 256 273 290 307 324 341 358 se rie s Figure 4: Fuzzy type-2 Neural Network Performance The RMSE error comparison with a type-1 model is shown in Table 2. Table 2: Comparison of Performance (type-1 compared to type-2) RMSE Type-1 approach 0.948 Type-2 (the suggested approach) 0.909 6. Conclusion Taking into account that gradient-based optimization methods (such as Fuzzy c-means) may not give a solution that reaches the global minimum (since it may get stuck in a local minimum), stock price prediction has been studied

206 David Enke et al. / Procedia Computer Science 6 (2011) 201 206 using a Fuzzy type-2 Neural Network with Fuzzy type-2 clustering. Furthermore, since the standard iterative scheme may not be directly applicable to the considered problem when various distance functions are used, Differential Evolution-based algorithms were used for solving the clustering optimization problem. Further refinements were made by supplying the Differential Evolution optimization to the Fuzzy Neural Network inference system. When applied to the problem of stock market forecasting, simulations resulted in a lower prediction error when a Fuzzy type-2 approach is used as compared to a Fuzzy type-1 approach. References Aliev R. A., W. Pedrycz, B. Guirimov, R. R. Aliyev, U. Ilhan, M. Babagil, and S. Mammadli, Type-2 Fuzzy Neural Networks with Fuzzy Clustering and Differential Evolution Optimization, Information Sciences (2011): pp. 1591-1608. Atsalakis G. S., and K. P. Valavanis, Surveying stock market forecasting techniques Part II: Soft computing methods, Expert Systems with Applications, Vol. 36, Issue 3, Part 2 (2009): pp. 5932-5941. Chang, P.C., and H.C. Liu, A TSK type fuzzy rule based system for stock price prediction, Expert Systems with Applications, Vol. 34 (2008): pp. 135-144. Esfahanipour, A., and W. Aghamiri, Adapted neuro-fuzzy inference system on indirect approach TSK fuzzy rule base for stock market analysis, Expert Systems with Applications, Vol. 37 (2010): pp. 4742 4748. Gardashova, L. A., N. Mehdiyev, J. I. Ramazanov, and R. R Aliyev, Extraction Rules From Data By Using Differential Evolution Based Fuzzy Clustering Method, Sixth World Conference on Intelligent Systems for Industrial Automation, Tashkent, Uzbekistan (2010): pp. 322-329. Hadavandi, E., H. Shavandi, and A. Ghanbari, Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting, Knowledge-Based Systems, Vol. 23, Issue 8 (2010): pp. 800-808. Hurst, H.E., Long-term storage of reservoirs: an experimental study, Transactions of the American Society of Civil Engineers, Vol. 116 (1951): pp. 770-799. Hwang C., and F. Chung-Hoon Rhee, Uncertain fuzzy clustering: Interval Type-2 Fuzzy Approach to C-Means, IEEE Transactions on Fuzzy Systems, Vol. 15, No. 1 (2007): pp. 107-120. Ince, H., and B. T. Trafalis, Kernel principal component analysis and support vector machines for stock price prediction, IIE Transactions, Vol. 39, No. 6, (2007): pp. 629-637. Johnson, R.A., and D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, 2002. Kung, C.Y., and K. L. Wen, Applying Grey Relational Analysis and Grey Decision-Making to evaluate the relationship between company attributes and its financial performance - A case study of venture capital enterprises in Taiwan, Decision Support Systems, Vol. 43, Issue 3 (2007): pp. 842-852. Price, K., R. Storn, and J. Lampinen, Differential evolution a practical approach to global optimization, Springer, 2005. Thawornwong, S., and D. Enke, The Adaptive Selection of Financial and Economic Variables for Use with Artificial Neural Networks, Neurocomputing, Vol. 56 (2003): pp. 205-232.