Water Treatment Plant Decision Making Using Rough Multiple Classifier Systems


Available online at www.sciencedirect.com
Procedia Environmental Sciences 11 (2011) 1419-1423

Lin Feng 1,2
1 College of Computer Science, Sichuan Normal University, Chengdu, Sichuan, China
2 Sichuan Key Laboratory of Visualization Computing and Virtual Reality, Chengdu, Sichuan, China
scfengyc@126.com

Abstract
Water treatment plant data sets are high-dimensional, and a single rough classifier has weak classification ability on data sets with many classes. To address both problems, a new computing approach, termed CMBMRCS (water treatment plant Classification Model Based on Multiple Rough Classifier Systems), is proposed. First, subsets of attributes are selected using rough sets theory. Then, the reduced data sets are used to build a group of rough classifiers. Finally, the classification result for the water treatment plant data is obtained according to the absolute majority voting strategy. The experimental results illustrate the effectiveness of the proposed method.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of the Intelligent Information Technology Application Research Association. Open access under CC BY-NC-ND license.
ISSN 1878-0296. doi:10.1016/j.proenv.2011.12.213

Keywords: Rough sets theory; attribute reduction; decision rules; water treatment plant monitoring

1. Introduction
With the rapid growth of industrialization and population over the last one hundred years, environmental pollution has increased considerably and is affecting water resources, air, and soil quality. In the last decade, wastewater treatment modeling has become a standard engineering tool for wastewater treatment plant design, process optimization, operator training, and the development of control strategies [1], and it has attracted extensive attention from many researchers. New wastewater treatment models are emerging constantly.

For example, Oliveira-Esquerre et al. presented a model that combines principal components analysis and a back-propagation neural network to predict the biochemical oxygen demand (BOD) of the stream of the biological wastewater treatment plant at RIPASA S/A Celulose e Papel [2]. Rieger et al. designed and developed a system to support decision making on wastewater issues, namely the sound use of wastewater and of treatment processes. Commonly used measures in this respect are dominated by environmental impact assessment, risk assessment, and cost-benefit analysis [1]. Luo et al. proposed a soft computing approach based on back-propagation (BP) neural networks and fuzzy-rough sets, which has been applied to forecast the effluent NH3-N, COD, and TN concentrations
of a real wastewater treatment plant. Fuzzy-rough sets theory is employed to select the inputs of the neural network, which reduces the influence of the drawbacks of BP, such as low training speed and sensitivity to noise and to weakly interdependent data. The model performance is evaluated with statistical parameters, and the simulation results indicate that the FR-BP modeling approach achieves much more accurate predictions than other traditional modeling approaches [3].

However, model predictions can only be as good as the data fed as model input or otherwise used for calibration. Data reconciliation procedures include fault detection, fault isolation, fault identification, and preparation of a data set suitable for the modeling objective [1]. Data mining techniques can help improve model predictions by addressing each of the above-mentioned problems [4]. Among data mining techniques, rough sets theory, proposed by Pawlak [5], is a mathematical approach suited to handling imprecision, incompleteness, and uncertainty in data analysis. Its efficiency has been demonstrated in many applications such as attribute reduction, pattern recognition, and fault detection [6, 7, 8]. It has a unique advantage in dealing with high-dimensional wastewater treatment plant data. On the other hand, the classification ability of a single rough classifier is poor for wastewater treatment plant data with many classes. Multiple rough classifier systems are therefore an interesting research topic.

In this paper, a novel classification approach, termed CMBMRCS (water treatment plant Classification Model Based on Multiple Rough Classifier Systems), is constructed. First, subsets of attributes are selected according to rough sets theory. Then, each reduced data set is used to train its own rough classifier. Finally, the classification result for the water treatment plant data is obtained according to the absolute majority voting strategy. A simulation experiment is also given.

The rest of this paper is organized as follows. Section 2 gives an overview of rough sets theory. Section 3 gives an overview of CMBMRCS. Section 4 shows the results of evaluating CMBMRCS on a wastewater treatment plant data set, and Section 5 offers the summary and conclusions of the research.

2. Theoretical Foundations of Rough Sets
For the convenience of later discussion, we first introduce some basic notions of rough sets theory.

Definition 1: An information table $S$ is a quadruple $S = (U, R, V, f)$, where $U$ is a finite nonempty set of objects, $R$ is a finite nonempty set of attributes, $V = \bigcup_{r \in R} V_r$, $V_r$ is a nonempty set of values of $r$, and $f: U \times R \to V$ is an information function that maps an object in $U$ to exactly one value in $V_r$. If $R = C \cup D$ and $C \cap D = \emptyset$, $S$ is called a decision table or a decision information system, where $C$ is a set of condition attributes describing the objects, and $D$ is a decision attribute that indicates the classes of objects.

Definition 2: Let $S = (U, R, V, f)$ be an information table. For any $B \subseteq R$, there is an associated $B$-indiscernibility relation $IND(B)$:

$$IND(B) = \{(x, y) \mid (x, y) \in U \times U,\ \forall b \in B\ (f(x, b) = f(y, b))\} \tag{1}$$

Definition 3: Let $S = (U, R, V, f)$ be an information table. For any $B \subseteq R$ and $X \subseteq U$, the $B$-lower approximation of $X$ is defined as:

$$\underline{B}(X) = \{x \in U \mid [x]_B \subseteq X\} \tag{2}$$

where $[x]_B = \{y \in U \mid (x, y) \in IND(B)\}$ is the equivalence class of $IND(B)$ determined by $x$.

Definition 4: Let $P$ and $Q$ be equivalence relations on $U$. The positive region $POS_P(Q)$ is defined as:

$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}(X) \tag{3}$$

Definition 5: Given $S = (U, C \cup D, V, f)$, an attribute set $B \subseteq C$ is a reduct of $C$ with respect to $D$ if it satisfies the following two conditions:

(1) Jointly sufficient condition:

$$\gamma_B(D) = \gamma_C(D) \tag{4}$$

(2) Individually necessary condition:

$$\forall b \in B,\ \gamma_{B \setminus \{b\}}(D) \neq \gamma_B(D) \tag{5}$$

Here $\gamma_B(D)$ denotes the degree of dependency of $D$ on $B$, conventionally $\gamma_B(D) = |POS_B(D)| / |U|$.

3. Overview of CMBMRCS
CMBMRCS is shown in Fig. 1. It is composed of three main modules: attribute reduction, classification rule generation, and the final classification.

3.1. Attribute Reduction Module
Attribute reduction is one of the core notions in rough sets theory. It is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality that corresponds to the intrinsic dimensionality of the data. Rough sets can be used to reduce the number of attributes contained in the data set using the data alone, requiring no additional information. In CMBMRCS, attribute reduction can provide the same quality of classification as the original attribute set while requiring only a small number of attributes to be monitored.

Many approaches to attribute reduction have been developed in the area of rough sets. In rough sets theory, a subset of attributes defines an equivalence relation. Based on the equivalence partition, one can induce a set of positive rules and a set of boundary rules, respectively. In machine learning, choosing such a subset is commonly referred to as the problem of feature selection. In rough set analysis, the problem is called attribute reduction, and a selected set of attributes for rule induction is called a reduct. Intuitively speaking, an attribute reduct is a minimal subset of attributes whose induced rule sets have the same level of performance as the entire set of attributes, or a lower but satisfactory level of performance [9].
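As an illustration of Definitions 2 to 5 and of attribute reduction, the following Python sketch computes equivalence classes, the positive region, and the dependency degree on a toy decision table, and then searches for one reduct. It is a minimal sketch under stated assumptions: the paper does not say which reduction algorithm CMBMRCS uses, so the greedy forward selection with backward pruning shown here is a stand-in, the dependency degree is taken as |POS_B(D)| / |U|, and the function names and the toy table are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of IND(attrs): row indices that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, cond, dec):
    """Indices of objects whose IND(cond) class falls inside a single decision class."""
    pos = set()
    for block in partition(rows, cond):
        if len({rows[i][dec] for i in block}) == 1:
            pos.update(block)
    return pos

def gamma(rows, cond, dec):
    """Dependency degree, assumed here to be |POS_cond(dec)| / |U|."""
    return len(positive_region(rows, cond, dec)) / len(rows)

def greedy_reduct(rows, cond, dec):
    """Heuristic search for one reduct (an assumption, not the paper's algorithm)."""
    target = gamma(rows, cond, dec)
    chosen = []
    # Forward step: add the attribute that raises gamma most until condition (4) holds.
    while gamma(rows, chosen, dec) < target:
        rest = [a for a in cond if a not in chosen]
        chosen.append(max(rest, key=lambda a: gamma(rows, chosen + [a], dec)))
    # Backward step: drop attributes that are not individually necessary, condition (5).
    for a in list(chosen):
        without = [b for b in chosen if b != a]
        if gamma(rows, without, dec) == target:
            chosen = without
    return chosen

# Hypothetical toy decision table: condition attributes a, b, c and decision d.
table = [
    {"a": 0, "b": 0, "c": 1, "d": "ok"},
    {"a": 0, "b": 1, "c": 1, "d": "fault"},
    {"a": 1, "b": 0, "c": 0, "d": "ok"},
    {"a": 1, "b": 1, "c": 0, "d": "fault"},
]
reduct = greedy_reduct(table, ["a", "b", "c"], "d")
print(reduct)
```

In this toy table the decision depends on b alone, so the search keeps only that attribute. The result is a reduct in the sense of Definition 5, but a greedy search does not guarantee the smallest one.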
Figure 1. Overview of CMBMRCS. [Flow diagram: water treatment plant data sets; attribute reductions 1 to n; rough classifiers 1 to n; class labels C_1 to C_n; absolute majority voting model; output of the final result.]
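The first two stages of Figure 1 can be sketched as follows: each attribute subset yields its own reduced view of the training data, and one rule-based classifier is built per view. This is a simplified sketch, not the authors' implementation. It keeps only certain rules (condition-value combinations whose training objects all share one class) and skips value reduction; the names `train_rough_classifier` and `build_ensemble`, the toy table, and the choice to return `None` when no rule matches are all assumptions made for illustration.

```python
from collections import defaultdict

def train_rough_classifier(rows, attrs, dec):
    """Build a classifier from certain rules over the attribute subset attrs."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[a] for a in attrs)].add(row[dec])
    # Keep a rule only when every matching training object has the same class.
    rules = {key: next(iter(classes))
             for key, classes in seen.items() if len(classes) == 1}

    def classify(obj):
        # Returns None when no certain rule matches (an assumed convention).
        return rules.get(tuple(obj[a] for a in attrs))

    return classify

def build_ensemble(rows, reducts, dec):
    """One classifier per attribute subset, mirroring the parallel branches of Figure 1."""
    return [train_rough_classifier(rows, attrs, dec) for attrs in reducts]

# Hypothetical toy data in which both {a} and {b} determine the decision d.
table = [
    {"a": 0, "b": 0, "c": 0, "d": "normal"},
    {"a": 1, "b": 1, "c": 0, "d": "fault"},
    {"a": 0, "b": 0, "c": 1, "d": "normal"},
    {"a": 1, "b": 1, "c": 1, "d": "fault"},
]
ensemble = build_ensemble(table, [["a"], ["b"]], "d")
labels = [clf({"a": 1, "b": 1, "c": 0}) for clf in ensemble]
print(labels)
```

Each classifier looks only at its own attribute subset, so an object with an unseen value on one subset can still be labeled by the classifiers built on the others.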

3.2. Classification rules
An important application of rough sets is the induction of decision rules. These rules form rough classifiers that indicate the decision class of an object based on its values on some condition attributes in the decision table. Decision rules are expressed in the form "if [condition] then [consequent]". In rough sets, decision rules are generated by an inductive learning principle and fall into two categories: certain rules and possible rules. Certain rules are generated from the lower approximation sets, while possible rules are generated from the upper approximation sets [4]. The process by which the maximum number of condition attribute values is removed without losing essential information is called value reduction. The resulting rules are optimal in the sense that their conditions are maximally general, that is, of minimal length. In this paper, we use an inductive rule learning algorithm (value reduction) to generate a set of rules.

3.3. Classification Process
For the data to be identified, we give an approach based on [10].

Approach: Classification process for the data to be identified.
Input: data x to be identified.
Output: classification result of x.
Step 1. Input x to rough classifier 1, rough classifier 2, ..., rough classifier n, respectively.
Step 2. For i = 1 to n, calculate the result of rough classifier i.
Step 3. The final classification result of x is obtained according to the absolute majority voting strategy.

4. Experimental testing and analysis
In order to evaluate the CMBMRCS approach, a water treatment plant database [11, 12] was chosen. The data set consists of historical data recorded over 521 days, with 38 different input features measured daily. Each day is classified into one of thirteen categories depending on the operational status of the plant.
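With thirteen possible categories, the n rough classifiers can easily disagree, so the voting rule in Step 3 of Section 3.3 decides the outcome. A minimal Python sketch of that step follows. It assumes that "absolute majority" means a class must receive strictly more than half of the votes; the paper does not say what happens when no class reaches that threshold, so returning `None` in that case is an assumption, as is the function name.

```python
from collections import Counter

def absolute_majority_vote(labels):
    """Return the label holding more than half of the votes, or None if there is none."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # Strictly more than half: a tie or a mere plurality does not win (assumed rule).
    return label if count > len(labels) / 2 else None

# Hypothetical outputs of three rough classifiers for one day of plant data.
decision = absolute_majority_vote(["fault", "fault", "normal"])
print(decision)
```

A plurality rule would always return the most frequent label instead; the stricter threshold used here trades coverage for agreement among classifiers.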
The purpose of the experiment is to compare the classification performance of the single rough classifier method with that of the multiple rough classifiers method. The evaluation criterion is classification accuracy. In the experiment, we use Nguyen's improved greedy algorithm to discretize continuous-valued attributes and general attribute value reduction to generate decision rules. The minority priority matching strategy is adopted for the testing data. The experimental results are shown in Table 1.

Table 1. Results of the experiment

Experiment   Multiple rough classifiers        Single rough classifier
             Classification accuracy (%)       Classification accuracy (%)
1            78.43                             75.67
2            77.52                             75.88
3            75.79                             73.24

In all three experiments, the multiple rough classifiers method achieves higher classification accuracy than the single classifier method, by 1.64 to 2.76 percentage points. Consequently, multiple rough classifiers perform better on the complex water treatment plant data classification problem.

5. Conclusions
Over the past ten years, data mining techniques have been successfully applied in fields including marketing, manufacturing, process control, fraud detection, and network management. In this paper, we proposed the CMBMRCS approach. The experimental results illustrate that the proposed method has higher classification accuracy than the single classifier method and could meet the accuracy demands of water treatment plant data classification problems with many classes.

6. Acknowledgment
The authors thank the anonymous referees for their helpful comments. This work is supported in part by the Scientific Research Fund of Sichuan Provincial Education Department under Grant No. 09ZC079 and by the Key Research Foundation of Sichuan Normal University.

References
[1] L. Rieger, I. Takacs, K. Villez, et al., Data reconciliation for wastewater treatment plant simulation studies: planning for high-quality data and typical sources of errors, Water Environment Research, vol. 82, no. 5, pp. 426-433, 2010.
[2] K. P. Oliveira-Esquerre, M. Mori, and R. E. Bruns, Simulation of an industrial wastewater treatment plant using artificial neural networks and principal components analysis, Brazilian Journal of Chemical Engineering, vol. 19, no. 4, pp. 365-370, 2002.
[3] F. Luo, R. H. Yu, Y. G. Xu, et al., Effluent quality prediction of wastewater treatment plant based on fuzzy-rough sets and artificial neural networks, International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 09), IEEE Press, Aug. 2009, vol. 5, pp. 47-51, doi:10.1109/fskd.2009.494.
[4] L. Feng and Y. L. Yuan, An intrusion detection architecture based on multiple rough classifiers integration, Journal of Experimental & Theoretical Artificial Intelligence, unpublished.
[5] Z. Pawlak, Rough sets, International Journal of Information and Computer Sciences, vol. 11, no. 5, pp. 341-356, 1982.
[6] L. Feng, G. Y. Wang, and X. X. Li, Knowledge acquisition in vague objective information systems based on rough sets, Expert Systems, vol. 27, no. 2, pp. 129-142, 2010.
[7] L. Feng, G. Y. Wang, and T. R. Li, Knowledge acquisition from decision tables containing continuous-valued attributes, Acta Electronica Sinica, vol. 37, no. 11, pp. 2432-2438, 2009.
[8] Q. Wu and Z. T. Liu, Real formal concept analysis based on grey-rough set theory, Knowledge-Based Systems, vol. 22, no. 1, pp. 38-45, 2009.
[9] Y. Y. Yao and Y. Zhao, Attribute reduction in decision-theoretic rough set models, Information Sciences, vol. 178, no. 17, pp. 3356-3373, 2008.
[10] L. Shang, J. G. Wang, W. S. Yao, et al., A classification approach based on evolutionary neural networks, Journal of Software, vol. 16, no. 9, pp. 1577-1583, 2005.
[11] Q. Shen and R. Jensen, Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring, Pattern Recognition, no. 37, pp. 1351-1363, 2004.
[12] C. L. Blake and C. J. Merz, UCI Repository of machine learning databases, Department of Information and Computer Science, University of California, Irvine, CA, 1998, http://www.ics.uci.edu/mlearn/mlrepository.html.