Journal of Applied Research and Technology ISSN: Centro de Ciencias Aplicadas y Desarrollo Tecnológico.

Journal of Applied Research and Technology, Centro de Ciencias Aplicadas y Desarrollo Tecnológico, México.
Ding, Y. R.; Cai, Y. J.; Sun, P. D.; Chen, B. The Use of Combined Neural Networks and Genetic Algorithms for Prediction of River Water Quality. Journal of Applied Research and Technology, vol. 12, no. 3, June 2014. Centro de Ciencias Aplicadas y Desarrollo Tecnológico, Distrito Federal, México.

The Use of Combined Neural Networks and Genetic Algorithms for Prediction of River Water Quality

Y. R. Ding*1, Y. J. Cai2, P. D. Sun3 and B. Chen4
1 Department of Computer Science and Technology, JiangNan University, Wuxi, China. *yr_ding@jiangnan.edu.cn
2 School of Biotechnology, JiangNan University, Wuxi, China.
3 School of Chemical and Material Engineering, JiangNan University, Wuxi, China.
4 Environmental Monitoring Station of Binhu District, Wuxi, China.

ABSTRACT
To effectively control and treat river water pollution, it is critical to establish a water quality prediction system. A hybrid intelligent algorithm combining Principal Component Analysis (PCA), a Genetic Algorithm (GA) and a Back Propagation Neural Network (BPNN) is designed to predict river water quality. First, PCA is used to reduce data dimensionality: 23 water quality index factors are compressed into 15 aggregative indices, which effectively improves the training speed of the subsequent algorithms. Then, GA optimizes the parameters of the BPNN. The average prediction rates for non-polluted and polluted water are 88.9% and 93.1%, respectively, and the global prediction rate is approximately 91%. The water quality prediction system based on the combination of neural networks and genetic algorithms can accurately predict water quality and provide useful support for real-time early warning systems.

Keywords: back propagation neural network, genetic algorithm, principal component analysis, water quality prediction.

1. Introduction
Rapid economic growth inevitably causes water pollution. To effectively control water pollution, automatic water quality monitoring stations have been built in many important districts. Accurate water quality prediction methods are essential for monitoring and controlling water pollution in a timely manner.
Therefore, powerful water quality prediction methods are vital when automatic water quality monitoring systems are established. So far, many methods have been used to predict water quality, including the grey relational method [1], mathematical statistics methods [2], model-based approaches [3], Bayesian approaches [4], neural network models [5-8], and Genetic Algorithms (GA) [9-11]. Approximately 85%-90% of water quality prediction studies have used neural networks. Neural networks have many favourable characteristics, including massive information processing, distributed association, and the ability to self-learn and self-organize [12-16]. As highly non-linear systems, they also have good fault tolerance and are well suited to complex problems.

However, because the transfer functions of a neural network are non-linear, the training problem has multiple local optima, and the optimization process is influenced by the choice of the initial point. If the initial point is closer to a local optimum than to the global optimum, the multi-layer network will fail to reach the global optimum. GA is better able to avoid this problem. Because GA is not restricted by the structure of the search space, it can find global optimum solutions of discrete, multi-extremum, high-dimensional problems with noise. GA has been used in river water quality management model optimization [9], water quality model calibration [10], and water quality monitoring network optimization [11]. Combining a BP Neural Network (BPNN) with GA can therefore improve both the prediction accuracy and the speed of the BPNN [16-18]. In this paper, GA is used to optimize the BPNN parameters to speed up the prediction process. Our work differs from others in that we apply Principal Component Analysis (PCA) to reduce data dimensionality and speed up the learning process.

Many factors affect water quality (there are 23 factors in our work; see the Materials and methods section), and these factors have complex non-linear relationships with water quality. The data dimensionality should therefore be reduced to extract the most important factors. PCA compresses multiple original indices into a few aggregative variable indices that still represent the information in the original data. PCA has been successfully applied in environmental data analysis [19,20]. Here, PCA is applied to optimize and select the sample set. In this work, we combine PCA, BPNN and GA to predict water quality. By integrating the advantages of these algorithms, the water quality prediction system not only ensures prediction accuracy but also improves prediction speed.

2. Materials and methods

2.1 Dataset
The experimental data are measurements from rivers flowing into Taihu Lake, China. There are 2680 samples, divided into two groups, non-polluted and polluted water, in a ratio of approximately 1:1. The 23 factors influencing water quality are pH, NH3-N, volatile phenol, TN, Cr6+, CODMn, TP, BOD5, TCN, COD, petroleum, Cd, Cu, Zn, Pb, Hg, As, Se, F-, sulfide, dissolved oxygen, electrical conductivity, and LAS.

2.2 Principal component analysis (PCA)
PCA reduces dimensionality while keeping the loss of original data information to a minimum. It compresses multiple original indices into a few aggregative variable indices. In this paper, the number of water samples is n (here n = 2680) and the number of factors affecting water quality is p (here p = 23). These form a water quality data matrix of order n × p (2680 × 23). The original sample data matrix is

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

The new target variables are denoted $y_1, y_2, \ldots, y_m$ ($m \le p$). Each is a linear combination of the original variables $x_1, x_2, \ldots, x_p$ (the columns of $X$):
$$\begin{aligned} y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p \\ y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p \\ &\;\;\vdots \\ y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mp}x_p \end{aligned} \qquad (m \le p) \qquad (1)$$

In Eq. (1), the loading vector $a_i = (a_{i1}, a_{i2}, \ldots, a_{ip})^T$ ($i = 1, 2, \ldots, m$) is determined by $(\Sigma - \lambda_i I)\,a_i = 0$, where $\Sigma$ is the covariance matrix of $X$. It must satisfy the following conditions:
(1) $y_i$ is uncorrelated with $y_j$ ($i \ne j$), so that the components span an orthogonal subspace;
(2) $a_i^T \Sigma a_i$, the variance of $y_i$, is maximized;
(3) $a_i^T a_i = 1$, i.e. $a_i$ is normalized.
Eigenvalue decomposition of the covariance matrix of $X$ therefore gives each loading vector $a_i$ as the eigenvector associated with the eigenvalue $\lambda_i$. The ratio $\lambda_i / \sum_{j=1}^{p} \lambda_j$ ($i = 1, 2, \ldots, p$) is the contribution of the $i$-th PC and indicates its ability to represent the original data. The eigenvalues are ranked, usually in descending order. The PCs with the largest eigenvalues are then selected until their cumulative contribution reaches 85%. The selected PCs are the aggregative indices used as BPNN inputs.

2.3 Optimize BPNN using GA
The BP network model contains one hidden layer. The number of hidden-layer nodes is usually determined either by an empirical formula or by repeated trial calculation. Here, the number of hidden-layer neurons Q is determined according to the empirical Eq. (2):

$$Q = \sqrt{\text{input nodes} + \text{output nodes}} + C, \qquad C \in [1, 10] \qquad (2)$$

Different values of Q were tested. Q = 8 was the most appropriate, provided that the training goal and gradient criteria were met as far as possible. In our experiment, the output value is limited to the range [0, 1]. We use logsig as the transfer function from the input to the hidden layer and from the hidden to the output layer. For BPNN training, the Levenberg-Marquardt (LM) algorithm is applicable to medium-sized networks when sufficient memory is available. Applying LM optimization to water quality prediction may shorten the learning time and improve the training speed. However, BPNN suffers from slow convergence and can become trapped in local minima during learning. The main challenge of water quality prediction is the complex, implicit non-linear relationship between input and output data. It is therefore practical to obtain a useful model by learning and training on a large number of samples.

2.4 The combined model of PCA, BPNN and GA
We combined the PCA, BPNN and GA algorithms to establish a water quality prediction system. PCA is used to remove redundant information, reduce data dimensionality and obtain principal components. Using the obtained principal components as network input neurons has several advantages:
(1) it reduces the number of nodes in the network input layer;
(2) it simplifies the neural network structure;
(3) together with GA optimization of the network parameters, it improves both BPNN training speed and model prediction accuracy.
Figure 1 is the simplified flowchart of the combined model. The steps of the combined PCA, BPNN and GA algorithm for predicting water quality are:
(1) The 2680 groups of sample data were converted into their corresponding 2680 groups of aggregative index sample data; the data were normalized and labeled;
(2) PCA was conducted on the input samples X1, X2, ..., X23, converting them into aggregative indices Z1, Z2, ..., Zm (m < 23);
(3) The number of BPNN hidden-layer neurons was selected by repeated random testing;
(4) The 2680 normalized aggregative index samples were randomly divided into a training set (2000 samples) and a testing set (680 samples);
(5) The BPNN was trained with GA (GA optimizes the BPNN weights and threshold values).

Figure 1. The simplified flowchart of the combined PCA, BPNN and GA algorithm.

2.5 Verify and test the combined model
We randomly selected 2000 groups of data for training, consisting of 1000 polluted and 1000 non-polluted samples. Five-fold cross-validation is used to estimate the performance of the hybrid intelligent algorithm. The value output by the GA-optimized BPNN approaches 1 or 0, which indicates whether the water is polluted. The local and global prediction accuracies are computed according to Eqs. (3) and (4).

Local prediction accuracy (LA):
$$LA_i = P_i = \frac{T_i}{n_i} \qquad (3)$$

Overall prediction accuracy (TA):
$$TA = \frac{1}{N}\sum_{i=1}^{2} T_i \qquad (4)$$

In Eqs. (3) and (4), N is the total number of samples, i denotes the class of the samples (non-polluted or polluted water), n_i is the number of samples in class i, and T_i is the number of correctly predicted samples in class i. The remaining 680 samples are then used to test the combined model.

3. Results and discussion

3.1 Principal component analysis
After PCA, the 23 original sample indices are compressed into 15 aggregative indices. Table S1 shows the correlation coefficient matrix used in PCA, and Table 1 shows the eigenvalues and contribution rates.

Table 1. Eigenvalues and contribution rates (columns: principal component; eigenvalue; contribution rate (%); cumulative rate (%)).

The correlation matrix shows strong correlations:
- between volatile phenol and NH3-N;
- among TN, COD and NH3-N;
- between hexavalent chromium and volatile phenol;
- between CODMn and COD;
- among TP, NH3-N and TN;
- between BOD5 and CODMn.
The information carried by these factors therefore overlaps. The eigenvalues and contribution rates in Table 1 show that the first 15 principal components represent 87.43% of the information in the original data. The 15 principal components can therefore replace the 23 original variables, and they serve as the input neurons of the BPNN. The dimensionality reduction speeds up the training process with little information loss.

3.2 The performance of the combined model
We use the remaining 680 samples to test the performance of the combined model. Table 2 shows that the prediction accuracy for polluted water is 93.1%, for non-polluted water 88.9%, and the global prediction accuracy 91%. Moreover, the prediction accuracies for polluted water are all higher than those for non-polluted water. The river data in this work include samples from 2007, when a large bloom of blue-green algae in Taihu Lake caused water quality to deteriorate severely. When the training data are chosen randomly, a larger share of 2007 data raises the prediction accuracy for polluted water and lowers it for non-polluted water. The strong characteristics of heavily polluted water in this period may affect the result. Overall, these prediction accuracies show that the combined model is suitable for predicting water quality.
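The accuracies reported above follow Eqs. (3) and (4) in Section 2.5. The following is a minimal sketch of that computation, not the authors' code. Three things are assumptions for illustration: the label coding (1 = polluted, 0 = non-polluted), the 0.5 cut-off in the hypothetical helper `to_label` that turns a BPNN output in [0, 1] into a class, and the example outputs.

```python
from collections import Counter

def to_label(output, threshold=0.5):
    # Map a BPNN output in [0, 1] to a class label; the 0.5 cut-off is an assumption
    return 1 if output >= threshold else 0

def prediction_accuracies(y_true, y_pred):
    """Local accuracy per class (Eq. 3) and overall accuracy (Eq. 4)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = Counter(y_true)  # n_i: number of samples in class i
    t = Counter(yt for yt, yp in zip(y_true, y_pred) if yt == yp)  # T_i: correct predictions in class i
    local = {c: t[c] / n[c] for c in n}  # Eq. (3): LA_i = T_i / n_i
    overall = sum(t.values()) / len(y_true)  # Eq. (4): TA = (sum_i T_i) / N
    return local, overall

# Hypothetical example: 1 = polluted, 0 = non-polluted (label coding assumed)
outputs = [0.93, 0.81, 0.66, 0.12, 0.05, 0.40, 0.70, 0.55]
y_pred = [to_label(o) for o in outputs]
local, overall = prediction_accuracies([1, 1, 1, 1, 0, 0, 0, 0], y_pred)
print(local, overall)  # {1: 0.75, 0: 0.5} 0.625
```

Counting T_i and n_i per class keeps the local accuracies separate from the overall rate. With balanced classes, as in the training set, the overall accuracy is simply the mean of the two local accuracies.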
The algorithm is also very stable because GA is used to adjust the BPNN connection weights and threshold values.

3.3 Comparison of BPNN performance with and without GA
We also compared BPNN prediction rates with and without GA; Table 3 shows the results. Table 3 and Figure 2 show that the BPNN search process with GA is unlikely to become trapped in local optima, and most of its prediction rates are approximately 90% or higher. Without GA optimization, BPNN sometimes achieves prediction rates above 80%. However, repeated experiments show that the prediction rates of the trained model fluctuate more widely without GA, and the BP network sometimes converges to local optima. The MSE confirms this: the MSE with GA is significantly smaller than the MSE without GA, and a smaller MSE indicates better convergence. When BPNN searches without GA, the optimum solution is not always found, and the prediction accuracy declines. To overcome these disadvantages of BPNN, GA is needed to optimize the BPNN parameters.
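Sections 2.3-2.4 state only that the hidden layer uses logsig and that GA optimizes the BPNN weights and threshold values. They do not give the chromosome encoding, genetic operators or GA settings. The following minimal pure-Python sketch is therefore built on assumptions made here:
- a real-valued chromosome holding all weights and thresholds;
- fitness equal to the training MSE;
- tournament selection, arithmetic crossover, Gaussian mutation and elitism.
All function names, parameter values and the toy data are hypothetical. The sketch also evaluates Eq. (2); rounding the square root to an integer is an assumption. With 15 inputs and a single output node (also assumed), Q = 8 corresponds to C = 4.

```python
import math
import random

def logsig(x):
    # Log-sigmoid transfer function; clamping keeps math.exp from overflowing
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def hidden_nodes(n_in, n_out, c):
    # Empirical Eq. (2): Q = sqrt(inputs + outputs) + C with C in [1, 10].
    # Rounding the square root to an integer is an assumption of this sketch.
    if not 1 <= c <= 10:
        raise ValueError("C must lie in [1, 10]")
    return round(math.sqrt(n_in + n_out)) + c

def n_genes(n_in, q):
    # Hidden weights + hidden thresholds + output weights + output threshold
    return n_in * q + q + q + 1

def forward(genes, x, q):
    # Decode the flat chromosome as an n_in-q-1 network and evaluate it on one input
    n_in = len(x)
    hidden = []
    for j in range(q):
        w = genes[j * n_in:(j + 1) * n_in]
        b = genes[n_in * q + j]
        hidden.append(logsig(sum(wi * xi for wi, xi in zip(w, x)) + b))
    out_w = genes[n_in * q + q:n_in * q + 2 * q]
    out_b = genes[n_in * q + 2 * q]
    return logsig(sum(wi * hi for wi, hi in zip(out_w, hidden)) + out_b)

def mse(genes, data, q):
    # Fitness to minimize: mean squared error over (input, target) pairs
    return sum((forward(genes, x, q) - y) ** 2 for x, y in data) / len(data)

def ga_optimize(data, q, pop_size=30, generations=60,
                mut_rate=0.1, mut_scale=0.5, seed=0):
    """Search weights/thresholds with a simple real-coded GA.

    Returns (best_genes, best_mse, history); history[g] is the best
    training MSE in generation g.
    """
    rng = random.Random(seed)
    size = n_genes(len(data[0][0]), q)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((mse(g, data, q), g) for g in pop), key=lambda s: s[0])
        history.append(scored[0][0])
        next_pop = [scored[0][1][:]]  # elitism: the best chromosome survives unchanged

        def tournament():
            # Pick the fittest of three randomly drawn individuals
            return min(rng.sample(scored, 3), key=lambda s: s[0])[1]

        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            alpha = rng.random()
            # Arithmetic crossover, then Gaussian mutation of individual genes
            child = [alpha * a + (1.0 - alpha) * b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0.0, mut_scale) if rng.random() < mut_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=lambda g: mse(g, data, q))
    return best, mse(best, data, q), history

if __name__ == "__main__":
    # Hypothetical toy data: label 1 when x1 + x2 > 1, else 0 (not the Taihu Lake data)
    rng = random.Random(1)
    data = []
    for _ in range(40):
        x = [rng.random(), rng.random()]
        data.append((x, 1.0 if sum(x) > 1.0 else 0.0))
    q = hidden_nodes(2, 1, 1)
    genes, err, hist = ga_optimize(data, q)
    print(f"hidden nodes: {q}, initial best MSE: {hist[0]:.4f}, final MSE: {err:.4f}")
```

Because of elitism, the best training MSE never increases from one generation to the next. A common hybrid design then uses the GA result as the starting point for gradient-based BP training (for example LM). That step is not shown here.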

Table S1. The correlation coefficient matrix (rows and columns: COD, pH, NH3-N, volatile phenol, TN, Cr6+, CODMn, TP, BOD5, TCN, petroleum, Cd, Cu, Zn, Pb, Hg, As, Se, F-, sulfide, LAS, dissolved oxygen, electrical conductivity).

Table 2. Prediction accuracy of polluted water and non-polluted water (columns: number of times; size of training set; size of testing set; accuracy in predicting polluted water (%); accuracy in predicting non-polluted water (%); overall prediction accuracy (%); final row: average).

Table 3. Average prediction rates and Mean Squared Error with and without GA in BPNN (columns: No.; average prediction rate with GA; average prediction rate without GA; MSE with GA; MSE without GA).
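Table 1 lists the eigenvalues and contribution rates behind the choice of 15 principal components. The selection rule in Section 2.2 works as follows. Rank the eigenvalues in descending order and compute each contribution λ_i / Σλ_j. Then keep the smallest number of leading PCs whose cumulative contribution reaches 85%. The following is a minimal sketch of that rule. It assumes the eigenvalues have already been obtained, for example from an eigenvalue routine such as `numpy.linalg.eigvalsh` applied to the covariance matrix. The eigenvalues in the example are hypothetical, not the values behind Table 1.

```python
def select_components(eigenvalues, threshold=0.85):
    """Rank eigenvalues, compute contribution rates and pick the number of PCs.

    Returns (m, contributions, cumulative): m is the smallest number of leading
    principal components whose cumulative contribution reaches `threshold`.
    """
    if not eigenvalues or any(v < 0 for v in eigenvalues):
        raise ValueError("eigenvalues must be non-empty and non-negative")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    lam = sorted(eigenvalues, reverse=True)  # descending order, as in Section 2.2
    total = sum(lam)
    if total == 0:
        raise ValueError("eigenvalues must not all be zero")
    contributions = [v / total for v in lam]  # lambda_i / sum_j lambda_j
    cumulative, running = [], 0.0
    for c in contributions:
        running += c
        cumulative.append(running)
    # Small tolerance so rounding error in the running sum cannot skip a component
    m = next(i + 1 for i, s in enumerate(cumulative) if s >= threshold - 1e-12)
    return m, contributions, cumulative

# Hypothetical eigenvalues (not the values behind Table 1)
m, contrib, cum = select_components([0.5, 4.0, 1.0, 2.0, 0.5])
print(m, cum)  # 3 [0.5, 0.75, 0.875, 0.9375, 1.0]
```

Applied to the 23 eigenvalues of this study, the same rule yields the 15 components reported in Section 3.1, whose cumulative contribution is 87.43%.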

Figure 2. The prediction rate of water quality with and without the GA algorithm.

4. Conclusions
We present a water quality prediction model that combines PCA, BPNN and GA. Using a BPNN model for water classification and prediction overcomes disadvantages of traditional evaluation methods, including their large workload and strong subjectivity. The model is objective, general and practical. PCA converts the multiple indices into a few aggregative indices with little loss of original information and reduces the input data, speeding up the training process. Using GA to optimize the network parameters effectively prevents the search process from converging to local optimum solutions. It also optimizes the network parameters globally and significantly improves the accuracy of water quality prediction. The model makes full use of the advantages of the PCA, BPNN and GA algorithms, achieves high training speed and good prediction rates, and can be extended to other classification problems.

Acknowledgements
This work was supported by the National Natural Science Foundation of China ( ), the National High Technology Research and Development Program (2009AA02C210) and the Fundamental Research Funds for the Central Universities (JUSRP11126).

References
[1] W. C. Ip, et al., "Applications of grey relational method to river environment quality evaluation in China," Journal of Hydrology, vol. 379, no. 3-4.
[2] E. M. Smeti, et al., "Treated water quality assurance and description of distribution networks by multivariate chemometrics," Water Research, vol. 43, no. 18.
[3] D. A. Brydon and D. A. Frodsham, "A model-based approach to predicting BOD5 in settled sewage," Water Science and Technology, vol. 44, no. 2-3, pp. 9-15.
[4] Y. Liu, et al., "Water quality modeling for load reduction under uncertainty: a Bayesian approach," Water Research, vol. 42, no. 13.
[5] V. Chandramouli, et al., "Backfilling missing microbial concentrations in a riverine database using artificial neural networks," Water Research, vol. 41, no. 1.
[6] H. G. Han, et al., "An efficient self-organizing RBF neural network for water quality prediction," Neural Networks, vol. 24, no. 7.
[7] E. O'Connor, et al., "A neural network approach to smarter sensor networks for water quality monitoring," Sensors (Basel), vol. 12, no. 4.
[8] O. Senkal, et al., "Precipitable water modelling using artificial neural network in Cukurova region," Environmental Monitoring and Assessment, vol. 184, no. 1.
[9] J. H. Cho, et al., "A river water quality management model for optimising regional wastewater treatment using a genetic algorithm," Journal of Environmental Management, vol. 73, no. 3.
[10] R. Zou and W. S. Lung, "Robust water quality model calibration using an alternating fitness genetic algorithm," Journal of Water Resources Planning and Management, vol. 130, no. 6.
[11] Y. Icaga, "Genetic algorithm usage in water quality monitoring networks optimization in Gediz (Turkey) river basin," Environmental Monitoring and Assessment, vol. 108, no. 1-3.
[12] L. L. Rogers, et al., "Optimal field-scale groundwater remediation using neural networks and the genetic algorithm," Environmental Science & Technology, vol. 29, no. 5.
[13] S. Ledesma-Orozco, et al., "Hurst Parameter Estimation Using Artificial Neural Networks," Journal of Applied Research and Technology, vol. 9, no. 2.
[14] A. Baharodimehr, et al., "Capacitive MEMS accelerometer wide range modeling using artificial neural network," Journal of Applied Research and Technology, vol. 7, no. 2.
[15] M. R. Arab, et al., "Electroencephalogram Signals Processing for the Diagnosis of Petit mal and Grand mal Epilepsies Using an Artificial Neural Network," Journal of Applied Research and Technology, vol. 8, no. 1.
[16] J. Rivera-Mejía, et al., "PID based on a single artificial neural network algorithm for intelligent sensors," Journal of Applied Research and Technology, vol. 10, no. 2.
[17] V. Petridis, et al., "A hybrid neural-genetic multimodel parameter estimation algorithm," IEEE Transactions on Neural Networks, vol. 9, no. 5.
[18] J. T. Kuo, et al., "A hybrid neural-genetic algorithm for reservoir water quality management," Water Research, vol. 40, no. 7.
[19] S. G. Dalal, et al., "Evaluation of significant sources influencing the variation of water quality of Kandla creek, Gulf of Katchchh, using PCA," Environmental Monitoring and Assessment, vol. 163, no. 1-4.
[20] R. L. Olsen, et al., "Water quality sample collection, data treatment and results presentation for principal components analysis--literature review and Illinois River Watershed case study," Water Research, vol. 46, no. 9.