Application of a PCA based water quality classification method in water quality assessment in the Tongjiyan Irrigation Area, China


International Conference on Energy and Environmental Protection (ICEEP 2016)

Xue-feng Tao, a, Tao Huang, b, Xiao-feng Li, c, Dao-ping Peng, d
Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China
Chongqing Municipal Research Institute of Design, Chongqing 40000, China
a @qq.com, b taohuang70@6.com, c @qq.com, d pdp0330@swtu.cn

Keywords: Water quality assessment; principal component analysis (PCA); water quality classification

Abstract: Principal component analysis (PCA) was applied to assess the water quality of the Tongjiyan River in 2014, based on monitoring data for 8 indicators, including CODMn, NH3-N and DO. As PCA cannot classify water samples according to their water quality, a PCA based water quality classification method, similar to the Nemerow approach, is proposed to overcome this problem. The classification results were compared with those of other methods, namely fuzzy evaluation, the Nemerow index method and the improved Nemerow index method. The results showed that PCA gives an intuitive description of the river's pollution patterns in different months. Based on the PCA results, the PCA based water quality classification method was used to classify water samples, giving a deeper understanding of the degree of water pollution.

Introduction

Rivers play one of the most important roles in socio-economic activities such as drinking water supply, agriculture, aquaculture and industry. However, owing to anthropogenic activities, many rivers have been contaminated by pollutant inputs and their water quality has deteriorated. To monitor and assess river water quality, many water quality assessment methods have been applied in China, such as the Nemerow pollution index, fuzzy comprehensive evaluation and principal component analysis [1].
The Nemerow pollution index is a water pollution index that takes extreme values into account by using a weighted environmental quality index, and it is frequently used in water quality assessments around the world. However, this method tends to overemphasize the influence of the maximum evaluation factor (i.e., the most serious pollutant factor). The comprehensive score is therefore inflated whenever the index value of one evaluation factor is much higher than those of the others [2]. Hence, the assessment result may disagree with the overall water quality status.

Fuzzy comprehensive evaluation is the process of evaluating an object using fuzzy set theory. It comprehensively considers the contributions of multiple related indicators according to their weights and reduces the fuzziness by using membership functions. The method can improve the understanding of the diverse processes and complex phenomena involved in environmental studies, which is why it has been successfully used to assess pollution levels for water quality. It yields an assessment result but cannot be used to compare water samples with one another [3].

Principal component analysis (PCA) is designed to convert the original variables into new, uncorrelated variables (axes), called the principal components. PCA provides information on the most meaningful parameters, which describe the whole data set, provides data reduction, and summarizes the statistical correlation among water quality constituents with minimum loss of the original information. It has frequently been employed to evaluate water quality [4]. PCA can rank the quality of different water samples, but it cannot indicate the extent of pollution (the water quality class). Further limitations of PCA include ignoring the degree of data dispersion and a weakness in processing nonlinear data. Thus, principal component analysis alone may not have good accuracy and reliability.

© 2016. The authors - Published by Atlantis Press

In this study, a modified PCA water quality classification method based on the Nemerow theory was developed. By setting up three classification principles, the method classifies the water sample data through normalization and overcomes the weakness of PCA in water quality classification. A case study of the Tongjiyan River was carried out, and the classification results were compared between different methods, including the modified PCA method, the Nemerow index method and fuzzy comprehensive evaluation.

Materials and methods

Study Area and Sampling. The Tongjiyan Irrigation Area (TIA) was first constructed around 5 AD and was another important large-scale hydraulic project in ancient Sichuan besides the world-famous Dujiangyan Irrigation System [5]. The TIA is situated between a latitude of … N and a longitude of … E on the southwest edge of the Chengdu Plain, with an area of 880 km². The TIA has been an important tributary of the Minjiang River drainage. The main river in the TIA is the Tongjiyan River, which flows from the west of the TIA to the Minjiang River at Qinglong. To characterize the water quality of the TIA before it empties into the Minjiang River, monthly sampling was carried out at three sampling sites along the mainstream from January to December 2014. Triplicate water samples were collected from … to 5 m below the water surface at each sampling site using a portable water sampler (LB-8000E, Qingdao Shouhang Instrument Co., Ltd, China). The water quality measurements were conducted within 4 h after sampling.

Water Quality Classification Method.

Nemerow pollution index.
The mathematical formula for the Nemerow comprehensive index calculation is as follows:

$$P = \sqrt{\frac{\left(\frac{1}{n}\sum_{i=1}^{n} P_i\right)^2 + (P_i)_{\max}^2}{2}} \tag{1}$$

where $P$ is the Nemerow comprehensive pollution index, $n$ is the total number of water quality parameters, $P_i$ is the pollution index of parameter $i$, and $(P_i)_{\max}$ is the maximum pollution index. The pollution index of parameter $i$ is calculated as

$$P_i = M_i / M_{0i}$$

while for DO, $P_{DO}$ is calculated from $C_{DOf}$, $M_i$ and $M_{0i}$. Here $M_i$ is the measured value of parameter $i$, $M_{0i}$ is the desired water quality standard value (GB) of parameter $i$, and $C_{DOf}$ is the saturated dissolved oxygen concentration.

Principal component analysis. The first step of PCA is to normalize the measured values by the following formula:

$$x_i' = (x_i - \bar{x}_i) / \sigma_i \tag{2}$$

where $x_i'$ is the normalized value of parameter $i$, $\bar{x}_i$ is the mean of $x_i$, and $\sigma_i$ is the standard deviation of parameter $i$. The KMO test and the Bartlett test of sphericity are then performed to verify that the data are correlated before PCA. The principal components can be expressed as:

$$z_k = a_{k1}x_1 + a_{k2}x_2 + \dots + a_{kn}x_n$$

$$Z = \frac{z_1V_1 + z_2V_2 + \dots + z_mV_m}{V_1 + V_2 + \dots + V_m}$$

where $z_k$ is the component score, $a_{ki}$ is the component loading, $x_i$ is the measured value of a parameter, $k$ is the component number, $n$ is the total number of parameters, $Z$ is the comprehensive score, $V_k$ is the total variance of each component, and $m$ is the total number of components.

PCA based water quality classification.

(1) Classification principles. According to the Nemerow theory, a series of water quality classification principles was set up as follows:

I. For arbitrary $i$ and $j$, if $M_{i,j} \le M_{i,j+1}$, then $M'_{i,j} \le M'_{i,j+1}$, and vice versa;
II. For arbitrary $i$ and $j$, if $M_{i,j} < x_i \le M_{i,j+1}$, then $M'_{i,j} < x_i' \le M'_{i,j+1}$, and vice versa;
III. For arbitrary $i$ and $j$, once $M_{i,j}$ is decided, the deviation between $M'_{i,j}$ and $\bar{x}_i'$ takes its minimum value.

Here $M_{i,j}$ is the Class $j$ water quality standard value of parameter $i$, $M'_{i,j}$ is the normalized water quality standard value of parameter $i$, $x_i$ is the measured value of parameter $i$, $x_i'$ is the normalized value of parameter $i$, and $\bar{x}_i'$ is the mean of $x_i'$.

Principle I ensures range consistency between the original standards and the normalized ones. Principle II ensures that if the measured value of parameter $i$ satisfies Class $j+1$ but not Class $j$ according to the original standards, the normalized value $x_i'$ also satisfies Class $j+1$ but not Class $j$ according to the normalized ones. Principle III ensures that the normalized standard values are as close as possible to the means of the normalized values, because a large deviation between the two destabilizes the classification result.

(2) Classification method. As in the Nemerow method, the original water quality standard values were normalized by the normalization formula (2), and a new normalized water quality standard was then set up for classification. It is easy to prove that the normalization satisfies Principles I and II, as it is just a mathematical translation of the original standard values.
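The Nemerow index, the normalization step and the variance-weighted comprehensive score described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the sample values are made up, and the use of the population standard deviation is an assumption, since the paper does not say which estimator was used.

```python
import math
from statistics import mean, pstdev


def nemerow_index(p):
    """Nemerow comprehensive index from the single-factor indices P_i."""
    p_mean = sum(p) / len(p)
    p_max = max(p)
    return math.sqrt((p_mean ** 2 + p_max ** 2) / 2)


def z_scores(values):
    """Normalize one parameter's measurements as (x - mean) / std.

    Assumption: population standard deviation (pstdev); the sample
    standard deviation would be an equally plausible choice.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def comprehensive_score(scores, variances):
    """Variance-weighted average Z of the component scores z_k."""
    weighted = sum(z * v for z, v in zip(scores, variances))
    return weighted / sum(variances)


# Hypothetical single-factor indices (not data from the study).
balanced = [0.9, 1.0, 1.1]
one_extreme = [0.5, 0.5, 2.0]  # same mean, one dominant pollutant

print(nemerow_index(balanced))
print(nemerow_index(one_extreme))
```

Both hypothetical samples have the same mean index, but the second yields a noticeably larger Nemerow index because of its maximum term, which is the overemphasis of the worst factor that motivates the PCA based method.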
To satisfy Principle III while remaining consistent with Principles I and II, we suggest that whenever the Class $j$ water quality standard value of parameter $i$ satisfies $M_{i,j} > x_{i,\max}$ or $M_{i,j} < x_{i,\min}$, the maximum or minimum measured value is substituted for the water quality standard value. As a result, the PCA based water quality classification method is carried out as follows (eqs. (3) and (4)):

$$M'_{i,j} = \begin{cases} (M_{i,j} - \bar{x}_i)/s_i, & \text{if } x_{i,\min} \le M_{i,j} \le x_{i,\max} \\ (x_{i,\max} - \bar{x}_i)/s_i, & \text{if } M_{i,j} > x_{i,\max} \\ (x_{i,\min} - \bar{x}_i)/s_i, & \text{if } M_{i,j} < x_{i,\min} \end{cases}$$

where $\bar{x}_i$ is the mean of $x_i$, $s_i$ is the standard deviation of $x_i$, $x_{i,\min}$ is the minimum value of $x_i$, and $x_{i,\max}$ is the maximum value of $x_i$.

Results and discussion

Data analysis. Standardization and an independence test were applied to the monthly mean concentrations of 8 indexes: CODMn ($x_1$), NH3-N ($x_2$), DO ($x_3$), Se ($x_4$), As ($x_5$), Zn ($x_6$), Pb ($x_7$) and Cu ($x_8$). After standardization, the KMO test and the Bartlett sphericity test were used to check the feasibility of PCA. If the KMO result is larger than 0.5 and the Bartlett result is smaller than 0.05, the data are not mutually independent and PCA can be applied [6]. In this study, the KMO test result is … and the Bartlett sphericity test result is smaller than 0.00, which confirms the feasibility of PCA. Using SPSS 0.0, we obtained the eigenvalues, as shown in Fig. 1.

Fig. 1 Scree plot of eigenvalues

The eigenvalues of Principal Components 1 and 2 (PC1 and PC2 for short) are 4.9 and .997, both of which are larger than 1. The cumulative percentage of variance of PC1 and PC2 is …%, which is larger than 85%. This indicates that PC1 and PC2 contain most of the information in the raw data, which can therefore be represented by PC1 and PC2 [7].

Principal component loadings. The corresponding initial factor loadings of PC1 and PC2 can be calculated by SPSS 0.0, and the principal component loadings are then obtained from

$$L_m = V_m / \sqrt{\lambda_m}$$

where $V_m$ and $\lambda_m$ are the initial factor loading and eigenvalue of principal component $m$ ($m$ = 1 and 2), respectively, and $L_m$ is the principal component loading of principal component $m$. The principal component loadings of PC1 and PC2 are shown in Fig. 2.
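The component selection and the loading conversion described above can be sketched as follows. This is a sketch under stated assumptions rather than a reproduction of the SPSS workflow: the helper names are hypothetical, and the eigenvalues and initial factor loadings below are invented for illustration, not taken from the study.

```python
import math


def retained_components(eigenvalues, min_eigenvalue=1.0):
    """Indices of the components whose eigenvalue exceeds the threshold."""
    return [k for k, lam in enumerate(eigenvalues) if lam > min_eigenvalue]


def cumulative_variance(eigenvalues, retained):
    """Share of the total variance explained by the retained components."""
    return sum(eigenvalues[k] for k in retained) / sum(eigenvalues)


def component_loadings(initial_loadings, eigenvalue):
    """L_m = V_m / sqrt(lambda_m), applied to every loading of one component."""
    root = math.sqrt(eigenvalue)
    return [v / root for v in initial_loadings]


# Hypothetical eigenvalues for 8 standardized indexes (they sum to 8).
eigenvalues = [4.0, 2.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.1]
kept = retained_components(eigenvalues)
print(kept, cumulative_variance(eigenvalues, kept))

# Hypothetical initial factor loadings of one component with eigenvalue 4.0.
print(component_loadings([1.2, 1.6], 4.0))
```

In this hypothetical case the two retained components pass the eigenvalue criterion but explain less than 85% of the variance, so the two checks used in the paper are independent conditions and both need to be verified.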

Fig. 2 PC loadings: (a) PC1 loading; (b) PC2 loading

As shown, the correlations between PC1 and the metal indexes Se, As, Zn, Pb and Cu range from … to 0.448, which indicates that PC1 mainly reflects the heavy metal indexes, while the correlations between PC2 and CODMn and NH3-N are … and 0.558, which indicates that PC2 mainly reflects CODMn and NH3-N. The correlations between the principal components and DO are … and …; these negative correlations indicate that the larger the DO, the better the water quality. The principal component score functions can then be obtained by eq. (3),

$$F_1 = \dots \tag{5}$$

$$F_2 = \dots \tag{6}$$

and the comprehensive score function is

$$F = 0.65F_1 + \dots F_2 \tag{7}$$

Water quality by PCA. According to eqs. (5), (6) and (7), the principal component scores for each month in 2014 can be calculated, where a higher score indicates heavier pollution, as shown in Fig. 3. In Fig. 3 (a), except for October, November and December, the PC1 scores ranged from … to .54 in the remaining months, indicating that the TIA had severe heavy metal pollution in these 9 months. For PC2, higher scores appeared from March to June, indicating that severe CODMn and NH3-N pollution occurred in the TIA in these months. Overall, the most severe pollution conditions appeared from March to June. In Fig. 3 (b), we sorted the pollution severity by the monthly comprehensive scores and found that the TIA was most polluted in April, while November had the lightest pollution.

Fig. 3 PC scores: (a) PC1 versus PC2 scores; (b) PC comprehensive score

Water quality classification. PCA can directly describe the pollution characteristics month by month, but it is unable to indicate the water quality class. This study therefore proposes a water quality classification method based on the PCA results. According to the water quality data and eqs. (4), (5), (6) and (7), the PCA classification standards were calculated, as shown in Table 1.

Table 1 PCA classification standards
Category   I   II   III   IV   V
Score      …   …    …     …    …

Based on Fig. 3 (a), the classification plot can be drawn according to Table 1 and eq. (7), as shown in Fig. 4. The water quality was Class I in November and Class II in December; in April it was Class IV, which was the worst.
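The classification step can be illustrated with a short Python sketch: the standard value is first limited to the observed range and normalized as in eq. (4), and a monthly comprehensive score is then compared with the class scores of Table 1. The function names and all numbers are hypothetical, and the rule that a sample belongs to the first class whose score it does not exceed is an assumption about how the class boundaries are applied.

```python
from bisect import bisect_left

CLASSES = ["I", "II", "III", "IV", "V"]


def normalized_standard(m, x_mean, x_std, x_min, x_max):
    """Eq. (4): limit the standard value M to [x_min, x_max], then normalize."""
    clamped = min(max(m, x_min), x_max)
    return (clamped - x_mean) / x_std


def classify(score, class_scores):
    """Assign a water quality class from a comprehensive score.

    class_scores holds the comprehensive scores of Classes I..V in
    ascending order. Assumption: a sample falls in the first class
    whose score it does not exceed.
    """
    k = bisect_left(class_scores, score)
    return CLASSES[k] if k < len(CLASSES) else "worse than V"


# Hypothetical statistics of one parameter: mean 5, std 2, range 2 to 8.
for standard in (1.0, 6.0, 10.0):
    print(standard, normalized_standard(standard, 5.0, 2.0, 2.0, 8.0))

# Hypothetical class scores (Table 1 values are not available here).
class_scores = [-1.0, -0.3, 0.4, 1.1, 1.8]
for monthly_score in (-1.5, 0.0, 2.0):
    print(monthly_score, classify(monthly_score, class_scores))
```

Limiting the standard values keeps the normalized standards within the range spanned by the normalized measurements, which is the intent of Principle III.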

Fig. 4 PCA classification result

Moreover, we compared the PCA classification results with those of fuzzy evaluation, the Nemerow index method and the improved Nemerow index method, as shown in Fig. 5. The PCA classification gave results similar to those of the other methods, except in January and February, which were assigned to Class III by the PCA method but to Class I or II by the other methods. The PC2 scores for January and February were -.36 and … in Fig. 3 (a), while the PC1 scores were … and .5. By eq. (7), PC1 has a weight of 0.65, larger than that of PC2, which resulted in higher comprehensive scores in January and February. In comparison with the other methods, the PCA classification method can avoid the influence of extreme values and is able to reflect the contribution of most indexes in the classification results.

Fig. 5 Classification results with 4 methods

Conclusions

This study applied the PCA based water quality classification method to assess the water quality of the Tongjiyan Irrigation Area in Sichuan Province, China. The PCA classification method in this study was based on the pollution characteristics and the …% of data information derived from PCA, and the classification results were affected by the contributions of the different principal components. The TIA was most polluted in April, while November had the lightest pollution in 2014. The water quality was Class I in November and Class IV in April.

Acknowledgements

This work was financially supported by the Fundamental Research Funds for the Central Universities (6806C095).

References

[1] B. Wu, S.Y. Zang, X.D. Na: J. Saf. & Env. Vol. 5 (0), p.
[2] N.L. Nemerow: Scientific stream pollution analysis (Scripta Book Co, USA 1974)
[3] X.G. Han, T.L. Huang, X.Z. Chen: Acta Scientiae Circumstantiae. Vol. 33 (2013), p.
[4] R.L. Olsen, R.W. Chappell, J.C. Loftis: Water Res. Vol. 46 (2012), p. 3110-3122
[5] G. Lu: Sichuan Water Cons. No. (2015), p. 50-5
[6] Q.Q. Du, K. Yan: J. Water Res. Water Eng. Vol. 4 (2013), p. -4
[7] X.Y. Lu: Natural Science Journal of Harbin Normal University. Vol. 3 (2015), p.