CORRELATION AND PREDICTION OF CHEMICAL PARAMETERS IN GROUND WATER OF THE MUNICIPALITY OF THERMI USING SPSS, G.I.S. AND ARTIFICIAL NEURAL NETWORKS


Proceedings of the 13th International Conference on Environmental Science and Technology, Athens, Greece, 5-7 September 2013

GEORGIOS K. FYTIANOS¹ and KONSTANTINOS L. KATSIFARAKIS¹
¹ Aristotle University of Thessaloniki, Department of Civil Engineering, Division of Hydraulics and Environmental Engineering, Thessaloniki 54124, Macedonia, Greece

EXTENDED ABSTRACT

Chemical analysis of some elements and compounds that exist in water can be very expensive. Nothing can replace laboratory testing of water, but statistical analysis tools, Geographic Information Systems (G.I.S.) and Artificial Neural Networks can sometimes provide an additional, complementary check of the water quality assessment. Such an extra assessment of physical and chemical parameters could be extremely useful. This paper studies the correlation and prediction of chemical parameters in the ground water of the Municipality of Thermi. The Municipality of Thermi draws water from wells in the greater basin of the Anthemountas river, and every year this water is checked for its quality and chemical parameters. The area under study is very important for the water supply of the Municipality of Thermi, one of the largest municipalities in the Thessaloniki region of Northern Greece. With the help of G.I.S., the geology and the topography of the area were studied in order to select wells that share common geomorphological characteristics. Moreover, the chemical parameters were checked for correlation using the statistical analysis package SPSS. For this check, results of laboratory tests from 2007 to 2010 were used. This correlation check is not only useful in itself; it is also necessary for the successful construction of the Artificial Neural Network.
The artificial neural network (ANN) approach does not aim at a mathematical description of natural phenomena, but at obtaining quantitative results for given data sets, based on experience from similar known cases. ANNs are particularly helpful when: a) mathematical simulation of the physical phenomena is either impossible or too complicated and b) parameters necessary for mathematical simulation cannot be defined with acceptable accuracy. For the study of this specific problem, a three-layer feedforward artificial neural network was used, trained with the well-known Quickprop algorithm. A number of network architectures have been investigated. One of the optimal combinations was 4-7-1, which means that there are 4 input parameters, 7 neurons in the hidden layer and 1 output parameter. The ability to predict concentrations of nitrates, sulfates, chromium and arsenic from concentrations of other chemical compounds or elements was studied. The final results show that the combination of SPSS, GIS and ANN is of great value, especially when an efficient chemical analysis cannot take place.

KEYWORDS: artificial neural networks, G.I.S., SPSS, chemical parameters, ground water

1. INTRODUCTION

The water supply system of the Municipality of Thermi draws water from wells in the greater Anthemountas river basin. The Anthemountas river basin is situated in Central Macedonia, in the northern part of Greece. It forms the northwest part of the Chalkidiki peninsula and covers 318 km². It consists of two main sub-basins: the Vasilika sub-basin (208 km²), which occupies the lower parts of the basin, and the Galatista sub-basin (110 km²),

which is situated higher. A dense, well-formed stream network drains the area, while the geological features include a large variety of sediments combined with various rocks. In addition, the Anthemountas fault, a number of smaller parallel and vertical faults, and geothermal activity make the geology of the area complicated (Fikos et al., 2005). The area hosts many different human activities, resulting in increased demand for large quantities of good-quality water. Every year, the water quality of the Municipality of Thermi is determined by analyzing its chemical, physical and biological content. Since 2011, the annual analysis of water from each individual well has stopped for financial reasons; instead, mixed water from nearby wells is analyzed.

This study uses Artificial Neural Networks, statistical tools (SPSS) and Geographic Information Systems (G.I.S.) for the water quality assessment of the Anthemountas basin. It focuses on the correlation of chemical parameters and investigates whether, and to what extent, these computational techniques can help with water quality assessment. The goal is to investigate whether an ANN can predict specific chemical concentrations from other ones. To this end, possible correlation between chemical parameters from different wells was investigated with the help of SPSS. The geology and the topography of the area, the distance between wells and the depth of each well were taken into account, and a specific study area was chosen using GIS.

2. ARTIFICIAL NEURAL NETWORKS

Artificial Neural Networks (ANNs) are a machine learning computational technique based on the function of biological neural networks. They were initially introduced to simulate the processing ability of the human brain via mathematical modeling of its function (e.g. Argyrakis, 2001; Diamantaras, 2006).
ANNs have been widely used to model environmental processes, since their ability to accurately represent the complex, non-linear behavior of relatively poorly understood processes makes them highly suited to complicated simulations (Lee et al., 2008). They consist of simple interconnected processing units (neurons), which are arranged in layers. ANNs use learning algorithms that alter the strength (weights) of the neuron connections, which are called synapses, in order to achieve optimal results. This learning procedure is called training (Antonopoulos et al., 2012).

In our study we used the Quickprop algorithm for training. This algorithm is a variant of the classic back-propagation algorithm that aims at increasing the convergence rate. It uses two training parameters and two testing parameters: the training rate n, the initial addition to the sigmoid function (sigmoid prime offset), the maximum growth factor (max factor) and the maximum range of weights (weight range). Fahlman (1988) was the first to introduce some initial estimates for the first three parameters. With the Quickprop algorithm, the weight variation is calculated by means of the following equation:

$$\Delta(t+1)\, w_{ij} = \frac{S(t+1)}{S(t) - S(t+1)}\, \Delta(t)\, w_{ij} \qquad (1)$$

where $w_{ij}$ is the weight between neurons i and j, $\Delta(t+1)\, w_{ij}$ is the weight variation, $S(t+1)$ is the partial derivative of the error function with respect to the weight $w_{ij}$, and $S(t)$ is the previous partial derivative. With the Quickprop algorithm, faster convergence of the weights can be achieved than with the classic back-propagation algorithm.
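The weight update of Equation (1) can be illustrated with a minimal Python sketch. It is a simplification written for this text, not the implementation used in the study: the function names, the default learning rate of 0.1 and the fallback to a plain gradient step are assumptions, and the sigmoid prime offset and weight range are omitted. The growth limit of 1.75 follows the value commonly attributed to Fahlman (1988).

```python
def quickprop_step(prev_delta, slope, prev_slope, learning_rate=0.1, max_factor=1.75):
    """Return the next change of a single weight (simplified Quickprop).

    slope      -- S(t+1), current partial derivative of the error w.r.t. the weight
    prev_slope -- S(t), partial derivative at the previous step
    prev_delta -- previous weight change
    """
    denominator = prev_slope - slope
    if prev_delta == 0.0 or denominator == 0.0:
        # Assumption: fall back to a plain gradient-descent step when the
        # secant formula of Equation (1) cannot be evaluated.
        return -learning_rate * slope
    delta = slope / denominator * prev_delta  # Equation (1)
    # Limit the step to max_factor times the size of the previous step.
    limit = max_factor * abs(prev_delta)
    return max(-limit, min(limit, delta))


def minimise(gradient, w0, steps=10):
    """Apply quickprop_step repeatedly to a one-dimensional error function."""
    w, prev_delta, prev_slope = w0, 0.0, 0.0
    for _ in range(steps):
        slope = gradient(w)
        delta = quickprop_step(prev_delta, slope, prev_slope)
        w += delta
        prev_delta, prev_slope = delta, slope
    return w


# Hypothetical error function E(w) = (w - 3)^2 with gradient 2 (w - 3).
w_final = minimise(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

For a quadratic error function the secant step of Equation (1) lands on the minimum as soon as the growth limit no longer clips it, which is the source of the faster convergence compared with fixed-rate gradient descent.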

3. CALCULATIONS PERFORMED

3.1. GIS

Geographic Information Systems (G.I.S.) were used to study the geological conditions of the area. The geology and the topography of the area were studied in order to select wells that share common geomorphological characteristics. From a total of 29 wells in the Vasilika sub-basin, which feed the water supply system with a total flow rate of 425 m³/h, 9 wells were chosen. The selection was based on similarities in geology, elevation and well depth, and on the relatively short distances among the wells. The procedure is described below.

Figure 3.1 shows the topography of the Anthemountas basin; darker colours represent areas of lower elevation. The chosen wells lie in the west part of the area, at relatively lower elevation than the others.

Figure 3.1. Topography of Anthemountas Basin

Each column of Figure 3.2 represents a different geological formation. The chosen wells belong to similar geological formation columns.

Figure 3.2. Geological Map of Anthemountas Basin

In Figure 3.3, the Anthemountas river basin is shown in two layers. The upper one is the topography and the other represents the geology of the basin. With GIS, these characteristics could be combined in order to determine similarities in both the geology and the topography of the wells.

Figure 3.3. Geological map (lower layer) and topography (upper layer)

The coordinates of every well were entered into the GIS. Apart from the geology and the topography, the distance between the wells was taken into consideration. Finally, wells with relatively similar depths were chosen.

3.2. SPSS

SPSS Statistics is a software package for statistical analysis developed by IBM. For the 9 selected wells, 11 chemical parameters were checked for correlation among them: pH, electrical conductivity (EC), total hardness (TH), Na+, Ca++, Mg++, NO3, SO4, Cl-, As and Cr (Table 3.1). Results of the annual chemical measurements from 2007 to 2010 were examined. For every well there were 11 chemical measurements per year; that is, 99 values per year for all the wells and 396 for the whole 4 years. Correlation was found within two groups. The first group comprises EC, Ca++ and Cl-, and the second SO4, Na+ and Cl-.

Table 3.1: SPSS Correlation Results

       pH        EC        TH        Na+       Ca++      Mg++      NO3       Cl-       SO4       As        Cr
pH     1         -0.476*   -0.209    -0.463*   ...       ...       ...       -0.389*   ...       ...       ...
EC     -0.476*   1         0.511**   0.690**   0.543**   0.165     0.642**   0.804**   0.479*    0.116     -0.451*
TH     -0.209    0.511**   1         0.119     0.363     0.844**   0.193     0.215     -0.080    0.528**   -0.425*
Na+    -0.463*   0.690**   0.119     1         0.454*    -0.106    0.641**   0.870**   0.697**   ...       -0.423*
Ca++   ...       0.543**   0.363     0.454*    1         ...       0.421*    0.604**   ...       ...       ...
Mg++   ...       0.165     0.844**   -0.106    ...       1         ...       -0.094    ...       0.518**   ...
NO3    ...       0.642**   0.193     0.641**   0.421*    ...       1         0.807**   ...       ...       ...
Cl-    -0.389*   0.804**   0.215     0.870**   0.604**   -0.094    0.807**   1         0.534**   0.015     -0.514**
SO4    ...       0.479*    -0.080    0.697**   ...       ...       ...       0.534**   1         ...       ...
As     ...       0.116     0.528**   ...       ...       0.518**   ...       0.015     ...       1         ...
Cr     ...       -0.451*   -0.425*   -0.423*   ...       ...       ...       -0.514**  ...       ...       1

3.3. Application of the Neural Network

Constructing and adjusting a suitable ANN for each particular case is not an easy task. An optimization procedure often has to be followed. Moreover, sufficient field data (e.g. accurate chemical measurements) are required for the so-called training phase.
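The correlation check performed with SPSS above amounts to computing Pearson correlation coefficients between pairs of parameters. A minimal Python sketch of that screening step follows, under stated assumptions: the sample values are invented for illustration and are not measurements from the study, and the function names and the threshold of 0.6 are hypothetical choices, not criteria reported by the authors.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.6):
    """List parameter pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs


# Invented example values (NOT data from the study), one entry per sample.
sample = {
    "EC": [600.0, 750.0, 900.0, 1100.0],
    "Cl": [30.0, 42.0, 55.0, 70.0],
    "pH": [7.9, 7.6, 7.7, 7.3],
}
pairs = correlated_pairs(sample)
```

Pairs that pass such a screening are natural candidates for ANN inputs, which is how the correlation results are used in the network construction.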

Several attempts were made to construct an ANN that predicts the concentration of chemical parameters in the Anthemountas ground water. Its features are outlined in the following paragraphs.

According to the correlation data from SPSS, NO3 is correlated with electrical conductivity (EC), Ca++ and Cl-. EC, Ca++ and Cl- measurements from previous years (2007, 2008, 2009) were the inputs of the neural network. The output is the concentration of NO3-. The network therefore had 4 neurons in the input layer: the first neuron represents the well and the other 3 correspond to the chemical parameters. The output layer has only one neuron, corresponding to the nitrate concentration. For the hidden layer, several trials were made in order to achieve the best results. The final architecture of the network was decided based on the total error of the last epoch of training. In the same way, a network was constructed for the prediction of SO4 from Na+ and Cl-.

4. RESULTS

4.1. NO3 Prediction

The investigation of the ANN structure included several trials regarding the number of neurons in the hidden layer and the number of training epochs. The number of hidden-layer neurons was varied from 3 to 15 and the number of training epochs from 500 upward. The structure with the most satisfactory results was 4-5-1 (Figure 4.1).

Figure 4.1. The final network architecture

There were 4 inputs (well number, calcium concentration, chloride concentration and EC). The hidden layer had 5 neurons and the output represents the concentration of NO3. The number of epochs was 650; 27 data sets were used for training the ANN and 9 data sets for checking. The comparison between calculated and expected values of the NO3- concentration is shown in Figure 4.2. The correlation coefficient between calculated and expected values was 0.97.
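The layout described above (4 inputs, 5 hidden neurons, 1 output) can be sketched as a forward pass in Python. This is only an illustration under assumptions: sigmoid activations are assumed because the text mentions a sigmoid prime offset, the weights are random rather than trained, the inputs are assumed to be scaled to the range 0 to 1, and all names and sample values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (assumed; the paper does not state the activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_inputs, n_neurons, rng, weight_range=1.0):
    """One row per neuron: n_inputs weights followed by a bias term."""
    return [
        [rng.uniform(-weight_range, weight_range) for _ in range(n_inputs + 1)]
        for _ in range(n_neurons)
    ]


def forward(layer, inputs):
    """Outputs of all neurons in a layer for the given input vector."""
    return [
        sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
        for row in layer
    ]


def predict(hidden, output, inputs):
    """Propagate one sample through the hidden layer and the single output neuron."""
    return forward(output, forward(hidden, inputs))[0]


rng = random.Random(0)            # fixed seed so the sketch is reproducible
hidden = init_layer(4, 5, rng)    # 4 inputs -> 5 hidden neurons
output = init_layer(5, 1, rng)    # 5 hidden neurons -> 1 output

# Hypothetical scaled inputs: well number, Ca++, Cl- and EC.
scaled_sample = [0.25, 0.40, 0.55, 0.30]
scaled_nitrate = predict(hidden, output, scaled_sample)
```

With a sigmoid output the prediction lies between 0 and 1, so a real application would map it back to concentration units using the scaling applied to the training targets.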

Figure 4.2. Comparison between calculated and expected values of nitrate concentration. Red line represents expected values and blue line calculated values

4.2. SO4 Prediction

SPSS showed correlation among Na+, Cl- and SO4. Therefore, the network for SO4 prediction has only 3 inputs and not 4 as before. The first input represents the well number and the other two the concentrations of Cl- and Na+. The output layer corresponds to the SO4 concentration. Different trials were made to determine the number of neurons in the hidden layer, which was finally set to 7. The final structure of the network was 3-7-1. The number of epochs was 600; 27 data sets were used for training the ANN and 9 data sets for checking.

Figure 4.3. Comparison between calculated and expected values of SO4 concentration. Red line represents expected values and blue line calculated values
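The counts of 27 training and 9 checking data sets match 9 wells observed over 4 years with one year held out. The short Python sketch below shows such a split; holding out 2010 is an assumption inferred from the counts and from the use of 2007 to 2009 measurements as inputs, not something the paper states explicitly, and the record layout is hypothetical.

```python
WELLS = range(1, 10)               # the 9 selected wells
YEARS = (2007, 2008, 2009, 2010)   # years with annual measurements


def split_by_year(records, check_year=2010):
    """Separate records into a training set and a checking set by year."""
    train = [r for r in records if r["year"] != check_year]
    check = [r for r in records if r["year"] == check_year]
    return train, check


# One record per well and year; measured concentrations would be added as extra keys.
records = [{"well": well, "year": year} for well in WELLS for year in YEARS]
train, check = split_by_year(records)
```

Keeping a whole year out of training means the checking values come from measurements the network has not seen, at the cost of a small training set.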

4.3. Prediction of Arsenic and Chromium

For arsenic (As), SPSS showed no direct correlation with other chemical parameters. Arsenic had moderate correlation with total hardness (0.528) and with magnesium (0.518). For this reason, together with the lack of sufficient measurements for training the ANN, only a mediocre prediction was achieved. The correlation coefficient between calculated and expected values does not exceed 0.5 and the total error is close to 1. Similar results were obtained for Cr.

5. CONCLUSIONS

Despite the scarcity of field data, the results are encouraging. The feed-forward back-propagation ANN that was constructed and trained using the Quickprop algorithm gave satisfactory results. Even better results can be achieved if more regular chemical measurements become available. The results of this study showed that the combination of geoscience, statistics and Artificial Neural Networks is of great value, especially when an efficient chemical analysis cannot take place. As a future perspective, information on ground water flow, together with aquifer parameters, can be added to the process in order to study groundwater pollution in greater detail.

REFERENCES

1. Fikos I., Ziankas G., Rizopoulou A. and Famellos S. (2005). Water Balance Estimation in Anthemountas River Basin and Correlation with underground water level. ANATOLIKI S.A., 9th CEST 2005, Rhodes, Greece.
2. Antonopoulos Z., Vafeiadis M., Katsifarakis K.L. and Spachos T. (2012). Simulation of a karstic aquifer using artificial neural network. Proceedings of the Protection and Restoration of the Environment XI Conference, Thessaloniki.
3. Nagoulis A., ANATOLIKH S.A. (1998). Hydrogeological study, Water balance study of Anthemountas river basin, Prefecture Thessaloniki.
4. Argirakis P. (2001). Artificial Neural Networks and Applications, Hellenic Open University, Patra.
5. Diamantaras K. (2006). Artificial Neural Networks, Computer Department, T.E.I. Thessaloniki.
6. Lee Y., Khemka A., Yoo J.-W. et al. (2008). Assessment of diffusion coefficient from mucoadhesive barrier devices using artificial neural networks. International Journal of Pharmaceutics, 351.
7. Farmaki E.G., Thomaidis N.S. and Efstathiou C.E. (2010). Artificial Neural Networks in water analysis: Theory and applications. International Journal of Environmental Analytical Chemistry, Vol 99, pages