USE OF RESPONSE SURFACE METHODOLOGY FOR EXTRACTING A MODEL FROM AN ARTIFICIAL NEURAL NETWORK: A CASE OF INITIAL DILUTION MODELLING


1st Water and Environment Specialty Conference of the Canadian Society for Civil Engineering / 1re Conférence spécialisée sur l'eau et l'environnement de la Société canadienne de génie civil
Saskatoon, Saskatchewan, Canada, June 2-5, 2004 / 2-5 juin 2004

WE-56

Mukhtasor (1) and L. M. Lye (2)

(1) Department of Ocean Engineering, Faculty of Marine Technology, Sepuluh November Institute of Technology (ITS), Surabaya, 60111, Indonesia.
(2) Faculty of Engineering and Applied Science, Memorial University, St. John's, NL, A1B 3X5, Canada

ABSTRACT: Artificial Neural Networks (ANNs) are advanced soft computing techniques that have been widely used in many areas, including environmental hydraulics. Traditionally, an ANN is used as an alternative method for dealing with mathematically intractable problems and for capturing the important features embedded in a large data set in order to produce predictions. However, extracting rules from a network, or giving it some explanatory capability, is a challenging task. This paper explores the potential use of Response Surface Methodology (RSM) to extract an explicit model from an ANN for the case of initial dilution modelling. The study involves three parts: initial dilution estimation using a specified empirical initial dilution model, ANN modelling, and RSM-based modelling. The empirical initial dilution model is used to provide a data set for training, cross-training (testing) and validation of the ANN. The study shows that the ANN performed well in reproducing the results of the empirical initial dilution model. However, its performance is only as good as the quality of the data used during training, testing and validation. The RSM-based model, in turn, reproduced the behaviour of the ANN well and is in very good agreement with the empirical model.
Moreover, the RSM-based model performed better than the ANN when used to extrapolate to a particular case that was not available during the training of the ANN. For the problem under consideration, RSM may be used to extract a simple and explicit model from an ANN. The explicit model provided information on the relative contribution of each variable and their interactions, and gave some clues about the physics of the phenomenon being investigated. Furthermore, the explicit model is easier to couple with other models to solve bigger and more complex problems.

Keywords: Artificial neural networks, initial dilution modelling, response surface methodology.

1. INTRODUCTION

Artificial neural networks (ANNs) have been developed for more than half a century, inspired by a desire to understand the human brain and to emulate its functioning. ANNs have seen increasing use in diverse disciplines, including engineering and science, and researchers from many of these branches have shown their enthusiasm for the method, as indicated by a growing number of publications on its development and use. Moreover, several professional societies have appointed task forces or committees to investigate the development of ANNs within their disciplines, or have dedicated special issues of journals to ANN applications; an example is the ASCE Task Committee (a, b).

The reasons that ANNs are attractive computational tools include: (1) the ability to recognize the relation between the input and output variables without explicit physical considerations, (2) the ability to perform well even for training sets containing noise or measurement errors, and (3) their ease of use once trained (ASCE Task Committee a). However, despite their successful quantitative performance, ANNs are very sensitive to the quantity and quality of the available data, requirements that often cannot easily be met. The ASCE Task Committee (a) noted that the primary reason for the skeptical attitude towards ANNs is their lack of physical concepts and relations. Furthermore, no generalized approach has been established to determine the relative importance of the input variables. The absence of a comprehensible explicit model associated with an ANN makes the network difficult to reproduce. Therefore, if the case under consideration is a subset of a bigger and more complex problem, ANNs are not easily integrated to form coupled models. Given these problems, extracting an explicit model from an ANN is a challenging task. The objective of this study is to explore the use of Response Surface Methodology (RSM) for model extraction from an ANN. The choice of RSM is based on other studies in which RSM has been successfully used to develop simple replacement models from complex ones (e.g. Ebead et al., 2002; Zangeneh et al., 2002). For the purpose of extracting a model from an ANN using RSM, a case study based on initial dilution modelling of a line diffuser was performed. Initial dilution modelling has been found helpful in the design of outfalls for effluent discharges, including the release of sewage and of heated water from the cooling system of a power plant (Mukhtasor et al. 2002, Roberts et al. 1989).
For the ANN modelling, sets of training, cross-training and validation data were prescribed using an empirical initial dilution model applicable to a line diffuser outfall (Roberts et al. 1989). RSM was then employed to extract a model from the developed ANN. The proposed methodology was evaluated by comparing the predictions of the extracted model with those of the ANN and of the empirical model from which the data were generated.

2. INITIAL DILUTION MODELLING, ANNS, AND RSM APPROACHES

Many studies have been conducted in the past on modelling initial dilution for the ocean discharge of wastewater. Mukhtasor et al. (2002) discussed previous work on initial dilution and proposed a new modelling approach for the initial dilution of single buoyant-jet outfalls. For a line diffuser outfall, Roberts et al. (1989) proposed simple empirical relations for initial dilution, which have frequently been referred to in the analysis of ocean outfalls. These relations were developed from laboratory experiments conducted in the stratified towing tank of the United States Environmental Protection Agency (U.S. EPA). Based on the data collected, models of initial dilution were developed for various orientations of the diffuser relative to the ambient current. This paper considers the initial dilution of a line diffuser with a perpendicular seawater current. A more detailed discussion of the model development and application can be found in Roberts et al. (1989). To evaluate the performance of the ANN and RSM-based models, the initial dilution model of Roberts et al. (1989) was taken as the reference empirical model.
The semi-empirical model developed by Roberts et al. (1989) is given by:

[1] $S_a = \dfrac{b^{2/3}}{qN}\left[2.19\,F^{1/6} - 0.52\right]$

where S_a = average initial dilution; b = g(Δρ/ρ_a)q = buoyancy flux per unit length of diffuser, in which the density difference Δρ = ρ_a - ρ_e is the difference between the density of the ambient water, ρ_a, and that of the wastewater, ρ_e; F = U^3/b, where U is the ambient seawater current; q = Q/L, the wastewater discharge per unit length of diffuser; and N = buoyancy frequency, which reflects the degree of density stratification of the ambient water.

Using this empirical model, the data for training, cross-training (testing) and validation of the ANN, as well as the data needed for model development using RSM, were generated. The empirical model also makes it possible to compare the ANN and RSM by treating it as the true model. Actual laboratory data could in principle be used instead, but this is beyond the scope of this study, whose focus is on exploring the potential of RSM to extract a model from an ANN.

ANNs can be viewed as a computational approach that attempts to model the information-processing capability of nervous systems. A neural network is characterized by its architecture, which represents the pattern of connections between nodes, by its method of determining the connection weights, and by the functional relationship between the response of a node and the total input signal it receives (ASCE Task Committee a). More detailed discussions of ANN development are available elsewhere, for example in ASCE Task Committee (a, b). In the area of water resources and environmental engineering, ANNs are becoming more popular, as indicated by a growing number of publications on ANN applications. For example, Neelakantan et al. (2002) employed a neural network approach to relate Cryptosporidium and Giardia risk levels to other biological, chemical and physical parameters in surface water. ANNs have also been applied to forecasting, with a case study of weekly nitrate-nitrogen (nitrate-N) concentrations in the Sangamon River near Decatur, Illinois, based on past weekly precipitation, air temperature, discharge, and past nitrate-N concentrations (Markus et al. 2003).
This growing use of ANNs as an attractive computational tool is partly due to their quantitative performance in relating input and output variables without explicit physical considerations, even when the training sets contain some noise or measurement error (ASCE Task Committee a). Besides these advantages, the ASCE Task Committee (a, b) also highlighted limitations of ANNs, particularly in relation to hydrology. Several of the problems discussed there appear to be common to applications of ANNs in engineering generally. These include the lack of physical concepts and relations and the absence of a standardized way of selecting the network architecture. In fact, the choice of network architecture, training algorithm and error definition is commonly based on the modeller's experience and preference rather than on the physical aspects of the problem. The many complex connections within a network make it difficult to present an ANN as an explicit model that can be easily comprehended. Networks are therefore considered to be black boxes that are not readily reproducible. If the case under investigation is a subset of a bigger and more complex problem, ANNs are not easily coupled to form an integrated model. Initial dilution modelling, as a subset of the hydrodynamic modelling of wastewater discharges into surface waters, is one of many examples of such complex problems. In view of these advantages and limitations, RSM is evaluated in this study as a method to extract an explicit model from an ANN. RSM is a set of techniques used in the empirical modelling of relationships between one or more responses and one or more variables. It comprises (1) statistical design of experiments, (2) regression modelling, and (3) optimization analysis (Myers and Montgomery 1995).
RSM has been used to find replacement models for complex models in finite element modelling (Ebead et al. 2002) and soil dynamics (Zangeneh et al. 2002). Despite this success, it should be noted that estimating the approximation error in RSM is difficult. Furthermore, the method may be considered a local analysis, valid only within the studied ranges of the specified input variables.
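Equation 1, which served as the "true model" for generating all training, testing, validation and RSM data, can be evaluated in a few lines. The Python sketch below assumes SI units, g = 9.81 m/s² and b = g(Δρ/ρ_a)q as the buoyancy flux per unit length; the function name and the example inputs are hypothetical and are not the ranges used in the study. The check for a non-positive bracket is an added safeguard, not part of Equation 1.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2 (assumed value)


def average_initial_dilution(q, u, drho_over_rho, n_freq, g=G):
    """Average initial dilution S_a of a line diffuser in a perpendicular current (Eq. 1).

    q: discharge per unit diffuser length Q/L, m^2/s
    u: ambient current speed U, m/s
    drho_over_rho: relative density difference (rho_a - rho_e) / rho_a
    n_freq: buoyancy frequency N, 1/s
    """
    b = g * drho_over_rho * q  # buoyancy flux per unit length, m^3/s^3
    froude = u ** 3 / b        # F = U^3 / b, dimensionless
    bracket = 2.19 * froude ** (1.0 / 6.0) - 0.52
    if bracket <= 0.0:
        # Added safeguard (hypothetical): Eq. 1 would give a non-positive dilution here
        raise ValueError("Eq. 1 gives a non-positive dilution for this F")
    return b ** (2.0 / 3.0) / (q * n_freq) * bracket


# Hypothetical inputs: q = 0.01 m^2/s, U = 0.1 m/s, 2.5 % density difference, N = 0.01 1/s
s_a = average_initial_dilution(0.01, 0.1, 0.025, 0.01)
print(f"S_a = {s_a:.1f}")
```

Because b, F and the prefactor b^{2/3}/(qN) are all dimensionless combinations in SI units, S_a comes out as a pure number, which is a quick consistency check on the definitions.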

3. EVALUATION OF ANNS AND RSM FOR INITIAL DILUTION MODELLING

A three-layer back-propagation network, which is considered able to generalize well on a wide variety of problems, was employed in this study. Model development, training, cross-training (hereafter called testing) and validation were undertaken with the help of the software NeuroShell 2 (Ward Systems Group, Inc. 1995). For the case under investigation, one output and four input variables are specified in the initial dilution model for a line diffuser under a perpendicular seawater current. The output is the average initial dilution, and the inputs are the wastewater discharge per unit length of diffuser (Q/L), the ambient seawater current (U), the relative density difference (Δρ/ρ_a), and the buoyancy frequency (N). These four input variables were chosen because they are the basic variables governing the mixing behaviour of the wastewater once it enters the marine waters through a diffuser. Length-scale and dimensional analysis approaches are not employed here, so that the direct effects of these basic variables on the average dilution, and if possible their interactions, can be evaluated. The ranges of the input variables were selected after considering the parameters of realistic existing ocean outfalls. A set of 200 data patterns was used in training and testing with the Net-Perfect option available in NeuroShell 2. This option allows the ANN being trained to find the optimum network for the data in the test set by splitting the 200 data patterns into two portions: 175 data patterns as a training set and the remaining 25 data patterns as a testing set. This scenario helps to develop a network that generalizes well and gives good results on new data, and to identify whether the network is overtrained.
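The training and testing scheme above can be illustrated with a small three-layer (input, hidden, output) back-propagation network in pure Python. This is a hypothetical stand-in, not NeuroShell 2 or its Net-Perfect algorithm: it assumes sigmoid hidden nodes, a single linear output node, per-pattern gradient descent on squared error, and it keeps the weights that give the lowest test-set error, which is the general idea of selecting the network that performs best on the test set. The synthetic target is a placeholder for scaled dilution data, and all names are illustrative.

```python
import copy
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerNet:
    """Input layer -> one sigmoid hidden layer -> single linear output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def train_step(self, x, target, lr):
        """One back-propagation update for the loss 0.5 * (output - target)^2."""
        hidden, output = self.forward(x)
        err = output - target
        for j, h in enumerate(hidden):
            delta_j = err * self.w2[j] * h * (1.0 - h)  # error signal at hidden node j (old w2[j])
            self.w2[j] -= lr * err * h
            self.w1[j] = [w - lr * delta_j * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * delta_j
        self.b2 -= lr * err


def mse(net, patterns):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in patterns) / len(patterns)


def train_with_test_selection(net, train_set, test_set, epochs=100, lr=0.1, seed=0):
    """Train on train_set; return the lowest test MSE and a copy of the network achieving it."""
    rng = random.Random(seed)
    order = list(train_set)
    best_err, best_net = mse(net, test_set), copy.deepcopy(net)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            net.train_step(x, y, lr)
        test_err = mse(net, test_set)
        if test_err < best_err:
            best_err, best_net = test_err, copy.deepcopy(net)
    return best_err, best_net


# Hypothetical data: 200 patterns of four inputs scaled to [0, 1], split 175 / 25 as in the study
data_rng = random.Random(1)
patterns = []
for _ in range(200):
    x = [data_rng.random() for _ in range(4)]
    patterns.append((x, x[0] + 0.5 * x[1] * x[2] - 0.3 * x[3]))  # placeholder target, not Eq. 1
train_set, test_set = patterns[:175], patterns[175:]

net = ThreeLayerNet(n_in=4, n_hidden=8)
initial_test_mse = mse(net, test_set)
best_test_mse, best_net = train_with_test_selection(net, train_set, test_set)
print(f"test MSE: {initial_test_mse:.4f} -> {best_test_mse:.4f}")
```

Keeping the best test-set snapshot rather than the final weights is one simple way to guard against overtraining; other stopping rules are equally possible.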
For the case at hand, training typically took two to three hours to give satisfactory results. Once training was completed, the ANN was validated with a set of 55 data patterns that had not been used in training or testing. If the ANN performed satisfactorily during validation, the network was taken to be an ANN-based initial dilution model, able to predict the average initial dilution for a specified set of input variables.

The next step was to extract a model from the established ANN using RSM. A Central Composite Design (CCD) with a single centre point was employed. For the four input variables Q/L, U, Δρ/ρ_a and N, this design requires 25 experimental runs (2^4 = 16 factorial points, 2 × 4 = 8 axial points and 1 centre point). The corresponding output values (i.e. the average initial dilutions) for these 25 runs were calculated with the ANN. The input-output data were then analyzed with the stand-alone design-of-experiments software Design-Expert 6.0.5 (Stat-Ease) to obtain an appropriate model. For the case at hand, a second-order model of the natural logarithm of the output variable gave the best fit to the data:

[2] $\ln(S_a) = C_1 + C_2 (Q/L) + C_3 U + C_4 (\Delta\rho/\rho_a) + C_5 N + C_6 (Q/L)^2 + C_7 U^2 + C_8 N^2 + C_9 (Q/L)U + C_{10} (Q/L)(\Delta\rho/\rho_a)$

in which C_1 to C_10 are the coefficients given in column 2 of Table 1 (Scenario 1: ANN-based RSM model). The goodness-of-fit measures of this equation are satisfactory, as summarized in column 2 of Table 2 (Scenario 1: ANN-based RSM model). Figure 1 compares the ANN and ANN-based RSM models at the 25 data points. In general, both models performed well except at one point, where both predictions were well off the actual value. On rechecking, this point was found to correspond to a set of input variables defined by the CCD that was not part of the ANN training set.
This set of input variables produced an exceptionally low dilution compared with the value it should have had, and the residual diagnostics identified it as an outlier. This case suggests that the ANN performs only as well as the data used to train it: it cannot predict a case that it never learned during training. Within the ANN itself, nothing can be done to fix this problem except to provide additional data covering the case under consideration and to retrain the network with those data. Interestingly, as shown in Figure 1, the model provided by RSM performs well in comparison with the ANN, and it gave a lower error (a point closer to the straight line) for the case involving the outlier.
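The RSM step (a 25-run CCD, a least-squares fit of Equation 2 to the logarithm of the response, and a residual check of the kind that flagged the outlier) can be sketched in pure Python as below. This is an illustration under stated assumptions, not the Design-Expert procedure: the factors are in coded units; the axial distance α = 2 (the rotatable value for four factors) is assumed, since the paper does not report it; the ANN responses are replaced by a hypothetical quadratic with small noise and one deliberately corrupted run; and outliers are flagged with internally studentized residuals using an assumed cut-off of |r| > 3. The fit measures mirror the rows of Table 2 (R-squared, adjusted R-squared, predicted R-squared from PRESS, and MSE). All function and variable names are hypothetical.

```python
import itertools
import math
import random


def central_composite(k, alpha, n_center=1):
    """Coded CCD: 2^k factorial corners, 2k axial points at +/-alpha, n_center centre runs."""
    runs = [list(p) for p in itertools.product((-1.0, 1.0), repeat=k)]
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            runs.append(point)
    runs.extend([0.0] * k for _ in range(n_center))
    return runs


def eq2_terms(q, u, r, n):
    """Regressors of Eq. 2 for the factors Q/L, U, drho/rho_a and N (coded units)."""
    return [1.0, q, u, r, n, q * q, u * u, n * n, q * u, q * r]


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices only)."""
    size = len(a)
    m = [row[:] + [float(i == j) for j in range(size)] for i, row in enumerate(a)]
    for col in range(size):
        piv = max(range(col, size), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for i in range(size):
            if i != col:
                f = m[i][col]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[col])]
    return [row[size:] for row in m]


def fit_and_diagnose(rows, y):
    """Least-squares coefficients, Table-2-style fit measures and studentized residuals."""
    n, p = len(rows), len(rows[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    hat = [sum(r[i] * xtx_inv[i][j] * r[j] for i in range(p) for j in range(p)) for r in rows]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    mse = sse / (n - p)
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(resid, hat))
    return {
        "beta": beta,
        "R2": 1.0 - sse / sst,
        "adj_R2": 1.0 - mse / (sst / (n - 1)),
        "pred_R2": 1.0 - press / sst,
        "MSE": mse,
        "studentized": [e / math.sqrt(mse * (1.0 - h)) for e, h in zip(resid, hat)],
    }


# Hypothetical "ANN" responses: ln(S_a) from an assumed quadratic, small noise, one corrupted run
TRUE_BETA = [5.0, 0.4, -0.3, 0.2, -0.25, 0.05, 0.04, -0.03, 0.06, -0.02]
OUTLIER_RUN = 5
design = central_composite(k=4, alpha=2.0)
rows = [eq2_terms(*point) for point in design]
noise = random.Random(2)
y = [sum(b * x for b, x in zip(TRUE_BETA, r)) + noise.gauss(0.0, 0.01) for r in rows]
y[OUTLIER_RUN] -= 1.0  # an exceptionally low dilution, as in the outlier case

full = fit_and_diagnose(rows, y)
flagged = [i for i, s in enumerate(full["studentized"]) if abs(s) > 3.0]

# Refit without the flagged run(s); the regression still works with CCD runs missing
keep = [i for i in range(len(rows)) if i not in flagged]
reduced = fit_and_diagnose([rows[i] for i in keep], [y[i] for i in keep])
print(len(design), flagged, round(full["R2"], 3), round(reduced["R2"], 3))
```

Fitting in coded units keeps the normal equations well conditioned; converting coded levels back to natural units of Q/L, U, Δρ/ρ_a and N is a linear rescaling applied afterwards.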

Unlike the ANN, the RSM-based model can still be developed even when some of the CCD-defined data are missing. After removing the data associated with the outlier, i.e. the case that the ANN never learned during training, a new model was developed using RSM. This model takes the same form as Equation 2, with the coefficients presented in column 3 of Table 1 (Scenario 2: RSM-based model after removing the outlier). The goodness-of-fit measures in column 3 of Table 2 (Scenario 2: RSM-based model after removing the outlier) are better than those obtained before removing the outlier. The comparison in Figure 2 shows that RSM performed well in extracting a model from the ANN, and that the RSM-based model may, in some cases, perform better than the ANN when extrapolating outside the range of the ANN training data set.

The performance of the models was further evaluated using the training, testing and validation data sets. Graphical comparisons based on the training and testing data sets are given in Figure 3. For these data sets, it is not surprising that the ANN is better than the RSM-based model derived from it: some loss of accuracy is to be expected whenever modelling is carried out in several stages. However, when the models were used to predict cases outside the training patterns, i.e. combinations of variables not seen during training even though each variable lay within the range used for ANN training, the ANN was less accurate than the RSM-based model, as shown by the five points with the highest dilution values in Figure 4. After removing the outlier, the RSM-based model performed best for these cases. An attempt was then made to retrain the ANN by adding a training pattern containing the data set that produced the higher initial dilution. This modified ANN was then modelled using RSM.
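Figure 5 summarizes the models' absolute percentage errors through their mean, median, 10th and 90th percentiles, standard deviation and mean square error. A short sketch of how such a summary could be computed with Python's standard `statistics` module follows. The percentile method (`method="inclusive"`, which never extrapolates beyond the data), the use of the sample standard deviation, and computing the mean square error on the dilution values themselves are assumptions, since the paper does not specify them; the function name and the four example values are hypothetical.

```python
import statistics


def error_summary(observed, predicted):
    """Absolute-percentage-error statistics of the kind plotted in Figure 5."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long series with at least two values")
    ape = [100.0 * abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    deciles = statistics.quantiles(ape, n=10, method="inclusive")  # 10th, 20th, ..., 90th
    return {
        "mean": statistics.mean(ape),
        "median": statistics.median(ape),
        "p10": deciles[0],
        "p90": deciles[-1],
        "std": statistics.stdev(ape),
        "mse": statistics.mean([(p - o) ** 2 for o, p in zip(observed, predicted)]),
    }


# Hypothetical dilutions: reference values versus one model's predictions
summary = error_summary([100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 330.0, 400.0])
print(summary)
```

Percentage errors weight a miss of 10 at a dilution of 100 twice as heavily as the same miss at 200, which is why they complement the mean square error when models are compared across a wide range of dilutions.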
The modified ANN-based RSM model is then compared with the modified ANN in Figures 5 to 8. All the figures show that RSM can be used to extract a model from the ANN with only a minor reduction in performance. This implies that RSM may offer a methodological solution to the problems of ANNs discussed previously. RSM can provide an explicit and simple model, as shown in Equation 2. The explicit model also helps to provide a physical interpretation of the relationships among the variables involved. The RSM-based model is easier to use for rapid analysis in many practical problems, and it is easier to couple with other models when it is a subset of a bigger and more complex model. This is particularly important for many problems faced by civil and environmental engineers, whose engineering solutions involve complex models accounting for the various phenomena affecting the system under investigation. For example, in the hydrodynamic modelling of the mixing zone for effluent discharges into surface waters, initial dilution is only a small part of the modelling, which also involves turbulent diffusion and buoyant spreading. In other cases, decay modelling may also be of interest. In such situations, an explicit and simple model is preferable because it can be easily coupled with other models to solve the problem.

4. CONCLUSIONS

This study showed that response surface methodology (RSM) can be used to extract an explicit model from an artificial neural network (ANN). The ANN was a three-layer back-propagation network, and the central composite design (CCD) was the experimental design used in the RSM. The case study considered initial dilution modelling of a line diffuser, with the empirical model of Roberts et al. (1989) used as the reference for evaluating the models.
Based on the results, the ANN was found to perform only as well as the data used in its training, and RSM was found to be a good methodology for extracting a model from an ANN. The explicit and simple model provided by RSM allows a better physical interpretation of the relationships among the input variables. In addition, such a model can be combined with other models when it is a subset of a bigger and more complex model, which is difficult to do with conventional ANNs. Further work is recommended to examine the type and complexity of ANNs that can be handled using response surface methodology.

5. ACKNOWLEDGEMENT

The first author greatly appreciates the financial support from Dr. J. J. Sharp, Emeritus Professor, through his NSERC research grant, and from the Faculty of Engineering and Applied Science, Memorial University of Newfoundland, Canada, as well as the partial financial support from the Hibah Bersaing, the Ministry of National Education, Republic of Indonesia.

6. REFERENCES

American Society of Civil Engineers (ASCE), Task Committee on Application of Artificial Neural Networks in Hydrology (a). Artificial Neural Networks in Hydrology. I: Preliminary Concepts. Journal of Hydrologic Engineering, 5(2).

American Society of Civil Engineers (ASCE), Task Committee on Application of Artificial Neural Networks in Hydrology (b). Artificial Neural Networks in Hydrology. II: Hydrologic Applications. Journal of Hydrologic Engineering, 5(2).

Ebead, U., Marzouk, H. and Lye, L.M. (2002). Strengthening of Two-Way Slabs Using FRP Materials: A Simplified Analysis Based on Response Surface Methodology. Proceedings of the 2nd World Engineering Congress, Kuching, Malaysia.

Markus, M., Tsai, C.W.-S. and Demissie, M. (2003). Uncertainty of Weekly Nitrate-Nitrogen Forecasts Using Artificial Neural Networks. Journal of Environmental Engineering, 129(3).

Mukhtasor (2001). Hydrodynamic Modelling and Ecological Risk-based Design of Produced Water Discharge from an Offshore Platform. Ph.D. Thesis, Memorial University of Newfoundland.

Myers, R.H. and Montgomery, D.C. (1995). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley & Sons, 700 pp.

Neelakantan, T.R., Lingireddy, S. and Brion, G.M. (2002). Effectiveness of Different Artificial Neural Network Training Algorithms in Predicting Protozoa Risks in Surface Waters. Journal of Environmental Engineering, 128(6).

Roberts, P.J.W., Snyder, W.H. and Baumgartner, D.J. (1989). Ocean Outfalls. I: Submerged Wastefield Formation. Journal of Hydraulic Engineering, 115.

Stat-Ease. Version 6 User's Guide, Design-Expert Software.

Ward Systems Group, Inc. (1995). NeuroShell 2 Manual, Third Edition, August, 246 pp.

Zangeneh, N., Azizian, A., Lye, L.M. and Popescu, R. (2002). Application of Response Surface Methodology in Numerical Geotechnical Analysis. Proceedings of the 55th Canadian Geotechnical Conference, Niagara Falls, Canada.

Table 1: Coefficients of the RSM-based models for different scenarios. The RSM-based model has the form $\ln(S_a) = C_1 + C_2 (Q/L) + C_3 U + C_4 (\Delta\rho/\rho_a) + C_5 N + C_6 (Q/L)^2 + C_7 U^2 + C_8 N^2 + C_9 (Q/L)U + C_{10} (Q/L)(\Delta\rho/\rho_a)$. Rows: coefficients C_1 to C_10. Columns: Scenario 1: ANN-based RSM model; Scenario 2: RSM-based model after removing the outlier; Scenario 3: RSM-based model after modifying the ANN by retraining with a representative data set.

Table 2: Fitting measures of the various ANN-based RSM models. Rows: R-squared, adjusted R-squared, predicted R-squared, MSE. Columns: Scenario 1: ANN-based RSM model; Scenario 2: RSM-based model after removing the outlier; Scenario 3: RSM-based model after modifying the ANN by retraining with a representative data set.

Figure 1: Performance of the models based on the Central Composite Design of 25 data points (axes: Sa (ANN) and Sa (ANN-based RSM model); one point marked "As a result of outlier")

Figure 2: Performance of the models based on the CCD of 25 data points, with the RSM model developed after removing the outlier (series: Sa (ANN), Sa (ANN-based RSM model), Sa (RSM model without outlier))

Figure 3: Performance of the models based on training and testing data (series: Sa (ANN), Sa (ANN-based RSM model), Sa (RSM model without outlier))

Figure 4: Performance of the models based on validation data (series: Sa (ANN), Sa (ANN-based RSM model), Sa (RSM model without outlier))

Figure 5: Absolute percentage error of the models based on training and testing data (statistics: mean, median, 10th percentile, 90th percentile, standard deviation, mean square error; models: ANN, ANN-based RSM model, RSM model without outlier)

Figure 6: Performance of the models based on the Central Composite Design of 25 data points (axes: Sa (Modified ANN) and Sa (Modified ANN-based RSM model))

Figure 7: Performance of the models based on training and testing data (axes: Sa (Modified ANN) and Sa (Modified ANN-based RSM model))

Figure 8: Performance of the models based on validation data (axes: Sa (Modified ANN) and Sa (Modified ANN-based RSM model))