Ozone forecasting in the urban area of Seville using artificial neural network technology

G. Reyes & V.J. Cortes
Department of Chemical and Environmental Engineering, University of Seville, Spain

Abstract
In this paper a short-term ambient ozone model based on artificial neural networks is presented. The method has been successfully applied in the city of Seville (Spain) to forecast ozone concentration, typically one day in advance. The applied methodology is based on the following steps:
1. An exhaustive measurement campaign to evaluate ozone pollution in the city during the summer, autumn and (partially) winter of 1999, providing a consistent data series of ozone and meteorological variables.
2. A screening of the most relevant variables affecting ozone concentration, to obtain accurate forecasting capabilities from a reduced number of easy-to-measure parameters: traffic, meteorology and features of the area (street width, building height, etc.).
3. A neural network approach based on a backpropagation algorithm, adopted to provide both spatial and temporal predictions. In the first case, ozone concentration can be estimated in any area of the city without direct measurements. In the second case, future ozone concentration levels can be predicted at the same locations from weather and traffic flow forecasts.

1 Introduction
Ozone (O3) is a secondary pollutant: it is not usually emitted directly from tailpipes or stacks, but is instead formed in the atmosphere through reactions between other, directly emitted pollutants [1], [2], [3]. These primary pollutants (ozone precursors) result from the use of gasoline, other petrochemicals and fossil fuels, and they are emitted largely by industry and automobiles. Ozone precursors fall into two related groups:
- Various nitrogen oxides (NOx).
- A range of volatile organic compounds (VOCs), such as evaporative solvents and other hydrocarbons.

In suitable meteorological conditions (e.g., a warm, sunny/clear day), ultraviolet (UV) radiation causes the precursors to interact photochemically in a set of reactions that result in the formation of ozone (and several other photochemical pollutants). Ozone is simultaneously destroyed by oxidizing an additional reactant (D), typically one of the NOx. These relationships can be expressed conceptually as:

NOx + VOCs + UV radiation → O3 (+ other photochemical pollutants)
O3 + D → oxidation products

Ozone is a powerful oxidant and, as such, can react with a wide range of cellular components and biological materials. In particular, damage can occur in all parts of the respiratory tract; its extent depends on the ozone concentration, exposure duration, exposure pattern and ventilation. Effects observed in the respiratory tract include inflammation; morphological, biochemical and functional changes; and decreases in host defense functions.

2 The ozone pollution problem in metropolitan areas and its regulation
Because of the way ozone is formed, the following factors directly affect ground-level ozone concentrations in metropolitan areas:
1. Traffic, responsible for the emission of ozone precursors.
2. Weather conditions, with a double influence: on the one hand, temperature and solar radiation cause the precursors to interact photochemically; on the other hand, wind speed and wind direction affect the dispersion of ozone and its precursors in the atmosphere.
3. Features of the area, such as street width and building height, which also affect atmospheric dispersion locally.
Many large metropolitan areas, as is the case of Seville, experience elevated ozone levels more frequently in summer, due to high traffic emissions and favourable meteorological conditions [4]. The current Directive 92/72/EEC on air pollution by ozone was adopted in 1992. It requires the EU Member States to monitor ozone levels, exchange information, and inform and warn the population when certain thresholds are reached. Additionally, the proposal for a European Directive relating to ozone in ambient air (1999) requires a forecast of ozone concentration for the following afternoon, day or even days. Consequently, metropolitan air-quality agencies need to make daily forecasts of changes in pollutant concentration, in the geographical area affected, and in the duration of episodes. Since the reasons for the occurrence and/or expected change in the situation must also be included, criteria for air quality management can be derived from these forecasts. Compared with statistical techniques, neural networks can improve such forecasts because they can incorporate complex, nonlinear relationships such as those controlling ozone formation.

3 Short-term ozone forecasting experiences
Systems for forecasting ozone episodes and informing the public about them operate on the basis of empirical methods, statistical models, causal models, or combinations of these [5]. In statistical models [6], the predictor uses measured ozone and meteorological data combined with statistical information on the most likely evolution of the concentration under the given/predicted meteorological conditions. In causal models, ozone concentration is calculated from precursor emissions and forecast meteorological conditions. Sucar et al. [7] use a causal network representation coupled with structure-learning techniques to obtain interesting results for ozone prediction. Hadjiiski [8] predicts ambient ozone from inputs such as measured VOC, NO, NO2, UV and temperature. Comrie [9] compares neural networks and regression models for ozone forecasting, concluding that the best model is a neural network incorporating lagged ozone data, which provides some improvement in performance over multiple regression models.

4 Neural Networks Technology
4.1 Fundamentals
Neural network technology is an approach to describing physical system behavior from process data, using mathematical algorithms and statistical techniques [10], [11], [12], [13], [14]. Neural networks are composed of simple elements operating in parallel. These elements (neurons) are inspired by biological nervous systems. As in nature, the network function is determined largely by the connections between elements.
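To make the idea of a single element concrete, here is a minimal Python sketch of one neuron: it forms a weighted sum of its inputs plus a bias and passes the result through a transfer function. The logistic sigmoid transfer function and the weight values are illustrative assumptions, not parameters of the models described in this paper.

```python
import math
from typing import Callable, Sequence


def logistic(z: float) -> float:
    """Logistic sigmoid transfer function: maps any real net input into (0, 1)."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           transfer: Callable[[float], float] = logistic) -> float:
    """One artificial neuron: weighted sum of the inputs plus a bias, passed through a transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return transfer(net)


# Hypothetical weights: net input = 1.0*0.5 + 2.0*(-0.25) + 0.0 = 0.0, so the output is 0.5.
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))
```

Passing `transfer=lambda z: z` turns the same function into a linear neuron, the kind typically used in the output layer of the networks discussed in Section 4.2.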
We can train a neural network to perform a particular function by adjusting the values of the connections (weights) between elements according to a given training algorithm. The basic structure of a neural network is a system of layered, interconnected neurons. The neurons are arranged to form an input layer, one or more "hidden" layers and an output layer, with the neurons in each layer connected to all neurons in the neighboring layers. Commonly, neural networks are adjusted, or trained, so that a particular input leads to a specific target output: the network is adjusted, based on a comparison of the output and the target, until the network output matches the target. Typically many such input/target pairs are used to train a network in this supervised learning.

4.2 Backpropagation algorithm
Backpropagation was created by generalizing the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions. The network architecture most commonly used with the backpropagation algorithm is the multilayer feedforward network, which often has one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. The term backpropagation refers to the manner in which the gradient is computed. There are a number of variations on the basic algorithm, based on other optimization techniques, such as:
- Gradient descent. This is the simplest implementation of backpropagation learning. It updates the network weights and biases in the direction in which the performance function decreases most rapidly (the negative of the gradient), in proportion to a parameter called the learning rate.
- Gradient descent with momentum. This implementation is similar to the previous one but provides faster convergence owing to an additional parameter called momentum, which allows the network to respond not only to the local gradient but also to recent trends in the error surface.
- Gradient descent with momentum and variable learning rate. The performance of the standard steepest descent algorithm is very sensitive to the proper setting of the learning rate. If the learning rate is set too high, the algorithm may oscillate and become unstable; if it is too small, the algorithm will take too long to converge. It is not practical to determine the optimal learning rate before training, and in fact the optimal learning rate changes during training, as the algorithm moves across the performance surface. This variant therefore adjusts the learning rate during training.
- Levenberg-Marquardt.
  This algorithm uses standard numerical optimization techniques, and it can converge from ten to one hundred times faster than the algorithms discussed previously.

4.3 Improving generalization
One of the problems that occurs during neural network training is called overfitting: the error on the training set is driven to a very small value, but when new data are presented to the network the error is large. The network has memorized the training examples, but it has not learned to generalize to new situations. One of the most commonly used methods for improving generalization is called "early stopping". In this technique the available data are divided into three subsets. The first subset is the training set, which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set. The validation error normally decreases during the initial phase of training; however, when the network begins to overfit the data, the error on the validation set typically begins to rise. When the validation error increases for a specified number of iterations, training is stopped, and the weights and biases at the minimum of the validation error are retained. The third subset is the test set, which is not used during training but is used to compare different models.

4.4 Preprocessing techniques
Neural network training can be made more efficient if certain preprocessing steps are performed on the network inputs and targets. In some situations the dimension of the input vector is large, but the components of the vectors are highly correlated (redundant). In this situation it is useful to reduce the dimension of the input vectors. An effective procedure for performing this operation is Principal Component Analysis (PCA). This technique has three effects: it orthogonalizes the components of the input vectors (so that they are uncorrelated with each other); it orders the resulting orthogonal components (called principal components) so that those with the largest variation come first; and it eliminates the components that contribute least to the variation in the data set.

5 Elaboration of neural network models for ozone forecasting in Seville
To develop an ozone forecasting model using neural network technology, the following steps were followed.

5.1 Measurement campaign
The first step in the design of a neural network model is to obtain a large amount of data from previous measurements.
To evaluate ozone pollution in the city of Seville, an exhaustive measurement campaign took place during the summer, autumn and (partially) winter of 1999, providing consistent data series of ozone and meteorological variables (temperature, UV radiation, relative humidity, wind speed and wind direction). Additionally, traffic flow data were collected and processed from the Traffic Control Center in Seville.

5.2 Model variables selection
After screening the most relevant variables affecting ozone concentration in order to obtain accurate forecasting capabilities, the following inputs (1-hour averages) were chosen:
- Weather conditions: temperature, relative humidity, wind speed and wind direction.
- Lagged UV radiation (2 hours).
- Lagged traffic flow (4 hours).
- Lagged ozone concentration (ozone maximum level from the previous day).
- Features of the area: street width, building height, street orientation and boundary of the area.

All input variables are continuous except the last one (boundary of the area), which is discrete. The inclusion of lagged data in ozone modelling is desirable for predictive purposes, as it frequently improves the accuracy of predictions. The model output variable is ozone concentration.

5.3 Training, validation and test sets. Preprocessing techniques
A total of 600 input/output data pairs were used: 80% as the training set, 10% as the validation set and the remaining 10% as the test set. Additionally, a Principal Component Analysis was performed, and the dimension of the input vector was reduced from 11 to …

5.4 Neural network architecture
To find the optimal architecture, several network architectures were trained, with one or two hidden layers: between 1 and 20 neurons in the hidden layer for networks with a single hidden layer, and between 1 and 10 neurons in each hidden layer for networks with two hidden layers.

5.5 Weights initialization
Before training the network, the weights and biases must be initialized. The initialization technique used was based on a random number generator whose state is set by a parameter called the "seed". Depending on the value of this parameter, the initial values of the weights and biases differ. The network was therefore re-initialized several times by setting this parameter to different values.

5.6 Training algorithms
Neural networks are trained so that a particular input leads to a specific target output. The three algorithms used were gradient descent with momentum, gradient descent with momentum and variable learning rate, and Levenberg-Marquardt.
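As a rough illustration of how one of these algorithms interacts with the early-stopping rule of Section 4.3, the following NumPy sketch trains a network with one logistic hidden layer and a linear output by batch gradient descent with momentum, keeping the weights at the validation-error minimum. It is not the Neural Network Toolbox implementation used by the authors: the momentum formulation (velocity = momentum·velocity − lr·gradient), the hidden-layer size, learning rate, patience (`max_fail`) and initial weight scale are all assumptions chosen for illustration.

```python
import numpy as np


def forward(p, X):
    """Logistic (sigmoid) hidden layer followed by a single linear output neuron."""
    h = 1.0 / (1.0 + np.exp(-(X @ p["W1"] + p["b1"])))
    return h, (h @ p["W2"] + p["b2"]).ravel()


def mse(p, X, y):
    """Mean squared error of the network outputs against the targets y."""
    return float(np.mean((forward(p, X)[1] - y) ** 2))


def train_gdm(X_tr, y_tr, X_val, y_val, n_hidden=10, lr=0.01, momentum=0.9,
              max_epochs=3000, max_fail=50, seed=0):
    """Batch gradient descent with momentum plus early stopping on the validation set.
    Returns the weights at the validation-error minimum and that error."""
    rng = np.random.default_rng(seed)  # the seed determines the initial weights (cf. Section 5.5)
    n_in = X_tr.shape[1]
    p = {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
         "W2": rng.normal(0.0, 0.5, (n_hidden, 1)), "b2": np.zeros(1)}
    velocity = {k: np.zeros_like(v) for k, v in p.items()}
    best, best_val, fails = {k: v.copy() for k, v in p.items()}, mse(p, X_val, y_val), 0

    for _ in range(max_epochs):
        h, out = forward(p, X_tr)
        d_out = (2.0 / len(y_tr)) * (out - y_tr)[:, None]  # dMSE/d(output), shape (N, 1)
        d_hid = (d_out @ p["W2"].T) * h * (1.0 - h)          # backpropagated through the sigmoid
        grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0),
                 "W1": X_tr.T @ d_hid, "b1": d_hid.sum(axis=0)}
        for k in p:
            # Momentum: keep a fraction of the previous step and add the new downhill step.
            velocity[k] = momentum * velocity[k] - lr * grads[k]
            p[k] += velocity[k]

        val = mse(p, X_val, y_val)
        if val < best_val:
            best, best_val, fails = {k: v.copy() for k, v in p.items()}, val, 0
        else:
            fails += 1
            if fails >= max_fail:  # validation error has stopped improving: stop early
                break
    return best, best_val
```

With 600 pairs split 480/60/60, one would call `train_gdm` on the training and validation rows and keep the test rows for comparing architectures and seeds, as described in Sections 5.4 and 5.5.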
The neural network models were implemented with the Neural Network Toolbox 3.0 for MATLAB [15].
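The data preparation described in Sections 5.2 and 5.3 (lagged inputs, an 80/10/10 split of the input/output pairs and PCA on the inputs) can be sketched in NumPy as below. The function names, the random assignment of pairs to subsets, the standardization before PCA and the `min_frac` variance threshold are assumptions for illustration; the paper does not state how the pairs were assigned or which threshold was used.

```python
import numpy as np


def lag(series, hours):
    """Row t holds the value observed `hours` earlier; the first `hours` rows are NaN."""
    series = np.asarray(series, dtype=float)
    out = np.full(len(series), np.nan)
    out[hours:] = series[:len(series) - hours]
    return out


def previous_day_max(ozone, hours_per_day=24):
    """For every hour, the maximum ozone level of the previous day
    (NaN on the first day and on any trailing partial day)."""
    ozone = np.asarray(ozone, dtype=float)
    n_days = len(ozone) // hours_per_day
    daily_max = ozone[:n_days * hours_per_day].reshape(n_days, hours_per_day).max(axis=1)
    out = np.full(len(ozone), np.nan)
    for d in range(1, n_days):
        out[d * hours_per_day:(d + 1) * hours_per_day] = daily_max[d - 1]
    return out


def split_80_10_10(n, seed=0):
    """Randomly assign n sample indices to training (80%), validation (10%) and test (10%) sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_tr, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]


def fit_pca(X, min_frac=0.02):
    """Fit standardization + PCA on X; keep components explaining >= min_frac of the variance."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)               # guard against constant columns
    _, s, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    frac = s**2 / np.sum(s**2)                          # singular values come sorted, largest first
    components = vt[frac >= min_frac]
    return lambda A: ((A - mean) / std) @ components.T  # maps inputs to the retained components
```

A typical use would be `lag(uv, 2)` and `lag(traffic, 4)` for the lagged inputs, then `tr, val, te = split_80_10_10(len(X))` and `transform = fit_pca(X[tr])`, applying `transform` to all three subsets so that the test set does not influence the fitted components.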

6 Results
The performance of a trained network can be measured to some extent by the errors on the training, validation and test sets, but it is often useful to investigate the network response in more detail. One option is to perform a regression analysis between the network response and the corresponding targets, which returns three parameters: the first two, m and b, are the slope and the y-intercept of the best linear regression relating targets to network outputs; the third is the correlation coefficient (R-value) between the outputs and targets. Additionally, the Mean Quadratic Error (MCE) and the Mean Quadratic Deviation (MCDV) between the network response and the corresponding targets were used as error indices.
The regression analysis for the optimal configuration of each of the six models considered (the combinations of the three training algorithms and the two network architectures, with one and two hidden layers) is presented in Figure 1. Different neural network configurations were trained by varying the number of hidden neurons, the value of the seed and the training algorithm parameters. Models 5 and 6, trained with the Levenberg-Marquardt algorithm, were found to provide the most accurate ozone forecasts; in addition, this algorithm converges faster than the others used. The MCE for models 5 and 6 is … and … µg/m3 (20 °C), respectively. The MCDV is … µg/m3 (20 °C) for model 5 and … µg/m3 (20 °C) for model 6. Both models were also trained with either the lagged ozone concentration or the area features removed from the inputs; prediction accuracy is greatly improved when these variables are included in the model.

7 Conclusions
Neural network technology is well suited to the complex, non-linear problem of ozone forecasting, as previously shown by other authors.
Our contribution is a substantial simplification of the required input variables: only basic weather conditions, traffic flow data (usually available on-line from Traffic Management Centres), the maximum ozone level attained the previous day and basic features of the area are needed. The neural network models obtained provide both spatial and temporal predictions. In the first case, ozone concentration can be estimated in any area of the city without direct measurements; this micro-scale modelling can be integrated into a comprehensive tool for the whole city, with the resolution adjusted as required. In the second case, future ozone concentration levels can be predicted at the same locations from weather and traffic flow forecasts.

Figure 1: Optimal configurations of the neural network models.
Model 1: gradient descent with momentum; one hidden layer (19 neurons).
Model 2: gradient descent with momentum; two hidden layers (7 and 10 neurons).
Model 3: gradient descent with momentum and variable learning rate; one hidden layer (18 neurons).
Model 4: gradient descent with momentum and variable learning rate; two hidden layers (9 and 3 neurons).
Model 5: Levenberg-Marquardt; one hidden layer (17 neurons).
Model 6: Levenberg-Marquardt; two hidden layers (10 and 8 neurons).
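For readers who want to reproduce the kind of regression analysis summarized in Figure 1 (slope m, intercept b and R-value between network outputs and targets), a minimal NumPy sketch follows. It assumes that outputs are regressed on targets, so a perfect forecaster gives m = 1, b = 0 and R = 1; the function name is hypothetical and this is not the toolbox routine used in the paper.

```python
import numpy as np


def regression_analysis(outputs, targets):
    """Return (m, b, r): slope and intercept of the least-squares line
    outputs ≈ m * targets + b, and the correlation coefficient R between them."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    m, b = np.polyfit(targets, outputs, deg=1)  # coefficients are returned highest power first
    r = np.corrcoef(targets, outputs)[0, 1]
    return float(m), float(b), float(r)


# Outputs that are exactly 2 * targets + 1: expect m ≈ 2, b ≈ 1 and R ≈ 1.
print(regression_analysis([3.0, 5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0]))
```

A slope below 1 with a positive intercept is a common sign that a model underestimates high ozone peaks and overestimates low values, which is why m and b are reported alongside R.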

Additionally, the tool developed fulfils the requirements of the current Directive 92/72/EEC, making it possible to satisfy public information requirements, to reduce and prevent exposure, and to warn authorities, industry and the public so that emission reduction measures can be adopted. Moreover, the proposal for a European Directive relating to ozone in ambient air (1999) requires a forecast of ozone concentration for the following afternoon, day or even days. The capabilities of the system developed match the requirements set out in Annex II of this future European regulation. Based on these predictions, environmental agencies could issue public advisories and provide input to decisions regarding abatement measures for air quality management.

References
[1] Wark, K.; Warner, C.F. Contaminación del Aire. Origen y Control. Limusa.
[2] Ad-Hoc Working Group on Ozone Directive and Reduction Strategy Development. Ozone position paper, July.
[3] Finlayson-Pitts, B.J.; Pitts, J.N. Atmospheric Chemistry. Fundamentals and Experimental Techniques. John Wiley & Sons.
[4] Sluyter, R.; Camu, A. Air Pollution by Ozone in the European Union. Overview of the 1999 summer season (April-August). October 1999.
[5] Van Aalst, R.M.; De Leeuw, F.A. National Ozone Forecasting Systems and International Data Exchange in Northwest Europe. September 1997.
[6] Robeson, S.M.; Steyn, D.G. Evaluation and comparison of statistical forecast models for daily maximum ozone concentrations. Atmos. Environ., 1990, 24B.
[7] Sucar, L.E.; Pérez-Brito, J.; Ruiz-Suarez, J.C.; Morales, E. Learning Structure from Data and Its Application to Ozone Prediction. Applied Intelligence, vol. 7, no. 4, Nov. 1997.
[8] Hadjiiski, L.; Hopke, P. Design of large scale models based on multiple neural network approach. In Proc. Artif. Neur. Networks in Eng. Conf., ASME Press, Nov.
[9] Comrie, A.C. Comparing Neural Networks and Regression Models for Ozone Forecasting. J. Air & Waste Management, vol. 47, no. 6, June 1997.
[10] Hilera, J.R.; Martinez, V.J. Redes Neuronales Artificiales. Fundamentos, modelos y aplicaciones. Ra-Ma.
[11] Fausett, L. Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall International, Inc.
[12] Chen, C.H. Fuzzy Logic and Neural Network Handbook. McGraw-Hill, New York.
[13] Hertz, J.; Krogh, A.; Palmer, R.G. Introduction to the Theory of Neural Computation. Addison-Wesley, California.
[14] Martin del Brio, B.; Sanz Molina, A. Redes Neuronales y Sistemas Borrosos. Ra-Ma.
[15] Demuth, H.; Beale, M. Neural Network Toolbox User's Guide (for use with MATLAB). The MathWorks, Inc., July 1998.