Ai Pollution XIX 105 Pediction of TSP concentation in a metallugical city of Bazil using neual netwoks M. M. C. Lima Reseach and Development Cente, Usiminas, Bazil Abstact The aim of this study was to pedict Total Suspended Paticulate concentation (TSP) in the main aeas of Ipatinga, a metallugical city located in Minas Geais state, southeast of Bazil. Atificial neual netwoks (ANN) wee the modelling tool used. This model is able to pedict pollutant concentation just by taining the input and output paametes. The input paametes wee meteoological such as wind diection, wind speed, ain, and ambient tempeatue and also seasonal such as, summe and winte. The output paamete used was the histoical data of the total suspended paticulate concentation taken between 1996 and 2004. In the modelling, the multilaye pecepton (MLP) model was tested. Among the MLP configuations evaluated, the topology 13-7-6 was chosen. The validation of the model was done by compaing the simulated with the obseved values. The esults of this model wee also compaed with the industial souce complex shot-tem dispesion model (ISCST3). The fou statistical tools used to evaluate the fitting wee mean squaed eo (MSE), factional bias (FB), index of ageement (IA) and linea coelation coefficient (R). Compaing the esults it was seen that the pedicted values wee bette in some booughs and wee oveestimated in othes. Besides, the pedicted esults of the ANN model wee bette than the ISCST3 dispesion model. Keywods: atificial neual netwoks modelling, multilaye pecepton, total suspended paticulate concentation, pediction, ISCST3 dispesion. 1 Intoduction This pape intoduces the study of pediction of Total Suspended Paticulate concentation (TSP) in Ipatinga city using atificial neual netwoks. doi:10.2495/air110101
106 Ai Pollution XIX Ipatinga is located in the Vale do Aço Region, in Minas Geais state of Bazil whee the main industial activity is ion and steel making. In the ion and steel making pocess, the aw mateial handling and fuel combustion ae the main causes of paticulate emission. Depending on the paticulate concentation, the ai quality can be modified and causes a lot of damage, such as inceasing the dust in esidential aeas, visibility impaiment and hamful to health effects (USEPA [1]). Mathematical models ae often used to estimate envionmental impacts, saving money fom ai quality monitoing. One of the most impotant featues of models is the ability to pedict o simulate futue impact scenaios. The dispesion o diffusion models have been taditionally applied to atmospheic mathematical modelling. These models ae able to pedict the pollutant concentation using mass balance of statistical data (pollutant emission, wind diection and speed, ambient tempeatue) and intoduce a Gaussian mathematical equation as a solution of pollutant dispesion. Mitkiewicz [2] pedicted TSP concentation in Ipatinga using industial souce complex shottem dispesion model (ISCST3). Recently, atificial neual netwoks model (ANN) has been used in modelling complex poblems. ANN, such as the ISCST3 model is able to pedict ai pollutant concentation just by taining a set of input and output vaiables. It offes a mathematical solution by adjusting the weights in such a way that output will be close to eal data. Compaing ANN to ISCST3 models, the fist one has the advantage in adapting few vaiables to evaluate ai pollutant dispesion. It can be seen in many papes published about this subject: Linyan and Wang [3], Wal and Janssen [4], Peez and Reyes [5], Viotti et al. [6], Zickus et al. [7], Peez and Reyes [8], Podna et al. [9], Odiees et al. [10], Hooybeghs et al. [11]. Thee ae a vaiety of atificial neual netwoks models being used in modelling. Among them, the multilaye pecepton (MLP) is the most cited in ai dispesion modelling (Gadne and Doling [12]). The MLP stuctue consists of pocessing elements and connections. The pocessing elements, called neuons, ae aanged in layes such as an input laye, one o moe occult layes and an output laye (Haykin [13]). They ae all inteconnected. In this context, this study aimed to develop ANNs using MLP. The input paametes wee meteoological data and the output was the TSP concentation data. This model was also compaed with the ISCST3 model. Specifically, it intended to: a) pedict TSP concentation using ANNs in six ai quality monitoing stations distibuted in Ipatinga, b) detemine the main vaiables esponsible fo the measued TSP concentation; c) investigate the behaviou of the ceated ANN due to diffeent configuation poposals; d) validate the ANN model using the compaison between the simulated and measued values and e) compae simulated esults between the ANN and ISCST3 models.
Ai Pollution XIX 107 2 Mateial and methodology 2.1 Sampling TSP concentations wee collected weekly between 1996 and 2004 fom six ai quality monitoing stations using High Volume sample (HiVol) distibuted in six disticts aound Ipatinga named: CA, BR, BA, NC, EC and CS. Meteoological vaiables (wind speed (m/s), wind diection, ain volume (mm), ambient tempeatue ( o C)) wee collected houly duing the same peiod and conveted to daily vaiables. The wind diections evaluated wee noth (N), south (S), east (E), west (W), notheast (NE), southeast (SE), southwest (SW) and nothwest (NW). Futhe vaiables wee also ceated as calm hous (wind speed less than 1 m/s) and seasonal cycle (winte and summe) duing the same peiod fom the same database to evaluate thei effect on the TSP concentations. The database contained 400 data. 2.2 Atificial neual modelling Mathematical outine was developed using the softwae Matlab [14]. MLP was the atificial neual netwok model used. It had been tested lots of models with diffeent configuations. The best topology was defined by the MSE (mean squae eo) analysis. This statistical analysis is suggested by Zhang et al [15] as being efficient in evaluating the atificial neual netwoks models. The meteoological and seasonal vaiables wee used in the input laye. The choice fo the model type and the input vaiables wee due to a lot of published studies in liteatue as mentioned above. Two models with thiteen and six input data wee evaluated. The thiteen data wee: daily eight wind diections fequency, daily mean wind speed, daily mean ain volume, daily mean tempeatue, daily calm hous and seasonal cycle (winte = 1 and summe = 2) fequencies. The six wee obtained using the pincipal component analysis (PCA). The pupose of this analysis is to obtain a small numbe of linea combinations which account fo most of the vaiability in the data (Haykin [13]). In this case, six components had been extacted fom the thiteen input data and they wee evaluated in the modelling as input data. TSP concentations, measued in the six monitoing sites, wee intoduced in the output laye. Only one occult laye was consideed in the modelling. The numbe of neuons in the occult laye was changed as suggested by Zhang et al [15] and Kóvacs [16]. Fist, an exploatoy data analysis was made, detecting if thee wee outlies in the database. Afte that the data wee nomalized and sepaated in taining and validation sets. The taining set was equivalent to 80% of the database and the validation set to 20% (Zhang et al [15]). The leaning algoithms used duing the taining test wee Levenbeg-Maquadt and Backpopagation. The ealy stopping citeion was applied to stop the taining. In the mathematical outine, expected eo, initial weight, bias, activation function, leaning algoithm, momentum tem and the iteation cycles wee established as usual. Finally, afte compaing the eal values to the simulated ones by MSE analysis, the best topology was defined.
108 Ai Pollution XIX 2.3 Pefomance evaluation The model validation was done by compaing between the eal and simulated values. These simulated values wee the model esults obtained fom the validation set. The aveage pecentage eo (E) was detemined as shown in eqn. (1). 1 E n n i 1 C C C e *100 whee n is the sampling size, C e, C simulated and eal values, espectively. The tendency of the simulated esults was evaluated by quantile-quantile plot and type I (false negative) and II (false positive) eo analyses. In the type I eo, the model unde pedicts the values, when they wee supposed to be above a citical value. In this case, the citical value (TSP concentation) was the ai quality standad of 80g/m 3 applied in Minas Geais state. In the othe hand, in the type II eo, the model ove pedicts the values. So they wee supposed to be below a citical value. The cluste analysis was applied to veify the effect of input data in the output data. This multivaiate statistical technique aims to classify the obsevations o vaiables due to thei similaity, applying a distance measue algoithm. 2.4 Compaison between ANN and ISCST3 models The compaison between ANN and ISCST3 models was made. Fo unning ISCST3 model, it was collected the meteoological paametes, topogaphy and ai emission souces chaacteization. The compaison among the two simulated esults and eal values, egisteed in the six monitoing sites, was made. The statistical evaluation tools used wee linea coelation coefficient (R), mean squae eo (MSE), mean factional bias (FB) and mean index of ageement (IA) (Olesen [17]). The linea coelation coefficient is shown in eqn. (2): C CeC C 1 n i e n 1 R C e C The mean squae eo is shown in eqn. (3): n 2 C C i e MSE 1 n The mean factional bias is shown in eqn. (4): 1 n Ce C FB i 1 n 0. 5C e C The mean index of ageement of (IA) is shown in eqn. (5): (1) (2) (3) (4)
Ai Pollution XIX 109 IA 1 n 2 n Ce C 1 i 1 2 C C C C e (5) whee Ce : aveaged simulated concentation, C : aveaged eal concentation, n is the sampling size, C e, C simulated and eal values, espectively, C : eal concentation standad deviation, C e : simulated concentation standad deviation. 3 Results The best ANN configuations esults consideing the MSE analysis ae shown in table 1. Table 1: MLP models esults. Model Neuons Alg. MSE Output neuons Input Occult Output CA BA BR EC NC CS 1 13 7 6 LM 466.7 + + + + + + 2 13 27 6 BP 636.5 + + + + + + 3 13 14 5 LM 354.7 + + + + - + 4 13 19 1 LM 251.5 + - - - - - 5 13 10 1 LM 385.1 - + - - - - 6 13 7 1 LM 208.4 - - + - - - 7 13 7 1 LM 205.7 - - - + - - 8 13 10 1 LM 911.3 - - - - + - 9 13 9 1 LM 336.2 - - - - - + 10 6 4 6 LM 509.1 + + + + + + 11 6 13 1 LM 323.5 + - - - - - 12 5 8 1 LM 386.4 - + - - - - 13 6 6 1 LM 220.9 - - + - - - 14 6 10 1 LM 188.6 - - - + - - 15 6 4 1 LM 869.5 - - - - + - 16 6 3 1 LM 382.0 - - - - - + (Alg.) Algoithm, (LM) Levenbeg Maquadt, (BP) Backpopagation, (+) simulated, (-) not simulated. Few ANN models had in the occult laye a half of neuons of the input laye while othe ones (models 2 and 11) had moe than double of neuons of the input laye as commented by Kóvacs [16]. The esults obtained fom models 1, 2 and 10, consideing 6 neuons in the output laye, had the same ode of magnitude. Leaving the NC ai monitoing site out of evaluation (models 1 and 3), the MSE was educed. Models 4 to 9 and 11 to 16 wee ceated to evaluate each esult fom each ai monitoing station. The esults wee vey simila except fo NC.
110 Ai Pollution XIX Model 12 had five input data due to PCA esults. The esults obtained fom models 8 and 15 wee pobably due to othe vaiables that wee not intoduced in the model, since the used vaiables wee not able to explain entiely the TSP concentations. Using MSE appoach fo compaison esults between model 1 and 2, it was showed that Levenbeg-Maquadt algoithm was bette than Backpopagation. On balance, the MSE esults among the models wee vey simila with the same ode of magnitude. Fo this eason model 1 was chosen to descibe the subsequent esults. Compaing between models 1 and 10, the use of pincipal components as input data did not alteed the MSE esults significantly. 3.1 Pefomance evaluation The aveage pecentage eos fo CA, BA, BR, EC, NC, and CS ai quality monitoing sites wee 35%, 24%, 27%, 31%, 42% and 37%, espectively. BA simulated esults showed a good ageement with measued values. NC had the wost esults. In egula meteoological conditions, BA is used in suffeing paticulate emissions fom Usiminas moe than the othe sites, as it is located downwind fom it. So the simulated esults wee bette in BA than in the othe ones. The esult obtained in NC was the wost due to othe vaiables that wee not intoduced in the model. Mitckiewicz [1] had also got the wost simulated esults in NC using ISCST3 modelling. The cluste analysis is shown in fig. 1. Analysing the distance measuement i t is possible to identify thee goups. 800 Dendogam Wad's Method, Squaed Euclidean 600 Distance 400 200 0 Cluste 3 Cluste 1 Cluste 2 ---------------- ---------------- --------------------- N NE windspeed E Calm hous Tempeatue seasoncycle Rain SE NW S SW W Figue 1: Cluste analysis. Cluste 1, epesenting 50% of the output data, was chaacteized by N, NE, E wind diections, wind speed less than 1m/s, stong aining stoms, ambient tempeatue about 21 o C and the pesence of two seasonal cycles. In othe wods,
Ai Pollution XIX 111 50% of TSP concentations esults wee gouped due to simila input data mentioned above. Cluste 2 gouped 19% of the output data due to input vaiables: SE, S, NW, SW, W wind diections, wind speed moe than 1,1m/s, weak aining, ambient tempeatue about 21 o C and winte cycle. Finally, cluste 3 (24% of output data) was chaacteized by N, NE, and E wind diections, wind speed moe than 2,1m/s, fequent aining, ambient tempeatue about 25 o C and the pesence of summe cycle. The analysis of quantile-quantile plot and type I and II eo was made fo all ai quality monitoing sites, but in this pape, it will be only shown BA and NC esults (figs. 2 and 3). The othe ones can be seen in moe details in Lima, M. M. C. [18]. In fig. 2, accoding to quantile-quantile plot analysis, the tendency of 52% of the pedicted values was to oveestimate the eal values. They occued in meteoological conditions fom cluste 1. Consideing the analysis of false positive and false negative eos, both wee veified. But, the type I eo was moe often and was detemined by the vaiables chaacteized by cluste 3. One hypothesis that could pobably explain is the location of main paticulate emission souces fom Usiminas in elation to BA ai quality monitoing site. Unde those meteoological conditions (chaacteized by cluste 3), BA ai monitoing quality site was downwind fom them and if thee was an emission incease duing that peiod, the model was not able to evaluate it, as they wee not intoduced in the modelling. It could explain why pedicted values was lowe than eal. 200 180 160 negative false eo Cluste 1 Cluste 2 Cluste 3 140 TSP eal values (g/m 3 ) (BA) 120 100 80 60 40 20 positive false eo 0 0 20 40 60 80 100 120 140 160 180 200 TSP pedicted values (g/m 3 ) (BA) Figue 2: Compaison between eal and simulated values fo BA. Accoding to fig. 3, quantile-quantile plot analysis showed that a geat pat of simulated values was oveestimated if compaed to the eal values. They also occued in meteoological conditions descibed in cluste 1. The type II eo was moe common than the othe one.
112 Ai Pollution XIX 240 200 negative false eo Cluste 1 Cluste 2 Cluste 3 TSP eal values (g/m 3 ) (NC) 160 120 80 40 positive false eo 0 0 40 80 120 160 200 240 TSP pedicted values (g/m 3 ) (NC) Figue 3: Compaison between eal and simulated values fo NC. The possible cause was its location is not favouable and exposed to exta contibutions. NC site is close to a paved oad with heavy taffic. This situation was not modelled. 3.2 Compaison between ANN and ISCST3 model The compaison between ANN and ISCST3 model esults is shown is table 2. Table 2: Statistical analysis esults. Site Statistical analysis FB IA R MSE ISCST3 ANN ISCST3 ANN ISCST3 ANN ISCST3 ANN CA -0.29 0.06 0.36 0.66 0.02 0.48 1754.86 334.79 BA -0.06-0.07 0.53 0.62 0.37 0.47 2944.57 671.00 BR 0.42 0.12 0.21 0.69 0.02 0.56 6124.87 173.68 EC -0.61 0.11 0.19 0.70 0.18 0.54 1652.21 202.97 NC -0.82 0.15 0.53 0.76 0.44 0.65 4003.12 1027.18 CS 0.29 0.04 0.27 0.61 0.08 0.44 14616.33 390.40 FB and MSE usually measue bias of a model. R is the coelation between the obseved and simulated values and IA shows the degee that the model pedictions ae eo fee. The ideal model would esult in values of MSE = 0, R = 1, FB = 0 and IA = 1. Analyzing the mean factional bias esults, NC, BR and EC ai quality monitoing stations had the wost esults in both models. The mean index of ageement (IA) showed the EC and CS esults wee the wost in ISCST3 and ANN models, espectively. The values of IA and FB obtained in
Ai Pollution XIX 113 ANN wee bette than the ISCST3 dispesion modelling. The values of R and MSE obtained in ANN wee also bette than the ISCST3 model. Fo this eason, ANN model should be consideed close to the ideal model. 4 Conclusions This study showed that ANN can be a poweful data analysis tool to evaluate ai pollutant dispesion. Even though, in the modelling, paticulate emissions fom Usiminas wee not intoduced as input vaiable, ANN was able to pedict the TSP concentation in Ipatinga atmosphee using meteoological and seasonal cycle data. On balance, the tendency of simulated values was to oveestimate the eal values. The best modelling esults wee obtained in BR, BA and CA. BA had the best esult and NC, the wost. BA monitoing site is favouable to suffe paticulate emissions fom Usiminas moe than the othe sites, as it is located downwind fom it in egula meteoological conditions. It could explain why the simulated esults wee bette in BA than in the othe ones. The simulated esult obtained in NC was the wost due to othe vaiables that wee not intoduced in the model and its unfavouable location. The type I and II eos esults wee not epesentative in the modelling. The type II eo only occued in NC monitoing site and type I eo was moe common in BA site. Those eos wee caused by thei location pobably. Accoding to statistical analysis of MSE, FB, IA and R, the pedicted esults fom the ANN model wee bette than the ISCST3 dispesion model. Refeences [1] United States Envionmental Potection Agency (USEPA). Ai Quality Citeia fo Paticulate Matte Vol. II EPA/600/P-99/002a-f, 2004, www.epa.gov/pmeseach/. [2] Mitkiewicz, G. F., Metodologia paa avaliação da dispesão atmosféica de poluentes povenientes de um complexo sideúgico industial, 2002, Depatamento de Engenhaia Sanitáia e Ambiental. (Dissetação de Mestado em Meio Ambiente), Escola de Engenhaia da UFMG, Belo Hoizonte, Basil. [3] Linyan, S. Wang, Y., A neual netwok model fo envionmental pedication: case study fo China. Computes and Industial Engineeing, China, 31, pp. 879-883, 1995. [4] Wal, J.T., Janssen, L.H.J.M., Analysis of spatial and tempoal vaiations of PM10 concentations in the Nethelands using Kalman filteing. Atmospheic Envionment, 34, pp. 3675-3687, 2000. [5] Peez, P. Reyes, J., Pediction of paticulate ai pollution using neual techniques. Neual Computing & Applications, Chile, 10, pp. 165-171, 2001. [6] Viotti, P.; Liuti, G.; Genova, P. D., Atmospheic uban pollution: applications of an atificial neual netwok (ANN) to the city of Peugia. Ecological Modelling, Rome, 143, pp. 27-46, 2002.
114 Ai Pollution XIX [7] Zickus, M., Geig, A.J., Nianjan, M., Compaison of fou machine leaning methods fo pedicting PM10 concentations in Helsinki, Finland. Wate, Ai, and Soil Pollution, 2, pp. 717-729, 2002. [8] Peez, P., Reyes, J., Pediction of maximum of 24-h aveage of PM10 concentations 30 h in advance in Santiago, Chile. Atmospheic Envionment, 36, pp. 4555-4561, 2002. [9] Podna, D., Koacin, D., Panoska, A., Application of atificial neual netwoks to modelling the tanspot and dispesion of taces in complex teain. Atmospheic Envionment, 36, pp. 561-570, 2002. [10] Odiees, J.B., Vegaa, E.P., Capuz, R.S., Salaza, R.E., Neual netwok pediction model fo fine paticulate matte (PM2.5) on the US-Mexico bode in El Paso (Texas) and Ciudad Juáez (Chihuahua). Envionmental Modeling & Softwae, 20, pp. 547-559, 2005. [11] Hooybeghs, J., Mensink, C., Dumont, G., Fieens, F., Basseu, O., A neual netwok foecast fo daily aveage PM10 concentations in Belgium. Atmospheic Envionment, 39, pp. 3279-3289, 2005. [12] Gadne, W. M.; Doling, R. S., Atificial neual netwoks (the multilaye pecepton) a eview of applications in the atmospheic sciences. Atmospheic Envionment, 32(14/15), pp. 2627-2636, 1998. [13] Haykin, S., Neual netwoks: a compehensive foundation, Pentice-Hall Inc.: Canada, pp. 1-842, 1999. [14] Matlab R12, Vesion 5.1, The language of technical computing: getting stated with Matlab, The Mathwoks Inc., pp. 1-86, 1997. [15] Zhang, G. Patuwo, B.E. HU, M. Y., Foecasting with atificial neual netwoks: the state of the at. Intenational jounal of foecasting, 14, pp. 35-62, 1998. [16] Kovács, Z.H., Redes neuais atificiais: fundamentos e aplicações, Editoa Livaia da Física: São Paulo, pp. 1-174, 2002. [17] Olesen, H., Model validation kit status and outlook, National Envionmental Reseach Institute: Denmak, 1997. [18] Lima, M. M. C., Estimativa de concentação de mateial paticulado em suspensão na atmosfea po meio da modelagem de edes neuais atificiais (Dissetação de Mestado em Meio Ambiente), Escola de Engenhaia da UFMG, Belo Hoizonte, Basil.