Virtual Sources for Spatio-temporal Monitoring Data Analysis

Size: px
Start display at page:

Download "Virtual Sources for Spatio-temporal Monitoring Data Analysis"

Transcription

1 Internatonal Congress on Envronmental Modellng and Software Brgham Young Unversty BYU ScholarsArchve 4th Internatonal Congress on Envronmental Modellng and Software - Barcelona, Catalona, Span - July 008 Jul 1st, 1:00 AM Vrtual Sources for Spato-temporal Montorng Data Analyss A. Nuzhny E. Saveleva S. Kazaov S. Utn Follow ths and addtonal wors at: Nuzhny, A.; Saveleva, E.; Kazaov, S.; and Utn, S., "Vrtual Sources for Spato-temporal Montorng Data Analyss" (008). Internatonal Congress on Envronmental Modellng and Software Ths Event s brought to you for free and open access by the Cvl and Envronmental Engneerng at BYU ScholarsArchve. It has been accepted for ncluson n Internatonal Congress on Envronmental Modellng and Software by an authorzed admnstrator of BYU ScholarsArchve. For more nformaton, please contact scholarsarchve@byu.edu, ellen_amatangelo@byu.edu.

2 EMSs 008: Internatonal Congress on Envronmental Modellng and Software Integratng Scences and Informaton Technology for Envronmental Assessment and Decson Mang 4 th Bennal Meetng of EMSs, M. Sànchez-Marrè, J. Béar, J. Comas, A. Rzzol and G. Guarso (Eds.) Internatonal Envronmental Modellng and Software Socety (EMSs), 008 Vrtual Sources for Spato-temporal Montorng Data Analyss A. Nuzhny, E. Saveleva, S. Kazaov, S. Utn Nuclear Safety Insttute Russan Academy of Scences, 5 B. Tulsaya, Moscow, , Russa ( nuzhny@nbox.ru, esav/aza/uss@brae.ac.ru ) Abstract: The current wor deals wth applcaton of ndependent vrtual sources as a tool for analyss and storng groundwater montorng data. Estmaton of vrtual sources s based on learnng from data prncple. Vrtual sources allow to analyse the man features of the process. They can be useful for both data exploratory analyss and predcton. Ths wor consders two methods for ndependent vrtual sources constructon: (1) based on usng Central Lmt Theorem and () based on the mxture of Gaussans. The methods were appled to real spato-temporal data on groundwater level dynamcs (D spatal case) and groundwater contamnaton by radoactve ntrates (3D spatal case). These data sets are characterzed by heterogeneous sample dstrbuton both n space and tme. Keywords: Data mnng, Independent sources, Gaussan dstrbuton, Spato-temporal data, Montorng. 1. INTRODUCTION Real montorng systems usually deal wth a problem of data analyss n spato-temporal contnuum One of possble ways s a data mnng methodology featurng adaptve learnng from data prncple [Cherassy and Muller, 1998]. Data mnng provdes a set of methods to handle, to model and to mae predctons basng on measured data and addtonal nowledge somehow connected wth data (physcal laws, emprcal rules, etc.). Dversty of data mnng methods s very hgh coverng all nds of data analyss problems: regresson, classfcaton, clusterng, etc. The current wor consders a data mnng approach based on ndependent sources (ndependent components) [Lee T.-W.1998]. Generally ndependent component analyss solves the blnd source separaton problem by decorrelatng the sgnals and reducng some hgher-order statstcal dependences. Independent sources store enough nformaton on the process for modellng and mang predctons. Thus ndependent sources also can be useful as a compressng tool to store the montorng data more effectvely. Independent sources can be consdered as one of possble non-lnear generalzatons of prncple component analyss. Classcal ndependent sources problem s an nverse problem. A measured value (a recever) s treated as a superposton of sgnals from some sources. The problem s to establsh sources. As any nverse problem ths one s an ll-posed problem wth unlmted number of possble solutons. In ths wor ndependent sources are consdered as vrtual (not real) sources responsble for a process, thus any relable soluton s equally good. Ths wor dscusses two approaches to fnd the ndependent vrtual sources. Both of them deal wth Gaussanty: (1) consder superposton more normally dstrbuted n comparson wth sngle sgnals (based on usng Central Lmt Theorem) and () consder the recever 1734

3 A.Nuzn, E.Saveleva et al. /Vrtual Sources for Spato-temporal Montorng Data Analyss as a weghted superposton of Gaussans. Both methods have some theoretcal lmtatons complcatng ther applcaton to real data. Methods were appled to analyss of real data from a groundwater montorng system studyng the vcnty of nuclear waste facltes.. THEORETICAL BACKGROUND.1 Method usng the Central Lmt Theorem To formulate an ndependent sources problem let us consder values measured at spatotemporal ponts x as recevers obtanng sgnals c =c(x ), whch are a superposton of ntal sgnals (b =b(r )) from some sources r. The tas s to restore ntal sgnals and ther sources. Otherwse t s to fnd the coeffcents w n: under the addtonal constrant: b = w w c = 1 Such formulaton of the ndependent sources problem requres the followng assumptons: the sources are ponts, no spatal sze; nfluence of sources on recevers s mmedate; dstrbuton of sgnals from sources s sotropc. The possblty to wor under these assumptons should be consdered before applcaton of the method. One of the possble ways to solve ths problem s to use the Central Lmt Theorem statng that the superposton of sources presents more normal dstrbuton then sngle sgnals. The level of normalty can be characterzed by the excess urtoss (the functon of the fourth standardzed moment) [Joanes and Gll, 1998]. Kurtoss equals to zero for normally dstrbuted data, the postve urtoss ndcates more narrow probablty densty functon n comparson wth the normal one, and the negatve urtoss s due to wder probablty densty functon then the normal one. Kurtoss (K) can be wrtten n terms of matrx W=(w ) from (1) as K 1 wc n = 1 n w where n s a number of sources. Thus the problem s stated as the mnmzaton of squared urtoss (3) under the constrant (). Numercal optmsaton under a constrant uses a penalty functon. Ths functon has zero value wthn the allowed range of values and t tends to nfnty n the forbdden area. The penalty functon for maxmzaton under the constrant () loos as c ( 1) ( ) 4 ( 4) 3 w 1. ( 3), The fnal expresson for optmzaton s thus: 1 Φ = K w 1 µ ( 5), 1735

4 A.Nuzn, E.Saveleva et al. /Vrtual Sources for Spato-temporal Montorng Data Analyss wth parameter µ monotoncally decreasng durng the optmzaton. A number of parameters of the optmsaton problem equals to a number of elements n matrx W, whch strongly depends on the dmenson of the ntal data set. Together wth the number of parameters grows the number of local optmsaton solutons. The number of functonal parameters of the optmsaton problem can be reduced by a prelmnary compressng data transformaton (AC), for example, by transformaton to prncple components. The fnal transformaton of data to ndependent sources n ths case s performed by matrx B = WA Coeffcents of matrx B descrbe the credt-debt dependences between sources and measurement wells. A postve coeffcent ndcates a postve dependence: the ncrease of a source value causes an ncome of a value at the recever. A negatve coeffcent s when a source value s ncreasng on account of the recever. Thus matrx B s an mportant tool for analyss of the nfluence between sources and montorng wells, especally n the case when sources can be assocated wth some facltes (for example, laes or rvers n the case of water montorng). The matrx B also allows to estmate the locatons of ndependent sources [Üzümcül et al., 003]: r ( 6) ( 7) = b( x x 0 ) x 0, + here x 0 s a centre of mass over all measurement locatons.. Gaussan mxture approach The process under study can be approxmated as a weghted superposton of Gaussan-le sgnals: n α c,, ( r t) = A exp r r ( 8) 1 t t t where r ndcates a spatal coordnate, t s a temporal coordnate, n s a number of sources (user-defned parameter), A, α I, r and t, (=1,,n) are fttng parameters. Parameters r and t ndcate a spato-temporal coordnates of sources. The model fttng procedure leads to an optmzaton problem. The model parameters (weghts) are estmated so as to mnmze the sum of squared resduals at the sample locatons: α D = t t ( c( r, t ) y ) = A exp r r y ( 9) here y are nown values, =1,,K wth K beng a number of samples. Mnmzaton was performed by RPROP [Redmller and Braun, 1993] method. RPROP s an effectve and stable teratve optmzaton algorthm. The drecton of a parameter's changes depends only on the sgn of the obectve functon's dervatve, and the learnng rate depends on the prevous state of the system. Such constructon of teratons sometmes allows to ump over local extremes and sgnfcantly decreases the tme of the approxmaton n the monotonc case. The functon (8) wth ftted parameters allows to mae predctons at any spato-temporal locatons. Any nds of dependences between sample locatons and sources are also avalable. The man drawbac of ths approach s the assumpton made on the form of 1736

5 A.Nuzn, E.Saveleva et al. /Vrtual Sources for Spato-temporal Montorng Data Analyss sgnals emtted by vrtual sources. From ths pont of vew the frst approach s more general, as t does not requre any assumpton on the form of sgnals. 3. DATA DESCRIPTION Groundwater montorng n the regon connected wth radoactve storng facltes controls groundwater level dynamcs and water qualty. We consder two case studes descrbng two dfferent events: a groundwater level dynamcs and groundwater contamnaton by radoactve ntrate. The groundwater level s a unque value per a well at a tme, thus t presents a two-dmensonal n space problem. Ntrate contents also depend on the depth of a sample and consequently lead to a three-dmensonal n space problem. The man problem s to extract the prncple data features allowng ther compact descrpton. For our analyss we used 3 geologcal montorng wells. Fgure 1 presents spatal dstrbuton of these wells together wth the man hydrologc obects of the regon. Each well contaned at least one groundwater level measurement per month durng perod from Aprl 1970 tll January 006. Total number of groundwater samples was In [Nuzhny et al, 007] a lnear feature extracton approach was dscussed (prncple component analyss). Frst Fgure 1. Spatal dstrbuton of hydrogeologc wells prncple component responsble for the averagng over the well gves satsfactorly good approxmaton for 3 spatally dstrbuted wells. Other components dd not sgnfcantly nfluence on that approxmaton. Thus, features wth physcal meanng were decded to be searched through non-lnear descrpton. Several not strong assumptons led us to a classcal ndependent sources problem. Ntrate samples were even sparser - an average perod between two measurements n the same poston s a year durng the perod wth the hghest samplng densty ( years). The total number of ntrate samples was Tang nto the account the curse of dmensonalty t s obvous that tradtonal data analyss fal for these data. The most reasonable approach n such case s a parametrc one. Gaussan-le approxmatng functons are the most common for a parametrc descrpton. 4. RESULTS AND DISCUSSION 4.1 Vrtual sources for groundwater levels A temporal cut for a groundwater level can be approxmated wth 98% confdence usng 10 prncple components [Nuzhny et al, 007]. Correlaton of the frst prncple component wth precptaton data shows the drect effect - nstantaneous mpact on the precptaton (maxmum of correlaton at zero). Other prncple components present dfferent structure of correlaton, thus they are after other physcal phenomena. So estmaton of ndependent sources for groundwater level was performed by mnmzaton of urtoss (3) usng 9 prncple components. The purpose was to delete drect nfluence of precptaton and to focus on analyss of other dependences. As for theoretcal assumptons of the method the rudest among them s the assumpton of sotropy, as ansotropy s a usual case n hydrogeology. But stll such assumpton s often made for prelmnary analyss and t does not serously dstort results. The mmedate reply of the recever on the source s fulflled as we have a monthly step n our data, so nstantaneous effect means the effect wthn the month. 1737

6 A.Nuzn, E.Saveleva et al. /Vrtual Sources for Spato-temporal Montorng Data Analyss Fgure a shows spatal dstrbuton of 9 ndependent sources, they are mared by pluses. All of them can be assocated wth surface water reservors (laes and a rver). Fgure b shows hstograms of lnes from matrx B (6) correspondng to the wells mared n fg.a as flled crcles. Analyss of matrx B allows to get some nformaton on possble nfluence of contamnated artfcal reservors on the water n wells. In terms of groundwater levels problem matrx coeffcents obtan real meanng: postve coeffcent ndcates that growng of water level n the source causes an ncome of water to the well; negatve coeffcent means that the source taes water from the well. For example, the well number 18 gets mpact from sources,3 and 7 assocated wth the same lae and the source 9 assocated wth the rver. Sources 1 and 5 (both assocated wth contamnated laes) tae water from ths well. So, we can conclude that water from contamnated laes do not get to ths well. But stll sources,3 and 7 can be nfluenced by contamnaton va well 7 tang water from the source 5. Such results are the subect for further analyss by experts and decson maers worng at the ste. a) b) Fgure. Results on the ndependent sources analyss: a) Schema of water reservors, 4 wells (flled crcles) and 9 vrtual sources (pluses); b) Hstograms of lnes of matrx B correspondng to 4 wells So, we estmated vrtual sources responsble for groundwater level dynamcs. These sources appeared to be useful tools for studyng nfluences between dfferent water subsystems. 4. Vrtual sources for ntrate contamnaton Vrtual sources for groundwater contamnaton by ntrates were estmated by Gaussan mxture approach (8). As ntal values of parameters A and α we too small values (0.001), ntal centres (sources) r were randomly dstrbuted over the regon. Examples of ntal dstrbutons of 8 and 4 centres are presented n fgure 3 (rhombus), flled crcles ndcate spatal dstrbuton of sample wells. Stablty of RPROP algorthm guarantees the stablty of the fnal result. Several ntal random dstrbutons of sources were used, and they led to very close results. We started wth 8 sources but the obtaned result ndcated surplus of such model - several sources unted nto one (fg. 4a). So fnally the model wth 4 sources (0 fttng parameters) was used (fg. 4b). The qualty of the model can be checed by the relatve error: R = D y ( 10) 1738

7 A.Nuzn, E.Saveleva et al. /Vrtual Sources for Spato-temporal Montorng Data Analyss where D s defned by (9). Fnal relatve error for model wth 8 sources s 0.13, and wth 4 sources Twce decreasng the complexty of the model we nearly preserved ts qualty. a) b) Fgure 3. Examples of ntal dstrbuton of vrtual sources (rhombus): 8 sources (a) and 4 sources (b). Flled crcles ndcate sample wells. a) b) Fgure 4. Fnal dstrbuton of vrtual sources (rhombus): 8 sources (a) and 4 sources (b). Grey flled crcles present sample wells wth dameter dependng on local measurements' densty 1739

8 A.Nuzn, E.Saveleva et al. /Vrtual Sources for Spato-temporal Montorng Data Analyss Postonng of vrtual sources appeared to be very stable. But t s not because of denser samplng. In fgure 4 montorng wells are drawn as crcles wth dameters dependng on the number of samples n the well. One can see rather frequent observed sample wells remote from vrtual sources. Thus spatal locaton of vrtual sources s due to the process. Estmated vrtual sources allow to descrbe the trend of the ntrate contamnaton by (8) and to mae predctons on future development of the ntrate contamnaton. The 4 vrtual sources ndcate the zone responsble for the ntrate contamnaton process. Detaled montorng n the vcnty of sources can be proposed to ste admnstraton. 4.3 Dscusson on vrtual sources Two studed examples have dfferent general features: groundwater level generally characterzes the water system; on the contrary the contamnaton process s caused by some real orgnatng mpact. Ths dversty arses wthn vrtual sources analyss. Consderng obtaned vrtual sources due to these processes we fnd the approvement to such pont of vew. Groundwater level's vrtual sources are dstrbuted over the whole montorng area somehow lnng to surface water reservors and perhaps some groundwater ones. Contamnaton vrtual sources concentrate n the vcnty of the contamnatng orgn. All of these sources tend to the same area ndependently of ther ntal dstrbuton. Geographcally contamnaton vrtual sources are located between two contamnated laes, whch can be the real orgns of the groundwater contamnaton. Thus, vrtual sources appeared to be not so vrtual, but ndcatng to real features. 5. CONCLUSIONS Independent vrtual sources appeared to be useful tool allowng to descrbe dependences between groundwater montorng wells and surface water reservors. Ths nformaton can be useful for analyss possblty of water mgraton n the neghbourhood of the contamnated water reservors. the localsaton of the orgn of the groundwater contamnaton. the model for groundwater contamnaton predctons. Vrtual sources also help to understand the general features on the process under study. ACKNOWLEDGEMENTS The wor was partly supported by a grant of the Russan fund for fundamental researches (RFFI). REFERENCES Cherass V., Muller F., Learnng from data, John Wley Interscence, New Yor, pp. 437, Joanes, D. N., Gll, C. A., Comparng measures of sample sewness and urtoss. Journal of the Royal Statstcal Socety (Seres D): The Statstcan V. 47: , Lee T.-W., Independent Component Analyss. Theory and Applcaton. Kluwer Academc Publshers, Boston, pp. 10, Nuzhny A., Saveleva E., Jastrebov A., Statstcal Analyss for Extractng Features on the Groundwater Level Dynamcs, n Proc. of IAMG007, Geomathematcs and GIS analyss of resources, Envronment and Hazards, eds. P. Zhao, F. Agterberg, Q. Cheng, pp.73-76,

9 A.Nuzn, E.Saveleva et al. /Vrtual Sources for Spato-temporal Montorng Data Analyss Redmller M., Braun H., A Drect Adaptve Method for Faster Bacpropagaton Learnng: The RPROP Algorthm, n Proc. of the IEEE Intl. Conf. on Neural Networs, 1993 Üzümcül M., Frang A.F., Reber J.H.C., Leleveldt B.P.F., Independent Component Analyss n Statstcal Shape Models. Medcal Imagng 003, n Proceed. of SPIE, V.503: ,