WATER RESOURCES RESEARCH, VOL. 42, W06413, doi: /2005wr004006, 2006

Development of a forecasting system for supporting remediation design and process control based on NAPL-biodegradation simulation and stepwise-cluster analysis

G. H. Huang,1 Y. F. Huang,2,3 G. Q. Wang,2 and H. N. Xiao3

Received 9 February 2005; revised 8 January 2006; accepted 28 March 2006; published 21 June.

1 Environmental Systems Engineering Program, Faculty of Engineering, University of Regina, Regina, Saskatchewan, Canada.
2 Institute of River and Coastal Engineering, Tsinghua University, Beijing, China.
3 Department of Chemical Engineering, University of New Brunswick, Fredericton, New Brunswick, Canada.

Copyright 2006 by the American Geophysical Union. /06/2005WR

[1] Effective process control is crucial in implementing remediation actions for petroleum-contaminated sites. However, in dealing with in situ bioremediation practices, difficulties exist in incorporating numerical simulation models that are needed for process forecasting within real-time non-linear optimization frameworks that are critical for supporting the process control. With such difficulties, it is desired that a statistical relationship between remediation system performance and operating condition be established. Nevertheless, in the remediation systems, many variables can be either continuous or discrete, and the relations among them can be either linear or non-linear. These lead to complexities in the related multivariate analyses. In this study, a forecasting system has been developed for supporting remediation design and process control based on techniques of NAPL-biodegradation (non-aqueous phase liquid biodegradation) simulation and stepwise-cluster analysis (SCA). The results indicate that the developed system is effective in forecasting the effects of multiple cleanup actions under various conditions. The predicted benzene concentrations have acceptable error levels compared with the outputs of numerical simulation. An optimization model for obtaining optimum operating conditions is then proposed to illustrate how the SCA method can be used for supporting optimization of bioremediation operations. A unique contribution of this research is the development of a multivariate inference system associated with simulation and optimization efforts for tackling the complex in situ bioremediation practices.

Citation: Huang, G. H., Y. F. Huang, G. Q. Wang, and H. N. Xiao (2006), Development of a forecasting system for supporting remediation design and process control based on NAPL-biodegradation simulation and stepwise-cluster analysis, Water Resour. Res., 42,, doi: /2005wr

1. Introduction

[2] Soil and groundwater contamination has rapidly emerged as a primary environmental concern since the 1980s [Molz et al., 1986]. Numerous measures are currently used for the remediation of petroleum-contaminated sites [Cohen and Mercer, 1993; Canadian Council of Ministers of the Environment (CCME), 1994; Wang and McTernan, 2002; Liu et al., 2003; Liu, 2005]. Effective process control is crucial in the practical implementation of these measures. Many factors such as pumping rate, oxygen addition, nutrient supply, and groundwater temperature can be adjusted through real-time process control to enhance the efficiency and reduce the cost.

[3] To date, many studies have been conducted to determine dynamic process-control policies by integrating numerical simulation and optimization techniques. A number of researchers showed that optimization techniques could help improve the design of pump-and-treat remediation and in situ bioremediation [Gorelick et al., 1984; Ahlfeld, 1990; Dougherty and Marryott, 1991; Chang et al., 1992; Culver and Shoemaker, 1992; McKinney and Lin, 1995, 1996; Huang et al., 2003a, 2003b; Maqsood et al., 2004; Huang et al., 2006]. For example, Mansfield et al. [1998] discussed a method of reducing the computational effort of optimal control by using a finite-element sparsity structure within an optimization model.
Minsker and Shoemaker [1998] applied non-linear optimization techniques to enhance process control for in situ bioremediation; they sought to reduce the costs of treatment by identifying the optimal policies for remediation. On the other hand, global optimization methods such as genetic algorithms (GA) were employed for optimizing groundwater management systems. For example, McKinney and Lin [1994] applied GA to groundwater resources management and pump-and-treat system design. Johnson and Rogers [1995] combined neural networks and GA for selecting the optimal pumping well locations for groundwater remediation systems. Smalley et al. [2000] used GA to tackle a risk-based in situ bioremediation design. Zheng and Wang [2002] showed an application of GA to pump-and-treat system design under field conditions. Direct incorporation of the GA methods within subsurface models would generally require intensive computational efforts that might often limit their practical applicability. Iassinovski et al. [2003] presented ongoing research on the development of a framework for integrating multiple models to solve complex decision-making problems. Zhang and Li [2004] developed an optimization methodology that integrated discrete-event simulation into a heuristic algorithm for dynamic resources allocation.

[4] Enhanced in situ bioremediation is an approach for cleaning up petroleum-contaminated sites, involving multiphase, multi-component flows and various dynamic processes [Semprini and McCarty, 1991; Brown et al., 1994; de Blanc, 1998; Norris et al., 1999; Gallo and Manzini, 2001; Park et al., 2001; Woo et al., 2001; Lenczewski et al., 2003; Schoefs et al., 2004; Lamberti and Nissi, 2005]. In addition to the identification of optimum locations for injection and extraction wells [Huang and Mayer, 1997; Minsker and Shoemaker, 1998; Maskey et al., 2002] and optimum strategies for site remediation [Schaerlaekens et al., 2005], as well as the improvement of microbial activities [Venkatraman et al., 1998; Barenholz et al., 2003; Luo et al., 2005], a possible way to improve the remediation efficiency and cost-effectiveness is to adjust a number of control factors (e.g., pumping rate, oxygen addition, and nutrient supply) in order to dynamically provide desired conditions for microorganisms to degrade the contaminants effectively under varying source and environmental circumstances [Rutherford and Johnson, 1996; Tien et al., 2000]. However, most of the existing enhanced in situ bioremediation processes tend to be operated in a simplified way with a fixed set of values for process parameters, leading to relatively unsatisfactory removal efficiencies.
This is mainly due to the difficulties in incorporating complicated numerical simulation models that are needed for process forecasting within real-time non-linear optimization frameworks that are critical for supporting the process control. Thus, considering the system complexities and the computation requirements in incorporating numerical simulation models directly into optimization frameworks, a statistical relationship between remediation system performance and operating condition is desired. Huang et al. [2003a, 2003b] used the dual-response surface method to develop a statistical relationship between inputs and outputs of a numerical simulation model for a surfactant-enhanced remediation process, and employed it for generating optimum operation conditions under various site conditions.

[5] However, in the remediation systems, many variables can be either continuous or discrete, and relations among them can be either linear or non-linear. The conventional continuous and linear methods, such as the regression and dual-response surface methods, cannot efficiently reflect such complicated characteristics and relationships. Therefore, it is desired that a more effective method be advanced to establish a statistical relationship between remediation system performance and operating condition under discrete and non-linear complexities.

[6] In this study, a forecasting system will be developed for optimal remediation design and process control based on NAPL-biodegradation (non-aqueous phase liquid biodegradation) simulation and stepwise-cluster analysis.
This objective entails the following tasks: (1) to design a pilot-scale physical model for simulating NAPL transport and biodegradation processes, with the results being useful for the calibration and verification of a NAPL-biodegradation simulation model; (2) to develop a 3D multi-phase, multi-component NAPL transport and biodegradation simulation system, where a contaminant-biodegradation model is integrated within a general fate and transport modeling framework; (3) to undertake stepwise-cluster analysis for establishing a bridge between simulation and optimization models, where the interrelationship between remediation actions and contamination situations will be quantified and then be used for identifying desired operating conditions.

2. Methodology

2.1. General Framework

[7] Figure 1 shows the general framework for developing a forecasting system for an enhanced in situ bioremediation process in a pilot-scale model. It consists of five steps. Firstly, a three-dimensional pilot-scale model will be designed for supporting the operation of enhanced in situ bioremediation. After the occurrence of a hydrocarbon spill, an enhanced in situ biodegradation process will be undertaken. Then, a multi-phase, multi-component NAPL transport and biodegradation model will be developed. After the calibration and verification of the developed model through the experimental results, the interrelationships between contaminant concentrations and operation conditions will be analyzed based on the developed simulation model. Under each contamination situation, effects of various operating conditions on contaminant distributions at concerned locations will be examined. In the subsequent step, a stepwise-cluster analysis (SCA) model will be developed to establish the relationships between the respondent contaminant concentrations and the operating conditions.
Thus, a bridge between the 3D numerical simulation and remediation-process optimization models will be established, which is critical for further identification of desired operating schemes.

Figure 1. The SCA-based forecasting system for enhanced in situ bioremediation.

2.2. Three-Dimensional Pilot-Scale Model

[8] The in situ bioremediation process involves stimulating indigenous bacteria and/or introducing non-indigenous bacteria to degrade the organic contaminants by supplying required nutrients and electron acceptors. A typical in situ bioremediation system includes recovery wells, a nutrient and electron-acceptor supply system, and injection wells. To survive, microorganisms must have: (1) an energy source, (2) a carbon source, and (3) inorganic elements such as nitrogen and phosphorus [Chappelle, 1993]. Microorganisms in the saturated zone are assumed to exist as small colonies [Molz et al., 1986]. Due to the low solubility of most NAPL constituents, the colonies are likely to be thin, such that the contaminant concentrations within the colony are assumed to be the same as the substrate concentrations dissolved in the bulk aqueous phase. If the amounts of NAPL constituents, nutrient and electron acceptor transferred into the colonies are sufficiently high, a thicker biofilm may be generated. Under such a condition, NAPL constituents must diffuse not only across a liquid boundary layer but also within the biofilm before they can be biodegraded by microorganisms. The dissolved NAPLs move with the groundwater flow and are diluted at the front edge of the plume so that more oxygen is available for aerobic respiration. Mixing also occurs along the edges of the plume with higher rates of aerobic respiration. Dissolved oxygen in the groundwater entering the rear edge of the plume will also enhance the microbial activities.

[9] In order to physically support calibration and verification for numerical models, a three-dimensional pilot-scale system was developed. It was of cuboid shape with an interior dimension of L × W × H = … m^3. Soils to be loaded into the pilot-scale reactor were pre-selected. The inner surface of the reactor wall was labeled and divided into grids where different types of soils were loaded (Figure 2). A contaminant container was used to facilitate the leakage of petroleum into the system. Using a peristaltic pump, tap water from a water container was pumped into the reactor through six water inlets on the inlet-end board as upstream groundwater inflow.

[10] Monitoring wells were installed for facilitating access to the groundwater so that a representative view of the subsurface hydrogeology could be obtained, through either the collection of water samples or the measurement of physical and hydraulic parameters. Locations of the wells are presented in Figure 2. In total, there were 25 wells allocated in four sections of the pilot system. Soils in the system were stratified into four layers, with the third and fourth layers being saturated with water. Each layer was 30 cm deep. Among the wells, 13 (with PVC pipes) were installed to reach the third soil layer; the other 12 wells could reach the fourth layer (Figure 2). For each well, a hose was installed that passed through the cap and reached its bottom. The outside of the hose was clamped by a clip so that air and groundwater in the well could be isolated from the ambient environment (Figure 2).
[11] To simulate hydrocarbon leakage, 12 liters of gasoline were injected into the bottom of the second soil layer at an upstream location during a 1.5-day period (Figure 3); thus, only the fate and transport of the gasoline in the saturated zone would be simulated. At the same time, tap water from a water container was pumped into the system as groundwater inflow at a rate of 20 liter/day (through a peristaltic pump). The water level in the upstream gauge was 55 cm high and that in the downstream one was 45 cm high. After the leakage period, such flow conditions were maintained for 40 days to simulate the process of natural attenuation in the subsurface.

[12] Soils loaded into the pilot-scale reactor were from several typical prairie sites. The geochemical and microbial data in the initial 40-day natural attenuation period are listed in Table 1. Natural microorganisms and electron acceptors existed in the soil; therefore, biodegradation of contaminants occurred during the initial 40-day natural attenuation period. Since the concentration of the natural biomass in the initial soil was low, an assumption was made that biodegradation of contaminants could be ignored when human interference in the form of additional oxygen and nutrient supplies did not exist. Also, geochemical reactions related to NAPLs were relatively slow in the groundwater system.

[13] The enhanced in situ biodegradation process was then started right after this 40-day period. The cultured microorganisms were injected into the treatment zone through wells 1 and 6 along with oxygen and nutrients (Figure 2). The concentrations were 10 and 200 mg/l for biomass and oxygen, respectively. The injection lasted for 17 days with a flow of 10 liter/day. The extraction flow rates at wells 7 and 11 were both maintained at 15 liter/day. Water samples from different locations were collected every other day. Benzene concentrations in these samples were analyzed. A peristaltic pump was used to obtain groundwater samples through pre-installed monitoring wells.
For each monitoring well, a groundwater sample was collected into a 20 ml glass bottle which was sealed by a cap. The Varian CP-3800 Gas Chromatograph (GC) was used for analyzing contaminant contents in water samples. The experimental results could then be used for validating, calibrating and verifying the developed numerical model under different conditions.

2.3. Modeling of NAPL Transport and Biodegradation

2.3.1. Modeling of Contaminant Transport

[14] A 3D multi-phase and multi-component (3DMM) model [Huang et al., 2003a, 2003b] is used to simulate contaminant transport in the subsurface under different remediation process control alternatives [Sarkar et al., 1994; Fabritz, 1995]. The 3DMM can account for complex phase behaviors, chemical and physical transformations, and heterogeneous porous-media properties. It incorporates non-equilibrium interphase mass transfer, sorption, decay, microbiological activity, capillary pressure, and relative permeability within a general setting. Details of the contaminant-transport model are given by Huang et al. [2003a, 2003b].

Figure 2. Pilot-scale model and well locations.

Figure 3. The simulation domain.

Table 1. Initial Geochemical and Microbial Data of Soil for the Initial 40-Day Natural Attenuation Period

Parameter                               Value
Soil classification                     Silty clay, sand, and clay matrix till
Hydraulic conductivity                  In the range of 10^-7 to 10^-5 m/s
Moisture content                        … % (by volume)
Porosity                                … %
Na                                      … mg/l
K                                       … mg/l
Ca                                      … mg/l
Mg                                      … mg/l
Fe                                      … mg/l
Cl                                      … mg/l
N, NO2, NO                              … mg/l
Dissolved oxygen concentration          <1.0 mg/l to 1.5 mg/l
Initial microbial species               Pseudomonas sp. strain CFS-215, Geobacter sp., and Rhodococcus sp. strain 33

[15] The basic mass conservation equation for components in the subsurface can be written as follows [Brown, 1993; Delshad et al.]:

$$\frac{\partial}{\partial t}\left(\phi \tilde{C}_k \rho_k\right) + \nabla \cdot \left[\sum_{l=1}^{n_p} \rho_k \left(C_{kl}\,\vec{u}_l - \phi S_l\,\vec{\vec{D}}_{kl} \cdot \nabla C_{kl}\right)\right] = R_k \tag{1}$$

where $k$ is the component index; $l$ is the phase index; $\phi$ is soil porosity (fraction); $\tilde{C}_k$ is the overall concentration of component $k$ (volume fraction) ($L^3 L^{-3}$ PV); $\rho_k$ is the density of component $k$ ($M L^{-3}$); $n_p$ is the number of phases; $C_{kl}$ is the concentration of component $k$ in phase $l$ (volume fraction) ($L^3 L^{-3}$); $\vec{u}_l$ is the Darcy velocity of phase $l$ ($L T^{-1}$); $S_l$ is the saturation of phase $l$ ($L^3 L^{-3}$ PV); $R_k$ is the total source/sink term for component $k$ (volume of component $k$ per unit volume of porous media per unit time) ($L^3 L^{-3} T^{-1}$); and $\vec{\vec{D}}_{kl}$ is the dispersion tensor ($L^2 T^{-1}$). The overall concentration ($\tilde{C}_k$) denotes the volume of component $k$ summed over all phases ($L^3 L^{-3}$ PV).

[16] Injection and production wells are considered as source and sink terms in flow equation (1). Wells can be established vertically in several layers of the aquifer or horizontally with any length, and can be controlled according to pressure and/or rate constraints. The well model is based on the formulations of Peaceman [1983] and Babu and Odeh [1989]. Aquifer boundaries are modeled as either constant potential surfaces or closed surfaces. The model can be solved numerically through the block-centered finite difference method.

2.3.2. Modeling of Enhanced In Situ Bioremediation

[17] Dissolved NAPLs in the aqueous phase serve as substrates for unattached and attached bacteria in the subsurface. Since microorganisms are mostly attached to solid surfaces [Harvey et al., 1984], the majority of the contaminants will be removed from the aqueous phase through biodegradation based on the attached biomass. The biodegradation model involves substrate competition, nutrient limitation, product toxic inhibition, and aerobic cometabolism. The basic structure of the biodegradation model for a system with a single substrate, a single electron acceptor and a single biological species can be characterized as follows [de Blanc, 1998]:

$$\frac{dC}{dt} = -\frac{A k_C \tilde{X}}{m_c}\left(C - \tilde{C}\right) - \frac{\mu_{\max} X}{Y}\left(\frac{C}{K_C + C}\right)\left(\frac{E}{K_E + E}\right) - k_{abio}\,C \tag{2}$$

$$\frac{d\tilde{C}}{dt} = \frac{A k_C}{V_c}\left(C - \tilde{C}\right) - \frac{\mu_{\max}\,\rho_X}{Y}\left(\frac{\tilde{C}}{K_C + \tilde{C}}\right)\left(\frac{\tilde{E}}{K_E + \tilde{E}}\right) - k_{abio}\,\tilde{C} \tag{3}$$

$$\frac{dE}{dt} = -\frac{A k_E \tilde{X}}{m_c}\left(E - \tilde{E}\right) - \frac{\mu_{\max} X M_E}{Y}\left(\frac{C}{K_C + C}\right)\left(\frac{E}{K_E + E}\right) \tag{4}$$

$$\frac{d\tilde{E}}{dt} = \frac{A k_E}{V_c}\left(E - \tilde{E}\right) - \frac{\mu_{\max}\,\rho_X M_E}{Y}\left(\frac{\tilde{C}}{K_C + \tilde{C}}\right)\left(\frac{\tilde{E}}{K_E + \tilde{E}}\right) \tag{5}$$

$$\frac{dX}{dt} = \mu_{\max} X \left(\frac{C}{K_C + C}\right)\left(\frac{E}{K_E + E}\right) - bX \tag{6}$$

$$\frac{d\tilde{X}}{dt} = \mu_{\max} \tilde{X} \left(\frac{\tilde{C}}{K_C + \tilde{C}}\right)\left(\frac{\tilde{E}}{K_E + \tilde{E}}\right) - b\tilde{X} \tag{7}$$

where $C$ is the aqueous phase substrate concentration (substrate mass per unit volume of aqueous phase) ($M L^{-3}$); $\tilde{C}$ is the substrate concentration in attached biomass (mass of substrate per unit volume of biomass) ($M L^{-3}$); $E$ is the aqueous phase electron acceptor concentration (mass of electron acceptor per unit volume of aqueous phase) ($M L^{-3}$); $\tilde{E}$ is the electron acceptor concentration in attached biomass (mass of electron acceptor per unit volume of biomass) ($M L^{-3}$); $X$ is the aqueous phase concentration of unattached biomass (mass of unattached cells per unit volume of aqueous phase) ($M L^{-3}$); $\tilde{X}$ is the attached biomass concentration (mass of attached cells per volume of aqueous phase) ($M L^{-3}$); $A$ is the surface area of a single microcolony ($L^2$); $k_E$ is the electron acceptor mass transfer coefficient ($L T^{-1}$); $k_C$ is the substrate mass transfer coefficient ($L T^{-1}$); $\mu_{\max}$ is the maximum substrate utilization rate ($T^{-1}$); $m_c$ is the mass of cells in a single microcolony, $m_c = \rho_X V_c$ ($M$); $M_E$ is the mass of electron acceptor consumed per mass of substrate biodegraded ($M M^{-1}$); $\rho_X$ is the biomass density (mass of cells per volume of biomass) ($M L^{-3}$); $V_c$ is the volume of a single microcolony ($L^3$); $Y$ is the yield coefficient (mass of cells produced per mass of substrate biodegraded) ($M M^{-1}$); $K_C$ is the substrate half-saturation coefficient ($M L^{-3}$); $K_E$ is the electron acceptor half-saturation coefficient ($M L^{-3}$); $k_{abio}$ is the first-order reaction rate coefficient (for abiotic decay reactions) ($T^{-1}$); $b$ is the endogenous decay coefficient ($T^{-1}$); and $t$ is time ($T$).

[18] The reduction of contaminants in the aqueous phase, as shown in equation (2), results from three processes. The first term accounts for diffusion of contaminants from the liquid phase across a stagnant film into the attached biomass. The second term describes the reduction of contaminants by unattached microorganisms in the bulk liquid, where the reduction rate is affected by the concentrations of contaminant and electron acceptor based on Monod kinetics; substrate competition, nutrient limitation, inhibition, and reducing-power limitation can also be incorporated within this second term. The third term accounts for abiotic losses of contaminants through first-order reactions. One equation of the same form as equation (2) will be used for each substrate. Equation (3) describes the loss of substrate within the attached biomass [Molz et al., 1986]. It describes the processes of substrate diffusion into the attached biomass, biodegradation within the biomass, and abiotic decay. Substrate competition, nutrient limitation, inhibition, and reducing-power limitation can also be incorporated within this term. Equations (4) and (5) describe the losses of the electron acceptor, which are of the same forms as equations (2) and (3). Equations (6) and (7) simulate the growth and decay of unattached and attached biomass, respectively.
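As a rough illustration of how equations (2) to (7) fit together, the following Python sketch evaluates their right-hand sides for one substrate, one electron acceptor and one biological species. The names `BioParams`, `monod`, `rhs` and `euler_step` are hypothetical, the parameter values are placeholders for illustration rather than a calibrated set, and the explicit Euler step is only a stand-in for a proper stiff ODE solver.

```python
from dataclasses import dataclass


@dataclass
class BioParams:
    """Hypothetical parameter set; values are illustrative placeholders."""
    A: float = 1.0e-6     # surface area of one microcolony (L^2)
    V_c: float = 1.0e-9   # volume of one microcolony (L^3)
    rho_X: float = 90.0   # biomass density (M L^-3)
    k_C: float = 1.0      # substrate mass transfer coefficient (L T^-1)
    k_E: float = 1.0      # electron acceptor mass transfer coefficient (L T^-1)
    mu_max: float = 4.2   # maximum substrate utilization rate (T^-1)
    Y: float = 1.0        # yield coefficient (M M^-1)
    M_E: float = 3.0      # electron acceptor consumed per substrate degraded
    K_C: float = 0.77     # substrate half-saturation coefficient (M L^-3)
    K_E: float = 0.5      # electron acceptor half-saturation coefficient (M L^-3)
    k_abio: float = 0.0   # first-order abiotic decay coefficient (T^-1)
    b: float = 0.1        # endogenous decay coefficient (T^-1)


def monod(c, e, p):
    """Dual Monod limitation term: C/(K_C + C) * E/(K_E + E)."""
    return (c / (p.K_C + c)) * (e / (p.K_E + e))


def rhs(state, p):
    """Right-hand sides of equations (2)-(7) for state (C, C~, E, E~, X, X~)."""
    C, Ct, E, Et, X, Xt = state
    m_c = p.rho_X * p.V_c          # mass of cells in one microcolony
    lim_aq = monod(C, E, p)        # limitation in the bulk aqueous phase
    lim_bio = monod(Ct, Et, p)     # limitation inside the attached biomass
    dC = (-p.A * p.k_C * Xt / m_c * (C - Ct)
          - p.mu_max * X / p.Y * lim_aq - p.k_abio * C)            # eq. (2)
    dCt = (p.A * p.k_C / p.V_c * (C - Ct)
           - p.mu_max * p.rho_X / p.Y * lim_bio - p.k_abio * Ct)   # eq. (3)
    dE = (-p.A * p.k_E * Xt / m_c * (E - Et)
          - p.mu_max * X * p.M_E / p.Y * lim_aq)                   # eq. (4)
    dEt = (p.A * p.k_E / p.V_c * (E - Et)
           - p.mu_max * p.rho_X * p.M_E / p.Y * lim_bio)           # eq. (5)
    dX = p.mu_max * X * lim_aq - p.b * X                           # eq. (6)
    dXt = p.mu_max * Xt * lim_bio - p.b * Xt                       # eq. (7)
    return (dC, dCt, dE, dEt, dX, dXt)


def euler_step(state, p, dt):
    """One explicit Euler step; adequate only for very small dt."""
    return tuple(s + dt * d for s, d in zip(state, rhs(state, p)))
```

Because the mass-transfer terms can be several orders of magnitude faster than the growth terms, the system is typically stiff, so a production code would replace `euler_step` with an implicit or variable-order integrator.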

2.3.3. Solution Method

[19] The implicit pressure-explicit saturation method is used for solving the contaminant-transport model [Aziz and Settari, 1979; Chierici, 1994; de Blanc, 1998]. The only unknown in the pressure equation is the pressure of the water phase. The solution procedure is as follows: (a) the pressure equation is solved implicitly using a Jacobi conjugate gradient solver to yield the water phase pressure in all grid blocks; (b) capillary pressures from the previous time step are used to determine the pressures of the other phases in each grid block once the water phase pressure is known; (c) Darcy's law is used to determine the phase velocities; (d) mass conservation equations are solved explicitly to yield the concentration of each component in each grid block; (e) phase concentrations and saturations are determined through flash calculations [Wang and Barker, 1997]; (f) new capillary pressures are determined from the new saturations; and (g) the procedure is repeated for each time step until the simulation ends. A third-order finite-difference method [Saad et al., 1990; Liu, 1993] that greatly reduces numerical dispersion effects is used to solve these equations. Aquifer boundaries are modeled as either constant potential surfaces or closed surfaces. The numerical method selected for solving the biodegradation equations is SDRIV2 proposed by Kahaner et al. [1989]. Gear's method [Gear, 1971] is used to solve the ordinary differential equations at each grid block and time step. The solution to the flow equations is used as the initial conditions for the biodegradation reactions.

2.4. Stepwise-Cluster Analysis

[20] Considering the high complexities and computational requirements in incorporating a numerical simulation model directly into optimization frameworks, a statistical relationship between remediation system performance and operating condition will be developed based on a large number of runs of the developed simulation model under various system conditions.
In the study system, many variables can be either continuous or discrete, and the relations among them can be either linear or non-linear. Therefore, the conventional continuous and linear methods cannot reflect such complicated characteristics and relationships. A stepwise-cluster analysis (SCA) model will then be developed for tackling such discrete and non-linear complexities.

[21] In the stepwise-cluster analysis, the solutions of the numerical model (benzene concentrations at concerned locations) are considered as dependent variables; the operating conditions are independent variables. If the developed simulation model is run under $n$ scenarios of system conditions, there will then be $n$ sets of such independent and dependent variables (e.g., if the model is run 50 times under various system conditions, then $n = 50$). Assume that there are $m$ independent variables [e.g., six process control variables, denoted as $x = (x_1, x_2, \ldots, x_m)$, where $m = 6$], and $p$ dependent variables [e.g., benzene concentrations at five concerned locations, denoted as $y = (y_1, y_2, \ldots, y_p)$, where $p = 5$]. Thus, all data can be given by matrices $X = (x_{tr})_{n \times m}$ and $Y = (y_{ti})_{n \times p}$, where $r = 1, 2, \ldots, m$, and $i = 1, 2, \ldots, p$.

2.4.1. Clustering Principles

[22] In the stepwise-cluster analysis, sample sets of dependent variables will be cut or merged into new sets, and values of independent variables will be used as references to judge into which new set a sample in the parent set will enter. After the completion of the cutting and merging processes, cluster trees could be developed and further used for predicting new dependent-variable values according to the new independent-variable conditions.

[23] The essence of this method is, based on given criteria, to cut one sample set of dependent variables into two, and to merge two sets into one, step by step, in order to classify samples and sieve variables. Let cluster $h$, which contains $n_h$ samples, be cut into two sub-clusters $e$ and $f$ ($e$ and $f$ contain $n_e$ and $n_f$ samples, respectively, i.e., $n_e + n_f = n_h$).
According to Wilks' likelihood-ratio criterion, if the cutting point is optimal, the value of Wilks' $\Lambda$ ($\Lambda = |W|/|T|$) should be the minimum [Wilks, 1960, 1962, 1963; Kennedy and Gentle, 1981], where $T$ and $W$ are the total-sample SSCP matrix $\{t_{ij}\}$ and the within-groups SSCP matrix $\{w_{ij}\}$, respectively; and $|T|$ and $|W|$ denote the determinants of matrices $\{t_{ij}\}$ and $\{w_{ij}\}$, respectively. When the $\Lambda$ value is very large, clusters $e$ and $f$ cannot be cut, but must be merged into a greater cluster ($h$). By Rao's F-approximation (R-statistic), we have:

$$R = \frac{1 - \Lambda^{1/S}}{\Lambda^{1/S}} \cdot \frac{Z S - P(K-1)/2 + 1}{P(K-1)} \tag{8}$$

$$Z = n_h - 1 - (P + K)/2 \tag{9}$$

$$S = \sqrt{\frac{P^2 (K-1)^2 - 4}{P^2 + (K-1)^2 - 5}} \tag{10}$$

where the statistic $R$ is distributed approximately as an F-variate with $\nu_1 = P(K-1)$ and $\nu_2 = ZS - P(K-1)/2 + 1$ degrees of freedom; $K$ is the number of groups; and $P$ is the number of dependent variables. The R-statistic reduces to an exact F-variate when $P = 1$ or 2, or when $K = 2$ or 3. Since the number of groups is two ($K = 2$ for operating conditions and benzene concentrations at concerned locations) in this study, an exact F test is possible based on the Wilks' $\Lambda$ criterion. Thus we have:

$$F(P,\; n_h - P - 1) = \frac{1 - \Lambda}{\Lambda} \cdot \frac{n_h - P - 1}{P} \tag{11}$$

Therefore, the criteria for cutting and merging clusters reduce to a series of F tests [Rao, 1952, 1965; Tatsuoka, 1971].

2.4.2. Tests of Optimal Cutting Points

[24] To determine the optimal cutting point, the $n_h$ samples in cluster $h$ are sequenced according to the values of $x^{(h)}_{r,k}$ in $\{x^{(h)}_r\}$, i.e., $x^{(h)}_{r,1} \le x^{(h)}_{r,2} \le \cdots \le x^{(h)}_{r,n_h}$. Then the total-sample SSCP matrix and within-groups SSCP matrix of the dependent variable $y$ are calculated based on the sequence statistic $\{k_r\}$:

$$t_{ij}(n_h) = A^{(h)}_{ij}(n_h) - n_h\, B^{(h)}_i(n_h)\, B^{(h)}_j(n_h) \tag{12}$$

$$b_{ij}(k_r, n_h) = \frac{n_h\, k_r}{n_h - k_r} \left\{ \left[ B^{(h)}_i(k_r) - B^{(h)}_i(n_h) \right] \left[ B^{(h)}_j(k_r) - B^{(h)}_j(n_h) \right] \right\} \tag{13}$$

$$w_{ij}(k_r, n_h) = t_{ij}(n_h) - b_{ij}(k_r, n_h) \tag{14}$$

where

$$A^{(h)}_{ij}(u) = \sum_{k=1}^{u} y^{(h)}_{i,k}\, y^{(h)}_{j,k} \tag{15}$$

$$B^{(h)}_{i\ \mathrm{or}\ j}(u) = \frac{1}{u} \sum_{k=1}^{u} y^{(h)}_{i\ \mathrm{or}\ j,\,k} \tag{16}$$

with $k_r = 1, 2, \ldots, n_h - 1$ for all $r$; $i, j = 1, 2, \ldots, p$; and $r = 1, 2, \ldots, m$. For each $x_r$, a cutting point ($k^*_r$) is derived, which satisfies:

$$\Lambda(k^*_r, n_h) = \min_{k_r = 1, \ldots, n_h - 1} \Lambda(k_r, n_h) \tag{17}$$

[25] Across the independent variables, the index $r^*$ of the independent variable that will be used for the cutting judgment is derived, which satisfies:

$$\Lambda(k^*_{r^*}, n_h) = \min_{r = 1, \ldots, m} \Lambda(k^*_r, n_h) \tag{18}$$

[26] Thus, the optimal cutting point of cluster $h$ is $k^*_{r^*}$, and the relevant value of the independent variable (which will be used as the reference for new sample prediction) is $x^{(h)}_{r^*, k^*_{r^*}}$. Then the F-test can be undertaken. If

$$F(P',\; n_h - P' - 1) = \frac{1 - \Lambda(k^*_{r^*}, n_h)}{\Lambda(k^*_{r^*}, n_h)} \cdot \frac{n_h - P' - 1}{P'} \ge F_1 \tag{19}$$

is satisfied, then cluster $h$ can be cut into two sub-clusters according to the distribution of $x_{r^*}$: (a) data in the dependent-variable set with $k_{r^*} \le k^*_{r^*}$ are allocated into sub-cluster $e$ ($< f$), and (b) data in the dependent-variable set with $k_{r^*} > k^*_{r^*}$ are allocated into sub-cluster $f$, where $P'$ is the number of dependent variables under consideration. Among these independent variables, $x_{r^*}$ is the most important one, which significantly affects the levels of the dependent variables. Conversely, if equation (19) is not satisfied, cluster $h$ cannot be cut. Then the other clusters will be tested to decide whether to cut or not, i.e., to test $h = 1, 2, \ldots, H$ ($H$ is the total number of clusters at the current stage).
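The search for the optimal cutting point in equations (12) to (19) can be sketched in Python for the simplest case of a single dependent variable ($P' = 1$), where the SSCP matrices reduce to scalars and no determinant is needed. The function names `cut_lambda` and `best_cut` and the handling of tied values are assumptions made for this illustration; the critical value $F_1$ is left to the caller because it depends on the chosen significance level.

```python
def cut_lambda(y_sorted, k):
    """Wilks' Lambda (w/t) for cutting the first k of n ordered samples
    from the rest, for one dependent variable."""
    n = len(y_sorted)
    mean_all = sum(y_sorted) / n
    mean_left = sum(y_sorted[:k]) / k
    # Total sum of squares; algebraically equal to equation (12) with i = j.
    t = sum((v - mean_all) ** 2 for v in y_sorted)
    if t == 0.0:
        return 1.0  # identical responses: no cut can reduce the scatter
    # Between-groups sum of squares, equation (13) with i = j.
    b = k * n / (n - k) * (mean_left - mean_all) ** 2
    return (t - b) / t  # equation (14) divided by t


def best_cut(X, y):
    """Return (Lambda, r_star, threshold, F) for the best cut, or None.

    X is a list of n rows with m independent variables each; y holds the
    n values of one dependent variable. Samples with x[r_star] <= threshold
    would go to sub-cluster e, the others to sub-cluster f.
    """
    n, m = len(y), len(X[0])
    best = None
    for r in range(m):
        order = sorted(range(n), key=lambda s: X[s][r])
        y_sorted = [y[s] for s in order]
        for k in range(1, n):
            lo, hi = X[order[k - 1]][r], X[order[k]][r]
            if lo == hi:
                continue  # assumption: never cut between tied x values
            lam = cut_lambda(y_sorted, k)
            if best is None or lam < best[0]:
                best = (lam, r, lo)  # equations (17) and (18)
    if best is None:
        return None
    lam, r_star, threshold = best
    # Equation (19) with P' = 1: F(1, n - 2) = (1 - Lambda)/Lambda * (n - 2).
    f_stat = float("inf") if lam <= 0.0 else (1.0 - lam) / lam * (n - 2)
    return lam, r_star, threshold, f_stat
```

A caller would compare the returned F statistic with a tabulated critical value $F_1$ for $(1, n_h - 2)$ degrees of freedom and cut the cluster only if the statistic is at least as large.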
When no cluster can be cut any more, the next step is to undertake the mergence of clusters.

2.4.3. Mergence of Clusters

[27] To test the mergence of clusters $e$ and $f$ among the existing $H$ clusters, the following total-sample SSCP matrix and within-groups SSCP matrix should be calculated first:

$$t_{ij}(n_e, n_f) = A^{(e)}_{ij}(n_e) + A^{(f)}_{ij}(n_f) - \frac{\left[ n_e B^{(e)}_i(n_e) + n_f B^{(f)}_i(n_f) \right] \left[ n_e B^{(e)}_j(n_e) + n_f B^{(f)}_j(n_f) \right]}{n_e + n_f} \tag{20}$$

$$b_{ij}(n_e, n_f) = \frac{n_e\, n_f}{n_e + n_f} \left[ B^{(e)}_i(n_e) - B^{(f)}_i(n_f) \right] \left[ B^{(e)}_j(n_e) - B^{(f)}_j(n_f) \right] \tag{21}$$

$$w_{ij}(n_e, n_f) = t_{ij}(n_e, n_f) - b_{ij}(n_e, n_f) \tag{22}$$

where $A_{ij}$ and $B_{i\ \mathrm{or}\ j}$ have the same formulations as equations (15) and (16); and $i, j = 1, 2, \ldots, p$. Then the F-test can be undertaken. If

$$F(P',\; n_e + n_f - P' - 1) = \frac{1 - \Lambda(n_e, n_f)}{\Lambda(n_e, n_f)} \cdot \frac{n_e + n_f - P' - 1}{P'} < F_2 \tag{23}$$

is satisfied, clusters $e$ and $f$ can be merged into a new cluster. Otherwise, it should be similarly tested whether the other clusters can be merged for $e = 1, 2, \ldots, (H - 1)$ and $f = 2, 3, \ldots, H$.

2.4.4. Prediction

[28] After all calculations and tests have been completed, i.e., when all hypotheses of further cut or mergence are rejected, a cluster tree can be derived for each dependent variable (i.e., benzene concentrations at concerned locations). Each cutting point, which leads to two branches, corresponds to a value of an independent variable, $x^{(h)}_{r^*, k^*_{r^*}}$. When a new sample set of independent variables $\{x_r\}$ is examined, its $x_{r^*}$ values can be compared with $x^{(h)}_{r^*, k^*_{r^*}}$ at the cutting points, and classified into the relevant branches. Step by step, the sample will finally enter a tip cluster which can no longer be cut or merged. The criterion to classify a new sample into the relevant branches is: (a) sample data with $x_{r^*} \le x^{(h)}_{r^*, k^*_{r^*}}$ are assigned to cluster $e$ ($< f$), and (b) sample data with $x_{r^*} > x^{(h)}_{r^*, k^*_{r^*}}$ are assigned to cluster $f$.

[29] Let $e'$ be the tip cluster where the new sample enters.
Then the predicted dependent variables $\{y_i\}$ are:

$$y_i = \bar{y}^{(e')}_i \pm R^{(e')}_i \tag{24}$$

where $\bar{y}^{(e')}_i$ is the mean of dependent variable $i$ in sub-cluster $e'$, and $R^{(e')}_i$ is the radius of $y_i$ in cluster $e'$:

$$\bar{y}^{(e')}_i = \frac{1}{n_{e'}} \sum_{k=1}^{n_{e'}} y^{(e')}_{i,k}, \quad \forall i \tag{25}$$

$$R^{(e')}_i = \left[ \max_{k=1,\ldots,n_{e'}} y^{(e')}_{i,k} - \min_{k=1,\ldots,n_{e'}} y^{(e')}_{i,k} \right] / 2, \quad \forall i \tag{26}$$

3. Development of Forecasting System

3.1. Numerical Simulation

[30] Generally, environmental managers are more concerned about benzene than the other three contaminants due to its high toxicity; also, during the remediation process, concentrations of toluene, ethylbenzene and xylenes (TEX) will become much lower than the respective environmental criteria as long as the benzene concentration can meet the environmental standard. Therefore, only benzene concentrations were investigated in this study. The experimental results were used for calibrating and verifying the numerical model under various conditions.

[31] The study system was defined as a three-dimensional (3-D) domain. The contaminated zone around the groundwater table was considered as the major pollution source. Figure 3 illustrates the simulation domain with an area of … m^2 and a depth of 1.2 m. Vertically, the domain was discretized into four grid blocks corresponding to four simulation layers; each layer was located at the middle of the grid block, which facilitated the application of a block-centered finite difference scheme. In the horizontal plane, each layer was discretized into 24 × 8 grids. Each grid had dimensions of 0.15, 0.15 and 0.30 m in the x, y and z directions, respectively. The total number of grids in this 3-D computational system was 768 (24 × 8 × 4). Layers 3 and 4 were located in the saturated zone, while layers 1 and 2 were in the unsaturated one. Three soil types including clay, till and sand occupied the simulation domain, with their distributions in the four layers being shown in Figure 2.

Table 2. Main Input Parameters

Parameter                                         Value
Parameters for flow and transport simulation
  NAPL spill amount                               12 L
  NAPL spill duration                             36 hr
  Permeability of sand/till/clay                  1500/430/890 mD
  Porosity of sand/till/clay                      0.45/0.35/0.30
  NAPL/water interfacial tension                  45 dynes/cm
  NAPL density                                    … g/cm^3
  Longitudinal dispersivity of sand/till/clay     0.1/0.1/0.1 m
  Transverse dispersivity of sand/till/clay       0.01/0.01/0.01 m
  Hydraulic gradient                              0.03 m/m
  NAPL/water partition coefficient of benzene     …
  Benzene solubility                              1750 mg/l
  Time step                                       … day
  Maximum time step size                          10 day
  Tolerance for concentration change              …
  Simulation period                               40 day
Parameters for enhanced biodegradation simulation
  Water injection rate                            20 L/d
  NH4NO3 nutrient injection rate                  1750 mg/l
  NH4HPO4 nutrient injection rate                 1100 mg/l
  Heterotrophs microorganism injection rate       20 mg/l
  Oxygen injection rate                           8 mg/l
  Water extraction rate                           30 L/d
  Microorganisms maximum specific growth rate     4.2 per day
  Biomass density                                 0.09 g/cm^3
  Yield coefficient (g cell/g benzene)            1.0 cells/g soil
  Half-saturation coefficient                     0.77 mg/l
  Bulk density of porous medium                   1.64 g/cm^3
  Simulation period                               10 day

[32] The NAPLs initially occupied a contaminated area in layers 3 and 4 around the groundwater table.
The zero-flow boundary conditions were enforced at the top and bottom of the modeling domain, as well as at the sides parallel to the x-axis. Constant hydraulic heads were employed at the left and right boundaries, allowing continuous water flow in the aquifer. The calibration and verification processes were undertaken using data obtained from the pilot-scale experiments. The input parameters related to benzene transport and biodegradation are presented in Table 2. Figure 4a presents the verification results for Day 15 in phase I (natural attenuation phase). The temporal variations of benzene concentrations in well 10 are shown in Figure 4b. The results demonstrate that a reasonable level of prediction accuracy was obtained. More details of the error analyses are provided in Table 3, indicating that the absolute errors between the simulated and observed concentrations ranged from 0.08 to 0.85 mg/L with an average of 0.38 mg/L. The root-mean-square error (RMSE) was 0.47 mg/L, and the correlation coefficient was 0.99, indicating that the simulated concentrations were very close to the observed ones. The RMSE is formulated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \tag{27}$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and n is the number of samples. The mean relative error was 36.9%, with most of the error levels being within the range of 10 to 20%. Two significant differences occurred at wells 5 and 7, where the observed values were significantly lower than the simulated ones. This could be due to sampling and testing errors; moreover, in the pilot-scale experiments, the tank walls would limit contaminants from further movement and diffusion, while an unlimited boundary was assumed in the numerical simulation. Similarly, the modeling results for phase II (i.e., bioremediation phase) were verified against the observed data. Figure 4c shows the verification results for Day 8 in phase II.
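The error measures used in this verification (absolute error, RMSE from equation (27), mean relative error, and correlation coefficient) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `error_statistics` and the sample values are hypothetical, and the relative error is assumed to be the absolute error divided by the observed value, with wells whose observed value is zero left out because the ratio is undefined there.

```python
import math


def error_statistics(observed, simulated):
    """Return error measures for paired observed/simulated concentrations."""
    n = len(observed)
    abs_errors = [abs(o - s) for o, s in zip(observed, simulated)]
    mean_abs = sum(abs_errors) / n
    # Equation (27): root-mean-square error.
    rmse = math.sqrt(sum(e * e for e in abs_errors) / n)
    # Assumption: relative error = |observed - simulated| / observed,
    # skipping zero observations (e.g., below the detection limit).
    rel = [e / o for e, o in zip(abs_errors, observed) if o != 0]
    mean_rel_pct = 100.0 * sum(rel) / len(rel) if rel else float("nan")
    # Pearson correlation coefficient between the two series.
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    corr = cov / math.sqrt(var_o * var_s)
    return {
        "mean_abs": mean_abs,
        "max_abs": max(abs_errors),
        "rmse": rmse,
        "mean_rel_pct": mean_rel_pct,
        "corr": corr,
    }


# Hypothetical concentrations in mg/L, not taken from Table 3 or Table 4.
stats = error_statistics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
print(stats)
```

Because the RMSE squares each deviation, a few large misfits (such as those reported at wells 5 and 7) raise it above the mean absolute error, which is consistent with the RMSE of 0.47 mg/L exceeding the average absolute error of 0.38 mg/L.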
The performance of the developed numerical model was evaluated through a number of statistical analyses. Table 4 shows a mean absolute error of 0.21 mg/L and a maximum error of 0.4 mg/L. The root-mean-square error (RMSE) and correlation coefficient were 0.27 mg/L and 0.93, respectively.

[33] In Table 4, the mean relative error was 35.1% with a maximum of 65.9%. The differences between the predicted and observed concentrations were generally acceptable. A few exceptions exist due to sampling errors as well as complexities in subsurface stratification and microbial reactions. The observed concentrations in wells 1 and 2 were zero, indicating that the benzene concentrations were lower than the detection limit. Also, the flow rate of groundwater in the pilot was very low; this led to difficulties in maintaining the rate of each extraction well at a fixed level. Thus, this rate varied randomly within a small range, while a fixed rate was used in the simulation model. These complexities resulted in the increased errors.

[34] In general, both models for phases I and II could reasonably simulate the transport and biodegradation of the contaminant. The verified models could then be used for investigating the effects of parameter uncertainties on benzene concentrations, performing further assessment of risks due to human ingestion of the contaminated water, and supporting process control of detailed bioremediation practices.

3.2. Scenario Design and Simulation Results

[35] The developed NAPL-biodegradation model could then be used for simulating the system's responses under various operating conditions for site remediation. However, directly incorporating the developed simulation model within the optimization framework would bring high complexities. In this study, a stepwise-cluster analysis method was proposed to establish a relationship between the remediation efforts (i.e., injection and extraction flows, and oxygen and biomass concentrations in the injected flows) and the system's responses (i.e., benzene concentrations).
The developed simulation model was used to generate a large number of inputs and outputs for supporting the establishment of such a relationship.

Figure 4. Verification of the developed modeling system.

Table 3. Error Analysis for Phase I Modeling Results. Columns: Well Number; Observed Concentration, mg/L; Simulated Concentration, mg/L; Relative Error, %; Absolute Error, mg/L. Summary rows: Mean error; Root mean square error, 0.47; Correlation coefficient.

Table 4. Error Analysis for Phase II Modeling Results. Columns: Well Number; Observed Concentration, mg/L; Simulated Concentration, mg/L; Relative Error, %; Absolute Error, mg/L. Summary rows: Mean error; Root mean square error, 0.27; Correlation coefficient, 0.93.

[36] According to the characteristics of the soil profile, the NAPL fate and transport, and the contaminant-plume movement, the benzene concentrations in six wells (i.e., wells 5, 7, 8, 10, 11 and 12) were regarded as indicators of the contamination situation in the groundwater. Benzene concentrations in these wells were denoted as $x_1$, $x_2$, $x_3$, $x_4$, $x_5$ and $x_6$. In wells 5, 7 and 11, benzene concentrations at the fourth soil layer were monitored; in the other three wells (8, 10 and 12), the concentrations at the third soil layer were investigated. In order to reflect as many contamination situations as possible, a large range of benzene-concentration levels was examined. The maximum benzene concentration was 30 mg/L, and the minimum was 0 mg/L. Within this range, 50 concentration levels were generated randomly for each concerned well such that 50 contamination situations were produced (Table 5).

[37] Enhanced in situ biodegradation was implemented in conjunction with a pump-and-treat system to circulate nutrients and oxygen through the contaminated aquifer. The process involved (a) the introduction of aerated and nutrient- and biomass-enriched water into the contaminated zone through two injection wells, and (b) the recovery of the down-gradient water through two extraction wells. The amendments were circulated through the contaminated zone to provide mixing and intimate contact among the oxygen, nutrients, contaminants, and microorganisms. Therefore, the injection- and extraction-flow rates would directly affect contaminant removal efficiency and operating cost; moreover, microbial concentration was another key factor which affected the remediation efficiency. Microbial activities could be greatly affected by oxygen concentration; oxygen deficiency could significantly reduce the biodegradation rate.
On the other hand, injection- and extraction-flow rates, oxygen addition rates, and biomass addition rates were related to the operating cost. In this study, injection- and extraction-flow rates and oxygen/biomass concentrations in the injected flows were identified as the main control factors that determined the efficiency and cost of the bioremediation process.

[38] The ranges of injection- and extraction-flow rates were determined by considering soil porosities and permeabilities and by testing these properties through the developed simulation model. The maximum flow rate was set as 20 L/day while the minimum was 0 L/day. The range of biomass concentrations in the injected flows was between 0 and 40 mg/L. The range of oxygen concentration was between 10 and 400 mg/L. In total, 50 scenarios of the operating conditions were randomly generated (Table 6). The relevant control variables were denoted as $u_1$, $u_2$, $u_3$, $u_4$, $u_5$, $u_6$, $u_7$ and $u_8$. Here, $u_1$ is the injection rate at well 2; $u_2$ is the injection rate at well 6; $u_3$ is the extraction rate at well 7; $u_4$ is the extraction rate at well 11; $u_5$ is the oxygen concentration in the injection flow at well 2; $u_6$ is the oxygen concentration in the injection flow at well 6; $u_7$ is the biomass concentration in the injection flow at well 2; and $u_8$ is the biomass concentration in the injection flow at well 6. All of these are continuous variables.

[39] Combinations of the 50 contamination-level scenarios and the 50 operating-condition scenarios led to 2500 scenarios. Correspondingly, 2500 input files were produced for the NAPL-biodegradation model. The experimental results indicated that benzene concentrations in the groundwater were reduced significantly eight days after the remediation started. Therefore, a 10-day period was simulated. The results indicated that the removal rates of benzene varied significantly under different operating-condition scenarios.
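The scenario design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text says the levels were generated randomly but does not name a distribution, so uniform sampling is assumed here, and the seed and all function and variable names are hypothetical. Only the ranges and the counts come from the text.

```python
import itertools
import random

random.seed(42)  # hypothetical seed, used only to make the sketch repeatable

N_LEVELS = 50
WELLS = (5, 7, 8, 10, 11, 12)  # monitored wells, giving x1..x6


def random_contamination_level():
    """One contamination situation: benzene in each well, 0 to 30 mg/L."""
    return tuple(random.uniform(0.0, 30.0) for _ in WELLS)


def random_operating_condition():
    """One operating condition (u1..u8) drawn within the stated ranges."""
    flows = [random.uniform(0.0, 20.0) for _ in range(4)]     # u1..u4, L/day
    oxygen = [random.uniform(10.0, 400.0) for _ in range(2)]  # u5, u6, mg/L
    biomass = [random.uniform(0.0, 40.0) for _ in range(2)]   # u7, u8, mg/L
    return tuple(flows + oxygen + biomass)


contamination_levels = [random_contamination_level() for _ in range(N_LEVELS)]
operating_conditions = [random_operating_condition() for _ in range(N_LEVELS)]

# Every contamination level is paired with every operating condition,
# so each pair corresponds to one simulation input file.
scenarios = list(itertools.product(contamination_levels, operating_conditions))
print(len(scenarios))  # 2500
```

Pairing every contamination level with every operating condition is what lets a separate set of cluster trees be fitted later for each contamination-level scenario, since each level has its own 50 simulated responses.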
In well 11, for example, the initial benzene concentrations under scenarios 1 and 11 were 2.88 and 2.49 mg/L, respectively. After the 10-day remediation period, the concentrations under scenario 1 became [...] mg/L, and those under scenario 11 became [...] mg/L. It was thus desired that cost-effective operating conditions be identified under each contamination-level scenario based on the modeling results.

3.3. Development of the Forecasting System

[40] For each contamination-level scenario ($x_1^0$, $x_2^0$, $x_3^0$, $x_4^0$, $x_5^0$ and $x_6^0$), 50 sets of data about (a) the respondent benzene concentrations at the concerned locations ($x_1$, $x_2$, $x_3$, $x_4$, $x_5$ and $x_6$) and (b) the operating conditions of enhanced in situ biodegradation ($u_1$, $u_2$, $u_3$, $u_4$, $u_5$, $u_6$, $u_7$ and $u_8$) at a 10-day time period could be obtained by running the developed simulation model. Significant correlations existed between set ($x_1$, $x_2$, $x_3$, $x_4$, $x_5$, $x_6$) and set ($u_1$, $u_2$, $u_3$, $u_4$, $u_5$, $u_6$, $u_7$, $u_8$). Therefore, multivariate analysis can be used for establishing a correlative relationship [Wilks, 1963; Tatsuoka, 1971].

Table 5. Fifty Levels of Contamination Situations (units are in mg/L). Columns: Scenario; Well 5, $x_1$; Well 7, $x_2$; Well 8, $x_3$; Well 10, $x_4$; Well 11, $x_5$; Well 12, $x_6$.

[41] In total, 300 cluster trees were obtained through the stepwise-cluster analysis (one tree for each of the six wells under each of the 50 contamination-level scenarios), forming a set of forecasting systems. Figures 5 to 7 present three cluster trees as examples to illustrate the modeling results. Figure 5 is the cluster tree of benzene concentration at well 7 (variable $x_2$) under contamination-level scenario 12; Figure 6 is the tree at well 5 (variable $x_1$) under scenario 34; Figure 7 is the tree at well 10 (variable $x_4$) under scenario 50. In these cluster trees, the criteria for cutting and merging clusters are: (a) cut a cluster when P < 0.05, and (b) merge clusters when P > 0.05, where the P values shown at cutting and merging knots are significance levels of the F-test; P > 0.05 suggests no statistically significant difference between two clusters.

[42] The significance levels of different control variables ($u_1$, $u_2$, $u_3$, $u_4$, $u_5$, $u_6$, $u_7$ and $u_8$) are also clearly shown in the cluster trees. As indicated in Figure 5, the extraction flow rate at well 7 is the most significant variable that affects the benzene concentration, since groundwater is continuously extracted from this well. The benzene concentrations at well 7 are mainly related to variables $u_1$, $u_2$, $u_3$, $u_4$ and $u_8$ (i.e., injection flow rates of wells 2 and 6, extraction flow rates of wells 7 and 11, and biomass concentration in the injection flow for well 6). In Figures 6 and 7, benzene concentrations at wells 5 and 10 are related to all operating-condition variables ($u_1$, $u_2$, $u_3$, $u_4$, $u_5$, $u_6$, $u_7$ and $u_8$), since the injection and extraction flow rates will directly affect contaminant transport and degradation; moreover, due to their proximity, oxygen and microorganisms can be easily transported from injection wells 2 and 6 into wells 5 and 10.
Consequently, benzene concentrations at wells 5 and 10 are related to oxygen and biomass concentrations in the injection flows for wells 2 and 6 (i.e., $u_5$, $u_6$, $u_7$ and $u_8$).

Table 6. Fifty Scenarios of Operating Conditions. Columns: Scenario; Injection Flow Rate at Well 2, L/day, $u_1$; Injection Flow Rate at Well 6, L/day, $u_2$; Extraction Flow Rate at Well 7, L/day, $u_3$; Extraction Flow Rate at Well 11, L/day, $u_4$; Oxygen Concentration at Well 2, mg/L, $u_5$; Oxygen Concentration at Well 6, mg/L, $u_6$; Biomass Concentration at Well 2, mg/L, $u_7$; Biomass Concentration at Well 6, mg/L, $u_8$.

[43] Based on these trees, the benzene concentrations ($x_1$, $x_2$, $x_3$, $x_4$, $x_5$ and $x_6$) can be predicted given the inputs of operating conditions for site remediation. For example, under contamination-level scenario 34, let $u_1 = 11$, $u_2 = 14$, $u_3 = 19$, $u_4 = 9$, $u_5 = 150$, $u_6 = 80$, $u_7 = 20$ and $u_8 = 30$ be the input operating conditions. To predict the benzene concentration at well 5, we have: $u_2 > 9.7$ at the first branch knot, so that the sample enters cluster 3 (Figure 6); $u_5 < 248.2$, so that it enters cluster 6; $u_2 < 18.1$, so that it enters cluster 11; $u_1 < 19.9$, so that it enters cluster 21; $u_6 > 38.1$, so that it enters cluster 28; $u_1 > 6.5$, so that it enters cluster 33; $u_5 > 81.4$, so that it enters cluster 38; $u_3 > 10.6$, so that it enters cluster 43; $u_4 > 1.4$, so that it enters cluster 49; and $u_2 > 12.2$, so that it finally enters cluster 53 with a prediction value of [...] ± [...] (mg/L). This implies that the benzene concentration at well 5 ten days after the remediation is started will be within the range of [...] to [...] mg/L. This simple example demonstrates only one cluster tree, for benzene-concentration prediction at well 5 under contamination-level scenario 34. Predictions for the other wells under the other scenarios can be similarly undertaken.

[44] Performance of the stepwise-cluster analysis method was analyzed based on the resulting cluster trees under contamination-level scenario 34. In this scenario, there were 50 data sets of respondent benzene concentrations. Among them, 43 sets were used for producing cluster trees for variables $x_1$, $x_2$, $x_3$, $x_4$, $x_5$ and $x_6$; then the remaining 7 sets were used to verify the prediction accuracy. The evaluation results are summarized in Table 7.
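The branch-by-branch classification in paragraph [43], together with the tip-cluster prediction of equations (24) to (26), can be sketched as a small tree walk. This is a hedged illustration, not the tree of Figure 6: the class names `Knot` and `Tip`, the function `predict`, the two-knot tree, and all tip-cluster sample values are hypothetical. Only the two cut values (9.7 for $u_2$ and 248.2 for $u_5$) and the input operating conditions are borrowed from the worked example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass
class Tip:
    """Tip cluster holding training values of one dependent variable."""
    samples: Sequence[float]

    def predict(self) -> Tuple[float, float]:
        mean = sum(self.samples) / len(self.samples)           # equation (25)
        radius = (max(self.samples) - min(self.samples)) / 2   # equation (26)
        return mean, radius                                    # equation (24)


@dataclass
class Knot:
    """Cutting point comparing one control variable with its cut value."""
    index: int                  # 0-based position in (u1, ..., u8)
    cut: float
    low: Union["Knot", Tip]     # branch taken when u[index] <= cut
    high: Union["Knot", Tip]    # branch taken when u[index] > cut


def predict(tree: Union[Knot, Tip], u: Sequence[float]) -> Tuple[float, float]:
    """Walk from the root to a tip cluster and return (mean, radius)."""
    node = tree
    while isinstance(node, Knot):
        node = node.low if u[node.index] <= node.cut else node.high
    return node.predict()


# Hypothetical two-knot tree; tip-cluster samples are invented values in mg/L.
tree = Knot(
    index=1, cut=9.7,                       # first knot on u2
    low=Tip([4.0, 5.0, 6.0]),
    high=Knot(
        index=4, cut=248.2,                 # second knot on u5
        low=Tip([1.0, 1.5, 2.0]),
        high=Tip([2.5, 3.5]),
    ),
)

u = (11, 14, 19, 9, 150, 80, 20, 30)        # operating conditions u1..u8
mean, radius = predict(tree, u)
print(f"{mean} +/- {radius} mg/L")
```

Reporting the half-range as the radius means the prediction interval always covers every training sample in the tip cluster, so narrower tip clusters give tighter forecasts.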
The evaluation indicates that the mean absolute errors were 0.497, 1.153, 0.222, 0.141,