A TOOLKIT FOR THE DEVELOPMENT AND APPLICATION OF PARSIMONIOUS HYDROLOGICAL MODELS


Thorsten Wagener, Matthew J. Lees, Howard S. Wheater

Department of Civil and Environmental Engineering, Imperial College of Science Technology and Medicine, London, SW7 2BU, UK

In Singh, Frevert, Meyer (eds.) Mathematical models of small watershed hydrology - Volume 2. Water Resources Publications LLC, USA.

ABSTRACT: A modelling toolkit is described which has been developed to produce parsimonious model structures with a high degree of parameter identifiability. This is necessary if sensible relationships between model parameters and catchment characteristics are to be established, for example for regionalization studies. The toolkit contains two major components. The first is a rainfall-runoff modeling system with a generic architecture of lumped, conceptual or metric-conceptual model elements, which allows alternative model structures to be rapidly constructed and tested. The second component is a Monte-Carlo analysis toolbox combining a number of analysis tools to investigate parameter identifiability, model behaviour and prediction uncertainty. Two example applications are presented. These illustrate the use of multiple objective functions to extract information from a single output time-series for analysis of parameter sensitivity and identifiability, and the trade-off between model complexity and identifiability.

1. INTRODUCTION

Hydrological models are well-established tools that are widely utilized in engineering practice. The majority of model structures currently used can be classified as conceptual. Adopting the definition given by Wheater et al. (1993), conceptual model structures have two important characteristics: (1) their model structure is specified prior to any modeling being undertaken; and (2) (at least some of) the model parameters do not have a physical meaning, in the sense of being independently measurable, and have to be estimated through calibration against observed data.

Conceptual model structures suffer from a number of problems despite their frequent use and development over a number of decades. One of the major constraints is the lack of parameter identifiability: different combinations of parameters (e.g. Johnston and Pilgrim, 1976; Beven and Binley, 1992), or even different model structures (e.g. Uhlenbrook et al., 1999), yield similar results in terms of a defined performance measure, or objective function. This results in difficulties in interpreting the past behaviour of the catchment system, and hence in the propagation of uncertainty into future predictions in the form of wide confidence limits, i.e. a wide range of possible system behaviours (Wheater et al., 1986; Mroczkowski et al., 1997). The need for model calibration is a major limitation when ungauged catchments, where no streamflow measurements are available, have to be modelled.

One approach to deal with this problem is the regionalization, or regional transfer, of parameters of a certain model structure (e.g. Jakeman et al., 1992). A selected model structure is calibrated to a large number of catchments, and statistical relationships between the parameters and the characteristics of the catchment, such as size, land use or soil types, are established. These relationships can then be used to derive parameter values for an ungauged catchment. Uncertainty in the model parameters due to a lack of identifiability significantly limits the use of models for this kind of regionalization, because it is difficult to establish sensible statistical relationships (e.g. Moore and Clarke, 1981; Kuczera, 1983; Wheater et al., 1993). A model structure with identifiable parameters, i.e. a high regionalization potential (Lees et al., 1999), is therefore a prerequisite for successful regionalization.

Possible directions of improvement with respect to producing better identified models are: (1) the reduction of model complexity to contain only the components, and therefore parameters, that can be identified from the available data (i.e. parsimonious modeling, e.g. Jakeman and Hornberger, 1993; Young et al., 1996; amongst others); (2) the improved use of available information, e.g. using different data periods to identify different parameters or groups of parameters (e.g. Wheater et al., 1986; Dunne, 1999; Wagener et al., 2000a; amongst others); and (3) the use of additional information, i.e. multi-response data such as water quality data, groundwater levels, or tracer measurements (e.g. de Grosbois et al., 1988; Ambroise et al., 1995; Kuczera and Mroczkowski, 1998; Seibert, 1999; amongst others). It should be noted that the use of additional output variables is unlikely to be particularly useful with respect to regionalization studies, since multi-response data are not commonly available. Therefore this approach is not investigated further here; instead we focus on methods of reducing model complexity and increasing the information that can be retrieved from streamflow measurements.

To this end, a toolkit has been developed to enable the development, analysis and comparison of model structures of different levels of complexity. The aim is to identify the appropriate level of complexity that yields a sufficiently high level of performance, whilst retaining an acceptable level of parameter uncertainty. The toolkit is described, and a limited number of modeling exercises are presented to illustrate its use.

2. PARSIMONIOUS RAINFALL-RUNOFF MODELING

Different researchers have observed that the number of parameters required to describe the key behaviour of environmental systems is often quite low (e.g. Jakeman and Hornberger, 1993; Young et al., 1996). Increasing the degree of model complexity above a certain level does not result in significantly improved performance (e.g. Naef, 1981; Hornberger et al., 1985; Refsgaard and Knudsen, 1996). Instead, the problem arises that many parameter combinations, often widely distributed over their individual feasible ranges, lead to acceptable model performance (e.g. Beven and Binley, 1992; Spear, 1995; Kuczera and Mroczkowski, 1998). Results from previous research in the field of rainfall-runoff modeling suggest that up to five or six parameters can be identified from streamflow and rainfall data using traditional single-objective calibration schemes (e.g. Kirkby, 1975; Hornberger et al., 1985; Wheater et al., 1986; Beven, 1989; Jakeman and Hornberger, 1993; Ye et al., 1997; Gaume et al., 1998; amongst others). Some researchers have therefore concluded that models with no more than half a dozen parameters are sufficient to describe the behaviour of a catchment with respect to the production of streamflow (e.g. Beven, 1989; Jakeman and Hornberger, 1993; Beck, 1987). These findings have led to the investigation of less complex (parsimonious) model structures that capture the key response modes of the hydrological system (e.g. Hornberger et al., 1985; Jakeman and Hornberger, 1993; Young et al., 1996; amongst others).

The principle of parsimony requires models to have the simplest parameterization that can be used to represent the data (Box and Jenkins, 1976). Parsimonious models have reduced problems of identifiability since only model parameters justified by the evidence, i.e. the data, are retained. The principle of parsimony, also known as Ockham's Razor, was advocated by the Franciscan monk William of Ockham in the early 14th century in statements such as 'plurality should not be posited without necessity', or 'what can happen through fewer [principles] happens in vain through more' (Spade, 2000). The principle can be described as a statement of cautious scientific method (Spade, 2000). The approach of retaining only necessary components ensures that the model components used are positively affirmed. However, using this approach in the context of rainfall-runoff modeling does not guarantee that all necessary model components are identified. Careful consideration is therefore required to ensure that the model does not omit one or more hydrologic processes important for a particular problem. A model structure that is too simple in terms of the number of processes represented can be unreliable outside the range of catchment conditions (i.e. climate and land use) on which it was calibrated (Kuczera and Mroczkowski, 1998). It is therefore vital to use data with high information content in order to ensure that the main response modes can be observed (Gupta and Sorooshian, 1985; Yapo et al., 1996). Young et al. (1996) describe how the principle of Ockham's Razor can be applied to the modeling of environmental systems using a combination of Monte-Carlo techniques, dominant mode analysis, and data-based mechanistic modeling.

Popper (1983) advocated the use of simple theories based on their degree of testability. The justification is that simpler theories place more restrictions on how a system is allowed to behave and are therefore easier to test than complex theories. System behaviour not permitted by the theory will lead to rejection or modification of the theory. The degree of testability of a theory is therefore proportional to the amount of behaviour prohibited by it (Popper, 1983). In the context of hydrological modeling, it is important to consider that the testability of a model structure will improve in cases where an increasing number of output variables exists that can be compared to measured variables, e.g. predictions of groundwater levels or saturated areas (Nash and Sutcliffe, 1970; Seibert, 1999). Additional information is then available to test potential models, and hence to reject those that show behaviour which does not conform with observations. The basic assumption underlying this idea is that it is not possible to show that a certain theory (model) is correct, i.e. to validate it, but that it is only possible to say that there is insufficient evidence to refute or falsify the theory. In other words, regardless of how often we see a white swan, we cannot conclude that all swans are white, i.e. it is not possible to verify (originating from the Latin word verus, meaning true) the hypothesis that all swans are white. However, the observation of a blue swan would lead to the rejection of this hypothesis, i.e. it can be falsified (Magee, 1973). Different researchers discuss how this concept of falsifiability (Popper, 1983) could be applied in the context of hydrological modeling (e.g. Dooge, 1986; Mroczkowski et al., 1997; Seibert, 1999).

This evidence suggests that striving for parsimonious model structures is advantageous. It is this aim that has motivated the development of the toolkit described here.

3. ANALYTICAL MODELING FRAMEWORK

The aim of the analytical modeling framework proposed here is to achieve a balance between the required level of model structural complexity and the complexity that can be supported by the available data (Fig. 1). A hydrologist's perception of a given hydrological system strongly influences the level of conceptualization that must be translated into the model structure. The importance of different system response modes (i.e. the key processes that need to be simulated by the model), however, depends upon the intended modeling purpose.

Therefore, the level of model structural complexity required must be determined through careful consideration of the key processes included in the model structure, and the level of prediction accuracy necessary for the intended modeling purpose.

Fig. 1. Proposed analytical framework for model development and application: purpose, system conceptualization and data determine the required and supported model complexity, which is accepted when the performance is sufficient and the uncertainty acceptable.

The level of structural complexity actually supported by a given data set is defined here as the number of parameters that can be identified by the information contained within the observed data. Unidentifiable parameters often lead to uncertainty in model predictions, i.e. relatively wide confidence limits, which limit the reliability of predictions and their use in any decision making process. They also limit the suitability of the model structure for purposes such as regionalization, as described earlier. The task therefore is to balance the performance of the model and the identifiability of its parameters such that sufficient prediction accuracy is gained, whilst uncertainty is reduced to an acceptable level.

4. RAINFALL-RUNOFF MODELING COMPONENT

4.1 GENERAL

As noted above, we seek the development of a model structure of appropriate complexity with respect to model performance and associated uncertainty. The philosophy behind this is the recognition that no model structure is suitable for all modeling exercises, but that the appropriate model structure is a function of (Wagener, 1998): (1) the modeling objectives (e.g. required spatial and temporal discretization, relevant response modes to be simulated); (2) the characteristics of the hydrological system under investigation (e.g. dominant processes, response times of the system); and (3) the available data (e.g. possible discretization). An increasing number of modeling shells with different levels of complexity can be found in the literature (for example, Overland and Kleeberg, 1993; Woods and Ibbitt, 1993; Leavesley, 1998; Baalen et al., 1997; amongst others). These systems give their user the possibility to test the suitability of different model components and to combine them in a modular fashion. Components can be modified or added if none of the available components fulfils the problem-specific requirements. The Rainfall-Runoff Modeling Toolbox (RRMT; Wagener et al., 1999) has been developed in particular to produce parsimonious, lumped model structures with a high level of parameter identifiability.

The RRMT is a generic modeling framework, or shell, that allows its user to implement different model structures in accordance with the framework outlined above. It can therefore be considered to represent a modeling concept, rather than a specific model structure. The RRMT is implemented in the MATLAB (Mathworks, 1996) programming environment. The model structures that can be implemented are spatially lumped, with low or medium levels of complexity (in terms of number of parameters). They can be classified as conceptual or hybrid metric-conceptual in type (Wheater et al., 1993). The latter is related to a systems approach to hydrologic modeling (see examples in Kleissen et al., 1990; Jakeman et al., 1990; Jakeman and Hornberger, 1993). The aim of this approach is to use observations (the metric paradigm) and other prior knowledge to test hypotheses about the structure of component hydrological stores (the conceptual paradigm) at the catchment scale (Wheater et al., 1993).

The restriction to a lumped approach could be seen as a limitation, since such an approach is often assumed to be suitable only for small catchments that are relatively homogeneous in terms of soil, vegetation and geology (Blackie and Eeles, 1985). However, experience with lumped models has shown that they can be valid over a wide range of catchment sizes, and that the aggregated response of a catchment can often be characterized well by such a spatially aggregated model (e.g. Littlewood and Jakeman, 1992; Jakeman and Hornberger, 1993; Jolley, 1995). Blackie and Eeles (1985) point out that the most important criterion for the suitability of a lumped approach is not the homogeneity of the catchment characteristics, but the stability of the catchment system. This stability is required with respect to the spatial distribution of precipitation, vegetation and soil.

4.2 SYSTEM ARCHITECTURE

The RRMT system architecture is based on a modular structure. The modeling component consists of a moisture accounting and a routing module (Fig. 2); other available modules include optimization, visual analysis, and off-line data processing options. Different approaches can be used to represent each module, and a set of alternatives, broadly representative of the range of current modeling practice, is provided. However, all modules, for example routing or objective function modules, have a specified input and output structure, and can therefore be easily replaced by new or modified modules, so long as they conform to this structure. The moisture accounting module, which transforms rainfall into effective rainfall, predominantly represents evapotranspiration and associated soil moisture storage. The routing module simulates the lateral flow processes through various pathways, i.e. overland flow, throughflow and groundwater flow (Ward and Robinson, 2000).

Fig. 2. System architecture of the Rainfall-Runoff Modeling Toolbox: precipitation (P), temperature (T) and potential evapotranspiration (PET) enter the moisture accounting module, which outputs actual evapotranspiration (AET), effective rainfall (ER) and a moisture status; the effective rainfall feeds the routing module, which produces streamflow (Q). A GUI and optimization, visual analysis and off-line data processing modules complete the system.

A Graphical User Interface (GUI) for the RRMT allows easy access to the toolbox functions. However, since subsequent modeling of a large number of catchments may be required, for example in regionalization studies, it is also possible to run the model from the command line. This allows the user to write a batch file which loads data, changes model settings, runs a calibration, and stores the results for many different cases. Within the batch file, all components of the modeling procedure can be replaced, e.g. the user can try different routing modules or different objective functions for the same data, or perform simulations for different years of data.

4.3 MOISTURE ACCOUNTING MODULES

The moisture accounting module divides the incoming rainfall into losses, through evapotranspiration and associated storage, and water which appears as an output from the catchment system (i.e. effective rainfall). A general water balance equation describing this process can be written in its implicit form as follows (e.g. Hornberger et al., 1998; Blackie and Eeles, 1985),

\frac{dV}{dt} = P - Q - AET \quad (1)

where V [L³] is the volume of water stored in the catchment, P [L³T⁻¹] is the precipitation rate, Q [L³T⁻¹] is the rate of surface and subsurface runoff, and AET [L³T⁻¹] is the rate of actual evapotranspiration.

The moisture accounting module can be represented with varying levels of complexity. Approaches currently implemented in the RRMT (Fig. 3) range from conceptual water balance structures, such as those based on Penman's drying curve concept (Penman, 1949; see for example Jolley, 1995) or the Catchment Moisture Deficit (Evans and Jakeman, 1998), to more empirical loss functions related to the Antecedent Precipitation Index (API), such as the Catchment Wetness Index (CWI; Whitehead et al., 1979; Jakeman et al., 1990; Jakeman and Hornberger, 1993). These are typical approaches adopted in parsimonious moisture accounting components, and can be found in many popular model structures.

The CWI is a simple loss function closely related to the well known antecedent precipitation index. The proportion of rainfall r_k at every time step k contributing to runoff, i.e. the effective rainfall u_k, is determined by the catchment wetness index s_k, which is an indication of the soil moisture state of the catchment and ranges from zero to one,

u_k = \tfrac{1}{2}\,(s_k + s_{k-1})\, r_k \quad (2)

The value of s_k is calculated using the following equation,

s_k = c\, r_k + \left(1 - \frac{1}{\tau(t_k)}\right) s_{k-1} \quad (3)

where c is a factor introduced to ensure that the total volume of modeled effective rainfall equals the total volume of observed streamflow. The parameter c is therefore not calibrated, but is calculated explicitly. Writing this equation in a slightly different form provides more insight into the model structure,

\frac{s_k}{c} - \frac{s_{k-1}}{c} + \frac{s_{k-1}}{c\,\tau(t_k)} = r_k \quad (4)

This equation shows that the modulated system moisture 'state' at time step k equals the sum of the rainfall input at time step k and the modulated system state at time step k-1, minus the depletion by losses to stream and evapotranspiration described by τ(t_k), see Fig. 3a. The depletion is related to temperature, which is used as a surrogate for potential evapotranspiration, by the following equation,

\tau(t_k) = \tau_w \exp[(RT - t_k)\, f] \quad (5)

where the reference temperature RT is usually fixed to a nominal value depending on the climate (e.g. RT = 10 °C for England and Wales (Sefton and Howarth, 1999), RT = 20 °C for warmer climates (Jakeman et al., 1994)). The two parameters which have to be calibrated are τ_w, the time constant of catchment losses at RT, and the temperature modulation factor f, which relates a unit change in temperature to the change in loss rate. Note that the temperature can be replaced by potential evapotranspiration when available (Niadas, 1999). Different modifications of the CWI have been developed to account for specific hydrologic catchment characteristics: for example, a power transformation of s_k has been used successfully in catchments with a very flashy response, and a threshold for s_k, below which no effective rainfall is generated, enables the modeling of ephemeral rivers (Fig. 3b; Jakeman et al., 1994).
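As a concrete illustration of Eqs. (2)-(5), the following is a minimal Python sketch of the CWI loss function. The RRMT itself is implemented in MATLAB, so function and argument names here are illustrative only; the volume-matching factor c, which the RRMT computes explicitly from the water balance, is simply passed in as a given.

```python
import numpy as np

def cwi_effective_rainfall(r, t, c, tau_w, f, RT=10.0, s0=0.0):
    """Catchment Wetness Index loss function, Eqs. (2)-(5).

    r     : rainfall series
    t     : temperature series (surrogate for potential evapotranspiration)
    c     : volume-matching factor (computed, not calibrated, in the RRMT)
    tau_w : time constant of catchment losses at the reference temperature RT
    f     : temperature modulation factor
    """
    r = np.asarray(r, dtype=float)
    t = np.asarray(t, dtype=float)
    tau = tau_w * np.exp((RT - t) * f)                   # Eq. (5)
    u = np.zeros(len(r))
    s_prev = s0                                          # initial wetness index
    for k in range(len(r)):
        s_k = c * r[k] + (1.0 - 1.0 / tau[k]) * s_prev   # Eq. (3)
        s_k = min(max(s_k, 0.0), 1.0)                    # keep s in [0, 1] (sketch safeguard)
        u[k] = 0.5 * (s_k + s_prev) * r[k]               # Eq. (2)
        s_prev = s_k
    return u
```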

One conceptual water balance component implemented in the RRMT is a version of the Penman model structure used by Jolley (1995) that is based on the Penman drying curve concept (Penman, 1949). The concept assumes that actual evapotranspiration (ae) occurs at the potential rate (pe) while water is available in the root zone or root-reservoir. The root zone is extended by an additional 25 mm to allow for capillary rise. The actual rate decreases to a percentage of the potential rate when this soil zone is depleted. The parameterization of this concept is a conceptual structure with two stores. The size of the upper store is equal to the 'root constant' plus the aforementioned 25 mm, and evapotranspiration from it takes place at the potential rate as long as water is available. Values for the root constant can be selected as a function of vegetation from tables (e.g. Grindley, 1970). However, treating the root constant as a free parameter, identified by calibration, can lead to an improvement in model performance (Sheratt, 1985). The upper store is connected to a lower store of "notional infinite depth" (Moore, 1999) via an overflow mechanism. Actual evapotranspiration continues at a fraction g = 8% (~1/12) of the potential rate (as suggested by Penman, 1949) from the lower store, after the upper store is depleted. The amount of actual evapotranspiration from the upper (ae_u) and lower store (ae_l) is therefore calculated as follows,

ae_u = \begin{cases} pe, & D_1 < S_{max1} \\ 0, & D_1 = S_{max1} \end{cases} \quad (6a)

ae_l = \begin{cases} g\,(pe - r - S_{max1} + D_1), & D_1 = S_{max1} \\ 0, & D_1 < S_{max1} \end{cases} \quad (6b)

where pe is the potential evapotranspiration, r is the rainfall input, D_1 is the moisture deficit in the upper store, and S_{max1} is the size of the upper store, i.e. the root constant plus 25 mm.

The effective rainfall u is produced through two mechanisms: (1) a percentage φ (usually 15%, Moore, 1999) is bypassed as a direct contribution to rapid groundwater recharge or infiltration excess overland flow,

u_1 = \begin{cases} \phi\,(r - pe), & r > pe \\ 0, & r \le pe \end{cases} \quad (7)

and (2) saturation excess runoff is produced when both stores are full,

u_2 = \max\{(1-\phi)(r - pe) - D_1 - D_2;\ 0\} \quad (8)

where D_2 is the soil moisture deficit of the lower store. The total effective rainfall u can then be calculated as,

u = u_1 + u_2 \quad (9)

The main advantage of the Penman model structure is its efficiency in terms of the number of parameters.

The predominantly vertical processes described by the moisture accounting component usually imply spatial homogeneity of response. This assumption is usually not valid for entire catchments, although good results are obtained in many cases (Blackie and Eeles, 1985; Jakeman and Hornberger, 1993). To address this limitation, Moore and Clarke (1981) introduced a probability distribution of storage capacity to account for a heterogeneous catchment response in lumped modeling, e.g. due to the formation of saturated areas in the catchment. A version of this probability distribution loss function is available in the RRMT. Detailed descriptions of this approach are available in Moore and Clarke (1981), or Moore (1999).
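A minimal Python sketch of the flux equations (6)-(9) follows; it computes the evapotranspiration and effective rainfall for a single time step, given the current store deficits. The deficit bookkeeping (filling the upper store, overflow to the lower store) is deliberately omitted, and all names are illustrative rather than the RRMT's own.

```python
def penman_fluxes(r, pe, D1, D2, S_max1, g=1.0 / 12.0, phi=0.15):
    """Flux equations of the two-store Penman module, Eqs. (6)-(9).

    r, pe  : rainfall and potential evapotranspiration for this step
    D1, D2 : moisture deficits of the upper and lower store (0 = full)
    """
    # Actual evapotranspiration, Eqs. (6a)/(6b)
    if D1 < S_max1:                                   # water available in the upper store
        ae = pe
    else:                                             # upper store depleted (D1 == S_max1)
        ae = g * max(pe - r - S_max1 + D1, 0.0)       # reduces to g*(pe - r) here
    # Direct bypass contribution, Eq. (7)
    u1 = phi * (r - pe) if r > pe else 0.0
    # Saturation excess runoff when both stores are full, Eq. (8)
    u2 = max((1.0 - phi) * (r - pe) - D1 - D2, 0.0)
    # Total effective rainfall, Eq. (9)
    return ae, u1 + u2
```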

Fig. 3. Schematic plots of the different moisture accounting modules available: (a) Catchment Wetness Index (CWI), (b) modified CWI, (c1) conceptual Penman model structure, (c2) Penman drying curve, (d) Catchment Moisture Deficit model structure (CMD), (e) storage capacity distribution function.

4.4 ROUTING MODULES

The moisture accounting component produces that part of the rainfall that is contributing to runoff, usually called effective or excess rainfall. One or more routing components are typically applied to introduce the retention and translation processes occurring when the contributing rainfall moves to the catchment outlet via different pathways. Even in complex models, these are often represented by relatively simple structures. The most commonly used conceptual element to describe this transfer from effective rainfall to runoff is the conceptual reservoir or conceptual store. The behaviour of this reservoir can be described by combining two equations. A storage function can be defined to describe the relationship between outflow of the reservoir and the amount of water stored,

S(t) = a\, Q(t)^n \quad (10)

where S(t) is the storage [L] at time t, Q(t) is the outflow [L T⁻¹] at time t, a is the storage coefficient [L^{1-n} T^n], and n is the coefficient of non-linearity [-]. Additionally, a mass balance equation describes the change in storage dS(t)/dt as the difference between the inflow u(t) [L T⁻¹] and outflow Q(t) [L T⁻¹] rates,

\frac{dS(t)}{dt} = u(t) - Q(t) \quad (11)

Jakeman and Hornberger (1993) show that the non-linearity of the relationship between rainfall and runoff can often be accounted for through the consideration of antecedent moisture conditions in a loss function. The consequence is that the remaining transfer from effective rainfall to streamflow can be approximated by a linear relationship. This leads to the most common form of the conceptual reservoir, the linear reservoir (n = 1). In this case the parameter a becomes the residence time T [T], and the outflow of the reservoir is directly proportional to the storage. The storage function and the mass balance equation can be combined to yield the following model,

\frac{dQ}{dt} = \frac{1}{T}\,[u(t) - Q(t)] \quad (12)

The advantage of the linear reservoir model is computational efficiency since it can be solved analytically. It can be shown that the linear reservoir is identical to the following first-order discrete-time Transfer Function (TF) (Lees, 2000),

Q_t = \frac{b_0}{1 + a_1 z^{-1}}\, u_t; \qquad b_0 = \frac{\Delta t}{T}; \qquad a_1 = \frac{\Delta t}{T} - 1 \quad (13)

where Δt is the discretization step size, and the backward shift operator z^{-i} is defined as,

z^{-i}\, Q_t = Q_{t-i} \quad (14)
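As an illustration of Eq. (13), here is a short Python sketch of a linear reservoir implemented as a first-order discrete-time transfer function (names are illustrative; the explicit loop and the scipy.signal.lfilter one-liner give the same result):

```python
import numpy as np

def linear_reservoir(u, T, dt=1.0):
    """Route effective rainfall u through one linear reservoir, Eq. (13):
    Q_t = b0 / (1 + a1 z^-1) u_t, i.e. Q_t = b0*u_t - a1*Q_{t-1}."""
    b0 = dt / T                    # gain
    a1 = dt / T - 1.0              # denominator coefficient (negative for dt < T)
    Q = np.zeros(len(u))
    for t in range(len(u)):
        Q[t] = b0 * u[t] - a1 * (Q[t - 1] if t > 0 else 0.0)
    return Q

# Equivalent one-liner:
# from scipy.signal import lfilter
# Q = lfilter([dt / T], [1.0, dt / T - 1.0], u)
```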

Chow et al. (1988) describe how a general form of the TF model can be derived from a general hydrologic system model (Chow and Kulandaiswamy, 1971). The general form of an n-th order single input, single output (SISO) discrete-time system can be written as,

Q_t = \frac{B(z^{-1})}{A(z^{-1})}\, u_{t-\delta} \quad (15)

where δ represents a lag element, and A(z^{-1}) and B(z^{-1}) are the following polynomials,

A(z^{-1}) = 1 + a_1 z^{-1} + \dots + a_p z^{-p} \quad (16)

and

B(z^{-1}) = b_0 + b_1 z^{-1} + \dots + b_m z^{-m} \quad (17)

The TF model structure is therefore described by the triad [p, m+1, δ]. This general model can represent any combination of linear reservoirs connected in parallel and/or series, as shown in Fig. 4 (Lees, 2000), using partial fraction expansion to perform the decomposition (Young, 1992). The advantages of the representation of a linear conceptual reservoir in TF form include the availability of powerful system identification techniques for optimal parameter estimation, and increased structural flexibility (Lees, 2000). A system identification technique that has been successfully used to identify and estimate TF models in the context of rainfall-runoff modeling is the Simple Refined Instrumental Variable technique (SRIV; Young et al., 1996). A TF identified by the SRIV algorithm is one of the routing component options contained in the RRMT.

Fig. 4. General linear reservoir model: linear reservoirs (LR) connected in parallel and/or series between inflow and outflow.

The number of reservoir elements required is, amongst other things, dependent upon the modeling time scale selected. A large number of studies have shown that the most common configuration identified for a daily time scale, when using the SRIV technique, is two reservoirs in parallel (e.g. Young, 1992; Jakeman and Hornberger, 1993; Lees, 2000). This structure is commonly used in rainfall-runoff modeling (Moore, 1999), although the use of the TF approach ensures that a parallel structure will not be used unless the data support this level of complexity, and indeed conversely more complex structures are sometimes identified (Lees and Wagener, 2000b;c). In the common situation where a parallel structure is objectively identified, the two reservoirs can be considered to represent a quick and a slow response component. These are often interpreted as quickflow and baseflow processes, although these two components aggregate a number of hydrological pathways (Hornberger et al., 1999; Ward and Robinson, 2000). A single reservoir is usually sufficient in the case of a coarser discretization in time, e.g. monthly (Jolley, 1995), or when a baseflow component is absent, e.g. ephemeral rivers (Jakeman et al., 1994).
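To make the parallel decomposition concrete, here is a small Python sketch (hypothetical parameter names) of two linear reservoirs in parallel, each a first-order TF as in Eq. (13); their sum is a second-order TF of the [2, 2, δ] family commonly identified at the daily time scale:

```python
import numpy as np
from scipy.signal import lfilter

def parallel_linear_reservoirs(u, T_quick, T_slow, split, dt=1.0):
    """Quick and slow pathways in parallel; `split` is the fraction of
    effective rainfall routed through the quick reservoir."""
    def reservoir(T):
        b0 = dt / T
        a1 = dt / T - 1.0
        return lfilter([b0], [1.0, a1], u)     # first-order TF, Eq. (13)
    return split * reservoir(T_quick) + (1.0 - split) * reservoir(T_slow)
```

Expanding the sum of the two first-order terms over a common denominator recovers the second-order A(z^{-1}) and B(z^{-1}) polynomials of Eqs. (16)-(17).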

Physical realism of the selected routing structure is an important criterion if the final aim of the modeling exercise is regional transfer of model parameters. If the slow response component is mainly associated with contributions from groundwater flow, use of a linear approach assumes that the outflow of the groundwater reservoir is directly proportional to the hydraulic head (Hornberger et al., 1998). Cases where this assumption is valid include confined aquifers with constant thickness (Darcy's law), and unconfined aquifers where the variation in flow depth is small, i.e. the impermeable layer is far below the river bed (Chapman, 1999). However, Wittenberg (1999) found that the behaviour of shallow groundwater reservoirs may be more realistically represented by a non-linear reservoir.

Additional evidence for the need to use non-linear reservoirs in some cases is given by the fact that three parallel linear reservoirs are sometimes required to adequately fit the catchment response (Lees and Wagener, 2000b;c). In this case, one store normally represents the quick response, and the slow response is divided into two stores with different residence times. One interpretation of this result is that two independent aquifers drain to the river (Wittenberg, 1999). This is often unlikely, and the use of a non-linear reservoir to represent the slow response, in combination with a linear reservoir for the quick response, may be more reasonable. The use of a non-linear reservoir is an available option in the RRMT. Linear and non-linear reservoirs can also be combined in parallel or serial structures.

Fig. 5. Conceptual routing components available in the RRMT: (a) linear; (b) non-linear (example); and (c) leaky catchment structure with outlets q_1 = k_1(S - S_1), q_2 = k_2(S - S_2) and q_3 = k_3 S.

All routing components described so far are based on the assumption that all subsurface runoff drains from the catchment via the stream. However, part of the runoff leaves some catchments through subsurface pathways. A routing component taking this into account can consist of a linear reservoir with different outlets (Chapman, 1999; see also Sugawara, 1995; and Moore, 1999). The outflow from the bottom outlet is the part of the effective rainfall not contributing to streamflow (Fig. 5).
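The leaky catchment structure of Fig. 5c can be sketched in a few lines of Python (an explicit-Euler sketch with illustrative names; the thresholds S_1, S_2 and rate constants k_1, k_2, k_3 follow the outlet equations given in the figure caption):

```python
def leaky_store(u, k1, k2, k3, S1, S2, dt=1.0, S0=0.0):
    """Single store with two threshold outlets feeding the stream and a
    bottom outlet representing subsurface export past the gauge."""
    S, Q = S0, []
    for u_t in u:
        q1 = k1 * max(S - S1, 0.0)   # upper outlet, active above threshold S1
        q2 = k2 * max(S - S2, 0.0)   # middle outlet, active above threshold S2
        q3 = k3 * S                  # bottom outlet: leakage out of the catchment
        S += (u_t - q1 - q2 - q3) * dt
        Q.append(q1 + q2)            # only q1 and q2 contribute to streamflow
    return Q
```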
4.5 OPTIMIZATION MODULES

The model structures that can be implemented in the RRMT contain parameters that typically refer to a collection of aggregated processes. Therefore they often do not have a direct physical interpretation and cannot be measured in the field. Instead, they must be estimated using a calibration procedure whereby the model parameters are adjusted until the system output and model output show an acceptable level of agreement. The agreement is typically measured using an objective function, i.e. some aggregation function of the model residuals, supported by visual inspection of the calculated time series. The model, i.e. a model structure and parameter set combination, that produces the best performance is commonly assumed to be representative of the natural system under investigation.

Most parameters in conceptual rainfall-runoff models define non-linear model equations. The consequence of this is that an iterative search is required to identify the optimum parameter values. This can be done using a manual 'trial-and-error' procedure, by an automatic search algorithm, or by a combination of both approaches (Boyle et al., 2000). Manual calibration is time consuming and difficult in the presence of parameter dependence; automatic calibration algorithms can be applied to overcome this problem. Available search algorithms can be separated into local and global approaches. Local search algorithms start from an initial solution, i.e. an initial parameter set, and try to sequentially improve this solution by repeatedly moving through the parameter space using various schemes to find the next location. The search is stopped when a termination criterion, e.g. a specific objective function value, is satisfied.

Research has shown that the characteristics of the response surface created by conceptual rainfall-runoff models are usually not suitable for application of local optimization methods (Duan et al., 1992), since the presence of multiple optima often leads to premature convergence of the optimization at a local optimum. Global optimization methods, working with parameter or model populations (i.e. parameter set / model structure combinations), have therefore been introduced. Popular approaches include population evolution (e.g. Genetic Algorithms (GA), Wang, 1991; and the Shuffled Complex Evolution algorithm (SCE-UA), Duan et al., 1992), and adaptive random search methods (e.g. Price, 1987).

However, various researchers have found that different parameter sets, often widely distributed in the feasible parameter space, lead to similar model performance with respect to a certain objective function (e.g. Johnston and Pilgrim, 1976; Beven and Binley, 1992; amongst others). These findings have led some researchers to the conclusion that the idea of 'point identifiability', i.e. a global optimum, is not feasible in the presence of errors in data and model structure, and limitations in parameter estimation procedures (e.g. Spear and Hornberger, 1980; Van Straten and Keesman, 1991; Beven and Binley, 1992). Instead, these researchers advocate the identification of 'behavioural' parameter populations, i.e. parameter sets performing better than a certain threshold. All parameter sets that are classified as behavioural are considered to be possible representations of the natural system under investigation (Van Straten and Keesman, 1991). Monte-Carlo sampling procedures (Press et al., 1992) such as importance sampling or Markov chain sampling (e.g. Kuczera and Parent, 1998) are sometimes used as the basis to explore the feasible parameter space in order to identify potential models. Importance sampling based on a uniform prior distribution is most often applied (e.g. Beven and Binley, 1992; Freer et al., 1996), i.e. assuming no prior knowledge of parameter values other than boundary values. The drawback of this sampling approach is its requirement for a large number of model runs (Spear, 1993); a minimal sketch of such a sampling experiment is given below.

Gupta et al. (1998) point out that, in the presence of (unavoidable) model structural error, a range of parameter sets is required to adequately simulate all response modes of natural systems. Single parameter sets will favor specific response features, e.g. peak or low flows. This leads to the conclusion that a multi-objective optimization problem exists even for single output model structures. The implications of this observation are discussed further below.

In those cases where a TF model is selected as the routing component, structure identification and parameter estimation is performed using the SRIV method of system identification (Young, 1985; Jakeman et al., 1990; Young, 1992). Since the TF model is linear, the standard least squares normal equations can be used to calculate the optimum parameter estimates under a number of assumptions relating to the form of the random inputs to the system. The TF structure is identified by fitting a large number of different model structures, followed by an assessment of model performance versus parameter identifiability using an extension of the Akaike Information Criterion (AIC; Akaike, 1974) termed Young's Information Criterion (YIC; Young et al., 1996). This statistical assessment is combined with an assessment of the physical validity of the model in an approach termed data-based mechanistic modeling (Young, 1992; Young et al., 1996; Lees, 2000). In the rainfall-runoff modeling case this means that a TF structure is only accepted if it can be interpreted as a combination of linear stores in series and/or parallel.
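The following Python sketch illustrates the Monte-Carlo sampling experiment described above: uniform sampling of the feasible parameter space, evaluation of each parameter set against observed flow, and classification into behavioural and non-behavioural populations. The callable `model`, the bounds, and the threshold are illustrative assumptions, not the RRMT interface.

```python
import numpy as np

def monte_carlo_behavioural(model, bounds, obs, n_samples=10_000,
                            threshold=0.5, seed=None):
    """Uniform Monte-Carlo sampling with a behavioural threshold.

    model     : callable mapping a parameter vector to a simulated flow series
    bounds    : list of (lower, upper) pairs, one per parameter
    obs       : observed flow series
    threshold : Nash-Sutcliffe efficiency above which a set is 'behavioural'
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    thetas = rng.uniform(lo, hi, size=(n_samples, len(lo)))
    nse = np.empty(n_samples)
    denom = np.sum((obs - obs.mean()) ** 2)
    for i, theta in enumerate(thetas):
        sim = model(theta)
        nse[i] = 1.0 - np.sum((obs - sim) ** 2) / denom
    return thetas, nse, thetas[nse >= threshold]   # behavioural population
```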
4.6 OBJECTIVE FUNCTIONS

The performance of a model, i.e. a parameter set and model structure combination, is typically judged using an objective function, usually in combination with visual inspection of the calculated hydrograph. Objective functions aggregate the model residuals, i.e. the part of the observed flow not reproduced by the model, which can be calculated using,

\varepsilon(t|\theta) = y(t) - \hat{y}(t|\theta) \quad (18)

where ŷ(t|θ) is the calculated flow at time step t using the parameter set θ, y(t) is the observed flow at time step t, and ε(t|θ) is the resulting residual at time step t using parameter set θ. The task is then to minimize the size of the objective function. A variety of functions are available in the RRMT (Table 1), which can be used to evaluate different aspects of a model's performance. The most commonly utilized objective functions in hydrological modeling are based on the Simple Least Squares (SLS) function,

SLS(\theta) = \sum_{t=1}^{N} \varepsilon(t|\theta)^2 \quad (19)

where N is the number of flow values available. The SLS function is the maximum likelihood estimator when the following assumptions about the residuals cannot be rejected (Troutman, 1985; Yapo et al., 1996; Gershenfeld, 1999): (1) the residuals are independent and identically distributed (i.i.d.); (2) the residual distribution has homogeneous variance; (3) the residual distribution follows a normal distribution with mean zero.

Table 1. Objective functions available in the RRMT.

Nash-Sutcliffe Efficiency¹ (NSE):
NSE(\theta) = 1 - \sum_{t=1}^{N} \big(y(t) - \hat{y}(t|\theta)\big)^2 \Big/ \sum_{t=1}^{N} \big(y(t) - \bar{y}\big)^2

Root Mean Square Error (RMSE):
RMSE(\theta) = \left[ \frac{1}{N} \sum_{t=1}^{N} \big(y(t) - \hat{y}(t|\theta)\big)^2 \right]^{1/2}

Heteroscedastic Maximum Likelihood Estimator² (HMLE):
HMLE = \min_{\theta,\lambda} \; \frac{\frac{1}{N}\sum_{t=1}^{N} w(t)\,\big(y(t) - \hat{y}(t|\theta)\big)^2}{\left[\prod_{t=1}^{N} w(t)\right]^{1/N}}, \qquad w(t) = y(t)^{2(\lambda - 1)}

Bias (BIAS):
BIAS(\theta) = \frac{1}{N} \sum_{t=1}^{N} \big(y(t) - \hat{y}(t|\theta)\big)

Deviation of Runoff Volumes³ (DV):
DV(\theta) = \sum_{t=1}^{N} \big(\hat{y}(t|\theta) - y(t)\big) \Big/ \sum_{t=1}^{N} y(t)

RMSE Response Modes⁴ (FD, FQ, FS): RMSE of selected time steps
RMSE Horizontal Segmentation (FH, FM, FL): RMSE of selected time steps
RMSE Warming Up Period (FWU): RMSE of selected time steps

¹ Nash and Sutcliffe (1970). ² Sorooshian and Dracup (1980), Yapo et al. (1996). ³ American Society of Civil Engineers (1993). ⁴ Boyle et al. (2000).
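A few of the Table 1 measures in Python, as a minimal sketch (vectorized with NumPy; for the HMLE, λ would in practice be optimized jointly with the model parameters, and strictly positive flows are assumed):

```python
import numpy as np

def nse(y, y_hat):
    """Nash-Sutcliffe Efficiency."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def rmse(y, y_hat):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def bias(y, y_hat):
    """Mean residual."""
    return np.mean(y - y_hat)

def hmle(y, y_hat, lam):
    """Heteroscedastic Maximum Likelihood Estimator for a given lambda.
    The geometric mean of the weights is computed as exp(mean(log w));
    assumes y > 0 throughout."""
    w = y ** (2.0 * (lam - 1.0))
    return np.mean(w * (y - y_hat) ** 2) / np.exp(np.mean(np.log(w)))
```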

The analysis of the characteristics of the residual distribution is also an important step in evaluating the suitability of a model structure (see for example Yapo et al., 1996; Mroczkowski et al., 1997; or Ljung, 1999 for details). "If a fit produces residuals consistent with the random error assumptions, then the model has extracted all useful information from the data leaving only noise in the residuals" (Mroczkowski et al., 1999). Graphical tests can be applied to evaluate the assumptions made about the characteristics of the residuals (Draper and Smith, 1981; Kuczera, 1983). A number of different plots are available in the RRMT to facilitate this evaluation: (1) plotting the residuals versus the calculated runoff reveals whether the variance of the residuals increases with increasing flow values, i.e. the problem of heteroscedasticity; (2) plots of residuals versus time reveal long term effects (trends) or dependency in time; (3) normal probability plots can be used to indicate how close the residual distribution is to a normal distribution; and (4) calculating (and plotting) the autocorrelation coefficients allows users to assess the correlation of the residuals in time. If the assumption of zero mean cannot be rejected, the autocorrelation coefficient of the residuals is described as (Scholz, 1995)

r(\tau) = \frac{\sum_{t=1}^{N-\tau} \varepsilon(t|\theta)\,\varepsilon(t+\tau|\theta)}{\sum_{t=1}^{N-\tau} \varepsilon(t|\theta)^2} \quad (20)

where τ is a lag (τ = 0, 1, 2, ..., N-1), and r(τ) ranges from minus one to plus one. The 95% confidence intervals can be calculated as follows,

|r(\tau)| < \frac{1.96}{\sqrt{N}} \quad (21)

where N is the number of data points available. Not more than 5% of the residuals should lie outside these limits. The assumption of no autocorrelation, i.e. independence in time, is often not satisfied in rainfall-runoff modeling applications. Residuals at small temporal discretizations, e.g. daily, are usually related over a number of time steps. Sorooshian and Dracup (1981), and Kuczera (1983) describe consequences and corrective measures if this assumption is violated. Another problem often encountered in rainfall-runoff modeling is that the residual variance increases with increasing flow values, i.e. the assumption of homoscedasticity cannot be justified (Sorooshian and Dracup, 1981). In such cases, the variance can be stabilized through transformation of the simulated and observed flow data, or by the use of a weighted least-squares objective function (Kottegoda and Rosso, 1997).
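A Python sketch of the autocorrelation diagnostic of Eqs. (20)-(21) (the residual array and maximum lag are the only inputs; names are illustrative):

```python
import numpy as np

def residual_autocorrelation(eps, max_lag):
    """Autocorrelation r(tau) of the residuals, Eq. (20), together with
    the 95% confidence band of Eq. (21). No more than about 5% of the
    coefficients should fall outside +/- the band."""
    eps = np.asarray(eps, dtype=float)
    N = len(eps)
    r = np.array([
        np.sum(eps[:N - tau] * eps[tau:]) / np.sum(eps[:N - tau] ** 2)
        for tau in range(max_lag + 1)
    ])
    band = 1.96 / np.sqrt(N)
    return r, band
```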

A Box-Cox transformation (Kottegoda and Rosso, 1997), which can be written in the following form, is a useful transformation in this regard,

q^{(\lambda)} = \begin{cases} \dfrac{q^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \log(q) & \text{for } \lambda = 0 \end{cases} \quad (22)

where q is the flow and λ is the transformation parameter. Sorooshian and Dracup (1981) use this transformation in their Heteroscedastic Maximum Likelihood Estimator (HMLE; Table 1, see also Sorooshian et al., 1993). In this objective function, the parameter λ is estimated simultaneously with the model parameters. The HMLE is the maximum likelihood estimator when assumptions (1) and (3) are valid, and the residual distribution has mean zero and heteroscedastic variance, i.e. the variance is a function of the flow. Both Box-Cox transformations and the use of the HMLE are available options in the RRMT.

Another problem, already briefly indicated, occurs in terms of the selection of an appropriate objective function when automatic search algorithms are applied. The drawback of single-criterion algorithms is that the calibration result is fully dependent on one objective function (Gupta et al., 1998; Boyle et al., 2000). This can lead to an unreasonable emphasis on the fitting of a certain aspect of the response, e.g. peak flows, whilst neglecting the model performance with regard to another aspect, e.g. low flows (Fig. 6). Hydrological models are typically not capable of fitting all system response modes with a single parameter set, due to the presence of structural errors. This can lead to a calibration result that is not acceptable to hydrologists, therefore constraining the usefulness of automatic calibration (Boyle et al., 2000).

A multi-criteria approach is proposed by Gupta et al. (1998) to address this problem. The objective of this approach is to increase the amount of information retrieved from the model residuals to: (1) find the parameter population necessary to fit all aspects of the observed output time-series, e.g. in a first stage of a hybrid automatic-manual calibration procedure (Boyle et al., 2000); (2) increase the identifiability of the model parameters (Wheater et al., 1986; Wagener et al., 2000a); and (3) assess the suitability of the model structure to represent the natural system, i.e. identify model structural insufficiencies (Gupta et al., 1998; Boyle et al., 2000). Or, as Yan and Haan (1991) state, "the use of multiple objective criteria for parameter estimation permits more of the information contained in a data set to be used and distributes the importance of the parameter estimates among more components of the model. Additionally, the precision of some parameters may be greatly improved without an adverse impact on other parameters".
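A one-function Python sketch of Eq. (22):

```python
import numpy as np

def box_cox(q, lam):
    """Box-Cox transformation of flows, Eq. (22); used to stabilize the
    residual variance before computing a least-squares type measure."""
    q = np.asarray(q, dtype=float)
    if lam == 0.0:
        return np.log(q)
    return (q ** lam - 1.0) / lam
```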

Fig. 6. Plot showing predictions using two different parameter sets with an identical model structure. Both realizations yield similar values of the Nash-Sutcliffe Efficiency measure (0.82), but show differences in fit when the response is analyzed closely.

A simple application of multi-objective optimization is the definition of specific objective functions for water resources management. Often, a specific flow range, e.g. between the minimum environmentally acceptable flow and a maximum water supply abstraction rate, is of particular interest. Specifying an objective function measuring the performance of the model in this range, in addition to the traditional measures of performance, can help to assess the suitability of a model structure for the selected purpose (Lees and Wagener, 2000b;c). Measures available in the RRMT allow the calculation of the RMSE above (FH) and below (FL) a user-specified threshold, and within a certain flow range (FM), see Table 1. Other multiple objectives that can be selected are the RMSE based on a modified segmentation scheme suggested by Boyle et al. (2000, see application example for details), or the RMSE for a warming up period used to estimate initial conditions (FWU). The RRMT allows users to add objective functions in a modular way depending on an application's requirements.

4.7 VISUAL ANALYSIS MODULES

Different plotting options are available in the RRMT to analyze the data and the performance of the model. Examples of these plots are: (1) double mass plots; (2) observed versus calculated flow scatter diagrams, both on normal and logarithmic scales; and (3) flow duration curves and volumetric fit, etc. These plots are helpful tools that enable an assessment of model performance from different perspectives.

4.8 SUMMARY

In summary, the RRMT is a generic modeling shell that allows its user to implement, evaluate and modify lumped, conceptual or hybrid metric-conceptual model structures. A variety of moisture accounting and routing components are provided, and the addition of new structures is straightforward. Different data manipulation, optimization, and visualization options are available to calibrate and evaluate the model. The implemented approach is in accordance with the requirements outlined in the analytical framework described earlier in the text.

5. MONTE-CARLO ANALYSIS TOOLBOX

5.1 GENERAL

The detailed investigation of model performance in terms of parameter sensitivity and identifiability, the suitability of a particular model structure, and prediction uncertainty, are increasingly important parts of the modeling task. The understanding of model behaviour and performance gained increases the transparency of the modeling procedure and helps to assess the reliability of modeling results. The Monte-Carlo Analysis Toolbox (MCAT; Wagener et al., 1999; Lees and Wagener, 2000a) includes a number of analysis methods to evaluate the results of Monte-Carlo parameter sampling experiments, or of model optimization methods based on population evolution techniques, for example the Shuffled Complex Evolution algorithm (SCE-UA; Duan et al., 1992). Functions contained in the MCAT include an extension of the Regional Sensitivity Analysis (RSA; Spear and Hornberger, 1980; Hornberger and Spear, 1980) proposed by Freer et al. (1996), various components of the Generalized Likelihood Uncertainty Estimation (GLUE) method (Beven and Binley, 1992; Freer et al., 1996), options for the use of multiple objectives for model assessment (Gupta et al., 1998; Boyle et al., 2000), response surface plots, and a measure to evaluate parametric identifiability for a selected objective function or in a dynamic fashion.

5.2 SYSTEM ARCHITECTURE

The MCAT is a collection of MATLAB (Mathworks, 1996) analysis and visualization functions integrated through a graphical user interface (Fig. 7). It can also be accessed through an interface from the RRMT. Note that the MCAT is not specifically related to rainfall-runoff modeling and can be used to analyze the results of any dynamic mathematical model.

Fig. 7. System architecture of the Monte-Carlo Analysis Toolbox: a GUI giving access to class plots (GLUE), Regional Sensitivity Analysis, GLUE confidence limits, GLUE variable uncertainty, dotty plots, 2-D and 3-D surface plots, multi-objective (MO) analysis, (MO) Pareto confidence limits, dynamic identifiability analysis, a posteriori parameter distributions, (MO) parameter rankings, and (MO) normalized parameter ranges.

5.3 PARAMETER SENSITIVITY AND IDENTIFIABILITY

Sensitivity analysis is an approach to evaluating how changes in model parameters affect the model output variable(s). This information can be used to identify parameters that are not important for the reproduction of the system response, and that can therefore be subsequently fixed or removed, reducing the dimension of the calibration problem. A popular sensitivity analysis method that utilizes the results of Monte-Carlo sampling is Regional Sensitivity Analysis (RSA; Spear and Hornberger, 1980; Hornberger and Spear, 1981), which analyzes the sensitivity of different parameters without referring to a certain point in the parameter space, e.g. the most likely value for a specific parameter (Spear, 1993).

Fig. 8. Cumulative distributions of the initial (F(θ_x)), 'behavioural' (F(θ_x|B)) and 'non-behavioural' (F(θ_x|B̄)) populations for a sensitive parameter θ_i, and a (conditionally) insensitive parameter θ_j.

The RSA method starts with the Monte-Carlo sampling of N points in the feasible parameter space based on a uniform distribution. The resulting parameter population is partitioned into a behavioural (B) and a non-behavioural (B̄) group. This division can, for example, be based on the predicted state of the system (Spear and Hornberger, 1980) or on a measure of performance (Hornberger et al., 1985; Beven and Binley, 1992; amongst others). The cumulative distributions for both groups (F(θ_x|B) and F(θ_x|B̄)) are computed (Fig. 8). A separation between the curves indicates that the initial distribution F(θ_x) is divided by the classification into behavioural and non-behavioural. This indicates that the parameter θ_x is sensitive, i.e. has an important effect on the model result. The significance of the separation can be estimated using statistical tests such as the Kolmogorov-Smirnov two-sample test (Kottegoda and Rosso, 1998), and a heuristic ranking scheme can be introduced (Spear and Hornberger, 1980). However, the lack of separation between the cumulative distributions F(θ_x|B) and F(θ_x|B̄) is only a necessary, but not a sufficient, condition for insensitivity of θ_x (Spear, 1993). It is possible that it is caused by strong correlation with other parameters. Evaluation of the parameter covariance can be used to estimate whether this is the case (Hornberger and Spear, 1981; Hornberger et al., 1985). The interaction between two parameters can also be investigated in the MCAT by plotting their response surface with respect to one or different objective functions.

A modification of the RSA approach proposed by Freer et al. (1996) is implemented in the MCAT to visually inspect the sensitivity of the different parameters with respect to the selected objective function. Freer et al. (1996) split the parameter population, ranked on their objective function values, into ten groups of equal size and plot the cumulative distribution of the parameters in each group with respect to the chosen measure of performance. Differences in form and separation of the resulting curves indicate parameter sensitivity. Splitting the parameter population into ten groups, instead of just dividing it into behavioural and non-behavioural parameter sets as in the original method, avoids the selection of a threshold value and increases the information gained by the analysis. The variation in performance between the different groups can be visualized using the class plot option in the MCAT. This figure shows the response vector calculated with the best performing parameter set in each of the groups and plots them together with the observed response (if available).
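A Python sketch of the ten-group RSA variant of Freer et al. (1996), for a single parameter (the ranking direction assumes that smaller objective function values are better; reverse it for efficiency-type measures):

```python
import numpy as np

def rsa_ten_groups(theta, of, n_groups=10):
    """Split parameter sets, ranked on their objective function values,
    into equally sized groups and return the empirical cumulative
    distribution of the parameter within each group. Separation between
    the resulting curves indicates sensitivity."""
    order = np.argsort(of)                      # best (smallest) first
    groups = np.array_split(np.asarray(theta)[order], n_groups)
    curves = []
    for g in groups:
        x = np.sort(g)
        F = np.arange(1, len(x) + 1) / len(x)   # empirical CDF
        curves.append((x, F))
    return curves
```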

Fig. 9. Example of a well identified and a poorly identified parameter. The top row shows scatter plots of parameter values versus measure of performance. It has to be considered that these projections into a single parameter dimension can, however, hide some of the structure of the response surface (Beven, 1998). The bottom row shows the cumulative distribution of the best performing 10% of parameter sets and the corresponding gradients within each segment of the parameter range.

However, parameter sensitivity is only a necessary, but not a sufficient, requirement for identifiability, since values of sensitive parameters that produce an acceptable model performance can still be distributed over a relatively wide range of the feasible parameter space. A model, i.e. a parameter set θ within a certain model structure, is termed (globally) identifiable if it is possible to uniquely determine its location in the parameter space based on the model output produced. This requires the parameter set to yield a unique response vector (Mous, 1993). However, the specific characteristics of conceptual rainfall-runoff models (Duan et al., 1994; Gupta et al., 1998), the often limited information content of the available time-series, and the restrictions of single value objective functions that aggregate the response vector into a single value, often limit the success of parameter identification, as described earlier.

The RSA procedure outlined above has been extended to investigate the identifiability of a parameter. Reducing the analysis to the cumulative distribution of the best performing group derived from the approach implemented by Freer et al. (1996) allows the definition of an approximate measure of identifiability for each parameter. The cumulative distribution of a uniform distribution is a straight line. Deviations from this straight line indicate regions of higher identifiability. Splitting the feasible parameter range into segments, and calculating the gradient of the cumulative distribution in each segment, leads to an indicator of identifiability; a minimal sketch of this measure is given below. Fig. 9 shows how the gradients for a well identified and a poorly identified parameter are distributed. An example of the utility of this measure is included as part of the demonstration application described later. The identifiability measure can also be used in a dynamic way to link parameters, and therefore model components, to system response modes in an objective fashion. The flow-chart in Fig. 10 describes the steps required to perform the dynamic identifiability analysis. In short, the
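The segment-gradient identifiability measure sketched in Python (illustrative names; within a segment, the gradient of the empirical cumulative distribution is simply the fraction of the best-performing parameter sets falling in that segment):

```python
import numpy as np

def identifiability_gradients(theta_best, lower, upper, n_seg=10):
    """Gradient of the cumulative distribution of the best-performing
    parameter sets within each segment of the feasible range. A uniform
    (poorly identified) parameter gives ~1/n_seg everywhere; peaks mark
    regions of higher identifiability."""
    edges = np.linspace(lower, upper, n_seg + 1)
    counts, _ = np.histogram(theta_best, bins=edges)
    return edges, counts / counts.sum()
```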