Comparison of prediction performance of AWRA-L with other models


WATER FOR A HEALTHY COUNTRY

Comparison of prediction performance of AWRA-L with other models

Neil R. Viney, Jai Vaze, Bill Wang, Yongqiang Zhang, Ang Yang, Jamie Vleeshouwer, Avijeet Ramchurn and Andrew Frost

30 June 2013

A water information R & D alliance between the Bureau of Meteorology and CSIRO's Water for a Healthy Country Flagship

Water for a Healthy Country Flagship Report series ISSN: X

Australia is founding its future on science and innovation. Its national science agency, CSIRO, is a powerhouse of ideas, technologies and skills. CSIRO initiated the National Research Flagships to address Australia's major research challenges and opportunities. They apply large-scale, long-term, multidisciplinary science and aim for widespread adoption of solutions. The Flagship Collaboration Fund supports the best and brightest researchers to address these complex challenges through partnerships between CSIRO, universities, research agencies and industry.

The Water for a Healthy Country Flagship aims to provide Australia with solutions for water resource management, creating economic gains of $3 billion per annum by 2030, while protecting or restoring our major water ecosystems.

The work contained in this report is a collaboration between CSIRO and [list collaborators].

For more information about the Water for a Healthy Country Flagship or the National Research Flagship Initiative visit

Citation
Viney NR, Vaze J, Wang B, Zhang Y, Yang A, Vleeshouwer J, Ramchurn A and Frost A (2013) Comparison of prediction performance of AWRA-L with other models. CSIRO Water for a Healthy Country Flagship, Australia.

Copyright and disclaimer
2013 CSIRO. To the extent permitted by law, all rights are reserved and no part of this publication covered by copyright may be reproduced or copied in any form or by any means except with the written permission of CSIRO.

Important disclaimer
CSIRO advises that the information contained in this publication comprises general statements based on scientific research. The reader is advised and needs to be aware that such information may be incomplete or unable to be used in any specific situation. No reliance or actions must therefore be made on that information without seeking prior expert professional, scientific and technical advice.
To the extent permitted by law, CSIRO (including its employees and consultants) excludes all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this publication (in part or in whole) and any information or material contained in it.

Contents

Acknowledgments
Executive summary
1 Introduction
  1.1 Background to this study
  1.2 Previous model intercomparison
2 Methodology
  2.1 The models
  2.2 The catchments
  2.3 Input data
  2.4 Calibration procedure
  2.5 Evaluation procedure
3 Results
  3.1 Global calibration of peer models
  3.2 Local calibration of peer models
  3.3 Assessment against previous model intercomparison
4 Discussion
  4.1 Globally calibrated models
  4.2 Locally calibrated models
  4.3 Assessment against previous model intercomparison
5 Conclusions
References

Comparison of prediction performance of AWRA-L with other models i

Figures

Figure 1 Distribution of calibration and validation catchments
Figure 2 Cumulative distribution of daily efficiency of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models
Figure 3 Cumulative distribution of monthly efficiency of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models
Figure 4 Cumulative distribution of annual efficiency of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models
Figure 5 Cumulative distribution of raw bias of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models
Figure 6 Cumulative distribution of absolute bias of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models
Figure 7 Cumulative distribution of F value of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models
Figure 8 Cumulative distribution of daily efficiency of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models
Figure 9 Cumulative distribution of monthly efficiency of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models
Figure 10 Cumulative distribution of annual efficiency of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models
Figure 11 Cumulative distribution of raw bias of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models
Figure 12 Cumulative distribution of absolute bias of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models
Figure 13 Cumulative distribution of F value of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models
Figure 14 Cumulative distribution of daily efficiency of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models
Figure 15 Cumulative distribution of monthly efficiency of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models
Figure 16 Cumulative distribution of annual efficiency of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models
Figure 17 Cumulative distribution of raw bias of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models
Figure 18 Cumulative distribution of absolute bias of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models
Figure 19 Cumulative distribution of F value of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models
Figure 20 Cumulative distribution of daily efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models
Figure 21 Cumulative distribution of monthly efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models
Figure 22 Cumulative distribution of annual efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models
Figure 23 Cumulative distribution of raw bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models
Figure 24 Cumulative distribution of absolute bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models
Figure 25 Cumulative distribution of F value of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models
Figure 26 Cumulative distribution of daily efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances
Figure 27 Cumulative distribution of monthly efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances
Figure 28 Cumulative distribution of annual efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances
Figure 29 Cumulative distribution of raw bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances
Figure 30 Cumulative distribution of absolute bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances
Figure 31 Cumulative distribution of F value of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances
Figure 32 Cumulative distribution of F_m value of streamflow predictions in validation mode for two versions of AWRA-L and WaterDyn

Acknowledgments

This work is part of the water information research and development alliance between CSIRO's Water for a Healthy Country Flagship and the Australian Bureau of Meteorology. The authors thank Ulrike Bende-Michl, Sri Srikanthan, Wije and Fangfang Zhao for their comments on the manuscript.

Executive summary

This study assesses the ability of the most recent version of the AWRA-L model (version 3.5) to predict streamflow across Australia and compares its performance with that of several other models and modelling methods.

AWRA-L is implemented as a globally-calibrated model. That is, its Australia-wide streamflow predictions are generated using a single set of model parameters. These parameters are obtained by finding the best fit to streamflow in a set of 302 gauged calibration catchments. Model predictions are then evaluated by applying this global parameter set in an independent set of 305 gauged validation catchments.

The peer models against which AWRA-L is compared use one of two modelling approaches. The first approach uses a global calibration and validation procedure identical to the one employed for AWRA-L. In the second approach, the models are calibrated separately on each of the 302 calibration catchments to yield 302 separate sets of model parameters. Model performance is then evaluated by modelling each of the 305 validation catchments using the parameter set from the geographically nearest calibration catchment. The global calibration procedure compares predictions of the Sacramento and GR4J models against those of AWRA-L. The local calibration procedure compares predictions from the same two models, together with a third model, Springsim, against the predictions of the globally-calibrated AWRA-L.

The model intercomparisons show that in the validation catchments, AWRA-L typically provides streamflow predictions that are as good as or better than the alternatives. Furthermore, it does not suffer from some of the peer models' drawbacks, such as spatial discontinuities in streamflow generation and performance deterioration with increasing regionalisation distance. In comparison to other models, AWRA-L also has other advantages, such as the ability to predict many landscape properties and fluxes other than streamflow. It is also the only model among those tested that has been explicitly coupled to groundwater and river routing modules.

Comparison against the results of a previous model benchmarking study highlights the substantial improvements in the streamflow predictions of AWRA-L that have been achieved in the past three years.

In the light of these findings, it is recommended that AWRA-L continue to be used to support the production of the National Water Accounts and Australian Water Resources Assessments.


1 Introduction

1.1 Background to this study

The Australian Water Resources Assessment-Landscape (AWRA-L) model is a landscape hydrology model developed through the Water Information Research and Development Alliance (WIRADA) between CSIRO and the Bureau of Meteorology. The AWRA-L model (van Dijk, 2010) is designed to provide up-to-date, credible, comprehensive, accurate and relevant information about Australian water resource availability. This information is used to inform water resources management in Australia, and in particular to provide input into the Bureau of Meteorology's regular Australian Water Resources Assessments and National Water Account statements.

AWRA-L contains a number of conceptual parameters that require calibration against observed data. Typically, data such as time series of streamflow, leaf area index and soil moisture have been used in calibration (e.g., Zhang and Viney, 2012), since these variables are all predicted by the model. Calibration is performed simultaneously on a large number of catchments (302 in this study) spread across the whole of the continent to yield a single set of model parameters that applies everywhere.

AWRA-L operates on a daily time step and requires input of various daily climate variables such as precipitation, solar radiation and maximum and minimum air temperature. To date, AWRA-L has been applied using input data that is lumped at the catchment scale. That is, for each calibration catchment, there is a single time series of spatially-averaged data for each of the relevant input variables. The model then uses these data to produce its output time series at the same spatial scale. The catchments used for calibration and validation of AWRA-L range in size from 50 to 5000 km2.

The calibration and application of AWRA-L is slightly different to that of conventional rainfall-runoff models. Such models are usually calibrated against streamflow from a single catchment. Model validation is usually done against an independent part of the streamflow time series from the same catchment or against streamflow predictions of the model applied in a nearby catchment. However, conventional rainfall-runoff models can also be applied in the same manner as AWRA-L, with simultaneous calibration against a large number of observed streamflow series.

This document describes a model comparison study that benchmarks the prediction performance of AWRA-L against the performance of several peer models.

1.2 Previous model intercomparison

This study follows an earlier report by Viney (2010) that compares the streamflow prediction performance of AWRA-L with the performance of a suite of peer models. The peer models may be broadly classed in two groups. One group consists of five spatialised lumped rainfall-runoff models (AWBM, IHACRES, Sacramento, Simhyd and SMAR-G). The second group consists of two minimally-calibrated continental-scale models (WaterDyn, as implemented in AWAP, and CABLE-SLI). AWRA-L is an example of the latter group.

An important distinction between the Viney (2010) study and this study is that in the former, an uncalibrated version of AWRA-L (version 0.5) is assessed. All AWRA-L model parameters take default values and there is no optimisation of parameters to attempt a better fit between predictions and observations. In contrast, the current study assesses the performance of a calibrated AWRA-L in which parameters are optimised for streamflow prediction. In the Viney (2010) study the continental-scale models are uncalibrated, but the lumped models are calibrated. These calibrations are performed separately on each catchment and assessment of model performance is done using nearest-neighbour regionalisation.

The results presented by Viney (2010) show that the uncalibrated AWRA-L (and the continental-scale models in general) produces streamflow predictions that are inferior to those of the spatialised lumped models. Of the latter, Sacramento produces the best predictions against a range of performance metrics relevant for water resources assessment.

The study described in this report is therefore an extension of the Viney (2010) report in that it now assesses the performance of a calibrated AWRA-L. This places AWRA-L on an even footing with the peer models.

2 Methodology

2.1 The models

AWRA-L version 3.5 is used in this intercomparison. This version builds on AWRA-L version 0.5 (van Dijk, 2010) in the following ways:

- use of spatialised vegetation height and wind speed climatology rather than spatially uniform values;
- modifications to the methods for calculating capillary rise, potential evaporation and infiltration;
- use of calibration to optimise parameter values.

Predictions of the AWRA-L model are compared against those of three other models. All of these peer models are conventional rainfall-runoff models, but have been applied here in a regionalised manner.

GR4J (Perrin et al., 2003) is a simple lumped rainfall-runoff model. It has been designed with a primary focus on parsimony and has just four optimisable parameters. Sacramento (Burnash et al., 1973) is a more complex model, but remains, at its heart, a lumped rainfall-runoff model. In this study, the calibrations of Sacramento optimise 13 model parameters. Springsim (Ramchurn, 2012) is a 12-parameter rainfall-runoff model. It is designed primarily for low flow prediction, but can be applied more generally. All three models operate on a daily time step and are lumped at the catchment scale.

2.2 The catchments

Zhang et al. (2013) have collated streamflow data for a set of Australian catchments that is amenable to catchment water balance model calibration and evaluation. These catchments have areas of at least 50 km2, at least ten years of observed streamflow data and a relative absence of significant regulation and impairment such as dams, irrigation and urbanisation. Of the 780 catchments in their data set, Zhang et al. identify a non-nested subset of 607 catchments with areas less than 5000 km2. Approximately half (302) of these catchments have been randomly nominated by Zhang et al. as calibration catchments, with the remainder nominated as validation catchments (Figure 1). In this study, we use the calibration catchments for calibration and the validation catchments to assess model performance.
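The catchment selection and random split described above can be sketched in code. This is an illustrative sketch only, not the procedure of Zhang et al. (2013); the record structure and field names (`area_km2`, `record_years`, `regulated`) are invented for the example.

```python
import random

def screen_catchments(catchments, min_area_km2=50.0, max_area_km2=5000.0,
                      min_record_years=10):
    """Keep catchments meeting the area and record-length criteria and
    lacking significant regulation (hypothetical dict keys)."""
    return [c for c in catchments
            if min_area_km2 <= c["area_km2"] < max_area_km2
            and c["record_years"] >= min_record_years
            and not c["regulated"]]

def split_catchments(catchments, seed=0):
    """Randomly nominate roughly half as calibration catchments;
    the remainder become validation catchments."""
    pool = list(catchments)
    random.Random(seed).shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Example with made-up catchment records
candidates = [
    {"id": 1, "area_km2": 120.0, "record_years": 25, "regulated": False},
    {"id": 2, "area_km2": 30.0, "record_years": 40, "regulated": False},   # too small
    {"id": 3, "area_km2": 900.0, "record_years": 8, "regulated": False},   # short record
    {"id": 4, "area_km2": 6000.0, "record_years": 30, "regulated": False}, # too large
    {"id": 5, "area_km2": 450.0, "record_years": 15, "regulated": True},   # regulated
    {"id": 6, "area_km2": 200.0, "record_years": 12, "regulated": False},
]
usable = screen_catchments(candidates)
calib, valid = split_catchments(usable)
```

In the real study the screened pool contains 607 catchments; the sketch simply shows the two-step filter-then-split logic.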

Figure 1 Distribution of calibration and validation catchments

2.3 Input data

For both calibration and evaluation, the input data for all the models are lumped at the catchment scale. The streamflow data used in this study were collated by the Bureau of Meteorology from the collections of the various state agencies. Other input data required by the models and supplied by the Bureau of Meteorology include daily precipitation, solar radiation and maximum and minimum air temperatures. The non-climatic input data required by AWRA-L include information on forest cover and soil properties.

2.4 Calibration procedure

2.4.1 GLOBAL CALIBRATION

All the models in this study are calibrated using a consistent methodology and against a consistent set of observational data. For the globally-calibrated models, calibration is done simultaneously on all 302 calibration catchments to yield a single set of model parameters that applies in all catchments. For each catchment we calculate the function

F = (E_d + E_m)/2 - 5 |ln(1 + B)|^2.5

where E_d and E_m are respectively the Nash-Sutcliffe efficiencies of daily and monthly streamflow, and B is the bias (total prediction error divided by total observed streamflow). The coefficients of this equation control the severity and shape of the resulting bias constraint penalty (Viney et al., 2009a). The function F can take a value between one (for a perfect fit) and minus infinity. The objective function is then taken as the mean of the 25th, 50th, 75th and 100th percentiles of the F values of the 302 calibration catchments. This objective function value is maximised in calibration.

2.4.2 LOCAL CALIBRATION

Local calibration follows the same procedure as global calibration, except that there is only a single F value for each catchment and therefore no opportunity or need to average the quartiles. The objective function for each catchment is the F function and it is maximised in calibration. Each of the 302 calibration catchments is calibrated separately to yield 302 distinct parameter sets.

2.5 Evaluation procedure

For both globally-calibrated and locally-calibrated models, the evaluation procedure is the same. Streamflow predictions are made for the 305 validation catchments, using either the global parameter set or the parameter set from the nearest calibration catchment. For each catchment we calculate the following metrics:

- daily Nash-Sutcliffe efficiency
- monthly Nash-Sutcliffe efficiency
- annual Nash-Sutcliffe efficiency
- raw bias (total prediction error divided by total observed streamflow)
- absolute bias (absolute value of raw bias)
- F value.

The F value is a combination of daily and monthly efficiency and absolute bias. As such, as well as being a good choice for the objective function in calibration, it is also a convenient measure of overall prediction performance in validation.
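The metrics and the global objective can be expressed compactly. The sketch below is illustrative: it assumes the bias penalty takes the form 5|ln(1 + B)|^2.5 used in the Viney et al. (2009a) objective function, and the helper names and the simple rank-based percentile are our own.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var

def raw_bias(obs, sim):
    """Total prediction error divided by total observed streamflow."""
    return (sum(sim) - sum(obs)) / sum(obs)

def f_value(e_d, e_m, bias):
    """F = (Ed + Em)/2 - 5 |ln(1 + B)|^2.5; equals 1 for a perfect fit."""
    return (e_d + e_m) / 2.0 - 5.0 * abs(math.log(1.0 + bias)) ** 2.5

def global_objective(f_values):
    """Mean of the 25th, 50th, 75th and 100th percentiles of the F values
    (simple rank-based percentile; a production version might interpolate)."""
    ranked = sorted(f_values)
    n = len(ranked)
    def pct(p):
        return ranked[min(n - 1, max(0, round(p / 100.0 * n) - 1))]
    return sum(pct(p) for p in (25, 50, 75, 100)) / 4.0

# A perfect fit (Ed = Em = 1, B = 0) gives F = 1, consistent with the text
obs = [1.0, 2.0, 4.0, 3.0]
f_perfect = f_value(nse(obs, obs), nse(obs, obs), raw_bias(obs, obs))
```

The penalty term is zero at B = 0 and grows rapidly with bias in either direction, which is what makes F useful as a bias-constrained efficiency measure.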
In Section 3 the values of each of these metrics on all 302 calibration catchments or on all 305 validation catchments are combined into a single curve of non-exceedance probability. At each value of the metric (the y-axis), the non-exceedance probability (x-axis) shows the proportion of catchments with a lower metric value. Better models are shown by higher non-exceedance curves for the efficiency metrics and for the F value. For bias, the better models have lower non-exceedance curves of absolute bias values and raw bias curves that are closer to zero.

2.5.1 IMPACT OF REGIONALISATION DISTANCE

In nearest-neighbour regionalisation, the regionalisation distance (that is, the distance between the centroids of the target (validation) catchment and the nearest donor (calibration) catchment) ranges from 6 km to 265 km, with a median of 25 km. About 84% of the validation catchments are within 50 km of the nearest calibration catchment, while 99% of them are within 200 km.

In a separate assessment of the performance of locally-calibrated models, we first artificially extend the minimum regionalisation distance to 50 km, by choosing to model each validation catchment using parameters from the nearest catchment that is at least 50 km away (centroid to centroid). This results in a median regionalisation distance of 58 km. In a second analysis, we use a minimum regionalisation distance of 200 km, which results in a median of 206 km.
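The donor-selection rule with a minimum regionalisation distance can be sketched as follows. This is an illustrative sketch using straight-line distances between hypothetical centroid coordinates given in km; the record structure is invented for the example.

```python
import math

def distance_km(a, b):
    """Straight-line distance between two centroids given as (x, y) in km."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_donor(target, donors, min_distance_km=0.0):
    """Return the nearest calibration (donor) catchment whose centroid is
    at least min_distance_km from the target catchment's centroid."""
    eligible = [(distance_km(target["centroid"], d["centroid"]), d)
                for d in donors]
    eligible = [(dist, d) for dist, d in eligible if dist >= min_distance_km]
    if not eligible:
        raise ValueError("no donor satisfies the minimum distance")
    return min(eligible, key=lambda pair: pair[0])[1]

# Hypothetical centroids: donor A is 10 km away, donor B is 60 km away
target = {"id": "V1", "centroid": (0.0, 0.0)}
donors = [{"id": "A", "centroid": (10.0, 0.0)},
          {"id": "B", "centroid": (60.0, 0.0)}]

nearest = nearest_donor(target, donors)            # unconstrained: picks A
constrained = nearest_donor(target, donors, 50.0)  # 50 km minimum: picks B
```

Raising `min_distance_km` forces each validation catchment to borrow parameters from a more distant donor, which is exactly the artificial-extension experiment described above.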

2.5.2 COMPARISON WITH PREVIOUS INTERCOMPARISON STUDY

The earlier model intercomparison by Viney (2010) assesses the prediction performance of five spatialised lumped runoff models and two continental-scale models. The three locally-calibrated models assessed in this report are broadly representative of the five lumped models in the earlier report. The two continental-scale models in Viney (2010) are WaterDyn, which is implemented in AWAP (Raupach et al., 2008) and depicted in the figures and tables of Viney (2010) as AWAP, and CABLE (Kowalczyk et al., 2006). Also assessed in Viney (2010) is the performance of an uncalibrated version of AWRA-L (version 0.5).

In this report we revisit the assessment of prediction performance of WaterDyn and the uncalibrated AWRA-L and compare them with the performance of the current version of AWRA-L (version 3.5). This new assessment applies both WaterDyn and AWRA-L v0.5 to the same set of 305 validation catchments as used in the rest of this report. However, output from WaterDyn is only available at monthly time steps, so a modified version of the F value is used in this assessment. The modified F value is given by

F_m = E_m - 5 |ln(1 + B)|^2.5

and like the F value, it is a convenient measure of overall model performance.
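For monthly-only output, the modified metric simply drops the daily term. A minimal sketch, assuming the bias penalty takes the same 5|ln(1 + B)|^2.5 form as the full F value:

```python
import math

def f_m(e_m, bias):
    """Fm = Em - 5 |ln(1 + B)|^2.5, for models with monthly-only output.

    e_m is the monthly Nash-Sutcliffe efficiency and bias is the raw bias B.
    A perfect monthly fit (Em = 1, B = 0) gives Fm = 1.
    """
    return e_m - 5.0 * abs(math.log(1.0 + bias)) ** 2.5
```

Because the penalty term is identical, F_m ranks models on the same bias-constrained basis as F, just without rewarding daily-scale skill.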

3 Results

3.1 Global calibration of peer models

3.1.1 CALIBRATION PERFORMANCE

This section assesses the performances of globally-calibrated versions of AWRA-L, GR4J and Sacramento. Calibration performances are shown in Figures 2-7. Sacramento has better daily efficiencies in calibration than either of the other two models (Figure 2), while both Sacramento and AWRA-L have better monthly efficiencies than GR4J (Figure 3). Annual efficiencies are similar among the three models, except that AWRA-L appears better in the poorly-modelled catchments (Figure 4). On the bias metrics, AWRA-L clearly outperforms the other two (Figures 5 and 6). When the daily and monthly efficiencies are combined with bias to yield a combined metric (Figure 7), AWRA-L again performs best.

Figure 2 Cumulative distribution of daily efficiency of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models.

Figure 3 Cumulative distribution of monthly efficiency of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models.

Figure 4 Cumulative distribution of annual efficiency of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models.

Figure 5 Cumulative distribution of raw bias of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models.

Figure 6 Cumulative distribution of absolute bias of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models.

Figure 7 Cumulative distribution of F value of streamflow predictions in calibration mode for AWRA-L and two globally-calibrated models.

3.1.2 VALIDATION PERFORMANCE

When the parameter values calibrated for the globally-calibrated models are applied in the validation catchments, there is little, if any, deterioration in model performance (Figures 8-13). For example, from calibration to validation, median values of daily efficiency increase slightly for all three models, while monthly efficiencies are approximately identical and annual efficiencies decrease slightly. Both absolute bias and F value worsen slightly for GR4J, improve slightly for Sacramento and are steady for AWRA-L. Reflecting these similarities between calibration and validation modes for the globally-calibrated models, Sacramento has the best validation performance for daily efficiency (Figure 8), while AWRA-L has the best bias and F value statistics (Figures 11-13). Of the two peer models, GR4J is better for annual efficiency, but Sacramento is better according to all other metrics.

Figure 8 Cumulative distribution of daily efficiency of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models.

Figure 9 Cumulative distribution of monthly efficiency of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models.

Figure 10 Cumulative distribution of annual efficiency of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models.

Figure 11 Cumulative distribution of raw bias of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models.

Figure 12 Cumulative distribution of absolute bias of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models.

Figure 13 Cumulative distribution of F value of streamflow predictions in validation mode for AWRA-L and two globally-calibrated models.

3.2 Local calibration of peer models

3.2.1 CALIBRATION PERFORMANCE

In this section the globally-calibrated AWRA-L is compared with locally-calibrated versions of GR4J, Sacramento and Springsim. The curves for AWRA-L in Figures 14-19 are the same as those shown in Figures 2-7. Unsurprisingly, the locally-calibrated models substantially outperform AWRA-L in calibration. Springsim provides the best calibration performance according to all metrics.

Figure 14 Cumulative distribution of daily efficiency of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models.

Figure 15 Cumulative distribution of monthly efficiency of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models.

Figure 16 Cumulative distribution of annual efficiency of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models.

Figure 17 Cumulative distribution of raw bias of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models.

Figure 18 Cumulative distribution of absolute bias of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models.

Figure 19 Cumulative distribution of F value of streamflow predictions in calibration mode for AWRA-L and three locally-calibrated models.

3.2.2 VALIDATION PERFORMANCE

There is a substantial deterioration in performance of the locally-calibrated models when calibrated parameters are applied to the validation catchments. Here, each validation catchment is modelled using parameters from the (geographically) nearest calibration catchment. The predictions remain generally superior to those of the globally-calibrated AWRA-L in terms of daily efficiency (Figure 20), except for some of the poorly-modelled catchments where AWRA-L's predictions are better than those of Sacramento and Springsim. For monthly and annual efficiencies (Figures 21 and 22), there is little difference in performance among the four models, while AWRA-L has slightly better bias statistics (Figures 23 and 24). According to the F values, AWRA-L is slightly better than the other three models in poorly-modelled catchments, but slightly worse in better-modelled catchments (Figure 25). Among the three locally-calibrated models, GR4J and Sacramento have slightly better validation statistics than Springsim. In particular, GR4J has the best daily efficiencies (Figure 20).

Figure 20 Cumulative distribution of daily efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models.

Figure 21 Cumulative distribution of monthly efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models.

Figure 22 Cumulative distribution of annual efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models.

Figure 23 Cumulative distribution of raw bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models.

Figure 24 Cumulative distribution of absolute bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models.

Figure 25 Cumulative distribution of F value of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models.

3.2.3 IMPACT OF REGIONALISATION DISTANCE

As noted in the previous section, the performances of the locally-calibrated models degrade significantly between calibration and validation using nearest-neighbour regionalisation. In this section we assess the impact of regionalisation distance on the performance of the locally-calibrated models in validation mode. This is achieved by artificially increasing the minimum regionalisation distance firstly to 50 km and then to 200 km.

The validation results for these two analyses are presented in Figures 26-31. Also shown for comparison in Figures 26-31 are the respective validation curves for AWRA-L. These AWRA-L curves are identical to those presented in Figures 8-13 and also in Figures 20-25.

Comparison of Figures 26-31 with Figures 20-25 indicates that the validation performance of the locally-calibrated models degrades significantly when the minimum regionalisation distance is increased to 50 km. There is further significant deterioration when the minimum regionalisation distance is increased to 200 km. The degree of deterioration appears to be greatest for Springsim and least for GR4J.

For daily efficiency, the globally-calibrated AWRA-L provides better predictions in the poorly-modelled catchments than each of the locally-calibrated models when a 50 km limit is implemented (Figure 26). When a 200 km limit is used, AWRA-L's daily efficiency performance is better than that of the other models in all but a few well-modelled catchments. For all other metrics, AWRA-L's performance is clearly better than that of all the locally-calibrated models for both regionalisation distances (Figures 27-31).

Figure 26 Cumulative distribution of daily efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances.

Figure 27 Cumulative distribution of monthly efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances.

Figure 28 Cumulative distribution of annual efficiency of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances.
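The monthly and annual efficiencies in the captions above are computed on flows aggregated from the daily series. A minimal sketch, assuming the efficiency metric is the conventional Nash-Sutcliffe efficiency and using a fixed 30-day block as a simple stand-in for calendar months (the helper names are illustrative):

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations.
    1.0 is a perfect fit; 0.0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var

def aggregate(daily, period=30):
    """Sum a daily series into consecutive blocks of `period` days
    (a stand-in for calendar-month or annual totals)."""
    return [sum(daily[i:i + period]) for i in range(0, len(daily), period)]

def monthly_nse(obs_daily, sim_daily, period=30):
    """Efficiency of the aggregated series, as in Figures 27 and 28."""
    return nse(aggregate(obs_daily, period), aggregate(sim_daily, period))
```

Aggregation smooths out timing errors in the daily simulation, which is why a model can score substantially better on monthly and annual efficiency than on daily efficiency.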

Figure 29 Cumulative distribution of raw bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances.

Figure 30 Cumulative distribution of absolute bias of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances.

Figure 31 Cumulative distribution of F value of streamflow predictions in validation mode for AWRA-L and three locally-calibrated models for varying regionalisation distances.

3.3 Assessment against previous model intercomparison

In this section we revisit the assessment of the prediction performance of WaterDyn and the uncalibrated AWRA-L and compare them with the performance of the current version of AWRA-L (version 3.5). A comparison of F_m values for WaterDyn and the two versions of AWRA-L is shown in Figure 32. It is clear that the predictions of the current version of AWRA-L are vastly superior to those of both the uncalibrated AWRA-L and WaterDyn. For both AWRA-L v0.5 and WaterDyn, the majority of catchments have F_m values of less than zero (57% and 59%, respectively). In contrast, only 31% of catchments have F_m values of less than zero for AWRA-L v3.5.
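The curves compared throughout this report are empirical cumulative distributions of a skill metric across catchments, and the comparison above summarises them by the fraction of catchments with F_m below zero. A sketch of both computations (the sample values in the usage note are invented, not taken from the report's results):

```python
def empirical_cdf(values):
    """Return (sorted metric values, cumulative fraction of catchments
    at or below each value) - the form plotted in the report's figures."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def fraction_below(values, threshold=0.0):
    """Fraction of catchments whose metric falls below `threshold`,
    e.g. the share of catchments with F_m < 0."""
    return sum(1 for v in values if v < threshold) / len(values)
```

For example, `fraction_below([-0.5, -0.1, 0.2, 0.6])` gives 0.5: half of those four hypothetical catchments score below zero.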

Figure 32 Cumulative distribution of F_m value of streamflow predictions in validation mode for two versions of AWRA-L and WaterDyn.

4 Discussion

4.1 Globally calibrated models

Because it is a model that, through its use of separate hydrological response units (HRUs) for forest and grassland, explicitly includes the impacts of spatial variability in land cover, AWRA-L is designed to be deployed in a globally-calibrated fashion. That is, it is designed to use a single set of model parameters to model a large spatial domain. In this study, that spatial domain extends to cover the whole of Australia. Although it could be applied in a locally-calibrated mode, its use of multiple HRUs means that there would be considerable parameter uncertainty and redundancy. Local calibration of AWRA-L should only be attempted using a judiciously chosen subset of its routinely optimised parameters. Such a calibration strategy has not been used in this study. As such, the most relevant comparisons against peer models are those with other globally-calibrated models.

Section 3.1 has shown that AWRA-L's prediction performance is superior to that of the globally-calibrated GR4J in both calibration and validation and for all metrics except the daily efficiency of the better-modelled catchments. AWRA-L's performance is similar to that of the globally-calibrated Sacramento, with slightly better bias and slightly worse daily efficiency. The observation that AWRA-L has worse daily efficiencies than GR4J and Sacramento, particularly for the better-modelled catchments, but is better than those two models on other metrics, suggests that there may be scope for improvement in those aspects of AWRA-L that control the reproduction of daily streamflow variability. Such improvement might involve improved conceptualisation of high-flow responses or better spatialisation of non-optimised landscape attributes.
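Global calibration in the sense used above optimises one shared parameter vector against every calibration catchment at once. A schematic sketch of such an objective follows; the simple averaging of per-catchment scores is one plausible choice made here for illustration, whereas the objective actually used for AWRA-L is a multi-objective formulation described in Zhang and Viney (2012). The model and metric are passed in as placeholders.

```python
def global_objective(params, catchments, run_model, score):
    """Mean skill across all calibration catchments for ONE shared
    parameter set; a global calibration maximises this quantity.

    Each entry in `catchments` is an (observed_flow, forcing) pair.
    `run_model(params, forcing)` returns a simulated flow series and
    `score(obs, sim)` returns a skill value (higher is better).
    All names are illustrative placeholders.
    """
    scores = [score(obs, run_model(params, forcing))
              for obs, forcing in catchments]
    return sum(scores) / len(scores)
```

Because the optimiser trades performance off across many catchments simultaneously, no single catchment fits as well as it would under local calibration, but the resulting parameter set transfers to ungauged locations without any regionalisation step.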
In principle, this similarity in performance between AWRA-L and Sacramento suggests that the streamflow components of the National Water Accounts and Australian Water Resources Assessments could be modelled as accurately using Sacramento. A modelling strategy using Sacramento would have the advantage of requiring less input data than AWRA-L. However, AWRA-L has at least two other significant advantages. Firstly, it models not only streamflow, but also several other quantities that are relevant to the accounts and assessments, including soil moisture, recharge and evapotranspiration. In particular, AWRA-L has been designed to ensure that its shallow soil water store is representative of the part of the soil profile whose moisture content can be estimated remotely by passive microwave sensors. This not only provides an alternative means of validating AWRA-L's predictions, but also means that AWRA-L is amenable to assimilation of observed moisture contents into the modelling framework. Work on soil moisture assimilation into AWRA-L is ongoing and shows promising results (Renzullo et al., 2013). Unlike the other models, AWRA-L also includes a dynamic vegetation growth algorithm that reproduces daily changes in vegetation density. The inclusion of dynamic vegetation cover prediction means that AWRA-L is likely to cope better with the slow-response changes to vegetation cover that might accompany climate change. Secondly, unlike Sacramento, AWRA-L has been explicitly coupled with groundwater and river routing modules, AWRA-G (Jöhnk et al., 2013) and AWRA-R (Lerat et al., 2013) respectively.

4.2 Locally calibrated models

This study has also compared the globally-calibrated streamflow predictions of AWRA-L with the nearest-neighbour regionalisation approach for locally-calibrated models (Section 3.2). In the calibration catchments, the predictions of the locally-calibrated models are vastly superior to those of AWRA-L.
This is not unexpected, since locally-tuned parameters are always likely to produce better predictions (assuming robust optimisation) than parameters that are tuned simultaneously to multiple catchments.

The real test of the locally-calibrated models is how accurately they can predict in ungauged catchments. To assess this we use nearest-neighbour regionalisation in the validation catchments: each of the validation catchments is modelled using parameters calibrated in the nearest calibration catchment (Section 3.2.2). Under this comparison, the predictions of AWRA-L are broadly similar to those of the locally-calibrated models. As was observed with the global model comparison, AWRA-L's predictions tend to be better than those of the other models in bias, but worse in daily efficiency.

Again, these results suggest that the locally-calibrated models using nearest-neighbour regionalisation could potentially be used for the National Water Accounts and Australian Water Resources Assessments. Such an approach could certainly bring benefits in those catchments that are used for calibration, where the locally-calibrated models provide much better predictions than the globally-calibrated AWRA-L. However, the proportion of Australia with gauged streamflow that is suitable for calibrating a streamflow model is relatively small. The calibration and validation catchments depicted in Figure 1 comprise just 5.1% of the area of Australia. Even in a heavily gauged region like the Murray-Darling Basin, the 155 gauged calibration and validation catchments cover just 7.4% of the basin. This means that under a locally-calibrated approach, the large majority of the continent would still have to be modelled using regionalisation of model parameters.

The use of nearest-neighbour regionalisation brings two significant disadvantages. The first is that it generates spatial discontinuities in streamflow generation at the boundaries between different parameter values. This, in turn, leads to an unnatural tessellated effect in maps of streamflow generation. The second disadvantage relates to regionalisation distance.
In the main, the validation catchments used in this study are reasonably close to their donor catchments: the median centroid separation is 25 km. Despite this, the results presented in Sections 3.2.1 and 3.2.2 indicate that there is a significant degradation in prediction performance for the locally-calibrated models between calibration and nearest-neighbour regionalisation. This indicates that parameters calibrated in one catchment are not necessarily appropriate for use in other catchments, even adjacent ones. In Section 3.2.3, it is shown that this performance degradation is exacerbated when the regionalisation distance is artificially increased. Even for a minimum regionalisation distance of 50 km, this degradation ensures that the ensuing validation predictions are significantly poorer than those of AWRA-L.

The important distinction to be made here is that a globally-calibrated model like AWRA-L does not suffer from a distance-related performance degradation. This is shown clearly in Section 3.1, where there is no deterioration in prediction performance between the calibration and validation catchments. Therefore, application of a globally-calibrated AWRA-L in a catchment thousands of kilometres from the nearest calibration catchment is likely to yield predictions of similar quality to those in a catchment adjacent to a calibration catchment. In fact, the prediction quality is likely to be similar to that in the calibration catchment itself.

There are, however, ways to improve the performance of the locally-calibrated models that have not been tested in this study. One is donor averaging: instead of modelling a validation catchment using parameters from the nearest calibration gauge, one could average the predictions obtained from multiple sets of nearby parameters. Research by Viney et al.
(2009b) suggests that the optimal number of parameter sets is about five, and that the use of this output-averaging method can yield significant improvements in predictions in ungauged catchments. In a sense, this method is intermediate between nearest-neighbour regionalisation and global calibration, and combines some of the best characteristics of both. It retains an element of proximity in the regionalisation scheme, while at the same time cushioning and broadening the calibration dependence beyond a single, possibly anomalous, calibration catchment. It is also likely to reduce, though not eliminate, the problem of spatial discontinuities in streamflow generation. However, it is not clear to what extent it might ameliorate the prediction degradation associated with increasing regionalisation distances.
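The output-averaging scheme described above (run the model with the parameter sets of the several nearest donors and average the resulting simulations, rather than averaging the parameters themselves) can be sketched as follows. The data layout and function names are illustrative assumptions, not the implementation of Viney et al. (2009b).

```python
import math

def multi_donor_prediction(target_xy, donors, run_model, forcing, n_donors=5):
    """Predict streamflow for an ungauged catchment by averaging the
    simulations driven by the parameter sets of the `n_donors` nearest
    gauged catchments (output averaging).

    `donors` maps donor name -> (centroid_xy, calibrated_params);
    `run_model(params, forcing)` returns a simulated flow series.
    """
    # rank donors by centroid distance and keep the closest few
    nearest = sorted(donors.values(),
                     key=lambda d: math.dist(target_xy, d[0]))[:n_donors]
    # one simulation per donor parameter set, all on the target's forcing
    runs = [run_model(params, forcing) for _, params in nearest]
    # average the simulated series timestep by timestep
    return [sum(vals) / len(runs) for vals in zip(*runs)]
```

Averaging the outputs rather than the parameters avoids running the model with a parameter combination that no catchment actually produced, which is one reason output averaging is usually preferred for ensemble regionalisation.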

4.3 Assessment against previous model intercomparison

A previous model intercomparison by Viney (2010) showed modest prediction performance by an uncalibrated version of AWRA-L (version 0.5). Its performance, whilst on a par with that of the other continental-scale models, AWAP (WaterDyn) and CABLE, was significantly poorer than that of a suite of five locally-calibrated lumped models. The results presented in Section 3.2 show that the current version of AWRA-L (version 3.5) produces predictions in validation that are now as good as or better than those of a suite of three locally-calibrated models that are broadly comparable with the five assessed in Viney (2010). The results presented in Section 3.3 show that the current AWRA-L is also now significantly better than WaterDyn. Finally, the comparison of AWRA-L v3.5 against AWRA-L v0.5 (Section 3.3) highlights the significant advances that have been achieved in the development of the AWRA-L model over the past three years.

5 Conclusions

The comparisons of prediction performance between AWRA-L and a range of globally-calibrated and locally-calibrated peer models have shown that AWRA-L is a good choice as a modelling technology to support the National Water Accounts and Australian Water Resources Assessments. In validation mode, it typically provides streamflow predictions that are as good as or better than the alternatives. Furthermore, it does not suffer from some of the peer models' drawbacks, such as spatial discontinuities in streamflow generation and performance deterioration with increasing regionalisation distance. In comparison with the other models, AWRA-L also has further advantages, such as the ability to predict many landscape properties and fluxes other than streamflow, and the fact that it has been explicitly coupled to groundwater and river routing modules. It is therefore recommended that AWRA-L continue to be used to prepare data for the National Water Accounts and Australian Water Resources Assessments.

References

Burnash RJC, Ferral RL and McGuire RA (1973) A generalized streamflow simulation system: conceptual modeling for digital computers. Tech. Rep., Joint Federal and State River Forecast Center, Sacramento, 204pp.

Jöhnk KD, Crosbie RS, Peeters LJM and Doble RC (2013) AWRA-G: a continental scale groundwater component for a land surface model of Australia. Geoscientific Model Development (in prep.).

Kowalczyk EA, Wang Y, Law RM, Davies HL, McGregor JL and Abramowitz G (2006) The CSIRO Atmosphere Biosphere Land Exchange (CABLE) model for use in climate models and as an offline model. CSIRO Marine and Atmospheric Research paper 013, Aspendale, Vic., Aust., 37pp.

Lerat J, Dutta D, Kim S, Hughes J, Vaze J and Dawes W (2013) Refinement and extension of the AWRA river model (AWRA-R). CSIRO: Water for a Healthy Country National Research Flagship.

Perrin C, Michel C and Andréassian V (2003) Improvement of a parsimonious model for streamflow simulations. Journal of Hydrology, 279.

Ramchurn A (2012) Improved modelling of low flows and drought impacts in Australian catchments using new rainfall-runoff model SpringSIM. Proc. Hydrology and Water Resources Symposium.

Raupach MR, Briggs PR, Haverd V, King EA, Paget M and Trudinger CM (2008) Australian water availability project (AWAP). Final Report for Phase 3, CSIRO Marine and Atmospheric Research, Canberra, Aust., 67pp.

Renzullo LJ, Collins D, Perraud J, Henderson B, Jin W and Smith AB (2013) Improving soil water representation in the Australian Water Resources Assessment landscape model through the assimilation of remotely-sensed soil moisture products. Proc. 20th MODSIM Congress, Adelaide, Aust. (in prep.).

Van Dijk AIJM (2010) The Australian Water Resources Assessment System. Technical Report 3. Landscape Model (version 0.5) Technical Description. CSIRO: Water for a Healthy Country National Research Flagship.
Viney NR (2010) A comparison of modelling approaches for continental streamflow prediction. CSIRO: Water for a Healthy Country National Research Flagship, 116pp.

Viney NR, Perraud J, Vaze J and Chiew FHS (2009a) The usefulness of bias constraints in model calibration for regionalisation to ungauged catchments. Proc. 18th World IMACS/MODSIM Congress, Cairns, Aust.

Viney NR, Vaze J, Chiew FHS, Perraud J, Post DA and Teng J (2009b) Comparison of multi-model and multi-donor ensembles for regionalisation of runoff generation using five lumped rainfall-runoff models. Proc. 18th World IMACS/MODSIM Congress, Cairns, Aust.

Zhang Y and Viney N (2012) Toward optimum multiple objective model calibrations for AWRA-L model. CSIRO: Water for a Healthy Country National Research Flagship, 18pp.

Zhang Y, Viney N, Frost A, Oke A, Brooks M, Chen Y and Campbell N (2013) Collation of Australian modeller's streamflow dataset for 780 unregulated Australian catchments. CSIRO: Water for a Healthy Country National Research Flagship, 115pp.


CONTACT US
e enquiries@csiro.au

FOR FURTHER INFORMATION
Water for a Healthy Country Flagship
Neil Viney
e neil.viney@csiro.au