Utilization of climate information and soil moisture estimates to provide monthly and sub-monthly streamflow forecasts


INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. 34: (2014)
Published online 12 February 2014 in Wiley Online Library (wileyonlinelibrary.com) DOI: /joc.3924

Hui Wang a* and Xiang Fu b
a Bureau of Economic Geology, Jackson School of Geosciences, The University of Texas at Austin, TX, USA
b State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, China

ABSTRACT: Streamflow forecasts at monthly and sub-monthly time scales, e.g. a 10-day period, are critical for decisions on allocating water among different users and mitigating possible flooding. Adaptive forecasting of 10-day streamflow is still challenging, although persistence prediction models sometimes perform well when streamflow has strong lag-correlation. This study proposes a scheme for providing monthly and sub-monthly flow forecasts at the beginning of the month and updating the sub-monthly forecasts subsequently. It examines a principal component regression method to provide monthly average streamflow and sub-monthly, e.g. 10-day, average streamflow forecasts, utilizing gridded precipitation forecasts from climate models and soil moisture estimates from hydrological models. Monthly streamflow forecasts are first obtained. They are then disaggregated to 10-day streamflow based on historical observations using a nonparametric approach. The disaggregated 10-day streamflow forecasts are further improved by incorporating streamflow and soil moisture estimates in the previous 10 days. Hence, sub-monthly flow forecasts can be improved adaptively. The proposed approach is demonstrated for monthly and sub-monthly streamflow forecasts in July at the Yangtze River, the largest river in China. The correlation between monthly streamflow forecasts and observation is 0.46 in leave-one-out cross-validation mode.
Updated sub-monthly streamflow shows better skill than disaggregated sub-monthly forecasts. To examine the impact of the accuracy of monthly streamflow forecasts on disaggregated 10-day streamflow, synthetic streamflow time series with different levels of forecasting skill were examined. Results show that the higher the skill of the synthetic monthly streamflow forecasts, the lower the forecast error. The value of soil moisture estimates in providing streamflow forecasts is also examined.

KEY WORDS streamflow forecasts; sub-monthly streamflow; soil moisture; nonparametric disaggregation

Received 7 August 2013; Revised 10 December 2013; Accepted 20 December

* Correspondence to: H. Wang, Bureau of Economic Geology, Jackson School of Geosciences, The University of Texas at Austin, University Station, Box X, Austin, TX , USA. E-mail: hui.wang@beg.utexas.edu

© 2014 Royal Meteorological Society

1. Introduction

Short-term (daily to monthly) and long-term (seasonal to inter-annual) streamflow forecasts are of importance to water resources planning and management (Chiew et al., 2003; Maurer and Lettenmaier, 2004; Golembesky et al., 2009; Eum and Kim, 2010; Alemu et al., 2011). Improving streamflow forecasts at different temporal scales has been a research focus in the hydrologic engineering community for the past several decades. This study focuses on monthly and sub-monthly (10-day) average streamflow forecasting and demonstrates the applicability of the proposed method for flow forecasts at a gauge station on the Yangtze River, the largest river in China. The flood season for the Yangtze River includes the months of July, August and September (JAS), with the monthly average peak occurring in July. Streamflow forecasts at monthly and sub-monthly time scales, e.g. a 10-day period, are critical for decision making, e.g. allocating water among different users. The 10-day average flow in the Yangtze River during flood season is especially important to inform decisions such as flood mitigation, hydropower generation and water allocation for domestic and irrigation uses. We choose July to test the proposed method for forecasting monthly and 10-day flow conditions. July is divided into three 10-day forecasting periods. The first 10-day streamflow denotes the average streamflow on days 1–10 of July, and the second 10-day streamflow is defined similarly (days 11–20). As there are 31 days in July, the last forecasting period denotes average flow in the last 11 days. For convenience, these are denoted the first, the second and the last 10-day forecasting periods.

Short-term streamflow forecasting techniques can be categorized into physical process-based modelling (Nwagzzie, 1987) and statistical methods. The former generally focuses on the flow generation processes through distributed or lumped hydrological models and provides flow forecasts based on calibrated models which are driven by variables in the forecasting period, e.g. precipitation and temperature. Compared to physical modelling approaches, statistical methods have gained increasing attention in forecasting hydrological variables (e.g. Wu et al., 2008; Chau and Wu, 2010; Taormina et al., 2012). Statistical approaches for streamflow forecasting focus on either the characteristics of river flow time series or possible relations between streamflow and other hydrological variables (e.g. precipitation, soil moisture) to build a statistical forecasting model. In building such a model, the components that are found to be significantly correlated with streamflow are often chosen as predictors (Wang et al., 2013). For instance, lagged streamflow data are potential predictors if the streamflow time series has strong persistence. Recent studies show that seasonal forecasted precipitation from General Circulation Models (GCMs) and large-scale oceanic-atmospheric indices are highly correlated with streamflow data (Sankarasubramanian et al., 2008; Block et al., 2009; Kalra and Ahmad, 2009) in some places. Although the precipitation forecasting skill of GCMs at the seasonal scale is usually higher than at the monthly scale, a recent study (Wang, 2013) found that monthly forecasts from GCMs are statistically significantly correlated with observations and have small errors for some locations in the United States. Hence, monthly precipitation forecasts are promising potential predictors for streamflow forecasts.

Most of the previous studies have chosen gridded precipitation forecasts, lagged streamflow time series and large-scale oceanic-atmospheric indices (Hamlet and Lettenmaier, 1999; Xu et al., 2007; Kalra and Ahmad, 2009) as potential predictors. At the same time, soil moisture is also a useful predictor for streamflow forecasting (Dumedah and Coulibaly, 2013). Estimating soil moisture requires extensive observations and hydrological modelling which aims to capture different processes in the water cycle. Recently, a number of land surface models (Pitman, 2003) have been developed to provide soil moisture estimates at different soil depths. Fan and Dool (2004) used a one-layer hydrological model and utilized precipitation and evaporation to provide soil moisture estimates.
Both gridded precipitation forecasts from climate models and soil moisture estimates were used in this study to provide streamflow forecasts.

A number of studies (e.g. Chau et al., 2005; Xu et al., 2007; Wang et al., 2009; Wu et al., 2009; Zhou et al., 2012) have addressed streamflow forecasting for the Yangtze River. Wu et al. (2009) used a support vector regression model for monthly streamflow prediction at different time horizons and compared results with four other models: autoregressive moving average, k-nearest neighbours, artificial neural networks and crisp distributed artificial neural networks. Wang et al. (2009) developed a wavelet network model to provide forecasts of mean flow for 10-day periods, and their model showed better performance than a threshold autoregression model. A few studies have utilized climate information in providing streamflow forecasts. Xu et al. (2007) incorporated a large-scale teleconnection index and outgoing longwave radiation (OLR) to provide seasonal flow prediction. Sub-monthly streamflow forecasts for the Yangtze River incorporating climate information have not been studied. At the same time, it is also important to provide monthly and sub-monthly flow forecasts at the beginning of the month and possibly update the sub-monthly forecasts subsequently. Persistence prediction models are usually used in flow forecasts, but their application in practical water resources management is often inhibited by poor performance, especially when streamflow has weak lag-correlations.

The contribution of this paper is to propose a scheme for providing monthly and sub-monthly flow forecasts at the beginning of the month and updating the sub-monthly forecasts subsequently. The proposed scheme integrates precipitation forecasts from GCMs and soil moisture estimates to provide monthly streamflow forecasts. The obtained monthly flow forecasts are then temporally disaggregated into 10-day flow based on historical observations.
A nonparametric disaggregation approach (Prairie et al., 2007) was applied to preserve the flow pattern of the three forecasting periods. To further update the 10-day flow forecasts, soil moisture and streamflow conditions in the previous 10 days were used as independent variables. For instance, at the beginning of the month (July), monthly streamflow and 10-day streamflow forecasts for the three forecasting periods are provided at the same time. At the end of the first forecasting period (the 10th day), the streamflow forecast for the second forecasting period can be updated based on soil moisture estimates and observed streamflow in the first 10 days. Similarly, the last 10-day streamflow forecast can be updated.

It is reasonable to expect that sub-monthly forecasts are affected by the accuracy of monthly streamflow forecasts. To examine this, a synthetic flow generation scheme was used to generate monthly flow forecasts with different levels of skill. The generated synthetic monthly streamflow forecasts were used to obtain 10-day forecasts for the three forecasting periods in July. The forecasting skill of these sub-monthly streamflow forecasts was then compared to that of the disaggregated sub-monthly forecasts.

The objectives of this study are to: (1) provide monthly and sub-monthly streamflow forecasts for the Yangtze River in July, using gridded precipitation from climate models and gridded soil moisture estimates; (2) update streamflow forecasts for the first, second and last 10-day periods; (3) determine the effects of the accuracy of monthly streamflow forecasts on disaggregated 10-day flow forecasts; and (4) examine the value of soil moisture estimates in providing streamflow forecasts by comparing a persistence prediction model and the proposed method.

The paper is arranged as follows. Background information is provided in this section, followed by a description of the dataset and methodology in Section 2.
In Section 3, the proposed framework is illustrated for a gauge station on the Yangtze River. Monthly streamflow forecasts and flow forecasts for the three 10-day periods in July are provided. Updated 10-day streamflow forecasts utilizing soil moisture estimates and streamflow observation in the previous 10-day period are also discussed. In Section 4, the effects of monthly streamflow forecasts on disaggregated 10-day flow forecasts are examined. We also present the comparison between a persistence prediction model and the proposed method for sub-monthly flow forecasts. The paper is concluded in Section 5.

Figure 1. Yichang station at Yangtze River.

2. Dataset and methodology

2.1. Dataset

Monthly and sub-monthly (10-day) streamflow data from 1959 to 2006 were collected for Yichang station on the Yangtze River. The drainage area of this gauge station is nearly km². The Three Gorges Dam, whose main purposes are flood prevention and hydroelectricity generation, was constructed between 1994 and 2009. The active storage of the dam is 39.4 km³, which makes it the largest storage reservoir in the world. Figure 1 shows the study area and the gauge station.

Monthly retrospective precipitation forecasts were obtained from ECHAM4.5 (Roeckner et al., 1996), a GCM developed at the Max Planck Institute for Meteorology. The model is named after the European Centre for Medium-Range Weather Forecasts and Hamburg (the location of the Max Planck Institute for Meteorology), and the forecasts are forced with constructed analogue sea surface temperatures (SSTs) (Li and Goddard, 2005). These retrospective forecasts are available for 7 months ahead and are updated every month beginning from January. In this study, retrospective seasonal precipitation forecasts for July at the grids near the streamflow gauge were downloaded (iridl.ldeo.columbia.edu) for the period. The spatial resolution of the 1-month-ahead precipitation forecasting data is 2.5° latitude by 2.5° longitude. The unit of monthly precipitation from ECHAM4.5 is mm month⁻¹ and the data are in NetCDF format.

Soil moisture data are from the Climate Data Assimilation System I of the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) Reanalysis Project (Kalnay et al., 1996). In the Reanalysis Project, soil moisture is not assimilated but rather is derived from the model, which is a T62 model (equivalent to a horizontal resolution of about 210 km) with 28 vertical levels.
Although they are not validated due to the lack of long-term soil moisture observations, the soil moisture estimates can be reliable and accurate to the extent that the model and its physical parameterization are realistic. However, there will be regional biases if the model tends to be biased. This has no influence on the modelling work presented in this study, as the estimates are used only as predictors. Monthly average soil moisture (Kalnay et al., 1996) in June at neighbouring grids of the streamflow gauge was downloaded (iridl.ldeo.columbia.edu) for monthly streamflow forecasts. Average soil moisture estimates for the 10 days preceding each 10-day forecasting period were also downloaded to update sub-monthly forecasts. Hence, averages of gridded soil moisture estimates for the last 10 days in June and the first and second 10-day periods in July from 1959 to 2006 were downloaded. The spatial resolution of the soil moisture data is the same as that of the precipitation forecasts. Estimated soil moisture is unitless and the data are in NetCDF format.

Figure 2 shows the flowchart of this study. The approaches presented in Figure 2 are explained in the following paragraphs.

Figure 2. Flowchart of this study. Flowchart elements: gridded monthly precipitation forecast; gridded monthly soil moisture estimates; principal component regression; monthly streamflow forecasts; nonparametric disaggregation; sub-monthly streamflow forecasts; sub-monthly streamflow observation in the previous period; sub-monthly soil moisture estimates in the previous period; updated sub-monthly streamflow forecasts.

2.2. Principal component regression

In this study, principal component regression was employed. It consists of two parts: principal component analysis (PCA) and multivariate linear regression. PCA replaces a large number of correlated predictors, herein precipitation forecasts and soil moisture estimates in different grids, with a smaller number of uncorrelated predictors, known as the principal components (PCs). A large portion of the variance in the original data is typically explained by several PCs. Predictors are identified based on the Spearman correlation between potential predictors (precipitation forecasts or soil moisture estimates) and monthly flow observation. They are selected if the correlation coefficient is significant at the 0.05 significance level. Two PCs were used in the linear regression, and these PCs explain over 90% of the original variance in the predictors. A multivariate linear regression model can be formulated as in Equation 1.

$$Y = X\beta + \varepsilon \qquad (1)$$

where $Y = \{y_1, y_2, \ldots, y_n\}$ is the dependent variable; $X$ is the $n \times p$ design matrix, whose first column is all ones; $\beta = \{\beta_1, \beta_2, \ldots, \beta_p\}$ is the regression coefficient vector, whose first element corresponds to the intercept; and $\varepsilon$ is the error, with elements following the normal distribution with mean 0 and variance $\sigma^2$. The unknown parameters in Equation 1 are the regression coefficients and the error variance, denoted $\theta = (\beta, \sigma^2)$.

Monthly streamflow forecasts were obtained in leave-one-out mode using data between 1959 and 2006. For instance, when forecasting average monthly streamflow in July 1959, all the data between 1960 and 2006 were used to build the multiple linear regression model. The same algorithm was used to update sub-monthly streamflow. To update 10-day disaggregated streamflow forecasts, a regression model was constructed between the next 10-day streamflow and corresponding predictors, e.g. soil moisture estimates and streamflow observation in the current 10-day period.
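The leave-one-out principal component regression described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `pcr_loo_forecasts`, the standardization of predictors, and the choice to recompute the PCA inside each fold are all assumptions, since the text does not specify these details.

```python
import numpy as np

def pcr_loo_forecasts(X, y, n_pcs=2):
    """Leave-one-out principal component regression forecasts.

    X: (n_years, n_predictors) array of screened predictors (hypothetical input).
    y: (n_years,) array of observed monthly flow.
    Returns one out-of-sample forecast per year.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # boolean mask that drops year i
        # Assumption: standardize with training-year statistics only.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0, ddof=1)
        Z = (X[train] - mu) / sd
        # PCA via SVD: rows of Vt are the loading vectors.
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        W = Vt[:n_pcs].T
        # Equation 1: design matrix with a leading column of ones.
        A = np.column_stack([np.ones(train.sum()), Z @ W])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Project the held-out year onto the same PCs and forecast it.
        z_new = ((X[i] - mu) / sd) @ W
        preds[i] = beta[0] + z_new @ beta[1:]
    return preds
```

Refitting the loadings in every fold keeps the held-out year out of both the PCA and the regression; fitting the PCA once on all years would be a simpler but slightly less strict variant.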
For instance, to update streamflow forecasts for the second 10-day period (days 11–20) at the end of the 10th day, soil moisture estimates and streamflow observation during the first 10-day period (days 1–10) were used.

2.3. Temporal disaggregation

Disaggregation is the process of dividing a given time series into its constituent parts. Temporal disaggregation of monthly streamflow forecasts to 10-day streamflow forecasts is of interest in this study. Different stochastic parametric models have been developed for disaggregation. The linear stochastic framework was originally developed by Valencia and Schaake (1973) and was later modified by Grygier and Stedinger (1990) to improve parameter estimates. Compared to parametric methods, the nonparametric approach has gained increasing attention for its simplicity and successful application in many hydrological problems (Lall, 1995). The advantage of nonparametric disaggregation was demonstrated in the kernel-based approach of Tarboton et al. (1998). A robust and simple approach based on resampling was proposed to achieve space-time disaggregation (Prairie et al., 2007) while avoiding kernel density fitting. Recent works on streamflow disaggregation based on the nonparametric approach have shown improvements over traditional parametric schemes (Tarboton et al., 1998; Robertson et al., 2004; Lee et al., 2010).

In this study, the nonparametric disaggregation of Prairie et al. (2007) was applied to disaggregate monthly streamflow forecasts to 10-day streamflow forecasts. Note that both monthly and 10-day streamflow in flow rate units [cubic metre per second (cms)] refer to the average flow rate during the specific period, either 1 month or a 10-day period. Streamflow values were first converted to volumetric units (m³) to ensure that the monthly streamflow was the summation of the sub-monthly streamflow in the three 10-day periods. Disaggregated sub-monthly flows were then converted back to flow rate (cms).

Figure 3. (a) Monthly average flow in July over the years 1959–2006. (b) Average flow during the first, second and last 10-day period in July over the years 1959–2006.

Disaggregation of monthly streamflow into 10-day streamflow forecasts was performed in leave-one-out cross-validation mode by leaving out the conditioning year from the training dataset. For the given monthly streamflow forecasts over the years 1959–2006, K neighbours of the forecasted monthly streamflow were identified by computing the Euclidean distance from the forecasted monthly streamflow of other years to that of the conditioning year. The observed 10-day streamflow for the respective neighbours was resampled to constitute N ensembles of 10-day disaggregated forecasts. The number of ensembles ($w(k) \cdot N$) that each identified neighbour represents in the conditional probability density function (i.e. represented by N ensembles) was estimated by the kernel weighting function (Equation 2) suggested by Lall and Sharma (1995).

$$w(k) = \frac{1/k}{\sum_{j=1}^{K} 1/j} \qquad (2)$$

where $w(k)$ represents the weight of the kth neighbour, and k is the rank of the neighbour out of the total selected K neighbours. K and N were chosen as 15 and 200, respectively, after preliminary analysis. This algorithm can be summarized in the following steps:

Step 1: Select the conditioning year i, which is within the range 1959–2006.
Step 2: Calculate the Euclidean distance between the monthly streamflow forecast in year i and the streamflow forecast in year j (j is within the range 1959–2006 but j ≠ i).
Step 3: Identify the K nearest neighbours based on Euclidean distance.
Step 4: Sample sub-monthly flow in historical years based on the weights assigned in Equation 2 to obtain N ensemble members.

Figure 4. Monthly streamflow forecasts using leave-one-out cross-validation. The correlation between monthly streamflow observation and forecasts is 0.46, which is statistically significant at the 0.05 level.

2.4. Synthetic monthly streamflow generation

There are different approaches (e.g. Maurer and Lettenmaier, 2004; Weigel et al., 2008; Sankarasubramanian et al., 2009) to generate synthetic streamflow. For simplicity, a slight modification of the approach of Sankarasubramanian et al. (2009) was used in this study, as described next. Let $Q_i$ represent the monthly streamflow observation in July of year i, where i ranges from 1959 to 2006 in this study. The mean and variance of the monthly streamflow observation time series are $\mu_Q$ and $\sigma^2$, respectively. Let r represent the skill indicator of synthetic streamflow, which ranges from 0 to 1. Synthetic streamflow $Q_{i,j}$ is generated according to Equation 3:

$$Q_{i,j} = rQ_i + \varepsilon_j \qquad (3)$$

where $Q_{i,j}$ represents the jth synthetic flow ensemble member for year i, and $\varepsilon_j$ is the noise term following a gamma distribution with two parameters, k and θ. As denoted by Equations 4 and 5, these two parameters ensure that the mean and variance of the gamma distribution are $\mu_Q(1 - r)$ and $(1 - r^2)\sigma^2$.
$$k\theta = \mu_Q (1 - r) \qquad (4)$$

$$k\theta^2 = (1 - r^2)\,\sigma^2 \qquad (5)$$

The conditional expectation and marginal expectation of $Q_{i,j}$ can be derived as follows:

$$E(Q_{i,j} \mid Q_i) = E(rQ_i \mid Q_i) + E(\varepsilon_j) = rQ_i + \mu_Q(1 - r) = r(Q_i - \mu_Q) + \mu_Q \qquad (6)$$

$$E\big(E(Q_{i,j} \mid Q_i)\big) = E\big(r(Q_i - \mu_Q) + \mu_Q\big) = \mu_Q \qquad (7)$$

It can be seen from Equation 6 that if the skill of the synthetic streamflow is 1, the conditional mean of the ensemble forecasts will be the same as the observations. The closer r is to 0, the further the conditional mean deviates from the observation. Equation 7 shows that the long-term mean of the observation is preserved by the synthetic generation, regardless of the skill. To investigate how well the flow variability is preserved by this synthetic streamflow generation scheme, the variance of the conditional expectation of $Q_{i,j}$ is examined in Equation 8.

$$\mathrm{Var}\big(E(Q_{i,j} \mid Q_i)\big) = \mathrm{Var}\big(r(Q_i - \mu_Q) + \mu_Q\big) = \mathrm{Var}(rQ_i) = r^2\sigma^2 \qquad (8)$$
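As a rough numerical check on Equations 3 to 6, the generation scheme can be sketched with the Python standard library. The function names, the use of the population variance for $\sigma^2$, and the example flows in the usage note are assumptions made for illustration; the source does not state how the moments were estimated.

```python
import random
import statistics

def gamma_noise_params(mu_q, var_q, r):
    """Solve Equations 4 and 5 for the gamma shape k and scale theta."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1); r = 1 makes the noise term degenerate")
    mean_eps = mu_q * (1.0 - r)       # Equation 4: k * theta
    var_eps = (1.0 - r ** 2) * var_q  # Equation 5: k * theta**2
    theta = var_eps / mean_eps
    k = mean_eps / theta
    return k, theta

def synthetic_ensemble(q_obs, r, n_members, seed=0):
    """Equation 3: n_members synthetic flows r * Q_i + eps_j for each year i."""
    rng = random.Random(seed)
    mu_q = statistics.fmean(q_obs)
    var_q = statistics.pvariance(q_obs)  # assumption: population variance
    k, theta = gamma_noise_params(mu_q, var_q, r)
    # random.gammavariate takes (shape, scale), so its mean is k * theta.
    return [[r * q + rng.gammavariate(k, theta) for _ in range(n_members)]
            for q in q_obs]
```

For a hypothetical series such as `[20000, 25000, 30000, 35000, 40000]` with `r = 0.5`, the ensemble mean for each year should settle near $r(Q_i - \mu_Q) + \mu_Q$ as the number of members grows, which is the behaviour stated by Equation 6.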

It shows that the variance of streamflow explained by the ensemble mean is proportional to $r^2$. There are two main characteristics of this generation scheme: (1) The long-term average of observed streamflow is preserved by the ensemble mean under different skills. This does not hold true in general, as long-term bias exists for most operational forecasting models. However, such bias can be eliminated after calibration, and the purpose of this study is to investigate the value of different forecasts; hence, we argue that it is a reasonable assumption in generating forecast ensembles. (2) The variance of the streamflow observation is partly preserved in the synthetic ensemble streamflow, in proportion to the square of the skill indicator, as shown in Equation 8.

Figure 5. Boxplots of disaggregated sub-monthly streamflow in the first, second and last 10-day forecasting periods in July over the years 1959–2006 are shown in panels (a), (b) and (c), respectively. The dashed lines represent observations.

2.5. Forecasts evaluation

In this study, monthly and sub-monthly (10-day) streamflow forecasts were provided at the beginning of the month. There are a total of four forecasting periods: 1-month-ahead, and the first, the second and the last 10-day forecasting periods. To evaluate the forecasting skill of the proposed method, the following three criteria were used: Pearson's correlation, mean square error and rank probability score (RPS). Pearson's correlation between streamflow observation and forecasts was used to evaluate how much of the variance exhibited in the observed data was preserved. Mean square error was utilized to determine the deviance between the observation and forecasts over the period 1959–2006. In addition, the Mean Square Skill Score (MSSS), derived from mean square error, is used to compare forecasts with the long-term average of the forecasting period.
MSSS is defined as follows:

$$\mathrm{MSSS} = 1 - \frac{\mathrm{MSE}_A}{\mathrm{MSE}_B} \qquad (9)$$

where models A and B are the streamflow forecasts and the climatology of monthly streamflow, respectively. Similar to MSSS, the Rank Probability Skill Score (RPSS) is employed to compare the performance of monthly/sub-monthly streamflow forecasts and climatology, in terms of RPS. When RPSS is positive, streamflow forecasts outperform climatology. The RPSS value represents how much the RPS has been reduced (RPSS > 0) or increased (RPSS < 0) compared to the reference model (climatology). RPSS can be calculated for each year, and the average RPSS for the period 1959–2006 shows the performance comparison of climatology and the proposed forecasting method, in terms of categorical forecasting, over the long term.

Figure 6. (a) Scatterplot between average streamflow in the first 10 days and the second 10 days. The correlation between the two variables is 0.50. (b) Scatterplot between average soil moisture in the first 10 days and average streamflow in the second 10 days. The correlation between the two variables is 0.54.

3. Results

Figure 3(a) shows monthly streamflow in July over the period 1959–2006. Its peak reached cms in 1998, resulting in the worst flood in the past decades in China. Figure 3(b) shows monthly streamflow and the first, second and last 10-day streamflow in July over the years. There is a statistically significant correlation between July streamflow and monthly precipitation forecasts at three grids of 2.5° longitude by 2.5° latitude, and a statistically significant correlation between monthly streamflow and 1-month lagged soil moisture estimates at two grids. These five variables were chosen as the potential predictors for July streamflow. The Spearman correlation between monthly flow in July and gridded precipitation forecasts for July is 0.46, 0.51 and 0.44 at the three grids. The correlation between monthly flow in July and gridded soil moisture estimates in June is 0.45 and 0.41 at the two grids. As described in the methodology section, PCA is applied to the predictors to reduce the dimension of the regression model. The first two PCs explained over 90% of the variance exhibited in the original predictors. Each PC was composed of the five predictors, and the coefficients reflected the contribution of each predictor to the PC. Hence, these two PCs were chosen as the predictors for July streamflow.

Forecasted inflow in leave-one-out cross-validation mode is shown in Figure 4. The correlation between monthly streamflow observation and the forecasted streamflow is 0.46; the mean square error is . The forecasts maintain the overall fluctuations in the observation, but they do not capture the inter-annual variability well, especially for high streamflow years. This could influence the accuracy of 10-day streamflow in the month. MSSS of monthly streamflow forecasts is , with climatology ( cm) as the reference model. There is a 20% improvement in the forecast over the climatology. Average RPSS is 0.10, which reveals that the flow forecasts reduce the RPS of climatology by nearly 10%.

Figure 7. Scatterplots between sub-monthly streamflow observation and updated streamflow forecasts for (a) the first, (b) the second and (c) the last 10-day period in July.

Figure 5 shows the boxplots of sub-monthly streamflow forecasts for the first, second and last 10-day periods in July. For each forecasting period, there are 200 ensemble members. The dashed line represents streamflow observation in each forecasting period. For most of the years, the boxplot contains the observation. The ensemble mean of the forecasts could be used as a deterministic forecast for sub-monthly streamflow.
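The skill scores reported here can be computed along the following lines. This is a hedged sketch: the function names are hypothetical, climatology is taken as the long-term mean of the observations, and the RPSS reference assigns equal probability to every category (for example terciles), which is a common convention but is not stated in the source.

```python
import statistics

def msss(obs, fcst):
    """Equation 9 with climatology (long-term mean of obs) as reference model B."""
    clim = statistics.fmean(obs)
    mse_a = statistics.fmean([(f - o) ** 2 for f, o in zip(fcst, obs)])
    mse_b = statistics.fmean([(clim - o) ** 2 for o in obs])
    return 1.0 - mse_a / mse_b

def rps(probs, obs_cat):
    """Rank probability score over ordered categories.

    probs: forecast probability of each category (sums to 1).
    obs_cat: index of the category that was observed.
    """
    score, cum_f, cum_o = 0.0, 0.0, 0.0
    for m, p in enumerate(probs):
        cum_f += p
        cum_o += 1.0 if m == obs_cat else 0.0
        score += (cum_f - cum_o) ** 2
    return score

def rpss(probs, obs_cat):
    """RPSS for one year against an equal-probability climatological forecast."""
    n = len(probs)
    reference = rps([1.0 / n] * n, obs_cat)  # assumption: equiprobable categories
    return 1.0 - rps(probs, obs_cat) / reference
```

With an ensemble forecast, `probs` would be the fraction of members falling in each category; averaging `rpss` over all years gives the long-term score discussed in the text.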
The correlation between observation and ensemble mean of forecasts is 0.35, 0.42 and 0.20 for the first, second and last 10-day period. MSSS is 0.10, 0.17 and 0.02 for the first, second and last 10-day forecasting period. Except for the second 10- day period, disaggregated sub-monthly forecasts were not superior to climatology with respect to mean square error. Surprisingly, average RPSS of disaggregated 10-day forecasts was negative for all three forecasting periods, with 0.17, 0.06 and 0.28 in the first, second and third 10-day, respectively. This reflects that disaggregated flow forecasts do not well capture the probability density distribution of observation. This has been improved by the updating scheme of 10-day streamflow forecasts. Disaggregated 10-day forecasts from monthly streamflow forecasts could be further improved by incorporating streamflow observation and soil moisture estimates in the previous 10 days, if there is strong correlation between them. Figure 6(a) shows the scatter plot of the second 10- day streamflow observation in July and the first 10-day streamflow observation in July. The correlation between the two variables is 0.50, which is statistically significant at 0.05 level. Similarly, the correlation between the last 10-day streamflow and the second 10-day streamflow is

0.57 (figure not shown), which is statistically significant at the 0.05 level. The correlation between the first 10-day streamflow in July and the last 10-day streamflow in June is not statistically significant, so it was not used to update the sub-monthly streamflow forecasts.

Figure 6(b) shows the scatter plot of the second 10-day streamflow observation in July against the first 10-day soil moisture at a neighbouring grid in July. The correlation between the two variables is 0.54, which is statistically significant at the 0.05 level. Similarly, the correlation between the first 10-day streamflow observation and the last 10-day soil moisture of a neighbouring grid in June is 0.55, and the correlation between the last 10-day streamflow and the second 10-day soil moisture of a neighbouring grid is 0.54 (figure not shown). Both are statistically significant at the 0.05 level.

Streamflow observations and soil moisture estimates from the previous 10-day period were used as potential predictors to improve the 10-day flow forecasts. For instance, the second 10-day forecast can be updated at the end of the 10th day of July. Figure 7 shows the scatter plot of updated 10-day streamflow forecasts against 10-day streamflow observations in the three forecasting periods in July. The correlation between the observations and the updated forecasts was 0.52, 0.65 and 0.67 for the first, second and last 10-day forecasting periods, respectively. MSSS of the updated 10-day forecasts was 0.27, 0.42 and 0.45 for the first, second and last 10-day periods. This is a substantial improvement over the disaggregated sub-monthly forecasts, which demonstrates the value of lagged 10-day streamflow and 10-day soil moisture estimates in forecasting streamflow. The average RPSS for the first, second and last 10-day forecasting periods is 0.14, 0.25 and 0.17, respectively.

4. Discussion

The disaggregated 10-day forecasts derived from the monthly streamflow forecasts, as shown in Figure 5, do not preserve the inter-annual variability of the observed data well. This was partially caused by the inaccuracy of the monthly streamflow forecasts. Although the correlation between streamflow observations and monthly streamflow forecasts based on gridded precipitation and soil moisture estimates (0.46) was statistically significant, the mean square error was The deviation between forecasts and observations influenced the disaggregated forecasts. This raises a question: how is the disaggregated sub-monthly streamflow influenced by the accuracy of the monthly streamflow forecasts?

To address this, synthetic monthly streamflow forecasts with different levels of skill were generated and used to obtain disaggregated 10-day streamflow. Figure 8 shows synthetic monthly streamflow with skill levels 0.5 and 0.9. One hundred ensemble forecasts were generated for each year. The dotted line represents the ensemble mean of the synthetic forecasts. The shaded area denotes the uncertainty associated with the forecasts for specific years. It can be seen that the higher the skill of the synthetic forecasts, the smaller the uncertainty associated with them. MSSS of the ensemble mean of the synthetic flow is 0.76 and 0.98 for r = 0.5 and 0.9, respectively.

Figure 8. (a) The ensemble mean and 90% confidence interval of the synthetic streamflow with skill indicator 0.5. (b) The ensemble mean and 90% confidence interval of the synthetic streamflow with skill indicator 0.9.

Ten different sets of synthetic monthly streamflow, with skills from 0.5 to 0.95 in steps of 0.05, were generated. Each set contained 200 ensemble members of monthly streamflow forecasts over the years Each ensemble member was used to obtain disaggregated 10-day streamflow forecasts. As described in the methodology section, Spearman's correlation and mean square error were used to evaluate the forecasting skill.
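The synthetic-forecast experiment described above can be illustrated with a short sketch. The generator below is an assumption, not the authors' documented procedure: it builds each ensemble member as a standardized mixture of the observation and Gaussian noise so that its expected correlation with the observations equals the chosen skill r. The function names `synthetic_forecasts` and `msss`, and the toy gamma-distributed "observations" in the usage note, are hypothetical. Under this assumed generator the MSSS of the ensemble mean tends to 1 - (1 - r)^2, i.e. about 0.75 and 0.99 for r = 0.5 and 0.9, which is close to, but not a confirmation of, the values reported in the text.

```python
import numpy as np

def synthetic_forecasts(obs, r, n_members=100, seed=0):
    """Return an (n_members, n_years) array of synthetic forecasts.

    Assumed generator (not taken from the paper): each member is
    r * z_obs + sqrt(1 - r^2) * noise in standardized units, rescaled
    to the observed mean and standard deviation.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    mu, sigma = obs.mean(), obs.std(ddof=1)
    z_obs = (obs - mu) / sigma
    noise = rng.standard_normal((n_members, obs.size))
    z_syn = r * z_obs + np.sqrt(1.0 - r**2) * noise
    return mu + sigma * z_syn

def msss(forecast, obs):
    """Mean square skill score relative to the climatological mean."""
    forecast = np.asarray(forecast, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mse = np.mean((forecast - obs) ** 2)
    mse_clim = np.mean((obs.mean() - obs) ** 2)
    return 1.0 - mse / mse_clim
```

For example, with a toy record `obs = np.random.default_rng(1).gamma(5.0, 100.0, size=30)`, calling `synthetic_forecasts(obs, r=0.9)` returns 100 members per year, and `msss(ens.mean(axis=0), obs)` scores their ensemble mean against climatology.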
Disaggregated 10-day streamflow forecasts were compared across the different skill levels of the monthly streamflow forecasts. Figure 9(a), (c) and (e) shows boxplots of the correlation between 10-day streamflow observations and disaggregated 10-day streamflow in the three forecasting periods under different synthetic monthly forecasts. It can be seen that the higher the skill of the synthetic forecasts, the higher the correlation. For instance, when the skill indicator of the synthetic streamflow forecasts is 0.9, the mean of the correlation between forecasts and observations in the second 10-day period is greater than Figure 9(b), (d) and (f) shows boxplots of the mean square error of the 10-day streamflow forecasts in the three forecasting periods under different synthetic monthly flow forecasts. The higher the skill of the synthetic forecasts, the smaller the mean square error. This indicates that improved monthly streamflow forecasts would also benefit sub-monthly streamflow forecasts. Similar results were observed for MSSS, which is derived from the mean square error (figure not shown).

Figure 9. The correlation between disaggregated sub-monthly streamflow forecasts and sub-monthly streamflow observations under different skill levels of monthly streamflow forecasts in the first, second and last 10-day periods is shown in panels (a), (c) and (e), respectively. The mean square error of disaggregated sub-monthly streamflow under different skill levels of monthly streamflow forecasts in the first, second and last 10-day periods is shown in panels (b), (d) and (f), respectively.

To represent monthly forecasts with no forecast error, the monthly observations (perfect forecasting) were used to obtain disaggregated sub-monthly streamflow forecasts. The correlation between these sub-monthly forecasts and the observations was 0.61, 0.84 and 0.69 for the first, second and last 10-day periods in July, respectively.

Another question of interest is the value of soil moisture estimates in providing updated 10-day streamflow forecasts. To examine this, we compare the results above with persistence prediction models, in which only the streamflow observation in the previous 10 days is used as the predictor. Table 1 compares the performance of the proposed 10-day forecast updating scheme and the persistence prediction models. Improvements in evaluation

metrics demonstrate the value of soil moisture estimates from hydrological models in providing streamflow forecasts.

Table 1. Performance comparison between persistence prediction model and the proposed model using soil moisture estimates. Columns: Model; Forecasting period; Pearson's correlation coefficient; MSE; Average RPS; MSSS; Average RPSS. Rows: Persistence prediction model; Proposed model using soil moisture estimates. MSE, mean square error.

5. Conclusion

This study presents an application of climate information to forecasting streamflow at monthly and sub-monthly time scales. It provides a practical scheme for updating sub-monthly streamflow forecasts. Gridded precipitation forecasts and estimated soil moisture were used to provide operational monthly streamflow forecasts, which were then disaggregated to sub-monthly streamflow based on the historical pattern of sub-monthly streamflow, using a nonparametric disaggregation approach. With this disaggregation method, the 15 nearest neighbours of the monthly forecast for the conditioning year were identified and 200 ensemble members were selected to form the distribution of the sub-monthly streamflow, as shown in Figure 5. To provide adaptive forecasts, gridded soil moisture estimates and streamflow observations from the previous 10-day period were used as predictors for 10-day streamflow in the current forecasting period. The correlation between operational monthly streamflow forecasts and monthly streamflow observations was 0.46, which is statistically significant at the 0.05 level. The correlation between disaggregated 10-day streamflow forecasts and observations was 0.35, 0.42 and 0.20 for the first, second and last 10-day periods in July over By incorporating soil moisture estimates and sub-monthly streamflow observations from the previous 10 days, the updated sub-monthly streamflow forecasts were substantially improved, as shown in Figure 7.
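The disaggregation step summarized above (15 nearest neighbours, 200 ensemble members) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: neighbours are ranked by absolute distance in monthly flow, resampled with weights proportional to 1/rank (the kernel commonly used in nearest-neighbour bootstraps), and each analogue year's 10-day proportions rescale the monthly forecast. The function name `knn_disaggregate` and its arguments are hypothetical, and the monthly mean is treated as the simple average of the three 10-day means, which ignores the 11-day last period of July.

```python
import numpy as np

def knn_disaggregate(monthly_fcst, hist_monthly, hist_sub,
                     k=15, n_members=200, seed=0):
    """Disaggregate one monthly forecast into three 10-day flows.

    hist_monthly: (n_years,) historical monthly mean flows.
    hist_sub:     (n_years, 3) historical 10-day mean flows.
    Returns an (n_members, 3) ensemble of 10-day flows.
    """
    rng = np.random.default_rng(seed)
    hist_monthly = np.asarray(hist_monthly, dtype=float)
    hist_sub = np.asarray(hist_sub, dtype=float)

    # Indices of the k historical years closest to the monthly forecast.
    nearest = np.argsort(np.abs(hist_monthly - monthly_fcst))[:k]

    # Assumed resampling kernel: weight proportional to 1/rank.
    weights = 1.0 / np.arange(1, k + 1)
    weights /= weights.sum()
    picks = rng.choice(nearest, size=n_members, p=weights)

    # Each analogue year's 10-day proportions rescale the forecast,
    # so every member averages back to the monthly forecast.
    analogue = hist_sub[picks]
    proportions = analogue / analogue.mean(axis=1, keepdims=True)
    return monthly_fcst * proportions
```

The ensemble returned for a given year can then be summarized by its mean (for correlation and MSSS) or by its empirical distribution (for RPSS). Scaling by proportions, rather than copying the analogue flows directly, is a design choice made here so that each member stays consistent with the monthly forecast.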
Such a streamflow forecasting framework could be applied to guide short-term water resources planning.

The disaggregated sub-monthly streamflow forecasts were affected by the accuracy of the operational monthly streamflow forecasts. To examine this effect, synthetic monthly streamflow forecasts were generated and used to obtain disaggregated sub-monthly streamflow forecasts. The results show that the higher the skill of the monthly streamflow forecasts, the smaller the error of the disaggregated sub-monthly streamflow forecasts, as shown in Figure 9. This also serves as motivation to further improve monthly streamflow forecasts. The value of soil moisture estimates in improving 10-day streamflow forecasts is also demonstrated through the comparison between persistence prediction models and the proposed method.

This work has limitations. First, the forecasting skill for streamflow is contingent on gridded precipitation forecasts from GCMs, the performance of which varies spatially. This might limit the applicability of the approach. Further research is needed to identify other potential predictors for streamflow. Second, the error introduced by the disaggregation approach limits the skill of sub-monthly forecasting. Other approaches, e.g. support vector machines, should be examined to better capture the temporal pattern when disaggregating monthly flow forecasts.

Acknowledgements

The authors thank the anonymous reviewers for their thoughtful and critical comments that improved the manuscript. This research was partially supported by the National Natural Science Foundation of China (No and No ).