Development of Stochastic Flow Sequences Based on Observed and Paleo-Reconstructed Data


Ken Nowak, James Prairie, and Balaji Rajagopalan
November 27, 2007

Introduction

With a goal of developing more robust flow sequences for the Gunnison River, this analysis investigates the utility of incorporating paleo-reconstructed streamflow data. Reconstructed flows completed by Woodhouse et al. (2006) span 1569 to 1997, providing 368 years of data that precede the observed period ( ), thus helping to better understand long-term trends in the hydrology of the Gunnison basin. However, reconstructed flow magnitudes are sensitive to the sampling and statistical method employed (Hidalgo et al., 2000) and, as a result, their use has been met with some contention. Nevertheless, it is generally accepted that reconstructed flows are good indicators of annual hydrologic state (i.e., wet or dry). We examine a variety of methods to incorporate state information from the paleo period and flow magnitudes from the observed period, thus capitalizing on their individual strengths to produce more varied flow sequences. Four techniques, all based on a hydrologic state framework, are first presented and used to generate new flow sequences using paleo streamflow data. These techniques are then evaluated on their ability to capture observed and paleo statistics, including the generation of extended periods of drought and surplus beyond what the observed streamflow data alone can produce. The investigation concludes with a summary of the results and identification of a recommended technique.

Applied Stochastic Techniques

As previously stated, the accuracy of paleo-reconstructed flow data is somewhat contentious. However, these data can be reliably used to identify the annual hydrologic state and thus be incorporated in stochastic streamflow sequence generation. The median flow of the 60-year observed period provides the threshold for converting the paleo flow data into a binary hydrologic state sequence indicating wet (1) or dry (0). Upon obtaining the binary sequence for the paleo record, several different techniques are used to generate sequences for stochastic simulation. The methods use (1) block resampling of the binary sequence, (2) a homogeneous Markov model, (3) a block homogeneous Markov model, and (4) a nonhomogeneous Markov model. Each method develops 30-year binary traces, which are then assigned flow magnitudes from the observed data using a k-nearest neighbor (KNN) resampling technique. The process by which the binary traces are generated is what distinguishes one method from another. In the sections to follow, each method is described in detail, including the KNN method of assigning magnitudes to the binary traces.
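As a minimal sketch of this state-classification step (Python/NumPy assumed; the array names and the strict greater-than rule for ties are illustrative choices, not details from the report):

    import numpy as np

    def flows_to_states(flows, observed_flows):
        """Classify annual flows as wet (1) or dry (0) relative to the
        median of the observed period."""
        threshold = np.median(observed_flows)
        return (np.asarray(flows, dtype=float) > threshold).astype(int)

    # paleo_states = flows_to_states(paleo_flows, observed_flows)
    # observed_states = flows_to_states(observed_flows, observed_flows)

The same threshold applied to the observed record itself gives the observed state sequence that the transition-matched resampling described later would rely on.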

Block Resampling (Block)

The most basic of the paleo techniques is a random 30-year block bootstrap of the paleo binary sequence. This technique can capture epochal behavior not seen in the observed record, but is limited to the specific paleo binary sequence.

Homogeneous Markov (HM)

This method and the two that follow differ from the block resampling technique because they all generate new, previously unseen binary sequences, whereas block resampling is limited to the sequences present in the paleo data. This is possible through the use of a two-state, first-order Markov model (Rajagopalan et al., 1996). The two states are wet and dry, resulting in four possible transitions (wet to wet, wet to dry, dry to dry, and dry to wet). The homogeneous Markov model begins by calculating average transition probabilities for the entire paleo period. For example, the average wet-wet transition probability is computed as the fraction of wet years that are followed by another wet year rather than a dry year. Furthermore, the wet-wet transition probability must be one minus the wet-dry transition probability. This results in a matrix of four values indicating the probability of the next year being wet or dry given the state of the current year. Generation of a new sequence starts by randomly selecting the state (1 or 0) for the first year. Using the previous year's state and the appropriate transition probabilities as weights, a state is assigned to the current year, and this process is repeated to create a trace.

Block Homogeneous Markov (BHM)

To capture the epochal nature of the paleo data, the strengths of the previous two methods were combined to form a block homogeneous Markov model. This model reduces the smoothing caused by averaging transition probabilities over the entire paleo period while still generating new sequences. A block of the paleo binary sequence is randomly selected and transition probabilities are calculated specifically for that 30-year period, thereby creating a unique set of probabilities for each simulation. Once the probability matrix has been filled for a particular simulation, traces are generated as described in the homogeneous Markov section. This introduces more drought/surplus variability than the homogeneous method while maintaining a simple framework.
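A minimal sketch of the homogeneous Markov generation step (Python/NumPy assumed; the function names and the 0.5 fallback for a state that never occurs are illustrative choices):

    import numpy as np

    def transition_matrix(states):
        """Estimate the two-state, first-order transition matrix from a binary
        sequence (1 = wet, 0 = dry).  Row = current state; columns give the
        probability of the next year being dry (0) or wet (1)."""
        states = np.asarray(states)
        p = np.empty((2, 2))
        for prev in (0, 1):
            nxt = states[1:][states[:-1] == prev]
            p_wet = nxt.mean() if nxt.size else 0.5  # fallback if a state is absent
            p[prev] = [1.0 - p_wet, p_wet]
        return p

    def markov_trace(p, length=30, rng=None):
        """Generate a binary trace: the first year's state is random, and each
        later year is drawn using the previous year's row of the matrix."""
        rng = rng or np.random.default_rng()
        trace = [int(rng.integers(2))]
        for _ in range(length - 1):
            trace.append(int(rng.choice(2, p=p[trace[-1]])))
        return np.array(trace)

The block homogeneous variant would differ only in that the matrix is re-estimated from a randomly chosen 30-year block of the paleo binary sequence before each simulation.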

Non-Homogeneous Markov (NM)

This final method has the ability to introduce the most variability in the generated sequences. As the name implies, the transition probabilities need not be constant throughout the data, or even over a 30-year block. By computing transition probabilities for each year in the paleo data, the nonhomogeneous nature of the data can be incorporated in the generated flow sequences. We briefly present and explain this method here; readers are referred to Prairie et al. (2007) for a more detailed description of the technique. Additional information can also be found in Rajagopalan et al. (1996) and Rajagopalan et al. (1997).

The four local transition probabilities for each year t can be determined from the probabilities of transitioning from a dry to a wet state (P_dw) and from a wet to a dry state (P_wd). The probability of transitioning from a dry to a dry state is P_dd = 1 - P_dw, and the probability of transitioning from a wet to a wet state is P_ww = 1 - P_wd, as described previously. The transition probabilities are calculated from the years in the window [t - h(·), t + h(·)] as:

$$\hat{P}_{dw}(t) = \frac{\sum_{i=2}^{n} K\left(\frac{t - t_i}{h_{dw}}\right)\,[1 - S_{t_i-1}]\, S_{t_i}}{\sum_{i=2}^{n} K\left(\frac{t - t_i}{h_{dw}}\right)\,[1 - S_{t_i-1}]} \qquad (1)$$

$$\hat{P}_{wd}(t) = \frac{\sum_{i=2}^{n} K\left(\frac{t - t_i}{h_{wd}}\right)\, S_{t_i-1}\,[1 - S_{t_i}]}{\sum_{i=2}^{n} K\left(\frac{t - t_i}{h_{wd}}\right)\, S_{t_i-1}} \qquad (2)$$

where K(·) is the kernel function; h(·) is the kernel bandwidth for the transition of interest (dw or wd); S_t is the system hydrologic state (1 = wet, 0 = dry) at time t; S_{t-1} is the system hydrologic state at time t - 1; t is the year of interest; and n is the number of values in the window [t - h(·), t + h(·)]. The kernel function K is defined as:

$$K(x) = \frac{3h}{4h^2 - 1}\,(1 - x^2) \qquad (3)$$

where x = (t - t_i)/h(·), |x| ≤ 1, t is the year of interest, and (·) denotes the transition of interest. The bandwidth provides a window, centered on the current year, over which the suite of transition probabilities is calculated. The value of h is optimized separately for years transitioning from a wet state and for years transitioning from a dry state. Optimization is accomplished using a Least Squares Cross Validation (LSCV) criterion defined as:

$$\mathrm{LSCV}(h) = \frac{1}{n}\sum_{i=1}^{n}\left[1 - \hat{P}_{t_i}(t_i)\right]^2 \qquad (4)$$

where n is the number of dw or wd transitions within the window [t - h(·), t + h(·)], and $\hat{P}_{t_i}(t_i)$ is the estimate of the transition probability ($\hat{P}_{dw}$ or $\hat{P}_{wd}$) at year t_i computed with the information for year t_i dropped. The LSCV is calculated for a suite of h values; the h ultimately selected is the one yielding the smallest LSCV value. Once h values have been selected for transitions from both wet and dry years, transition probabilities can be calculated for each year.
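A sketch of Equations (1), (3), and (4) for the dry-to-wet case (Python/NumPy assumed; the bandwidth search range, the 0.5 fallback when no dry years receive kernel weight, and the leave-one-out handling are illustrative choices, not details prescribed by the report):

    import numpy as np

    def kernel(x, h):
        """Quadratic kernel of Equation (3): K(x) = 3h(1 - x^2)/(4h^2 - 1), |x| <= 1."""
        x = np.asarray(x, dtype=float)
        return np.where(np.abs(x) <= 1.0,
                        3.0 * h * (1.0 - x**2) / (4.0 * h**2 - 1.0), 0.0)

    def p_dw(states, t, h, drop=None):
        """Kernel-weighted dry-to-wet transition probability at year index t
        (Equation 1).  If drop is given, that year's transition is excluded,
        as required by the leave-one-out step of the LSCV."""
        states = np.asarray(states)
        i = np.arange(1, len(states))          # years with a defined transition
        if drop is not None:
            i = i[i != drop]
        w = kernel((t - i) / h, h)
        prev_dry = 1 - states[i - 1]           # 1 where the previous year was dry
        den = np.sum(w * prev_dry)
        return np.sum(w * prev_dry * states[i]) / den if den > 0 else 0.5

    def lscv_dw(states, h):
        """LSCV score of Equation (4) for the dry-to-wet bandwidth."""
        states = np.asarray(states)
        dw_years = [t for t in range(1, len(states))
                    if states[t - 1] == 0 and states[t] == 1]
        if not dw_years:
            return np.inf
        return np.mean([(1.0 - p_dw(states, t, h, drop=t)) ** 2 for t in dw_years])

    # Bandwidth selection: evaluate a suite of h values and keep the smallest score.
    # h_dw = min(range(2, 61), key=lambda h: lscv_dw(paleo_states, h))

The wet-to-dry case is symmetric, with the roles of wet and dry (S and 1 - S) exchanged in the numerator and denominator.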

This process results in a matrix of sequential transition probabilities that are bootstrapped in 30-year blocks. To generate a new binary sequence, the first year is randomly selected as wet or dry. Once a hydrologic state has been assigned to the first year, the state of the second year is determined from the bootstrapped transition probabilities, which reflect the previous year's state. This process is repeated until the binary sequence is complete.

K-NN Magnitude Resampling

Once the desired number of binary traces has been generated, flow magnitudes are assigned from the observed data. To accomplish this, we employ a modified K-NN framework that uses the current and previous hydrologic states, in addition to the previous flow magnitude, to assign a flow value to the current year. This model can be described by the conditional probability density function (PDF):

$$f(x_t \mid S_t, S_{t-1}, x_{t-1}) \qquad (5)$$

where x_t is the flow at the current time, conditioned on the current system state S_t, the previous system state S_{t-1}, and the previous flow x_{t-1}.

The observed data are first grouped based on hydrologic state (i.e., wet or dry) and then by transition category (e.g., a wet year preceded by a dry year falls into the dry-wet transition category). Flow assignment begins by randomly selecting a magnitude from the observed record in the same hydrologic state as the first year of the binary trace. Once the first year has received a flow value, the second year is classified based on its transition (i.e., if the first year was wet and the second dry, the transition category for year 2 would be wet-dry). Flows are then assigned by a lag-1 KNN approach (Lall and Sharma, 1996) with the pool of available neighbors limited to observed years matching the transition of interest (e.g., wet-wet). Once the neighbors have been identified, they are weighted such that the closest neighbor has the greatest weight and the farthest the least, and one of the weighted neighbors is randomly resampled. The magnitude of the year following the selected neighbor becomes the flow for the second (current) year in the binary trace. This process repeats until all years in the binary trace have received a flow value. Upon completion of the first trace, the process begins again with the second trace, and so on until flow magnitudes have been assigned to all traces.
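A sketch of this conditional resampling step for one year (Python/NumPy assumed; the absolute-difference distance on the previous year's flow, the choice of k as the square root of the candidate pool, and the 1/rank weights are illustrative choices in the spirit of Lall and Sharma (1996), not details spelled out in this report):

    import numpy as np

    def knn_flow(prev_flow, prev_state, curr_state, obs_flows, obs_states,
                 k=None, rng=None):
        """Resample a flow magnitude for the current simulated year from observed
        years whose state transition matches (prev_state -> curr_state)."""
        rng = rng or np.random.default_rng()
        obs_flows = np.asarray(obs_flows, dtype=float)
        obs_states = np.asarray(obs_states)

        # Candidate observed years j whose transition j -> j+1 matches the trace.
        # (Assumes at least one matching transition exists in the observed record.)
        j = np.arange(len(obs_flows) - 1)
        pool = j[(obs_states[j] == prev_state) & (obs_states[j + 1] == curr_state)]

        # Rank candidates by closeness of their flow to the previously simulated
        # flow and keep the k nearest.
        k = k or max(1, int(np.sqrt(len(pool))))
        nearest = pool[np.argsort(np.abs(obs_flows[pool] - prev_flow))][:k]

        # 1/rank weights: the closest neighbor is the most likely to be drawn.
        w = 1.0 / np.arange(1, len(nearest) + 1)
        chosen = rng.choice(nearest, p=w / w.sum())

        # The flow of the year following the chosen neighbor becomes this year's flow.
        return obs_flows[chosen + 1]

The first year of each trace, which has no predecessor, would instead be drawn at random from observed years in the same hydrologic state, as described above.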

Statistical Analysis and Validation

For each of the paleo techniques described earlier, we generated 100 simulations, each 30 years in length. Based on preliminary analysis, and in the interest of brevity, we focus on two of the four methods (statistics for all methods are available upon request). The block paleo and non-homogeneous methods were selected as the principal techniques because of the contrast of strengths and drawbacks in each approach. Of the four methods, the block paleo is the simplest to implement, yet introduces the least variability. Alternatively, the non-homogeneous method incorporates the most variability, but is more intensive to implement. A suite of statistics, including distributional, drought, and surplus properties, is computed from the simulations and compared against those of the observed and paleo data using boxplots. A detailed description of boxplots can be found in the previous report, titled Development of Stochastic Flow Sequences Based on Observed Data.

Distributional Statistics

The distributional statistics computed for comparison with the observed and paleo data are the mean, standard deviation, coefficient of skewness, backward lag-1 correlation, maximum, and minimum. In addition, the probability density functions (PDFs) of each trace are compared to those of the observed and paleo data.

Drought and Surplus Statistics

Drought and surplus lengths and magnitudes are important to understand in hydrologic studies. Figure 1 illustrates the key terms used to quantify drought and surplus periods in terms of both length and magnitude. Drought and surplus statistics are presented as histograms, descriptions of which can be found in the previous report.

Figure 1. Drought and surplus statistic definitions: drought length (LD) and drought deficit (MD), and surplus length (LS) and surplus volume (MS), measured for flows relative to a threshold (e.g., the median) through time.
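As a sketch of how these run lengths and magnitudes can be extracted from a single flow trace (Python/NumPy assumed; the convention that a year exactly at the threshold counts toward a surplus is an illustrative choice):

    import numpy as np

    def run_statistics(flows, threshold):
        """Compute drought and surplus runs relative to a threshold.

        A drought is a run of consecutive years below the threshold; its deficit
        (MD) is the accumulated shortfall.  A surplus is a run of years at or
        above the threshold; its volume (MS) is the accumulated excess.
        Returns two lists of (length, magnitude) tuples."""
        droughts, surpluses = [], []
        length, magnitude, below = 0, 0.0, None
        for q in np.asarray(flows, dtype=float):
            is_below = q < threshold
            if below is None or is_below == below:
                length += 1
                magnitude += abs(q - threshold)
            else:
                (droughts if below else surpluses).append((length, magnitude))
                length, magnitude = 1, abs(q - threshold)
            below = is_below
        if length:
            (droughts if below else surpluses).append((length, magnitude))
        return droughts, surpluses

    # Example: mean drought length of one simulated trace
    # d, s = run_statistics(trace_flows, np.median(observed_flows))
    # mean_drought_length = np.mean([L for L, _ in d]) if d else 0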

Results - Statistical Comparison

We present the boxplots of the distributional statistics followed by those of the drought and surplus statistics.

Distributional Statistics

Boxplots of the distributional statistics for the different methods are shown in Figure 2. Red triangles represent the observed data and blue circles the corresponding statistics from the paleo data. Observed statistics are generally well captured by both methods because the generated sequences resample magnitudes from the observed data, hence preserving the observed statistics. The exception is the backward lag-1 correlation; this statistic is influenced by both the paleo and observed data. The paleo data determine the binary state sequence and thus affect the backward lag-1 correlation by limiting the pool of observed years available for resampling. As a result, the paleo value tends to be better captured, with the median simulated correlation lying between the observed and paleo values. Since the two methods resample the observed data, the maximum and minimum values cannot exceed the observed. Paleo magnitudes are not incorporated in either technique, so the paleo statistics are not expected to be captured.

Figure 2. Distributional statistics for the Block (top) and Non-Homogeneous (bottom) methods (100 simulations, 30 years each).

Figure 3 shows boxplots of the average annual flow from each method. Both capture the observed and paleo values. Furthermore, an increase in variability with the NM framework is shown by its expanded interquartile range (IQR) compared to that of the Block method.
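For completeness, a sketch of the per-trace distributional statistics behind these boxplots (Python assumed, with SciPy for the skewness coefficient; the backward lag-1 correlation is taken here simply as the sample correlation between each year's flow and the preceding year's flow, which the report does not spell out):

    import numpy as np
    from scipy.stats import skew

    def trace_statistics(flows):
        """Distributional statistics of one simulated trace."""
        x = np.asarray(flows, dtype=float)
        return {
            "mean": x.mean(),
            "std": x.std(ddof=1),
            "skew": skew(x),
            "lag1": np.corrcoef(x[1:], x[:-1])[0, 1],  # backward lag-1 correlation
            "max": x.max(),
            "min": x.min(),
        }

    # stats = [trace_statistics(t) for t in simulated_traces]  # one dict per trace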

Figure 3. Mean annual flows for the Block (left) and NM (right) methods.

PDFs of both the observed and paleo data fall within the envelope of the simulation PDFs (Figure 4), indicating that the simulations capture the distributional properties of the observed data. Clearly, 100 simulations are sufficient to capture the observed statistics.

Figure 4. Block annual flow PDFs (left) and NM annual flow PDFs (right). Dashed red is observed, dot-dash blue is paleo, and grey lines are the generated sequences.

Drought and Surplus Statistics

Boxplots of the average drought and surplus lengths for each method are shown in Figures 5 and 6. From visual inspection of Figure 5, a small increase in variability with the NM method can be seen over the Block approach. This is much more obvious in the surplus length plot (Figure 6).

Figure 5. Mean drought lengths for the Block (left) and NM (right) methods.

Figure 6. Mean surplus lengths for the Block (left) and NM (right) methods.

Histograms of the drought and surplus length distributions for both methods are shown in Figures 7 and 8. The NM framework generates maximum surplus/drought lengths very similar to those of the Block method. The important difference is that the NM distribution is smoother and, as a result, the probabilities are more realistic. For example, both models generated a 14-year surplus period; however, the Block technique indicates zero probability of a surplus lasting 9 to 13 years, while the NM approach assigns a probability to every surplus length up to the maximum, with one exception. Both methods show improved capacity to generate more varied flow sequences compared to the KNN approach without coupled paleo hydrologic state information.

Figure 7. Drought length histograms for the Block (left) and NM (right) methods.

Figure 8. Surplus length histograms for the Block (left) and NM (right) methods.

Sequent Peak Algorithm

As indicated in the previous report, the sequent peak algorithm (Loucks et al., 1981) provides a method for quantifying system reliability to meet demands based on a given flow sequence. Figure 9 shows sequent peak plots for both techniques, depicting the storage required to meet a variety of demands. While this method is convenient and visually simple, it should be used with care. Storage requirements tend to be conservative estimates, as demand is assumed to be constant through time and must always be met. This oversimplifies system operations and provides only a coarse estimate of reliability. Nevertheless, these plots show results similar to those found with the KNN-only simulations; namely, demands that could be met based on the observed data can only be met with 50-60% reliability based on the simulated data. This further stresses the importance of utilizing a varied set of flow sequences for reservoir management and modeling.
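A simplified single-pass sketch of the sequent peak calculation (Python assumed; the textbook version in Loucks et al. (1981) cycles the inflow record twice to handle carry-over, and the demand sweep in the comments is illustrative):

    import numpy as np

    def sequent_peak_storage(inflows, demand):
        """Storage needed to meet a constant demand given an inflow sequence:
        K_{t+1} = max(0, K_t + demand - inflow_t); required storage = max K."""
        k, k_max = 0.0, 0.0
        for q in inflows:
            k = max(0.0, k + demand - q)
            k_max = max(k_max, k)
        return k_max

    # Sweeping demand over a range of values gives a storage-vs-demand curve like
    # Figure 9; reliability can then be read as the fraction of simulated traces
    # whose required storage stays below the available capacity.
    # demands = np.linspace(0.5, 1.2, 20) * np.mean(observed_flows)
    # curve = [sequent_peak_storage(trace_flows, d) for d in demands]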

Figure 9. Sequent peak algorithm for the Block (left) and NM (right) methods. The dashed red line is the maximum storage of the Aspinall Unit and Taylor Park; the dotted blue line is the curve for the observed period.

Paleo Indication of Long-Term Basin Characteristics

The Block and NM methods provide increased variability in the generated flow sequences, which is undoubtedly valuable for reservoir management. However, of equal if not greater importance is the insight the paleo methods and data provide regarding the long-term hydrologic trends of the basin. Table 1 shows the probability of wet and dry years for the paleo and observed periods.

Table 1. Observed and paleo hydrologic state probabilities.

Period      Dry Probability    Wet Probability
Observed
Paleo

There is a clear indication that the years prior to the observed period were, on average, wetter than the observational period. This is further supported by the drought/surplus length histograms, where both the Block and NM methods generated longer periods of surplus than of drought. Figure 10 shows the annual probability of experiencing a wet year, the smoothed flows normalized by the observed median, and the smoothed system state. The window length for the smoothing equals 2h + 1, where h = 33 is the optimal bandwidth for transitions from a wet year found with the LSCV method. All three panels are based on the paleo data and clearly show a wet trend. However, as addressed in the previous report, the hydrologic state is determined by a median flow criterion from the observed period, which is somewhat subjective.
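A sketch of that moving-window smooth (Python/NumPy assumed; truncating the window at the ends of the record is an illustrative choice, and Figure 10 may instead plot the kernel-smoothed probabilities from Equations 1 and 2):

    import numpy as np

    def smoothed_wet_probability(states, h=33):
        """Fraction of wet years in a window of length 2h + 1 centered on each
        year (windows truncated at the ends of the record)."""
        s = np.asarray(states, dtype=float)
        return np.array([s[max(0, t - h): t + h + 1].mean()
                         for t in range(len(s))])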

Figure 10. Annual probability of experiencing a wet year based on the paleo data; the red dashed line is 50% (top). Smoothed paleo flow normalized by the observed median; the red dashed line is 0 (middle). Smoothed system state information (bottom).

Lastly, Figure 11 shows the four sets of transition probabilities used to generate the NM flow sequences. Of interest are the high probabilities of transitioning from wet to wet and the lower probabilities of going from wet to dry, with the exception of the observed period.

Figure 11. Paleo transition probabilities used in the non-homogeneous Markov framework.

Conclusions and Suggestions

Recent events have made it evident that water managers need to model and plan for extreme events, as improbable as they may seem. The techniques presented in this report allow for the generation of more varied events, thus aiding water managers. In particular, the NM approach is the most effective at introducing variability into the stochastic sequence generation process. While it is slightly more intensive to implement, it has several benefits not found in the Block method. By using transition probabilities to generate new binary sequences, an unlimited number of unique traces can be produced. Additionally, there is some smoothing in the calculation of the NM transition probabilities, which results in sequences that better reflect the epochal nature of the data. Wet and dry epochs are captured by the Block method as well; however, during a dry period, a 15-year drought may be broken into two seven-year droughts by the presence of a single slightly wet year. With the NM approach there is a much higher potential to reproduce the dry epoch as one continuous drought. As stated earlier, the paleo data indicate that the pre-observational period was, on average, wetter than the past century. This too is useful information.

However, it should be used with caution, as the paleo record may not be indicative of future conditions. Factors such as global warming and anthropogenic basin impacts may shape the future climate more than past trends do. Incorporating the variability of the past with predicted future trends will be the next step in generating flow sequences for reservoir management.

References

Hidalgo, H.G., T.C. Piechota, and J.A. Dracup (2000), Alternative principal components regression procedures for dendrohydrologic reconstructions, Water Resources Research, 36(11).

Lall, U., and A. Sharma (1996), A nearest neighbor bootstrap for resampling hydrologic time series, Water Resources Research, 32(3).

Loucks, D.P., J.R. Stedinger, and D.A. Haith (1981), Water Resource Systems Planning and Analysis, Prentice-Hall, Inc., New Jersey.

Prairie, J., K. Nowak, B. Rajagopalan, U. Lall, and T. Fulp (2007), A stochastic nonparametric approach for streamflow generation combining observational and paleo reconstructed data, Water Resources Research, in press.

Rajagopalan, B., U. Lall, and D.G. Tarboton (1996), Nonhomogeneous Markov model for daily precipitation, Journal of Hydrologic Engineering, 1(1).

Rajagopalan, B., U. Lall, and M.A. Cane (1997), Anomalous ENSO occurrences: an alternate view, Journal of Climate, 10.

United States Bureau of Reclamation (2004), Aspinall Unit Operations Environmental Impact Study Background Material, Grand Junction, CO.

Western Colorado Area Office (2007), Water Operations: Taylor Park Reservoir, retrieved August 8, 2007, from United States Bureau of Reclamation website.

Woodhouse, C.A., S.T. Gray, and D.M. Meko (2006), Updated streamflow reconstructions for the Upper Colorado River Basin, Water Resources Research, 42, W05415.