Evaluating paleohydrologic reconstruction sensitivity to model selection


Lisa Wade
December 20, 2011

Abstract

Paleohydrologic reconstructions of annual streamflow from tree-ring chronologies can provide valuable insight into the long-term natural hydrologic variability of rivers. Extensive work has been performed to create robust reconstructions of the Upper Colorado River by modeling the annual natural flow at Lees Ferry, a key gage for policy and management. The goal of this paper is to extend the techniques used to reconstruct the Upper Basin to the Lower Basin by examining both the intervening flow of the Lower Colorado River (Lees Ferry to Imperial Dam) and the Gila River. A variety of statistical techniques are presented, and the sensitivity of the overall reconstruction to model selection is examined.

Introduction

The future reliability of Colorado River Basin water supplies depends on natural hydrologic variability, climate change impacts, and other human factors. Natural variability is the dominant component at annual to decadal time scales; capturing and understanding the full range of that variability is therefore critical to assessing risks to near- and mid-term water supplies. Paleohydrologic reconstructions of annual flow from tree rings provide much longer records (400+ years) than historical gage records, and thus a more complete representation of potential flow sequences. While the long-term natural variability of the Upper Colorado River Basin has been well captured by high-quality multicentury reconstructions of the annual flow of the Colorado River at Lees Ferry, AZ, there has been no equivalent effort for the Lower Colorado River Basin as a whole, including the Gila River. The Lower Basin is estimated to contribute 15% of overall basin flows on average, but this percentage varies significantly from year to year, potentially affecting water supply risk and management for the entire basin.
In order to accurately assess the hydroclimatic variability of the Lower Basin, the sensitivity of the reconstruction to modeling decisions must be understood. A clear understanding of the trade-offs between models is necessary to understand the strengths and weaknesses of each model and, therefore, its uncertainties. To investigate model sensitivity, reconstructions of annual streamflow are developed for the Gila River and for the Lower Colorado River near Yuma, AZ. For the Gila, both the natural flow at the confluence with the Colorado River and the flow under current management are considered.

Methodology

Three tree-ring reconstructions are considered: the natural flow of the Gila River at its confluence with the Lower Colorado River, the flow of the Gila River at the same confluence under current management, and the intervening flow of the Lower Colorado River.

Data Selection: Tree Ring Chronologies

All of the tree-ring chronologies were generously provided by the University of Arizona Laboratory of Tree-Ring Research. They were screened for length of record and location. Figure 1 shows the locations of the sites, and Table 1 lists site-specific information. All of the chronologies used in this paper were collected as part of the North American Monsoon Project [Griffin et al 2011]. They were detrended using a spline with a frequency response of 0.5 at a wavelength of 100 years.

Figure 1: Location of the tree ring chronology sites.

Table 1: Tree Ring Chronology site information.

Code  Name                  Species
AWH   Antelope Wash         PIPO
BFM   Black Mountain        PSME
BRM   Black River           PSME
DCU   Ditch Canyon          PSME
EAM   Echo Amphitheater     PSME
FCU   Filmore Canyon        PSME
FMM   Fox Mountain          PSME
FSM   Florida Saddle        PSME
GPM   Guadalupe Peak        PSME
MMM   Magdelena Mountains   PSME
ORM   Organ Mountains       PSME
OSM   Onion Saddle          PSME
PRM   Paddys River          PSME
RPM   Rincon Peak           PSME
RPU   Rio Pueblo            PIPO
SCM   Santa Catalina High   PSME
SFK   South Fork            PIPO
SMM   San Mateo Mountain    PSME
SPJ   San Pedro             PIJE
SPM   Satan Pass            PSME
SPS   San Pedro             PILA
SPW   San Pedro             ABCO
TCM   Tsegi Canyon          PSME
TSM   Turkey Springs        PIPO
WCM   White Canyon          PSME
WKM   Wahl Knoll            PSME
WMP   Walnut Canyon         PIPO
WPU   Webb Peak             PSME

Species: PSME = Pseudotsuga menziesii; PIPO = Pinus ponderosa; PIED = Pinus edulis

Data Selection: Gila River Natural Flow

In order to create a natural flow sequence for the Gila River, a number of assumptions were made. First, we assumed that the vast majority of the flow in the river is derived from the headwaters and that, once the river reaches the Phoenix metro area, it becomes a naturally losing reach. In practical terms, this means that any contributions to the river below the confluence of the Salt and Gila Rivers are negligible or are offset by natural losses in the channel. Second, the estimates of natural channel losses made by the US Bureau of Reclamation in a 1947 report to Congress were assumed to be correct. In that report, Reclamation provided a natural flow sequence for the Gila River beginning in 1897. This sequence is used as the calibration flow for a regression model in order to extend the natural flow record.

Data Selection: Gila River Under Current Management

The US Geological Survey operates a stream gage near Dome, AZ, the closest gage to the mouth of the river. Monthly streamflow data from the period of reliable record were summed to produce annual (water year) flows. Figure 2 shows the observed water year flow sequence for this gage. The blue line marks the threshold of 0.2 MAF, above which flows could begin to affect the operation of the reservoirs on the main stem of the Colorado River.
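The monthly-to-annual step can be sketched as follows. This assumes the USGS convention that a water year runs from October through September and is labeled by the calendar year in which it ends; the input layout (a list of `(year, month, flow)` tuples in acre-feet) and the function names are hypothetical.

```python
from collections import defaultdict


def water_year_totals(monthly):
    """Sum monthly flows into water-year totals.

    monthly: iterable of (calendar_year, month, flow_af) tuples (hypothetical layout).
    A water year runs October-September and is labeled by the calendar year in
    which it ends, so October-December of year Y count toward water year Y + 1.
    Water years with missing months are dropped rather than under-counted.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, month, flow in monthly:
        wy = year + 1 if month >= 10 else year
        totals[wy] += flow
        counts[wy] += 1
    return {wy: totals[wy] for wy in sorted(totals) if counts[wy] == 12}


def exceedance_years(annual, threshold_af=200_000.0):
    """Water years whose total flow exceeds the threshold (0.2 MAF by default)."""
    return [wy for wy, q in annual.items() if q > threshold_af]
```

For example, a record that begins in October 2000 yields water year 2001 as its first complete year, and `exceedance_years` then lists the years above the 0.2 MAF operational threshold.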

Figure 2: Observed flow of the Gila River near Dome, AZ. The blue line shows 0.2 MAF, which is used as the threshold for the logistic model.

Data Selection: Lower Colorado River Intervening Flows

Reclamation produces a naturalized streamflow record for the entire Colorado River from 1906 to 2006 (Reclamation 2011). This record is updated regularly and is available from Reclamation's website. The intervening flow record is found by summing the intervening flows at the nine nodes of the Colorado River Simulation System (CRSS) model that lie in the Lower Colorado River Basin. Five of these nodes are well correlated with regional rainfall, but the other four are not. Several reaches of the river are naturally losing reaches, which could explain this lack of correlation. Another possible explanation is gage error: the total volume of water in the river is orders of magnitude larger than the intervening flows, so small relative errors at the main-stem gages can swamp them. The lack of correlation could also stem from an error in the depletion accounting method. We are working with Reclamation to investigate this issue; until it has been resolved, the sum of the five well-correlated nodes is used to represent the intervening flow.

Gila River: Natural Flows

The 1947 Reclamation report contains data beginning in 1897. To obtain the longest possible overlap period between the natural flow and the tree ring chronologies, a regression analysis was performed to extend the natural flow record. Four types of models were tested to find the best regression between the upstream headwater flows and the 1947 natural flow sequence: a Generalized Linear Model (GLM), a GLM on the Principal Components (PCs) of the headwater flows, a local polynomial, and a local polynomial on the PCs. All four models were tested with both the normal and the gamma link functions.
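One way to choose headwater predictors such as those listed in Table 2 is an exhaustive subset search scored by AIC. The sketch below covers only the normal (ordinary least squares) case; the predictor names are placeholders, and whether the paper used an exhaustive or a stepwise search is not stated.

```python
import itertools

import numpy as np


def gaussian_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit with intercept: n*log(RSS/n) + 2k."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1                      # regression coefficients plus error variance
    return len(y) * np.log(rss / len(y)) + 2 * k, beta


def best_subset_by_aic(predictors, y):
    """Try every non-empty predictor subset and keep the one with minimum AIC.

    predictors: dict name -> 1-D array over the calibration years (hypothetical names)
    y: calibration flow for the same years
    Returns (aic, subset_names, coefficients with intercept first).
    """
    names = list(predictors)
    y = np.asarray(y, dtype=float)
    best = (np.inf, None, None)
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            X = np.column_stack([np.asarray(predictors[n], dtype=float) for n in subset])
            aic, beta = gaussian_aic(X, y)
            if aic < best[0]:
                best = (aic, subset, beta)
    return best
```

The selected coefficients can then be applied to headwater flows for years outside the 1947 record to extend it. Exhaustive search is only practical for a handful of candidate predictors, which is the situation for the headwater gages here.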

Table 2: Comparing model output for generating the best regression model of headwaters and the naturalized streamflow record for the Gila River at the mouth from the 1947 Reclamation report.

Model                           Selected Predictors
GLM, normal                     Gila, Salt+Verde+Tonto
GLM, gamma                      Gila, Verde
GLM, normal, PCA                PC 1, PC …
GLM, gamma, PCA                 PC …
Local Polynomial, normal        Gila, Salt+Tonto, Verde
Local Polynomial, gamma         Gila, Salt+Tonto, Verde, Salt+Verde+Tonto
Local Polynomial, normal, PCA   PC …
Local Polynomial, gamma, PCA    PC …

Figure 3: The first two principal components explain 98% of the variance.
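The variance fractions quoted for Figure 3 come from a principal component analysis of the headwater flows. A minimal sketch using NumPy's SVD on standardized columns; standardizing first (correlation-matrix PCA) is an assumption, since the paper does not say whether the covariance or correlation matrix was used.

```python
import numpy as np


def pca(X):
    """PCA of a (years x sites) matrix via SVD of the column z-scores.

    Returns (scores, loadings, explained): scores are the PC time series,
    loadings map sites to components, and explained[k] is the fraction of
    total variance carried by component k (in decreasing order).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U * s                      # equivalent to Z @ Vt.T
    return scores, Vt.T, explained


def n_components_for(explained, target=0.90):
    """Smallest number of leading components whose cumulative share reaches target."""
    return int(np.searchsorted(np.cumsum(explained), target) + 1)
```

The same helper applies to the tree ring chronologies, where `n_components_for(explained, 0.90)` would report how many PCs are needed to pass 90% of the variance.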

Figure 4: Comparing the naturalized Gila River annual flow using four GLM models.

Figure 5: Comparing the naturalized Gila River annual flow using four local polynomial models and the first GLM model, which uses headwater data with the normal link function.

Judged by their objective criteria scores, all of the models appear to capture the natural flow of the Gila River well. Their plots show general agreement on low to mid-range flow values, except for the GLM-based models with the gamma link function. These models predict the highest flow at the beginning of the record and then predict negative flows later in the record. In 1993, a series of winter storms alternating between warm and cold resulted in massive flooding; the recorded flow at the USGS gage near Dome, AZ was about 5.3 million acre-feet [House 1993]. The best natural flow sequence should therefore have the highest flow of the record in 1993. All of the local polynomial models meet that requirement better than the GLM-based models. Based on the AIC scores, the local polynomial models using PCs beat the local polynomial models based on the headwater data alone. For the PC-based local polynomial models, the normal and gamma link functions have identical AIC scores and very high adjusted R² scores. To choose between the two, we examine the full model diagnostics.

Table 3: Comparing the two best performing models.

Model                           Degree   Alpha   GCV score
Local Polynomial, normal, PCA
Local Polynomial, gamma, PCA
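Table 3 compares the two candidate local polynomial fits by degree, smoothing fraction (alpha) and GCV score. The sketch below is a simplified one-predictor local polynomial smoother with tricube weights over the nearest alpha·n points, scored with the usual generalized cross-validation form GCV = n·RSS/(n - tr L)², where L is the hat matrix. It is not necessarily the software the paper used, and the paper's models have several PC predictors rather than one; function names are hypothetical.

```python
import numpy as np


def local_poly_hat(x, alpha=0.5, degree=1):
    """Hat matrix L of a one-predictor local polynomial smoother (fitted = L @ y).

    Each point is fit by weighted least squares on its nearest ceil(alpha * n)
    neighbours, with tricube weights that reach zero at the farthest of them.
    Assumes distinct x values and alpha * n comfortably above degree + 1.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = min(n, max(int(np.ceil(alpha * n)), degree + 2))
    L = np.zeros((n, n))
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[k - 1]                                # distance to k-th nearest point
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3      # tricube weights
        B = np.vander(x - x[i], degree + 1, increasing=True)  # columns 1, dx, dx^2, ...
        WB = w[:, None] * B
        # Row 0 of (B'WB)^-1 B'W evaluates the local polynomial at x[i].
        L[i] = np.linalg.solve(B.T @ WB, WB.T)[0]
    return L


def gcv_score(x, y, alpha=0.5, degree=1):
    """GCV = n * RSS / (n - trace(L))**2, returned with the fitted values."""
    y = np.asarray(y, dtype=float)
    L = local_poly_hat(x, alpha, degree)
    fitted = L @ y
    n = y.size
    rss = float(np.sum((y - fitted) ** 2))
    return n * rss / (n - np.trace(L)) ** 2, fitted


def best_alpha(x, y, alphas, degree=1):
    """Smoothing fraction with the lowest GCV score among the candidates."""
    scores = {a: gcv_score(x, y, a, degree)[0] for a in alphas}
    return min(scores, key=scores.get), scores
```

The trace of L plays the role of an effective number of parameters, so GCV penalizes small alpha (wiggly fits) without needing a likelihood, which is one reason it is often preferred to AIC for nonparametric smoothers.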

Figure 6: Examining the model diagnostics of the PC-based local polynomial with normal link function.

Figure 7: Examining the model diagnostics of the PC-based local polynomial with gamma link function.

Based on the model diagnostics and the GCV score, the best model is the PC-based local polynomial with a gamma link function. This is the natural flow record that will be used in the tree ring reconstruction of the natural Gila River flow. Because the natural flow record shows no significant autocorrelation, all of the tree ring reconstructions of the natural Gila River flow use the residual tree ring index (from which autocorrelation has been removed) rather than the standard index.

Results

Gila River: Natural Flow
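The residual chronologies used in the reconstructions in this section can be illustrated with a minimal prewhitening sketch. It assumes an AR(1) model fit by least squares to each standard chronology; chronology-development software typically chooses the autoregressive order by AIC, so fixing the order at 1 is a simplification, and the function names are hypothetical.

```python
import numpy as np


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a 1-D series."""
    z = np.asarray(series, dtype=float)
    z = z - z.mean()
    return float(np.sum(z[1:] * z[:-1]) / np.sum(z * z))


def ar1_residual_index(standard_index):
    """Prewhiten a standard chronology with an AR(1) model.

    Fits z_t = phi * z_(t-1) + e_t to the mean-centred index by least squares
    and returns (phi, residual index). The residuals are shifted back onto the
    original mean so the result stays on the index scale; the first year is
    lost because it has no predecessor.
    """
    x = np.asarray(standard_index, dtype=float)
    mu = x.mean()
    z = x - mu
    phi = float(np.dot(z[1:], z[:-1]) / np.dot(z[:-1], z[:-1]))
    resid = z[1:] - phi * z[:-1]
    return phi, resid + mu
```

`lag1_autocorr` can also be applied to the calibration flow record to check the lack of persistence that motivates using residual rather than standard indices.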

To build a tree ring reconstruction of the natural flow of the Gila River, three statistical models were tested: a GLM with tree ring indexes as the potential predictors, a GLM with principal components as the potential predictors, and a local polynomial with principal components as the predictors. Both the normal and gamma link functions were used in all three types of models; with the normal link function, the flow data were log transformed. Results are displayed in Table 4, and Figures 9 and 10 show the reconstructions. The GLM with tree ring indexes as the potential predictors and the gamma link function did not perform well: all of the mid-range peaks were dramatically overestimated, pushing the mean of the reconstruction well outside the range given by the other models, so it is not displayed.

Table 4: Results of natural Gila River tree ring reconstruction models. Where two predictor sets are given, they are the AIC / BIC selections.

Model                                    Selected Predictors
GLM, normal, log transform               AWH, RPM, TSM
GLM, gamma                               SPW, TSM, SMM
GLM, normal, PCA                         PC 1, 2, 5, 10, 11 / PC 1, 2, 4, …
GLM, gamma, PCA                          Fails to converge
Local Poly, normal, PCA, log transform   PC 1, 2, 4, …
Local Polynomial, gamma, PCA             PC 1, 2, 3, …

Figure 8: Eigenspectrum from the PCA of the 28 selected tree ring chronologies. The first 11 PCs explain over 90% of the variance.
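The PCA-based regressions regress flow on a subset of the chronology principal components. A minimal sketch of the normal, log-transformed variant as ordinary least squares of log flow on chosen PC scores, reporting adjusted R². Here the PC subset is passed in explicitly, whereas the paper selected it by AIC/BIC; the back-transform by plain exponentiation (which estimates median rather than mean flow under a lognormal error) and the function names are assumptions.

```python
import numpy as np


def pc_regression(scores, flow, pcs):
    """OLS of log(flow) on selected principal-component scores.

    scores: (years x components) PC score matrix for the calibration period
    flow:   positive annual flows for the same years
    pcs:    1-based component numbers to use, e.g. [1, 2, 4]
    Returns (coefficients with intercept first, adjusted R^2 on the log scale,
    fitted flow in original units).
    """
    S = np.asarray(scores, dtype=float)[:, [p - 1 for p in pcs]]
    y = np.log(np.asarray(flow, dtype=float))
    X = np.column_stack([np.ones(len(y)), S])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = len(y), len(pcs)
    r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, adj_r2, np.exp(X @ beta)


def reconstruct(beta, scores, pcs):
    """Apply fitted coefficients to PC scores for the full chronology period.

    The full-period scores must be computed with the same loadings (and the
    same standardization) as the calibration-period scores.
    """
    S = np.asarray(scores, dtype=float)[:, [p - 1 for p in pcs]]
    return np.exp(beta[0] + S @ beta[1:])
```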

Figure 9: Comparing the four reconstruction models and the calibration natural flow record during the overlap period.

Figure 10: Comparing the four natural Gila River reconstructions for the entire period.

Judged by the objective criteria, the PCA regression with a local polynomial using a normal link function on the log-transformed data does the best job of explaining the data (red line). The PCA regression with a local polynomial using a gamma link function appears to do a better job of capturing some of the high flows in the overlap period, but it overestimates a number of the low flows. This is a significant problem, because tree ring chronologies are in general very good at reproducing low flows. During low flow years, the trees are stressed by water scarcity, and their growth is highly correlated with the flow in the river. During high flow years, growth is less correlated with streamflow because, once the soil is saturated, additional water delivered to the basin is not reflected in tree growth. For all of the reconstruction methods there is general agreement in the trend of the time series. This attests to the robustness of the hydroclimatic signal in the tree ring data and to the strength of the relationship between this signal and the annual flow of the Gila River.

Gila River: Flows under Current Management

The Gila River water year flows under current management are very sporadic, as shown in Figure 2, so a linear model is not appropriate. Instead, four logistic models are constructed. The first two use existing reconstructions of the headwater gages on the Salt, Verde, Tonto, and Gila Rivers: one is a GLM with a logit link function, and the other is a logistic local polynomial model. The second two use tree ring principal components, again with one GLM with a logit link function and one logistic local polynomial model. The threshold for the observed flows is set at 0.2 MAF per water year, which is exceeded in 12 years of the 79-year observed record.
Table 5: Results of the logistic models for the Gila River under current management

Model                           Predictors      Low Flow hits / misses   High Flow hits / misses
GLM, logit, headwaters          Gila, Verde     66 / 1                   7 / 5
Local Poly, logit, headwaters   Salt+Tonto      67 / 0                   12 / 0
GLM, logit, PCA                 PC 1, 3, 5, 6   65 / 2                   1 / 11
Local Poly, logit, PCA          PC 6            67 / 0                   5 / 7
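A GLM with a logit link can be fit by iteratively reweighted least squares (IRLS), and hit/miss counts like those in Table 5 then follow from classifying each year by its fitted exceedance probability. A minimal NumPy sketch: classifying at a 0.5 probability cutoff is an assumption (it matches the cutoff used later for the exceedance simulation), the function names are hypothetical, and the logistic local polynomial variants are not shown.

```python
import numpy as np


def fit_logistic(X, exceed, n_iter=50, tol=1e-10):
    """Logistic regression (logit link) by IRLS; X excludes the intercept.

    exceed is 1 for years above the threshold and 0 otherwise. If the two
    classes are perfectly separable the MLE does not exist and the weights
    collapse toward zero, so this sketch assumes overlapping classes.
    """
    A = np.column_stack([np.ones(len(exceed)), np.asarray(X, dtype=float)])
    y = np.asarray(exceed, dtype=float)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = A @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)                  # IRLS weights for the logit link
        z = eta + (y - mu) / w               # working response
        new = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * z))
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta


def predict_prob(beta, X):
    """Fitted exceedance probabilities."""
    A = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return 1.0 / (1.0 + np.exp(-(A @ beta)))


def hits_misses(prob, exceed, cutoff=0.5):
    """((low hits, low misses), (high hits, high misses)) as in Table 5."""
    pred = np.asarray(prob) > cutoff
    obs = np.asarray(exceed, dtype=bool)
    low = (int(np.sum(~pred & ~obs)), int(np.sum(pred & ~obs)))
    high = (int(np.sum(pred & obs)), int(np.sum(~pred & obs)))
    return low, high
```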

Figure 11: Plots showing the model and observed output from the four logistic models considered for modeling the Gila River under current management.

The four models all have different strengths and weaknesses. The headwater local polynomial has the best record of high flow hits and misses. It has the second highest AIC score, but its AIC is not much larger than that of the headwater GLM model. Based on the hit and miss record, the headwater local polynomial is selected as the best model. To simulate the magnitude of the flows above the threshold, a Generalized Pareto Distribution (GPD) is fit to the observed data. Figure 12a shows the threshold selection analysis. A threshold of 0.2 MAF, initially selected for management reasons, also proves to be a good choice based on the GPD threshold selection criteria. Several models were considered, and the best is a GPD with the Salt + Tonto record serving as a covariate in a generalized linear model of the scale parameter. Model diagnostics are shown in Figure 12b.
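The non-stationary GPD can be sketched as follows: the scale parameter varies linearly with a covariate (standing in for the Salt + Tonto reconstruction), the shape parameter is constant, and all three parameters are estimated by maximum likelihood with SciPy's Nelder-Mead optimizer. The identity link for the scale, the starting values, and the function names are assumptions rather than the paper's exact setup. A helper draws exceedance magnitudes by inverting the GPD distribution function.

```python
import numpy as np
from scipy.optimize import minimize


def gpd_nll(params, excess, covariate):
    """Negative log-likelihood of a GPD whose scale is b0 + b1 * covariate."""
    b0, b1, xi = params
    sigma = b0 + b1 * covariate
    if np.any(sigma <= 0):
        return np.inf                          # scale must stay positive
    if abs(xi) < 1e-8:                         # exponential limit as xi -> 0
        return float(np.sum(np.log(sigma) + excess / sigma))
    t = 1.0 + xi * excess / sigma
    if np.any(t <= 0):
        return np.inf                          # outside the GPD support
    return float(np.sum(np.log(sigma) + (1.0 + 1.0 / xi) * np.log(t)))


def fit_nonstationary_gpd(flows, covariate, threshold):
    """Maximum-likelihood (b0, b1, xi) from the flows above the threshold."""
    flows = np.asarray(flows, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    above = flows > threshold
    excess, cov = flows[above] - threshold, covariate[above]
    x0 = np.array([excess.mean(), 0.0, 0.1])   # start from a constant scale
    res = minimize(gpd_nll, x0, args=(excess, cov), method="Nelder-Mead",
                   options={"maxiter": 5000, "maxfev": 10000,
                            "xatol": 1e-6, "fatol": 1e-8})
    return res.x


def simulate_exceedances(params, covariate, threshold, n_sims=100, rng=None):
    """Draw n_sims flows per year by inverse-CDF sampling; shape (n_sims, years)."""
    rng = np.random.default_rng() if rng is None else rng
    b0, b1, xi = params
    sigma = b0 + b1 * np.asarray(covariate, dtype=float)
    u = rng.random((n_sims, sigma.size))
    if abs(xi) < 1e-8:
        excess = -sigma * np.log1p(-u)
    else:
        excess = sigma / xi * ((1.0 - u) ** (-xi) - 1.0)
    return threshold + excess
```

In a two-part model, draws would be generated only for years whose logistic exceedance probability is above 0.5, and the per-year `draws.mean(axis=0)` and `draws.max(axis=0)` correspond to the average and maximum quantities plotted in Figure 13.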

Figure 12a: Diagnostic plots to select the best threshold value for the GPD model.

Figure 12b: Model diagnostic plots for the selected GPD model using the Salt+Tonto reconstruction as the covariate for the scale parameter.

Next, to reconstruct threshold exceedances back in time, a two-part model was constructed. First, the scale parameter was calculated from the regression used in the non-stationary GPD, a simple linear model relating the Salt + Tonto flow to the value of the scale parameter. Second, the logistic model was applied to the entire record. Since the preferred logistic model depends only on the Salt + Tonto headwater reconstruction, the record extends from 1361 to 2005. Then, 100 values were simulated from the non-stationary GPD for each year in which the probability of exceedance was greater than 0.5. The resulting time series is shown in Figure 13, with the simulation average as the solid line and the maximum simulated value as the dashed line. Only 81 years in the entire period from 1361 to 2005 exceed the threshold.

Figure 13: Results from the GPD simulation. The solid line is the simulation average. The dashed line is the simulation maximum.

Lower Colorado River Intervening Flows

Using the sum of the five nodes that are correlated with rainfall as the calibration flow, three types of regression models were fit, each with both the normal and the gamma link function. All of the models used the same 28 tree ring chronologies as the natural Gila River flow reconstruction. The results are compared in Table 6, and the reconstructed flow time series are shown in Figures 14 and 15. The PCA-based GLM with gamma link function and the PCA-based local polynomial with gamma link function produced flow values well outside the expected bounds and are not displayed.

Table 6: Comparing the results of the Lower Colorado River Intervening Flows Reconstruction

Model                                    Selected Predictors      Adjusted R²
GLM, normal                              TCM, AWH, TSM, WPU       0.38
GLM, gamma                               AWH, TSM                 0.00
GLM, normal, PCA, log transform          PC 1, 2, 3, 5, 6, 7, …
GLM, gamma, PCA                          PC 1, 2, 3, 5, 6, …
Local Poly, normal, PCA, log transform   PC 1, 2, 5, …
Local Polynomial, gamma, PCA             PC 1, 2, 3, 4, …
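Several models in this paper use a gamma link function. Interpreting that as a GLM with a gamma error distribution, the sketch below assumes a log link and fits it by iteratively reweighted least squares. With a log link the IRLS working weights are all equal to one, so each iteration is an ordinary least-squares fit of the working response z = eta + (y - mu)/mu. The link is an assumption: a log link can never produce negative fitted flows, so the negative values reported for some gamma GLMs suggest a different link was used (R's `Gamma` family, for instance, defaults to the inverse link). The function name is hypothetical.

```python
import numpy as np


def fit_gamma_log_glm(X, y, n_iter=100, tol=1e-10):
    """Gamma GLM with log link, fit by IRLS (Fisher scoring); X excludes the intercept.

    Returns coefficients with the intercept first; fitted flows are exp(A @ beta).
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("a gamma GLM needs strictly positive responses")
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)   # start from OLS on log flow
    for _ in range(n_iter):
        eta = A @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response; weights are all 1
        new, *_ = np.linalg.lstsq(A, z, rcond=None)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta
```

At convergence the score equations A'((y - mu)/mu) = 0 hold, which is a convenient check that the iteration has actually reached the maximum-likelihood fit.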

Figure 14: Comparing the model outputs and the observed flow for the Lower Colorado River intervening flows for the overlap period.

Figure 15: Comparing the reconstructed flow for the Lower Colorado River intervening flows beginning in 1612.

Both of the GLM models based directly on the tree ring data have years with negative flow. Their peaks in high flow years are also not as large as those of either the observed data or the models based on the principal components. The GLM and local polynomial flow sequences are very similar. The local polynomial model is selected based on its minimum GCV score, which is a more robust measure than AIC, making it the best regression model for reconstructing the Lower Colorado River intervening flows.

The Lower Colorado River intervening flow models do not perform as well overall as the models for the natural flow of the Gila River. This could be due to a partial disconnect between the tree ring information and the hydrologic response of the Lower Colorado River. This issue was identified in the data selection process and manifests itself as poor model performance. The models depend on a strong relationship between the hydroclimatic signal expressed in the tree rings and the streamflow; to improve them, a better sequence of natural flow data is needed.

Conclusion

Regressions that relate tree ring chronologies to the water year natural flow of rivers can be a powerful tool for extending the period of record, providing insight into the natural variability of a river system. For the Lower Colorado River Basin, three separate reconstructions were performed: the natural flow of the Gila River, the flow of the Gila River under current management, and the natural intervening flow of the Lower Colorado River. A variety of statistical techniques were used to find the best regression model. For both the natural flow of the Gila River and the natural intervening flow of the Lower Colorado River, a PCA-based local polynomial with a normal link function fit to the log-transformed flow data performed best. For the Gila River under current management, a threshold of 0.2 MAF per water year was selected, and a local polynomial logistic model fit to the headwater reconstructions performed best. When coupled with a GPD, this model can generate ensembles of streamflow. All three reconstruction models can now be used in a system risk model, which will help water managers better understand the amount of flexibility in the entire Colorado River system.
References

Griffin et al. (2011). Latewood chronology development for summer-moisture reconstruction in the US Southwest. Tree-Ring Research 67.
House, Kyle (1993). The Arizona Floods of January and February 1993. Arizona Geology, Volume 23, No. 2. Arizona Geological Survey.
TreeFlow. Upper Colorado Basin. Retrieved January 5, 2011, from TreeFlow: streamflow reconstructions from tree rings.
Woodhouse, C.A., S.T. Gray, and D.M. Meko (2006). Updated streamflow reconstructions for the Upper Colorado River basin. Water Resources Research 42(5): W05415.