Demanding Baselines: Analysis of Alternative Load Estimation Methods for Two Large C&I Demand Response Programs


Assessing Baseline Load Methods for California's Critical Peak Pricing & Demand Bidding Programs

Amy Buege, Quantum Consulting, Inc.
Michael Rufo, Quantum Consulting, Inc.
Michael Ozog, Summit Blue Consulting, Inc.
Dan Violette, Summit Blue Consulting, Inc.

Abstract

This paper presents selected findings from a comprehensive evaluation of two demand response (DR) programs implemented in 2004 by Pacific Gas and Electric Company (PG&E), Southern California Edison Company (SCE), and San Diego Gas and Electric Company (SDG&E): the Critical Peak Pricing (CPP) tariff and the Demand Bidding Program (DBP). The evaluation was conducted under the guidance of the three investor-owned utilities, the California Energy Commission (CEC), and the California Public Utilities Commission (CPUC). The focus of this paper is on the assessment of alternative baseline load estimation methods and the resulting impacts stemming from their application to actual 2004 DR events (a companion paper presents findings from the overall evaluation[1]). Several of the baseline methods were found to perform quite well on average for the entire sample. However, significant differences were found among the baseline methods with respect to the degree of error associated with each, their relative accuracy in predicting loads on average, and whether they tended to over- or under-predict loads. Regardless of the baseline calculation method selected, significant uncertainty in estimated program-wide impacts can occur when: average impacts are small relative to baseline loads; program populations are small and dominated by a few very large customers with highly variable loads; and program events are few and follow unexpected patterns (e.g., events called on sequential days with decreasing temperatures).
For individual customers, constructing baselines that accurately estimate the achieved load reduction for a given curtailment period is even more problematic. This can affect the equity of the program by introducing errors in pay-for-performance calculations.

Introduction

In 2002, the California Public Utilities Commission adopted R , its Order Instituting Rulemaking on policies and practices for advanced metering, demand response, and dynamic pricing. Following this ruling, in Decision , the Commission authorized Pacific Gas and Electric Company (PG&E), Southern California Edison Company (SCE), and San Diego Gas and Electric Company (SDG&E) to implement a voluntary Critical Peak Pricing (CPP) tariff and Demand Bidding Program (DBP). The goal underlying these DR programs was to provide California with greater flexibility in responding to periods of high peak electricity demand. CPP is a rate that includes increased prices during 6 or 7 hours (noon to 6 pm for PG&E and SCE, 11 am to 6 pm for SDG&E) for up to 12 Critical Peak Pricing days each year and reduced prices

[1] Quantum Consulting Inc. & Summit Blue Consulting, LLC 2004

during non-critical-peak periods. Specific prices in the tariff are applied based on participating customers' Otherwise Applicable Tariff (OAT). Peak prices vary from 5 to 10 times the OAT depending on the utility. DBP is a program that provides opportunities for customers to promise load shifting during critical periods for a bid incentive. Bidding is an offer to curtail usage by 100 kW or more for two or more hours during program events and receive payment equal to the amount of the estimated reduction times the predetermined DBP price incentive. DBP price incentives range from $0.15 to $0.50 per kWh reduced, depending on market prices and whether the event is a day-of or day-ahead event (see Quantum Consulting 2004 for full program details). This paper presents selected findings from a comprehensive evaluation of the CPP and DBP programs. The evaluation was conducted under the guidance of the three investor-owned utilities, the California Energy Commission (CEC), and the California Public Utilities Commission (CPUC). The overall evaluation consisted of four main components: a process evaluation, focused on assessing the programs' procedures and processes as well as participants' activity levels and satisfaction with the program experience; a market assessment, which included a large quantitative survey focused on estimating DR potential, barriers, and opportunities; a load baseline analysis, which systematically assessed the performance of different representative-day methods (see below for definition); and an impact evaluation, which estimated impacts for the 2004 DR events. The focus of this paper is on the assessment of alternative baseline load estimation methods and the resulting impacts stemming from their application to actual 2004 DR events. Results from the overall evaluation are summarized in a companion paper (Buege et al. 2005) and the final study report (Quantum 2005).
Electric utilities have experimented with different programs that motivate customers to reduce their peak demand under critical and restricted conditions by paying them for the amount of load they reduce. A critical aspect of these voluntary demand response programs is determining the actual amount of load reduction that is achieved. To compute the load reduction, an estimate must be made of how much load the customer would have had without curtailment (i.e., the baseline). Accurate estimation of baselines for DR programs is important for both settlement (determining how much individual customers are paid, in the case of the bidding programs) and evaluation (estimating program-wide impacts). There are, however, many different ways to estimate customer-specific baselines. Understanding how well alternative baseline specifications forecast actual customer load was one of the key questions in our evaluation of the CPP and DBP programs.

Methodology

The analysis of customer baselines began by identifying and selecting a set of baseline methodologies that included the methods used for settlement in the 2004 CPP and DBP programs, as well as several distinct alternatives. The methods analyzed are referred to as representative-day approaches: a typical day is constructed from load data for days preceding the event day. The alternative baselines were selected based on a literature review of prior work examining alternative baseline methodologies (CEC 2003), recommendations from WG2 committee members, and a review of baselines currently employed for other large-customer programs at one or more of the California utilities. The following three types of baseline methodologies were evaluated:

3-Day Baseline. This baseline was calculated by first selecting the series of the 10 most recent similar days that occurred prior to the event day.
From this series of 10 similar days, the three days with the highest overall load during the curtailment hours were selected and

the load for each hour of these three days was averaged (by hour) to produce an hourly 3-Day baseline estimate. The DBP program uses this methodology for settlement.

10-Day Baseline. Like the 3-Day baseline, this alternative baseline methodology also selects from the last 10 similar days. However, this approach calculates the baseline for each hour by averaging the hourly load over all of the last 10 similar days (instead of selecting the three highest days). An adjusted 10-Day baseline was also calculated by applying a scalar adjustment to the 10-Day baseline based on a series of calibration hours. The scalar adjustment factor was calculated as the ratio of the average load over three calibration hours to the average load for the same hours from the last 10 similar days.

Prior Day Baseline. This baseline uses the most recent similar day as a proxy for the subsequent day's baseline.

Analysis Days Selected. Each baseline methodology was evaluated over a series of days between July 1, 2003 and August 31, 2003. The days selected varied among the utilities and were chosen based on each utility's system load data during this period. The days selected for analysis fell into one or more of the following day-type classifications:

High load days: days representing the most likely potential event days (days with high system load and/or days falling at the end of a heat storm).[3]

Low load days: days that might represent a potential test or distribution system emergency event day.

Consecutive high load days: days selected from a series of high load days that fell back to back, representing heat-storm conditions in which events may be called consecutively.

Selecting baseline analysis days from each of the different day-type classifications allows comparison of the baseline methodologies under different weather and load-pattern scenarios.

Analysis Hours.
The hours for which each of the CPP and DBP baselines were evaluated were dependent upon the range of curtailment hours for the DR program. Table 1 provides the analysis hours used for the CPP and DBP programs by utility.

Table 1. Analysis Hours for the CPP and DBP Programs by Utility
Program   PG&E           SCE            SDG&E
CPP       12 pm - 6 pm   12 pm - 6 pm   11 am - 6 pm
DBP       12 pm - 8 pm   12 pm - 8 pm   12 pm - 8 pm

[2] Results may differ if weather conditions were substantially different than those in the study period.
[3] Although these were the highest load days for 2003, note that, overall, summer 2003 was relatively mild.

Because the DBP program can be called on a prior-day or same-day basis, two 10-Day adjusted baselines were calculated for DBP using distinct sets of calibration hours. The CPP program can only be called on a prior-day basis, and thus only one adjusted baseline was calculated for CPP (since the

same-day load may be affected by intentional actions such as pre-cooling). Table 2 shows the hours used for the two calibration adjustments for the CPP and DBP baselines for each of the utilities.

Table 2. Hours Used for the Calibration Adjustment for the CPP and DBP 10-Day Adjusted Baselines
Program   Baseline   Adjustment Type   Calibration Hours
CPP       10-Day     Prior-Day         12 pm - 3 pm
DBP       10-Day     Prior-Day         12 pm - 3 pm
DBP       10-Day     Same-Day          9 am - 12 pm

Analysis Sample. The dataset used for the baseline analysis consisted of interval load data for a representative sample of 500 customers selected from the population of program-eligible accounts. The sample was randomly selected and stratified by customer size and business type. Energy weights for the sample were created to ensure the results were representative of the entire DR-eligible population.

Performance Metrics. Three metrics were used to measure the performance of the alternative baseline methodologies. The first two were taken from the previously cited report commissioned by the California Energy Commission to develop DR baseline estimation protocols (CEC 2003). The CEC report recommends measurement protocols to help calculate demand reductions for DR program participants and includes statistical metrics for measuring the bias, variability, and overall error magnitude of the baselines. The first two metrics examine forecast error: they treat the baseline calculations as forecasts of the actual load for each hour, so assessing the viability of a baseline becomes a matter of measuring the forecast error via the root-mean-square error (i.e., the square root of the mean squared difference between the baseline and the actual load). The third metric focuses on how useful the baseline estimates are as predictors of energy use, comparing the explanatory power of the alternative baselines within a regression model for each hour's load.
A benefit of this third approach is that it can be used to determine whether a baseline methodology consistently over- or under-estimates demand, and it allows hypothesis testing on the relationship between the baseline and the actual load. The three metrics are summarized below.

Baseline Bias (Metric 1). This metric gauges whether or not a baseline has a systematic tendency to over- or under-state the estimated load and corresponding demand reduction. The median relative hourly error for a baseline indicates whether that baseline tends to be biased.

Variability and Overall Error Magnitude (Metric 2). This metric gauges how wide the swings are around the typical value. Theil's U statistic, the account's relative root-mean-square hourly error, was used to measure variability. Theil's U indicates the typical relative error magnitude for a typical account.

Predictive Power of Baselines (Metric 3). Whereas the first two metrics examine forecast error, this metric focuses on how useful the baseline estimates are as predictors of energy use. It uses a regression equation to predict actual hourly loads as a function of the estimated hourly loads for each baseline method.
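As a concrete illustration, the representative-day baselines and the three performance metrics described above can be sketched in Python. This is a simplified sketch: the similar-day selection rules, the Theil's U normalization, and the no-intercept regression form are assumptions for illustration, not the study's exact specifications.

```python
from statistics import mean, median
from math import sqrt

# Each "day" is a list of 24 hourly kW values; similar_days holds the
# 10 most recent similar days, oldest first (day selection simplified).

def baseline_3day(similar_days, event_hours):
    """3-Day baseline (DBP settlement): average, hour by hour, the 3 of
    the last 10 similar days with the highest load in the event hours."""
    top3 = sorted(similar_days,
                  key=lambda d: mean(d[h] for h in event_hours),
                  reverse=True)[:3]
    return [mean(d[h] for d in top3) for h in range(24)]

def baseline_10day(similar_days):
    """10-Day baseline: hour-by-hour average over all 10 similar days."""
    return [mean(d[h] for d in similar_days) for h in range(24)]

def baseline_10day_adjusted(similar_days, observed_day, calib_hours):
    """10-Day adjusted: scale the 10-Day baseline by the ratio of the
    observed load to the baseline load over the calibration hours
    (observed_day is the event day for a same-day adjustment, or the
    prior day for a prior-day adjustment)."""
    base = baseline_10day(similar_days)
    scalar = (mean(observed_day[h] for h in calib_hours) /
              mean(base[h] for h in calib_hours))
    return [scalar * b for b in base]

def baseline_prior_day(similar_days):
    """Prior Day baseline: the most recent similar day, used as-is."""
    return list(similar_days[-1])

def median_rhe(baseline, actual):
    """Metric 1 (bias): median relative hourly error; a positive value
    means the baseline over-states the actual load."""
    return median((b - a) / a for b, a in zip(baseline, actual))

def theils_u(baseline, actual):
    """Metric 2 (error magnitude): relative root-mean-square hourly
    error for one account (normalizing by mean load is an assumption)."""
    rmse = sqrt(mean((b - a) ** 2 for b, a in zip(baseline, actual)))
    return rmse / mean(actual)

def regression_slope(baseline, actual):
    """Metric 3 (predictive power), simplified to a no-intercept least-
    squares fit of actual load on baseline load; a slope below 1.0
    indicates the baseline over-states the load."""
    return (sum(a * b for a, b in zip(actual, baseline)) /
            sum(b * b for b in baseline))
```

For example, for a customer whose 10 similar days carry flat loads of 100 through 109 kW, the 3-Day baseline is a flat 108 kW (average of the three highest days) while the 10-Day unadjusted baseline is 104.5 kW, illustrating why the former tends to sit above the latter.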

Baseline Assessment Results

Significant differences were found among the baseline methods with respect to the degree of error associated with each, their relative accuracy in predicting loads on average, and whether they over- or under-predicted loads. The 10-Day method was found to be superior to the 3-Day method (which is used in DBP for settlement). The same-day adjustment was found to provide a further improvement to the 10-Day method and was selected as the method for calculating impacts in the evaluation. Although the Prior Day method was found to have relatively low bias, its error magnitude for the extreme cases was very high, and thus it was not recommended for use in settlement or impact evaluation. Although the 3-Day method used for settlement underperformed compared to the other methods, its overall performance was still reasonably good on average. This section presents results for:

An assessment of the three baseline types, with and without adjustments, for all selected analysis days; and

An assessment of the three baseline types, with and without adjustments, for High Load Days, Low Load Days, and Consecutive High Demand Days.

Note that baseline performance for other segments, including customer size and business type, was also analyzed in the full study (Quantum 2004) but is not presented in this paper.

All Baseline Analysis Days: 3-Day, 10-Day, Prior Day Baselines

As noted in the Methodology section, the assessment focused on six analysis days (three high demand days, one low demand day, and three consecutive high demand days) and the three main baseline method types (3-Day, 10-Day [adjusted and unadjusted], and Prior Day) for the non-participant population. Figure 1 displays the average load shape for each of the distinct baselines, as well as the actual load, across the six analysis days.
Figure 1 shows, as one might expect, that the 3-Day baseline tends to over-state the actual load for a given day, since it is an average of the three highest-load days (during the event hours) from a series of recent similar days. A rationale behind the 3-Day baseline is that events are likely to be called on the hottest days of the summer. The degree to which the 3-Day baseline overstates the load is lessened when looking solely at high demand days; however, on average an overestimation still occurs. This tendency to over-state the load is also seen in the results of the bias diagnostics (the RHE, or Relative Hourly Error) and the coefficients of the regression modeling presented below. Figure 1 also shows that the 10-Day baselines tend to under-state the actual load. This too may be expected, since the analysis days were selected to be similar to hypothetical event days and thus tend to have higher loads than average days. Due to the scale and hours presented in this figure, it is difficult to ascertain the precise impact of the adjustments on the 10-Day baseline; these are therefore presented in greater detail in Figure 2 and the discussion that follows. Finally, while the Prior Day baseline tends to be very close to the actual load on average, the diagnostics presented later in this paper show it has more variability, which is evident from the t-values associated with the regression coefficients. This greater variability occurs because the Prior Day method is not an average over a series of days, so changes in an organization's operations on a single day have a much larger impact. As a result, the Prior Day baseline is a less reliable baseline on the whole.

Figure 1. 3-Day, 10-Day (w/ and w/o Adjustments) and Prior Day Baselines versus Actual Load for All Analysis Days Averaged Over All Utilities

Throughout this section, the results of the various baseline diagnostic metrics are provided in detail to further support what is shown in the graphical averages presented in Figure 1. Figure 2 provides a magnified look at the 10-Day baselines for the DBP event hours, making it easier to see the effect of the prior-day and same-day adjustments on the 10-Day baseline. Both of these adjustments shift the 10-Day baseline up so that it becomes very close to the actual load. On average, the same-day adjustment has a slightly larger shift toward the actual load and is extremely accurate on average.

Bias (Metric 1). A key measure of bias in this evaluation is the median relative hourly error. Table 3 shows the median relative hourly error for the CPP and DBP baselines analyzed for this evaluation, which gives an indication of the bias associated with each specific baseline. The table illustrates that for both the CPP and DBP programs the 3-Day baseline consistently overstates the actual load. The average degree (based on the median) to which the 3-Day baselines are overstated is 2 percent across the entire sample for both the CPP and DBP programs.

Relative Error Magnitude (Metric 2). Theil's U statistic calculated for a given account indicates the typical relative error magnitude for that account. The distribution of this statistic across accounts indicates the range of performance. This distribution is examined both at the median and at an extreme, the 95th percentile. The median Theil's U indicates the typical relative error magnitude for an average account; the 95th percentile indicates performance in the worst cases.
Table 4 provides the median and 95th percentile Theil's U statistic for each of the distinct baselines evaluated. These statistics indicate the overall error magnitude associated with a specific baseline by measuring the size of the variability around the expected value, in this case the actual load. The closer the statistic is to zero, the smaller the relative error magnitude. As illustrated in Table 4, for CPP the typical error magnitude is minimized with the 10-Day adjusted (prior-day) baseline (8.5 percent); however, the error magnitude for extreme accounts is minimized with the 10-Day unadjusted baseline (although the two are very similar). Similarly, for DBP, the 10-Day same-day adjusted baseline has an error magnitude that is almost 30 percent lower than that associated with the 3-Day baseline.
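The distribution summaries used here, the median and 95th-percentile of per-account Theil's U values, can be sketched as follows; the interpolation convention for the percentile is an assumption.

```python
from statistics import median, quantiles

def summarize_error_distribution(theils_u_by_account):
    """Given one Theil's U value per account, report the typical error
    magnitude (median) and the near-worst case (95th percentile)."""
    # quantiles(..., n=100) returns the 1st..99th percentile cut points;
    # index 94 is the 95th percentile.  method="inclusive" interpolates
    # treating the data as the full population.
    pct = quantiles(theils_u_by_account, n=100, method="inclusive")
    return {"median": median(theils_u_by_account), "p95": pct[94]}
```

A baseline with a small median but a large 95th percentile, like the Prior Day method described above, is acceptable for the typical account yet unreliable in the worst cases.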

Figure 2. Close Up of 10-Day Baselines and Actual Load During DBP Event Hours

Table 3. Bias Calculations for the CPP and DBP 3-Day, 10-Day and Prior Day Baselines with and without Adjustments for All Analysis Days across All 3 Utilities
Program   Baseline    Adjustment   Median RHE
CPP       3-Day       None
CPP       10-Day      None
CPP       10-Day      Prior-Day
CPP       Prior Day   None
DBP       3-Day       None
DBP       10-Day      None
DBP       10-Day      Prior-Day
DBP       10-Day      Same-Day
DBP       Prior Day   None

Table 4. Error Magnitude Calculations for the CPP and DBP 3-Day, 10-Day and Prior Day Baselines with and without Adjustments for All Analysis Days across All 3 Utilities
Program   Baseline    Adjustment   Median Theil's U   95th Pctl Theil's U
CPP       3-Day       None
CPP       10-Day      None
CPP       10-Day      Prior-Day
CPP       Prior Day   None
DBP       3-Day       None
DBP       10-Day      None
DBP       10-Day      Prior-Day
DBP       10-Day      Same-Day
DBP       Prior Day   None

Predictive Power (Metric 3). Table 5 shows the results of the regression model used to predict energy use as a function of the baseline. The model coefficients, in combination with the associated t-values[4] and R-square values, give an indication of how useful the baseline estimates are as predictors of energy use.

Table 5. Regression Coefficient with Associated t-value and R-Square for the CPP and DBP 3-Day, 10-Day and Prior Day Baselines with and without Adjustments for All Analysis Days across All Utilities
Day Type   Program   Baseline    Adjustment   Coef.   t-value   R-Square
Overall    CPP       3-Day       None
Overall    CPP       10-Day      None
Overall    CPP       10-Day      Prior-Day
Overall    CPP       Prior Day   None
Overall    DBP       3-Day       None
Overall    DBP       10-Day      None
Overall    DBP       10-Day      Prior-Day
Overall    DBP       10-Day      Same-Day
Overall    DBP       Prior Day   None

Several important conclusions can be derived from the results presented in Table 5. First, the 3-Day baseline significantly over-estimates the impacts relative to most of the other choices. Second, the 10-Day baseline without any adjustment produces the best results for CPP, while the 10-Day same-day adjustment is the best predictor for DBP.[5] These results are consistent across utilities and day types. Third, the Prior Day baseline is a very poor predictor relative to the other options, and it greatly overestimates the actual load. Finally, except for a few cases, the baselines over-predict the actual load, as nearly all the coefficients are less than one.

Impact of Day Type (High Demand, Low Demand, and Consecutive Days) on 3-Day, 10-Day, Prior Day Baselines

Because actual DR events are more likely to be called on high demand days, a second assessment was performed on the same selected analysis days broken down by day type. This assessment helps to determine whether the performance of the baselines varies based on whether an event day is a high demand day, a low demand day, or a day that falls within a set of consecutive high demand days.
The baseline diagnostics that resulted from this assessment were very similar for all day types, indicating there was not a drastic change in baseline performance as a function of the type of event day, given summer 2003 weather conditions. Table 6 shows the average load shape for each of the baseline methods calculated, as well as the actual load, for the high demand days.

[4] Note that the large sample size from the hourly observations used in the regression model causes the magnitude of the resulting t-values to be somewhat less meaningful (since the observations are not all strictly independent); however, in relation to one another the t-values support the findings of the other baseline statistics.
[5] It is believed that a same-day adjustment for the CPP period would also have outperformed the 10-Day unadjusted method for CPP had it been included.

As expected, the difference between the 3-Day baseline and the actual load is smaller on high demand days, and the amount by which the 10-Day baseline under-predicts the actual load is larger.

Table 6. 3-Day, 10-Day (w/ and w/o Adjustment) and Prior Day Baselines versus Actual Load for High Demand Days Averaged Over All 3 Utilities

Table 7 provides a magnified look at all of the 10-Day baselines (adjusted and unadjusted) for the DBP event hours for the High Demand days.

Table 7. Close Up of 10-Day Baselines and Actual Load During DBP Event Hours for High Demand Analysis Days Averaged Over All Utilities

Bias (Performance Metric 1). The bias calculations by day type in Table 8 show that the 3-Day baseline tends to over-state the actual load to a higher degree on Low Demand days and Consecutive

days, and to a lesser degree on High Demand days, while the 10-Day baseline does the exact opposite. However, under all types of event days (High, Low, and Consecutive), the 10-Day adjusted baselines continue to have the lowest overall bias.

Table 8. Bias Calculations for the 3-Day, 10-Day (w/ and w/o Adjustment) and Prior Day Baselines for High Demand Versus Low Demand Versus Consecutive Days Averaged Over All Utilities
(Bias - Median RHE)
Program   Baseline    Adjustment   High Dmd   Low Dmd   Consec
CPP       3-Day       None
CPP       10-Day      None
CPP       10-Day      Prior-Day
CPP       Prior Day   None
DBP       3-Day       None
DBP       10-Day      None
DBP       10-Day      Prior-Day
DBP       10-Day      Same-Day
DBP       Prior Day   None

Relative Error Magnitude (Performance Metric 2). The comparison of relative error magnitude for High Demand, Low Demand, and Consecutive High days, shown in Table 9, indicates that the magnitude of the error is lower for the 10-Day baselines on Low Demand days; however, the error at the extreme values tends to be higher on these days, which is to be expected.

Table 9. Error Magnitude for the 3-Day, 10-Day (w/ and w/o Adjustment) and Prior Day Baselines for High Demand Versus Low Demand Days Averaged Over All 3 Utilities
(Median Theil's U and 95th Pctl Theil's U, each for High Dmd / Low Dmd / Consec)
Program   Baseline    Adjustment
CPP       3-Day       None
CPP       10-Day      None
CPP       10-Day      Prior-Day
CPP       Prior Day   None
DBP       3-Day       None
DBP       10-Day      None
DBP       10-Day      Prior-Day
DBP       10-Day      Same-Day
DBP       Prior Day   None

Predictive Power (Performance Metric 3). As shown in Table 10, for all demand day types (High, Low, and Consecutive), the 3-Day baseline has a regression model coefficient β that is less than one. This indicates that the 3-Day baseline is over-stated for both High and Low Demand days. The degree of overstatement can be calculated as (1 - β) x 100 percent. For High Demand days, the 3-Day baseline for both CPP and DBP was overstated by 5 percent; for Low Demand days it was overstated by 10 percent for DBP and 18 percent for CPP.
The 10-Day baseline with no adjustments predicted extremely well for DBP, on average, on High Demand days, as is evident from its coefficient estimate of 1.0 (while the CPP version shows a slight 2 percent under-statement). Similar to the 3-Day,

the 10-Day baseline continues to over-state on Low Demand days. The prior-day adjustment for CPP and the same-day adjustment for DBP improve upon both of these predictions for the Low Demand days, shifting the curves up slightly so that both baselines over-predict the actual load by 1 percent on High Demand days.

Table 10. Regression Coefficients with Associated t-Values and R-Squares for the 3-Day, 10-Day (w/ and w/o Adjustment) and Prior Day Baselines for High Demand Versus Low Demand Days Averaged Over All 3 Utilities
Day Type      Program   Baseline    Adjustment   Coef.   t-value   R-Square
High Demand   CPP       3-Day       None
High Demand   CPP       10-Day      None
High Demand   CPP       10-Day      Prior-Day
High Demand   CPP       Prior Day   None
High Demand   DBP       3-Day       None
High Demand   DBP       10-Day      None
High Demand   DBP       10-Day      Prior-Day
High Demand   DBP       10-Day      Same-Day
High Demand   DBP       Prior Day   None
Low Demand    CPP       3-Day       None
Low Demand    CPP       10-Day      None
Low Demand    CPP       10-Day      Prior-Day
Low Demand    CPP       Prior Day   None
Low Demand    DBP       3-Day       None
Low Demand    DBP       10-Day      None
Low Demand    DBP       10-Day      Prior-Day
Low Demand    DBP       10-Day      Same-Day
Low Demand    DBP       Prior Day   None
Consecutive   CPP       3-Day       None
Consecutive   CPP       10-Day      None
Consecutive   CPP       10-Day      Prior-Day
Consecutive   CPP       Prior Day   None
Consecutive   DBP       3-Day       None
Consecutive   DBP       10-Day      None
Consecutive   DBP       10-Day      Prior-Day
Consecutive   DBP       10-Day      Same-Day
Consecutive   DBP       Prior Day   None

Other Segments Analyzed

In addition to analyzing the various baselines by event day type, the baselines were also analyzed by customer business type and size. Although space constraints prohibit inclusion of the baseline performance metrics for these segments in this paper, one key finding from this additional analysis was the effect of customer size on the overall error in the program impact estimates, which is discussed in the following section. Table 11 shows that for the DBP 3-Day baseline, although the error distribution is similar across size categories at the 50th and 75th percentiles, the percent error at the 90th percentile is considerably larger for the Large and Extra Large customers (50 percent). As it turns out, this level of error for even a small group of the largest customers can result in very large differences in the absolute magnitude of estimated program savings that result from application of

different baseline methods. As discussed in the next section, visual inspection of results is important to assess the accuracy of estimated impacts for the very largest customers.

Table 11. Percentile Distribution of Errors by Customer Size for DBP 3-Day Baseline Method
Customer Size Group           Error (%) at Selected Percentiles of Sample
                              50%    75%    90%
Very Small (100 to 200 kW)    4%     13%    31%
Small (200 to 500 kW)         4%     10%    25%
Medium (500 to 1,000 kW)      6%     15%    39%
Large (1,000 to 2,000 kW)     6%     15%    50%
Extra Large (2,000+ kW)       5%     16%    50%

Application of Baselines to Impact Evaluation

This baseline assessment was used in the 2004 DR impact evaluation. The detailed results from the impact evaluation are outside the scope of this paper; however, the top-level findings are germane to the conclusions in the following section. Readers are encouraged to review the full impact results in the evaluation report (Quantum 2004). The objective of the impact evaluation was to determine the first-year program demand impacts. The approach involved computing an hourly baseline for all program participants for each of the event days and then calculating the difference between the baseline and the actual load for the event day. The overall participant difference (or delta) for a given event was then simply the sum of the differences across the program participants:

Difference_t = Σ_n ( kŴ_n,t - kW_n,t )

where Difference_t = the difference between the estimated baseline load and the actual load at time t, kŴ_n,t = the estimated baseline load of customer n at time t, and kW_n,t = the actual load of customer n at time t.

Based on the assessment of the various customer baseline methodologies, two sets of baselines were selected for use in calculating the summer 2004 program impacts.
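The event-level summation above can be sketched as follows; the data layout (dicts of 24-hour kW lists keyed by customer id) is illustrative, not the study's actual data structure.

```python
def event_impact(baselines, actuals, event_hours):
    """Program-level impact per the Difference_t formula: for each event
    hour t, sum over customers n of (estimated baseline kW - actual kW).
    baselines/actuals map customer id -> list of 24 hourly kW values."""
    return {t: sum(baselines[n][t] - actuals[n][t] for n in baselines)
            for t in event_hours}
```

For example, with two customers whose baselines sit at 100 and 200 kW while their actual event-day loads are 90 and 180 kW, the estimated impact is 30 kW in each event hour; a handful of large customers with poorly estimated baselines can therefore dominate this sum.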
The load reductions were estimated using two of the baseline methods discussed above: the 3-Day Baseline method used for DBP program settlement, and the 10-Day Adjusted Baseline, which had the best overall performance. Load impacts from the DR events were estimated for the CPP and DBP programs for each utility for the summer 2004 events. The impact evaluation confirmed that there were significant observable peak load reductions for active participants; however, savings ranged widely, from 5 percent up to 17 percent depending on utility, event, and program.[6] Note, however, that the narrow range of 2004 program events and, in some cases, a small and potentially unrepresentative mix of participant types limit the extent to which summer 2004 experiences can be projected for 2005 and beyond. Furthermore, it

[6] For the DBP program, savings were significantly higher (up to 28 percent) when estimated for only those customers who placed bids.

was found that small numbers of large customers with highly variable loads caused very large differences in the absolute magnitude of estimated program savings across baseline methods. For example, analysis of one of the DBP events indicated that the estimated impacts from five customers contributed between 30 and 80 percent of total program impact depending on which baseline method was used. Upon visual inspection, it was found that neither of the baseline methods used in the impact evaluation could be relied upon to produce unbiased results for several of the very largest customers. As a result, individual visual inspections of the interval data were made for the two weeks leading up to and including the event day for the largest customers. These inspections were critical to the evaluation team's process of developing its final estimates of program impacts.

Baseline Assessment Conclusions

This section presents conclusions regarding the performance of alternative baseline methods for estimating demand response impacts. Table 12 displays a qualitative summary of the baseline performance metric results over all analysis days (High, Low, and Consecutive High demand days).

Table 12. Summary of Baseline Performance Metrics for All Analysis Days
Program   Baseline    Adjustment   Bias   Error Magnitude (Typical)   Error Magnitude (Extreme)   Predictive Power
CPP       3-Day       None
CPP       10-Day      None
CPP       10-Day      Prior-Day
CPP       Prior Day   None
DBP       3-Day       None
DBP       10-Day      None
DBP       10-Day      Prior-Day
DBP       10-Day      Same-Day
DBP       Prior Day   None
Performance Key: 4 = Best, 2 = Mid, 1 = Worst

The results of the analyses of alternative hourly load baselines lead to the following conclusions:

The 10-Day Baseline with Same-Day Adjustment was the most accurate of the methods evaluated and was recommended as the basis for estimating overall program impacts for Day-Of DBP program events.
For previous-day programs and events, the 10-Day unadjusted baseline and the 10-Day baseline with prior-day adjustment were relatively similar in performance, and both were superior to their 3-Day counterparts. The baseline recommended for calculating overall program impacts for day-ahead DBP and CPP was the 10-Day baseline with prior-day adjustment.

The 3-Day Baseline with no adjustment, which was used for settlement in the 2004 DBP program, performed less well than the 10-Day methods and appeared to produce a consistently large over-estimate of the baseline. This means that, on average, the program will tend to overpay somewhat for actual reductions. No recommendation was made about whether the 3-Day Baseline should continue to be used for DBP settlement purposes; that decision was left to the DR program planners, who are able to factor in other issues, such as the costs involved in changing existing settlement systems and potential effects on customer behavior.

The Prior Day Baseline was the poorest-performing baseline in terms of variability and predictive accuracy, but it has low bias. This method was not recommended for use in settlement or evaluation.

As discussed in the impact section above, some individual customer loads will invariably be misestimated using these approaches, but many errors will cancel when averaged across all the customers in a program. However, a few large customers with large errors can bias results when active program populations are small, as they were for the California IOUs' price-responsive DR programs in the summer of 2004. Although visual inspections can be useful in ex-post evaluations to correct for these biases, such adjustments are not generally viable for program settlement, which requires easy-to-understand, pre-determined, transparent methods. Programs may need to consider customer-specific baselines or sub-metering for very large customers with loads that are difficult to predict using representative-day methods.

References

CEC (2003). Protocol Development for Demand Response Calculation: Findings and Recommendations. Prepared by KEMA-Xenergy Inc. for the California Energy Commission.

Hungerford, David, M. Rufo, E. Lovelace, D. Violette, A. Buege, and P. Willems (2005). "Results from a Real-Time, Statewide Evaluation of Large C&I Demand Response Programs in California." Proceedings of the 2005 International Energy Program Evaluation Conference, Brooklyn, New York.

Quantum Consulting, Inc. (2004). Working Group 2 Demand Response Program Evaluation, Program Year 2004, Final Report. Prepared by Quantum Consulting Inc. and Summit Blue Consulting, LLC. December.