DEVELOPING A METHOD DETECTION LEVEL (MDL) FOR WHOLE EFFLUENT TOXICITY (WET) TESTING

Background

EPA has long maintained that the analytical variability associated with whole effluent toxicity (WET) test methods is well within the range of precision routinely observed for the chemical test procedures used to implement the NPDES program [1]. The analogy to chemical testing is appropriate because EPA told Congress that:

    "The generation of scientifically accurate and valid biological measurements for environmental pollutants requires approximately the same criteria for assessing the adequacy of a method as previously described for chemical analyses. The same performance characteristics and development states of the method must be known in order to make an assessment of adequacy."

    Availability, Adequacy, and Comparability of Testing Procedures for the Analysis of Pollutants Established Under Section 304(h) of the Federal Water Pollution Control Act - Report to Congress; EPA/600/9-87/030; September, 1988; pg. 3-11

Given the similar levels of precision observed for WET testing and chemical testing, EPA endeavors to treat the variability similarly.

    "EPA manages the regulation of WET in the same way it manages the regulation of chemical-specific pollutants in order to determine reasonable potential (RP), derive permit limits, determine data quality control, and evaluate self-monitoring data. Many similarities between chemical-specific toxicant and WET controls can be found in the Technical Support Document for Water Quality-based Toxics Control (1991). Determining RP in both cases uses many of the same strategies. Permit limit derivation makes similar exposure assumptions and relies on nearly identical toxicological data bases."

    Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; p. 4-1

[1] See 60 Federal Register 199 @ 53535 (Oct. 16, 1995); see also Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; pp. 4-4 and 7-1.

2002, Risk Sciences REVISED DRAFT

Although the relative test precision may be similar, WET test methods differ significantly from chemical test methods in several important ways.

First, whole effluent toxicity is a method-defined parameter. Unlike chemical analyses, there is no way to independently corroborate the toxicity estimates reported by a WET test. Consequently, "the accuracy of toxicity tests cannot be ascertained, only the precision of toxicity can be estimated..." [2]

Second, whole effluent toxicity is estimated using living organisms rather than inanimate machines. This introduces many new and potentially confounding variables into the analysis.

    "The health of test organisms and biological systems cannot be calibrated before the experiment in the same way as analytical instrumentation. A single living organism is far more complex than the most sophisticated analytical instrumentation ever conceived. There are no knobs to turn to adjust for these factors to achieve consistent performance during a test method."

    Availability, Adequacy, and Comparability of Testing Procedures for the Analysis of Pollutants Established Under Section 304(h) of the Federal Water Pollution Control Act - Report to Congress; EPA/600/9-87/030; September, 1988; pg. 3-11

Third, EPA has never developed or applied Method Detection Levels to whole effluent toxicity test data. A Method Detection Level is the threshold at which the standard test method can accurately and reliably distinguish whether a specific chemical is present in or absent from a given sample. [3] The MDL is not the level at which the amount of chemical can be reliably quantified; that threshold is more frequently called the Practical Quantitation Level or PQL. In NPDES permitting, values observed below the MDL are usually described as Not Detected (ND or <MDL) because such values cannot be distinguished from zero. The lack of scientific confidence about the true sample quality limits the regulatory utility of all ND values.

In WET tests, where no MDL is applied, all results are presumed to be equally accurate and reliable. Such an assumption is inconsistent with EPA's written guidance.

    "While chemical precision is often determined well above analytical detection, WET precision is often based on the minimum detection level."

    Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; pg. 4-2 (citing Ausley, 1996)

[2] 60 Federal Register 199; Oct. 16, 1995; p. 53535
[3] 40 C.F.R. 136.2(f)

    "The precision of toxicity test measurements is similar to that of finely tuned instruments operating at detection limits."

    Availability, Adequacy, and Comparability of Testing Procedures for the Analysis of Pollutants Established Under Section 304(h) of the Federal Water Pollution Control Act - Report to Congress; EPA/600/9-87/030; September, 1988; pg. 3-11

    "In general, methods can only be used to measure analytes over a specified concentration range, defined as the linear dynamic range. The linear dynamic range is limited at the lower level by the detection limit and above the detection limit by the linearity of the measurement process."

    Availability, Adequacy, and Comparability of Testing Procedures for the Analysis of Pollutants Established Under Section 304(h) of the Federal Water Pollution Control Act - Report to Congress; EPA/600/9-87/030; September, 1988; pg. 3-4

Quantifying WET Test Precision Using Coefficients-of-Variation

The primary reason that MDLs have not been established for WET test methods is that EPA believes it is not possible to know the true biological response for any given exposure to one or more toxic chemicals. That is an accurate statement when referring to the range of expected results that may occur during reference toxicant tests. It may not be possible to define a full calibration curve for WET testing. However, it may be possible to identify curves that deviate from shapes observed across different laboratories.

While it may be impossible to know how much adverse impact to expect from a given exposure to a toxic substance, it is possible to know what biological response must be expected from exposure to non-toxic conditions: there should be no difference in survival, reproduction, growth, etc.

Data from several EPA-sponsored studies clearly indicate that biological organisms will routinely respond differently to identical exposure conditions. [4,5,6] This is true regardless of whether the test samples were initially formulated to be toxic or non-toxic.

[4] See Preliminary Report: Interlaboratory Variability Study of EPA Short-term Chronic and Acute Whole Effluent Toxicity Test Methods; EPA-821-R-00-028A; October, 2000 (co-sponsored by AMSA, EEI, & Westcas)
[5] See Precision of the EPA Seven-Day Ceriodaphnia dubia Survival and Reproduction Test: Intra- and Interlaboratory Study; Electric Power Research Institute; EPRI-EN-6469; November, 1989 (co-sponsored by EPA)
[6] See Whole Effluent Toxicity Testing Methods: Accounting for Variance; Water Environment Research Foundation; Report #D93002; 1999 (co-sponsored by EPA)

EPA has quantified the level of inter- and intra-laboratory test precision and published the results in new guidance documents (see Appendices A and B of this white paper). Using EPA's precision estimates, it is possible to calculate an MDL for each toxicity test method.

Calculating an MDL for WET Test Methods Using Coefficient-of-Variation

A coefficient-of-variation (CV) is calculated by dividing the standard deviation by the mean of the same data set:

    CV = Std. Dev. / Mean

If one knows the mean and the coefficient-of-variation, one can calculate the standard deviation by rearranging the equation:

    Std. Dev. = CV x Mean

And, if one knows the mean and the standard deviation, one can calculate the upper 99th percentile of expected values. For a large data set with a normal distribution, the upper 99th percentile can be derived using the following formula: [7]

    99th percentile = Mean + (2.33 x Std. Dev.)

For non-toxic samples, the LC-50 and EC-25 should be >100% and the NOEC should be equal to 100%. That is, there is no observed effect in undiluted effluent (100% concentration). If expressed as toxicity units (TU), the LC-50, the NOEC and the EC-25 should be no greater than 1.0 TU for non-toxic samples. Toxicity units are calculated by dividing 100 by the LC-50, NOEC or EC-25 (hence, 100 / 100 = 1.0).

Assuming a sample is non-toxic, and using the coefficients-of-variation published by EPA, [8] we can calculate the upper 99th percentile of expected results. For example, the median CV for the EC-25 endpoint in the Ceriodaphnia reproduction test is 0.27. Because the expected value for a non-toxic sample is 1.0 TU, the standard deviation of test results reported for identical non-toxic samples must also be 0.27 (Std. Dev. = CV x Mean = 0.27 x 1.0). Thus, the 99th percentile upper confidence level must be 1.63 toxicity units (1 + [2.33 x 0.27]).

[7] Naiman, A., Rosenfeld, R. & Zirkel, G. Understanding Statistics, 2nd Ed.; 1977; pg. 77-83
[8] See Appendix B to this white paper.
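The arithmetic in this section can be sketched in a few lines of Python. This is only an illustration of the formulas quoted above; the function names are hypothetical, and the sketch assumes (as the text does) that results for non-toxic samples are normally distributed around 1.0 TU.

```python
Z_99 = 2.33  # multiplier for the upper 99th percentile, as used in the text


def toxicity_units(endpoint_percent: float) -> float:
    """Convert an LC-50, NOEC, or EC-25 (percent effluent) to toxicity units."""
    if endpoint_percent <= 0:
        raise ValueError("endpoint must be a positive percent concentration")
    return 100.0 / endpoint_percent


def std_dev_from_cv(cv: float, mean: float) -> float:
    """Rearranged CV definition: Std. Dev. = CV x Mean."""
    return cv * mean


def upper_99th_percentile(mean: float, std_dev: float) -> float:
    """Mean + (2.33 x Std. Dev.), assuming normally distributed results."""
    return mean + Z_99 * std_dev


def wet_mdl(cv: float, non_toxic_tu: float = 1.0) -> float:
    """Estimated MDL in toxicity units for a non-toxic sample (hypothetical helper)."""
    std_dev = std_dev_from_cv(cv, non_toxic_tu)
    return upper_99th_percentile(non_toxic_tu, std_dev)


if __name__ == "__main__":
    # A NOEC of 100% effluent corresponds to 1.0 TU.
    print(toxicity_units(100.0))
    # Median CV of 0.27 for the Ceriodaphnia reproduction EC-25.
    print(round(wet_mdl(0.27), 2))
```

Substituting a different CV from Appendix B gives the corresponding estimate for another species or endpoint under the same normality assumption.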

The 99th percentile upper confidence level of a WET test is functionally equivalent to an MDL in a chemical test, and the value may be used to express the same concept. We cannot be confident that toxicity exists in a sample until we observe a test result greater than 1.63 TU. Values smaller than 1.63 TU cannot reliably be distinguished from 1.0 TU (zero toxicity) and should be reported as not detected (ND or <1.63 TU), just as chemical test results are recorded.

The following table shows the estimated MDL for the most common chronic sublethal endpoints using the method described above and the median CVs published by EPA [9].

Estimated Method Detection Level for Non-Toxic Samples [10]
(expressed as chronic toxicity units, TUc)

    Test Species          Test Endpoint    MDL for NOEC    MDL for EC-25
    Fathead minnow        Growth           1.7 TUc         1.6 TUc
    Ceriodaphnia dubia    Reproduction     1.8 TUc         1.6 TUc
    Selenastrum (algae)   Cell Count       2.1 TUc         1.6 TUc
    Sheepshead minnow     Growth           1.9 TUc         1.3 TUc
    Inland Silverside     Growth           2.1 TUc         1.6 TUc
    Mysid Shrimp          Growth           1.9 TUc         1.7 TUc

If EPA's high estimate of the CV (0.45) is used instead of the median estimate, then the MDL for chronic Ceriodaphnia dubia reproduction would be approximately 2.05 toxicity units (1 + [2.33 x 0.45] = 2.0485) [11]. By using the higher estimate we are saying that 75% of the time the intralaboratory coefficient-of-variation is less than the value recorded in Tables 3-2, 3-3 and 3-4 of EPA's new guidance. By using the median estimate, we are stating that the intralaboratory CV is less than the table value half the time and greater than the table value half the time.

All of the above calculations and estimates are based on the assumption that the WET test data are normally distributed. Non-toxic samples will conform to a normal distribution. However, reference toxicant samples are more likely to resemble a log-normal distribution, just as most chemical analyses do. In either case, the data can be transformed as necessary to meet the distributional and statistical assumptions.
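As a rough illustration of the reporting convention described above, the sketch below looks up an estimated MDL and reports a result as not detected when it does not exceed that value. The dictionary holds the table values from this paper; the function and variable names are hypothetical, and treating a result exactly equal to the MDL as ND is an assumption.

```python
# Estimated MDLs in TUc, keyed by (species, statistical endpoint),
# copied from the table in this paper.
ESTIMATED_MDL_TUC = {
    ("Fathead minnow", "NOEC"): 1.7,
    ("Fathead minnow", "EC-25"): 1.6,
    ("Ceriodaphnia dubia", "NOEC"): 1.8,
    ("Ceriodaphnia dubia", "EC-25"): 1.6,
    ("Selenastrum", "NOEC"): 2.1,
    ("Selenastrum", "EC-25"): 1.6,
    ("Sheepshead minnow", "NOEC"): 1.9,
    ("Sheepshead minnow", "EC-25"): 1.3,
    ("Inland Silverside", "NOEC"): 2.1,
    ("Inland Silverside", "EC-25"): 1.6,
    ("Mysid Shrimp", "NOEC"): 1.9,
    ("Mysid Shrimp", "EC-25"): 1.7,
}


def report_result(observed_tu: float, mdl_tu: float) -> str:
    """Report the observed value only when it exceeds the MDL; otherwise ND."""
    if observed_tu > mdl_tu:
        return f"{observed_tu:g} TUc"
    # Assumption: a value exactly at the MDL is also treated as not detected.
    return f"ND (<{mdl_tu:g} TUc)"


if __name__ == "__main__":
    mdl = ESTIMATED_MDL_TUC[("Ceriodaphnia dubia", "NOEC")]
    for observed in (1.0, 1.33, 2.0, 4.0):
        print(observed, "->", report_result(observed, mdl))
```

The laboratory's calculated value would still be retained alongside the ND designation, consistent with the reporting obligations discussed later in this paper.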
[9] Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; Appendix pg. A-5
[10] Calculations based on the assumption that the underlying population data are normally distributed.
[11] Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; pg. 3-4

Calculating an MDL Using EPA's Reasonable Potential Procedure

Another way to calculate the MDL would be to apply the same procedure EPA recommends for estimating whether an effluent has reasonable potential to exceed a water quality criterion. The TSD procedure [12] assumes that the pollutant parameter is log-normally distributed and does not assume that a statistically large data set is available.

The TSD procedure multiplies the highest measured water quality value by a specific factor to derive the 99th percentile expected value. The specific factor varies based on the number of samples available and the coefficient-of-variation for all measured values. EPA's table of specific values is reproduced in Appendix C of this white paper.

Assuming that the highest measured value for non-toxic samples should be 1 TU, the 99th percentile confidence level can be calculated by multiplying that value by the factor appropriate for each test endpoint. To be conservative, all calculations should be performed using the table value derived when there are 20 samples available.

So, for example, for Ceriodaphnia dubia reproduction the EC-25 has a median CV equal to 0.27. The appropriate factor, derived by interpolation from the table in Appendix C, is approximately 1.5. Multiplying 1.5 times 1 TUc produces an MDL of 1.5 TUc. This answer is very close to the one calculated based on normally distributed data (1.63 TUc).

Therefore, using the TSD procedure, we can conclude that there is a "reasonable potential" for a non-toxic water sample to exhibit a value up to 1.5 TUc (based on the EC-25 endpoint). Conversely, there is no reasonable potential to record values greater than 1.5 TUc for non-toxic samples. In effect, the 1.5 TUc threshold serves the same purpose as the MDL does in traditional chemical analysis. Values less than 1.5 TUc carry too much uncertainty and should be reported as Not Detected (ND or <1.5 TUc).

As before, if EPA's higher estimate of the CV is employed, the calculated MDL will also increase. In the Ceriodaphnia dubia example, if the CV = 0.45, then the specific factor is 1.9 and the MDL is 1.9 TUc (1.0 TUc x 1.9). This value is very close to the MDL estimate derived from the assumption that the underlying data are normally distributed (1 + [2.33 x 0.45], or about 2.05 TUc).

It is important to note that EPA's highest estimate of the CV is recorded at the 75% confidence level. Normally, the TSD procedure is applied using the 99th percentile confidence level and the 99th percentile probability basis. Therefore, even when using the high CV estimate, the calculated MDL is not as rigorously derived as normally occurs when analyzing variability in effluent quality. The actual MDL for WET would be much higher if the effluent characterization procedure were used.

[12] See Technical Support Document for Water Quality-based Toxics Control; EPA-505/2-90-001; 1991; Chapter 3.
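The factors quoted above (about 1.5 for CV = 0.27 and about 1.9 for CV = 0.45, both at 20 samples) can be approximated directly rather than interpolated. The sketch below assumes the log-normal multiplier formula as it is commonly stated for the TSD reasonable potential procedure (99% confidence level, 99th percentile); the function name and parameters are hypothetical, and the published table in Appendix C remains the authoritative source.

```python
import math
from statistics import NormalDist


def tsd_multiplier(cv: float, n_samples: int,
                   percentile: float = 0.99, confidence: float = 0.99) -> float:
    """Approximate reasonable potential multiplying factor (assumed formula).

    Ratio of the target upper percentile of a log-normal distribution to the
    percentile that the largest of n samples is expected to represent.
    """
    if cv <= 0 or n_samples < 1:
        raise ValueError("cv must be positive and n_samples at least 1")
    sigma_sq = math.log(cv ** 2 + 1.0)   # log-space variance implied by the CV
    sigma = math.sqrt(sigma_sq)
    # Percentile represented by the maximum of n samples at the stated confidence.
    p_n = (1.0 - confidence) ** (1.0 / n_samples)
    z = NormalDist().inv_cdf
    upper = math.exp(z(percentile) * sigma - 0.5 * sigma_sq)
    observed = math.exp(z(p_n) * sigma - 0.5 * sigma_sq)
    return upper / observed


if __name__ == "__main__":
    highest_non_toxic_tu = 1.0
    for cv in (0.27, 0.45):
        factor = tsd_multiplier(cv, n_samples=20)
        print(cv, round(factor, 2), round(highest_non_toxic_tu * factor, 2))
```

Under these assumptions the computed factors round to the same 1.5 and 1.9 used in the text, which suggests the interpolated values are reasonable.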

All of the MDLs shown above were calculated based on toxicity units (TU), and TU is calculated from the sample concentration (NOEC, EC-25 or LC-50). The results are consistent with EPA's previously published guidance on the width of the expected error band for WET methods.

    "It should be noted here that the dilution factor selected for a test determines the width of the No-Observed-Effect-Concentration and the Lowest Observed Effect Concentration Interval and the inherent maximum precision of the test...with a dilution factor of 0.5, the NOEC could be considered to have a relative variability of plus or minus 100%."

    Short-Term Methods for Estimating Chronic Toxicity of Effluents and Receiving Water to Freshwater Organisms (EPA-600-4-91-002); July, 1994; Section 4.14.6; p. 16

The plus or minus 100% error band is also consistent with the acceptance range used by EPA to judge laboratory performance on WET testing in the DMR-QA program. When the expected value is 4 TUc, a laboratory's performance is deemed acceptable if it reports a value between 2 TUc and 8 TUc (NOEC = 25%, with a range of 12.5% to 50%). If laboratories are allowed to vary within that range, then it is appropriate for dischargers to take account of that variability prior to certifying a specific result on a DMR.

However, the choice of dilution intervals can have a significant impact on the estimates of NOEC, IC-25 or LC-50. A tighter dilution series might reduce the coefficient-of-variation for any given method. [13]

Calculating an MDL Using Percent Effect

Using data from EPA's recent interlaboratory variability study, it is possible to estimate the level of percent effect that is likely to occur by chance (see Figure 1 below). The chart shown in Figure 1 was generated using interlaboratory reproduction data for Ceriodaphnia that were exposed only to laboratory control water.

By repeatedly sampling two random groups of ten organisms and comparing the mean reproduction for those groups, we can calculate the frequency with which apparent adverse impacts occur for reasons unrelated to effluent toxicity. For example, Figure 1 shows that we can be 90% certain that the effect is real when the observed inhibition is greater than 25%.

[13] Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; Appendix D, pg. 4

[Figure 1]

Federal regulations define the MDL as the level at which we can be 99% certain that the test correctly distinguishes between the presence and absence of any given pollutant parameter (see 40 CFR 136.2). Figure 1 shows that an inhibition greater than 40% is unlikely to occur by chance more than 1% of the time (i.e., 99% confidence). Therefore, if the MDL for toxicity were calculated in the same manner as it is for chemical tests, inhibitions less than 40% would be reported as not detected (ND).

Similar graphs can be prepared for any combination of species, method, and biological endpoint (survival, growth, reproduction), provided there is a large body of data describing organism performance in non-toxic control water.
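The resampling approach behind Figure 1 can be sketched as follows. This is a minimal illustration, not the procedure actually used to produce the figure: it assumes sampling with replacement from a pool of control-water reproduction counts, treats the first group as the "control" and the second as the "treatment", and uses hypothetical function and parameter names. The data in the usage example are made up for demonstration only.

```python
import random
from statistics import mean


def chance_inhibition_threshold(control_counts, confidence=0.99,
                                group_size=10, n_trials=10_000, seed=0):
    """Percent inhibition that chance alone stays below `confidence` of the time."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    inhibitions = []
    for _ in range(n_trials):
        # Draw two groups from the same non-toxic pool (with replacement).
        group_a = rng.choices(control_counts, k=group_size)
        group_b = rng.choices(control_counts, k=group_size)
        mean_a = mean(group_a)
        if mean_a == 0:
            continue  # percent inhibition is undefined for a zero control mean
        inhibitions.append(100.0 * (mean_a - mean(group_b)) / mean_a)
    if not inhibitions:
        raise ValueError("no usable trials; control means were all zero")
    inhibitions.sort()
    index = min(len(inhibitions) - 1, int(confidence * len(inhibitions)))
    return inhibitions[index]


if __name__ == "__main__":
    # Hypothetical neonate counts per female in control water (illustrative only).
    pool = [12, 15, 18, 20, 22, 23, 25, 26, 28, 30, 31, 33, 9, 17, 24, 27]
    print(chance_inhibition_threshold(pool, confidence=0.90))
    print(chance_inhibition_threshold(pool, confidence=0.99))
```

With real control data for a given species and endpoint, the 99% value from such a simulation would play the role of the percent-effect MDL described above.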

Special Implementation Considerations

Dischargers are obligated to report all toxicity test data just as they received it from the laboratory. However, permittees are also required to certify those results based on a system designed to establish the validity of the data. [14]

As noted earlier, EPA believes that the analytical variability of WET test methods is substantially similar to that seen for chemical methods, and EPA recommends that WET testing be implemented in the same way that chemical-specific pollutants are regulated. Therefore, it is appropriate to calculate and apply an MDL for toxicity tests just as is done for chemical analyses.

The safest way to implement the MDL for WET tests is to report the NOEC or EC-25 as it was calculated by the laboratory, but certify compliance based on the MDL value. This approach will avoid any claim by the permitting authority that the discharger failed to report all test results as required. The permitting authority has the right to make an independent determination as to whether a permit exceedance has occurred without using an MDL. It is strongly recommended that all permittees seek advance regulatory approval for calculating and using a WET-MDL rather than planning to defend the approach in court.

Finally, it is also important to note that all of the above estimates are for an MDL used to evaluate a single WET test only. The level of confidence increases as the number of tests showing similar results increases. Likewise, the MDL will decrease as the number of organisms (replicates) used in each WET test increases.

[14] 40 CFR 122.22(d)

Appendix A: Interim Coefficients-of-Variation for Acute WET Methods [15]

[15] Reproduced from Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; Appendix pg. A-4.

Appendix B: Interim Coefficients-of-Variation for Chronic WET Methods [16]

[16] Reproduced from Understanding and Accounting for Method Variability in WET Applications Under the NPDES Program; EPA-833-R-00-003; June, 2000; Appendix pg. A-5.

Appendix C: Reasonable Potential Factors in TSD [17]

[17] Reproduced from the Technical Support Document for Water Quality-based Toxics Control; EPA-505/2-90-001; 1991; Chapter 3.