Annual Review APPLICATIONS AND STATISTICAL PROPERTIES OF MINIMUM SIGNIFICANT DIFFERENCE-BASED CRITERION TESTING IN A TOXICITY TESTING PROGRAM

Size: px
Start display at page:

Download "Annual Review APPLICATIONS AND STATISTICAL PROPERTIES OF MINIMUM SIGNIFICANT DIFFERENCE-BASED CRITERION TESTING IN A TOXICITY TESTING PROGRAM"

Transcription

1 Envionmental Toxicology and Chemisty, Vol. 19, No. 1, pp , 000 Pinted in the USA 07-78/00 $ Annual Review APPLICATIONS AND STATISTICAL PROPERTIES OF MINIMUM SIGNIFICANT DIFFERENCE-BASED CRITERION TESTING IN A TOXICITY TESTING PROGRAM QIN WANG, DEBRA L. DENTON, and RAKESH SHUKLA* A Biostatistics and Epidemiology, Abt Associates Clinical Tials, 55 Wheele Steet, Cambidge, Massachusetts 0138, USA U.S. Envionmental Potection Agency, Region IX, 75 Hawthone Steet WTR-5, San Fancisco, Califonia Division of Epidemiology and Biostatistics, Depatment of Envionmental Health, Univesity of Cincinnati Medical Cente, Cincinnati, Ohio , USA. (Received 4 Januay 1999; Accepted 17 July 1999) Abstact As a follow up to the ecommendations of the Septembe 1995 SETAC Pellston Wokshop on Whole Toxicity (WET) on test methods and appopiate endpoints, this pape will discuss the applications and statistical popeties of using a statistical citeion of minimum significant diffeence (MSD). We examined the uppe limits of acceptable MSDs as acceptance citeion in the case of nomally distibuted data. The implications of this appoach ae examined in tems of false negative ate as well as false positive ate. Results indicated that the poposed appoach has easonable statistical popeties. Repoductive data fom shot-tem chonic WET test with Ceiodaphnia dubia tests wee used to demonstate the applications of the poposed appoach. The data wee collected by the Noth Caolina Depatment of Envionment, Health, and Natual Resouces (Raleigh, NC, USA) as pat of thei National Pollutant Dischage Elimination System pogam. Keywods Aquatic toxicity testing Minimum significant diffeence Statistical powe INTRODUCTION The U.S. Envionmental Potection Agency (U.S. EPA) and individual states in the United States ae in the pocess of implementing Clean Wate Act Section 101(a)(3), which pohibits the dischage of toxic pollutants in toxic amounts by equiing dischages to conduct acute and chonic toxicity testing with effluents and eceiving wates. One ecommendation of a ecent Pellston confeence on whole effluent toxicity testing (WET) was to exploe how minimum significant diffeence (MSD) and tests of bioequivalence can be used to define statistical quality contol citeia [1]. To addess some concens in the application of hypothesis testing, the uppe MSD limits as an additional test acceptability citeion fo toxicity tests has been included in the chonic west coast maine methods manual []. Thusby et al. [3] have also suggested evision of the cuent acceptability equiements and ecommended some changes of acceptance citeia based, in pat, on an empiical database of MSDs. In this pape, we popose a statistical basis of selecting the limits of MSDs in a given WET test and then evaluate its statistical popeties. Cuent statistical appoach and test design The cuent statistical appoach in WET testing is the test of the null hypothesis H 0 : c t 0 vesus a toxic effect altenative, H 1 : c t 0, whee c and t ae the tue population means of esponse of the contol goup (0% effluent) and the effluent concentation goup, espectively. The esults fom the tests that fail to eject the H 0 often lead to the infeence that a suspected toxicant does not deleteiously affect test oganisms. A lack of sufficient statistical powe in a tox- * To whom coespondence may be addessed (akesh.shukla@uc.edu). This wok has not been subjected to U.S. Envionmental Potection Agency eview. Theefoe, it may not necessaily eflect the views of that agency. icity test implies that a statistically nonsignificant esult cannot be taken to imply the absence of a eal toxic effect. The difficulty is seen clealy when noting that, if a toxin is in fact pesent, the statistical esult that fails to find such a toxic effect is easie to obtain with a small study than with a lage one. Befoe dawing the conclusion of no significant toxic effect, it is vey impotant to conside the powe of the test. Since in pactice toxicity tests may often have low powe and it is not unusual to find tests that have % powe o less, cuent statistical tests may povide inadequate potection against dawing the wong conclusion when the concentation does have a toxic effect [4]. As noted by Hayes [4], powe of the toxicity tests can be impoved by inceasing sample size, inceasing the false positive ate (), deceasing the vaiability in esponse, o limiting the analysis to detection of lage diffeences between the teatment and the contol. Using limited sample size and fixed ( o ) level, toxicity tests often have poo powe to detect small, but biologically significant, effect. Denton and Nobeg-King [5] examined seveal U.S. EPA test methods using the cuent expeimental design and based the 75th pecentile of MSD values at a powe of 80% to detemine the effect size. They ecommended that powe analysis can povide useful infomation to egulatos upon which to establish both test sensitivity (decease the ate of false negatives) and minimize the issue of excessive powe (possible false positive ates). Minimum significant diffeence-based citeion testing As a means of inceasing the powe of the statistical tests, the test methods equie that toxicity tests meet some minimum standads in ode to be valid and acceptable. Fo example, suvival in the contol must be 80 o 90% fo some tests to be acceptable. The implications of defining test acceptability in tems of contol suvival ates on sample size equiements fo detecting a stated decement in suvival have been dis- 113

2 114 Envion. Toxicol. Chem. 19, 000 Q. Wang et al. cussed elsewhee []. Some test methods equie that the obseved MSD must be below a setting uppe limit to be acceptable [5]. The inclusion of MSD limits as acceptance citeia fo toxicity tests has ecently eceived attention in the liteatue [1 3,5]. METHODS The essential points fo the MSD-limits appoach poposed in liteatue can be summaized as follows. (1) Select an uppe limit of acceptable MSDs. () Compae the obseved MSD of a test to the uppe limit established in step 1. Only when the obseved MSD is less than the uppe limit is the test consideed acceptable. (3) If the test is acceptable accoding to step, then and only then pefom the fomal statistical testing at the pespecified level. If the test fails to achieve significance, conclude the effluent concentation has no toxic effect on test oganisms; othewise, declae the concentation is toxic to test oganisms. We call this thee-step pocess minimum significant diffeence-based citeion testing (MSDBCT). An impotant design issue fo implementing MSDBCT is the selection of the uppe limit in step 1. Some citeia fo selecting uppe limits based on the histoical databases of MSDs have been suggested in the liteatue [3,5]. RESULTS Since the pupose of MSDBCT is to ensue the powe of the toxicity tests, it is necessay to evaluate the powe of any poposed MSDBCT. In this aticle, we popose a way to establish the uppe limit of MSDBCT that takes the statistical powe diectly into consideation. We discuss the statistical ationale of the poposed MSDBCT and investigate the statistical popeties of the poposed MSDBCT accoding to the false negative ate and the false positive ate. We also illustate thee examples fom a WET testing database to show the applications of the poposed MSDBCT. In toxicity tests, one is inteested in testing the null hypothesis of no diffeence, H 0 : c t 0. A level test fo the above null hypothesis is usually conducted by the Student s t statistics. Assume fo simplicity that equal numbes of oganisms, say n oganisms, ae to be used in each goup. Also assume we wish to detect a biologically significant level of mean diffeence in esponse, (0), between two goups with a powe of at least 1. To ensue this powe equiement being satisfied, we popose an uppe limit (MSD ) of acceptable MSDs as t1, 1, MSD (1) whee t 1, ( 1,) denotes the 1 (1 ) pecentile of the cental t ( ) distibution with (n ) degees of feedom and,, is the noncentality paamete of the noncental t distibution associated with significance level, powe 1, and degees of feedom. Tables 1 and contain some t 1,, 1,, and,, fo selected,,, and. (Detailed vesions of these tables can be obtained fom many textbooks, such as [7].) Let x c and x t denote sample means of esponse fom the contol and the effluent goups, espectively. Then the MSDBCT we popose consists of the following steps: (1) Fo specified choices of,,,, and, find the constants t 1,, 1,, and,, fom Tables 1 and o elsewhee and then compute MSD as defined in Equation 1. () Calculate the,, Table 1. 1 (1 ) pecentile of the t ( ) distibution with degees of feedom t 1, , obseved MSD fom the test esult as t 1, s/n, whee s is the pooled within-test sample standad deviation. Compae the obseved MSD with MSD in step 1. If MSD MSD,go to step 3; othewise, conside the test as unacceptable and stop. (3) Conclude that thee exists no significant toxic effect if x c x t MSD; othewise, declae that the paticula effluent concentation has significant toxic effect on the test oganisms. In wods, the MSDBCT states that, if the obseved MSD of a test is less than o equal to an established uppe limit (MSD ) and the null hypothesis of no diffeence in esponse between the contol and the effluent concentation is not ejected by the usual Student s t test, then and only then conclude that thee is no toxic effect. To apply the above MSDBCT to toxicity tests, the,,, and must be chosen by the egulatoy agency. Fo the pupose of illustating the method on eal examples, let us suppose the egulatoy agency specified a pioi the 95/5 ule fo conducting the toxicity testing: If the mean esponse in effluent concentation goup is not statistically significantly diffeent fom the mean esponse in the contol goup at the level ( ) and if thee is at least 95% powe ( ) fo detecting a 5% diffeence fom the contol mean (0.5 c ), then a nontoxic esult is concluded. Examples Table 3 contains the epoductive data of thee selected toxicity tests fom shot-tem chonic WET test with Ceiodaphnia dubia tests. The data wee collected by the Noth Caolina Depatment of Envionment, Health, and Natual Resouces (Raleigh, NC, USA) as pat of thei National Pollutant Dischage Elimination System pogam. The pooled contol mean though all the tests of this database is 4.5 [8] so that we can easonably assume that the unknown population mean of the contol is c 4.5. The sample size n is 1 fo each of the thee tests in Table 3; theefoe, the degees of feedom fo each test. Accoding to the above 95/5 ule,,, and 0.5 c If we choose, then we obtain t 0.95, 1.7, 0.95,.9, and,, 3.40 fom Tables 1 and. Table. The noncentality paamete of the noncental t distibution,,,,,,

3 MSD-based whole effluent toxicity testing Envion. Toxicol. Chem. 19, Table 3. Repoduction data fom a whole effluent toxicity testing with Ceiodaphnia x s Test A Test B Test C Example 1: Apply the MSDBCT to test A in Table Compute the MSD accoding to Equation 1 as t0.95, 0.95, MSD ,. Calculate the obseved MSD as MSD t1,s 1.7 n 1 7. Since MSD MSD, we conclude this test is unacceptable and stop. Based on a failue to attain a significant esult (x c x t 7 MSD 7.) fom the Student s t test, the test A found no toxic effect. Since the estimated post hoc powe of detecting a mean diffeence of.13 fo test A (assuming that the obseved significant diffeence [S] can be taken to epesent the population significant diffeence []) is only 41%, it is doubtful that the nonsignificant esult fom the Student s t test implies a nontoxic effect. The poposed MSDBCT pevents the Student s t test with a low powe fom declaing a nontoxic effect in this example. Example : Apply the MSDBCT to test B in Table Compute the MSD as in Example 1: MSD Calculate the obseved MSD as MSD Since MSD MSD, this test is acceptable and we go to step x c x t.5, which is less than 3.71 (the obseved MSD). Declae a nontoxic effect on the test oganisms. Based on a failue to attain a significant esult (x c x t.5 MSD 3.71) fom the Student s t test, test B found no toxic effect. Since the estimated powe of detecting a mean diffeence of.13 fo test B is 87%, it is unlikely that the null hypothesis of no diffeence (a nontoxic effect) would have been wongly accepted by the Student s t test. The poposed MSDBCT leads to the same conclusion (a nontoxic effect) as the Student s t test with a high powe does in this example. Example 3: Apply the MSDBCT to test C in Table MSD Find MSD as MSD Since MSD MSD, this test is acceptable. We go to step x c x t.75, which is lage than.45 (MSD). Declae a toxic effect on test oganisms. Based on a significant esult (x c x t.75 MSD.45) fom the Student s t test, test C detected a toxic effect. Test C has a vey high estimated powe (99.5% fo detecting a mean diffeence of.13). As can be expected, the poposed MSDBCT leads to the same conclusion (a toxic effect) as the Student s t test with a high powe does in this example. Statistical ationale fo the MSDBCT appoach To test H 0 : c t 0 vesus H 1 : c t 0ata significance level and to ensue a powe 1 to detect a biologically significant diffeence c t between the contol and the effluent concentation, the standad powe appoach equies p{x c x t MSD c t } 1 () whee MSD t 1 s/n. This powe depends on unknown population vaiance, which is estimated by the sample vaiance s. It can be shown that the inequality in Equation is guaanteed if the tue population vaiance satisfies the inequality n (3) ( ),, Since we do not know the tue, we cannot compae it with. Schuimann [9] suggests using s as a easonable altenative to. Using this appoach, the false negative ate of the test (the pobability of wongly declaing no toxic effect when c t ) tuns out to be noticeably smalle than the specified nominal level if the degees of feedom is finite. This is undesiable since the false positive ate (the pobability of wongly declaing toxic effect when c t 0) will be inceased well above nominal level [9]. A bette appoach is to fame the question in the following hypothesis testing scenaio: H: H: 0 1 whee is a known constant as defined in Equation 3. With usual assumptions, s / has a chi-squaed distibution with degees of feedom. Theefoe, we eject H0 if and only if s 1, 1, o s else we etain H. This is a unifomly most poweful unbiased 0 test fo testing H at the significance level [7]. 0 Thus, an impoved vesion of ensuing the powe equiement in Equation consists of accepting the null hypothesis H 0, declaing no toxic effect, if 1. the H (the hypothesis of at least 1 powe to declae 0 a toxic effect if c t ) is not ejected, i.e., 1, s (4) Since MSD t 1, s /n and (n )/[(,, ) ], Equation 4 is equivalent to saying t1, 1, MSD (5),,

4 11 Envion. Toxicol. Chem. 19, 000 Q. Wang et al. Fig. 1. Pobability of declaing a nontoxic effect when c t fo the minimum significant diffeence-based citeion testing (MSDBCT) as a function of / with degees of feedom (. ), 50 ( ), and 100 ( ). Fig.. Pobability of declaing a nontoxic effect when c t 0 fo the minimum significant diffeence-based citeion testing (MSDBCT) as a function of / with degees of feedom (. ), 50 ( ), and 100 ( ).. The hypothesis H 0 (no diffeence in esponse between the contol and the effluent concentation) is also not ejected, i.e., x c x t MSD () then we conclude that thee exists no significant toxic effect. Combining Equations 5 and, we obtain the MSDBCT as suggested in the peceding sections. Statistical popeties of the MSDBCT Given, Figue 1 povides the pobabilities of declaing a nontoxic effect when c t (the false negative ates) at vaious degees of feedom fo the MSDBCT appoach. It is seen that the pobabilities depend on /. The pobabilities ise to the peaks of about, 1, and 0 at / 1.19, 0.85, and 0.1, coesponding to degees of feedom, 50, and 100, espectively. Unlike the Schuimann s [9] appoach (as given by s and x c x t MSD), the imum false negative ate of the MSDBCT appoach is much close to the nominal level of. This is tue fo each degee of feedom. Figue plots the pobabilities of declaing a nontoxic effect when c t 0(1false positive ates) at vaious degees of feedom fo the MSDBCT appoach. It is seen that the pobabilities ae an inceasing function of /. DISCUSSION In whole effluent toxicity testing, egulatos have seached fo ways to ensue that the powe of the statistical test is sufficient to detect toxicity. The technique employed to contol the powe is setting some test acceptability citeia. These citeia ae designed to povide adequate powe to detect a biologically significant diffeence fom the contol [1,,5]. Taditional acceptance citeia focus pimaily on esponse (suvival ate) in a contol. Baile and Ois [] have examined the implications of defining test acceptability in tems of contol suvival ates on sample size equiements fo detecting a stated decement in suvival. The inclusion of MSD limits as additional citeia fo toxicity tests has begun ecently [1 3,5]. This additional citeion is an impovement ove taditional citeia because it uses all the data (not just that of the contols) in making a final decision concening the acceptability of test data. Howeve, the effect of establishing an uppe limit of acceptable MSDs to the toxicity tests has not been caefully examined. To the best of ou knowledge, ou pape is the fist and only epot to evaluate the pefomance of the MSDBCT accoding to the false negative (and false positive) ates actually achieved by the MSDBCT. The MSDBCT poposed in this pape demonstates that the inclusion of MSD limits as acceptability citeia in whole effluent toxicity testing can be made scientifically and statistically sound povided one chooses the MSD limit (MSD ) accoding to Equation 1. The actual false negative ate of the poposed MSDBCT can be held vey close to the nominal level. The actual false positive ates can also be evaluated, as shown in Figue. The appoach of MSDBCT must pefom satisfactoily fo nomally distibuted endpoints. How this method pefoms when the assumptions ae gossly violated emains to be shown. It may, howeve, be safe to state that, fo mild to modeate deviations fom the stated assumptions, the poposed method should pefom easonably well. The poposed MSDBCT can pevent the Student s t test with a low powe fom declaing a toxic test as nontoxic. On the othe hand, the poposed MSDBCT will not change the esult (significant o nonsignificant) fom the Student s t test with a high powe. An impotant design question fo implementing the poposed MSDBCT is the choice of the value (a diffeence fom the contol desied to be detected as statistically significant) and the coesponding powe equiement. Thusby et al. [3] have suggested ways to select the value (called the theshold value in thei pape). In ou examples, we have chosen as 5% of the contol mean and equied 95% powe fo the test to detect this diffeence. If we had equied 80% powe to detect this diffeence, MSD computed accoding to Equation 1 fo the examples discussed in ou pape would have been 5.09 instead CONCLUSION The MSDBCT can be applied to multiple concentation situations, but it does not addess excessive statistical powe poblems fo an individual test. The bioequivalence testing appoach does addess both the statistical false positive and negative concens. We ecommend that additional toxicity test

5 MSD-based whole effluent toxicity testing Envion. Toxicol. Chem. 19, methods including both feshwate and maine test species be examined fo both the MSDBCT and bioequivalence testing appoach befoe consideation fo application in the WET testing pogam. Acknowledgement This wok was suppoted, in pat, by coopeative ageement EPA CR80480 between the Univesity of Cincinnati and the U.S. Envionmental Potection Agency. We expess ou sincee thanks to Lay Ausley fo poviding us with the data. REFERENCES 1. Chapman GA, et al Methods and appopiate endpoints. In Gothe DR, Dickson KL, Reed-Judkins DK, eds, Whole Toxicity Testing: An Evaluation of Methods and Pedictions of Receiving System Impacts. Society of Envionmental Toxicology And Chemisty, Pensacola, FL, USA, pp Chapman GA, Denton DL, Lazochak JM Shot-tem methods fo estimating the chonic toxicity of effluents and eceiving wates to West Coast maine and estuaine oganism. EPA 00/ 95/13. U.S. Envionmental Potection Agency, Cincinnati, OH. 3. Thusby GB, Heltshe J, Scott KJ Revised appoach to toxicity acceptability citeia using a statistical pefomance assessment. Envion Toxicol Chem 1: Hayes JP The positive appoach to negative esults in toxicology studies. Ecotoxicol Envion Saf 14: Denton DL, Nobeg-King TJ Whole effluent toxicity statistics: A egulatoy pespective. In Gothe DR, Dickson KL, Reed-Judkins DK, eds, Whole Toxicity Testing: An Evaluation of Methods and Pedictions of Receiving System Impacts. Society of Envionmental Toxicology and Chemisty, Pensacola, FL, USA, pp Baile AJ, Ois JT Implication of defining test acceptability in tems of contol-goup suvival in two-goup suvival studies. Envion Toxicol Chem 15: Lehmann EL Testing Statistical Hypothesis, nd ed. Wiley, New Yok, NY, USA. 8. Shukla R, Wang Q, Fulk F, Deng C, Denton D Bioequivalence appoach fo whole effluent toxicity testing. Envion Toxicol Chem 19: Schuimann DJ A compaison of the two one-sided tests pocedue and the powe appoach fo assessing the equivalence of aveage bioavailability. J Phamacokinet Biopham 15: APPENDIX The MSDBCT is equivalent to accepting H 0, thus declaing nontoxicity, if n 1, c t and z t1, (,,) /n whee (s )/ has a cental chi-squaed distibution with degees of feedom and (x c x t) (c t) z /n has the standad nomal distibution. Futhemoe, z and ae statistically independent. Fom these facts, simulations ae pefomed to obtain all the pobabilities plotted in all the figues fo the MSDBCT with,, and.