Statistics in Risk Assessment


1 What's about Truth?
[Figure: Effect (0% to 100%) plotted against Log (Dose/concentration)]
Hans Toni Ratte
Institute of Environmental Research (Biology V), Chair of Ecology, Ecotoxicology, Ecochemistry
Workgroup of Aquatic Ecology and Ecotoxicology, RWTH Aachen University

2 Content
- Introduction
- Statistical toxicity parameters
- Small Statistical Crash Course
  - ECx and LOEC/NOEC concept
  - Minimal detectable difference
  - β-error and statistical power
- Test Results from Examples
- NOEC versus ECx
- Lessons learned?
- Conclusions

3 Introduction: Background
Conduct of biotesting
- Prospective assessment of chemicals prior to marketing
- Retrospective effects assessment of environmental samples (field monitoring)
Legal requirements
- National acts, e.g. Germany: pesticides act, chemicals act, waste water act
- EU member states: Council Directive 91/414/EEC; REACh (new in 2007(?))
Competent authorities
- Responsible for execution of these laws
- Decide on authorization of substances based on biotest results (determination of the PNEC)
- Hence biotest results must even be able to stand up in court

4 Biotesting: Tiered Approach
Basic Level
- Acute toxicity in Daphnia magna (24-48 h)
- Acute toxicity in fish (48-96 h)
Tier I
- Growth inhibition test with green algae (72 h)
- Chronic toxicity in Daphnia magna (21 d)
- Chronic toxicity in fish (Danio rerio) (14-21 d)
- Terrestrial plants: growth test and vegetative vigour test (14 d)
- Earthworm toxicity test (Eisenia fetida): lethal effects, 14 d
Tier II
Higher-Tier

5 Requirements of Regulatory Authorities
Test conduct
- OECD guidelines or ISO standards
- Good laboratory practice (GLP)
Statistics?
- Some recommendations within guidelines
- ISO/TS 20281:2004 Water quality: Guidance on statistical interpretation of ecotoxicity data (also corresponding OECD text)
- However: recommendations are often weak and the recommended methods are not obligatory

6 Aim of Presentation
Explaining
- the concepts of hypothesis testing (NOEC determination) and concentration/response modeling (ECx estimation from curve fitting);
- the concept of the minimal detectable difference (MDD) between two samples as a simple indicator of test power;
- the weakness of the NOEC concept (too much freedom for manipulation).
Making you aware of the "final end of intelligent testing", i.e. the consequences of weak recommendations
Motivating you to advocate science-based regulatory action

7 Example from Guideline OECD 202:2004, Daphnia sp., Acute Immobilization Test
"The percentages immobilized at 24 hours and 48 hours are plotted against test concentrations. Data are analysed by appropriate statistical methods (e.g. probit analysis, etc.) to calculate the slopes of the curves and the EC50 with 95% confidence limits (p = 0.95)."
This description is adequate, and the probit analysis it mentions is frequently performed with this test (sometimes replaced by logit or Weibull analysis).

8 Statistical toxicity parameters: Determination of the EC50
Immobility in an acute Daphnia test, OECD 202
- Concentration/response curve (function) obtained by fitting
- Function used to compute the EC50 and its 95% confidence limits
- Method: probit analysis (= regression using the linearized normal sigmoid function)
[Figure: % Mortality against Concentration [mg/l], showing data, fitted function and 95% confidence limits]
EC50: 2.0 mg/l
95% confidence limits: [values not preserved] mg/l
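The linearization behind probit analysis can be sketched in a few lines. This is a simplified illustration only: it uses unweighted least squares on probit-transformed responses, whereas full probit analysis typically uses weighted or maximum-likelihood fitting and also delivers confidence limits. The function name `probit_ec50` and the data are hypothetical and not taken from the slide.

```python
import math
from statistics import NormalDist


def probit_ec50(concentrations, fractions_affected):
    """Estimate EC50 and slope by regressing probits on log10(concentration).

    Simplified sketch: unweighted least squares, no confidence limits.
    """
    std_normal = NormalDist()  # standard normal, used for the probit transform
    xs, ys = [], []
    for conc, frac in zip(concentrations, fractions_affected):
        if 0.0 < frac < 1.0:  # 0% and 100% responses have no finite probit
            xs.append(math.log10(conc))
            ys.append(std_normal.inv_cdf(frac))
    if len(xs) < 2:
        raise ValueError("need at least two partial responses")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ec50 = 10 ** (-intercept / slope)  # probit 0 corresponds to a 50% response
    return ec50, slope


# Hypothetical immobilization data (mg/l, fraction immobile)
example_conc = [0.5, 1.0, 2.0, 4.0, 8.0]
example_frac = [0.05, 0.20, 0.50, 0.80, 0.95]
ec50, slope = probit_ec50(example_conc, example_frac)
print(f"EC50 = {ec50:.2f} mg/l, probit slope = {slope:.2f}")
```

The example data are constructed symmetrically around 2 mg/l on the log scale, so the estimate should land close to that value. Treatments with 0% or 100% response are skipped here, which is one reason real analyses prefer likelihood-based fitting.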

9 Example from Guideline OECD 211:1998, Daphnia magna Reproduction Test
- the number of deaths among the parent animals and the day on which they occurred (see );
- the Lowest Observed Effect Concentration (LOEC) for reproduction, including a description of the statistical procedures used and an indication of what size of effect could be detected and the No Observed Effect Concentration (NOEC) for reproduction; where appropriate, the LOEC/NOEC for mortality of the parent animals should also be reported;
- where appropriate, the ECx for reproduction and confidence intervals and a graph of the fitted model used for its calculation, the slope of the dose-response curve and its standard error;

10 NOEC, LOEC and ECx
[Figure: Effect (0% to 100%) against Log (Dose/concentration), with NOEC, LOEC, EC20 and EC50 marked]

11 LOEC and NOEC (from OECD 211)
- Lowest Observed Effect Concentration (LOEC) is the lowest tested concentration at which the substance is observed to have a statistically significant effect on reproduction and parent mortality (at p < 0.05) when compared with the control, within a stated exposure period.
- No Observed Effect Concentration (NOEC) is the test concentration immediately below the LOEC, which when compared with the control, has no statistically significant effect (p < 0.05), within a stated exposure period.
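Read literally, these two definitions amount to a simple lookup once each treatment has been tested against the control. The sketch below assumes the significance decisions are already available; the function name `loec_noec` and the concentrations are hypothetical.

```python
def loec_noec(concentrations, significant):
    """Return (LOEC, NOEC) following the OECD 211 wording.

    concentrations: tested concentrations, control excluded
    significant:    True where the treatment differs significantly from control
    LOEC is None when no treatment is significant; the returned NOEC is then
    the highest tested concentration (the true NOEC is that value or higher).
    NOEC is None when even the lowest tested concentration is significant.
    """
    pairs = sorted(zip(concentrations, significant))
    for i, (conc, is_significant) in enumerate(pairs):
        if is_significant:
            noec = pairs[i - 1][0] if i > 0 else None
            return conc, noec
    return None, pairs[-1][0]


# Hypothetical test series in µg/l
print(loec_noec([1.2, 2.4, 4.8, 9.6, 19.2], [False, False, True, True, True]))
```

Both values are necessarily tested concentrations, so the result depends directly on the chosen spacing of the test series and on which treatments happen to reach significance.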

12 Statistical Procedures (OECD 211)
"The mean for each concentration must then be compared with the control mean using an appropriate multiple comparison method. Dunnett's or Williams' tests may be useful ( ). It is necessary to check whether the ANOVA assumption of homogeneity of variance holds."
Relatively weak conditions:
- Selection of tests?
- Statistical test direction?
- ECx: value of x?
What are the consequences?

13 Small Statistical Crash Course: Toxicity Parameters and Data Scale
Quantal/qualitative responses
- Biological variable with nominal scale
- Example: mortality (number of dead animals out of the number introduced, after a certain interval)
- Point estimate from response curve: LC50, or EC50 for immobilization
Metric/quantitative responses
- Biological variable with metric scale
- Example: biomass yield, growth rate, offspring
- Point estimate from response curve: ECx, where x = 10, 20, 50% (x is not fixed)
- Toxic threshold: LOEC/NOEC (nearly always required)
Statistical test methods and curve-fitting procedures differ between these two scales!

14 LOEC/NOEC Concept
- Determined by hypothesis testing (statistical test)
- The difference between a treatment and the control that a statistical test is able to detect can be smaller or greater, depending on the variable's variance and the replication of test units
- What is the minimum difference that a statistical test can detect?

15 Minimal Detectable Difference, MDD
Example: t-test
Starting point is the t-formula, where t is the standardized difference between control (c) and treatment (t):
$$t = \frac{\bar{x}_c - \bar{x}_t}{\sqrt{\frac{s_c^2}{n_c} + \frac{s_t^2}{n_t}}}$$
With the tabulated t* being the critical margin (e.g., at α = 0.05) and inserted into the formula, the MDD is easily obtained after rearranging:
$$\mathrm{MDD} = (\bar{x}_c - \bar{x}_t)^{*} = t^{*} \sqrt{\frac{s_c^2}{n_c} + \frac{s_t^2}{n_t}}$$
and expressed as relative difference to the control:
$$\%\mathrm{MDD} = \frac{\mathrm{MDD}}{\bar{x}_c} \cdot 100$$
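As a worked illustration of the MDD for the two-sample t-test, the calculation can be sketched as follows. The critical value t* is passed in as a tabulated number; the function names and the example figures (control mean 100, standard deviation 10, five replicates per group) are hypothetical, and the degrees of freedom are assumed to be n_c + n_t - 2.

```python
import math


def mdd(s_c, n_c, s_t, n_t, t_crit):
    """Minimal detectable difference: t* times the standard error of the difference."""
    return t_crit * math.sqrt(s_c ** 2 / n_c + s_t ** 2 / n_t)


def percent_mdd(mean_c, s_c, n_c, s_t, n_t, t_crit):
    """MDD expressed as a percentage of the control mean."""
    return mdd(s_c, n_c, s_t, n_t, t_crit) / mean_c * 100.0


# Hypothetical example: control mean 100, s = 10 in both groups, n = 5 each.
# t* = 1.860 is the tabulated one-sided critical value at alpha = 0.05, df = 8.
value = percent_mdd(mean_c=100.0, s_c=10.0, n_c=5, s_t=10.0, n_t=5, t_crit=1.860)
print(f"%MDD = {value:.1f}")
```

With these assumed numbers, a treatment would need to differ from the control by roughly 12% before the test could flag it as significant.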

16 Influence of Variance and Replication on %MDD
[Table: %MDD by number of replicates and % coefficient of variation; values not preserved]

17 Influence of Test Direction on the MDD
Statements on test direction are very rare
[Table: %MDD by % coefficient of variation and test direction (one-sided, two-sided) for n = 5; values not preserved]
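The table values were not preserved, but the qualitative pattern can be sketched for the simple two-sample t-test. With equal replication n and equal coefficient of variation in both groups, the t-test MDD reduces to %MDD = t* · CV% · sqrt(2/n). The critical values below are standard tabulated t values at α = 0.05 for df = 2n - 2; the function name is hypothetical, and results for Dunnett's or Williams' test would differ.

```python
import math

# Tabulated critical t values at alpha = 0.05, df = 2n - 2
T_CRIT = {
    3: {"one-sided": 2.132, "two-sided": 2.776},   # df = 4
    5: {"one-sided": 1.860, "two-sided": 2.306},   # df = 8
    10: {"one-sided": 1.734, "two-sided": 2.101},  # df = 18
}


def percent_mdd_equal_groups(cv_percent, n, direction):
    """%MDD for a two-sample t-test with equal n and equal CV in both groups."""
    return T_CRIT[n][direction] * cv_percent * math.sqrt(2.0 / n)


for n in (3, 5, 10):
    for direction in ("one-sided", "two-sided"):
        for cv in (5, 20, 40):
            result = percent_mdd_equal_groups(cv, n, direction)
            print(f"n={n:2d} {direction:9s} CV={cv:2d}%  %MDD={result:6.1f}")
```

Under these assumptions the %MDD scales linearly with the CV, shrinks as replication increases, and is always larger for the two-sided test, because the two-sided critical value exceeds the one-sided one at the same α.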

18 CV and MDD in Selected Biotests
Table columns: Biotest, Variable, CV%, n, %MDD, NOEC, EC10, EC20, EC50 (numeric values not preserved)
- Algae Growth Inhibition, OECD 201: growth rate
- Terrestrial Plant, OECD 208: shoot dry weight; emergence rate
- Daphnia Reproduction, OECD 211: offspring
- Fish, Juvenile Growth, OECD 215: weight
- Chironomid, OECD 218: emergence rate
- Lemna Growth Inhibition, OECD 221: yield
- Earthworm Reproduction, OECD 222: offspring
Conclusion:
- Laboratory biotests show CVs and MDDs between 5 and 40%
- The NOEC can be smaller than the EC10 or as high as the EC50

19 High MDDs are Dangerous
- The MDD grows with increasing variance and decreasing replication
- Differences smaller than the MDD favor the null hypothesis H0 (µ_control = µ_treatment)
- But high risk of type II error (β-error): a wrong H0 is accepted
- Not favorable for the environment and not in line with the precautionary principle

20 Reality and Theory: Statistical Errors
Reality  | Theory (statistical decision): H0 rejected | Theory (statistical decision): H0 accepted
H0 true  | Error: type I error (α-error)              | Correct decision
H0 wrong | Correct decision                           | Error: type II error (β-error)
Test power is judged on the basis of the β-error

21 [Figure: distributions for the cases "H0: µ0 = µ1 true?" and "H0: µ0 < µ1 true?", with the MDD, α and β = 0.07 marked; the values of µ0, µ1 and α are not preserved]

22 Power function (Power = 1 - β)
[Figure: β-error and power plotted against the difference between µ0 and µ1 [mm]]
To guarantee a power of 80%, the difference between µ0 and µ1 must be at least 0.8 mm in the current example
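A power function of this kind can be sketched with a normal approximation. The sketch assumes a one-sided comparison of two groups with equal replication n and a known common standard deviation sigma, so it only approximates a t-test; the standard deviation and sample size of the slide's example are not given, and the values used here are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_one_sided(delta, sigma, n, alpha=0.05):
    """Approximate power to detect a true difference delta (normal approximation)."""
    se = sigma * math.sqrt(2.0 / n)            # standard error of the difference
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)  # one-sided critical value
    return _STD_NORMAL.cdf(delta / se - z_crit)


def difference_for_power(target_power, sigma, n, alpha=0.05):
    """Smallest true difference that reaches the target power."""
    se = sigma * math.sqrt(2.0 / n)
    z_sum = _STD_NORMAL.inv_cdf(1.0 - alpha) + _STD_NORMAL.inv_cdf(target_power)
    return z_sum * se


# Hypothetical setting: sigma = 0.5 mm, n = 5 replicates per group
for delta in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"delta = {delta:.1f} mm  power = {power_one_sided(delta, 0.5, 5):.2f}")
print(f"difference needed for 80% power: {difference_for_power(0.8, 0.5, 5):.2f} mm")
```

At a true difference of zero the power equals α, and it rises towards 1 as the difference grows; the β-error is the complement of this curve.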

23 Test Results from Examples: Which Test Gives us Power?
LOEC/NOECs are determined using multiple tests
- Multiple tests ensure that the experiment-wise error probability (type I error) is equal to or lower than the selected significance level α (e.g. 0.05)
- Current guidelines offer a selection of multiple tests:
  - Dunnett's test (multiple t test; most widely recommended)
  - Williams' test (multiple sequential t test)
  - Pair-wise Mann-Whitney U test with Bonferroni adjustment of the significance level
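Why an adjustment is needed, and what the Bonferroni adjustment does, can be shown with simple arithmetic. The sketch assumes k independent comparisons with the control, which is a simplification because comparisons sharing one control are correlated; the function names are hypothetical.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive in k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-comparison significance level after Bonferroni adjustment."""
    return alpha / k


k = 5  # hypothetical number of treatments compared with the control
print(f"unadjusted experiment-wise error: {familywise_error(0.05, k):.3f}")
print(f"Bonferroni per-test alpha:        {bonferroni_alpha(0.05, k):.3f}")
print(f"adjusted experiment-wise error:   {familywise_error(bonferroni_alpha(0.05, k), k):.3f}")
```

The adjustment keeps the experiment-wise error at or below α, but the stricter per-test level raises the critical value and therefore the MDD, which is consistent with the Bonferroni-U test being the least powerful option in the comparison that follows.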

24 Example Data Set and Dunnett's test
Dunnett's Multiple t-test Procedure
Tab. 4: Comparison of treatments with "Control" by the t test procedure after Dunnett. Significance level was α = 0.05, one-sided smaller (multiple level); Mean: arithmetic mean; n: sample size; s: standard deviation; %MDD: minimum detectable difference to Control (in percent of Control); t: sample t; t*: critical t for H0: µ1 = µ2 = ... = µk; the differences are significant in case t > t* (the residual variance of an ANOVA was applied; df = N - k; N: sum of treatment replicates n(i); k: number of treatments).
Table columns: Treatm. [µg/l], Mean, s, df, %MDD, t, t*, Sign. (first row: Control; values not preserved)
+: significant; -: non-significant
The NOEC appears to be higher than [value not preserved] µg/l.

25 Where is the NOEC?
Statist. Test               | LOEC [µg/l]     | NOEC [µg/l]
Dunnett; one-sided          | >19.2           | ≥19.2
Dunnett; two-sided          | >19.2           | ≥19.2
Williams; one-sided         | [not preserved] | [not preserved]
Williams; two-sided         | [not preserved] | [not preserved]
Bonferroni-U-test; one-sided | >19.2          | ≥19.2
Bonferroni-U-test; two-sided | >19.2          | ≥19.2
Conclusion:
- Williams' test most powerful (lower NOECs)
- Bonferroni-U test least powerful (NOEC higher, but not possible to determine here)
- Dunnett's test leads to ambiguous results (not able to determine unequivocal NOECs here)
- Two-sided testing sometimes results in higher NOECs

26 NOEC versus ECx: New Findings?
No. OECD (1998), Report on the OECD workshop on statistical analysis of aquatic toxicity data:
- It was concluded that the NOEC, as the main summary parameter of aquatic ecotoxicity tests, is inappropriate for a number of reasons ( ) and should therefore be phased out.
- It was recommended that the OECD should move towards a regression-based estimation procedure.
- A steering group should be set up to direct the mathematical, statistical and biological work required to take the workshop recommendations forward. This group should include representatives from the appropriate scientific and regulatory communities.

27 OECD (1998) Against NOEC
- The NOEC must be one of the test concentrations.
- No precision statements are possible for the NOEC.
- NOECs may correspond to large effects on test organisms.
- The NOEC will not be obtainable in all cases.
The above points indicate that the NOEC is far from ideal as a summary measure of toxic effect. It is too heavily dependent on the experimental design and the variability in the data. Consequently the NOEC may correspond to large effects, possibly of biological significance. Its value in hazard assessment is questionable.
Pro NOEC: simple calculation and use

28 OECD (1998) pro ECx
- The ECx is not restricted to be one of the test concentrations.
- The precision of the ECx can be quantified.
- ECx values are comparable.
- The whole of the toxic response of the organism may be characterized.
- Regression modeling is flexible.
- Replication is not a crucial issue.
- A greater concentration range can be studied.
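The first point can be illustrated by computing ECx values from a fitted concentration/response function. The sketch assumes a two-parameter log-logistic model, effect = 1 / (1 + (EC50 / c)^b), chosen here only for illustration; the slides do not prescribe a model, and the parameter values are hypothetical.

```python
def log_logistic_effect(conc, ec50, slope):
    """Fraction affected at a concentration under a two-parameter log-logistic model."""
    return 1.0 / (1.0 + (ec50 / conc) ** slope)


def ecx(x_percent, ec50, slope):
    """Concentration producing an x% effect, obtained by inverting the model."""
    x = x_percent / 100.0
    return ec50 * (x / (1.0 - x)) ** (1.0 / slope)


# Hypothetical fitted parameters: EC50 = 2.0 mg/l, slope b = 3.0
for x in (10, 20, 50):
    conc = ecx(x, ec50=2.0, slope=3.0)
    check = log_logistic_effect(conc, 2.0, 3.0)
    print(f"EC{x} = {conc:.3f} mg/l (model effect there: {check:.2f})")
```

Because the ECx is read off the fitted curve, it can fall between tested concentrations, and its uncertainty can be expressed through the uncertainty of the fitted parameters.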

29 OECD (1998) on ECx Problems
- The difficulty in choosing a model.
- For extreme percentiles confidence intervals may be very wide.
- ECx estimation is generally computationally more difficult than NOEC estimation.
- ECx estimates may be difficult to obtain in some cases, e.g. when low concentrations give 0% response and high concentrations give 100% response with no intermediate responses at any concentration.
- Using ECxs in place of NOECs requires the value of x to be specified.
- Use and understanding of precision and confidence intervals must be increased.

30 Lessons learned? New Update OECD 201:2006 (Algae)
"For estimation of the LOEC and hence the NOEC, it is necessary to compare treatment means using analysis of variance (ANOVA) techniques. The mean for each concentration must then be compared with the control mean using an appropriate multiple comparison or trend test method. Dunnett's or Williams' test may be useful ( ). It is necessary to assess whether the ANOVA assumption of homogeneity of variance holds."
"Recent scientific developments have led to a recommendation of abandoning the concept of NOEC and replacing it with regression based point estimates ECx. An appropriate value for x has not been established for this algal test. A range of 10 to 20 % appears to be appropriate (depending on the response variable chosen), and preferably both the EC10 and EC20 should be reported."
Competent authorities don't like both the [text truncated]

31 Conclusions
- Clear insights that the NOEC concept is problematic
- Discrepancy between scientific insights and regulatory practices
- Regulatory needs ask for simple solutions in spite of their shortcomings and risks
- This appears to contradict the precautionary principle
- Science and the precautionary principle, rather than convenience, need to govern regulatory practice