EC CIS Working Group Groundwater. Threshold Values. Initial analysis of 2015 Questionnaire Responses


EC CIS Working Group Groundwater. Threshold Values: Initial analysis of 2015 Questionnaire Responses. Final, December 2015. Technical Report. Amec Foster Wheeler

Written by Tony Marsland and Susie Roy (Amec Foster Wheeler). Directorate-General for the Environment, Working Group Groundwater

EUROPEAN COMMISSION, Directorate-General for the Environment, Unit C1 - Water. Contact: Elisa Vargas-Amelin, Elisa.Vargas-AMELIN@ec.europa.eu. European Commission, B-1049 Brussels


Contents

Executive summary
    Background and Methodology
List of abbreviations
Introduction
    Background
    Scope
Methodology
    Data capture
    Formulation, trialling, and distribution of questionnaires
    Description of questionnaire and spreadsheets
    Responses
Assessment of results
    Distribution of status tests and threshold values
    Comparability of Threshold Values - the effect of using different summary statistics
    Influence of Natural Background Levels
    Other sources of variability in Threshold Values
    Summary of significant variations by substance and test
Discussion
Conclusions
Annexes
    Annex I - Threshold Value and Status Assessment Methods
    Annex II - Comparison of TVs taking into account summary statistics
    Annex III - Ranges of TVs by selected substances and tests
    Annex IV - Questionnaire and Workbook used in the survey
        Questionnaire text
        Questionnaire Workbook

EXECUTIVE SUMMARY

Background and Methodology

Threshold Values (TVs) are used to assess the chemical status of groundwater bodies. They are established by Member States (MS) in accordance with the procedures set out in Annex II of the Groundwater Directive and also under CIS Guidance Document 18 (Guidance on Groundwater Status and Trend Assessment). TVs are set for individual pollutants that present a risk to a groundwater body and are a trigger for further assessment of the impact of that pollutant. As such they should be focused on the protection of the groundwater body, including actual or potential legitimate uses or functions of groundwater and the interactions with associated surface waters (GWAAE) and directly dependent terrestrial ecosystems (GWDTE).

Earlier work conducted for the European Commission (EC) described the role of TVs and examined some of the reasons behind the wide range and number of substances for which TVs were set by MS during the first River Basin Planning cycle (RBMP1). The Blueprint to Safeguard Europe's Water Resources concluded that the information provided in RBMP1 on chemical status was not sufficiently clear and that the methodologies for establishing TVs were not transparent.

To further investigate the differences in TVs and provide an update on what was proposed by MS for the second River Basin Planning Cycle (RBMP2), the EC requested that Amec Foster Wheeler (AmecFW) undertake a short review using a questionnaire, to provide information that would underpin any proposals for potential harmonisation of approaches to the setting of TVs. This short report describes the questionnaire that was sent out to MS contacts via Working Group Groundwater (WGGW) and the responses received, and presents an initial analysis of the data.
The report aims to build upon previous work, focusing on TV methodologies and the reasons for the observed ranges and variability in TVs; the detailed objectives and key questions to be addressed are noted in Chapter 1. After review by a WGGW sub-group and trialling, the questionnaires were distributed to all MS in March 2015, 23 of whom subsequently responded. Initial review and analysis concluded that further liaison with MS was necessary to confirm the authors' interpretation of the submitted information and resolve significant uncertainties, inconsistencies and missing data. This was completed in September 2015.

Results

There appear to be relatively few changes to the component tests of groundwater chemical status, and to the number of substances for which TVs are derived, between RBMP1 and RBMP2. The wide variations in the number of substances (6 to 54) and values persist, with many MS using two or fewer of the five status tests noted in CIS GD18. Those that do use more status tests tend to have wider ranges in TVs, as these reflect the protection needs of a wider range of receptors.

Anonymised summaries of the status tests, TV derivation methods, methods of using natural background levels (NBLs), and status compliance criteria are provided in Annex I. This also gives a qualitative assessment of the potential sources and scale of variability in TVs. Annex III then provides a complementary summary of the ranges in MS TVs for selected substances by status test and identifies the sources of variability at both the upper and lower end of the observed range in TVs.

The report notes that TVs cannot be applied in practice or objectively compared unless the summary statistics associated with the TV are defined (i.e. whether the TV is expressed as a mean or maximum and over what time period). Chapter 3.2 and Annex II illustrate the effects of using different TV summary statistics by looking at MS data for nitrate, arsenic, ammonium, cadmium and nickel. For illustrative purposes the TVs and summary statistics applied by MS are compared to Drinking Water Standards (DWS). The analysis reveals that although most MS are applying the GWD Annex I standard for nitrate, a high proportion are not using a nitrate TV that is as stringent as the DWS. This pattern is repeated for the other selected substances. The fact that different numeric TVs can result in the same compliance requirements, once the variations in summary statistics are taken into account, is also highlighted.

The uptake of the many methods of using NBLs in the derivation of TVs is reviewed and it is noted that where NBLs exceed Criteria Values [1], comparison of the resultant TVs becomes more difficult. In addition to summary statistics and NBLs, the other key sources of variability in TVs are identified as the use of dilution factors in the GWAAE test and the derivation of TVs based on contaminated land criteria. The combined effect of all these factors is assessed by substance and status test.

The report concludes that the key factors that are leading to variations in TVs are:

- Major impacts: dilution factors in the GWAAE test; NBLs where NBL > Criteria Value; use of contaminated land CVs. These factors mainly affect the upper end of the range of TVs.
- Moderate impacts: varying NBL methods and values where NBL < Criteria Value; safety factors; differences in summary statistics.

Of the above, those sources of variation underlined are dependent on environmental factors. The remainder are dependent on, or a result of, the TV derivation methodologies.
The continuing low uptake of the GWDTE and GWAAE tests in terms of definition of TVs is noted, as is the issue of the degree of protection provided to drinking water sources by TVs for those MS that use mean data and do not apply a safety factor. As more of the component status tests are used the total variation in TVs will increase, but the comparability of TVs within a component test should improve, provided that some basic measures are taken to improve transparency of approach. Measures identified include:

- Subdivision and reporting of TVs according to status test, with a clear indication of the associated summary statistics, any dilution factor employed and the separation of TVs based on contaminated land criteria;
- Distinguishing between TVs derived where the NBL is greater or less than the CV, and reducing the number of NBL methods used;
- Clarifying the objectives behind the establishment of TVs, producing guidance on the use of safety factors and summary statistics, and encouraging the more widespread adoption of CIS GD18 status tests.

[1] Standards for the receptors or uses of groundwater, sometimes termed receptor-based standards.

LIST OF ABBREVIATIONS

CIS - Common Implementation Strategy for the Water Framework Directive
CV - Criteria Value
DWD - Drinking Water Directive
DWPA - Drinking Water Protected Area
DWS - Drinking Water Standards
EC - European Commission
EQS - Environmental Quality Standard
GD - CIS Guidance Document
GWB - Groundwater body
GWAAE - Groundwater Associated Aquatic Ecosystem
GWD - Groundwater Directive (2006/118/EC)
GWDTE - Groundwater Dependent Terrestrial Ecosystem
GWQS - Groundwater Quality Standard (GWD Annex I)
MS - Member State
NBL - Natural Background Level
RBMP - River Basin Management Plan (1: first cycle plan; 2: second cycle plan)
SWB - Surface water body
TV - Threshold Value
WFD - Water Framework Directive (2000/60/EC)
WGGW - CIS Working Group on Groundwater

1 INTRODUCTION

1.1 Background

Article 3 of the Groundwater Directive (GWD, 2006/118/EC) describes the criteria for assessing the chemical status of a groundwater body, including the groundwater standards for nitrates and pesticides as noted in Annex I of that Directive and Threshold Values (TVs) established by Member States in accordance with the procedures set out in Annex II of the Directive. CIS Guidance Document 18 [2] gives further guidance on these procedures.

TVs are set for individual pollutants that present a risk to a groundwater body and are a trigger for further assessment of the impact of that pollutant. If there is no risk identified in the characterisation process undertaken in accordance with WFD Article 5, then a TV (and by inference the use of a relevant component status test noted in CIS GD18) is not necessary. The GWD indicates that TVs should be focused on the protection of the groundwater body, including actual or potential legitimate uses or functions of groundwater and the interactions with associated surface waters (GWAAE) and directly dependent terrestrial ecosystems (GWDTE).

An earlier report (Scheidleder, 2012) [3] described the role of TVs and examined some of the reasons behind the wide range and number of substances for which TVs have been set by Member States (MS). The report was based on information prepared by MS for the first River Basin Planning cycle (RBMP1).
The key findings of the report were that the significant differences in reported TVs used in classification during RBMP1 were likely to be caused by:

- Variation in methodology for deriving Natural Background Levels (NBLs);
- Variation in Environmental Quality Standard (EQS) values used;
- Variation in safety margins applied to Drinking Water Standard (DWS) values, and in the DWS values used for parameters not covered by the Drinking Water Directive (DWD);
- Differences in the aggregation of monitoring results for reporting;
- Differences in the "acceptable extent of exceedance"; and
- Differences in the typical areal extent of groundwater bodies (GWBs).

The Blueprint to Safeguard Europe's Water Resources (EC 2012) [4] concluded that the information provided in the first River Basin Management Plans (RBMPs) on chemical status was not sufficiently clear to set a baseline. In the supporting information on groundwater it was noted that the methodologies for establishing TVs were not sufficiently transparent. The substances for which TVs were set and the TVs themselves vary widely, making it difficult to compare the classification results for the chemical status of groundwater bodies. In RBMP1 the 26 MS set TVs for a combined total of 156 different substances [5], with the numbers for individual MS varying from fewer than 10 to over [number missing].

[2] European Commission: CIS GD No. 18: Guidance on Groundwater Status and Trend Assessment (Technical report).
[3] A. Scheidleder, 2012: Groundwater Threshold Values - In-depth assessment of the differences in groundwater threshold values established by Member States. Umweltbundesamt, Austria.
[4] European Commission, November 2012: A Blueprint to Safeguard Europe's Water Resources. Brussels.
[5] CIS Technical Report No. 7: Technical Report on Recommendations for the Review of Annex I and II of the Groundwater Directive 2006/118/EC, December

In order to further investigate the differences in TVs and provide an update on what was proposed by MS for the second River Basin Planning Cycle (RBMP2), the European Commission (EC) requested that Amec Foster Wheeler (AmecFW) undertake a short review using a questionnaire, to provide information that would underpin any proposals for potential harmonisation of approaches to the setting of TVs. The work would be steered by a sub-group of volunteers from Working Group Groundwater (WGGW), led by Ian Davey (UK) [6].

The agreed objectives of the questionnaire were to:

- Obtain sufficient information to enable a clear understanding of why the TVs used in RBMP1 vary so much;
- Understand whether and how Member States are revising their approaches for RBMP2;
- Form an evidence base for any proposals to achieve better consistency in the methods for deriving TVs; and
- Provide a basis for enabling an assessment of the comparability of groundwater status results.

The approach adopted in the questionnaire was to focus on the methods used to derive TVs and request that MS illustrate these methods with representative examples from identified groundwater bodies. An Excel workbook accompanied the questionnaire, in which these example data could be recorded. The questionnaire itself requested data on:

- the classification tests used and in what way these differed from those outlined in CIS GD18;
- how NBLs were derived and used in connection with TVs;
- the underlying standards (Criteria Values) and how these related to TVs;
- how TVs were derived and then used in classification; and
- what changes were proposed in RBMP2.

In all of the above, the summary statistics that accompanied the numeric values were requested (whether the numbers were means, maxima etc., and the time periods over which these were applied). Subsequently the work was extended to provide this short report on the initial analysis of responses received from MS.
The questionnaire was also extended to include questions on Groundwater Associated Aquatic Ecosystems (GWAAE) to support the WGGW subgroup that has been working on this subject.

[6] From August 2015 Tim Besien (UK) took over this role.

1.2 Scope

This report describes the approach to and main elements of the data collection (Chapter 2) and summarises the results, primarily in the form of a series of tables (Chapter 3 and Annexes I and III). Copies of the questionnaire and workbook used to collect the data may be found as electronic files in Annex IV. Not all the data collected and summarised in the Annexes have been analysed in detail; the priority has been to address certain key questions (see below) using the most representative and complete data sets. For some substances and methods there were insufficient data to do detailed analysis, make valid comparisons or come to objective conclusions.

The results presented here are part of a continuing process aimed at developing understanding of how TVs are being derived and used, with the aim of clarifying and potentially rationalising the methods of producing and deploying TVs. This report should be read in conjunction with the previous TV report (Scheidleder, 2012), as the intention is to build on and not repeat this work. One area that the previous report was not able to address in detail, due to lack of information, was the summary statistics or compliance regime for TVs (i.e. how the numerical values were used: were they compared to annual mean values of monitoring data aggregated over an area, or to maximum values applied at a single monitoring point, etc.). This is a particular focus in the present work, as it could explain some of the differences between the reported TVs or lead to differences in impact in terms of compliance, despite the use of TVs that are the same numerically.

The report does not present detailed options for rationalising methods of TV derivation, though it does aim to highlight the differences in methods and whether these adequately cover Water Framework Directive (WFD) and GWD requirements. The responses to the additional questions covering the GWAAE work area are not reported here and will be addressed in outputs from the GWAAE sub-group of WGGW.

In order to focus the analysis of the results and the reporting, a number of questions that this work should address (subject to adequate responses to the data request) were set:

1. What are the key factors that are leading to variations in TVs?
2. Can the variations be ascribed to differences in natural or groundwater body related factors, or to differences in methodology that could lead to differences in status assessment?
3. Do the TV methods protect the different types of receptor noted in the WFD?
4. Do the TV methods follow the approach(es) set out in CIS guidance?
5. Are there any major differences in approaches to the methods for setting TVs proposed in RBMP2?
6. Are there any major differences in the number of substances for which TVs are set in RBMP2?

2 METHODOLOGY

2.1 Data capture

To keep the data collection exercise down to a manageable and acceptable level, the focus has been to obtain a coherent view of the methods used to derive and apply TVs, building on the information supplied for the earlier TV Report (Scheidleder 2012), rather than collecting more data on the range of TVs. The aim is to explain the previously observed variability and if and how this may change in RBMP2. In order to address this, it was considered essential to collect more detailed data on Member States' use of TVs in the individual component tests for chemical status, illustrated with case-specific data from individual groundwater bodies to explain how the methods are implemented in practice. This includes how NBLs are taken into account and how the derived TVs relate to Criteria Values (CVs - standards for the receptors or uses of groundwater, sometimes termed receptor-based standards).

It has not been the intention to repeat previous work, but wherever appropriate an update has been requested so that any changes proposed for RBMP2 can be reported. Where necessary, multiple examples were requested to demonstrate the range of approaches used (e.g. different substance groups) and clearly link these to the reported TVs. The questionnaire also requested that MS indicate any changes in their approach to deriving and using TVs subsequent to the first RBMPs.

2.2 Formulation, trialling, and distribution of questionnaires

A draft questionnaire and accompanying Excel workbook (Annex IV) were reviewed by the volunteer group and then circulated to MS prior to the WGGW meeting in Rome in October. In discussion at that meeting, concern was expressed by MS at the level of detail and purpose of the questionnaire. The EC emphasised the importance of collecting sufficient information to address the concerns raised in the Blueprint (EC 2012) and confirmed that it was a technical assessment process to develop understanding and not a compliance check.
At the meeting it was agreed that a few additional questions should be added to the questionnaire in connection with the WGGW's work on GWAAE, this being the most effective way to collect the data needed for this work. Based on the comments received, the questionnaire and workbook were revised and then trialled. The finalised documents were distributed to MS in early March 2015 with a return date of 9 April 2015. These documents were pre-populated, where possible, with information from the previous questionnaire responses, and examples of completed documents were attached to assist MS.

Although there was a good response from MS, with 18 responses by the deadline, a number of points needed clarification, some of which were dealt with in one-to-one discussions at the WGGW meeting in Brussels in mid-April. An extended deadline of 29 April 2015 was set for additional responses and clarifications, and the total number of responses received by this date was 24 out of a possible 29 (one MS made two responses for different administrations within the MS).

2.3 Description of questionnaire and spreadsheets

The questionnaire and workbook are attached as electronic files in Annex IV. Both documents have explanations/descriptions of their format and how they should be completed, so this will not be repeated here.

2.4 Responses

The level of detail provided by the 23 MS (24 total responses) that replied varied enormously. Though most MS provided sufficient information to enable the assessment to proceed, there were concerns over the consistency of some of the information, particularly with respect to the summary statistics accompanying TVs. The descriptions in the questionnaire responses of how monitoring data are assessed and compared to TVs were often at variance with the data in the spreadsheets. In most cases we believe that this has been due to a misunderstanding of what was required in the spreadsheets, and we have assumed that the questionnaire responses take precedence. On analysis of the data, further inconsistencies with data collected for the 2012 report became apparent.

The above was noted in an initial draft report submitted to the EC in June 2015. It was agreed that further liaison should take place with MS, and a summary of AmecFW's interpretation of the submitted data was despatched in July 2015 to MS contacts, with a request to confirm or note changes. In some cases specific questions were raised. For the most part the uncertainties were resolved, but there remain a few instances where differences with previously submitted information are apparent, and these are highlighted in the summary tables in Annexes I and III. Only 5 MS did not return questionnaires and therefore are not included in the analysis.

In the assessments described below the contributing MS have been allocated arbitrary identification numbers, to anonymise the results, as requested by MS. This is consistent with the focus of the exercise, which is to examine the impacts of different methodologies, not to assess MS compliance.

3 ASSESSMENT OF RESULTS

3.1 Distribution of status tests and threshold values

The questionnaires invited MS to indicate which of the following component tests for groundwater chemical status they had applied to their groundwater bodies in each plan period:

CIS Guidance Document 18 tests:
- General Quality Assessment (GQA);
- Groundwater Dependent Terrestrial Ecosystems (GWDTE);
- Groundwater Associated Aquatic Ecosystem (GWAAE);
- Drinking Water Protected Areas (DWPA);
- Saline or other intrusions (Saline);

Alternative/Member State derived test (ALT).

Table 1 presents an overview of the use of these tests in terms of the derivation of TVs in RBMP1 and RBMP2. Care should be taken in interpreting these data, as it is possible that a MS may have determined that there is no risk to a specific class of receptor and therefore not applied a status test (and therefore not set TVs). However, in the present work we have focused on methods rather than the numbers of TVs; thus if only one groundwater body within a MS is at risk, then a method for deriving TVs to address this risk should be in place. During this work one MS indicated that it had considered a status test but had not derived TVs for this test as it considered that no groundwater bodies were at risk. Three MS were not able to supply data for RBMP2 as their proposals had not been finalised; therefore we have presented both numbers and percentages in Table 1 for comparative purposes.

Though most MS use one or more of the CIS GD18 tests, 5 used a locally derived (ALT) test in RBMP1 and for 4 MS this was the only test they used to classify groundwater chemical status. In RBMP1 all MS who applied CIS GD18 tests used the GQA test and for 3 MS this was also the only test that they used. All the MS who used the DWPA test also used the GQA test. The GWDTE and GWAAE tests have the lowest uptake in terms of TV development, and a number of MS make reference to expert judgement with regard to determining chemical status via these tests.
In terms of the tests and TVs applied, the main changes in RBMP2 are the small increases in the use of the GWDTE, GWAAE and Saline tests.

Table 1: Number of Member States defining Threshold Values for each Status Test
Columns: Status Test (GQA, GWDTE, GWAAE, DWPA, Saline, ALT); No. and % of responses for RBMP1 (24 responses); No. and % of responses for RBMP2 (21 responses).
[Cell values could not be recovered from this transcription.]

Table 2 shows the number of substances for which MS have set TVs, broken down by status test. Mean values are presented for those tests used by 10 or more MS. In RBMP1 most TVs were set for the GQA, DWPA and ALT tests, but the number of substances varies widely (ranges of 6-54, 3-53 and [range missing] respectively). The wide variations persist in RBMP2 and there is no significant change in the average number of TVs, though individual MS have made significant adjustments (figures in brackets with the +/- annotation). Only 3 MS have chosen to change their existing TVs (in terms of values) between RBMP1 and RBMP2.

A detailed breakdown of the status tests used, including the method for deriving TVs, how natural background is incorporated, how the TVs are used in status assessment and a summary of the potential sources and scale of variability, is given in Annex I. Details are provided for both RBMP1 and RBMP2 (where available). These summaries of methods, which represent our interpretation of the material submitted in both the previous and this survey, were circulated to MS in July/August 2015 with a request that they be checked. This was considered necessary to resolve inconsistencies and uncertainties. Most MS contacts have responded either with agreement or modifications. On this basis, the summaries may be taken to be more up to date and representative than some of the questionnaire and workbook responses.

The number of methods for deriving TVs and the range in TVs increase with the number of different status tests employed. This diversity is both understandable and appropriate where the methods are tailored to the type of groundwater receptor/use/function, such that the resultant TV protects the receptor or use that has been identified as being at risk of failing the WFD environmental objectives. A wide variety of NBL methods are used by MS (see Chapter 3.3), but these methods tend not to vary between status tests, although NBLs themselves will vary enormously with hydrogeological conditions.

In Annex I, under Status assessment, both the method of compiling monitoring data (Calculation column - how the summary statistic is used) and how this is then applied in the status test (Compliance column) are described.
The final columns of Annex I give an initial qualitative assessment of the inherent potential variability in TVs that arises from the different methods, for both naturally occurring and synthetic substances. In principle the variability assessed here arises from:

- the NBL method and how this affects TVs when the NBL < CV (see Chapter 3.3);
- the use of dilution factors in the GWAAE test [7];
- the use of unusual standards (e.g. contaminated land) in deriving TVs;
- dependence on ecosystem specific or locally variable CVs.

Variation due to the use of safety factors/adjustments that take into account differing summary statistics is excluded from this qualitative assessment, but where such factors exist these are noted in the comments column in Annex I; their impact is explored in Chapter 3.2.

To complement Annex I, the actual variations in TVs by MS and test, including any changes indicated between RBMP1 and RBMP2, are presented in Annex III, which will be described later in this report.

[7] See p26 of CIS GD18. Because of changes in concentration along the flow path between the groundwater body and the associated surface water body, a dilution factor or an attenuation factor may be applied to derive an appropriate CV or TV.
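The effect of a dilution factor in the GWAAE test can be illustrated with a minimal Python sketch. It assumes simple mass-balance mixing, in which groundwater supplies a fixed fraction of the flow in the associated surface water body and the remaining flow carries none of the substance. The function name, the surface water CV and the flow fraction are all hypothetical and are not taken from any MS method or from CIS GD18.

```python
def gwaae_threshold(surface_water_cv, groundwater_flow_fraction):
    """Back-calculate a groundwater TV from a surface water criteria value.

    Illustrative assumptions: groundwater supplies `groundwater_flow_fraction`
    of the flow in the surface water body, and the rest of the flow carries
    none of the substance (simple mixing, no attenuation).
    """
    if not 0.0 < groundwater_flow_fraction <= 1.0:
        raise ValueError("groundwater_flow_fraction must be in (0, 1]")
    dilution_factor = 1.0 / groundwater_flow_fraction
    return surface_water_cv * dilution_factor


# Hypothetical values: a surface water CV of 0.5 mg/l and groundwater
# supplying a quarter of the flow give a dilution factor of 4.
print(gwaae_threshold(0.5, 0.25))  # 2.0
print(gwaae_threshold(0.5, 1.0))   # 0.5 (no dilution)
```

Under these assumptions the TV rises as the groundwater contribution falls, which is consistent with dilution factors mainly affecting the upper end of the range of TVs.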

Table 2: Number of substances for which Threshold Values have been defined for each status test
Columns (given for RBMP Cycle 1 and repeated for RBMP Cycle 2): MS ID; CIS tests (GQA, GWDTE, GWAAE, DWPA, Saline); Alternative method; CIS tests for which TVs derived. The final row gives the mean.
[Cell values could not be recovered from this transcription.]

KEY (note that numbers of substances include Annex I standards for nitrates and pesticides)
0: no TVs defined for this test/test not used
13: number of substances
(-1, +1): number of different substances removed or added in RBMP2
(2): number of substances with different numeric values in RBMP2
44*: MS method incorporates several CIS tests
r: RBMP2 not available

3.2 Comparability of Threshold Values - the effect of using different summary statistics

All numeric standards, and in this case TVs, are implicitly associated with summary statistics which describe how the numeric value is used: whether the value represents a mean, maximum, 95%ile etc.; the timescale over which this statistic applies, e.g. instantaneous, daily, or annual; and in some cases an indication of the area or volume of the medium to which the numeric standard applies (which can be important for groundwater, where sampling may be from different depths). For example, a Drinking Water Directive standard is normally a maximum value applied to every sample (i.e. instantaneous) and is described as a maximum admissible concentration (MAC) at the point of supply to the consumer. In contrast, a TV for a groundwater body for chloride could be set at a value of 200 mg/l which is compared to mean annual data, spatially averaged over the body. The summary statistics for the TV in this case are "mean annual, groundwater body average".

Valid comparisons of standards or TVs cannot be made if the summary statistics are not fully defined. The same numeric values may represent different standards in terms of their impact on compliance if their summary statistics differ. Equally, different numeric values may have the same effect (in terms of compliance) once their summary statistics are taken into account. A numeric value does not become a standard (or a TV) that can be applied in practice until its summary statistics are defined. For these reasons specific questions relating to the aggregation and comparison of groundwater monitoring data with TVs for each status test were raised in the questionnaire and workbook.

The effects of differing summary statistics in relation to the TVs derived for selected substances are explored in Tables 3, 4 and 5, Figures 1 and 2 and Annex II.
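The point that one numeric value can behave as two different standards can be shown with a short Python sketch. It applies the 200 mg/l chloride value from the example above to the same set of monitoring results under two summary statistics. The sample concentrations and the function name are hypothetical, and spatial averaging over the groundwater body is left out for brevity.

```python
from statistics import mean


def complies(samples_mg_l, threshold_mg_l, statistic):
    """Return True if the monitoring data meet the threshold under `statistic`."""
    if statistic == "maximum":
        observed = max(samples_mg_l)   # every individual sample must pass
    elif statistic == "annual_mean":
        observed = mean(samples_mg_l)  # only the yearly average must pass
    else:
        raise ValueError(f"unknown summary statistic: {statistic}")
    return observed <= threshold_mg_l


# Hypothetical chloride results (mg/l) from one monitoring point in one year.
chloride = [150.0, 180.0, 230.0, 190.0]

print(complies(chloride, 200.0, "annual_mean"))  # True: mean is 187.5 mg/l
print(complies(chloride, 200.0, "maximum"))      # False: one sample is 230 mg/l
```

The same data pass against an annual mean and fail against a maximum, so the numeric value alone does not determine the compliance outcome.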
Based on the responses from Member States, the TVs used in the GQA, DWPA and ALT tests for nitrate, arsenic, ammonium, cadmium, and nickel were compiled together with their associated summary statistics. These tests and substances were selected primarily on the volume of data available, there being inadequate data (only a few data points or no data for RBMP2) from which to draw meaningful conclusions for the other status tests and most other substances.

In order to give some focus to the analysis, comparisons were also made with the DWS for the selected substances, these being the most common Criteria Values referred to by MS in deriving TVs, on the basis that drinking water supply is regarded as one of the most important uses/functions of groundwater. It should follow, therefore, that TVs in either the ALT test (the status test employed by a MS that does not use one or more of the status tests recommended in CIS GD18) or in one or both of the CIS GQA and DWPA tests should be focused on drinking water protection. Given that TVs are intended to be standards, exceedance of which should trigger further assessment and potential protective measures, it is reasonable to expect that they should be equivalent to or more stringent than the DWS in at least one status test.

Note: Although WFD Annex V (2.3.2) and GWD Annex III place an emphasis on the use of annual mean data for the assessment of status, there is no specific requirement to adjust TVs to take account of summary statistics. However, whilst the Directives give MS some flexibility in the methods of status assessment, there is a need to demonstrate that the environmental objectives are being met.

MS TVs were mapped against their summary statistics and values according to the scheme noted in Table 3.

Table 3: Matrix for comparison of TVs and DWS, taking account of summary statistics

Matrix header: No. of responses
Horizontal axis: SUBSTANCE X mg/l - NUMERIC VALUE, running from less stringent to more stringent, with the DWS value marked (e.g. DWS)
Vertical axis: SUMMARY STATISTIC (= the characteristics of the data used to compare against the numeric value), running from more stringent to less stringent:
  Maximum / Instantaneous (individual samples)
  Mean or median, annual data
  Mean or median of 2 years monitoring data
  Mean or median of 3 years monitoring data
  Mean or median of 6 years monitoring data
  Mean or median of over 6 years monitoring data
Key: white cells = less stringent than DWS; grey cells = Member State TV equivalent to or more stringent than DWS; single marked cell = Drinking Water Standard

The numeric values are grouped into ranges and the summary statistics are banded, giving rise to a series of cells into which the TVs reported by MS are allocated. A single cell is allocated to the relevant DWS for comparison purposes. The ranges of the numeric value are based on the range of TVs encountered during this and earlier work. Though the dividing line can only be regarded as approximate, the matrix highlights the cells which are considered to be more or less stringent than the numeric value of the DWS.

For the summary statistic (vertical axis) the most stringent statistic is maximum/instantaneous, there being a substantial difference between this and an annual mean or median. Based on work undertaken by some MS who have compared the means and maxima of chemical monitoring data, e.g. for nitrate, a downwards adjustment (a "safety factor") in the order of 10-25% may be needed in the TV if mean data are used to protect a standard or criteria value based on a maximum concentration (UKTAG 8). The remaining rows in the table are allocated to increasing periods over which the mean or median is calculated from the monitoring data.

Overall, the cells with white background represent areas where the TV is less stringent than the DWS. The greyed area is that which we estimate is either equal to or more stringent than the DWS.
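The safety factor adjustment and the grey/white distinction in the matrix can be sketched in Python as follows. This is a simplified illustration, not the scheme used to build the matrices: the function names are hypothetical, all mean or median periods are treated alike (the matrix bands them separately), and the dividing line is reduced to a single assumed safety factor. The nitrate DWS of 50 mg/l is the Drinking Water Directive value.

```python
def mean_based_tv(criteria_value, safety_factor):
    """Lower a maximum-based criteria value so that it can be
    compared with mean data (safety_factor is a fraction, e.g. 0.10)."""
    if not 0 <= safety_factor < 1:
        raise ValueError("safety_factor must be in the range [0, 1)")
    return criteria_value * (1 - safety_factor)

def is_protective(tv, statistic, dws, safety_factor):
    """Rough stand-in for the grey cells: True if the TV is equivalent
    to or more stringent than the DWS for its summary statistic."""
    if statistic == "maximum":
        limit = dws  # compared like-for-like with the DWS
    else:
        # Mean or median data: the TV needs the downward adjustment.
        limit = mean_based_tv(dws, safety_factor)
    return tv <= limit

NITRATE_DWS = 50.0  # mg/l

# The 10-25% range quoted in the text, applied to the nitrate DWS.
for factor in (0.10, 0.25):
    print(factor, mean_based_tv(NITRATE_DWS, factor))

# A TV of 50 mg/l assessed against mean data, assuming a 10% factor.
print(is_protective(50.0, "mean", NITRATE_DWS, 0.10))
# The same numeric value assessed against maxima.
print(is_protective(50.0, "maximum", NITRATE_DWS, 0.10))
```

Under these assumptions a nitrate TV applied to mean data would need to lie at roughly 37.5 to 45 mg/l to offer protection equivalent to a 50 mg/l maximum, so a TV of 50 mg/l compared with means falls in a white cell while the same value compared with maxima falls in a grey one.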
An example of a completed matrix is given in Table 4, which presents the results for the GQA test and nitrate in RBMP1 and 2. Note: whilst nitrate is a GWD Annex I substance with a fixed groundwater quality standard (GWQS) rather than a TV, this makes no difference to the analytical process and nitrate is a good example in that the NBL is usually substantially lower than the DWS and does not interfere with the comparison process.

Table 4 demonstrates that, for the GQA test:
- In line with the suggested compliance regime in the legislation (GWD), all responding MS have used either mean or median data to compare against the TV/GWQS;
- There is a wide range in the time series used (from annual to over 6 years);
- There are few changes between RBMP1 and 2;
- Very few MS have built in a safety factor by adopting a lower numeric value;
- Approximately 75% of MS have adopted a TV/GWQS that is less stringent than the DWS.

A complete dataset (including matrices for the DWPA and ALT tests) for nitrate and all the other substances assessed is presented in Annex II (nitrate and arsenic are assessed together in Annex IIA).

8 United Kingdom Technical Advisory Group, Feb 2007: Paper 11(i)b Groundwater Chemical Classification

The results for the DWPA test for nitrate (Annex IIA) indicate that:
- Most MS have used mean annual data;
- The same number (but greater proportion than in the GQA test) of MS have used a safety factor and adopted a lower numeric value to compensate for using mean data;
- Approximately 75% of MS have adopted a TV/GWQS that is less stringent than the DWS in RBMP1. This figure falls to 64% in RBMP2.

The results for the ALT test for nitrate, which is, where used, the only chemical status test, indicate that most MS have again used mean data but half have also applied a safety factor.

Taking the results from the three tests together, it is clear that of the MS that have adopted a GQA TV/GWQS for nitrate that is less stringent than the DWS, roughly one third (those in bold italics in Table 4) have not deployed either the DWPA or ALT tests and therefore there is no other mechanism that provides protection equivalent to the DWS 9.

Figure 1 presents the proportions of nitrate TVs that are more or less stringent than the DWS, taking the GQA, DWPA and ALT tests together, both in tabular and diagrammatic form. This indicates that 70% and 60% of TVs in RBMP1 and RBMP2 respectively are less stringent than the DWS.

The above analysis has been repeated for all the substances noted previously and the results are summarised in Figure 2 and Table 5. A similar pattern is observed, with between 41% (arsenic) and 70% (nitrate) of TVs in RBMP1 being less stringent than the relevant DWS, and a similar range of figures for RBMP2.

9 No MS has indicated that it has considered the DWPA test but has not applied TVs on the basis that no groundwater bodies are at risk.

Table 4: Example comparison matrix - GQA Test for nitrate.

Axes: NITRATE mg/l - NUMERIC VALUE (less stringent to more stringent, with the DWS marked); SUMMARY STATISTIC (more stringent to less stringent). Entries are MS numbers; "|" separates entries in different numeric value columns.

GQA TEST, RBMP1 (19 responses)
  Maximum / Instantaneous:
  Mean or median annual: ,4,5,11,15,16,22,24 | #17
  Mean or median 2 years:
  Mean or median 3 years: 6,18 | 20
  Mean or median 6 years: 3,7,
  Mean or median >6 years: 14

GQA TEST, RBMP2 (20 responses)
  Maximum / Instantaneous:
  Mean or median annual: 2,4,5,11,15,16,22,24 | #17
  Mean or median 2 years:
  Mean or median 3 years: 6,18 | 20
  Mean or median 6 years: 3,7,23 | 1,10 | 21
  Mean or median >6 years: 8,14

Key: bold italics (e.g. 20) = MS that has not used the DWPA test; grey cells = equivalent to or more stringent than DWS; #17 = probable value based on formula.

Figure 1: Proportions of TVs that are more or less stringent than DWS - nitrate example

Columns: Nitrate; RBMP1 (No, %); RBMP2 (No, %)
Rows:
  TV less stringent where only the GQA test is used by MS
  TV less stringent in all the tests where DWS could be used
  TV protective (equivalent to or more stringent than DWS) in ALT test only
  TV protective in GQA test only
  TV protective in GQA or DWPA tests
  Total

Figure 2: Proportions of TVs that are more or less stringent than DWS - other substances

Table 5: Summary of Figures 1 and 2 (proportions of TVs that are more or less stringent than DWS)

Columns: Substance; Percentage of MS with TVs equivalent to or more stringent than DWS (RBMP1, RBMP2); Percentage of MS with TVs less stringent than DWS (RBMP1, RBMP2)
Rows: Nitrate; Arsenic; Ammonium; Cadmium; Nickel

The above analysis not only highlights the differences between TVs used by MS but also the comparability of TVs with different numeric values. For example, in the DWPA test in RBMP2 for arsenic (Annex IIA) the TVs for MS 8 and 24 (10 and 7.5 µg/l respectively) are comparable, despite the difference in numeric value, because their different summary statistics lead to the same level of compliance when monitoring data are assessed. A similar position applies to MS 1, 8, 10, 18 and 24 for nitrate in DWPA/RBMP2.

As most MS have adopted a similar approach to the derivation of TVs for other substances, it can be assumed that these conclusions have widespread application across the range of TVs as far as the GQA, DWPA and ALT tests are concerned.
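The comparability of the two arsenic TVs can be checked with a short calculation. The sketch below assumes, purely for illustration, that the higher value (10 µg/l) is applied to a more stringent statistic such as a maximum and the lower value (7.5 µg/l) to mean data; the responses summarised here do not state this, and the function name is hypothetical.

```python
def implied_safety_factor(max_based_value, mean_based_value):
    """Fractional reduction needed to get from a value applied to
    maxima to a value applied to means (hypothetical helper)."""
    return 1 - mean_based_value / max_based_value

factor = implied_safety_factor(10.0, 7.5)  # arsenic TVs in µg/l
print(f"{factor:.0%}")

# Does the implied factor lie in the 10-25% range quoted for safety factors?
print(0.10 <= factor <= 0.25)
```

On that assumption the gap between the two values corresponds to a 25% reduction, the upper end of the 10-25% adjustment range quoted earlier for comparing mean data with a maximum-based standard.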

Figure 3: Natural Background Level Methods

3.3 Influence of Natural Background Levels

Natural Background Levels (NBLs) were identified in the previous Threshold Values report (Scheidleder 2012) as a major source of variation in TVs, and various methods of setting TVs in relation to NBLs were noted. These are reproduced and assigned numbers in Figure 3 of this report, and referred to in Annex I, where the methods used by each MS are identified, including any variations with status test and substance. In most cases MS have used two of these methods, one for where NBL<CV (1-6 in Figure 3) and one for where NBL>CV (7 or 8). One MS reported that it does not take account of NBLs when setting TVs but takes account of them in status assessment by excluding exceedances of TVs where these are due to elevated NBLs.

Taking into account the different combinations of methods in Figure 3 (e.g. 1+7, 2+7 etc. and 1+8, 2+8 etc.) and the possibility of not taking NBL into account at all, it seems that there are at least 13 different potential approaches to taking (or not taking) into account NBLs. We have identified the use of 10 of these approaches from responses to the recent questionnaire. No examples of the use of method 4 were encountered and only a couple of examples of methods 1 and 2. By far the most frequently encountered combination was methods 6 & 7 (45% of respondents).

The plethora of methods, in particular for where NBL<CV, makes it difficult to draw comparisons between TVs. Methods 1, 2 and 4 link the TV to the NBL rather than the CV, which will introduce inherent variability in the TV and seems to deviate from the original purpose of TVs as noted in the GWD (Article 3 and Annex II).

In the qualitative assessment of variability noted in the last two columns of the table in Annex I, the assessment of variability for natural substances only applies where NBL<CV. When NBL>CV, the only options are NBL methods 7 and 8, both of which have the potential to give rise to large variations in the resultant TVs.
This is substance dependent and will particularly apply to substances that have large variations in NBLs that span and significantly exceed typical CVs, for example chloride, sulphate, ammonium and, to a lesser degree, some metals.

3.4 Other sources of variability in Threshold Values

In addition to the safety factors and NBLs described above, other major sources of variability in TVs noted from the responses include:
- The use of dilution factors in the GWAAE test. Though the GWAAE test is only used as a distinct test by a handful of MS (Table 2), the factoring in of dilution of the groundwater baseflow contribution to a dependent watercourse is a logical approach to deriving TVs. It nevertheless results in widely varying values, arising from both the varying flow regimes in the watercourses and the EQS values for these watercourses, as described in the Scheidleder 2012 report. Constraints on dilution factors could reduce the variation in TVs but are likely to result in TVs that are less focused on the individual local circumstances and risks.
- The derivation of TVs based on contaminated land criteria. Again, whilst only used by a couple of MS (e.g. 12 and 23), these seem to be responsible for some of the high TVs for synthetic substances noted in the Scheidleder 2012 report. The scale at which these TVs are applied is unclear in the MS responses but appears to be the site scale. This seems to be inconsistent with the intended purpose of TVs, which are normally set at the groundwater body (or larger) scale in order to assist with the determination of the status of groundwater bodies.
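The way dilution factors spread GWAAE test TVs can be shown with a minimal Python sketch. It assumes the simplest possible relationship, in which the TV is the surface water EQS multiplied by a dilution factor; this relationship, the function name and all of the numbers are assumptions made for illustration and are not taken from any MS method or response.

```python
def gwaae_tv(eqs, dilution_factor):
    """Assumed simplification: TV = surface water EQS x dilution factor,
    where the dilution factor reflects how much the groundwater baseflow
    is diluted in the receiving watercourse."""
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be positive")
    return eqs * dilution_factor

HYPOTHETICAL_EQS = 20.0  # µg/l, illustrative value only

# One EQS, three hypothetical watercourses with different flow regimes.
for dilution_factor in (1, 2, 5):
    print(dilution_factor, gwaae_tv(HYPOTHETICAL_EQS, dilution_factor))
```

Even with a single EQS, the assumed factors give TVs that differ fivefold between watercourses, and differences in the EQS values themselves would widen the range further. Capping the dilution factor would narrow the range at the cost of local specificity, as noted above.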

3.5 Summary of significant variations by substance and test

The variation in TVs for the substances assessed in detail in this report, broken down by test, is given in Annex III. This table distinguishes between the drivers behind the variation in TVs at both the upper and lower limits of the TV range; as noted below, these drivers may differ. Note: all values referred to below are means except DWS, which are maxima.

Nitrate: an Annex I GWQS applies, so the maximum value is set by the GWD, but MS can derive lower limits if they feel this is necessary. The variation in the lower limits is driven by:
- the use of safety factors, usually to take account of the summary statistics as described in section 3. This applies to the GQA, DWPA and ALT tests. Typical range: mg/l.
- the use of ecosystem specific criteria values (EQS or wetland specific values) when the GWAAE or GWDTE tests are applied. TVs derived on this basis are usually lower than TVs adjusted to take account of safety factors. Typical range: mg/l.

Arsenic: with one or two exceptions the range of arsenic TVs is low and the lower limit is influenced by the use of safety factors. High values are found where:
- the NBL is high (up to 653 µg/l but more typically in the range µg/l);
- dilution factors are used when applying the GWAAE test (up to 199 µg/l);
- exceptionally, TVs are derived for contaminated land (up to 200,000 µg/l).

Cadmium: a number of MS use TVs at the lower end of the range (down to 0.4 µg/l), incorporating a safety factor that goes beyond that needed to take account of summary statistics and is well below the DWS of 5 µg/l. The same number of MS have derived cadmium TVs for the GWAAE test; this does not give rise to exceptionally high TVs. High values (up to 27 µg/l) occur when the NBL is high.

Nickel: follows a similar pattern to arsenic, but fewer MS have developed TVs for this substance and the upper limit arising from NBLs is a maximum of 30 µg/l (compared to a DWS of 20 µg/l).
The highest values arise from the use of dilution factors in the GWAAE test (up to 116 µg/l).

Ammonium: the submitted TVs have been checked to determine whether the values are expressed as ammonium (NH4) or nitrogen (N). The values tabulated in this report are all expressed as ammonium. The lowest and highest values derived across the tests are based on NBLs and the total range in TVs is relatively low if two extreme values are excluded ( µg/l with extremes, µg/l without).

Chloride: the total range in TVs across the tests is large ( mg/l) and both low and high limits are driven by the wide range in NBLs.

4 DISCUSSION

Given that TVs are defined in order to assist in the classification of groundwater chemical status, taking into account a wide range of receptors, hydrogeological conditions and pressures, the wide range of reported TVs is unsurprising. However, when reported in aggregate at the MS level, it is not immediately apparent what is driving these variations.

There are clear differences between the different types of test in the TV methods, in the substances for which TVs are derived (those leading to the risk of not achieving good status) and in the underlying CVs. In terms of the numbers of substances for which TVs are defined, the DWPA and GQA tests are comparable, with averages of 22 and 25 respectively in RBMP2, but the variations between MS are substantial (6-54) in both RBMP1 and 2.

In terms of reported TVs, the Saline test is characterised by a small number of naturally occurring substances (such as chloride and sulphate) with TVs heavily influenced by NBLs, leading to wide ranges but generally high numeric values. Mean data over varying timescales are used to assess compliance with the TVs and, as noted in Annex I, some MS also employ a trend assessment procedure in status assessment.

Very few MS have derived TVs for the GWDTE test but those that have tend to focus on nutrients such as nitrate. When derived, the TVs tend to be significantly lower than those adopted for the GQA and DWPA tests and mean data are used to assess compliance.

The small number of MS that use the GWAAE test tend to base the TVs on surface water EQSs, which are often substantially lower than DWS or GWD Annex I standards. However, dilution factors are often built into the associated TVs, resulting in wide ranges in TVs for an extensive list of substances.
The DWPA test is used extensively but, despite the focus on protection of drinking water, and therefore the use of DWS as Criteria Values on which to base TVs, many MS use mean data instead of maxima to compare with the TVs, or alternatively do not adopt a safety factor. In principle, this test should give rise to the lowest range of TVs but in practice the TVs that have been adopted represent differing levels of protection despite the relative consistency in numeric values.

Similar concerns with the comparability of TVs apply to the GQA test, but in this case the aim of the test is normally to provide a general level of protection over the groundwater body. The underlying CVs are often DWS but the method of assessment is invariably based on mean monitoring data, which raises questions regarding the degree of protection afforded to drinking water sources by these TVs where no other status tests (such as the DWPA test) are employed.

The reliance on NBLs in setting or applying TVs is unavoidable where NBL>CV, as otherwise there could be a determination of poor status where no anthropogenic influence was involved. Dependency on NBLs where the CV is significantly in excess of the NBL raises the issue of whether the objective behind the setting of the TV is to protect the NBL (maintain it at the existing level) or to prevent pollution/poor status (the GWD clearly indicates that it should be the latter). Much of the variation in TVs at the lower end of the range seems to reflect this difference in underlying objective.

Excluding the use of contaminated land TVs, variation in TVs at the upper end of the range for naturally occurring substances is a consequence of NBLs or the use of dilution factors in the GWAAE test.