JOAQUIN - JOINT AIR QUALITY INITIATIVE FINAL REPORT TRANSNATIONAL MODEL INTERCOMPARISON AND VALIDATION EXERCISE IN NORTH-WEST EUROPE


September 2015

Interim report on Joaquin WP2A7

Table of Contents

1 Introduction
   Background
   Objectives
2 Materials and methods
   Model domain
   Models
   Construction of the emission dataset
   Meteorological input
   Observational data
      PM10, PM2.5 and NO2
      EC
   Model evaluation and validation
3 Results and discussion
   PM10
      Spatial validation on annual mean
      Temporal validation on daily values
      Temporal validation on hourly values
   PM2.5
      Spatial validation on annual mean
      Temporal validation on hourly values
   NO2
      Spatial validation on annual mean values
      Temporal validation on hourly values
   EC
      Spatial validation on annual mean values
      BC measurements Belgium
      EC measurements Belgium
      BC measurements United Kingdom
      EC measurements France
   Calibrated ensemble maps and population exposure
   Effect of meteorology on pollutant concentrations
4 Conclusion
References

This report was drafted as part of the Joaquin project, an INTERREG IVB NWE project aiming to improve air quality in the Northwest European region. Joaquin (Joint Air Quality Initiative) focuses on air quality in Northwest Europe, the associated health effects and possibilities for improvement. The project comprises the measurement of parameters that show a stronger correlation with health effects (ultrafine particles, particulate matter composition such as metals and soot) than the currently measured PM10 and PM2.5 parameters. The project will also evaluate measures currently available to policy makers. Certain measures will even be piloted in the participating cities. These findings will be presented to stakeholders and policy makers, together with a tool to start working on these measures (decision support tool). Finally, the project will spread information on these novel parameters and on air quality in general to both experts and the general public, which will enable them to better assess the air quality in their own region.

Duration: 01/05/ /11/2015

Partners:
- Belgium (4): Vlaamse Milieumaatschappij (VMM), Intergewestelijke Cel voor het Leefmilieu (IRCEL-CELINE), Vlaams Agentschap Zorg & Gezondheid (VAZG), Stad Antwerpen
- France (2): École des Ingénieurs de la Ville de Paris (EIVP), Atmo Nord Pas de Calais
- The Netherlands (4): GGD Amsterdam, Provincie Noord-Holland, Rijksinstituut voor Volksgezondheid en Milieu (RIVM), Energy research Centre of the Netherlands (ECN)
- United Kingdom (6): University of Brighton, University of Leicester, Leicester City Council, London airtext, Greater London Authority (GLA), Transport for London (TfL)

More information on the project can be found on the project website.

1 Introduction

1.1 Background

Joaquin (Joint Air Quality Initiative) is an EU cooperation project supported by the INTERREG IVB North West Europe programme. The general aim of the project is to support health-oriented air quality policies in Europe. To achieve this, the project will provide policy makers with the necessary evidence on the current local and/or regional situation (e.g. measurements and model maps of emerging health-relevant parameters), provide them with best-practice measures that can be taken, and motivate them to adapt and strengthen their current air quality policies.

Although air quality has improved considerably in Europe in recent decades (EEA, 2014), airborne fine particles still have a significant impact on our health and life expectancy (Amman et al., 2005). The recent REVIHAAP study of the WHO confirmed once more that particulate matter concentration has a strong link with health effects and that there is no safe threshold below which no effects occur (WHO, 2013, REVIHAAP). For Flanders, particulate matter accounts for 75% of lost healthy life years due to environmental causes (Buekers et al., 2012). Because of this strong link between particulate matter and health effects, current EU air quality (AQ) legislation is centred on monitoring, limiting and reducing the particulate matter concentration of airborne particles. However, recent toxicological and epidemiological research argues that other particle metrics, such as particle number (PN) and particles from combustion sources (black carbon, BC), may constitute additional links to health endpoints beyond particulate matter concentration (Janssen et al., 2011).

Air pollution, especially by particulate matter, is by its nature a truly transnational problem: emissions of air pollutants in one region can have a detrimental effect on the air quality of other regions and vice versa. Especially in North-West Europe, high concentrations of particulate matter can be seen across several countries (EEA, 2014).
This is less true for particle numbers and black carbon, although at present there is no clear understanding of their spatial distribution, especially at local and regional level. To calculate population exposure and to monitor progress in mitigation strategies, accurate regional-scale maps are needed. Air quality models are increasingly used by European countries for this purpose, since they are able to deliver such maps. However, comparison of model results, especially at country boundaries, is hampered by the fact that all countries use different models. Currently, there is limited interaction or harmonization of modelling results between countries. A wide spectrum of options can be chosen when setting up a model: the chemical mechanism, the spatial and temporal resolution, the aerosol description and physics, as well as many other physical and chemical processes (Vautard et al., 2007). A European composite map of model results for each country demonstrated that the national maps differ greatly in both grid resolution and grid orientation, causing further variation at country borders (de Smet et al., 2013; ETC/ACM Technical paper 2013/3).

Combining different model building blocks into a chemistry transport model can lead to accumulation or compensation of errors, the so-called modelling uncertainty, which cannot be accounted for with one single model (Vautard et al., 2007; Derwent et al., 2014). It is therefore recommended to use a spectrum of different model results to simulate past, present and future air quality and to visualize the effect of emission reductions due to policy actions on pollutant concentrations. Input parameters, i.e. emission data, meteorology, land use data and boundary conditions, are also likely to differ between model runs in different countries, which further complicates the comparison or exchange of model results.
Model intercomparison studies using the same input parameters for all models may help to evaluate the ability of different models to simulate pollutant concentrations. Such studies have already been performed in the past to evaluate the simulation of inorganic aerosol compounds (Hass et al., 2003), O3 (van Loon et al., 2007; Solazzo et al., 2012a) and PM10 (Vautard et al., 2007; Stern et al., 2008; Colette et al., 2011; Solazzo et al., 2012b), focusing on both short-term and long-term model performance. However, both emission inventories and air quality models are continuously being improved, which makes a new intercomparison study necessary for the North-West European region with its high pollution levels. In contrast to previous studies, this study will also focus on emerging pollutants like EC.

1.2 Objectives

The same four models from this study will subsequently be used to evaluate the effect of emission reduction scenarios in the NWE region, for which the Joaquin project aims to provide an ensemble prediction. The participating models therefore need to be evaluated against observations. The main objectives of this study are twofold:

1. To compare both spatial and temporal model performance of three air quality models currently used by Belgium, i.e. CHIMERE, BelEUROS and AURORA, and one Dutch model, LOTOS-EUROS, for particulate matter (PM10 and PM2.5), nitrogen dioxide (NO2) and elemental carbon (EC) in the North-West European region.
2. To assess the sensitivity of the four different models to meteorological input by testing concentration differences between different meteorological years while keeping emission input constant.

2 Materials and Methods

2.1 Model domain

The model domain comprises North-West Europe, i.e. Belgium, the Netherlands, the majority of England, northern France and western Germany. Figure 1 provides a map of the different model grids. This region can be considered a hotspot for air pollution. The coordinates of each grid can be found in Table 1.

Figure 1 Model domain of the four models used

Each model was nested in a coarser domain comprising the whole of Europe, so that accurate boundary conditions were provided to the domain of interest. Both CHIMERE and AURORA were nested in a coarse CHIMERE grid with a resolution of 0.5 x 0.5, while LOTOS-EUROS and BelEUROS were nested in a coarser master domain with a spatial resolution of 0.5 x 0.5 and 0.55 x 0.55 km², respectively (Figure 2).

Figure 2 Scheme representing the nesting of each model and their resolution

2.2 Models

The four models included in the North-West European model intercomparison study are state-of-the-art Eulerian chemical transport models:

- The CHIMERE model (Menut et al., 2013)
- AURORA (Lauwaet et al., 2013)
- BelEUROS (Deutsch et al., 2008)
- LOTOS-EUROS (Schaap et al., 2008)

Three of the models (CHIMERE, AURORA and LOTOS-EUROS) are currently used in combination with measurements (data assimilation) for operational forecasts in Belgium and the Netherlands, while the BelEUROS model was frequently used in the past for scenario analysis and source apportionment studies in Belgium.

The CHIMERE model is developed, maintained and distributed by Institut Pierre Simon Laplace (CNRS) and INERIS (Bessagnet et al., 2008). It is designed to produce daily forecasts of ozone, aerosols and other pollutants and to make long-term simulations for emission control scenarios, as is the case in France (Honoré et al., 2008), Belgium and several other regions in Europe. It runs over a range of spatial scales, from the regional to the urban scale, with resolutions from 1-2 km to 100 km. More details can be found in Menut et al. (2013) and on the CHIMERE website.

The AURORA model (air quality modelling in urban regions using an optimal resolution approach) was developed at VITO. The model has been applied extensively over the Belgian domain, both for scenario analyses (Lauwaet et al., 2013) and for daily forecasting for Belgium and specific urban regions. Within the Joaquin project, AURORA was adapted to model EC as a separate pollutant so that separate EC emissions could be used as input (see Deutsch et al., 2013).

The BelEUROS model is, like LOTOS-EUROS, based on the EUROS model developed by RIVM in the Netherlands for the modelling of winter and summer smog episodes.
The model was extended by VITO in 2004 for the Belgian domain with algorithms for atmospheric particles (Deutsch et al., 2008) and has been continuously improved over the years. It is an operational tool for policy support in Belgium (Delobbe et al., 2002). Within the Joaquin project, BelEUROS was adapted to model EC as a separate pollutant so that separate EC emissions could be used as input (see Deutsch et al., 2012).

The LOTOS-EUROS model was developed by a consortium of TNO, RIVM, KNMI and PBL in the Netherlands. LOTOS-EUROS is a unification of the two models LOTOS and EUROS, of which the first version was released in 2004 as LOTOS-EUROS version 1.0.

Table 1 provides a summary (in analogy with Stern et al., 2008) of how the major model components were configured for this intercomparison exercise.

Table 1 Summary of the major model components (in analogy with Stern et al., 2008)

- Gas phase chemistry. CHIMERE: Reduced MELCHIOR (Derognat et al., 2003); AURORA: CB-IV 99 mechanism; BelEUROS: CB IV-99; LOTOS-EUROS: CBM IV
- Heterogeneous chemistry. CHIMERE: N2O5 hydrolysis, RH dependent; AURORA: N2O5 hydrolysis, RH dependent; BelEUROS: Not included; LOTOS-EUROS: Included
- Aerosol size distribution. CHIMERE: Nine bins: µm; AURORA: 2 bins: < 2.5 µm and µm; BelEUROS: 2 bins: < 2.5 µm and µm; LOTOS-EUROS: 2 bins: < 2.5 µm and µm
- Inorganic aerosols. CHIMERE: Thermodynamic equilibrium with ISORROPIA; AURORA: Thermodynamic equilibrium with ISORROPIA; BelEUROS: Thermodynamic equilibrium with ISORROPIA; LOTOS-EUROS: Thermodynamic equilibrium with ISORROPIA
- Organic aerosols. CHIMERE: Simplified SOA scheme (Bessagnet et al., 2005); AURORA: Not included; BelEUROS: Not included; LOTOS-EUROS: Not included
- Secondary aerosols in the coarse fraction. CHIMERE: No; AURORA: No; BelEUROS: No; LOTOS-EUROS: Yes
- Aqueous chemistry. CHIMERE: Aqueous phase for sulfuric acid; AURORA: RH dependent sulfuric acid formation; BelEUROS: Aqueous phase formation of sulfuric acid; LOTOS-EUROS: Aqueous phase for sulphate production
- Dry deposition/sedimentation. CHIMERE: Resistance approach; AURORA: Resistance approach; BelEUROS: Resistance approach; LOTOS-EUROS: Bi-directional exchange for NH3 (Kruit et al., 2010)
- Wind blown dust. CHIMERE: Not included; AURORA: Not included; BelEUROS: Not included; LOTOS-EUROS: Included
- Sea salt. CHIMERE: Included; AURORA: Not included; BelEUROS: Included (Monahan scheme); LOTOS-EUROS: Included
- Biogenic emissions. CHIMERE: MEGAN (Guenther et al., 2006); AURORA: MEGAN v (Guenther et al., 2006); BelEUROS: Simplified isoprene and terpene scheme (Guenther); LOTOS-EUROS: isoprene and monoterpene from trees, grass, and crops
- Boundary conditions coarse domain. CHIMERE: LMDz-INCA2 (gases), LMDzAERO (aerosols); AURORA: N/A (nested inside CHIMERE); BelEUROS: Climatology; LOTOS-EUROS: MACC
- Vertical resolution. CHIMERE: 8 levels, σ 997 to 500 hPa; AURORA: 17 layers, average thickness 27 m (bottom) to 612 m (top); BelEUROS: 4 layers to 4000 m (top); LOTOS-EUROS: 4 layers up to 3500 m
- Horizontal resolution. CHIMERE: 0.1 x; AURORA: x 7 km²; BelEUROS: 7.5 x 7.5 km²; LOTOS-EUROS: x
- Projection. CHIMERE: Longitude-Latitude; AURORA: Lambert Conformal Conic; BelEUROS: Shifted Pole Lambert Conformal Conic; LOTOS-EUROS: Longitude-Latitude

2.3 Construction of the emission dataset

Total emission data for the year 2009, used as input for the four models, was provided by TNO. A full description of the methodology used to generate the emission maps can be found in Denier van der Gon et al. (2014). Here we provide a brief overview.

The TNO MACC-II emission inventory (Kuenen et al., 2014) formed the starting point for the Joaquin 2009 emission dataset. The TNO MACC-II inventory is based on the emissions officially reported by countries to CLRTAP (Convention on Long-Range Transboundary Air Pollution) and UNFCCC (United Nations Framework Convention on Climate Change). These official emission data are collected at sector level and checked for completeness, consistency and accuracy. Wherever data are not available or are found not to be accurate enough, other emission estimates are used. PM10 and PM2.5 emissions are split up into elemental carbon (EC), organic carbon (OC), sulphate (SO4 2-), sodium (Na+) and other minerals for 200 subsectors. The complete set of emissions compiled at sector level is spatially distributed at a resolution of º x º (lon x lat) based on proxy maps.
Emissions are available for CO (carbon monoxide), NH3 (ammonia), NMVOC (non-methane volatile organic compounds), NOx (nitrogen oxides), SOx (sulphur oxides) and the particulate matter compounds EC (elemental carbon < 2.5 µm), PM2_5_Other (PM2.5 other than EC) and PM_Coarse (all particles with a diameter between 2.5 and 10 µm).

For Flanders (Flemish Environment Agency), the Netherlands (Dutch Emission Registration) and England (UK National Atmospheric Emissions Inventory, NAEI), a high-resolution 1x1 km² bottom-up emission inventory was available. These inventories were harmonized prior to further processing in a GIS environment. For Flanders separate EC emissions were provided, while for the Netherlands and England EC emissions were estimated from PM2.5 based on EC fractions in the TNO MACC database (Visschedijk et al., 2010). Subsequently, the TNO MACC-II emissions for Flanders, the Netherlands and England were replaced with the high-resolution emission data. This yielded a hybrid emission base map for the year 2009 on the MACC-II grid, which then had to be converted to the different grid or model formats with different spatial resolutions and projections, as shown in Figure 1 (except for LOTOS-EUROS, which can directly use the TNO-MACC emission input). All formats store information about the sector (SNAP), the pollutant name, the emission value and the country where the source is located (except BelEUROS).

For point sources, the pre-processing was generally easier since no spatial redistribution needed to be performed. However, the CHIMERE model requires information on the effective emission height instead of the stack height, since it is not able to calculate plume rise. This was solved by splitting every point source into several point sources at different heights, each carrying a fraction of the total emission. The vertical distribution and the height-related fractions were based on Bieser et al. (2011).

Figure 3 a-d shows the spatial distribution of the resulting dataset for NOx, EC, PM2.5 (excl. EC) and PM coarse. On the emission maps for EC, PM2.5 and PM coarse, the highest emissions can be seen in the region of Paris and in the other French cities. This has two causes. Firstly, reported PM emissions from domestic energy use in France are among the highest in Europe due to wood burning. Secondly, unlike for Flanders, the Netherlands and England, no bottom-up emission data could be used for France. Total country emissions were spatially distributed using proxy maps such as population density.
In this approach, most of the PM emissions due to residential combustion (wood burning) are allocated to the city of Paris and other cities, while in reality more emissions should be allocated to less populated areas in France (Denier van der Gon et al., 2012).

2.4 Meteorological input

All models used meteorological input data from the European Centre for Medium-Range Weather Forecasts (ECMWF); they differed only in the type of meteorology. The meteorological input of BelEUROS is based on ECMWF data of the type Operational archive, Atmospheric model, Analysis, while AURORA uses re-analysed ERA-Interim data. Both CHIMERE and LOTOS-EUROS have ECMWF forecast data as input. The resolution of the ECMWF data is 0.25 x 0.25, except for BelEUROS which uses 0.5 x

2.5 Observational data

2.5.1 PM10, PM2.5 and NO2

Hourly and daily measurement data for PM10, PM2.5 and NO2 were taken from Airbase, a European air quality database with validated monitoring data. The total number of stations in the Joaquin modelling domain amounted to 300 for PM10, 106 for PM2.5 and 265 for NO2. Only background (urban, suburban and rural) stations were selected, since they are representative of the spatial resolution of the models, while traffic and industrial stations generally require a higher model resolution. The raw Airbase hourly data were processed for use in the DELTA Tool (see 2.6), which was a large task since not all countries report their raw measurement data in the same way. After the model runs were finished, hourly time series for each grid cell containing a measurement station were extracted and prepared for use in the DELTA Tool.
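The extraction of model time series at station locations can be illustrated with a minimal sketch. It assumes a regular longitude-latitude grid; the grid origin, cell size, dimensions, station coordinates and concentration fields below are all hypothetical and do not correspond to any of the Joaquin model grids. Models on other projections would first need a coordinate transformation, which is not shown.

```python
import math

def grid_index(lon, lat, lon0, lat0, dlon, dlat, nlon, nlat):
    """Return (i, j) of the grid cell containing (lon, lat), or None if outside.

    (lon0, lat0) is the south-west corner of the grid; dlon and dlat are the
    cell sizes in degrees. All values used below are hypothetical.
    """
    i = math.floor((lon - lon0) / dlon)
    j = math.floor((lat - lat0) / dlat)
    if 0 <= i < nlon and 0 <= j < nlat:
        return i, j
    return None

def extract_series(hourly_fields, cell):
    """Pick the value of one grid cell out of a sequence of 2-D fields."""
    i, j = cell
    return [field[j][i] for field in hourly_fields]

# Hypothetical regular grid and station location.
GRID = dict(lon0=-2.0, lat0=49.0, dlon=0.25, dlat=0.25, nlon=40, nlat=20)
cell = grid_index(4.40, 51.22, **GRID)

# Two hypothetical hourly fields, each filled with a constant concentration.
fields = [[[c] * GRID["nlon"] for _ in range(GRID["nlat"])] for c in (18.0, 21.5)]
series = extract_series(fields, cell)
```

Stations falling outside the grid return `None` and would be skipped before the comparison with measurements.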

Figure 3 Spatial distribution of NOx, EC, PM2.5 (excl. EC) and PM coarse emissions in the Joaquin domain used as input for the models, here on the LOTOS-EUROS grid (0.125 x )
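The splitting of point sources over several emission heights for CHIMERE, described in section 2.3, can be sketched as follows. This is only an illustration under stated assumptions: the `PointSource` structure, the layer heights and the fractions in `EXAMPLE_PROFILE` are hypothetical placeholders and are not the values of Bieser et al. (2011).

```python
from dataclasses import dataclass

@dataclass
class PointSource:
    lon: float
    lat: float
    height_m: float   # emission height of this (sub-)source
    emission: float   # annual emission, arbitrary mass unit

# Hypothetical vertical profile: (layer mid-height in m, fraction of emission).
# Placeholder numbers, NOT the Bieser et al. (2011) profile.
EXAMPLE_PROFILE = [(45.0, 0.10), (135.0, 0.30), (250.0, 0.40), (420.0, 0.20)]

def split_point_source(src, profile):
    """Replace one point source by several sources at different heights.

    Each new source carries a fraction of the original emission, so the
    total emitted mass is conserved.
    """
    total = sum(fraction for _, fraction in profile)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("profile fractions must sum to 1")
    return [
        PointSource(src.lon, src.lat, height, src.emission * fraction)
        for height, fraction in profile
    ]

# Hypothetical stack with an annual emission of 1200 units.
stack = PointSource(lon=4.40, lat=51.22, height_m=80.0, emission=1200.0)
parts = split_point_source(stack, EXAMPLE_PROFILE)
```

In practice the profile would depend on the emission sector; a single profile is used here only to keep the sketch short.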

2.5.2 EC

Data on the chemical composition of PM is generally not available in the data sets provided by Airbase, since reporting to Airbase is based on legal reporting obligations at EU level (EEA, 2013). EU regulation does not address the monitoring of BC and EC levels in ambient air at urban background and traffic sites, only at rural background stations (EC, 2008). In the past, however, several countries performed chemical characterization studies in which PM was analysed for its different components (Vercauteren et al., 2011). These are not continuous measurements, since EC is measured offline on PM10 or PM2.5 samples with a thermo-optical method. Countries inside the Joaquin domain were contacted in order to obtain these measurements, and daily EC measurement data for 2009 were obtained for the Paris area, several locations in (Western) Germany and Flanders. For the Netherlands, we received annual averages, mostly based on black smoke measurements. Data from (Western) Germany were excluded from the analysis because of uncertainty about the pollutant that was measured (EC, Black Smoke) and the method used.

In addition to the discontinuous EC measurements, the measurement of Black Carbon (BC) has been implemented in several European air quality monitoring networks, mostly in urban networks. These measurements are a continuation of the older black smoke or soot measurements, which had partly been put in place based on national regulations (EEA, 2013). For Belgium and England, BC measurements were available, measured with MAAP and AE22 sensors, respectively. In the Netherlands both annual average BC and black smoke measurements are available.

2.6 Model evaluation and validation

The four models in this study were evaluated by comparison against measurements. To do this, statistical indicators are calculated to determine the capability of the air quality models to reproduce measured concentrations.
More specifically, we calculated the mean bias, the Root Mean Square Error (RMSE), the Pearson correlation coefficient (r) and the R² of the linear regression line to evaluate how each model reproduces both the spatial variation in the model domain and the hourly and daily temporal variation in 2009 at each measurement station.

For EC specifically, temporal and spatial validation of model results is performed for each set of BC/EC measurements separately, since the BC monitor type or the EC analysis protocol often differs between countries. In order to increase the number of stations for spatial validation, both BC and available BS measurements for the Netherlands were converted to EC concentrations by means of the following equations (based on Schaap et al., 2007):

EC = 0.7*BC (MAAP monitor)
EC = 0.056*BS (rural background stations)
EC = 0.088*BS (urban background and traffic stations)

For the Belgian stations, BC measurements were converted by the following equation, based on an urban background station (Matheeussen, 2011):

EC = 0.84*BC 0.77

BC measurements for England were not converted since they were shown to agree well with EC measurements (Butterfield et al., 2010).

Although statistical performance indicators provide insight into model performance in general, they do not tell whether model results have reached a sufficient level of quality for a given application, e.g. for assessment or policy support (Pernigotti et al., 2013). For PM10 and PM2.5 in particular, this would be useful information. Thunis et al. (2012) therefore proposed a Model Quality Objective (MQO) indicator, which allows model performance to be assessed for compliance checking. MQOs are developed for the limit values (PM10, PM2.5, NO2, O3) defined in Directive 2008/50/EC. The MQO is based on the RMSE between measured and modelled concentrations and takes into account the measurement uncertainty, which depends on the pollutant and the concentration considered:

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (m_i - x_i)^2}

RMS_U = \sqrt{\frac{1}{N} \sum_{i=1}^{N} U(x_i)^2}

where RMS_U is defined as the Root Mean Square of the measurement Uncertainty, m_i and x_i represent the modelled and the measured value at a certain time step i, N is the number of time steps, and U is the observation uncertainty. The MQO is then defined by requiring that model results have a similar margin of tolerance (in terms of uncertainty) as observations, i.e.

\frac{RMSE}{2\,RMS_U} \le 1

The estimation of measurement uncertainty for PM10, PM2.5 and NO2 used in the Delta Tool is based on Pernigotti et al. (2013), where a formulation for a concentration-dependent measurement uncertainty is presented. Furthermore, also based on the MQO, Thunis et al. (2012) derived Model Performance Criteria (MPC) for a set of additional statistical indicators, i.e. Normalized Mean Bias (NMB), correlation coefficient R and the Normalised Mean Standard Deviation (NMSD). These additional statistical indicators can be used to identify which aspects of model performance need to be improved. Both the MQO and the MPC are calculated for daily and annual mean PM10 values, annual mean PM2.5 values and hourly and annual mean NO2 values, since limit values for these time aggregations exist in the Air Quality Directive.

This approach was integrated in the Delta Tool software, an IDL-based model evaluation software developed by the Joint Research Centre in the framework of the FAIRMODE procedure for benchmarking of Air Quality Directive modelling applications. In this study, we used the Delta Tool v4.0. Furthermore, the Atmosys validation tool, an online validation tool developed within the Atmosys project, was used to calculate the statistical indicators for PM10 and PM2.5. Results from the Atmosys Tool were cross-checked with output from the openair package of the statistical software R and found to be equal.

Since no European limit values exist for EC, the Delta Tool could not be used here. Therefore, we produced Taylor diagrams for the validation of this parameter.
Such a diagram provides a different way of graphically summarizing how closely the models match observations at each measurement station. The similarity between model and observation values is quantified in terms of their correlation, their centred Root Mean Square Error and the amplitude of their variations (represented by their standard deviation) (Taylor, 2001).

Differences between meteorological years were assessed by so-called quantile-quantile plots (q-q plots) and by a Wilcoxon signed-rank test. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. A quantile is the value below which a given fraction of the data falls, e.g. the 30% quantile is the value below which 30% of the data fall. If the concentration maps of two meteorological years are similar, the points on the q-q plot will lie on the 1:1 line. The Wilcoxon signed-rank test makes it possible to determine whether there is a statistically significant difference in median concentrations between meteorological years. It is a non-parametric alternative to the paired Student's t-test. Due to the large number of samples, the test is very sensitive to differences. The question is therefore whether a statistically significant difference between samples is what we are actually interested in, and so we also tested whether the differences were larger than specified threshold values. As the different pollutants have different concentration ranges, these threshold values were chosen as 5% and 10% of the median pollutant concentration for the reference year.
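The comparison between meteorological years can be illustrated with a small sketch of the q-q points and the threshold check. It is a simplification under stated assumptions: the concentrations are invented, the number of quantiles is arbitrary, and the check below simply compares the median paired difference with the threshold, which is one possible reading of the procedure and not the Wilcoxon signed-rank test itself.

```python
import statistics

def qq_points(ref, other, n=10):
    """Pair the quantiles of two data sets; similar data lie on the 1:1 line."""
    q_ref = statistics.quantiles(ref, n=n, method="inclusive")
    q_other = statistics.quantiles(other, n=n, method="inclusive")
    return list(zip(q_ref, q_other))

def exceeds_threshold(ref, other, fraction):
    """True if the median paired difference exceeds a share of the reference median.

    `fraction` is e.g. 0.05 or 0.10 (5% or 10% of the reference-year median).
    """
    threshold = fraction * statistics.median(ref)
    diffs = [o - r for r, o in zip(ref, other)]
    return abs(statistics.median(diffs)) > threshold

# Hypothetical grid-cell concentrations for a reference year and a second
# meteorological year that is uniformly 8% higher.
ref_year = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
other_year = [1.08 * c for c in ref_year]

points = qq_points(ref_year, other_year)
above_5_percent = exceeds_threshold(ref_year, other_year, 0.05)
above_10_percent = exceeds_threshold(ref_year, other_year, 0.10)
```

With these invented numbers the difference is larger than the 5% threshold but smaller than the 10% threshold, which shows how a statistically detectable shift can still be judged small in practical terms.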

3 Results and discussion

3.1 PM10

3.1.1 Spatial validation on annual mean

Figure 4 presents annual mean PM10 concentrations in the Joaquin domain, modelled by the four different AQ models, together with the Airbase measurement data. From these maps it is clear that all models underestimate the measured PM10 concentrations, which is consistent with previous model intercomparison studies (Schaap et al., 2015, Solazzo et al., Stern et al., 2008). Modelling PM concentrations with air quality models is challenging, since a wide range of PM physics and chemistry needs to be incorporated and a large variety of emission sources has to be considered (Solazzo et al., 2012). The problem is especially difficult when simulating long periods and large spatial scales, due to the variety of sources involved and the chemical and physical transformations of some species that can occur over long time periods (e.g., Mathur et al., 2008). The highest variation in modelled concentrations between urban and rural areas is observed for the BelEUROS and CHIMERE models and the lowest for the AURORA model. All models tend to produce high concentrations for the region of Paris, which can be attributed to the high emission data used here (see 2.3).

Figure 4 Annual mean PM10 concentrations modelled with CHIMERE, AURORA, BelEUROS and LOTOS-EUROS. Bullets represent the measured PM10 concentration at urban and rural background stations

The low concentrations above sea for the AURORA model compared to the other three models can most likely be attributed to the fact that no sea salt was included in the model run. In contrast, BelEUROS shows very high PM10 concentrations above the North Sea and the English Channel, which belong mainly to the PM coarse size range since this effect is not visible on the PM2.5 maps. They can be attributed to an overestimation of sea salt concentrations due to an underestimation of sea salt deposition above sea (Deutsch et al., 2014).

When comparing the measured PM10 annual mean values from AirBase with the corresponding model grid values, we see again that all models clearly underestimate the actual concentrations (Figure 5). The slope of the regression line between measured and modelled concentrations amounts from for AURORA to for BelEUROS. The correlation coefficient for annual mean PM10 concentrations is highest for the CHIMERE model and lowest for the BelEUROS model. The low correlation coefficient for the latter is caused by a large overestimation of PM10 concentrations in the region of Paris and other French cities (urban and suburban monitoring stations), which is observed for all models (Figure 6) but is highest for the BelEUROS model. When French stations are removed from the analysis, spatial correlation coefficients amount to 0.71, 0.49, 0.61 and 0.67 for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS, respectively (Figure 7). The slope of the regression line is then highest for the CHIMERE model, but the mean bias without the French stations is smallest in magnitude for BelEUROS, i.e. -3.9, compared to the other models (-7.6 to -8.9). BelEUROS seems to overestimate the lowest PM10 concentrations.

Figure 5 Scatterplot of modelled versus measured annual mean PM10 concentrations per station type for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS

Figure 6 Scatterplot of modelled versus measured annual mean PM10 concentrations per country type for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS.

Figure 7 Scatterplot of modelled versus measured annual mean PM10 concentrations per station type for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS; French stations are excluded from the analysis

Air quality models in general underestimate PM10 concentrations. Therefore, model results for assessment purposes are often calibrated with measurement data. When looking at the MQO for annual mean values, which takes into account the measurement uncertainty, the uncalibrated model results of CHIMERE, AURORA, BelEUROS and LOTOS-EUROS fulfil it for only 10%, 9%, 46% and 6% of the stations, respectively (Figure 8). The MQO is fulfilled most often for rural background stations (27% on average for all models), while no difference is seen between urban and suburban background stations (15% on average for all models). Results also vary considerably among countries: for Belgium, Germany, France, England and the Netherlands, 9%, 29%, 10%, 19% and 14% of the stations for the four considered models fulfil the MPC, respectively. This analysis shows that the uncalibrated model results are not useful for policy applications, especially in suburban and urban areas. Furthermore, it is highly recommended to include sea salt in the model run, especially in North-West Europe, which is situated around the North Sea.

Figure 8 Map indicating for which stations the model quality objective (Thunis et al., 2012) for annual mean values is fulfilled (< 1, green dot) and for which not (> 1, red dot)

3.1.2 Temporal validation on daily values

For each measurement station, statistical performance indicators were calculated showing how well each model is able to reproduce the measured daily PM10 concentrations at that station. Figure 9 shows a boxplot of the calculated statistics for each model; Figures 10 and 11 differentiate further according to country and station type. In general, the temporal validation at station level shows similar results between the models as the spatial validation. The correlation coefficient differs significantly between the different models, countries and station types (p<0.05).
The highest correlation (R) between modelled and measured daily PM10 values is found for the CHIMERE model, and the lowest for the BelEUROS model. Furthermore, for all models, R is significantly higher in Belgium, France and the United Kingdom than in Germany and Luxembourg (Figure 10). The difference with R for the Netherlands depends on the model used: in the AURORA model, R for the Netherlands is lower than for all other countries, while in CHIMERE and BelEUROS it is intermediate between Germany and the other countries. Differences between countries can be attributed to different shares of station types, e.g. Germany has a high number of urban compared to rural background stations. However, this also holds for France, leading to the conclusion that station representativeness may also differ between countries. The correlation coefficient for BelEUROS and AURORA is significantly higher for suburban and urban background stations than for rural background stations (Figure 11).

Mean bias is least negative for the BelEUROS model, although a large overestimation can be seen at the French stations (Figure 10). The most negative mean bias is found for the AURORA model. This can be attributed to the fact that some sources (sea salt) are missing in this model. No significant differences were found between the different countries or station types for each model, except for the BelEUROS model, where French stations showed a less negative bias than stations in other countries and urban background stations a less negative bias than suburban background stations (Figure 11).

The Root Mean Square Error (RMSE) only differs significantly between AURORA and CHIMERE (Figure 9). A slightly higher RMSE is found for Belgian and French stations compared to German and English stations (Figure 10) and for urban and suburban stations compared to rural background stations (Figure 11). Differences between countries could possibly be attributed to station representativeness.

Figure 12 presents the target plot as defined in Thunis et al. (2012) and shows how the models perform with respect to the model quality objective (MQO) set within the Fairmode community. The figure represents the model performance for each station on the target plot, which has bias/(2 RMS_U) on the vertical axis and CRMSE/(2 RMS_U) on the horizontal axis. The radius is equal to RMSE/(2 RMS_U).
The green area indicates the area in which the MQO criterion is satisfied (see 2.6). As CRMSE/(2 RMS_U) is always positive, a figure consisting of half a circle would be sufficient. The left and right hand sides of the X-axis are used to provide further insight into whether the CRMSE related error is dominated by correlation or by standard deviation. Four main zones are identified in the diagram, separated by the 45° lines: the lower and top zones identify errors dominated by bias, whereas the left and right zones identify errors dominated by correlation or standard deviation.

Since the Model Quality Objective requires that 90% of the stations fulfil the MQO, none of the models included in our study performs well enough. CHIMERE has the highest number of stations that fulfil the MQO, i.e. 81%, while AURORA has the lowest, i.e. 54%. If the MQO is not fulfilled for a particular station, this is mainly due to a low correlation coefficient or a high standard deviation. This can generally be seen for the German and Dutch stations, where the lowest correlation coefficients were found (Figure 13). Occasionally a negative (all models) or positive (only BelEUROS) bias is predominant, which is then usually observed for French and Belgian stations (Figure 11). In general, the MQO is fulfilled for more stations if daily mean PM10 values are considered than if annual mean values are considered (see 3.1.1). This is because measurement uncertainty is higher for daily mean than for annual mean values, so that the model performance criteria are less strict for daily mean values and easier to fulfil. The high MQO fulfilment for daily mean values also indicates that the models are able to capture the weekly and monthly variation at each station quite well.
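The target-plot quantities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the measurement uncertainty `u` per observation is taken as a given input (the report refers to Thunis et al., 2012 for its definition), and the function names `target_coordinates` and `fraction_fulfilling` are hypothetical, not part of any tool used in the study.

```python
import numpy as np

def target_coordinates(obs, mod, u):
    """Return (x, y, radius) of one station on the target plot.

    obs, mod : measured and modelled values for the same time steps
    u        : measurement uncertainty per observation (assumed to be given)
    """
    obs, mod, u = (np.asarray(a, dtype=float) for a in (obs, mod, u))
    rms_u = np.sqrt(np.mean(u ** 2))                 # RMS_U
    bias = mod.mean() - obs.mean()                   # mean bias
    # Centred RMSE: error that remains after removing the mean of each series
    crmse = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    rmse = np.sqrt(np.mean((mod - obs) ** 2))
    norm = 2.0 * rms_u
    return crmse / norm, bias / norm, rmse / norm

def fraction_fulfilling(radii):
    """Share of stations whose radius lies inside the unit circle (assumed: radius <= 1)."""
    radii = np.asarray(radii, dtype=float)
    return float(np.mean(radii <= 1.0))
```

Because RMSE² = bias² + CRMSE², the returned radius always equals the distance of the point (x, y) from the origin. A model would meet the overall objective mentioned in the text when `fraction_fulfilling` returns at least 0.9.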

Figure 9 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled daily PM10 concentrations at monitoring stations, grouped by model

Figure 10 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled daily PM10 concentrations at monitoring stations, grouped by model and country

Figure 11 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled daily PM10 concentrations at monitoring stations, grouped by model and station type

Figure 12 Target diagram showing the fulfilment of Model Performance Criteria for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS. The percentages indicate the percentage of stations in the green zone that fulfil the MPC.

Figure 13 Map indicating for which stations the model quality objective (Thunis et al., 2012) for annual mean values is fulfilled (< 1, green dot) and, if not (red symbols), what the main reason for the observed error is.

3.1.3 Temporal validation on hourly values

Since the highest temporal resolution available is hourly, statistical indicators were also calculated for this time resolution. France was not included here, since only daily measurement values were available in AirBase. Similar trends between models are observed in Figure 14 as in Figure 9, but overall the median r is lower than for daily mean PM10 values.

Figure 14 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly PM10 concentrations at monitoring stations, grouped by model

Figure 15 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly PM10 concentrations at monitoring stations, grouped by model and country

If results are further analysed per country (Figure 15), all models show significantly (p<0.05) higher correlation coefficients for stations in Belgium and the United Kingdom, and lower r values for German and Dutch stations. Also, AURORA specifically has low r values for stations in the Netherlands. For both AURORA and BelEUROS the highest r values are found for urban compared to rural background stations (Figure 16). Furthermore, all models show the largest negative bias for Belgian stations, while the bias is almost equal for Germany and the United Kingdom. Mean bias for stations in the Netherlands depends on the model used: in LOTOS-EUROS and BelEUROS it lies between the values for Belgian and German stations, while in AURORA the bias for the Netherlands is the least negative of all countries. RMSE values in Figure 14 are significantly higher for AURORA than for LOTOS-EUROS, which is in turn higher than CHIMERE. RMSE is, for all models, lowest for stations in England, followed by Germany. The highest RMSE values are found in the Netherlands and Belgium. No significant differences were found between different station types for each model.

Figure 16 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly PM10 concentrations at monitoring stations, grouped by model and station type

3.2 PM2.5

3.2.1 Spatial validation on annual mean

As for PM10, modelled PM2.5 concentrations generally show the highest variation between rural and urban areas for the BelEUROS and CHIMERE models, and the lowest for LOTOS-EUROS and AURORA. Differences in concentrations are highest for the regions with the highest air pollution levels, i.e. Flanders, the north of France and the Netherlands. While CHIMERE and BelEUROS show concentrations between µg/m³ here, AURORA and LOTOS-EUROS concentrations only range between 7-10 or µg/m³ (Figure 17). As for PM10, PM2.5 concentrations over sea are elevated for the BelEUROS model due to the underestimation of sea salt deposition, although the effect is less pronounced than for PM10. Scatterplots of observed versus modelled data (Figure 18 and Figure 19) show that the slope of the regression line is highest for BelEUROS, followed by CHIMERE, LOTOS-EUROS and AURORA. The correlation coefficient is, however, almost the same for all models. Average mean bias for AURORA, BelEUROS, CHIMERE and LOTOS-EUROS is -5.73, -2.18, and -6.39, respectively. The BelEUROS model overestimates PM2.5 concentrations the most in the Paris region, thereby lowering its correlation coefficient.
Correlation coefficients without the French stations amount to 0.59, 0.61, 0.73 and 0.67 for AURORA, BelEUROS, CHIMERE and LOTOS-EUROS, respectively.
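The scatterplot statistics quoted above (slope of the regression line, correlation coefficient and mean bias across stations) could be computed along the following lines. This is a minimal sketch, assuming one annual mean per station for both the measurements and a model; the function name `spatial_validation` and the numbers in the usage note are hypothetical and do not reproduce the report's data.

```python
import numpy as np

def spatial_validation(measured, modelled):
    """Spatial validation statistics over a set of stations.

    measured, modelled : annual mean concentration per station (same order)
    """
    measured = np.asarray(measured, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    # Least-squares line modelled = slope * measured + intercept
    slope, intercept = np.polyfit(measured, modelled, 1)
    r = np.corrcoef(measured, modelled)[0, 1]        # Pearson correlation
    mean_bias = np.mean(modelled - measured)         # negative = underestimation
    rmse = np.sqrt(np.mean((modelled - measured) ** 2))
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r": float(r),
        "mean_bias": float(mean_bias),
        "rmse": float(rmse),
    }
```

For example, a model that returns exactly half of each measured value plus 2 µg/m³ has a perfect correlation but a slope of only 0.5 and a negative mean bias, which illustrates why the report discusses slope and bias separately from the correlation coefficient.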

Figure 17 Annual mean concentration of PM2.5 in North-West Europe modelled with CHIMERE, AURORA, BelEUROS and LOTOS-EUROS. Bullets indicate the measured concentrations at urban and rural background stations

Figure 18 Scatterplot of modelled versus measured annual mean PM2.5 concentrations per station type for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS

Figure 19 Scatterplot of modelled versus measured annual mean PM2.5 concentrations per country for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS.

Figure 20 Map indicating for which stations the model quality objective (Thunis et al., 2012) for annual mean values is fulfilled (< 1, green dot) and for which not (> 1, red dot)

In general, the MQO for annual mean PM2.5 concentrations is fulfilled for more stations than for annual mean PM10 concentrations, except for LOTOS-EUROS, which performs equally well (Figure 20). CHIMERE, AURORA, BelEUROS and LOTOS-EUROS perform sufficiently well for 46, 18, 58 and 6% of the stations, respectively. These percentages are higher for all models if only German stations are considered (75, 50, 81 and 14% for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS) and for BelEUROS and CHIMERE if only Dutch stations are considered (72% for both models). BelEUROS performs less well for urban background stations than for rural and suburban background stations (16 vs. 63 and 62%, respectively). However, it can be concluded that none of the models fulfils the MQO set for annual mean PM2.5.

3.2.2 Temporal validation on hourly values

Significant differences (p < 0.05) in correlation coefficient (r), mean bias and Root Mean Square Error (RMSE) are found between the four models. Correlation coefficients are significantly lower for the BelEUROS model than for the other three models, while mean bias is significantly higher (less negative) for BelEUROS and CHIMERE than for AURORA and LOTOS-EUROS (Figure 21). RMSE is slightly higher for CHIMERE than for BelEUROS, although the effect is rather weak (p = 0.03). For BelEUROS, mean bias is significantly higher for French stations than for stations in other countries (Figure 22). Correlation coefficients for BelEUROS, CHIMERE and LOTOS-EUROS are significantly higher for Belgian and English stations than for French, German and Dutch stations.

Figure 21 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly PM2.5 concentrations at monitoring stations, grouped by model

Figure 22 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly PM2.5 concentrations at monitoring stations, grouped by model and country

Figure 23 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly PM2.5 concentrations at monitoring stations, grouped by model and station type

3.3 NO2

3.3.1 Spatial validation on annual mean values

Concentration maps for NO2 clearly show the different NO2 emission sources (Figure 24), thereby illustrating the local character of this pollutant. The BelEUROS concentration maps in particular exhibit a clear contrast between urban and rural areas, while the AURORA concentration maps have pollution levels more spread out over the whole of NW Europe. The CHIMERE and LOTOS-EUROS maps are very similar.

Figure 24 Annual mean concentration of NO2 in North-West Europe modelled with CHIMERE, AURORA, BelEUROS and LOTOS-EUROS. Bullets indicate the measured concentrations at urban and rural background stations.

From the scatterplots (Figure 25 and Figure 26) of modelled versus measured concentrations, it can be observed that the slope of the regression line is comparable for CHIMERE, AURORA and LOTOS-EUROS, while BelEUROS has a higher slope. Correlation coefficients do not differ much between the models and range from 0.66 to Both CHIMERE and LOTOS-EUROS tend to underestimate annual NO2 concentrations at all stations, while BelEUROS overestimates concentration levels at several suburban and urban stations in all countries. AURORA overestimates for rural stations and underestimates for urban and suburban stations. The Model Quality Objective for the annual mean NO2 value is fulfilled for 70% of the stations in the CHIMERE simulation, 82% in the AURORA simulation, 65% in the BelEUROS simulation and 74% in the LOTOS-EUROS simulation (Figure 27). As the objective is to have 90% of the stations fulfilling the MQO, none of the models in this study can be used to assess NO2 annual mean values.

Figure 25 Scatterplot of modelled versus measured annual mean NO2 concentrations per station type for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS

Figure 26 Scatterplot of modelled versus measured annual mean NO2 concentrations per station type for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS

Figure 27 Map indicating for which stations the model quality objective (Thunis et al., 2012) for annual mean NO2 values is fulfilled (< 1, green dot) and for which not (> 1, red dot)

3.3.2 Temporal validation on hourly values

In general, correlation coefficients per station between modelled and measured hourly NO2 values differ significantly between the models and increase from BelEUROS over AURORA and LOTOS-EUROS to CHIMERE (Figure 28). For CHIMERE and AURORA, the highest r values are found for stations in the Netherlands and the lowest values for Luxembourg and some German stations (Figure 29). For BelEUROS, values are highest in Belgium and the United Kingdom. No significant differences were found between station types (Figure 30).

Mean bias was most negative for CHIMERE (-7.15) and LOTOS-EUROS (-6.83) and slightly positive for AURORA (0.25). Mean bias for BelEUROS varied considerably from negative to positive values depending on the station. For CHIMERE, AURORA and BelEUROS, values were highest for stations in the Netherlands and generally lowest (most negative) for French or German stations. For LOTOS-EUROS, no significant differences between countries were found. Station type only affected mean bias for CHIMERE, AURORA and LOTOS-EUROS, with a significantly more negative bias for suburban and urban stations. This could be explained by the spatial resolution of the models under study, which does not allow the locally higher concentrations in urban environments to be resolved.

The Root Mean Square Error was significantly lower for CHIMERE and LOTOS-EUROS than for BelEUROS and AURORA. Differences between countries were rather limited: only for AURORA was a higher RMSE found for British stations than for Dutch stations. RMSE was significantly affected by station type, with lower RMSE values for rural than for urban and suburban stations.

Figure 28 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly NO2 concentrations at monitoring stations, grouped by model

Figure 29 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly NO2 concentrations at monitoring stations, grouped by model and country

Figure 30 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled hourly NO2 concentrations at monitoring stations, grouped by model and station type

Figure 31 shows the target plot indicating the number of stations fulfilling the MQO on hourly NO2 values for each model. Only the BelEUROS model does not meet the criterion that the MQO must be fulfilled for 90% of the stations. This is mainly due to a positive bias for several stations in Belgium, the Netherlands and the United Kingdom. Since the same high resolution bottom-up emissions were used in all models, the overestimation for several stations is solely attributable to the BelEUROS model itself.

Figure 31 Target diagram showing the fulfilment of Model Performance Criteria on hourly NO2 values for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS. The percentages indicate the percentage of stations in the green zone that fulfil the MPC.

3.4 EC

3.4.1 Spatial validation on annual mean values

Figure 32 represents modelled annual mean EC concentrations in North-West Europe calculated by the four models. The concentration maps of CHIMERE and LOTOS-EUROS are very similar, while the BelEUROS map shows many more peak concentrations in cities and the AURORA map fewer (with the exception of London). Also, in the BelEUROS concentration map the shipping routes in the North Sea are clearly visible, while for the CHIMERE and AURORA models an overall increased EC concentration in the North Sea is observed. All models exhibit high concentrations in the region of Paris, due to the high emissions there (see 2.3).

Figure 32 Annual mean concentration of EC in North-West Europe modelled with CHIMERE, AURORA, BelEUROS and LOTOS-EUROS.

Scatterplots of modelled versus measured annual mean values show that the variation in EC over North-West Europe is not well captured by the models (Figure 33 and Figure 34), but large differences exist between countries. Correlation coefficients over NWE amount to 0.14, -0.04, 0.15 and for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS, respectively. This is mainly due to the poor validation statistics for the French and British measurements (see and 3.4.5). Without these stations, spatial correlation is 0.81, 0.83, 0.82 and 0.71 for the four models, respectively (Figure 35).

Figure 33 Scatterplot of modelled versus measured annual mean EC concentrations for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS per station type

Figure 34 Scatterplot of modelled versus measured annual mean EC concentrations for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS per country

Figure 35 Scatterplot of modelled versus measured annual mean EC concentrations for AURORA, BelEUROS, CHIMERE and LOTOS-EUROS without the French and British stations

3.4.2 BC measurements Belgium

In 2009 only 6 Black Carbon stations were operational in Belgium, of which 5 were situated in the Flanders region and one in Brussels. Black Carbon in Flanders is measured with Multi-Angle Absorption Photometer (MAAP) devices, Thermo Scientific Model 5012, while in Brussels Aethalometer 22 (AE22) devices are used. The MAAP determines the black carbon content of aerosols by simultaneous measurement of optical absorption and scattering of light by the particles collected on the filter tape (Joaquin WP1 Methods, 2015). AE22 devices typically operate at two wavelengths (880 and 370 nm), of which the 880 nm wavelength is used for BC measurement. In general, MAAPs tend to measure mainly BC originating from traffic, while AE22 devices also track BC from domestic sources (e.g. biomass burning). Black Carbon concentrations measured with optical devices tend to be 1.5 to 2 times higher than EC concentrations measured with a thermo-optical method (unpublished data).

Scatterplots of annual mean values are not produced here, since the data capture of 75% required to calculate the annual mean was only reached for 3 out of 6 stations. As can be seen from Figure 36, the EC emissions used as input for our models result, in general, in daily average EC concentrations that are a factor 2 lower than the BC concentrations measured at the monitoring stations. Only at higher concentration levels do the CHIMERE, LOTOS-EUROS and AURORA models deviate from this 1:2 relationship. BelEUROS tends to model higher EC concentrations at certain stations (42R801, 41WOL1 and 40SZ01), yielding a regression line between the 1:2 and 1:1 lines. These stations are classified as urban, traffic and suburban respectively, while the other three stations are industrial stations.
Linear regression between hourly modelled and measured values returns R² values that are highest for CHIMERE and AURORA, and lowest for BelEUROS and LOTOS-EUROS (see Figure 37 as an example). Temporal validation statistics (Figure 38) yield significantly higher correlation coefficients for CHIMERE and AURORA than for BelEUROS and LOTOS-EUROS. In contrast, mean bias is significantly less negative for BelEUROS than for the other models.
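The 75% data-capture requirement for annual means mentioned above can be illustrated with a short Python sketch. It assumes that data capture is the share of valid (non-missing) values in the series and that missing values are coded as NaN; both are assumptions for illustration, as is the function name `annual_mean_with_capture`.

```python
import numpy as np

def annual_mean_with_capture(values, min_capture=0.75):
    """Return (annual_mean, capture); annual_mean is None if capture is too low.

    values      : one year of measurements, with NaN marking missing values
    min_capture : minimum share of valid values (0.75 in the text)
    """
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    capture = float(valid.mean())            # share of valid values
    if capture < min_capture:
        return None, capture                 # too few data for an annual mean
    return float(values[valid].mean()), capture
```

A station for which the first returned value is `None` would be left out of an annual-mean scatterplot, which corresponds to the situation described for 3 of the 6 Belgian BC stations.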

Figure 36 Scatterplot of daily modelled EC values versus BC measurements per station in Belgium. BC was measured using MAAP or AE22 (41WOL1) devices

Figure 37 Scatterplot and linear regression lines per model of EC model values versus BC measurements. Plots are made for an urban background station in Antwerp.

Figure 38 Boxplots of the correlation coefficient, mean bias and RMSE between measured BC and modelled hourly EC concentrations at monitoring stations, grouped by model

The normalized Taylor diagram in Figure 39 provides a different way of graphically summarizing how closely the models match observations at each measurement station, quantified by the position of each model on the plot. The centred RMSE (indicated by the brown contours) between the simulated and observed patterns is proportional to the distance to the point on the x-axis identified as observed. The standard deviation of the simulated pattern is proportional to the radial distance from the origin. Simulated patterns that agree well with observations lie nearest the point marked "observed" on the x-axis. These models have a relatively low RMSE and a high correlation. If the model also lies on the dashed arc, it has the correct standard deviation (which indicates that the pattern variations are of the right amplitude) (Taylor, 2001).

Figure 39 shows that the CHIMERE, AURORA and LOTOS-EUROS results are closest together for all stations, with a standard deviation which is about one-third of the observed value due to the fact that EC is compared with BC (about times higher). For 42R801, 41WOL1 and 40SZ01, BelEUROS has a standard deviation closer to, or higher than (42R801), that of the observations. Figure 40 illustrates in more detail the time variation of all models compared to the observations for January 2009 at the urban background station 42R801. This figure shows that the BelEUROS model overestimates EC concentrations at some peaks, while the other three models always underestimate these.
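The three quantities that fix a model's position on a normalized Taylor diagram can be sketched as follows. This is an illustrative sketch only: the function name `taylor_statistics` is hypothetical, and the normalization by the standard deviation of the observations is the usual convention for a normalized diagram rather than something spelled out in the report.

```python
import numpy as np

def taylor_statistics(obs, mod):
    """Statistics that place one model at one station on a normalized Taylor diagram."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    sigma_obs = obs.std()                        # population standard deviation
    sigma_mod = mod.std()
    r = np.corrcoef(obs, mod)[0, 1]              # correlation = angular position
    # Centred RMSE = distance to the "observed" point on the x-axis
    crmse = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    return {
        "norm_std": float(sigma_mod / sigma_obs),    # radial distance from origin
        "r": float(r),
        "norm_crmse": float(crmse / sigma_obs),
    }
```

The three values are linked by the law of cosines, norm_crmse² = 1 + norm_std² − 2 · norm_std · r (Taylor, 2001), which is why they can be shown in a single two-dimensional plot. As a worked case, a model that reproduces the observed pattern perfectly but at half the amplitude gets r = 1, a normalized standard deviation of 0.5 and a normalized centred RMSE of 0.5; a systematic scale difference such as the one between EC and BC therefore shows up as a shift along the radial direction.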

Figure 39 Taylor diagram per model showing the validation statistics for each measuring station.

Figure 40 BC observations and EC model values as a function of time during January 2009 at station 42R801

3.4.3 EC measurements Belgium

In Flanders, several chemical characterization campaigns of particulate matter have taken place since 2006. Results of the second campaign, which took place at 6 hotspot and 3 background locations from October 2008 to December 2009, could be used for our validation exercise. EC was thermo-optically analysed in PM10 using the NIOSH2 (National Institute for Occupational Safety and Health) protocol, which distinguishes EC from Organic Carbon in total carbon. NIOSH2 is one of the protocols that could potentially become the reference method to determine EC/OC in Europe. Compared to other protocols, this protocol generally returns the lowest EC and the highest OC concentrations and is the purest tracer for traffic emissions (VMM, 2010; Reisinger et al., 2008).

As can be seen from Figure 41, the relation between annual mean modelled and measured concentrations is closer to the 1:1 line than in section 3.4.2, although CHIMERE, AURORA and LOTOS-EUROS still seem to underestimate higher concentrations. In contrast, BelEUROS tends to both under- and overestimate at all measuring stations, but overestimates especially at R801, which is an urban background station in Antwerp (Figure 42). Spatial correlation coefficients amount to 0.87, 0.81, 0.81 and 0.74 for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS, respectively, and RMSE values to 0.27, 0.36, 0.71 and 0.36, respectively.

Figure 41 Scatterplot of annual mean modelled values versus EC measurements per station in Belgium

Figure 42 Scatterplot of daily modelled EC values versus EC measurements per station in Belgium. EC was measured using the NIOSH protocol.

Correlation coefficients range from 0.3 to 0.8, depending on the station, but are not significantly different between the models (Figure 43 and Figure 44). For all models, correlation coefficients are very low for the station OB01 (see also Figure 44). This could be due to the fact that measured organic matter (OM) was relatively high at this location compared to EC. This OM originates from a wood processing factory near the measurement location (VMM, 2010). The split between EC and OC is highly dependent on the analytical protocol being used, in this case NIOSH, which allocates mainly traffic-related EC to the EC fraction and other sources to the OC fraction. In contrast, emission inventories also add biomass combustion to the EC fraction. It is therefore possible that the models completely miss the temporal profile because they model both traffic and biomass combustion as sources of EC.

Mean bias ranges from for LOTOS-EUROS to 0.32 for BelEUROS and is significantly higher for BelEUROS than for the other three models. No significant differences are found for the RMSE. The Taylor diagram in Figure 44 shows again that the BelEUROS model tends to overestimate EC concentrations at certain stations, especially at stations located in an urban area. The other three models lie close together, with LOTOS-EUROS generally having a slightly higher correlation coefficient or lower RMSE.

Figure 45 shows the temporal profile at a hotspot location compared to a rural background location. All models are able to reproduce the temporal profile at the rural background station, while at the hotspot location, which is mainly influenced by traffic, several peak concentrations are missed.

Figure 43 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled daily EC concentrations at Belgian monitoring stations, grouped by model

Figure 44 Taylor diagram per model showing the validation statistics for each measuring station

Figure 45 EC observations and model values as a function of time at the hotspot location M705 (top) and the rural background location N012 (bottom)

3.4.4 BC measurements United Kingdom

In the United Kingdom BC was measured at 15 locations which are situated within the Joaquin modelling domain in All stations are part of the UK Black Carbon Network and are urban, suburban or rural background stations, except the station with code GB0682, which is a traffic station at London Marylebone Road (Butterfield et al., 2011). This station was excluded from the analysis. BC is measured by means of Aethalometer 22 devices, which typically operate at two wavelengths (880 and 370 nm). The 880 nm wavelength is used to measure the Black Carbon (BC) concentration of the aerosol, while the 370 nm wavelength gives a measure of the UV component of the aerosol. Comparisons between Black Carbon concentrations and Elemental Carbon concentrations showed good agreement between the measurements at all the sites where these measurements are collocated (North Kensington, Marylebone Road and Harwell) (Butterfield et al., 2011). EC was measured here by a protocol called Quartz, a close variation of the NIOSH protocol (Beccacci et al., 2010).

Spatial validation statistics are rather poor for the English stations, with correlation coefficients of , -0.21, 0.03 and for CHIMERE, AURORA, BelEUROS and LOTOS-EUROS, respectively (Figure 46). This could be due to the fact that most stations are classified as urban background, for which the resolution of the models used is too low.

Figure 46 Scatterplot of annual mean modelled values versus EC measurements per station in England

Temporal correlation coefficients are also generally lower ( ) for the British BC stations than for the Belgian stations. Between the models, they are significantly higher for CHIMERE than for AURORA (Figure 47 and Figure 48). Mean bias is significantly less negative for BelEUROS than for the other three models, but this is because the BelEUROS model often overestimates the lower concentrations and underestimates the higher concentrations. No significant differences were found for the RMSE.

Figure 47 Scatterplot of daily modelled EC values versus BC measurements per station in England. BC was measured using AE22 devices

Figure 48 Boxplots of the correlation coefficient, mean bias and RMSE between measured BC and modelled hourly EC concentrations at monitoring stations (except GB0682), grouped by model.

The Taylor diagram in Figure 49 also makes clear that validation statistics are highly variable depending on the station considered. For GB0036 and GB0135, a rural and a suburban background station respectively, correlation coefficients are not very high, but all models approach the standard deviation of the observations. In contrast, for GB0182A and GB0303A, which are suburban and urban background stations, the standard deviation of the model values is much lower than that of the observations.

It needs to be mentioned here, as in section 3.4.2, that EC model values are compared with BC observations. Although good agreement was found between BC measured by the AE22 devices and the separate EC measurements that were made, and although AE22 devices generally measure both traffic and combustion related BC, in contrast to MAAP devices, overall model performance is poorer for the English stations than for the Belgian BC stations. Several explanations are possible here. Firstly, most of the English stations are classified as urban or suburban background stations, for which the resolution of the models is perhaps too low, depending on the spatial representativeness of the stations. Secondly, EC emissions used in the models were derived differently for Belgium and England, i.e. at different sector levels (Denier van der Gon et al., 2014).

Figure 49 Taylor diagram per model showing the validation statistics for each measuring station

3.4.5 EC measurements France

For France, EC measurements were obtained at six monitoring sites within the framework of a source apportionment campaign of PM in the Île-de-France region from September 2009 to September 2010 (Airparif, 2012). One of the sites was a traffic site and was excluded from the analysis. The remaining sites were three rural background sites to the South, Northeast and Northwest of Paris, an urban background site in the city centre and a suburban background site at Villemomble. EC was analysed by a thermo-optical method; the type of protocol used is not specified.

It is immediately clear from Figure 50 and Figure 51 that all models overestimate EC concentrations at the urban (Par) and suburban (VILL) background sites, while they underestimate the concentrations at the three rural sites. This can be explained by the emissions used for the models (see also 2.3), i.e. the top-down MACC emission inventory based on EMEP does not present the same level of accuracy as a local bottom-up inventory from Airparif.
Particulate emissions are overestimated by a factor of 3 for the city of Paris (Denier van der Gon et al., 2012). Consequently, spatial correlation coefficients are very low (negative) for the French stations.

Figure 50 Scatterplot of annual mean modelled values versus EC measurements per station in France

Temporal correlation coefficients per station (Figure 52) range from 0.2 to 0.6 and are highest for the urban and suburban stations, despite the large overestimation of EC concentrations there. Mean bias is closest to 0 and RMSE is lowest for the station South of Paris, but correlation coefficients at this station are poor. No significant differences could be found between the models regarding r, MB or RMSE, most likely because these values vary so widely between the stations.
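The three statistics discussed here (r, MB and RMSE) can be computed per station and per model from paired time series. The following is a minimal sketch, assuming NumPy and assuming that mean bias is defined as model minus observation (so that a positive value means overestimation); the function name `validation_stats` and the sample values are hypothetical and not taken from the report.

```python
import numpy as np

def validation_stats(observed, modelled):
    """Return (r, mean bias, RMSE) for paired observed and modelled values."""
    obs = np.asarray(observed, dtype=float)
    mod = np.asarray(modelled, dtype=float)
    # Keep only the time steps where both values are available.
    valid = ~(np.isnan(obs) | np.isnan(mod))
    obs, mod = obs[valid], mod[valid]
    r = np.corrcoef(obs, mod)[0, 1]             # Pearson correlation coefficient
    mb = np.mean(mod - obs)                     # assumed sign: model minus observation
    rmse = np.sqrt(np.mean((mod - obs) ** 2))   # root mean square error
    return r, mb, rmse

# Hypothetical daily EC values in µg/m³ (illustration only).
obs = [1.0, 2.0, 3.0, 4.0]
mod = [2.0, 3.0, 4.0, 5.0]
r, mb, rmse = validation_stats(obs, mod)
```

In this constructed example the model follows the observations perfectly in time but is offset by a constant amount, which illustrates how a high temporal correlation can coexist with a large bias, as reported for the urban and suburban stations.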

Figure 51 Scatterplot of daily modelled EC values versus EC measurements per station in France. EC was measured using a thermo-optical method.

Figure 52 Boxplots of the correlation coefficient, mean bias and RMSE between measured and modelled daily EC concentrations at monitoring stations, grouped by model

The Taylor diagram in Figure 53 suggests that all models still perform best for the rural background stations. For the urban and suburban background stations, the CHIMERE model lies closest to the observations.
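A Taylor diagram places each model and station according to three quantities: the standard deviation of the model values, the standard deviation of the observations, and their correlation coefficient. The distance to the observation point is the centred RMS difference, which follows from the other three by the law of cosines. A minimal sketch of these quantities, assuming NumPy and population standard deviations; the function name `taylor_stats` and the sample values are hypothetical, and whether the report normalises the standard deviations is not stated.

```python
import numpy as np

def taylor_stats(observed, modelled):
    """Return (sd_obs, sd_mod, r, centred RMS difference)."""
    obs = np.asarray(observed, dtype=float)
    mod = np.asarray(modelled, dtype=float)
    sd_obs = obs.std()                      # population standard deviation (ddof=0)
    sd_mod = mod.std()
    r = np.corrcoef(obs, mod)[0, 1]
    # RMS difference after removing the mean of each series.
    crmsd = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    return sd_obs, sd_mod, r, crmsd

# Hypothetical series: the model has the right timing but half the variability.
obs = [1.0, 2.0, 3.0, 4.0]
mod = [1.5, 2.0, 2.5, 3.0]
sd_obs, sd_mod, r, crmsd = taylor_stats(obs, mod)

# Law of cosines that underlies the diagram geometry.
crmsd_from_identity = np.sqrt(sd_mod**2 + sd_obs**2 - 2.0 * sd_mod * sd_obs * r)
```

A model whose standard deviation is much lower than that of the observations, as in this example, plots well inside the observation arc even when its correlation is high.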

Figure 53 Taylor diagram per model showing the validation statistics for each measuring station

Figure 54 Modelled and measured EC concentrations as a function of time for the measurement station Northeast of Paris

3.5 Calibrated ensemble maps and population exposure

Since it was clear from 3.1 and 3.2 that all models underestimate PM 10 and PM 2.5 concentrations, the model results had to be calibrated before final concentration maps over the NWE region could be presented. This calibration was performed on ensemble model maps, i.e. the mean of all model results in a grid cell. For this purpose, all models were interpolated to the same grid of x. Figure 55 presents the ensemble model results versus the available measurements for PM 10, PM 2.5, EC and NO 2. French measurements were excluded from the analysis for PM 10, PM 2.5 and EC, since sections 3.1, 3.2 and 3.4 clearly showed that model results were out of range for these pollutants in France due to the poor allocation of emission data between rural and urban areas. Model results for BelEUROS were excluded from the analysis for PM 10 because of the serious overestimation of sea salt concentrations. A test revealed that this exclusion had very little impact on ensemble annual mean concentration levels above land.

Figure 56 presents the final ensemble maps for PM 10, PM 2.5, EC and NO 2. Based on these results, the exposure of the population of North-West Europe to certain concentration levels was calculated using available population data from Eurostat, the European statistical database. This resulted in population exposure graphs per NUTS1 region and per pollutant (Figure 57 to Figure 60).

Figure 55 Scatterplots of ensemble modelled results versus observed values for PM 10, PM 2.5, NO 2 and EC
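The ensemble and calibration steps described in this section can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: it assumes NumPy, assumes that all model fields already share one grid, and assumes a linear regression of observations on ensemble values at the station locations (the report only refers to the regression equations in Figure 55). All function names and numbers are hypothetical.

```python
import numpy as np

def ensemble_mean(model_grids):
    """Cell-by-cell mean of model fields that share one grid."""
    stacked = np.stack([np.asarray(g, dtype=float) for g in model_grids])
    return stacked.mean(axis=0)

def fit_calibration(ensemble_at_stations, observed):
    """Least-squares line: observed = slope * ensemble + intercept (assumed form)."""
    slope, intercept = np.polyfit(ensemble_at_stations, observed, deg=1)
    return slope, intercept

def calibrate(ensemble_grid, slope, intercept):
    """Apply the fitted line to every grid cell."""
    return slope * np.asarray(ensemble_grid, dtype=float) + intercept

# Two hypothetical model fields on a common 2 x 2 grid (µg/m³).
model_a = [[10.0, 12.0], [14.0, 16.0]]
model_b = [[12.0, 14.0], [16.0, 18.0]]
ens = ensemble_mean([model_a, model_b])

# Hypothetical station pairs: ensemble value at the station and observed value.
ens_at_stations = np.array([11.0, 13.0, 15.0, 17.0])
observed = np.array([18.5, 21.5, 24.5, 27.5])
slope, intercept = fit_calibration(ens_at_stations, observed)
calibrated = calibrate(ens, slope, intercept)
```

Excluding a model for one pollutant, as was done for BelEUROS and PM 10, would in this sketch simply mean leaving its field out of the list passed to `ensemble_mean`.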

Figure 56 Ensemble maps of all models for PM 10, PM 2.5, EC and NO 2. Model results for PM 10 and PM 2.5 are calibrated based on the regression equations in Figure 55.

Figure 57 Exposure of people per NUTS1 region to annual mean EC concentrations

Figure 58 Exposure of people per NUTS1 region to annual mean NO 2 concentrations

Figure 59 Exposure of people per NUTS1 region to annual mean PM 10 concentrations

Figure 60 Exposure of people per NUTS1 region to annual mean PM 2.5 concentrations
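Exposure graphs of this kind can be derived by overlaying a concentration map with gridded population data and summing the population per concentration class within each region. The sketch below shows one possible way to do this, assuming NumPy, a population count per grid cell and a region label per grid cell; the function name, the region codes, the class boundaries and all values are hypothetical and do not reproduce the report's data or method.

```python
import numpy as np

def exposure_per_region(concentration, population, region, bin_edges):
    """Sum the population per concentration class, separately for each region."""
    conc = np.asarray(concentration, dtype=float).ravel()
    pop = np.asarray(population, dtype=float).ravel()
    reg = np.asarray(region).ravel()
    result = {}
    for name in np.unique(reg):
        in_region = reg == name
        # Weighted histogram: each cell contributes its population, not a count of 1.
        counts, _ = np.histogram(conc[in_region], bins=bin_edges,
                                 weights=pop[in_region])
        result[str(name)] = counts
    return result

# Hypothetical grid cells: annual mean concentration (µg/m³), inhabitants, region label.
conc = [5.0, 12.0, 18.0, 25.0]
pop = [100, 200, 300, 400]
region = ["BE2", "BE2", "NL3", "NL3"]
edges = [0.0, 10.0, 20.0, 30.0]   # classes: 0-10, 10-20, 20-30

exposure = exposure_per_region(conc, pop, region, edges)
```

Cells with concentrations outside the outermost class boundaries are dropped by `np.histogram`, so the boundaries should span the full range of the map.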