COST ES1006 Model evaluation protocol


COST ES1006 Model Evaluation Protocol

COST Action ES1006: Evaluation, improvement and guidance for the use of local-scale emergency prediction and response tools for airborne hazards in built environments

April 2015

Legal notice by the COST Office

Neither the COST Office nor any person acting on its behalf is responsible for the use which might be made of the information contained in this publication. The COST Office is not responsible for the external websites referred to in this publication.

Contact:
Dr. Deniz Karaca
Science Officer, COST Office
Avenue Louise, Brussels, Belgium
Tel: / Fax:
Deniz.Karaca@cost.eu

Contributing authors: S. Andronopoulos, F. Barmpas, J.G. Bartzis, K. Baumann-Stanzer, E. Berbekar, G. Efthimiou, C. Gariazzo, F. Harms, A. Hellsten, S. Herring, K. Jurcakova, B. Leitl, S. Trini-Castelli

COST Office, 2015. No permission to reproduce or utilize the contents of this book by any means is necessary, other than in the case of images, diagrams or other material from other copyright holders. In such cases, permission of the copyright holders is required.

This book may be cited as: COST ES1006 Model Evaluation Protocol, COST Action ES1006, April 2015

ISBN:

Distributed by:
University of Hamburg, Meteorological Institute
Bundesstraße 55
D Hamburg, Germany

Contents

1 Introduction
2 Terms of Reference
  2.1 Objectives of model evaluation
  2.2 Who carries out the model evaluation and who uses its results?
3 Background
4 Model evaluation frameworks
  4.1 Existing model evaluation protocols
  4.2 Selected model evaluation protocol
  4.3 Specific requirements regarding emergency response
5 Reference data requirements for emergency response modelling
  5.1 Problem related data selection criteria
    5.1.1 Problem representativeness
    5.1.2 The source
  5.2 Measurements criteria
    5.2.1 Flow measurements
    5.2.2 Concentration measurements
    5.2.3 Statistical representativeness of the measurements
    5.2.4 QA/QC of the measurements and measurement post processing
    5.2.5 Measurement uncertainty
6 Description of the Model Evaluation Protocol
  6.1 Model description
  6.2 Reference data base description
  6.3 Scientific evaluation
  6.4 Code verification
  6.5 User-oriented evaluation
  6.6 Model validation (corroboration)
    6.6.1 Selection of quantities to be compared
    6.6.2 Statistical characteristic of quantity
    6.6.3 Qualitative analysis / graphical depiction
    6.6.4 Quantitative indicators (statistical metrics)
    6.6.5 Acceptance criteria based on indicators values
7 Uncertainty and sensitivity analysis
  7.1 Background
  7.2 Definitions of main uncertainty components in the context of ER
8 Recommendations for the application of the ADMEP
  8.1 Introduction
  8.2 General recommendations
  8.3 Stage specific recommendations
    8.3.1 Reference data base selection
    8.3.2 Scientific evaluation
    8.3.3 Model validation
9 Open issues and future work
  Model Validation Considerations (J G Bartzis)
  Model to model validation
  A novel metric to evaluate the model's performance in predicting hazard zones (C. Gariazzo and A. Pelliccioni)
References
Annex: Glossary of Terms

1 Introduction

A main goal of COST Action ES1006 is to draw up an Atmospheric Dispersion Model Evaluation Protocol (ADMEP) for the quality assurance of Atmospheric Dispersion Model (ADM) numerical simulations in cases of accidental or deliberate releases of airborne agents in urban areas in the Emergency Response (ER) context. The main aim of applying an ADM in ER cases is to forecast the state of the physical system at any given time within the lifetime of the accidental release, under conditions that contain a large degree of uncertainty. In the ER framework, three distinct phases, comprising the actual application of these models, need to be addressed:
1. pre-event analysis and planning / preparedness (a priori predictions)
2. predictions during an event / emergency phase
3. post-event analysis (a posteriori simulations)
This ADMEP is specifically intended to address best practice under the particularly strict time limitations of emergency response, where fast response is needed and human and computational resources are limited. Also, the larger number of unknowns involved during accidental releases, together with the complexity of the urban structure, makes the evaluation problem distinct from routine air-quality assessment, since particular attention has to be paid to parameters which are generally not considered critical in standard ADM applications.
The experience gained from a large number of past efforts in the scientific community leads to the conclusion that any quality assurance procedure for numerical models should include the following broad features:
- Scientific review
- Assurance of correct coding (code verification)
- Comparison of model results with experimental data (model validation)
- Quantification of the model's uncertainty and sensitivity
- Operational evaluation
A thorough review of the literature and the elaboration of specific concepts for ER applications establish the basis of this ADMEP document and are illustrated in the following chapters.

In particular, when considering the term "prediction" for ADMs applied in the ER context, it is important to provide an estimation and quantification of the uncertainties. In an emergency situation, the model must be able to predict the transient dispersion process associated with short- (puff) and long-duration (continuous) releases. Time-averaged concentrations may thus display a large scatter, and should therefore be treated as a stochastic process. In addition, within the urban canopy layer atmospheric dispersion is highly inhomogeneous in space and intermittent in time, and the uncertainty in the source term plays a key role in affecting the reliability of the model outputs. This implies that the simulations can exhibit a high degree of uncertainty, besides the uncertainties related to the model itself. Within the ADMEP, this issue is addressed by considering a probabilistic and comprehensive approach, which can provide a quantification of such stochastic processes. When approaching the operational evaluation, it is also necessary to find an efficient way to communicate the quantified uncertainties of the numerical outcomes of a particular model to the stakeholders. The ADMEP document is oriented towards the support of responsible authorities and stakeholders in the decision-making process and aims at proposing practical recommendations. The protocol is expected to reflect a consensus among the various parties involved: the model developer, the model user and the stakeholder. The structure of the document is designed to address all the introduced issues with a concise and efficient approach, in order to make it easily accessible and usable by the interested scientific community.
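The probabilistic treatment described above can be sketched in a few lines: given an ensemble of concentration realisations at one receptor, the protocol's stochastic quantities (ensemble mean and selected percentiles) replace a single deterministic value. The helper function and the numbers below are purely illustrative, not part of the protocol.

```python
import statistics

def percentile(values, p):
    """Empirical p-th percentile by linear interpolation between order statistics."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    k = (p / 100.0) * (len(s) - 1)
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (k - lo) * (s[hi] - s[lo])

def ensemble_summary(concentrations):
    """Summarise an ensemble of concentration realisations at one receptor."""
    return {
        "mean": statistics.fmean(concentrations),
        "p05": percentile(concentrations, 5),   # lower extreme value
        "p95": percentile(concentrations, 95),  # upper extreme (peak) value
    }

# Hypothetical ensemble of peak concentrations (mg/m3) from repeated realisations
ens = [1.2, 0.8, 2.5, 1.1, 0.9, 3.0, 1.4, 0.7, 1.8, 1.0]
summary = ensemble_summary(ens)
```

Reporting the 5th and 95th percentiles alongside the mean conveys the scatter between realisations that a single simulation cannot show.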

2 Terms of Reference

As anticipated in the Introduction, this document describes an Atmospheric Dispersion Model Evaluation Protocol (ADMEP) oriented towards Emergency Response (ER) to accidental or malevolent atmospheric releases of hazardous substances in built-up areas. It is based on and expands the work contained in chapter 5 of COST ES1006 (2012) - Background and Justification Document, as well as the work of Trini-Castelli et al. (2014). Atmospheric Dispersion Models (ADM) are numerical models in the sense defined by Steyn and Galmarini (2008), i.e. models in which variables in a numerically solved system of equations are taken as analogous to measurable environmental quantities. For ER, the first environmental quantity of interest is the concentration of the hazardous substance(s) in the air, from which other important variables can be calculated (e.g. doses, plume arrival times, affected areas, etc.). Deposition on the ground is another quantity of interest, especially for nuclear and radiological substances, which continue to irradiate after having deposited on the ground. Of special interest for ER to releases of hazardous substances (toxic or flammable) are the probability density function and the statistical moments about the mean of the concentration in the air: variance, skewness and kurtosis. From these, maximum values of concentration and short-term dosages can be calculated. The ADM that concern this document are used in a predictive way, i.e. to provide a prognosis of the evolution in time and structure in space of the dispersion phenomenon. From this prognosis, consequences can be assessed and countermeasures can be planned. Model evaluation is defined as the examination of a model according to a set of well-defined rules embodied in the evaluation protocol (Duijm and Carissimo, 2001).

2.1 Objectives of model evaluation

Based on the existing literature (USEPA 2009, Dennis et al. 2010, FAIRMODE 2010), the model evaluation objectives are:
1. Determining the suitability of an ADM for a specific application and configuration: in the particular topic of COST Action ES1006 this objective refers to determining when a model, given its uncertainties, can be used as emergency response decision support for hazardous substance releases in urban environments.

2. Distinguishing the performance among different models or different versions of the same model: this objective provides important information for best-practice guidance on the appropriate use of different types of models for the purposes of emergency response.
3. Guiding model improvement: identification of model weaknesses and of the origin of uncertainties is expected to lead to model improvements.

2.2 Who carries out the model evaluation and who uses its results?

The model evaluation is carried out either by the model developer or by the model user. To avoid issues of impartiality, the relation of the evaluator to the model should be documented and the evaluation results should be verifiable by means of an audit. The training and skills required of the person who carries out the evaluation depend on the complexity of the particular model and on the specific step of the model evaluation. For instance, different skills are required to perform a scientific assessment of the model than to perform an operational assessment. If the evaluation is performed by a model user, it is important to document whether clarification or advice has been requested from the model developer for the appropriate use of the model. It is important to consider that, in the case of use of the model for actual emergency response, feedback from the developer may not be available. Ideally, more than one evaluation of the same model, by different users, should be performed, to assess the uncertainty in model evaluation results related to the model-user factor. Stakeholders for the atmospheric dispersion model evaluation results are the model developers and the authorities that use the model for decision support in cases of emergency in all three phases previously introduced. Model developers are interested in improving their model, while authorities are interested in determining whether or not, and how, a specific model should be used in specific scenarios to achieve reliable model results. A certain level of skill and knowledge in both the models and their practical application is required to correctly interpret the model evaluation results.

3 Background

Protocols, methodologies and computerized systems for the evaluation of air quality models have been presented by Fox (1981), Hanna (1989), Cox and Tikvart (1990), Hanna and Davies (2002), Chang and Hanna (2004), Dennis et al. (2010), Appel et al. (2011), Thunis et al. (2012a) and Thunis et al. (2012b). Hanna and Chang (2012) proposed performance acceptance criteria for general urban dispersion models. The above methodologies and validation metrics include several elements that can be adopted in the frame of the present MEP. The evaluation protocols / procedures and data sets for denser-than-air gas dispersion models described by Duijm et al. (1996), Duijm et al. (1997) and Duijm and Carissimo (2001) are relevant to the topic of COST Action ES1006, as they concern hazardous gas dispersion at the local scale (i.e., up to 1 km distance from the release location). Similar evaluation steps for models of local-scale dispersion in built-up environments are described by Schatzmann et al. (1997). The above model evaluation elements were adopted by Schatzmann and Leitl (2002) and also by COST Action 732 (2007a, b). Closely related to emergency response is the methodology and on-line system for the evaluation of long-range atmospheric transport models of harmful substances presented by Mosca et al. (1998) and Bellasio et al. (1999). The concept of ensemble dispersion modelling for emergency response and its evaluation is presented by Galmarini et al. (2001) and Galmarini et al. (2004a, b). Examples of atmospheric dispersion model evaluation studies concerning hazardous, and in particular denser-than-air, gases are those reported by Hanna et al. (1991), Hanna (1993) and Hanna et al. More recent studies concerning dispersion modelling in urban environments are those presented by Hanna et al. (2006), Harms et al. (2011) and Schatzmann and Leitl (2011). The issue of model uncertainty is closely related to any model evaluation protocol. Uncertainty analysis studies for atmospheric dispersion models (and for scientific computing in general) have been presented by Fox (1984), Venkatram (1988), Hanna et al. (1998), Hwang et al. (1998), Sørensen (1998), Dabberdt and Miller (2000), Yegnan et al. (2002), Rao (2005), Roy and Oberkampf (2010) and Baumann-Stanzer and Stenzel (2011). Performance assessment frameworks for environmental models that are used for management and decision making are proposed by USEPA (2009) and Bennett et al. (2013). The document of FAIRMODE (2010) reviews the air quality model evaluation protocols existing at that time.


4 Model evaluation frameworks

4.1 Existing model evaluation protocols

The model evaluation frameworks that have been identified in the literature are structured as a series of steps or components with specific tasks. Two of the most characteristic existing model evaluation schemes consist of the following components:

USEPA (2009):
(a) Peer review / scientific evaluation of the model;
(b) Quality Assessment (QA) project planning and data QA;
(c) Qualitative and/or quantitative model corroboration (comparison of model results with experimental data);
(d) Sensitivity and uncertainty analyses.

Dennis et al. (2010):
(a) operational evaluation (comparison of model predictions to experimental data; calculations of model errors and biases);
(b) dynamic evaluation (the sensitivity of the model to changes in input data, mainly meteorological and source term);
(c) diagnostic evaluation (determination of the origin of model uncertainties: from input data, or from limitations in the physics modelled);
(d) probabilistic evaluation (determination of confidence in the model's predicted values; comparisons of experimental data with uncertainty ranges associated with the model's predictions).

The above-mentioned evaluation protocols have been designed for, and are mainly applied to, air quality models. Duijm et al. (1997), Duijm and Carissimo (2001) and COST 732 (2007a, b) have proposed similar structures for model evaluation protocols, composed of the following steps:
(a) Model description (origin, type, documentation, etc.);
(b) Description of the database used for the validation (references, type, release conditions, quality information, etc.);

(c) Scientific assessment of the model (description of the physical and chemical phenomena accounted for by the model, assumptions made, mathematical and physical algorithms, model constants, solution techniques);
(d) User-oriented assessment of the model (user-friendliness, guidance and assistance of the user, quality of user documentation, computer requirements);
(e) Code verification (software errors);
(f) Validation (or corroboration) of the model by comparing model predictions with (experimental) observations.

These model evaluation protocols have been designed and applied for dense gas dispersion models and for local-scale flow fields in built-up environments.

4.2 Selected model evaluation protocol

For the aims of COST Action ES1006, taking into account what has been presented in section 4.1 and also considering the recommendations presented in COST ES1006 (2012), the following structure for an ADMEP is adopted here:
a) Model description (origin, type, documentation, etc.);
b) Description of the database used for the validation (references, type, release conditions, quality information, etc.);
c) Scientific assessment of the model (description of the physical and chemical phenomena accounted for by the model, assumptions made, mathematical and physical algorithms, model constants, solution techniques);
d) User-oriented assessment of the model (user-friendliness, guidance and assistance of the user, quality of user documentation, computer requirements);
e) Code verification (software errors);
f) Validation (or corroboration) of the model by comparing model predictions with (experimental) observations;
g) Sensitivity and uncertainty analyses: characterization of the model's uncertainties, propagation of input parameter uncertainties to assess the uncertainties in the results, assessment of the model's response to changes in input data or to in-model parameterizations and methods of solution.

The above steps will need to address the specific requirements of local-scale emergency response applications and the objectives of COST Action ES1006 as far as the improvement of the reliability of model simulation results is concerned.

4.3 Specific requirements regarding emergency response

Specific requirements for ADM applied to accidental releases and, specifically, to Emergency Response, which in addition arise from the objectives of COST Action ES1006, are the following (see also COST ES1006, 2012):
(a) Computation of dispersion from transient releases: the requirement for computing dispersion from time-dependent releases dictates the selection of quantities that need to be considered in the evaluation process: dosage (i.e., time-integrated concentration), peak concentration, cloud arrival time, time of peak concentration, cloud passage duration. The stochastic nature of the dispersion process from transient releases also dictates the statistical characteristics of the variables that will be used in the model evaluation: ensemble mean value, most probable value, specific percentiles (e.g., 5th or 95th percentiles).
(b) Computation of flow and dispersion in built-up (urban or industrial) environments: computation of the wind flow field at the local scale, i.e., the flow scales that are influenced by the presence of buildings, can be performed either diagnostically or prognostically. Prognostic calculations require more time and computational resources but are of higher quality. Computation of dispersion influenced by the presence of buildings at the local scale (having as input the previously mentioned wind fields) is performed only by advanced ADM of Lagrangian or Eulerian type, which, of course, require more time and computational resources than simple Gaussian models.
(c) Computation of affected areas based on a defined threshold of a quantity of interest (e.g. concentration for a continuous release; dosage, maximum concentration or deposition for a puff release): the affected areas are of direct relevance to emergency response because they are used to define areas of intervention or application of countermeasures.

(d) Modelling of special physico-chemical phenomena: buoyancy effects (positively or negatively buoyant substances in air), phase changes (evaporation, condensation) and chemical reactions (especially deflagration or detonation) are phenomena often encountered during the release of hazardous substances into the air.
(e) Required computing resources (computing time and hardware): during the emergency response phase, the shortest computing time, combined with the smallest hardware requirements and simplicity of input data, is an absolute requirement for an ADM. During the preparedness as well as the post-event phases, restrictions on computing resources are much less stringent.

The above requirements need to be considered in the corresponding steps of the MEP.
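The transient-release quantities named in requirement (a), together with the threshold logic of requirement (c), can be sketched for a single receptor time series as follows. The function name, the threshold value and the sample record are hypothetical, chosen only to illustrate how the quantities are derived.

```python
def puff_metrics(times, conc, threshold):
    """Derive emergency-response quantities from one concentration time series.

    times: sample times (s, uniform spacing assumed); conc: concentrations at a
    receptor; threshold: a detection/intervention level in the units of conc.
    """
    dt = times[1] - times[0]
    dosage = sum(c * dt for c in conc)            # time-integrated concentration
    above = [t for t, c in zip(times, conc) if c >= threshold]
    peak = max(conc)
    return {
        "dosage": dosage,
        "arrival_time": above[0] if above else None,            # cloud arrival
        "peak_concentration": peak,
        "time_of_peak": times[conc.index(peak)],
        "passage_duration": (above[-1] - above[0] + dt) if above else 0.0,
    }

# Hypothetical 1 Hz record at a sensor (arbitrary units)
t = [0, 1, 2, 3, 4, 5, 6, 7]
c = [0.0, 0.1, 0.8, 2.0, 1.5, 0.6, 0.1, 0.0]
m = puff_metrics(t, c, threshold=0.5)
```

Applying the same threshold test over a grid of receptors, rather than a single one, yields the affected area of requirement (c).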

5 Reference data requirements for emergency response modelling

In order to perform a quantitative model evaluation, reference data sets are needed. Before being used for this purpose, the experimental data must themselves be evaluated and selected against specific criteria that depend on the applications of interest for the model under evaluation. Therefore, it was considered necessary to include a section in this ADMEP dedicated to the requirements that reference experimental data should fulfil to be suitable for the evaluation of dispersion models used for emergency response. The criteria for the selection of experimental databases that are to be used for the validation of local-scale dispersion models used in emergency response can be grouped into the following two categories:
1. Problem related criteria
2. Measurements set up criteria

5.1 Problem related data selection criteria

The criteria of this category concern mainly the representativeness of the problem and the source of the release.

5.1.1 Problem representativeness

The test cases should represent as closely as possible the reality for which the model will be / is / was used, in terms of geometry, ambient conditions and releases. Any reference database must provide sufficiently detailed information on the location of the test site and the geometry features, boundary conditions and release configuration to assess its problem-related representativeness.

5.1.2 The source

The source, i.e. the release scenario, should be chosen to be as realistic as possible. A reference database should provide sufficiently detailed information on the type of the source, the type of the agent and the physical and chemical discharge conditions.

Generally, in case of an emergency, three generic types of sources are expected: a) point sources, b) line sources and c) area sources, and three realistic discharge conditions: a) instantaneous releases, b) finite releases and c) continuous releases. Concerning the type of the agent, a database is expected to include passive and/or buoyant gas agents. In a reference database the above-mentioned information should contain a proper description of the source profile, such as the release duration, the source location and the density of the agent, as well as possible changes of source characteristics in time.

5.2 Measurements criteria

The criteria of this category mainly concern the experimental flow and concentration measurements.

5.2.1 Flow measurements

Sufficient meteorological/hydrodynamic data must be available in a database, obtained from sensors located near and/or within the area of interest. A database should include typical meteorological and hydrodynamic information such as the location and the height of wind measurements, and velocity and temperature time series of sufficient time resolution to be able to estimate turbulence- and stability-related model parameters. For field experiments, air humidity and precipitation data (if applicable) need to be included as well. As an option, missing time series can be replaced by statistical parameters such as mean values, velocity variances or stability classes. The data should be sufficient in order to:
a) derive appropriate flow boundary conditions for the numerical models;
b) describe the flow behavior in the areas of interest.
It must also be noted that documentation of the dispersion over a preferably wide range of different meteorological wind directions on the site is very important for a systematic, application-oriented validation of an emergency response model. Possible reference data should also be evaluated with respect to the location of the available flow data driving the dispersion processes. Hence, emergency response related model validation data sets especially require ground-level and in-canopy data to be provided. Wind data at higher elevations might help to understand weak points of a model but should not dominate the validation. This refers to the above-mentioned issue of application-specific test data sets.
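The statistical parameters that may stand in for raw velocity time series can be sketched as below: from samples of the along-wind velocity component, the mean speed, its standard deviation and the streamwise turbulence intensity are derived. The function name and the sample values are illustrative assumptions.

```python
import statistics

def flow_statistics(u):
    """Mean speed and streamwise turbulence intensity from a velocity time series.

    u: samples of the along-wind velocity component (m/s). A minimal sketch of
    the summary statistics a database may provide in place of raw series.
    """
    mean = statistics.fmean(u)
    sigma = statistics.pstdev(u)  # population standard deviation of the samples
    return {
        "mean": mean,
        "sigma": sigma,
        "ti": sigma / mean if mean else float("inf"),  # turbulence intensity
    }

# Hypothetical anemometer record (m/s)
u = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
stats = flow_statistics(u)
```

For model validation, such statistics would be reported together with the sensor location and measurement height, as required above.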

5.2.2 Concentration measurements

A database should provide sufficient concentration measurements in order to reliably detect hazardous areas and to estimate key human-exposure-related parameters. Prerequisites to accomplish such a goal include: a) a sufficiently dense grid of concentration data (sensors at more than one distance downwind of the source, with sufficient lateral resolution) and b) a sufficiently high time resolution (usually less than the smaller of the duration of the release and the travel time from the point of the release to the nearest sensor). It is desirable that the experimentally derived concentration data set be of sufficient spatial and temporal resolution to produce, with reasonable accuracy, the concentration-related quantities required by the models. Such quantities include hazardous areas on maps, hazardous distances and statistical parameters, such as confidence intervals, according to responses by end-users of models (see illustration in Fig. 1).

Figure 1: End-user responses regarding presentation of model results

The reference parameters proposed by COST ES1006 for validation in the case of dispersion from continuous releases are highlighted in Table 1. The peak values are represented by higher percentiles of the results. The actual percentiles to be selected can depend on the release scenario, the toxicity and the amount of released material. It should be noted that guidelines, such as the Acute Exposure Guideline Levels (AEGLs) from the US EPA, used during emergency situations to judge the threat of a hazardous gas release, define the exposure limits for different time intervals. Therefore, and due to the transient nature of the dispersion phenomena, the values

used for validation should be defined for different averaging times or evaluation time intervals.

Parameter                                        Statistics
Concentration                                    Mean
                                                 n-th percentile (n = 95 or 99)
                                                 Variance
Concentration at different averaging time        n-th percentile (n = 95 or 99)
intervals (e.g. 10 min, 30 min, 60 min)

Note: For field measurements, the maximum time interval over which the air turbulence can be assumed stationary also needs to be included.

Table 1: Relevant statistical values of concentration in the case of a continuous release for the validation of emergency response tools

The reference parameters proposed for validation in the case of dispersion from instantaneous (or finite-time) releases are given in Table 2. Here, the extreme values are represented by the 5th and 95th percentiles of the results, and the most likely values are defined as the maxima of fitted probability density functions, again for a given temporal resolution of the time series. As for the continuous release, the actual percentile levels have to be chosen considering the release scenario, the intended use of the model and the desired accuracy.
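The dependence of exposure limits on averaging time noted above can be illustrated with a short sketch: the maximum running-mean concentration is computed for a chosen averaging interval, so that model output can be compared against guideline values defined for that interval. The function name and sample values are illustrative only.

```python
def max_windowed_mean(conc, dt, averaging_time):
    """Maximum running-mean concentration for a given averaging time.

    conc: concentration samples at interval dt (s); averaging_time in s.
    Mirrors comparing model output against exposure guidelines defined for
    different averaging intervals (values here are illustrative only).
    """
    n = max(1, int(round(averaging_time / dt)))
    n = min(n, len(conc))
    means = [sum(conc[i:i + n]) / n for i in range(len(conc) - n + 1)]
    return max(means)

# Hypothetical 1 Hz concentration record (arbitrary units)
c = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0]
peak_1s = max_windowed_mean(c, dt=1.0, averaging_time=1.0)  # instantaneous peak
peak_3s = max_windowed_mean(c, dt=1.0, averaging_time=3.0)  # 3 s running mean
```

As expected, the peak decreases as the averaging interval grows, which is why validation values must state their averaging time.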

Parameter                                        Statistics
Dosage                                           5th percentile
                                                 95th percentile
                                                 Most likely value
                                                 Ensemble average (if applicable)
Arrival time                                     5th percentile
                                                 95th percentile
                                                 Most likely value
                                                 Ensemble average (if applicable)
Peak time                                        5th percentile
                                                 95th percentile
                                                 Most likely value
                                                 Ensemble average (if applicable)
Duration                                         5th percentile
                                                 95th percentile
                                                 Most likely value
                                                 Ensemble average (if applicable)
Peak concentration                               Maximum value
                                                 95th percentile
                                                 Most likely value
                                                 Ensemble average (if applicable)
Peak concentrations at different averaging       Maximum value
time intervals (e.g. 15 s), including the        95th percentile
ones related to the release time duration        Most likely value
                                                 Ensemble average (if applicable)

Table 2: Most relevant parameters and their statistical values in the case of an instantaneous release for the validation of emergency response tools

5.2.3 Statistical representativeness of the measurements

Due to the turbulent nature of the flow field, each realization of a release is different from another, even under apparently identical atmospheric conditions. To representatively characterize one scenario by experimental results, measurements should be of a sufficient quantity so that a sufficiently smooth frequency distribution with constant shape can be derived from the results. Statistical representativeness can be achieved by increasing the integration time of measurements for continuous releases under stationary boundary conditions, or by increasing the ensemble size of transient data for finite-duration releases. During field measurements, achieving statistical representativeness is a very challenging task due to the variability of boundary conditions (e.g. wind speed and direction). The task is easier for wind tunnel measurements, because the boundary conditions are controlled to a large degree. Figure 2 shows two examples of how mean quantities (dosage and puff arrival time) depend on the ensemble size of measurements for an instantaneous release scenario taken from the idealized Michelstadt urban dispersion test case (courtesy of University of Hamburg, Environmental Wind Tunnel Laboratory). Results from single puff releases show, respectively, more than 250% and 100% variability relative to the statistical mean value in this particular case. The larger the ensemble size used for the averaging, the smaller the variability, or rather uncertainty, becomes. The variability of the mean quantities decreases with increasing ensemble size, to less than 10% and 5% respectively at 200 realizations of the experiment (Fig. 2). When the variability of the results no longer changes as the number of measurements increases, the ensemble size is statistically representative. The sufficient ensemble size can be different for each scenario, each measurement point, each parameter and each statistical value.
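The convergence check behind Figure 2 can be sketched numerically: repeatedly draw ensembles of a given size from a pool of single-puff values and measure the spread of the resulting ensemble means relative to the overall mean. The function name, the pool of dosages and the resampling-with-replacement scheme are assumptions for illustration, not the laboratory's actual procedure.

```python
import random
import statistics

def mean_variability(samples, size, trials=500, seed=1):
    """Relative spread of the ensemble mean for a given ensemble size.

    Draws `size` realisations (with replacement) from the pool of single-puff
    values `trials` times and reports the standard deviation of the resulting
    means, normalised by the overall mean.
    """
    rng = random.Random(seed)
    overall = statistics.fmean(samples)
    means = [statistics.fmean(rng.choices(samples, k=size)) for _ in range(trials)]
    return statistics.pstdev(means) / overall

# Hypothetical single-puff dosages with large puff-to-puff variability
pool = [0.2, 0.5, 1.0, 1.5, 2.2, 3.0, 0.8, 1.2, 0.4, 2.6]
v_small = mean_variability(pool, size=5)
v_large = mean_variability(pool, size=200)
```

Plotting such variabilities against ensemble size reproduces the qualitative behaviour of Figure 2: the curve flattens once the ensemble is statistically representative.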

Figure 2: Exemplary diagrams of the variability of the mean dosage and puff arrival time at a certain sensor, calculated from different ensemble averages, for an instantaneous release scenario in the Michelstadt urban dispersion test case (University of Hamburg, Environmental Wind Tunnel Laboratory)

5.2.4 QA/QC of the measurements and measurement post processing

Experiments producing data qualified for model validation are expected to fulfil the basic QA/QC procedures that have been established for field trials (EPA QA/G-5S) and physical modeling (VDI, Snyder, NL-Guideline) to be used in the context of local-scale emergency response. However, these basic standards do not, in principle, consider the specifics of the transient flow and dispersion phenomena dominating local-scale airborne hazmat transport. Here, the variability of measured quantities, which is caused mainly by atmospheric turbulence, requires substantially more effort to document the quality of both the measured data and the post-processed validation data. Although no dedicated QA/QC procedure for reference data is available, the following basic rules are expected to be considered and documented in the validation data generation process:

The data sampling design must be chosen appropriately with respect to the information to be extracted. For example, the grid of measurement points is expected to differ between exposed-area information, where spatial resolution is required over the affected area, and peak exposure measurements, where the relevant values are probably found in the inner parts of a plume or puff. Hence, a validation data set should contain a description of the underlying data sampling design(s).

22 The measurement equipment must be selected adequately, i.e. for each measured quantity the fitness for purpose of the chosen instrumentation must be proven and/or explained. Again the instruments qualification strongly depends on the type of quantity to be extracted for validation. Temporal resolution can be seen as a specific challenge of local-scale transient dispersion measurements in this regard. Any data processing applied to raw measurement data should be motivated and must be completely documented as it affects the physical representativeness of the validation data. For example, averaging and/or filtering of data, for example to reduce noise, will change the statistical characteristics of an ensemble of data and it must be documented and/or explained that the processed data still provides the desired information content. Uncertainty of all measured/derived quantities must be provided for a quantitative model validation. Data without a specified uncertainty are principally not qualified for quantitative model validation because it remains unknown if a mismatch between model result and reference values is caused by incorrect modeling or a too uncertain reference measurement. Both, the measurement uncertainty as well as the statistical uncertainty have to be specified. The latter is a particular challenge for atmospheric flow measurements and corresponding fluid modeling because the inherently present variability of boundary conditions requires a sufficient number of repetitions of experiments to quantify statistical uncertainty as the major source of uncertainty. Last but not least, minimum documentation standards have to be met in terms of completeness of a data set documentation. 
Without complete documentation substantiating that the data are qualified for model validation, the data cannot be used for validation purposes.

Measurement uncertainty

A key piece of accompanying information for every emergency response database is the quantification and documentation of the measurement uncertainty. The uncertainty of each measurement result, and of each value derived from the results, should be included in the dataset. A description of the method used to determine the measurement uncertainty should also be included.
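The statistical uncertainty discussed above can only be quantified from repeated realisations of an experiment. A minimal sketch, assuming independent repetitions and using the standard error of the ensemble mean as the statistical uncertainty (the function name and the coverage factor of 2 are illustrative choices, not values prescribed by the protocol):

```python
import math
import statistics

def ensemble_mean_with_uncertainty(repetitions, coverage_factor=2.0):
    """Ensemble mean of a measured quantity (e.g. dosage at a sensor)
    over repeated releases, together with its statistical uncertainty,
    expressed as coverage_factor times the standard error of the mean.
    A coverage factor of 2 roughly corresponds to a 95 % confidence
    level for large ensembles (illustrative assumption)."""
    n = len(repetitions)
    mean = statistics.fmean(repetitions)
    standard_error = statistics.stdev(repetitions) / math.sqrt(n)
    return mean, coverage_factor * standard_error

# hypothetical dosages from 200 repeated puff releases at one sensor
dosages = [1.0 + 0.1 * math.sin(7.3 * i) for i in range(200)]
mean, u = ensemble_mean_with_uncertainty(dosages)
# the reported validation value would then be: mean +/- u
```

Doubling the number of repetitions reduces the statistical uncertainty by roughly a factor of sqrt(2), which is why a sufficient number of repetitions is essential.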

6 Description of the Model Evaluation Protocol

In this section the steps of the ADMEP listed in Section 4.2 are described.

6.1 Model description

The essential information on the ADM is given here. This information can be collected through a questionnaire to be filled in by the model developer and includes:

1. model name, version and release date
2. contact information of the originating person or organisation
3. intended application range
4. brief description of the model characteristics: model type, theoretical background, parameterisations, solution methods
5. input data
6. output data
7. (minimum) hardware requirements: processor type, memory (RAM), storage, other necessary devices
8. software requirements: operating system, graphics packages, drivers
9. typical computing times
10. quality assurance: guidelines or standards used during model development
11. references: model documentation (user manuals, tutorials and technical reference manuals) and publications in the open literature, including at least the scientific sources of the model's algorithms

The questionnaires and information database (ERMIDT, Tavares et al., 2014) produced in the frame of COST Action ES1006 are good examples of such information templates.

6.2 Reference data base description

The experimental database description identifies the data that will be used in the model validation process and describes their most important aspects. Quality information, accessibility and an assessment of the data's suitability are important elements of the model evaluation procedure and must be included in the experimental data description. A large part of the information contained in this step of the ADMEP results from the data evaluation procedure described in Section 5.

More specifically, the experimental database description includes the following information:

(a) Name or identifier of the data and collection date
(b) Type of the data: analytic results, results from existing (more sophisticated) models, laboratory experiments, large-scale field experiments, accident reports
(c) Data ownership, accessibility and format
(d) Source / emission description
(e) Geometry description (buildings, topography)
(f) Meteorology description
(g) Description of the data available for comparison with the model results
(h) Data origin: if the data have been transcribed from previous sources, trace-back information must be included, together with information on data losses or processing during transcription
(i) Quality assurance and uncertainty information, such as detector calibration, correction of data for detector errors, error bars or uncertainty values for the measured quantities, and an estimate of the inherent variability of the measured variables
(j) Information on the appropriateness of the data for use in the model evaluation, i.e. properties that either make the data useful or limit their usefulness in the evaluation process
(k) Features (i.e., physical phenomena) and ranges of model input parameters covered by the data set

Evaluation criteria for the selection of experimental data to be used for emergency response models are presented in Section 5.

6.3 Scientific evaluation

The purposes of the scientific assessment are the following:

1. To describe the scientific basis of the model and how it is implemented
2. To describe the capabilities and the limitations of the model's applicability on the basis of the physical phenomena accounted for by the model, attaching particular importance to phenomena involving emergency response
3. To provide a judgement on the adequacy of the scientific basis and its implementation in relation to the present state of the art, both for the model as a whole and for its individual features

Desirable capabilities for an ADM to be suitable for application in emergency response are those mentioned in Section 4.3:

(a) Prediction of the dispersion of transient releases, especially of short duration, and of the relevant quantities: dosage, peak concentration, arrival time, time of peak, duration of cloud passage, in the form of ensemble mean, most probable value and specific percentiles
(b) Simulation of realistic dispersion in built-up (urban or industrial) environments
(c) Inclusion of specific physico-chemical phenomena: buoyancy effects, phase changes, chemical reactions (especially detonations / deflagrations)

The aspects to be included in the scientific assessment depend on the type of model. In COST Action ES1006, three types of models have been distinguished, according to whether or not they solve the flow field between buildings and obstacles and, for those that do, according to whether they solve it diagnostically or prognostically.

The basic structure of the scientific assessment report proposed here is adopted from Duijm and Carissimo (2001) and adapted to the needs of emergency response applicability:

(a) Comprehensive Description of the Model
(b) Assessment of the Scientific Content
(c) Limits of Applicability, with specific regard to emergency response
(d) Limitations and Advantages of the Model
(e) Special Features, with emphasis on those important for emergency response, as listed above
(f) Possible Improvements

6.4 Code Verification

Verification is the process of comparing the implementation of a model with its mathematical basis. Most commonly, this refers to checking that a computer implementation of a model (computer software) is an accurate representation of the algorithms in the model. Verification provides evidence that the model has been checked for correct coding of algorithms, databases and interfaces.
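For illustration, the sketch below verifies a toy 1-D diffusion solver against two properties known from its mathematical basis: conservation of total mass and the analytic Gaussian point-source solution. All names and parameter values are illustrative and not part of the protocol:

```python
import math

def diffuse(c, k, dx, dt, steps):
    """Explicit finite-difference scheme for 1-D diffusion with
    zero-flux boundaries; stable for k*dt/dx**2 <= 0.5."""
    r = k * dt / dx ** 2
    for _ in range(steps):
        new = c[:]
        for i in range(1, len(c) - 1):
            new[i] = c[i] + r * (c[i - 1] - 2 * c[i] + c[i + 1])
        new[0] = c[0] + r * (c[1] - c[0])        # zero-flux boundary
        new[-1] = c[-1] + r * (c[-2] - c[-1])    # zero-flux boundary
        c = new
    return c

n, dx, dt, k, steps = 101, 1.0, 0.25, 1.0, 400
c0 = [0.0] * n
c0[n // 2] = 1.0 / dx                  # unit mass as a point source
c1 = diffuse(c0, k, dx, dt, steps)

# check 1: internal consistency - total mass must be conserved
mass0, mass1 = sum(c0) * dx, sum(c1) * dx

# check 2: peak concentration against the analytic Gaussian solution
t = steps * dt
analytic_peak = 1.0 / math.sqrt(4.0 * math.pi * k * t)
```

A discrepancy in either check points to a coding error in the solver, which is exactly the kind of evidence verification is meant to provide.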
It should be emphasized that code verification is a very demanding task in terms of human resources, especially for complex models with thousands of lines of code. Therefore,

code verification is the model developer's responsibility, performed in the frame of QA/QC during the model development process. The procedure is facilitated if the model code has a modular form, in which case modules can be tested independently as to whether they produce the expected output for a specific input. Code verification is also facilitated if the developer has inserted comment lines in the code. Potential procedures for code verification are:

- Checks of internal consistency, e.g., mass and/or mass-flux balances
- Running the code for simple scenarios for which analytic solutions or results from workbooks may be available
- Examining the behaviour of the code under limiting conditions
- Automatic tools, which can in some cases check the correct types of variables and the correct branching of conditional tests
- The Method of Manufactured Solutions (MMS), proposed by ASME (2009) as a general procedure for code verification applicable to codes based on the solution of partial differential equations
- Comparison of results from a number of models (the larger the better) for the same case, with the same input data (as far as possible). A large discrepancy for a certain model may indicate a bug in its code. Of course, it may also be due to erroneous input by the user, so this possibility should be examined as well.

6.5 User-oriented evaluation

The user-oriented assessment concerns operational aspects, such as:

(a) Level of user expertise or scientific background needed to use the model
(b) Existing documentation for the model, including example calculations
(c) Installation procedures
(d) User interfaces (pre- and post-processing)
(e) Accompanying databases (hazardous substances with their respective physical-chemical properties, source terms, etc.)
(f) Selection of model options (availability and guidance)
(g) Preparation of input data (possible forms and modes of input data, guidance)
(h) Checks of input data

(i) Format of output data
(j) Error messages and warnings produced by the model
(k) Computational requirements in terms of hardware and computing time

6.6 Model validation (corroboration)

Validation is the process of comparing the predictions of a model, which has been run to simulate a given event, with the observations made in connection with the same event (Duijm and Carissimo, 2001). The purpose of validation is to examine how well the output of the model compares with experimental data. In the absence of experimental data, model validation can be performed by comparing the predictions of a model for a given event with the predictions of another model (or group of models) for the same event. In this case, the results of the reference model(s) should themselves already have been sufficiently validated.

6.6.1 Selection of quantities to be compared

For continuous releases:

- Steady-state concentration in air
- Affected area, delineated on the basis of a pre-defined threshold value of a relevant quantity
- Time-integrated deposition on the ground (where applicable)
- Concentration time-series statistics (if available from both validation data and model results):
  - Variance
  - Specific percentiles (e.g., 5th or 95th or both)
  - Probability density function of concentration

For finite-duration releases:

- Time history of concentration in air
- Peak concentration for a specific time interval
- Cloud arrival time (dependent on the definition of a threshold concentration)
- Time of peak concentration
- Duration of cloud passage (dependent on the definition of a threshold concentration)

- Time integral of concentration ("exposure" or "dosage"; weakly dependent on the definition of a threshold concentration)
- Affected area, delineated on the basis of a pre-defined threshold value of a relevant quantity
- Pattern of time-integrated deposition on the ground (where applicable)

6.6.2 Statistical characteristics of the quantities

The following statistical figures of the above quantities can be compared, provided they can be calculated from both the reference data and the model results:

- Ensemble average
- Median
- Most probable value
- Specific percentiles (e.g., 5th or 95th or both)
- Probability density function of concentration
- Variance of concentration
- Skewness of concentration
- Kurtosis of concentration
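Several of these statistical figures can be computed from an ensemble of repeated-release values in a few lines. The sketch below uses population moments and the conventional (non-excess) definition of kurtosis; these are illustrative choices, not conventions fixed by the protocol:

```python
import math
import statistics

def ensemble_statistics(values):
    """Statistical figures for an ensemble of a scalar quantity,
    e.g. peak concentrations from repeated releases at one sensor."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mean)   # population variance
    sd = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    qs = statistics.quantiles(values, n=20)    # cut points 5 %, ..., 95 %
    return {"mean": mean,
            "median": statistics.median(values),
            "variance": var,
            "skewness": skew,
            "kurtosis": kurt,
            "p05": qs[0],
            "p95": qs[-1]}
```

The most probable value and the probability density function additionally require a binning (histogram) or kernel-density choice, which should itself be documented, since it affects the comparison.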

6.6.3 Qualitative analysis / graphical depiction

Scatter plots (paired in space only, or also in time): modelled and observed values are plotted against each other as points in a two-dimensional plot. This approach provides an overall visualisation of the comparison between model results and observations. Straight lines with ratios 1:1, 2:1 and 1:2 are usually plotted as well, to give a better overview of the distribution of the cloud of points; they are connected to the FAC2 index. The axes of the plot can be linear or logarithmic (the latter if the range of values spans several orders of magnitude).

Figure 3: Example of a scatter plot comparing concentrations calculated by several models to experimental data (from COST ES1006, 2015a).

Time-history plots: paired-in-space quantities (usually concentrations in air), modelled and experimental, are plotted as functions of time in a two-dimensional plot. This plot gives an overview of the agreement on the following quantities: arrival time, peak concentration, time of peak, duration of cloud passage. It is, of course, relevant only for releases that are transient in time. A smooth experimental curve can be drawn only if adequate measurements as a function of time exist at the specific location.

Figure 4: Example plot comparing a calculated concentration time series with measurements at a receptor point for a real case (from COST ES1006, 2015a).
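The transient quantities that a time-history plot displays can also be extracted numerically from a single time series. A minimal sketch, with an illustrative threshold convention (arrival and departure taken as the first and last samples at or above the threshold, which the protocol does not prescribe):

```python
def puff_signature(times, conc, threshold):
    """Transient-release quantities from one concentration time series:
    arrival time, peak concentration, time of peak, duration of cloud
    passage and dosage (time integral of concentration).
    Illustrative helper; threshold convention is an assumption."""
    above = [t for t, c in zip(times, conc) if c >= threshold]
    if not above:
        return None                    # cloud never reaches the sensor
    peak = max(conc)
    # time integral of concentration ("dosage"), trapezoidal rule
    dosage = sum(0.5 * (conc[i] + conc[i + 1]) * (times[i + 1] - times[i])
                 for i in range(len(times) - 1))
    return {"arrival": above[0],
            "peak": peak,
            "time_of_peak": times[list(conc).index(peak)],
            "duration": above[-1] - above[0],
            "dosage": dosage}
```

Note that arrival time and duration depend on the chosen threshold, which is why the protocol lists them as threshold-dependent quantities.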

Profiles of a quantity along a spatial direction: vertical profiles of concentration at specific locations where adequate measurements exist are a characteristic example of such plots. Crosswind profiles at a certain height and distance from the release location are a second example. Such plots can provide an overview of the plume's dimensions and boundaries.

Figure 5: Example of a comparison between calculated and measured vertical profiles of concentration at a specific location for a continuous release.

Plots of affected areas based on a defined threshold of a quantity of interest (e.g. concentration for a continuous release; dosage, maximum concentration or deposition for a puff release): these plots are of direct relevance to emergency response because they can be used to define areas of intervention or of application of countermeasures. To draw the experimentally observed affected areas, an adequately dense and extended network of monitoring locations must exist. These plots are directly connected to the Figure of Merit in Space (FMS) mentioned in the next section.

Figure 6: Example of a model prediction of the maximum affected area following an accidental release at an industrial site (from COST ES1006, 2015a).

Quantile-quantile plots: modelled and observed values are separately ranked in magnitude and then plotted, unpaired, as points in a two-dimensional plot. A straight line with 1:1 ratio is also plotted. These plots provide an overview of the agreement between the distributions and percentiles of model results and observations.

Residual scatter plots: model residuals are defined as the ratio of predicted (Cp) to observed (Co) concentrations. Residuals are plotted as a function of some independent

variable, e.g., distance from the release location, wind speed or atmospheric stability. Such plots can provide insight into the reasons behind discrepancies between model results and observations. Lines with ratios 1:1, 2:1 and 1:2 are also plotted, to delineate the factor-of-two limits.

Residual box plots: residual plots can also be drawn in the form of boxes, to summarize the distribution of model residuals when the number of data points in each bin is large.

Vector plots: for models that solve for the wind field in the built-up area (either prognostically or empirically), the wind field, even if not validated, should be qualitatively inspected using visualizations to avoid user errors, e.g., in boundary-condition settings.

Cumulative frequency distribution: this plot reproduces the running total of all preceding frequencies in a frequency distribution. Independently of space and time, it gives the percentage of occurrences of observations and predictions lying below/above a given value of the variable considered, for instance the concentration. It is thus based on a global approach, and the comparison between the curves obtained provides an immediate interpretation of the capability of the model simulation to capture the observed dispersion event.

Figure 7: Example plots comparing cumulative frequency distributions of calculated and measured concentrations for a wind-tunnel urban dispersion experiment with a continuous release; black lines: observations; red, blue, green, orange lines: different models.
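The unpaired comparisons behind quantile-quantile and cumulative-frequency plots reduce to ranking the two samples separately. A minimal sketch (function names are illustrative):

```python
def qq_pairs(observed, predicted):
    """Quantile-quantile pairs: both samples are ranked separately and
    matched by rank, i.e. the comparison is unpaired in space and time."""
    return list(zip(sorted(observed), sorted(predicted)))

def exceedance_fraction(values, level):
    """Fraction of values lying above a given level - one point of a
    cumulative (exceedance) frequency distribution."""
    return sum(1 for v in values if v > level) / len(values)
```

Evaluating `exceedance_fraction` over a range of levels for both the observed and the predicted samples yields the two curves compared in a cumulative frequency distribution plot such as Figure 7.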

6.6.4 Quantitative indicators (statistical metrics)

Each of the following indices can be applied to several calculated and measured variables, where $C_p$ denotes model predictions, $C_o$ denotes observations, an overbar denotes the average over the dataset, and $\sigma$ denotes the standard deviation over the dataset:

- Fractional bias: $FB = \dfrac{\overline{C_o} - \overline{C_p}}{0.5\,(\overline{C_o} + \overline{C_p})}$

- Geometric mean bias: $MG = \exp\left(\overline{\ln C_o} - \overline{\ln C_p}\right)$

- Normalised mean square error: $NMSE = \dfrac{\overline{(C_o - C_p)^2}}{\overline{C_o}\;\overline{C_p}}$

- Geometric mean variance: $VG = \exp\left(\overline{(\ln C_o - \ln C_p)^2}\right)$

- Correlation coefficient: $R = \dfrac{\overline{(C_o - \overline{C_o})\,(C_p - \overline{C_p})}}{\sigma_{C_o}\,\sigma_{C_p}}$

- Fraction of predictions within a factor of 2 of observations: $FAC2$ = fraction of data that satisfy $0.5 \le C_p / C_o \le 2.0$

- Factor of exceedance: $FOEX = \left[\dfrac{N(C_p > C_o)}{N} - 0.5\right] \times 100\,\%$, where $N(C_p > C_o)$ is the number of data pairs with $C_p > C_o$ and $N$ is the total number of pairs

- Index of Agreement: $IA = 1 - \dfrac{\sum_{i=1}^{N} (\Phi_{p,i} - \Phi_{o,i})^2}{\sum_{i=1}^{N} \left( |\Phi_{p,i} - \overline{\Phi_o}| + |\Phi_{o,i} - \overline{\Phi_o}| \right)^2}$, where $\Phi$ denotes the compared quantity

- Normalised absolute difference: $NAD = \dfrac{\overline{|C_o - C_p|}}{\overline{C_o} + \overline{C_p}}$

- Figure of Merit in Space: $FMS = \dfrac{A_p \cap A_o}{A_p \cup A_o}$, where $A_p$ is the predicted contour area based on a certain threshold and $A_o$ is the observed contour area based on the same threshold

A perfect model would have MG, VG, R, FAC2, IA and FMS = 1.0, and FB, NMSE, FOEX and NAD = 0.0.

The following discussion is taken from Chang and Hanna (2005). It should be taken into account that the distribution of atmospheric pollutant concentrations resembles a log-normal distribution. So, when the linear measures FB and NMSE are applied to concentrations, they may be strongly influenced by infrequently occurring high observed and predicted values, whereas logarithmic
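Most of these paired metrics can be computed directly from the observed and predicted samples. A sketch following the definitions of Chang and Hanna (2005), with illustrative function and variable names (R, IA and FMS are omitted for brevity):

```python
import math
from statistics import fmean

def validation_metrics(co, cp):
    """Paired validation metrics after Chang and Hanna (2005).
    co: observed, cp: predicted concentrations (equal-length sequences,
    strictly positive so the logarithmic measures MG and VG exist)."""
    mo, mp = fmean(co), fmean(cp)
    fb = (mo - mp) / (0.5 * (mo + mp))                   # fractional bias
    nmse = fmean([(o - p) ** 2
                  for o, p in zip(co, cp)]) / (mo * mp)  # norm. mean square error
    mg = math.exp(fmean([math.log(o) for o in co])
                  - fmean([math.log(p) for p in cp]))    # geometric mean bias
    vg = math.exp(fmean([(math.log(o) - math.log(p)) ** 2
                         for o, p in zip(co, cp)]))      # geometric mean variance
    fac2 = fmean([1.0 if 0.5 <= p / o <= 2.0 else 0.0
                  for o, p in zip(co, cp)])              # factor-of-2 fraction
    nad = fmean([abs(o - p)
                 for o, p in zip(co, cp)]) / (mo + mp)   # norm. abs. difference
    foex = (sum(1 for o, p in zip(co, cp) if p > o)
            / len(co) - 0.5) * 100.0                     # factor of exceedance, %
    return {"FB": fb, "NMSE": nmse, "MG": mg, "VG": vg,
            "FAC2": fac2, "NAD": nad, "FOEX": foex}
```

Since the geometric measures use logarithms, zero or negative concentrations must be handled before applying MG and VG, e.g. by restricting the comparison to values above the detection limit.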