Session 2: The SDMX Content-oriented Guidelines (COG)
The use of SDMX compliant Data Structure Definitions
- Agreed SDMX Data Structure Definitions (DSDs) are used for national and international data exchange;
- SDMX DSDs are stored in SDMX compliant registries (e.g. the Euro SDMX Registry);
- Data flows between the organisations using these SDMX DSDs need to be defined;
- SDMX DSDs can also be used for data dissemination (however, data dissemination DSDs are not necessarily identical to data exchange DSDs);
- The number of DSDs needed depends on the specificity of the datasets and the statistical domain (it can be a few DSDs, it can be more);
- For a comprehensive statistical system with more than 100 statistical domains, several hundred DSDs are needed and will be used.
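As a rough illustration of what a DSD fixes for a data exchange, the sketch below models a DSD as an ordered set of dimension concepts plus attribute concepts and derives a series key from an observation. It is a simplified stand-in, not the SDMX information model; the class name, the identifiers DEMO_DSD and DEMO_AGENCY, and the sample concepts and codes are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    """Simplified, hypothetical stand-in for an SDMX DSD."""

    dsd_id: str
    agency: str
    dimensions: tuple[str, ...]        # concepts that identify an observation
    attributes: tuple[str, ...] = ()   # concepts that qualify an observation

    def series_key(self, observation: dict[str, str]) -> str:
        """Join the dimension values in DSD order into a dot-separated key."""
        missing = [d for d in self.dimensions if d not in observation]
        if missing:
            raise ValueError(f"missing dimensions: {missing}")
        return ".".join(observation[d] for d in self.dimensions)


# Hypothetical DSD and observation, for illustration only.
dsd = DataStructureDefinition(
    dsd_id="DEMO_DSD",
    agency="DEMO_AGENCY",
    dimensions=("FREQ", "REF_AREA", "INDICATOR"),
    attributes=("OBS_STATUS",),
)
obs = {"FREQ": "A", "REF_AREA": "LU", "INDICATOR": "POP", "OBS_STATUS": "A"}
key = dsd.series_key(obs)
print(key)  # A.LU.POP
```

Because sender and receiver share the same DSD, both sides read the key in the same dimension order.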
[Figure: Use of SDMX compliant Data Structure Definitions in the context of the Generic Statistical Business Process Model]
2.2 SDMX Metadata Structure Definitions (MSDs)
- SDMX Metadata Structure Definitions are used for reference (explanatory) metadata, i.e. metadata which is not directly linked to the data values;
- The SDMX MSDs use the SDMX cross-domain concepts to define the structure of the reference metadata files to be exchanged (cross-domain code lists are used less);
- As for DSDs, the metadata flows between the statistical organisations which use these MSDs need to be defined;
- In general, only very few MSDs are defined and used by statistical organisations (in contrast to DSDs, which are linked to statistical domains).
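The idea that an MSD defines the structure of a reference metadata file can be sketched as follows: the MSD is reduced to an ordered list of concept identifiers, and a report is checked against it. The concept identifiers, the report content and the function name check_report are illustrative assumptions, not taken from any actual MSD.

```python
# Hypothetical MSD: an ordered list of concept identifiers (illustrative names).
DEMO_MSD = ["CONTACT", "ACCURACY", "TIMELINESS", "COMPARABILITY"]


def check_report(msd, report):
    """Return (missing, unexpected) concept identifiers for a metadata report.

    A concept counts as missing when it is absent or contains only whitespace.
    """
    missing = [c for c in msd if not report.get(c, "").strip()]
    unexpected = sorted(set(report) - set(msd))
    return missing, unexpected


# Placeholder free-text content, invented for this example.
report = {
    "CONTACT": "Placeholder contact details",
    "ACCURACY": "Placeholder description of accuracy",
    "SOURCE": "Concept that is not part of this MSD",
}
missing, unexpected = check_report(DEMO_MSD, report)
print(missing)     # ['TIMELINESS', 'COMPARABILITY']
print(unexpected)  # ['SOURCE']
```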
Two examples of Metadata Structure Definitions (MSDs) used within the European Statistical System
2.2.1. The Euro SDMX Metadata Structure (ESMS)
- The ESMS uses a subset of the 66 main statistical concepts and of the 69 statistical sub-concepts defined in the SDMX Cross-domain Concepts;
- The ESMS is the main MSD used for reference (explanatory) metadata in the European Statistical System;
- The ESMS covers 21 main statistical concepts selected from the list of SDMX cross-domain concepts;
- The ESMS also covers the main quality criteria such as accuracy, timeliness and comparability.
- The ESMS replaces former metadata structures (such as the SDDS structure);
- The ESMS is now fully in use at Eurostat, with more than 300 reference metadata files produced and disseminated on the Eurostat website;
- The ESMS will also be used for national reference metadata files produced, exchanged and disseminated within the European Statistical System (around 40 national metadata flows in statistical domains in a first wave);
- The ESS Metadata Handler (as the main IT infrastructure) allows the production, exchange and dissemination of ESMS files within the European Statistical System and beyond.
[Figure: The Euro SDMX Metadata Structure (ESMS)]
2.2.2 The ESS Standard for Quality Report Structure (ESQRS)
- The ESQRS is the main MSD used for quality reports within the ESS;
- The ESQRS is oriented more towards data producers and less towards data users;
- The ESQRS uses the main quality concepts of the ESMS (based on the SDMX COG) and deepens them with more detailed quality measures; however, not all detailed quality concepts are part of the SDMX COG yet;
- The ESS Metadata Handler (as the main IT infrastructure) allows the production, exchange and dissemination of ESQRS files within the European Statistical System and beyond.
[Figure: The ESS Standard for Quality Report Structure (ESQRS)]
The ESS Metadata Handler
[Figure. Diagram labels: National Statistical Institute; ESS Metadata Handler; edamis; National Metadata Files (ESMS, ESQRS); Eurostat Metadata Files; National + Eurostat Metadata Files (ESMS, ESQRS); Eurostat Website; National dissemination. Slide footer: SDMX Global Conference, May 2011.]
2.2.3 Some more considerations related to SDMX Metadata Structure Definitions
- SDMX based reference metadata need to be exchanged more intensively at international level;
- The burden on countries providing metadata should be reduced as much as possible (countries should send their reference metadata only once, not separately to each international organisation);
- More Metadata Structure Definitions are in development which describe data treatment and data processing in more detail; this should stimulate the integration of statistical business processes;
- If the same SDMX cross-domain concepts are used in different MSDs, consistency between the respective contents of the MSDs needs to be ensured.
2.3 SDMX Registries
SDMX Registries contain all SDMX artefacts (components) related to SDMX based data and metadata structures, flows, etc.
The Euro SDMX Registry is an example of an SDMX Registry. It contains:
- Data/Metadata Structure Definitions used for data/metadata transmission;
- Draft Data/Metadata Structure Definitions;
- The list of statistical concepts (dimensions/attributes) used in data and metadata structure definitions;
- Code lists used in the data and metadata structure definitions;
- An inventory of the data and metadata flows organised between the sending and receiving organisations;
- Additional information (such as provision agreements).
International organisations upload the SDMX artefacts and act as owners of these artefacts (i.e. as maintaining agencies).
How can an SDMX Registry be used?
- National Statistical Authorities can retrieve the SDMX Data Structure Definitions and the SDMX Metadata Structure Definitions and use them when implementing SDMX;
- National/International Statistical Authorities can retrieve all other artefacts from the registry (e.g. harmonised code lists, concepts used in the data/metadata structure definitions, data/metadata flows) and use them when implementing SDMX;
- SDMX registries might also allow National or International Statistical Organisations to store their own SDMX artefacts (such as their own Data Structure Definitions); in this case these organisations act as maintaining agencies.
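The upload and retrieval roles described above can be pictured with a toy in-memory registry. This is only a sketch of the idea: the class DemoRegistry, its method names and the sample artefacts are hypothetical and do not reflect the interface of the Euro SDMX Registry or of any other real registry.

```python
class DemoRegistry:
    """Toy in-memory registry; real SDMX registries are far richer."""

    def __init__(self):
        # (artefact_type, agency, artefact_id, version) -> content
        self._store = {}

    def upload(self, artefact_type, agency, artefact_id, version, content):
        """Store an artefact under the agency that maintains it."""
        self._store[(artefact_type, agency, artefact_id, version)] = content

    def retrieve(self, artefact_type, agency, artefact_id, version):
        """Return a stored artefact; raises KeyError if it is not registered."""
        return self._store[(artefact_type, agency, artefact_id, version)]

    def list_ids(self, artefact_type):
        """Return the sorted identifiers of all artefacts of one type."""
        return sorted(k[2] for k in self._store if k[0] == artefact_type)


# Hypothetical artefacts, for illustration only.
registry = DemoRegistry()
registry.upload("codelist", "DEMO_AGENCY", "CL_DEMO_AREA", "1.0", {"LU": "Luxembourg"})
registry.upload("dsd", "DEMO_AGENCY", "DEMO_DSD", "1.0", {"dimensions": ["FREQ", "REF_AREA"]})

codelist = registry.retrieve("codelist", "DEMO_AGENCY", "CL_DEMO_AREA", "1.0")
print(codelist["LU"])             # Luxembourg
print(registry.list_ids("dsd"))   # ['DEMO_DSD']
```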
3. The SDMX Content-oriented Guidelines: the next steps
3.1. The SDMX COG and the SDMX Technical Standards 2.1
The SDMX Technical Standards 2.1 allow partial code lists to be used in Data Structure Definitions. This is done through constraints: certain codes are selected from a full code list without «physically» creating sub-code lists.
Example:
- Domain X uses the ISIC code list in one of its DSDs, but this domain does not use all codes in the ISIC code list;
- Constraints will be created in order to define the exhaustive list of codes «allowed» for this DSD.
Main advantage: only the «main» and «full» ISIC code list needs to be maintained, and not the partial code lists used in DSDs.
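The constraint mechanism can be sketched in a few lines: one full code list is kept, and a constraint is just the set of codes a given DSD may use. The three ISIC Rev. 4 sections serve only as sample data; the constraint for "domain X" and the function names are assumptions made for this example and do not reproduce the SDMX 2.1 constraint syntax.

```python
# Excerpt of the full code list: three ISIC Rev. 4 sections as sample data.
FULL_ISIC = {
    "A": "Agriculture, forestry and fishing",
    "B": "Mining and quarrying",
    "C": "Manufacturing",
}

# Hypothetical constraint: the codes allowed in the DSD of "domain X".
ALLOWED_FOR_DSD_X = frozenset({"A", "C"})


def check_constraint(full_codelist, allowed):
    """Raise ValueError if the constraint names codes absent from the full list."""
    unknown = sorted(set(allowed) - set(full_codelist))
    if unknown:
        raise ValueError(f"constraint refers to unknown codes: {unknown}")


def effective_codes(full_codelist, allowed):
    """Return the part of the full code list that the constraint allows.

    No separate sub-code list is stored; the view is derived on demand.
    """
    check_constraint(full_codelist, allowed)
    return {code: label for code, label in full_codelist.items() if code in allowed}


codes = effective_codes(FULL_ISIC, ALLOWED_FOR_DSD_X)
print(sorted(codes))  # ['A', 'C']
```

When the full code list is updated, every constrained view follows automatically, which is the maintenance advantage named above.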
3.2 The improvement of the SDMX Content-oriented Guidelines
The SDMX Content-oriented Guidelines (version 2009) should be further improved. In more detail, this means:
- Annex 1, the SDMX Cross-domain Concepts: the concepts need to be reviewed; this will improve the consistency between the different concepts; moreover, a limited number of additional (quality related) concepts might need to be added;
- Annex 2, the SDMX Cross-domain Code lists: the number of SDMX Cross-domain Code lists needs to be increased considerably; this is necessary in order to build more SDMX compliant data/metadata structure definitions.
- Annex 3, the SDMX Subject-matter domains: no immediate revision of the list of Subject-matter domains seems necessary;
- Annex 4, the Metadata Common Vocabulary (MCV): the MCV needs to be reviewed; this will improve the consistency between the different concepts; moreover, additional concepts might be added (e.g. related to data quality or to the statistical business process);
- Annexes 1 and 4 need to be kept consistent.
3.2 How is the further work organised?
- The SDMX governance has been improved: the SDMX Statistical Working Group (SDMX SWG) has been created, with around 20 participants from statistical organisations or sponsoring organisations;
- The SDMX SWG will deal with all questions related to the improvement and further development of the SDMX Content-oriented Guidelines;
- A first kick-off meeting of the SDMX SWG needs to be scheduled for later in 2011.
Summary
- The SDMX Content-oriented Guidelines are the statistical guidelines produced and released by the SDMX sponsors. They need to be applied together with the technical SDMX standards.
- The first comprehensive version of the SDMX COG was released in 2009.
- The SDMX COG are increasingly used by national and international statistical organisations when SDMX is implemented (e.g. when creating data structure definitions or metadata structure definitions).
- However, further improvements of the SDMX COG are necessary (in particular with regard to the SDMX Cross-domain Concepts and the SDMX Cross-domain Code lists).