Content-Oriented Guidelines Framework
August Götzfried, Eurostat
OECD, Paris, 21 January 2009
What is SDMX?
Technical and statistical standards and guidelines, together with IT architecture and tools, for the efficient exchange and sharing of statistical data and metadata.
Why is SDMX so important?
- Seven international organizations (BIS, ECB, Eurostat, IMF, OECD, UN, World Bank) have joined forces: Memorandum of Understanding (March 2007).
- SDMX has been recognized as the preferred standard for the exchange and sharing of data and metadata (most recently in February 2008 at the UNSC).
- SDMX is not just another data transmission format.
Deliverables
1. Technical standards: the SDMX information model (like GESMES), message and query formats for data and metadata, registry services, etc.
2. IT architecture: two complementary modes for data and metadata exchange, the push mode and the pull mode. In addition, the hub mode: users obtain the data from a central hub, which assembles the data by querying other data sources.
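The hub mode described above can be illustrated with a minimal sketch. It assumes that each provider exposes a query function (standing in for a pull-mode service) that returns the observations for a series key; the `Hub` class, the provider names and the figures are hypothetical and only show the idea that the hub holds no data of its own and assembles answers on demand.

```python
from typing import Callable, Dict, List, Tuple

# One observation: (time period, value).
Observation = Tuple[str, float]
# Hypothetical stand-in for a provider's query service in pull mode:
# given a series key, return that series' observations.
DataSource = Callable[[str], List[Observation]]


class Hub:
    """Sketch of the hub mode: the hub stores no data itself; it forwards each
    user query to the registered sources and assembles their answers."""

    def __init__(self) -> None:
        self.sources: Dict[str, DataSource] = {}

    def register(self, provider: str, source: DataSource) -> None:
        self.sources[provider] = source

    def query(self, series_key: str) -> Dict[str, List[Observation]]:
        assembled: Dict[str, List[Observation]] = {}
        for provider, source in self.sources.items():
            observations = source(series_key)   # pull from the provider
            if observations:                    # skip providers with no data
                assembled[provider] = observations
        return assembled


# Usage with two in-memory providers (names and figures are made up).
hub = Hub()
hub.register("NSI_A", lambda key: [("2008-11", 101.2)] if key == "M.AA.PROD" else [])
hub.register("NSI_B", lambda key: [])
print(hub.query("M.AA.PROD"))   # {'NSI_A': [('2008-11', 101.2)]}
```

In a real deployment the lambdas would be replaced by calls to the providers' own data services; the assembling logic stays the same.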
Deliverables
3. The SDMX IT tools
SDMX sponsoring and other organizations have developed many IT tools, which are in general freely available as open-source software. The applications cover editing, conversion, database loading, query formulation, validation, visualization, registries, etc.
Deliverables
4. Capacity building, training and support for SDMX implementation
- Generally organised in a decentralised way, at the level of the sponsoring or other organisations.
- Different types of training courses exist for managers, IT staff and statisticians.
- In addition: the SDMX User Guide (version 2009) and an SDMX self-learning package.
- SDMX implementation is stimulated at the level of statistical domains.
Deliverables
5. The SDMX Content-oriented Guidelines (COG): the SDMX deliverable standardizing the statistical content of SDMX data and metadata messages.
- Successive versions of the SDMX COG were compiled in 2006, 2007, 2008 and 2009; their quality and scope increased with each version.
- The SDMX Content-oriented Guidelines comprise the following parts:
The SDMX Content-oriented Guidelines
5.1 Annex 1: Cross-domain concepts
- A list of statistical concepts and sub-concepts that are relevant for many or even all statistical domains.
- These SDMX concepts are to be used within data and metadata messages.
5.2 Annex 2: Cross-domain code lists
- A collection of harmonized SDMX code lists that are relevant for many or even all statistical domains.
- These SDMX code lists are to be used within data and metadata messages.
- More of these lists need to be produced and added.
The SDMX Content-oriented Guidelines
5.3 Annex 3: Statistical subject-matter domains
- A list of statistical domains, subdivided into demographic and social statistics, economic statistics, and environment and multi-domain statistics.
- The list can be used for different purposes.
5.4 Annex 4: The Metadata Common Vocabulary
- A broader collection of statistical terminology related to metadata.
- Used within the other annexes of the SDMX COG.
Why do we need COG?
- Up to now, international organisations have used many different standards for data and metadata structures.
- These standards use statistical concepts or codes that differ between international organisations.
- Even where the same statistical concepts are used, there has not been a common understanding of what they mean.
- This puts an extra burden on national and international data providers and increases costs.
- The SDMX COG should help to overcome these difficulties as far as the statistical content of data and metadata is concerned.
Steps forward
The SDMX COG:
- have progressed considerably in scope and quality in recent years;
- will need to be further enhanced (in particular the SDMX cross-domain code lists);
- will need to be promoted towards broader implementation at national and international level;
- will need to settle into a regular rhythm of updates without endangering what has already been achieved.
More details on the various components of the SDMX Content-oriented Guidelines
Subject-matter Domains
Steven Vale, United Nations Economic Commission for Europe
OECD, Paris, 21 January 2009
Overview
- Purpose
- Background to the current list
- Presentation of the current list
- Future work
Purpose
- A standard against which the domain lists of statistical organizations can be mapped, to facilitate data and metadata exchange
- A framework for registering and searching for statistical data in SDMX registries
- A tool for identifying and organizing the domain groups that develop and use SDMX standards
Background
- Most existing schemes are based on the internal organizational structure of the originating agency: a producer view rather than a user view.
- Something more generic was needed.
- The Database of International Statistical Activities (DISA) has a classification agreed between many organizations and domain groups as a basis for reporting their activities.
- Most international statistical organizations already map their schemes to the DISA classification.
DISA Classification
5 domains:
1. Demographic and Social Statistics
2. Economic Statistics
3. Environment and Multi-domain Statistics
4. Methodology of data collection, processing, dissemination and analysis
5. Strategic and managerial issues of official statistics
The SDMX subject-matter domain list uses the first three.
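Mapping an agency's own domain list onto the SDMX list, as described under "Purpose", can be sketched as a simple lookup. The top-level labels below are the first three DISA domains; using "1", "2", "3" as their codes is an assumption of this sketch, and the agency's internal department names are entirely hypothetical.

```python
from typing import Dict, Iterable, List

# Top-level SDMX subject-matter domains (the first three DISA domains).
# Assumption: the numeric codes simply follow the list order.
SDMX_DOMAINS: Dict[str, str] = {
    "1": "Demographic and Social Statistics",
    "2": "Economic Statistics",
    "3": "Environment and Multi-domain Statistics",
}

# Hypothetical internal scheme of one agency, organised by its own units
# (a "producer view"), mapped to SDMX top-level domain codes.
AGENCY_TO_SDMX: Dict[str, str] = {
    "Population Census Department": "1",
    "Labour Market Unit": "1",
    "Business Statistics Directorate": "2",
    "National Accounts Division": "2",
    "Environmental Accounts Unit": "3",
}


def to_sdmx_domain(internal_name: str) -> str:
    """Return the SDMX domain label for an internal domain name.
    Raises KeyError if the agency has not mapped that name yet."""
    return SDMX_DOMAINS[AGENCY_TO_SDMX[internal_name]]


def group_by_sdmx_domain(names: Iterable[str]) -> Dict[str, List[str]]:
    """Regroup internal domain names under the SDMX domains (a "user view")."""
    grouped: Dict[str, List[str]] = {label: [] for label in SDMX_DOMAINS.values()}
    for name in names:
        grouped[to_sdmx_domain(name)].append(name)
    return grouped
```

Because the mapping lives in one table, two agencies with different internal structures can still publish or search data under the same shared domain labels.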
Structure (diagram not reproduced)
Future Work
- Verification, maintenance and updating
- Feedback mechanism
- Decision-making mechanism
- More detailed description of levels 2 and 3
- Inclusions / exclusions
- Case law
- Advocacy for wider use as a dissemination standard?
Cross-domain Concepts
Metadata Common Vocabulary
Marco Pellegrino, Eurostat
OECD, Paris, 21 January 2009
- Purpose
- MCV: a common understanding
- Background to the CDC list (2006-2008)
- The cross-domain concepts list 2009
- Future work
Metadata Common Vocabulary (MCV)
- A vocabulary of metadata-related terms, wider than the cross-domain concepts list
- Improved visibility for existing definitions (either authored by SDMX or, where possible, taken from existing authoritative sources to avoid a proliferation of standard terminologies)
- Possibility of mapping different metadata systems, including those at national level, independently of any specific metadata model
- Support for the standardisation and consistency of the metadata compiled
- Support for XML structures and web services for search and comparison, ensuring semantic equivalence
Annex 4
The Tower of Babel
- The same name for different concepts, or different names for the same concept
- Different metadata and quality frameworks
- Metadata still hard to exchange in an automated way
From the Tower of Babel to a lingua franca?
- Syntax: technical standards, SDMX-ML
- Semantics: cross-domain concepts, Metadata Common Vocabulary
The joint application of the SDMX technical standards and the content-oriented guidelines is the key.
Why don't we speak the same language?
- Timeliness: 3.1.3 Source data timeliness; 4.1.2 Timeliness; 15.1 Timeliness; 15.2 Punctuality?
- Comparability and consistency: 16.1 Geographical comparability?; 4.2.2 Temporal consistency; 16.2 Comparability over time
- Coherence: 4.2.3 Intersectoral and cross-domain consistency; 4.2.1 Internal consistency; 17.1 Coherence - cross domain?; 17.2 Coherence - internal
- Resources and burden: 0.2.1 Staff, facilities, computing resources, and financing; 0.2.2 Ensuring efficient use of resources; 18 Cost and burden
Cross-domain Concepts
For each of the 66 statistical concepts (plus sub-items):
- Name and ID
- Description and explanation of context
- Representation (free text, code list)
- Possible role (as a dimension or attribute in a DSD or MSD)
- Link to the IMF-Eurostat-OECD metadata frameworks
CDCs are not:
- a requisite for SDMX technical conformance
- an imposition on statistical organisations
CDCs are:
- a framework to promote reusability when data and metadata are exchanged
Annex 1
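One way to picture a cross-domain concept entry, and its link to different quality frameworks, is a record holding the fields listed above plus a crosswalk from framework-specific item numbers to a shared concept ID. This is a sketch only: the framework names `framework_A` and `framework_B` are hypothetical placeholders, the item numbers are taken from the "Why don't we speak the same language?" slide, and the pairings and descriptions are illustrative assumptions, not an extract of Annex 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CrossDomainConcept:
    """One concept entry: ID, name, description, representation, possible
    roles, and links to items in other frameworks (framework -> item numbers)."""
    concept_id: str
    name: str
    description: str
    representation: str                # "free text" or the ID of a code list
    roles: List[str]                   # e.g. ["attribute"]
    framework_links: Dict[str, List[str]] = field(default_factory=dict)


# Illustrative entries; IDs, descriptions and pairings are assumptions.
CDC: List[CrossDomainConcept] = [
    CrossDomainConcept(
        "TIMELINESS", "Timeliness",
        "Length of time between data availability and the event it describes.",
        "free text", ["attribute"],
        {"framework_A": ["4.1.2"], "framework_B": ["15.1"]},
    ),
    CrossDomainConcept(
        "COST_BURDEN", "Cost and burden",
        "Cost of producing the statistics and burden on respondents.",
        "free text", ["attribute"],
        {"framework_A": ["0.2.1", "0.2.2"], "framework_B": ["18"]},
    ),
]


def common_concept(framework: str, item: str) -> Optional[str]:
    """Translate a framework-specific item number into the shared concept ID,
    or return None if no concept is linked to it."""
    for concept in CDC:
        if item in concept.framework_links.get(framework, []):
            return concept.concept_id
    return None
```

With such a crosswalk, "4.1.2 Timeliness" in one framework and "15.1 Timeliness" in the other resolve to the same concept, which is the semantic equivalence the MCV and CDC aim for.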
Use of cross-domain concepts
Data Structure Definition (DSD)
- Uses SDMX cross-domain concepts as dimensions (to identify and describe series)
- Uses SDMX cross-domain concepts as attributes (to describe series)
- Provides code lists and representations for the concepts
- Gives an attachment level for the concepts, based on the packaging structure (data set, series, observation)
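The roles a DSD gives to concepts can be sketched in a few lines: dimensions with code lists identify a series, and attributes carry an attachment level. The class names, the `INDICATOR` dimension and all code values below are hypothetical; FREQ, REF_AREA, UNIT_MULT and OBS_STATUS are assumed to be cross-domain concept IDs, and the dot-separated series key is an assumption of this sketch rather than a statement of the SDMX-ML format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Component:
    concept_id: str                      # ID of the concept, ideally a cross-domain concept
    codes: Optional[Set[str]] = None     # allowed codes; None = free representation
    attachment: str = "Series"           # for attributes: "DataSet", "Series" or "Observation"


@dataclass
class DataStructureDefinition:
    dimensions: List[Component]          # order defines the series key
    attributes: List[Component] = field(default_factory=list)

    def series_key(self, values: Dict[str, str]) -> str:
        """Join one code per dimension with dots, validating each code."""
        parts = []
        for dim in self.dimensions:
            code = values[dim.concept_id]
            if dim.codes is not None and code not in dim.codes:
                raise ValueError(f"{code!r} is not a valid {dim.concept_id} code")
            parts.append(code)
        return ".".join(parts)

    def attributes_at(self, level: str) -> List[str]:
        """Concept IDs of the attributes attached at the given level."""
        return [a.concept_id for a in self.attributes if a.attachment == level]


# Hypothetical DSD: INDICATOR and all code values are made up for the example.
dsd = DataStructureDefinition(
    dimensions=[
        Component("FREQ", {"A", "Q", "M"}),
        Component("REF_AREA", {"AA", "BB"}),
        Component("INDICATOR", {"PROD", "TURN"}),
    ],
    attributes=[
        Component("UNIT_MULT", attachment="Series"),
        Component("OBS_STATUS", attachment="Observation"),
    ],
)
```

Reusing the same concept IDs (and their code lists) across several DSDs is what lets data from different domains be combined without re-mapping each dimension.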
Metadata Structure Definition (MSD)
A set of metadata describing the organisation of a set of reference metadata. It identifies:
- the concepts to be used for metadata reporting (e.g. the Eurostat Euro-SDMX Metadata Structure)
- the relationships between concepts
- the type of representation of the concepts
- which dataflow element they describe
Metadata attributes for the ESMS report

| No. | Concept ID | Parent | Concept scheme | Version | Agency | Usage | Text type |
|---|---|---|---|---|---|---|---|
| 3 | STAT_PRES | | ESTAT_ADD_CONCEPTS | 1.0 | ESTAT | Mandatory | String |
| 3.1 | DATA_DESCR | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 3.2 | CLASS_SYS | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 3.3 | COVERAGE_SECTOR | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 3.4 | STAT_CONC_DEF | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 3.5 | STAT_UNIT | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 3.6 | STAT_POP | STAT_PRES | SDMX_CDC | 1.1 | ESTAT | Mandatory | String |
| 3.7 | REF_AREA | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 3.8 | COVERAGE_TIME | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 3.9 | BASE_PER | STAT_PRES | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 4 | UNIT_MEASURE | | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 5 | REF_PERIOD | | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 6 | INST_MANDATE | | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 6.1 | INST_MAN_LA_OA | INST_MANDATE | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
| 6.2 | INST_MAN_SHAR | INST_MANDATE | SDMX_CDC | 1.0 | ESTAT | Mandatory | String |
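The Parent and Usage columns of the ESMS table are enough to check a metadata report for completeness. The sketch below uses a subset of the rows; the function names are hypothetical, and it rests on a loudly flagged assumption: a concept that has sub-concepts (such as STAT_PRES) is treated as a heading that carries no text of its own, so only its children must be filled.

```python
from typing import Dict, List, Optional, Tuple

# A subset of the ESMS rows: (concept ID, parent ID or None, usage).
ESMS_ROWS: List[Tuple[str, Optional[str], str]] = [
    ("STAT_PRES", None, "Mandatory"),
    ("DATA_DESCR", "STAT_PRES", "Mandatory"),
    ("CLASS_SYS", "STAT_PRES", "Mandatory"),
    ("REF_AREA", "STAT_PRES", "Mandatory"),
    ("UNIT_MEASURE", None, "Mandatory"),
    ("INST_MANDATE", None, "Mandatory"),
    ("INST_MAN_LA_OA", "INST_MANDATE", "Mandatory"),
    ("INST_MAN_SHAR", "INST_MANDATE", "Mandatory"),
]


def children(parent: str) -> List[str]:
    """Sub-concepts whose Parent column is `parent`."""
    return [cid for cid, p, _ in ESMS_ROWS if p == parent]


def reportable() -> List[str]:
    """Mandatory concepts that carry text. Assumption: a concept that has
    sub-concepts is only a heading and is not filled in itself."""
    parents = {p for _, p, _ in ESMS_ROWS if p is not None}
    return [cid for cid, _, usage in ESMS_ROWS
            if usage == "Mandatory" and cid not in parents]


def missing_mandatory(report: Dict[str, str]) -> List[str]:
    """Mandatory concepts with no text, or only blank text, in a metadata report."""
    return [cid for cid in reportable() if not report.get(cid, "").strip()]
```

A check like this is the kind of validation an MSD makes possible: the structure, not the report author, defines which concepts must be present.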
Metadata flow identifiers

| Metadata flow identifier | Variables | Relation with PEEI ID | Description |
|---|---|---|---|
| SSTSIND_PRODR_MS | 110 | 3.1 | Reference metadata for production in industry |
| SSTSIND_TURNR_MS | 120, 121, 122 | N/A | Reference metadata for turnover in industry, total, domestic and non-domestic (total, euro zone, non-euro zone) |
| SSTSIND_ORDR_MS | 130, 131, 132 | 3.3 | Reference metadata for new orders received in industry, total, domestic and non-domestic (total, euro zone, non-euro zone) |
| SSTSIND_PRICR_MS | 310, 311, 312, 340 | 3.2, 3.4 | Reference metadata for output prices in industry, total, domestic market, non-domestic market (total, euro zone, non-euro zone), import prices (total, euro zone, non-euro zone) |
| SSTSCONS_PRODR (_MS, _QS) | 110, 115, 116 | 3.5 | Reference metadata for production in construction, total, building construction, civil engineering |
| SSTSRTD_TURNR_MS | 120, 123 | 3.6 | Reference metadata for turnover in retail trade, value or deflated |
| SSTSSERV_TURNR_QS | 120, 123 | 3.7 | Reference metadata for turnover in repair and other services, value or deflated (quarterly) |
| SSTSSERV_TURNR_MS | 120, 123 | N/A | Reference metadata for turnover in repair and other services, value or deflated (monthly) |
| SSTSSERV_PRICR_QS | 310 | 3.8 | Reference metadata for output prices in other services |
| SSTSSERV_EMPLR_QS | 210, 211 | N/A | Reference metadata for number of persons employed and number of employees in repair and other services |
CDC ongoing work
- Maintenance and updating
- Feedback mechanism
- Periodic review
- More detailed description of levels 1 and 2
- Inclusions / exclusions
- Wider use of CDC in DSDs and MSDs
Cross-domain Code Lists
Christos Androvitsaneas, European Central Bank
SDMX Global Conference, OECD, Paris, 21 January 2009
Views expressed are those of the presenter and not necessarily those of the ECB
Overview
- Code lists in SDMX: scope and use
- Cross-domain code lists: why?
- Categories of SDMX code lists
- Recommended code lists: current status
- Ongoing work and steps ahead
Code lists: scope and use
- Internal use
- Statistical data collection, production and reporting
- Dissemination and data sharing among organisations
Cross-domain code lists: why?
- Harmonisation
- Reduction of semantic noise for users confronted with data coming from different sources
- Enhanced interoperability for applications
- Support for a framework that also facilitates the creation of extensions (and their transparency) to serve particular additional needs
SDMX cross-domain code lists
- SDMX recommended cross-domain code lists
- Deviations, additional codes and extensions are (and may in future continue to be) in use to serve more specific requirements
- Extra benefits of also making deviations from the cross-domain code lists public:
- additional codes, extensions and deviations become transparent and available for use by other institutions facing similar specific requirements
- some deviations and additions may evolve into recommended codes
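The idea of publishing extensions transparently can be sketched as follows: an extension keeps every recommended code unchanged and adds only explicitly new codes, so the additions can be listed and later reviewed as candidate recommended codes. The `CodeList` class, the extension ID `CL_SEX_EXT`, the agency `NSI_X` and the code values are hypothetical illustrations, not the published lists.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class CodeList:
    list_id: str
    agency: str
    codes: Dict[str, str]        # code -> label


def extend(base: CodeList, list_id: str, agency: str, extra: Dict[str, str]) -> CodeList:
    """Build a published extension: every base code plus the agency's additions.
    Redefining a base code is refused, so each deviation is an explicit new code."""
    clash = set(extra) & set(base.codes)
    if clash:
        raise ValueError(f"already defined in {base.list_id}: {sorted(clash)}")
    return CodeList(list_id, agency, {**base.codes, **extra})


def additions(base: CodeList, extension: CodeList) -> Set[str]:
    """Codes an extension adds to the recommended list, e.g. for later review."""
    return set(extension.codes) - set(base.codes)


# Illustrative values only; "CL_SEX_EXT" and "NSI_X" are hypothetical.
cl_sex = CodeList("CL_SEX", "SDMX", {"F": "Female", "M": "Male", "T": "Total"})
cl_sex_ext = extend(cl_sex, "CL_SEX_EXT", "NSI_X", {"U": "Unknown"})
```

Refusing to redefine existing codes is a design choice of this sketch: it keeps data coded with the extension fully readable by anyone who only knows the recommended list.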
Recommended code lists
Cross-domain code lists [status January 2009]:
- Observation status
- Confidentiality status
- Decimals
- Frequency
- Sex
- Time format
- Unit multiplier
- Area
- Currency
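A recommended code list is useful in practice because every receiver can validate coded fields against the same list. The sketch below checks the frequency and observation status of one observation and applies the unit multiplier, reading its code as a power of ten. The code values are illustrative subsets assumed for this example (consult Annex 2 for the authoritative lists), and the field name `OBS_VALUE` and the function names are assumptions.

```python
# Illustrative subsets of three recommended cross-domain code lists;
# the values are assumptions for this sketch, not a copy of Annex 2.
FREQ = {"A": "Annual", "Q": "Quarterly", "M": "Monthly"}
OBS_STATUS = {"A": "Normal value", "E": "Estimated value", "P": "Provisional value"}
UNIT_MULT = {"0": "Units", "3": "Thousands", "6": "Millions", "9": "Billions"}


def check_code(code_list: dict, code: str, list_name: str) -> str:
    """Return the code if it belongs to the list, else raise ValueError."""
    if code not in code_list:
        raise ValueError(f"{code!r} is not in the {list_name} code list")
    return code


def actual_value(reported: float, unit_mult: str) -> float:
    """Scale a reported value by its unit multiplier, read as a power of ten:
    1.5 with multiplier "6" means 1.5 million."""
    check_code(UNIT_MULT, unit_mult, "unit multiplier")
    return reported * 10 ** int(unit_mult)


def validate_observation(obs: dict) -> float:
    """Check the coded fields of one observation and return its scaled value."""
    check_code(FREQ, obs["FREQ"], "frequency")
    check_code(OBS_STATUS, obs["OBS_STATUS"], "observation status")
    return actual_value(obs["OBS_VALUE"], obs["UNIT_MULT"])
```

Because the sender and receiver share the lists, a value of 1.5 flagged "E" with multiplier "6" means the same thing, an estimated 1.5 million, whichever organisation produced it.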
Ongoing work: steps ahead
- Code list for official data-producing agencies and organisations
- Code list for units of measurement
- Field of education and training (FIELD)
- Code list for institutional sectors
- Classification of the functions of government (COFOG)
- International classification of diseases (ICD)
- Language
- Civil or marital status
- Standard goods classification for transport statistics
- Standard international trade classification (SITC)
- Standard classifications of economic activities (e.g. ISIC Rev. 4, NACE Rev. 2)
- Other code lists relevant to more than one domain
SDMX cross-domain code lists
Source: http://www.sdmx.org/ (Annex 2 - Cross-Domain Code Lists)