SDMX architecture for data sharing and interoperability

Distr. GENERAL
WP.10
15 April 2010
ENGLISH ONLY

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE (UNECE)
CONFERENCE OF EUROPEAN STATISTICIANS

EUROPEAN COMMISSION
STATISTICAL OFFICE OF THE EUROPEAN UNION (EUROSTAT)

ORGANISATION FOR ECONOMIC COOPERATION AND DEVELOPMENT (OECD)
STATISTICS DIRECTORATE

Meeting on the Management of Statistical Information Systems (MSIS 2010)
(Daejeon, Republic of Korea, 26-29 April 2010)

Topic (i): Developing common high-level architectures

SDMX architecture for data sharing and interoperability

Prepared by Adam Wronski, Eurostat and Francesco Rizzo, Istat Italy

I. Introduction

1. For several years the SDMX Sponsor Organisations have been developing software and tools to facilitate the introduction of SDMX in their countries.

2. Eurostat has been discussing the effectiveness of these tools with the Member States (MSs) during task force meetings and working groups. The many suggestions collected from the MSs led in 2009 to an internal project named "SDMX implementation and support for Member States". The aim of that project was to develop a set of software, organized in re-usable building blocks and tools, to help MSs implement an SDMX service infrastructure and so participate more efficiently in SDMX projects currently running within the European Statistical System (e.g. the European Census Hub project).

3. A prototype was presented to the "Statistics, Telematic Networks & EDI" Working Group (STNE) in June 2009. The participants welcomed the initiative with enthusiasm and encouraged Eurostat to proceed.

4. The main objectives of the project are:
(a) to support the Census Hub project and other Eurostat projects;
(b) to facilitate SDMX implementations within the MSs, with particular attention to the large PC-AXIS community;
(c) to stimulate an "SDMX community of developers".

5. The project deliverables consist essentially of:
(a) the SDMX Reference infrastructure document. It represents the synthesis of several experiences worldwide and should be considered a guide or best-practice document rather than a strict specification. Its main objective is to provide a description/specification of a generalized infrastructure that could be re-used partially or entirely by National Statistical Institutes (NSIs) interested in SDMX projects;
(b) a set of software building blocks that can be used as APIs and integrated into existing statistical dissemination information systems;
(c) the Mapping Tool software, a desktop application that maps concepts and code lists stored in a "local" dissemination database to the concepts and code lists used in an SDMX Data Structure Definition (DSD).

6. In September 2009 Eurostat organized a technical workshop, "From the SDMX Information Model to the development of reusable software components". All the deliverables mentioned above were presented and discussed with more than 40 IT experts from 25 countries and organisations. Much feedback has already been received, and many participants have been testing the software.

7. Version 1.0 of the software is currently available for download from CIRCA at the URL below; version 2.0 is under testing and will be available for download in a few weeks: http://circa.europa.eu/public/irc/dsis/stne/library?l=/x-dis/tools/reference_architecture&vm=detailed&sb=title

II. Why introduce SDMX in an NSI

8. There are many reasons why National Statistical Institutes might decide to use the SDMX standards. Underlying them all is the tremendous pressure on the resources of statistical organisations, which face new data demands every day without a matching financial allocation. Synergies, standardization and optimization of processes and infrastructures are the only solution to this challenge. In this context SDMX can help by:
(a) improving quality and efficiency in the exchange and dissemination of data and metadata through:
    (i) harmonisation and coherence of data;
    (ii) preservation of meaning, by coupling data with metadata that define and explain them accurately;
    (iii) an open format (XML) rather than a proprietary one;
(b) reducing the national reporting burden to European and international institutions: a data reporting organization publishes data once and lets its counterparties "pull" data and related metadata as required;
(c) reducing costs through the re-use of software;
(d) facilitating and standardizing the use of new technologies such as XML and web services. Many NSIs already use, or plan to use, XML as the basis for their data management and dissemination systems. By choosing SDMX one can avoid the proliferation of XML grammars.

II. SDMX architectures for data sharing and exchange

9. In order to facilitate the introduction of SDMX within the NSIs, Eurostat and other Sponsor Organizations have put in place various initiatives, including capacity-building actions and the development of re-usable software and tools. Moreover, in recent years Eurostat has developed two SDMX service infrastructures for data collection that are used in several projects jointly with the Member States:
(a) Data Repository (warehousing) architecture;
(b) Data Hub architecture.

The SDMX service infrastructure offered can be used by national statistical authorities as an add-on to their IT architecture that links to national reference or dissemination databases. It therefore does not require any changes to the national IT architecture. The two SDMX architectures are described below.

A. Data Repository (warehousing) architecture

10. The Data Repository architecture is implemented by collecting organisations that periodically collect data and load them into their own database. In general, a batch process automates the flow, in which a whole or partial dataset, including incremental updates, is transmitted.

11. The Data Repository architecture supports both push and pull methods.

12. In the push method, the data provider sends a file in SDMX format to the data collector. The data provider can either:
(a) create the SDMX file directly while extracting data from the data warehouse, using suitable software; or
(b) convert a data file (generally CSV) using tools available from the SDMX sponsors.

13. The SDMX file is then pushed through the appropriate channel (edamis in the ESS).

14. The pull approach within the Data Repository architecture comprises the following steps, based on a provision agreement:
(a) when data are ready for transmission, the data provider creates an SDMX-ML file containing the data set to be transmitted, or provides a Web Service (WS) capable of building SDMX-ML messages upon request. Data consumers are normally notified of the available data, and of how to obtain them, through an RSS web feed;
(b) the data collector's Pull Requestor reads the incoming RSS feed entry (or is informed of the new data by other means). It can then retrieve the SDMX-ML file from the specified URL, or use the Query Message included in the RSS feed to query the data provider's Web Service.
Fig. 1 Data Repository (warehousing) architecture
(Diagram labels: register; SDMX Registry query; PULL; RSS; Eurostat Pull Requestor; received data in SDMX-ML; Loader; Database; PUSH; RSS; edamis; Data Reception; Verification / Conversion to SDMX; Loading preparation; Data Warehouse; Dissemination; XSL for SDMX-ML.)
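The pull steps of paragraph 14 can be illustrated with a minimal Python sketch of a Pull Requestor that reads an RSS feed and picks out the entries it has not yet processed. This is only an illustration under stated assumptions: the feed content, the `provider.example` URLs, the use of `guid` to track processed entries, and the function names `new_entries` and `fetch_sdmx_file` are all hypothetical and are not taken from the Eurostat software.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical RSS 2.0 feed published by a data provider.
# Titles, guids and URLs are invented for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example provider: new SDMX data</title>
    <item>
      <title>STS dataset, 2010-03</title>
      <link>http://provider.example/sdmx/sts_2010_03.xml</link>
      <guid>sts-2010-03</guid>
    </item>
    <item>
      <title>STS dataset, 2010-02</title>
      <link>http://provider.example/sdmx/sts_2010_02.xml</link>
      <guid>sts-2010-02</guid>
    </item>
  </channel>
</rss>"""


def new_entries(feed_xml, seen_guids):
    """Return (guid, link) pairs for feed items that were not processed before."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        link = item.findtext("link")
        if guid and link and guid not in seen_guids:
            entries.append((guid, link))
    return entries


def fetch_sdmx_file(url):
    """Download the SDMX-ML file announced in the feed (performs a network call)."""
    with urlopen(url) as response:
        return response.read()


# Only the March entry is new; the February one was already loaded.
pending = new_entries(SAMPLE_FEED, seen_guids={"sts-2010-02"})
```

In a real deployment each pending link would be passed to something like `fetch_sdmx_file` and the result handed to the loader; the sketch stops before the network call so that it runs offline.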

B. Data Hub architecture

15. The Data Hub architecture supports the pull method only: a group of partners agree to provide access to their data directly from their databases according to standard processes, formats and technologies (web service).

16. From the data management point of view, the hub is also based on pre-specified datasets which, contrary to the database-driven architecture, are not kept locally at the central hub system. Instead, the process operates as follows:
(a) a user identifies a dataset through the Graphical User Interface of the Data Hub, using the structural metadata, and requests it;
(b) the Data Hub translates the user request into one or more queries and sends them to the systems of the relevant data providers;
(c) the data providers' systems process the queries and send the results to the Data Hub in a standard format (SDMX-ML);
(d) the Data Hub combines the results from all the data providers' systems involved and presents them in a human-readable format.

Fig. 2 Data Hub architecture
(Diagram labels: SDMX messages; Data Hub; Query; Dissemination; GUI; cache; XSL for SDMX-ML; Data Providing Organizations; Data collector Organization.)

III. Strategies to foster SDMX implementations within NSIs

17. SDMX was conceived with the aims of improving quality and reducing costs. For some time Eurostat has been developing re-usable software to facilitate the introduction of SDMX within NSIs. This software can generally be downloaded freely (open source) from the SDMX website. The source code is available, so the tools can be used as components when statistical organisations build their own IT systems.

18. Sharing free software can take various forms: distribution of tools developed by one member of the community for the benefit of the others, or joint development in which each partner contributes to the final product. Eurostat currently supports both approaches with:
(a) a project aiming to design an SDMX service infrastructure for NSIs and to develop the related building blocks;
(b) support, through the SDMX ESSnet (see footnote 1), for a group of Member States that have pooled their resources to develop re-usable SDMX software.

19. The main deliverable of both approaches is the SDMX service infrastructure, composed of several Building Blocks that can be re-used as a whole or integrated individually into an existing statistical information system.

Footnote 1: ESSnet (European Statistical System Centres and Networks of Excellence) is an instrument created by Eurostat in order to find synergies (from cooperation between partners), harmonization and dissemination of best practices in the ESS.

IV. SDMX Reference infrastructure

20. The infrastructure represents the synthesis of several experiences (in several statistical offices) and should be considered a guide or good-practice document rather than a strict specification.

21. The main objective is to provide a description/specification of a generalized service infrastructure that can be re-used partially or as a whole by NSIs interested in starting SDMX projects. To that end Eurostat has been developing software and tools that facilitate the production of SDMX data and their exposure via web services technologies.

Fig. 3 A simplified view of the SDMX Reference service infrastructure
(Diagram labels: Data Collector (Eurostat); SDMX Infrastructure; DSDs; Dissemination environment; Mapping Store; Mapping Assistant; Hub; Pull Provider; SDMX Query Parser; Data Retriever; DDB; Pull Requestor; SDMX Data Generator.)

22. In Figure 3, three areas can be identified (bordered by dashed lines). The left-hand area concerns the Data Collector, e.g. Eurostat. It contains the modules that "pull" SDMX data from a Data Producer, e.g. an NSI. The right-hand area concerns the Data Producer. Only the part of the Data Producer's IT environment that concerns dissemination, the dissemination environment, is shown in Figure 3. The responsibility of the dissemination environment is, inter alia, to provide data to Data Collectors. The central area represents the software developed by Eurostat, which acts as an interface between the Data Collector and the dissemination databases in the Data Providers' environment.

23. An NSI can decide to use the SDMX service infrastructure as a whole, extend it with new modules, modify some modules, or integrate individual building blocks into its existing dissemination environment.

24. The modules of this infrastructure are described in the following paragraphs. More specific information can be found on the CIRCA website at the following URL: http://circa.europa.eu/public/irc/dsis/stne/library?l=/x-dis/tools/reference_architecture/reference_architecture/_en_1.0_&a=d

25. The Dissemination Database (DDB) is the data warehouse (or database) of the Data Provider's dissemination environment, maintained to store data ready for publication or dissemination to potential Data Collectors. In some cases the DDB may consist of files, e.g. PC-Axis files.

26. The Pull Provider is responsible for receiving an SDMX Query message and responding with an SDMX-ML data message. It serves the dynamic pull scenario and co-ordinates the building blocks used to produce the response. This component exposes the underlying functionality through a SOAP interface.

27. The Mapping Assistant (MA) is a desktop tool that allows a user to create, through a Graphical User Interface, a mapping between the structural metadata provided by an SDMX-ML Data Structure Definition (DSD) and those that reside in the Dissemination Database of an NSI dissemination environment.

28. The Mapping Assistant is designed to edit and store the mapping information in a DBMS called the Mapping Store (MS), and communicates with both the Mapping Store and the Dissemination Databases in standard SQL.

29. The Mapping Store contains the mappings between SDMX and the native format (a file or a DB schema). It is a database maintained by the Mapping Assistant in order to provide these mappings to the Data Retriever module.
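To make the role of the Mapping Store more concrete, the following Python sketch shows one way a Data Retriever could use a stored mapping to turn a DSD-level query into SQL against a local table. It is a simplified illustration, not the Eurostat implementation: the `MAPPING` structure, the table and column names, and the function `build_sql` are hypothetical, the concept identifiers (FREQ, REF_AREA, TIME_PERIOD, OBS_VALUE) are used only as illustrative DSD component names, and an in-memory SQLite database stands in for the Dissemination Database.

```python
import sqlite3

# Hypothetical Mapping Store content: DSD component -> column of the local table.
MAPPING = {
    "table": "local_data",
    "dimensions": {"FREQ": "periodicity", "REF_AREA": "country", "TIME_PERIOD": "period"},
    "measure": ("OBS_VALUE", "amount"),
}


def build_sql(mapping, filters):
    """Translate DSD-level filters into a parameterised SQL query on local columns."""
    dims = mapping["dimensions"]
    measure_id, measure_col = mapping["measure"]
    select = [f"{col} AS {dim}" for dim, col in dims.items()]
    select.append(f"{measure_col} AS {measure_id}")
    where, params = [], []
    for dim, value in filters.items():
        where.append(f"{dims[dim]} = ?")  # raises KeyError if the dimension is unmapped
        params.append(value)
    sql = f"SELECT {', '.join(select)} FROM {mapping['table']}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# In-memory stand-in for a Dissemination Database with invented figures.
ddb = sqlite3.connect(":memory:")
ddb.execute(
    "CREATE TABLE local_data (periodicity TEXT, country TEXT, period TEXT, amount REAL)"
)
ddb.executemany(
    "INSERT INTO local_data VALUES (?, ?, ?, ?)",
    [("A", "IT", "2008", 10.5), ("A", "IT", "2009", 11.0), ("A", "FR", "2009", 9.2)],
)

# A query expressed in DSD terms is answered from the local schema.
sql, params = build_sql(MAPPING, {"REF_AREA": "IT", "FREQ": "A"})
rows = ddb.execute(sql + " ORDER BY TIME_PERIOD", params).fetchall()
```

Keeping the mapping as data rather than code reflects the design described above: the dissemination database stays unchanged, and only the mapping has to be edited when a new DSD is supported.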

30. With the help of the Mapping Assistant a user creates:
(a) a DATASET;
(b) a MAPPINGSET;
(c) a TRANSCODING.

31. The DATASET defines a physical mapping from the storage schemas of Dissemination Databases or PC-AXIS files to a DSD-related schema, such that the DSD component information resides in one or more DATASET columns, represented by SQL queries. E.g., one can map one or more columns of a Dissemination Database to one or more dimensions of a DSD.

32. Alternatively, a user can write custom "select" SQL queries for the DATASET. Each SQL query belongs to one of four predefined query types, which relate columns and tables, and join tables, to the parameters of the DSD (e.g., codes, dimensions, measures).

33. The MAPPINGSET contains the logical mapping between a DATASET and a DSD. It allows the user to define relationships when the mapping between concepts used in the Dissemination Database and concepts described in the DSD is not one to one.

34. For example, the local concept named Unit in the Dissemination Database could be mapped to two concepts in the DSD: Unit of measure and Unit multiplier.

35. The TRANSCODING relates codes from code lists in the Dissemination Database to those in the DSD. This can be done directly through the GUI or by importing the transcoding rules from a CSV file. E.g., transcoding on the Frequency concept:

local codes    DSD codes
1              A
4              Q
12             M

36. For the time dimension, the tool allows several kinds of time formats to be related. E.g., if the time dimension in the Dissemination Database is in YYYY:MM:DD format, the tool can map it to YYYY for annual data or YYYY-MM for monthly data.

VII. Software maintenance and governance

37. The SDMX Reference Architecture was designed, and the related building blocks developed, with the goal of offering everything as an open source package under the EUPL licence.

38. So far Eurostat has managed both the evolutive and the adaptive maintenance. In the future, governance of versioning could be difficult, because re-use and improvement of the building blocks by NSIs could lead to a scenario in which different versions of the same software are created by different parties.

39. For the time being there is very little experience of open source software development within the statistical community, so it is hard to learn from it. The experience gained worldwide in many open source communities could help, but it may not be directly applicable to the statistical community.

40. Know-how could come from NSIs participating in two ESSnet projects.
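As a supplementary illustration of the TRANSCODING and time-format examples in paragraphs 35 and 36, the following Python sketch loads transcoding rules from CSV text and maps a local time value to an SDMX-style period. The CSV header names, the function names `load_transcoding` and `map_time`, and the sample date are assumptions made for the example; the real Mapping Assistant file layout may differ.

```python
import csv
import io

# Transcoding rules for the Frequency concept, following the example in paragraph 35.
# The header names (local_code, dsd_code) are an assumed CSV layout.
RULES_CSV = "local_code,dsd_code\n1,A\n4,Q\n12,M\n"


def load_transcoding(csv_text):
    """Read local-code to DSD-code rules from CSV text into a dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["local_code"]: row["dsd_code"] for row in reader}


def map_time(local_time, freq):
    """Map a local 'YYYY:MM:DD' value to YYYY (annual) or YYYY-MM (monthly)."""
    year, month, _day = local_time.split(":")
    if freq == "A":
        return year
    if freq == "M":
        return f"{year}-{month}"
    # Other frequencies are not covered by the example in the text.
    raise ValueError(f"no time mapping defined for frequency {freq!r}")


freq_codes = load_transcoding(RULES_CSV)
freq = freq_codes["12"]                 # local code 12 is transcoded to M
period = map_time("2010:03:01", freq)   # monthly data keeps year and month
```

Only the annual and monthly cases named in paragraph 36 are handled; any other frequency raises an error rather than guessing a format.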