Global Conference OECD Conference Centre, Paris


1 Global Conference 2009 OECD Conference Centre, Paris

2 Global Conference 2009 Capacity Building Workshop Wednesday, 21 January OECD Conference Centre, Paris

3 Overview Session A Capacity Building Workshop 1. Introduction - Stuart Feder, Chair 2. Business Case - Gabriele Becker 3. Technical Standards - René Piché 4. Content-Oriented Guidelines - Lars Thygesen 5. User Guide - John Allen 3

4 Capacity Building Workshop Follow-Up Parallel Sessions Session B: Technical Standards Gabriele Becker, Chair Conference Centre Room 1 Session C: Content-Oriented Guidelines Lars Thygesen, Chair Conference Centre Room 10 4

5 Global Conference 2009 SDMX: The Business Case Gabriele Becker Bank for International Settlements Wednesday, 21 January OECD Conference Centre, Paris Views expressed are those of the presenter and not necessarily those of the BIS.

6 Overview SDMX vision The statistical process Components of the SDMX framework Analyse the relationships Conclusions 6

7 The SDMX vision Facilitate data and metadata exchange Make efficient use of technologies and standards Reduce reporting burden Enhance availability of statistical data and metadata for the users Data reporting = data dissemination = data sharing 7

8 Statistical process chain in an NSO /NCB Requirements specification Data collection design Data collection from reporting agents Data quality control Data compilation Data dissemination (internal, respondents, public, international organisations) Data analysis (data discovery, navigation, search) 8

9 SDMX components Information model for data, metadata and the data exchange process Two syntaxes to exchange data and metadata (EDI, XML) SDMX registry standards and registry interfaces SDMX tools Content-Oriented Guidelines Cross-Domain Concepts Metadata Common Vocabulary How do these relate to the statistical process?

10 The statistical process and SDMX Information Model EDI and XML syntax expressions and tools SDMX registry and registry interfaces 1 Requirements specification 2 Data collection design 3 Data collection from reporting agents 4 Data quality control Content-Oriented Guidelines 5 Data compilation 6 Data dissemination (to respondents, public, international organisations) 7 Data analysis (data discovery, navigation, search within the organisation) 10

11 Some initial conclusions The information model is everywhere in the statistical process Technologies (syntaxes, tools and registry) are enablers when it comes to exchanging or sharing data and metadata Content-Oriented Guidelines are everywhere as well How can SDMX impact the individual steps in the statistical processing chain?

12 Requirements specification and data collection design Information model Forces us to consider metadata from the start Helps to organise the data when specifying new data collections Applying the (same) information model from the start leads to economies of scale Learning effect across the statistical organisation(s) First data structure definition is the most difficult. SDMX Tools to help defining data structures Content Guidelines Potential re-use of existing code lists Use of registries to share code-lists and structures 12

13 Data collection from reporting agents Standard formats (SDMX-EDI or SDMX-ML) Reflect the Information Model and Content-Oriented Guidelines Support collection of data and metadata Data structures and code lists exchanged in computer readable formats Support for automation Push or pull model Benefits: speed up reporting and reduce errors Stepwise implementation 13

14 Stepwise implementation Currently push model prevails Reporting agents send data (files) to the collecting agency Can start with simple tools (eg EXCEL SDMX creator) Automation possibilities for reporting agents and for collection agency already for push model Generalisation of SDMX file creation Creation of SDMX-EDI or SDMX-ML out of the database Conversion between SDMX-EDI and SDMX-ML via tools Move to pull model and use of SDMX registry 14

15 Data quality control and compilation Standard reporting formats allow for automated checking Check formulas may be derived from the code lists eg in the case of hierarchical code lists SDMX supports move towards metadata driven processing Generic processing systems based on information (data) model Decreasing marginal cost for new data collections 15
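The idea that check formulas can be derived from hierarchical code lists can be sketched as follows. This is a minimal illustration, not part of any SDMX tool: the code list `HIERARCHY`, the function `check_additivity` and all codes are hypothetical, and it assumes that each parent code should equal the sum of its children.

```python
from typing import Dict, List

# Hypothetical hierarchical code list: parent code -> child codes.
HIERARCHY: Dict[str, List[str]] = {
    "TOTAL": ["A", "B"],
    "A": ["A1", "A2"],
}


def check_additivity(
    observations: Dict[str, float],
    hierarchy: Dict[str, List[str]],
    tolerance: float = 1e-9,
) -> List[str]:
    """Return the parent codes whose value differs from the sum of their children."""
    failures = []
    for parent, children in hierarchy.items():
        # Skip checks that cannot be evaluated because a value is missing.
        if parent not in observations or any(c not in observations for c in children):
            continue
        expected = sum(observations[c] for c in children)
        if abs(observations[parent] - expected) > tolerance:
            failures.append(parent)
    return failures


reported = {"TOTAL": 10.0, "A": 6.0, "B": 4.0, "A1": 3.0, "A2": 4.0}
print(check_additivity(reported, HIERARCHY))
```

Because the checks are generated from the code list rather than written by hand, adding a new data collection only requires supplying its hierarchy, which is one way to read the slide's point about decreasing marginal cost.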

16 Data dissemination Commonalities with data collection Changed roles: collector (NSO / NCB) becomes provider to international organisations or the public Same SDMX formats as used for data collection Disseminate Data and metadata Data (and meta data) structure definitions Push and pull models, SDMX queries Use of SDMX registry Automation Benefits: Economies of scale and better service for users 16

17 Data analysis: discovery/navigation/search Any navigation system is metadata driven SDMX information model forces us to be clear about the metadata and data structures from the start Generalised navigation system based on the model can easily incorporate new data collections (= new data structures based on the information model) Registry provides a mechanism to find data from remote sources Generic tools (stylesheets) for easy data rendering Benefits: enhanced access to data and better metadata

18 Conclusions: Information model SDMX is not only for IT experts SDMX benefits can only be fully achieved if statistical experts apply the information model in their work Define data and metadata structures for their statistics SDMX information model and content guidelines can Affect the complete statistical processing chain Influence the design of our processing systems Influence the design of our navigation systems 18

19 Conclusions: technical standards and tools Technical standards allow efficient and generalised application of SDMX information model Technology (eg registry technology) is an enabler to make SDMX work better A stepwise introduction of SDMX is possible 19

20 SDMX: Benefits Speed up movement of data through the processing chain Reduce time to users for data and metadata Increase level of automation Reduce risk of errors Enhance facilities for metadata exchange Easier to ship metadata with the data Better understanding of data Reporting = dissemination = data sharing Reduction of reporting burden 20

21 Practical steps Review SDMX information model Sample data structures used by others Relate to your own data model as implemented in your systems There will be commonalities Experiment with the SDMX tools to understand and build data structures for your own data Review the technical standards and formats Understand relationship between the information model and its practical implementation as SDMX-EDI or SDMX-ML Review the content-oriented guidelines Understand benefits of re-usable cross-domain concepts and code lists for internal and external exchange and navigation (interoperability) Assess impact on your own institution's processing and exchanges / sharing with users/public and other institutions

22 Towards SDMX Conformance Apply information model for data structure definitions Use SDMX-EDI or SDMX-ML for data and metadata exchange Use SDMX cross-domain concepts as much as possible when defining data and metadata structures Publish data structure definitions Re-use existing (internationally shared) code lists Offer data in SDMX-ML on (public) website Offer data via SDMX registry / web services Contribute experiences and ideas to strengthen SDMX 22

23 Help Make SDMX Increasingly Useful Let us hear about your business case: Find out more about SDMX at our website: 23

24 Thank you! Gabriele Becker Head, Statistical Information Systems Monetary and Economic Department Bank for International Settlements Basle, Switzerland 24

25 Global Conference 2009 The Technical Standards René Piché International Monetary Fund OECD, Paris, 21 January 2009 The views expressed in this presentation are those of the author and should not be attributed to the IMF, its Executive Board, or its management.

26 Outline The SDMX Technical Specifications within the Overall Standards Package Functional areas of the SDMX Technical Specifications Data structures and data formats Metadata structures and formats Registry-based web-services architecture Using SDMX: The Toolkit Approach SDMX and Legacy Systems Tools and Support 26

27 The SDMX Standards Package Covers statistical issues SDMX Content-Oriented Guidelines (Cross-Domain Concepts, Subject-Matter Domains, Metadata Common Vocabulary) Covers the use of IT SDMX Technical Specifications (Data, Metadata, Registry Architecture) The SDMX Technical Specifications are the technology foundation on which statistical issues are addressed. 27

28 Differences The Technical Standards Apply to any statistical concepts or subject-matter They are only concerned with IT Go through an approval process also involving the International Organization for Standardization (ISO) and change infrequently The Content-Oriented Guidelines Recommend common statistical concepts Offer shared classification of topics for exchange Are concerned with standard approaches across statistical domains to improve comparability Are maintained by SDMX and will change as frequently as needed involving open review periods to provide an opportunity for public comment 28

29 Why? This is an intentional design, based on experience of standards development in other areas (e-business, technical publishing, etc.) It is easier to agree on the solution to statistical issues when there is a shared approach to the use of IT 29

30 Model-Based Approach Similar to other modern standards, SDMX is model-based All of the SDMX Technical Standards are derived from the SDMX Information Model This is a conceptual model of processes and interactions within statistical exchange It makes SDMX more consistent and easier for developers to work with The following slide shows a high-level schematic of this model as an example Don't try to learn it from this presentation!

31 SDMX Information Model Schematic [Diagram. Elements shown: Structure Maps (structure and code list maps); Data or Metadata Structure Definition; Data or Metadata Flow (uses a specific data/metadata structure; can be linked to categories in multiple category schemes; can get data/metadata from multiple data/metadata providers); Category Scheme (comprises subject or reporting categories); Category (can have child categories); Data Provider (publishes/reports data/metadata sets; can provide data/metadata for many data/metadata flows using the agreed data/metadata structure); Provision Agreement (registers the existence of data and metadata: URL, registration date etc.); Data or Metadata Set (conforms to the business rules of the data/metadata flow).]

32 Functional Areas: Data SDMX provides a shared way of: Describing the structure of any aggregate or time series data (concepts, codelists, dimensions, etc.) This is known as a Data Structure Definition (or "Key Family") It can be expressed as an XML file or as an EDIFACT ("SDMX-EDI") file Formatting the data As an XML file As an EDIFACT ("SDMX-EDI") file Creating XML schemas These describe the XML data files and are needed by applications They can be automatically generated from the Data Structure Definition
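To make the notion of a Data Structure Definition more concrete, here is a minimal sketch of how concepts, code lists and dimensions fit together. The classes `Dimension` and `DataStructureDefinition`, the identifier `DSD_DEMO` and the codes are all hypothetical simplifications; the real SDMX Information Model is considerably richer (attributes, measures, versioning, maintenance agencies).

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Dimension:
    concept: str        # the statistical concept, e.g. frequency
    codelist: Set[str]  # the codes allowed for this concept


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: List[Dimension]

    def validate_key(self, key: Dict[str, str]) -> List[str]:
        """Return a list of problems found in a series key (empty if valid)."""
        errors = []
        known = {d.concept for d in self.dimensions}
        for dim in self.dimensions:
            if dim.concept not in key:
                errors.append(f"missing dimension {dim.concept}")
            elif key[dim.concept] not in dim.codelist:
                errors.append(f"invalid code {key[dim.concept]!r} for {dim.concept}")
        for concept in key:
            if concept not in known:
                errors.append(f"unknown dimension {concept}")
        return errors


# Hypothetical structure with two dimensions.
dsd = DataStructureDefinition(
    id="DSD_DEMO",
    dimensions=[
        Dimension("FREQ", {"A", "Q", "M"}),
        Dimension("REF_AREA", {"FR", "DE"}),
    ],
)
print(dsd.validate_key({"FREQ": "A", "REF_AREA": "FR"}))
print(dsd.validate_key({"FREQ": "X", "REF_AREA": "FR"}))
```

The same structure object can drive validation, schema generation and navigation, which is the sense in which the structure definition is written once and reused.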

33 An Improvement Many applications today exchange data with CSV files or in similar formats This requires knowledge of exactly how the file is organized But the file itself does not necessarily have this information With SDMX data files, you always know exactly what the structure is Data files reference their DSDs Data files can be automatically validated to catch errors 33
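The contrast with CSV can be illustrated with a small sketch in which the data file names the structure it follows, so a receiver can look that structure up and validate the file automatically. The XML below is a deliberately simplified, made-up layout and is not the SDMX-ML schema; the attribute `structureRef`, the identifier `DSD_DEMO` and the function `validate` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set, Tuple

# Simplified stand-in for a data file that references its structure definition.
SAMPLE = """<DataSet structureRef="DSD_DEMO">
  <Obs FREQ="A" REF_AREA="FR" TIME_PERIOD="2008" value="1.5"/>
  <Obs FREQ="Q" REF_AREA="ZZ" TIME_PERIOD="2008" value="2.0"/>
</DataSet>"""

# Hypothetical store of known structures: id -> concept -> allowed codes.
STRUCTURES: Dict[str, Dict[str, Set[str]]] = {
    "DSD_DEMO": {"FREQ": {"A", "M", "Q"}, "REF_AREA": {"FR", "DE"}},
}


def validate(
    xml_text: str, structures: Dict[str, Dict[str, Set[str]]]
) -> List[Tuple[int, str, Optional[str]]]:
    """Return (observation number, concept, offending value) for each invalid code."""
    root = ET.fromstring(xml_text)
    # The file itself says which structure it conforms to.
    dsd = structures[root.get("structureRef")]
    errors = []
    for i, obs in enumerate(root.findall("Obs"), start=1):
        for concept, codes in dsd.items():
            if obs.get(concept) not in codes:
                errors.append((i, concept, obs.get(concept)))
    return errors


print(validate(SAMPLE, STRUCTURES))
```

A CSV file with the same values would carry no such reference, so the receiver would have to know the column layout and allowed codes from some side agreement.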

34 Functional Areas: Metadata Some information is not data, but it is very important to exchange For example, Eurostat's SDMX Metadata Structure (ESMS) and the IMF's SDDS and GDDS Many organizations collect and report quality metadata SDMX has an approach to metadata which is similar to how it handles data Metadata Structure Definitions describe how metadata files are structured (in an XML file) They can use any metadata concepts or representations SDMX provides XML formats for exchanging metadata files XML schemas can be automatically generated

35 An Improvement Many applications exchange metadata as Word documents or in similar formats These can be tricky to process, because authors can make mistakes in templates These documents can be difficult to validate With SDMX Metadata files, you always know exactly what the structure is Metadata files always reference their Metadata Structure Definitions They can be automatically validated to catch errors 35

36 Functional Areas: Registry-Based Web-Services Architecture SDMX provides standards to support web services ("service-oriented architecture") Web services are becoming the normal way to write applications They provide a good way to automate processes and to lower the cost of developing and owning applications SDMX provides a standard for SDMX Registry Services This is an important part of a web-services architecture

37 SDMX Registry The SDMX Registry acts as a shared repository (a database) where applications can discover What data and metadata is available, and where to find it What the data and metadata structures are (concepts, codelists, DSDs, MSDs, etc.) Who is providing what data to which data sets, and which metadata to which metadata sets An SDMX Registry is used on a network and can be shared, for example Among the applications within an organization (intranet), or Among the counterparties within a statistical domain (Internet) The SDMX Registry facilitates automation of statistical collection and processing It can send out notifications to applications when new data or metadata, or new structures become available 37
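The registry behaviour described here (registering where data can be found, querying for it, and notifying subscribers) can be sketched with a small in-memory class. This is a toy model under stated assumptions: `ToyRegistry`, its method names, the dataflow name `BOP` and the example URL are all hypothetical, and a real SDMX Registry exposes these functions through the standardised registry interfaces over a network rather than as Python calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class ToyRegistry:
    """In-memory stand-in for a registry: it stores locations, not the data itself."""

    def __init__(self) -> None:
        self._registrations: Dict[str, List[str]] = defaultdict(list)
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, dataflow: str, callback: Callback) -> None:
        """Ask to be notified when new data is registered for a dataflow."""
        self._subscribers[dataflow].append(callback)

    def register(self, dataflow: str, url: str) -> None:
        """Record where a data set can be found, then notify subscribers."""
        self._registrations[dataflow].append(url)
        for callback in self._subscribers[dataflow]:
            callback(dataflow, url)

    def query(self, dataflow: str) -> List[str]:
        """Return the known locations of data for a dataflow."""
        return list(self._registrations.get(dataflow, []))


registry = ToyRegistry()
received: List[tuple] = []
registry.subscribe("BOP", lambda flow, url: received.append((flow, url)))
registry.register("BOP", "https://example.org/bop-2008.xml")
print(registry.query("BOP"))
print(received)
```

The collecting application in this sketch never polls: it learns about new data through the callback and then fetches it from the registered location, which is the pull model in miniature.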

38 SDMX Registry/Repository [Diagram. Three components accessed through the SDMX Registry Interfaces: a REGISTRY of Data Sets/Metadata Sets, which indexes data and metadata (Register, Query); a REPOSITORY of Provisioning Metadata, which describes data and metadata sources and reporting processes (Submit, Query); a REPOSITORY of Structural Metadata, which describes data and metadata structures (Submit, Query).]

39 SDMX Registry/Repository: Subscription/Notification [Diagram. Same components as on the previous slide (REGISTRY of Data Sets/Metadata Sets; REPOSITORY of Provisioning Metadata; REPOSITORY of Structural Metadata; SDMX Registry Interfaces with Register, Submit and Query), with Subscription/Notification added: applications can subscribe to notification of new or changed objects.]

40 SDMX Toolkit Approach The SDMX Technical Standards are designed to support a toolkit approach You can choose to use some features and not use other features For example, you may only wish to use the data structures and formats Because of the potential complexity of web services, not all SDMX applications will use the registry-based SDMX architecture It does maximize efficiency through the data sharing model, but it is not necessary in order to realize increasingly significant benefits from using SDMX

41 SDMX and Legacy Systems SDMX is designed as a standard for improving the exchange of data (reporting, collection, dissemination) It does not dictate how systems which are internal to an organization must function Although it does provide a useful model for designing internal systems It is expected that organizations can continue to use their existing systems and immediately start to use the SDMX standards for the purposes of exchange, sharing and dissemination This minimizes the cost of implementing SDMX, because you need only be able to import/export the SDMX files And by providing practical benefits this can encourage internal use of SDMX 41

42 SDMX Implementations and Tools The SDMX website is the first place to go for information and links to SDMX-related implementations and a growing number of free tools for working with SDMX: 42

43 Implementation Page on SDMX website 43

44 Tools Page on SDMX website 44

45 Questions? 45

46 Global Conference 2009 Content-Oriented Guidelines Lars Thygesen OECD Wednesday, 21 January 2009 OECD Conference Centre, Paris 46

47 SDMX Technical Standards Information model Syntax for the exchange of data and metadata (EDI, XML) SDMX registry and other IT tools We can exchange data efficiently, but what if each domain and organisation uses its own concepts? E.g. a country reporting to Eurostat, IMF, OECD, etc.

48 Content-Oriented Guidelines: Objectives Recommend practices for interoperable data and metadata sets Harmonise concepts and terminology Avoid stove-pipes in isolation SDMX Technical Standards and Content-Oriented Guidelines can be used independently But they are supported by a well-known technical framework 48

49 History March 2006 version - review January 2007 version Public review: hundreds of proposals, NSOs and Central Banks Intensive discussions 2nd half of version published 14 January 2009 Future: More input from users More discussions More common codelists

50 What COG is Cross-Domain Concepts + Code lists short list of statistical concepts relevant to many statistical domains (e.g. frequency, observation status, time format, unit of measure, timeliness) Statistical Subject-Matter Domains list of subject-matter domains or themes (e.g. demographic and social statistics, economic statistics, environment) coordination of existing experts/groups concerned with domain-specific data (e.g. national accounts, balance of payments, external debt) Metadata Common Vocabulary common cross-domain statistical terminology; nomenclature used in the SDMX content-oriented guidelines

51 What COG isn't Domain-specific guidelines and nomenclatures will be developed by domains reference to vocabularies, e.g. Part of ISO Mandatory for SDMX exchange but most advisable Imposing to use SDMX codes in own databases only for exchange mapping on both sides
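The point that shared codes are needed only for exchange, with a mapping on each side, can be sketched as a pair of lookup tables. The internal labels (`yearly`, `quarterly`, `monthly`), the exchange codes and the function names are hypothetical; the sketch assumes a one-to-one correspondence between local and exchange codes.

```python
from typing import Dict

# Hypothetical internal frequency labels mapped to illustrative exchange codes.
LOCAL_TO_EXCHANGE: Dict[str, str] = {"yearly": "A", "quarterly": "Q", "monthly": "M"}
# The inverse table, used by the receiving side or when importing.
EXCHANGE_TO_LOCAL: Dict[str, str] = {code: label for label, code in LOCAL_TO_EXCHANGE.items()}


def export_record(record: dict) -> dict:
    """Translate a locally coded record into exchange codes without touching the database."""
    out = dict(record)
    out["FREQ"] = LOCAL_TO_EXCHANGE[record["FREQ"]]
    return out


def import_record(record: dict) -> dict:
    """Translate an exchanged record back into local codes."""
    out = dict(record)
    out["FREQ"] = EXCHANGE_TO_LOCAL[record["FREQ"]]
    return out


local = {"FREQ": "yearly", "value": 1.0}
exchanged = export_record(local)
print(exchanged)
print(import_record(exchanged) == local)
```

The internal database keeps its own codes throughout; only the import and export steps know about the shared code list.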

52 How are COG used: Cross-Domain Concepts 52

53 How are COG used: Cross-Domain Code lists A limited set of SDMX code lists, e.g. on observation status, frequency, etc. Recommended code descriptions listed together with recommended SDMX codes Only a certain harmonisation of these structural metadata is achieved at this stage; more harmonisation as soon as possible In addition: improvement of SDMX code lists needed for next release of COG 53

54 How are COG used: Statistical subject-matter domains Standard scheme to which similar national or international lists can be mapped to facilitate the exchange of data and metadata Framework which alleviates the searching of data and metadata e.g. on SDMX registries Navigation aid for identification and organization of corresponding domain groups playing an active role in SDMX 54

55 Metadata Common Vocabulary The MCV is a common vocabulary of metadata terms: wider than the SDMX Cross-domain Concepts list Improved visibility for existing definitions (either authored as SDMX or taken from existing authoritative sources where possible to avoid a proliferation of standard terminologies) Possibility of mapping different metadata systems, including those at national level, independently from any specific metadata model Support to standardisation and consistency of metadata compiled Support to XML structures and web services for searching and comparing data and metadata with minimum need to determine a semantic equivalence 55

56 Mapping of Metadata Frameworks Along with the website availability of COG you will find mappings involving SDMX metadata concepts and the IMF Data Quality Assessment Framework (DQAF) Eurostat SDMX Metadata Structure (ESMS) OECD Metastore There are four mappings: SDMX to DQAF > ESMS > OECD DQAF to SDMX > ESMS > OECD ESMS to SDMX > DQAF > OECD OECD to SDMX > ESMS > DQAF 56

57 Metadata Framework Mapping: Example 57

58 Uses of Cross-Domain Concepts 58

59 What we hope to achieve When we use a term, all know its meaning Dimensions in a data cube can be standardised across domains You don't receive 200 different time formats Metadata items in your local databases can be updated from metadata in the provider's base

60 Don't miss Special capacity building session on COG Practical implementations To follow in Room CC10

61 Global Conference Capacity Building Workshop SDMX User Guide John Allen Eurostat Wednesday, 21 January 2009 OECD Conference Centre, Paris 61

62 Why an SDMX User Guide? The short answer «explanations and guidance to users and potential users of SDMX» shorter and easier to understand than the standards documents a «cookbook» of solutions based on SDMX 62

63 Short history of the User Guide Before 2007 Various tutorials «Getting started» chapter based on tutorials from Eurostat and ECB «Tutorial: Data Structure Definitions» in SDMX standards documentation 2007 SDMX User Guide Version Working Draft compiled by SDMX Advocacy Group (secretariat members from BIS, ECB, Eurostat, ECB) 63

64 Issues for the User Guide more explanation of how to make use of SDMX from the perspective of statisticians in national statistical agencies balanced coverage of standards, guidelines, architecture, tools and we have to allow for the fact that SDMX users have very different starting points in terms of existing knowledge have different motivations Can one User Guide satisfy everyone? 64

65 New approach for 2009 User Guide with two «threads» «core thread» based on real-world use cases - things which statisticians do (and no XML.) parallel «tutorial thread» which gives the more technical explanations readers have choices: they can read through the «core thread» and then divert or go back into the more technical explanations when they need them 65

66 Original plan (OECD meeting, April 2008) «core thread» What are structural metadata What are reference metadata Structuring data and metadata for reusability and interoperability Generating SDMX-structured data Exchanging data and metadata using SDMX Publishing data and metadata using SDMX Uses for an SDMX Registry «tutorial thread» SDMX Information Model SDMX message types DSDs and MSDs Using SDMX tools and standards within a pull-based architecture The SDMX hub approach Building and operating an SDMX Registry Obtaining and using SDMX tools XML-based technologies used by SDMX Differences between GESMES and SDMX-ML 66

67 User Guide release PART A: CORE THREAD A.1 What is SDMX A.2 How does SDMX fit with statistical work (the statistical value chain) A.3 The SDMX Information Model: Metadata Structures A.4 The SDMX Information Model: Data Structures A.5 The SDMX Cross-Domain concepts A.6 Publishing data and metadata using SDMX A.7 Uses for an SDMX Registry 67

68 User Guide release PART B: TUTORIAL THREAD B.1 SDMX Technical Overview B.2 Obtaining and using SDMX tools B.3 XML-based technologies used by SDMX SDMX message types B.4 Differences between SDMX-EDI and SDMX-ML B.5 SDMX message types for data B.6 SDMX message types for reference metadata B.7 SDMX architectures using the pull mode for data sharing B.8 Building and operating an SDMX Registry B.9 Data Structure Definitions: a tutorial B.10 Guidance on setting up Data Structure Definitions 68

69 What the User Guide can do for you. The User Guide can... explain SDMX (from the basics to some advanced topics) tell you where to look for more (for example, which document in the standards package) give you examples based on real experience of people working with SDMX (the «SDMX cookbook»)

70 and what you can do for the User Guide You can... tell us what else needs to be explained tell us if something is wrong or not well explained provide more contributions based on your experience applying SDMX standards and guidelines The SDMX User Guide is a collaborative project 70

71 and where to find it 71