WP8 Methodology. Piet Daas


1 WP8 Methodology Piet Daas

2 WP8 Methodology
The aim of this work package is to lay down a general foundation in the areas of Methodology, Quality and IT infrastructure for the use of Big Data within the European Statistical System.
SGA-2 specific
NL: Piet Daas
BG: Valentin Chavdarov, Galia Stateva
PL: Jacek Maślankowski
SL: Vesna Horvat, Manca Golmajer (Boro Nikic)
PT: Sónia Quaresma
AT: Magdalena Six
IT: Tiziana Tuoto

3 Overview
What has been done?
Recent changes
What is planned?
Important points

4 What has been done?
Workshop in Heerlen on important topics in the area of Methodology, Quality and IT when using Big Data for official statistics
Date: 25/26 April 2017
Attended by: Piet Daas (CBS, Netherlands), Owen Abbott (ONS, United Kingdom), Ciprian Alexandru (INS, Romania), Eleni Bisioti (EL, Greece), Valentin Chavdarov (BNSI, Bulgaria), Marc Debusschere (SB, Belgium), Vesna Horvat (SURS, Slovenia), Jean-Marc Museux & Fernando Reis (Eurostat), Maiki Ilves (EE, Estonia), Øyvind Langsrud (SSB, Norway), Jacek Maślankowski (GUS, Poland), António Portugal (INE, Portugal), Marco Puts & Martijn Tennekes (CBS, Netherlands), Luis Sanguiao (INE, Spain), Magdalena Six (STAT, Austria) and Dan Wu (SCB, Sweden)
Identified topics of importance for IT, Quality and Methodology
They are related

5 Priorities in Big Data IT
1. Metadata management (ontology)
2. Big Data Processing Life Cycle
3. Format of Big Data processing/unified framework (languages/libraries)
4. Datahub (access to Big Data)
5. Data source integration
6. Choosing the right infrastructure
7. List of secure and tested APIs
8. Shared libraries and standards to document
9. Data lakes (link with traditional sources)
10. Training/skills/knowledge
11. Speed of algorithms

6 Priorities in Big Data Quality
1. Coverage
2. Comparability over time
3. Processing errors
4. Process chain control
5. Linkability
6. Measurement error
7. Model errors and precision

7 Priorities in Big Data Methodology
1. Assessing accuracy
2. What should our final product look like?
3. Deal with spatial dimension
4. Changes in data sources
5. Machine learning in official statistics
6. Data linkage
7. Secure multi-party computation
8. Inference
9. Sampling
10. Who processes data-architecture
11. Unit identification problem

8 List of common issues identified across topics
IT / Quality / Methodology
Big Data processing Life Cycle; Comparability over time; Changes in Data Sources; Data source integration; Linkability; Data linkage; Coverage; Process chain control; Process chain control; Model errors & precision; Unit identification problem; Secure multi-party computation; Who processes architecture; Inference
Described in a report and a PowerPoint

9 and then

10 The solution
Anke joined. Yeeah!

11 WP8 Methodology Anke Consten & Piet Daas

12 Deliverables and Milestones (Id, description, new deadline, who)
8.5 Report of the Big Data/Data Science expert workshop & 1st internal WP meeting. Who: -
What has been planned?
8.1 Literature overview. Who: Anke/Piet
8.6 Progress report on the status of the remaining deliverables. Who: Anke/Piet
8.3 Report describing the IT infrastructure used, the accompanying processes developed and the skills needed to study or produce Big Data based official statistics. Who: PL, NL, PT, IT
8.7 Report of 2nd internal WP meeting. Who: Anke/Piet
8.2 Report describing the quality aspects identified in studies focussing on the use of Big Data for official statistics. Who: AT, BG, PL, SL, IT
8.4 Report describing the methodology of using Big Data for official statistics (principles of finding and collecting Big Data and assuring stable access, methodology of using Big Data as a single or major source of input, methodology of using Big Data as an additional data source in combination with others) and the most important questions for future studies. Who: BG, NL, SL, IT
Adjusted plans!

13 What has been done?
The official first deliverable
Start of literature overview
Relevant for Big Data and official statistics
Draft online on p/wp8_documentation.
Extended version on Date
Is a living document (will be updated by the other NSIs in WP8)
Additional info will be added (PL)

14 What is planned?
Divided the work on IT, Methodology and Quality
IT (PL, PT, NL, IT)
Quality (AT, SL, BG, PL, IT)
Methodology (BG, NL, IT, SL)
Progress report on remaining work in WP8
Date

15 IT Planning (Jacek, PL, in lead)
All 11 topics are assigned (Doodle)
Draft versions planned between &
Jacek, Piet and Sónia (and Monica)
Deadline final version

16 Methodology Planning (Valentin/Galia, BG, in lead)
Doodle (now)
A total of 11 topics, reduced at the start to:
2. What should our final product look like?
9. Sampling
7. Secure multi-party computation
6. Data linkage
8. Inference
1. Assessing accuracy
3. Deal with spatial dimension
Interlinkage with quality:
1. Metadata
2. Processing errors
3. Biased statistics
Deadline final version

17 Quality Planning (Magdalena, AT, in lead)
Doodle now
Votes on all 7 topics
Make use of Quality frameworks already developed
UN, SE,
And work done in WP1-7

18 Important points
Approach followed for WP8 is top-down
Will be linked with the experiences in WP1-7 of ESSnet
This is done bottom-up
Timing issue
At the end of SGA2 important output of many WPs for WP8
Send draft versions as early as possible

19 Questions?