The ESSnet Big Data. Peter Struijs. DGINS 2018 Bucharest 10 October 2018

2 Scheveningen Memorandum on Big Data
- Examine the potential of big data sources for official statistics
- Official statistics big data strategy as part of wider government strategy
- Address privacy and data protection
- Collaboration at European and global level
- Address need for skills
- Partnerships between different stakeholders (government, academics, private sector)
- Developments in methodology, quality assessment and IT
- Adopt action plan and roadmap for the ESS

3 Envisaged Benefits of Using Big Data for Official Statistics
- Faster data production
- Higher level of detail, e.g. geographical detail and frequency
- More data
- More flexible response to user needs
- Increased efficiency
- Stay relevant

4 Framework of the ESSnet BD I
- Framework Partnership Agreement: January 2016 to May 2018
- … countries (22 partners)
- Two Specific Grant Agreements:
  - SGA-1: February 2016 to July 2017
  - SGA-2: January 2017 to May 2018

5 Partner countries of the ESSnet BD I

6 Pilots of the ESSnet BD I
List of pilot projects:
- Web scraping (2 work packages): job vacancies; enterprise characteristics
- Smart meters: electricity consumption; temporarily vacant dwellings
- Automatic Identification System (AIS): vessel identification data
- Mobile phone data: preparing for access to data
- Early estimates: various domains
- Multiple domains: population, tourism / border crossing, agriculture

7 Subdivision of Pilots into Phases
1. Data access: conditions; partnerships
2. Data handling: production criteria; micro versus aggregated data; visualisation
3. Methodology and technology: methodology for long-lasting statistics; process design
4. Statistical output: examples of existing and new outputs; potential users; comparison with current estimates (quality, timeliness, level of detail)
5. Future perspectives: applicability in the ESS; future production process; exploration of further possibilities of using and combining (big) data sources

8 WP 1: Web Scraping / Job Vacancies
- WP leader: UK
- Partners: Belgium, Denmark, France, Germany, Greece, Italy, Portugal, Sweden, Slovenia
- Data access: job portals
- Data handling: legal and technical aspects, test web scraping
- Methodology for output production: from semi-structured to structured data
- Future perspectives: web scraping of enterprise websites, methodology for future production, exploration of new products

9 Online Job Vacancy Data Landscape

10 Model for Measuring Job Vacancies
Diagram: target population (all job vacancies), with categories for ghost vacancies; advertised on a job portal; employing business is identifiable; advertised through an agency; advertised on enterprise website

11 Approach to Data Integration
Enterprises in the Business Register (A to J) fall into four groups:
1. Survey and online (A, B, C): scale online data to survey estimates (matching; scaling factors, by NACE?), so that the total equals the survey estimate
2. Online only (D, E): apply the scaling factors to online data
3. Survey only (F, G): use survey estimates
4. Neither survey nor online (H, I, J): modelled estimates
Inputs: survey estimates (A, B, C, F, G) and counts from online sources (A, B, C, D, E). Output: an integrated data set covering all enterprises A to J.
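
The four-way split above can be illustrated with a short Python sketch. This is an illustration built on assumptions, not the WP 1 production method. Scaling factors are computed per NACE section, even though the slide itself marks the NACE breakdown with a question mark. Enterprise identifiers and vacancy counts are held in plain dicts. `model` is a hypothetical placeholder for whatever model would supply estimates for group 4.

```python
from collections import defaultdict


def scaling_factors(survey, online, nace):
    """Ratio of survey estimate to online count per NACE section,
    computed over enterprises present in both sources (group 1)."""
    survey_total = defaultdict(float)
    online_total = defaultdict(float)
    for ent in survey.keys() & online.keys():  # matched enterprises
        survey_total[nace[ent]] += survey[ent]
        online_total[nace[ent]] += online[ent]
    return {k: survey_total[k] / online_total[k]
            for k in survey_total if online_total[k] > 0}


def integrate(register, survey, online, nace, model):
    """Build one value per register enterprise following the four groups.

    register: enterprise ids; survey/online: id -> vacancy count;
    nace: id -> NACE section (assumed known for every enterprise);
    model: hypothetical callable returning a modelled estimate for an id.
    """
    factors = scaling_factors(survey, online, nace)
    result = {}
    for ent in register:
        factor = factors.get(nace[ent])
        if ent in online and factor is not None:
            # Groups 1 and 2: online count scaled to the survey level
            result[ent] = online[ent] * factor
        elif ent in survey:
            # Group 3 (or a matched enterprise without a usable factor)
            result[ent] = survey[ent]
        else:
            # Group 4 (or online-only with no factor for its NACE section)
            result[ent] = model(ent)
    return result
```

By construction, the scaled online counts of the matched enterprises add up to the survey estimate within each NACE section. This corresponds to the "total equals the survey estimate" condition of step 1.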

12 WP 2: Web Scraping / Enterprise Characteristics
- WP leader: Italy
- Partners: Bulgaria, Netherlands, Poland, Sweden, UK
- Data access: inventory of target enterprises, URLs; legal and privacy aspects
- Data handling: use cases; actual web scraping
- Testing of methods and techniques: proof of concept for selected use cases; build and apply a predictor for estimates of enterprise characteristics

13 Logical Reference Architecture

14 Towards a List of URLs
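
One way to move from an inventory of enterprises towards a list of URLs is to score candidate websites, such as search engine results, against business register data. The sketch below is hypothetical. Its three features are a name token in the domain, the VAT number on the page and the postcode on the page. These features, their weights and the acceptance threshold are illustrative assumptions, not the procedure used in WP 2. Fetching the candidate pages is out of scope, so the pages are passed in as `(url, page_text)` pairs.

```python
import re
from urllib.parse import urlparse


def normalise(text):
    """Lower-case and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def score_candidate(enterprise, url, page_text):
    """Score one candidate URL for a register enterprise.
    Features and weights are illustrative assumptions."""
    score = 0
    domain = normalise(urlparse(url).netloc)
    tokens = [normalise(t) for t in enterprise["name"].split()]
    # +1 if a name token of at least 4 characters occurs in the domain
    if any(len(t) >= 4 and t in domain for t in tokens):
        score += 1
    # +2 if the VAT number appears on the page (ignoring spaces and punctuation)
    if normalise(enterprise["vat"]) in normalise(page_text):
        score += 2
    # +1 if the postcode appears on the page
    if enterprise["postcode"].lower() in page_text.lower():
        score += 1
    return score


def best_url(enterprise, candidates, threshold=2):
    """Highest-scoring URL, or None if no candidate reaches the threshold."""
    scored = [(score_candidate(enterprise, url, text), url) for url, text in candidates]
    if not scored:
        return None
    best_score, url = max(scored)
    return url if best_score >= threshold else None
```

The threshold keeps a directory page that merely mentions the enterprise name from being accepted as its website.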

15 WP 3: Smart Meters
- WP leader: Estonia
- Partners: Austria, Denmark, Italy, Portugal, Sweden
- Data access: availability of smart meters, legal aspects
- Data handling: coverage assessment, production of cleaned datasets
- Methodology and techniques: linkage with administrative data; methodology for electricity consumption of businesses and households; also seasonally vacant living spaces
- Future perspectives: potential new products, feasibility of using aggregated data

16 Grid Structure in Sweden

17 Estonian Data Structure
Estonian data structure: 4 main tables
- Metering data: main table with hourly consumption
- Metering points: location
- Agreements: contract info
- Customers: contract holder information
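
The four tables can be linked through a metering point identifier and a customer identifier. The Python sketch below assumes those keys and the field names (`point_id`, `customer_id`, `kind`), all of which are hypothetical. It also assumes one agreement per metering point, whereas real contracts change over time. The sketch totals hourly consumption by customer type (households versus businesses). It also lists metering points whose total stays at or below a chosen threshold, as a crude first screen for possibly vacant dwellings. That threshold is an assumption, not a WP 3 parameter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reading:        # metering data: one hourly consumption value
    point_id: str
    hour: str         # e.g. "2017-01-01T00"
    kwh: float


@dataclass
class MeteringPoint:  # metering points: location
    point_id: str
    location: str


@dataclass
class Agreement:      # agreements: links a metering point to a contract holder
    point_id: str
    customer_id: str


@dataclass
class Customer:       # customers: contract holder information
    customer_id: str
    kind: str         # hypothetical coding: "household" or "business"


def consumption_by_kind(readings, agreements, customers):
    """Total kWh per customer kind; readings without an agreement are skipped."""
    holder = {a.point_id: a.customer_id for a in agreements}
    kind = {c.customer_id: c.kind for c in customers}
    totals = defaultdict(float)
    for r in readings:
        customer_id = holder.get(r.point_id)
        if customer_id is not None:
            totals[kind[customer_id]] += r.kwh
    return dict(totals)


def possibly_vacant(readings, points, max_kwh):
    """(point_id, location) of points with total consumption <= max_kwh.
    Points with no readings at all count as 0 kWh and are also listed."""
    total = defaultdict(float)
    for r in readings:
        total[r.point_id] += r.kwh
    return sorted((p.point_id, p.location) for p in points
                  if total[p.point_id] <= max_kwh)
```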

18 WP 4: AIS Data
- WP leader: Netherlands
- Partners: Denmark, Greece, Norway, Poland
- Data access: data availability (in particular EMSA)
- Data handling: processing and storage, aimed at linking with data from port authorities, traffic analyses, journeys
- Methodology and techniques: for linking with data from port authorities and traffic analyses; estimating emissions
- Future perspectives: qualitative cost-benefit analysis
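
Deriving journeys from AIS position reports involves discarding unusable records and summing distances between consecutive positions. In the Python sketch below, the record layout (`mmsi`, `ts`, `lat`, `lon`) is an assumption. The validity checks are deliberately simple and are not the error list of WP 4. An MMSI must have nine digits, and positions outside the valid latitude/longitude range are dropped. That range check also removes the AIS "not available" values of 91 degrees latitude and 181 degrees longitude. Distance uses the haversine formula on a spherical Earth. Linking with port authority data and estimating emissions are not covered.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def is_valid(report):
    """Simple plausibility checks on one AIS position report (assumed layout)."""
    return (len(str(report["mmsi"])) == 9
            and -90 <= report["lat"] <= 90
            and -180 <= report["lon"] <= 180)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_length_km(reports, mmsi):
    """Distance sailed by one vessel: valid reports sorted by time,
    distances between consecutive positions summed."""
    track = sorted((r for r in reports if r["mmsi"] == mmsi and is_valid(r)),
                   key=lambda r: r["ts"])
    return sum(haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
               for a, b in zip(track, track[1:]))
```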

19 Data Received

20 Error Types and Causes

21 Possible Tools

22 WP 5: Mobile Phone Data
- WP leader: Spain
- Partners: Belgium, Finland, France, Germany, Italy, Netherlands, Romania, UK
- Data access: data availability (workshop with MNOs)
- Data handling: investigation of IT tools and of the aggregation level needed
- Statistical outputs: describe a statistical output to be presented to an MNO in order to carry out a pilot
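
Deciding on the aggregation level for mobile phone data means choosing the spatial and temporal units at which counts leave the operator, together with a disclosure rule. The Python sketch below is hypothetical. It assumes events already reduced to `(device_pseudonym, area, hour)` tuples and counts distinct devices per area and hour. Cells below an illustrative minimum count are suppressed. The sketch counts devices, not people, and makes no correction for market share or for multiple devices per person.

```python
from collections import defaultdict


def aggregate_devices(events, min_count=10):
    """Distinct devices per (area, hour); cells below min_count become None.

    events: iterable of (device_pseudonym, area, hour) tuples (assumed layout).
    min_count: illustrative disclosure threshold, not an agreed ESS value.
    """
    devices = defaultdict(set)
    for device, area, hour in events:
        devices[(area, hour)].add(device)
    return {cell: (len(ids) if len(ids) >= min_count else None)
            for cell, ids in devices.items()}
```

Keeping suppressed cells as `None` rather than dropping them preserves the information that the cell was observed.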

23 Processing Steps of Mobile Phone Data

24 WP 6: Early Estimates
- WP leader: Slovenia
- Partners: Finland, Italy, Netherlands, Poland, Portugal
- Data access: sources for the consumer confidence index, nowcasts of turnover and early estimates
- Data handling: technical requirements; deployment of a collection system
- Methodology and techniques: includes feasibility of linking administrative and other existing sources
- Future perspectives: calculation of the consumer confidence index and nowcasts of turnover; pilots for combining sources for early estimates

25 Domains of First Interest
- Tourism
- Population mobility
- Health statistics
- Agriculture
- Quick and dirty statistics (all domains)
- Economic indicators: GDP, Consumer Price Index (CPI), retail sales, Balance of Payments (BoP), economic sentiment indicators

26 Early Estimate of GDP vs Official Release
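
A comparison of early estimates with the official release is commonly summarised by revision statistics. The sketch below computes two of them: the mean revision, which indicates bias, and the mean absolute revision, which indicates size. A revision is defined as the official value minus the early estimate. The function name and the input format (two equally long lists covering the same reference periods) are assumptions. The GDP figures behind the chart are not reproduced.

```python
def revision_stats(early, official):
    """Mean and mean absolute revision, with revision = official - early."""
    if len(early) != len(official) or not early:
        raise ValueError("need two non-empty series of equal length")
    revisions = [o - e for e, o in zip(early, official)]
    n = len(revisions)
    return {
        # positive: early estimates tend to be below the official release
        "mean_revision": sum(revisions) / n,
        # typical size of a revision, regardless of direction
        "mean_abs_revision": sum(abs(r) for r in revisions) / n,
    }
```

Both statistics are needed, because a mean revision close to zero can hide large revisions of opposite sign.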

27 WP 7: Multi Domains
- WP leader: Poland
- Partners: Netherlands, Portugal, UK
- Data access: data availability (inventory, based on a questionnaire), aimed at three domains (population, tourism / border crossings, agriculture)
- Data feasibility: exploration of combining sources for these domains
- Data combination: experiments
- Future perspectives: suggest pilots for 2018 /

28 Combining Sources
Big Data sources, administrative data and statistical data may enrich statistical output in domains: [figure]

29 WP 8: Methodology, Quality and IT
- WP leader: Netherlands
- Partners: Austria, Bulgaria, Italy, Poland, Portugal, Slovenia
- Literature overview
- Quality of big data
- Big data and IT
- Big data methodology

30 Main Aspects Identified
Areas: Quality, IT, Methodology
coverage metadata management assessing accuracy comparability processing life cycle final product definition processing errors format of processing spatial dimension chain control datahub changes in data sources linkability data source integration machine learning measurement errors infrastructure data linkage model errors; precision secure and tested APIs multi-party computation shared libraries; standards inference data lakes sampling training, skills and knowledge data process architecture speed of algorithms unit identification

31 Acknowledgements
- ESSnet BD I WP leaders, project secretary and Review Board: Anke Consten, Piet Daas, Marc Debusschere, Maiki Ilves, Boro Nikic, Anna Nowicka, David Salgado, Monica Scannapieco, Nigel Swier, Martin van Sebille, Lilli Japec, Anders Holmberg, Faiz Alsuhail
- Project officer Eurostat: Albrecht Wirthmann

32 From the ESSnet BD I to the ESSnet BD II
Assessment of results:
- Can and will the results be used, i.e. implemented?
- Does the ESSnet have an ESS-wide impact?
Timeline:
- End of ESSnet BD I: May 2018
- Call for proposals: May 2018
- Proposal submitted: September 2018
- Evaluation of proposal: October 2018
- Start of ESSnet BD II: November 2018
- End of ESSnet BD II: December

33 Partner countries of the ESSnet BD II

34 ESSnet BD II: Track 1 of 3
Track 1: Implementation
- Online Job Vacancies (12 partners)
- Enterprise Characteristics (8)
- Smart Energy (4)
- Tracking Ships (3)
- Process and Architecture (8)

35 ESSnet BD II: Track 2 of 3
Track 2: New pilot projects
- Financial Transactions Data (6)
- Earth Observation (9)
- Mobile Networks Data (9)
- Innovative Tourism Statistics (8)
- Methodology and Quality (6)

36 ESSnet BD II: Track 3 of 3
Track 3: Preparing Smart Statistics
- Smart Farming, Smart Cities, Smart Devices, Smart Traffic (12)
- Duration: 12 months

37 Overview of Work Packages
WP, WP name: WP leader (country)
- WPA Coordination and Communication: Peter Struijs (NL); Marc Debusschere, deputy (BE)
- WPB Online Job Vacancies: Tomaž Speh (SI)
- WPC Enterprise Characteristics: Galya Stateva (BG)
- WPD Smart Energy: Arko Kesküla (EE)
- WPE Tracking Ships: Anke Consten (NL)
- WPF Process and Architecture: Monica Scannapieco (IT)
- WPG Financial Transactions Data: Johan Fosen (NO)
- WPH Earth Observation: Marek Morze (PL)
- WPI Mobile Networks Data: David Salgado (ES)
- WPJ Innovative Tourism Statistics: Marek Cierpiał-Wolan (PL)
- WPK Methodology and Quality: Alexander Kowarik (AT)
- WPL Preparing Smart Statistics: Natalie Rosenski (DE)

38 Conclusions
- Approach very successful
- Increased ambitions for coming years:
  - Implement results obtained so far
  - Start with trusted smart statistics
- Challenges: data access, privacy, methods, implementation, etc.
- The ESS dimension
- Support and commitment:
  - Commitment at all levels
  - High interest in participation
  - Recognition of relevance

39 Questions? Thank you for your attention!