Unlocking Open Data in the Cloud


1 Unlocking Open Data in the Cloud
Grischa Gundelsweiler, Public Sector Account Manager, DACH
Loft + Lab Munich, 11th November
Amazon Web Services, Inc. or its Affiliates. All rights reserved.

2 What this session is about
1) Open Data: Concepts, Examples & Trends
2) AWS as a Platform for Open Data
3) Case Study: Provide Open Data on AWS
4) Case Study: Use Open Data on AWS

3 Open Data: Concepts, Examples & Trends

5 Open data is data that can be freely used, shared and built on by anyone, anywhere, for any purpose. (Definition by the Open Knowledge Foundation)

6 The 8 Open Government Data Principles
1. Complete
2. Primary
3. Timely
4. Accessible
5. Machine processable
6. Non-discriminatory
7. Non-proprietary
8. License-free
Source: OGD Principles

7 Why Open Data?
1. Transparency
2. Releasing social and commercial value
3. Participation and engagement

8 McKinsey report from October

9 EC study from November 2015: "Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources"

10 Open Data Portal of Deutsche Bahn

16 AWS as a Platform for Open Data

17 Why does AWS care about Open Data?
- Sharing data makes it accessible to a large and growing community of researchers, entrepreneurs, and enterprises.
- Many of our commercial sector customers rely on quality open data as much as they rely on our cloud infrastructure services.
- Many of our public sector customers use AWS to make their data available to a global community of researchers, entrepreneurs, students, and fellow government agencies.

18 Data Acquisition in the Cloud
The cloud allows users from anywhere to take their algorithms to the data rather than downloading data to their own computing resources.

19 Open data as a platform
Diagram labels (layout of the original figure not preserved):
- Data Creation; Data Enrichment; Data at Rest (Object storage)
- Data Catalogs; Basic APIs; Complex APIs
- Visualizations; Focused data dashboards; Consumer applications
- Data-driven journalism; Sensemaking; Predictive modeling; Algorithmic policy
- Lower cost of knowledge (Efficiency)

20 A Rich Set of Programmable Services
- Technical and Business Support: Support; Professional Services; Partner Ecosystem; Training and Certification; Solutions Architects; Account Management; Security and Pricing Reports
- Enterprise Applications: Virtual Desktop; Sharing and Collaboration; Business Analytics
- Platform Services (App Services, Developer Tools and Operations, Mobile Services): Hadoop; Queuing and Notifications; Transcoding; Deployment; Resource Templates; Identity; Real-Time Streaming Data; Data Warehouse; Workflow; DevOps; Containers; Sync; Mobile Analytics; Data Pipelines; App Streaming; Search; Application Lifecycle Management; Event-Driven Computing; Push Notifications
- Administration and Security: Identity Management; Access Control; Resource and Usage Auditing; Key Management and Storage; Monitoring and Logs
- Core Services: Compute (VMs, Auto-Scaling and Load Balancing); Storage (Object, Block, and Archival); CDN; Databases (Relational, NoSQL, and Caching); Networking (VPC, DX, and DNS)
- Infrastructure: Regions; Availability Zones; Points of Presence

21 Providing Open Data on AWS

22 Case Study: Transport for London (graphics from TfL, October 2016)

23 Why open data at TfL?
- Transparency
- Reach
- Optimal use of transport network
- Economic benefit
- Innovation

24 Available Datasets
The API supports all the data requirements of the TfL website. Every data-driven aspect of the website (including maps) is powered by the unified API.
Some of the multi-modal core datasets included and available to developers are:
- Journey Planning (current and future)
- Status (current and future)
- Disruptions (current) and Planned works (future)
- Arrival/departure predictions (instant and websockets)
- Timetables
- Embarkation points and facilities
- Routes and lines (topology and geographical)
- Fares
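As an illustration of what "unified API" means for a developer, the sketch below builds request URLs for two of the datasets listed above (line status and arrival predictions). The base URL and the two path patterns are assumptions taken from the publicly documented TfL Unified API, not from the slides; the stop identifier in the usage note is only an example value, and `fetch_json` is a hypothetical helper.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: public base URL of the TfL Unified API (not stated in the slides).
BASE_URL = "https://api.tfl.gov.uk"


def line_status_url(mode: str) -> str:
    """Build the URL for the current status of all lines of one transport mode."""
    return f"{BASE_URL}/Line/Mode/{quote(mode)}/Status"


def arrivals_url(stop_point_id: str) -> str:
    """Build the URL for arrival predictions at a single stop point."""
    return f"{BASE_URL}/StopPoint/{quote(stop_point_id)}/Arrivals"


def fetch_json(url: str):
    """Hypothetical helper: GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

A call such as `fetch_json(line_status_url("tube"))` would then return the decoded response; depending on usage, the service may ask for an application key, which this sketch leaves out.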

25 London vs. Munich
London:
- Almost 500 apps produced.
- Playground for innovation.
- Improving transportation, collaboratively.
Munich:
- Apps come from the public transportation authorities: MVV, MVG, DB.
- No information on how to access the data; documentation is lacking.

26 (graphic from TfL, October 2016)

27 Outcomes / Cloud Benefits
- Customers save time, economic benefits
- New jobs and investments in startup and tech ecosystem
- Usage of data has since doubled
- Data consolidation and quality
- Pay for what you use
- Lower maintenance costs
- Elasticity
- Automation and consistency
- Blue/green deployment with zero downtime
- Highly secure
Source: MWD Advisors case study

28 Solutions for providing Open Data on AWS
Open data platforms: Catalog, Publish, Discover, Visualize, Analyze, Share

29 Using Open Data on AWS

30 Public Data Sets on AWS
Several high-value datasets are available for anyone to access for free on AWS. Examples include:
- 3K Rice Genome
- Landsat on AWS
- NEXRAD on AWS

31 More available Public Datasets on AWS
- GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily.
- IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present.
- Common Crawl Corpus: A corpus of web crawl data composed of over 5 billion web pages.
- TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA), available to qualified researchers via the Cancer Genomics Cloud.
- ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC).
- 1000 Genomes Project: A detailed map of human genetic variation.
- Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations.
- Google Books Ngrams: A dataset containing Google Books n-gram corpora.
Other Public Datasets are also available.

33 Accessing and processing Landsat data
- What is Landsat on AWS?
- How to access Landsat on AWS?
- How to use Landsat on AWS?

34 Landsat on AWS
- We have committed to make up to 1 petabyte of Landsat imagery readily available as objects on Amazon S3.
- All Landsat 8 scenes from 2015 and 2016 are available, along with a selection of cloud-free scenes from 2013 and [...].
- All new Landsat 8 scenes are made available each day (~700 per day), often within hours of production.

35 Landsat on AWS
Landsat on AWS makes each band of each scene readily available as objects on Amazon S3. Data can be accessed programmatically via HTTP and quickly deployed to any of our products for analysis and processing.
Users do not need to worry about local storage and have access to virtually unlimited computing power on demand.
Diagram labels: USGS; .tar; .tiff; s3://landsat-pds; Amazon EC2
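Because each band of each scene is a separate object, a band can be addressed by a plain HTTP URL. The sketch below shows one way to build such a URL. The key layout (`L8/<path>/<row>/<scene id>/<scene id>_B<band>.TIF`) and the HTTP endpoint are assumptions based on how the `landsat-pds` bucket was documented at the time; they are not given in the slides and may have changed since. The scene identifier in the usage note is only an example value.

```python
# Assumption: HTTP endpoint corresponding to the s3://landsat-pds bucket.
LANDSAT_HTTP_ROOT = "https://landsat-pds.s3.amazonaws.com"


def landsat_band_url(path: int, row: int, scene_id: str, band: int) -> str:
    """Build the URL of one band of one Landsat 8 scene.

    path and row are the WRS-2 path/row of the scene, zero-padded to three
    digits in the assumed key layout; band is the Landsat 8 band number.
    """
    if not 1 <= band <= 11:
        raise ValueError("Landsat 8 band numbers run from 1 to 11")
    return (
        f"{LANDSAT_HTTP_ROOT}/L8/{path:03d}/{row:03d}/"
        f"{scene_id}/{scene_id}_B{band}.TIF"
    )
```

For example, `landsat_band_url(139, 45, "LC81390452014295LGN00", 4)` yields the address of band 4 of that scene, which any HTTP client or GDAL-based tool could then read.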

36 Undifferentiated heavy lifting
We use GDAL to add internal tiling on each Landsat on AWS tiff, which allows developers to use HTTP range gets to access specific portions of each scene. This allows people to access only the data they need, when they need it.
Diagram labels: Standard tiff object; Internal tiled tiff object
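A minimal sketch of the two ideas on this slide: working out which internal tiles cover a pixel window, and asking the server for only a byte range of an object. The 512-pixel tile size is an assumption (the slides do not state it), and the byte offset and length of a tile would in practice come from the TIFF's tile offset and byte count tags, which a library such as GDAL reads for you.

```python
import urllib.request


def tiles_for_window(x, y, width, height, tile_size=512):
    """Return (column, row) indices of the internal tiles covering a pixel window.

    tile_size=512 is an assumed value, not taken from the slides.
    """
    first_col, last_col = x // tile_size, (x + width - 1) // tile_size
    first_row, last_row = y // tile_size, (y + height - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def byte_range_header(offset, length):
    """Format an HTTP Range header value; the end position is inclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def range_request(url, offset, length):
    """Build (without sending) a GET request for part of an object."""
    return urllib.request.Request(
        url, headers={"Range": byte_range_header(offset, length)}
    )


def fetch_range(url, offset, length):
    """Send the range request and return the requested bytes."""
    with urllib.request.urlopen(range_request(url, offset, length)) as response:
        # A server that honours the Range header answers 206 Partial Content.
        if response.status != 206:
            raise RuntimeError("server did not return partial content")
        return response.read()
```

With a standard (striped) tiff, a small window can span many full-width strips; with internal tiling, the same window maps to a handful of tiles, so far fewer bytes need to be requested.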

37 Think of URLs instead of copies
Example: Wellington, New Zealand
- RGB: visible light
- Infrared: vegetation
- Shortwave infrared: urban areas

38 Using Landsat on S3: reliable, performant data access
Diagram labels: Landsat on Amazon S3; ArcGIS Server on Amazon EC2; user; AWS US West (Oregon) Region

39 Landsat on AWS / Sentinel-2 on AWS
Landsat on AWS, usage in the first year:
- Over 400,000 scenes available
- Over 1 billion hits globally
- Used for new product development
Sentinel-2 on AWS, small investment, big impact:
- Public dataset hosted in FRA
- Apps for agriculture, disaster relief, vegetation monitoring, property taxation, and more
- Used for new product development

40 Next steps
Depending on your role, your goals:
- Use open data in your projects / your organisation
- Provide open data from your organisation
- Build a new business on open data
AWS offers:
- Technology platform that constantly evolves
- Enablement through workshops, training, ProServ
- Customer and partner ecosystem to connect and build

41 Thank you! Grischa Gundelsweiler