Data-Powered Clouds: Challenges for Data Management, Cloud Computing, and Software

1 Data-Powered Clouds: Challenges for Data Management, Cloud Computing, and Software
Marcos Vaz Salles, Assistant Professor, University of Copenhagen (DIKU)

2 About the Speaker
Marcos Vaz Salles
- Assistant Professor, University of Copenhagen (DIKU)
- Postdoc: Cornell University
- PhD: ETH Zurich
- Mission: Find creative ways to expand the reach of the 30+ years of top-level R&D invested in database technology, broadly defined
- Examples: Database techniques for search and integration, games, simulations, geospatial data

3 Where does your most important data live?

4 Where does your most important data live? DATABASES!

5 Historical Justification for Databases

6 Historical Justification for Databases
- Common applications: Record maintenance, banking, government
- Complex implementation: Concurrency, integrity, durability, storage, representation, ...
- Enough abstraction: Operating systems virtualize low-level hardware
- Competing platforms: No virtualization of platform (IBM, DEC, ...)
Diagram layers (in slide order): Data-Driven Applications; Data Sharing (DBMS); Virtualization (Operating Systems); Platforms (Hardware)

7 Historical Justification for Databases (continued)
But the Cloud today is completely different?!

8 The Cloud Today
- Common applications: Web Services, Data Warehousing, Big Data
- Complex implementation: Data consistency and management, distribution, scalability, fault tolerance
- Enough abstraction: Cloud IaaS virtualizes enormous clusters of machines
- Competing platforms: No virtualization of platform (Amazon, Microsoft, ...)
Diagram layers (in slide order): Data-Driven Applications; Data Sharing (????); Virtualization (Cloud IaaS); Platforms (Cloud Datacenter)

9 The Cloud Today (continued)
Challenge: What should be the new Data Sharing Abstraction in the Cloud?

10 From Databases to Dataclouds
While there were databases in the past, we will have dataclouds in the future
- Databases → Database Management System (DBMS)
- Dataclouds → Datacloud Management System (DCMS)
Emerging application systems already being built!
- But at high cost
- And with fewer features than desired

11 Emerging Datacloud Application Systems
- Programmable news services. Example: Guardian.co.uk Open Platform & MicroApps
- Programmable social networks. Example: Apps on Facebook
- Programmable CRM. Example: Salesforce Platform
- Far-fetched (?!) future
  - Programmable government
  - Programmable banking
  - Programmable whoever-has-data

12 Emerging Datacloud Application Systems (continued)
Data is a new means of production

13 Challenges in Dataclouds and DCMS
- Programming, programming, programming
- Resources, resources, resources
- Scale, scale, scale

14 Challenges in Dataclouds and DCMS
- Programming, programming, programming
  - Re-use or create new programming abstractions?
  - How to incorporate data into software engineering?
- Resources, resources, resources
  - How to deal with virtualized environments and abstract cost?
- Scale, scale, scale
  - How to scale applications to petabytes automatically?

16 Challenges in Programming
Re-use or create new programming abstractions?
Skills gap
- Europe: IT skills shortage of 700,000 by 2015
- CompTIA: Data management among the top-3 skills gap concerns
- Programmers vs. data management skills
  - Algorithmic vs. declarative
  - Objects vs. (multi)sets
  - Data structures vs. physical data independence
- Not clear how to bridge the gap
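The "algorithmic vs. declarative" and "objects vs. (multi)sets" contrasts can be made concrete with a small sketch. It is an illustration only, not taken from the talk: the `Order` class, the sample data, and both function names are hypothetical, and Python's built-in `sqlite3` module stands in for a database system.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical application object; not from the talk."""
    customer: str
    amount: int  # whole currency units keep the comparison exact


# Two identical orders for "bob": as rows in a table they form a multiset,
# so both must be counted.
ORDERS = [Order("ann", 30), Order("bob", 20), Order("ann", 50), Order("bob", 20)]


def totals_algorithmic(orders):
    """Algorithmic style: the programmer spells out the loop and the data structure."""
    totals = {}
    for order in orders:
        totals[order.customer] = totals.get(order.customer, 0) + order.amount
    return totals


def totals_declarative(orders):
    """Declarative style: state the result wanted; the engine decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(o.customer, o.amount) for o in orders],
    )
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both functions return the same totals. The first fixes the access path in code, while the second leaves storage layout and evaluation order to the engine, which is the physical data independence the slide refers to.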

17 Challenges in Programming
Re-use or create new programming abstractions?
Technology gap
- MapReduce vs. Parallel SQL Databases
  - Loose structure vs. strong typing
  - Ease of use vs. efficiency
- Not clear how to bridge the gap
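To make the MapReduce vs. SQL contrast tangible, here is a minimal single-process sketch that counts hits per URL both ways. Everything in it is an assumption made for illustration: the log records, the function names, and the use of an in-memory dictionary as the "shuffle". SQLite stands in for a parallel SQL database; it runs on one node and its typing is looser than that of most SQL systems.

```python
import sqlite3
from collections import defaultdict

# Loosely structured input: the last record lacks the "status" field.
LOG_RECORDS = [
    {"url": "/home", "status": 200},
    {"url": "/home", "status": 500},
    {"url": "/about", "status": 200},
    {"url": "/home"},
]


def map_phase(record):
    """Emit (key, 1) pairs; the mapper itself decides how to treat missing fields."""
    yield record.get("url", "unknown"), 1


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records):
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)  # stands in for the shuffle step
    return dict(reduce_phase(k, v) for k, v in groups.items())


def run_sql(records):
    conn = sqlite3.connect(":memory:")
    # The schema is declared up front; a missing status is stored as NULL.
    conn.execute("CREATE TABLE hits (url TEXT NOT NULL, status INTEGER)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        [(r["url"], r.get("status")) for r in records],
    )
    rows = conn.execute("SELECT url, COUNT(*) FROM hits GROUP BY url").fetchall()
    conn.close()
    return dict(rows)
```

The MapReduce version accepts any record shape and pushes interpretation into user code, while the SQL version requires a schema before loading but then expresses the aggregation in one query that an optimizer is free to plan.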

18 Challenges in Programming
How to incorporate data into software engineering?
Data-Software gap
- Large data collections available
- Machine Learning techniques maturing
- Software architecture vs. adaptive learning system
  - Modular vs. data-driven
- Not clear how to bridge the gap

19 Wrap-up
- Dataclouds and DCMS
  - New abstractions for data sharing in cloud environments
  - Avoid complex re-implementation
- Challenges require new R&D
  - Programming, programming, programming
  - Resources, resources, resources
  - Scale, scale, scale
- Better use of human resources
  - Skills gap
  - Technology gap
  - Data-Software gap

20 Wrap-up (continued)
Thank you!

21 Backup Slides

23 Challenges in Resources and Scalability
How to deal with virtualized environments and abstract cost?
Abstraction gap
- Costs for parallel / virtualized execution hard to model
- Abstraction vs. cost-based reasoning
  - Sequential vs. parallel semantics
- Not clear how to bridge the gap
How to scale applications to petabytes automatically?
Expressivity gap
- Ideally, reuse of investment in database technology
- However, not obvious how to SQL-ize whole applications
- Modularization vs. scalability
  - Large software vs. large data
- Not clear how to bridge the gap
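One possible reading of "sequential vs. parallel semantics" is that code written as a left-to-right fold only parallelizes safely when its operator is associative. The sketch below is an assumption about what the slide means, not the speaker's example; `sequential_fold`, `chunked_fold`, and `VALUES` are hypothetical names, and the chunks are processed in a plain loop to simulate independent workers.

```python
from functools import reduce
from operator import add, sub


def sequential_fold(op, values):
    """Left-to-right fold, as a sequential program would evaluate it."""
    return reduce(op, values)


def chunked_fold(op, values, chunk_size):
    """Fold each chunk on its own (as parallel workers would), then fold the partial results."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    partials = [reduce(op, chunk) for chunk in chunks]
    return reduce(op, partials)


VALUES = [10, 4, 3, 2]

# Addition is associative, so both evaluation strategies agree.
same_for_add = sequential_fold(add, VALUES) == chunked_fold(add, VALUES, 2)

# Subtraction is not: ((10 - 4) - 3) - 2 differs from (10 - 4) - (3 - 2).
differs_for_sub = sequential_fold(sub, VALUES) != chunked_fold(sub, VALUES, 2)
```

A system that scales an application automatically would need to know, or be told, which of the two cases it is dealing with before splitting the work.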