A NON-GEEK'S BIG DATA CHEAT SHEET: FIVE QUESTIONS FOR SAVVY TECHNOLOGY LEADERS

1 A NON-GEEK'S BIG DATA CHEAT SHEET: FIVE QUESTIONS FOR SAVVY TECHNOLOGY LEADERS
TAMARA DULL, DIRECTOR OF EMERGING TECHNOLOGIES

2 Big data is not new.
[Figure: data types split into a 20% group and an 80% group. Labels shown: POS data, CRM, financial data, loyalty card data, trouble tickets, word processing documents, spreadsheets, PDF files, GPS, web log data, clickstream data, photos, satellite images, social media data, RFID tags, blogs, forums, videos, XML data, mobile data, website content, RSS feeds, audio files, call center transcripts]

3 TODAY'S AGENDA

4 WHAT'S TRENDING? PART 1 OF 3

5 The market is growing. SOURCE:

6 The success rate is okay, but not great.

7 People issues trump technology issues.

8 Analytics keep them coming back.

9 THE 5 QUESTIONS PART 2 OF 3

10 THE 5 QUESTIONS
1. What can Hadoop do that my data warehouse can't?
2. We're not doing big data, so why do we need Hadoop?
3. Is Hadoop enterprise-ready?
4. Isn't a data lake just the data warehouse revisited?
5. What are some of the pros and cons of a data lake?

11 QUESTION 1: WHAT CAN HADOOP DO THAT MY DATA WAREHOUSE CAN'T?
1. Store data more cheaply.
2. Process data more quickly (and cheaply).

12 QUESTION 2: WE'RE NOT DOING BIG DATA, SO WHY DO WE NEED HADOOP?
Stage structured data.
Process structured data.
Archive any data.
Process any data.
Access any data (via data warehouse).
Access any data (via Hadoop).

13 QUESTION 3: IS HADOOP REALLY ENTERPRISE-READY?
For your organization: Maybe
For all organizations: No
Are we there yet?

14 QUESTION 4: ISN'T A DATA LAKE JUST THE DATA WAREHOUSE REVISITED?
DATA WAREHOUSE vs. DATA LAKE
DATA
  Data warehouse: structured, processed
  Data lake: structured / semi-structured / unstructured, raw
PROCESSING
  Data warehouse: schema-on-write
  Data lake: schema-on-read
STORAGE
  Data warehouse: expensive for large data volumes
  Data lake: designed for low-cost storage
AGILITY
  Data warehouse: less agile, fixed configuration
  Data lake: highly agile, configure and reconfigure as needed
SECURITY
  Data warehouse: mature
  Data lake: maturing
USERS
  Data warehouse: business professionals
  Data lake: data scientists et al.
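The PROCESSING row is the least self-explanatory part of this comparison, so here is a minimal sketch of the difference, assuming JSON-lines input. The schema, field names, and sample records are hypothetical and chosen only for illustration; they do not come from the slides, and real warehouses and Hadoop tools do this at far larger scale.

```python
import json

# Hypothetical fixed schema, defined before any data is loaded.
SCHEMA = {"customer_id": int, "amount": float}


def write_with_schema(raw_lines):
    """Schema-on-write: coerce each record to SCHEMA at load time.

    Records that do not fit are rejected before they reach the table.
    """
    table, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        try:
            table.append({col: typ(record[col]) for col, typ in SCHEMA.items()})
        except (KeyError, TypeError, ValueError):
            rejected.append(line)
    return table, rejected


def read_with_schema(raw_store, columns):
    """Schema-on-read: raw lines stay untouched; each query picks its own columns."""
    for line in raw_store:
        record = json.loads(line)
        yield {col: record.get(col) for col in columns}


raw = [
    '{"customer_id": "17", "amount": "19.99", "channel": "web"}',
    '{"customer_id": "18", "note": "call center transcript, no amount"}',
]

# Warehouse-style load: only the first record fits the schema.
warehouse_rows, rejected = write_with_schema(raw)

# Lake-style load: keep everything raw, decide on structure at query time.
lake_store = list(raw)
lake_rows = list(read_with_schema(lake_store, ["customer_id", "note"]))
```

In this sketch the warehouse-style load yields clean, typed rows but drops the record that lacks an amount, while the lake-style store keeps both records and leaves interpretation to whoever queries them. That trade-off is the one the DATA, AGILITY, and USERS rows describe.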

15 QUESTION 5: WHAT ARE SOME OF THE PROS AND CONS OF A DATA LAKE?
Strengths: lower costs, one-stop data shopping
Weaknesses: data management, security
Opportunities: discovery, advanced analytics
Threats: status quo, skills

16 A COMPARISON & CONTRAST EXERCISE PART 3 OF 3

17 COMPARISON & CONTRAST: A FUNCTIONAL COMPARISON: TRADITIONAL & BIG DATA
Business requirements (compared across two columns, Traditional and Big Data):
Discovery of unexplored business questions
Clean, transformed, high-quality aggregated data
Low latency, interactive reports, OLAP
High volumes of raw, highly granular, unstructured data
Exploratory analysis of preliminary data

18 COMPARISON & CONTRAST: A COST COMPARISON: THE TCOD MODEL
Challenge: Which platform is the most cost-effective, EDW or Hadoop?
The Total Cost of Data (TCOD) model:
Calculates the cost of using data over a 5-year period
Includes these costs:
» System and data administration
» Data integration
» Query development
» Procedural program development
» Analytic application development
Free downloads: Special Report: Spreadsheet:

19 COMPARISON & CONTRAST: TCOD EXAMPLE 1: BUILDING A DATA WAREHOUSE
Requirements:
Large number of data sources, users, complex queries, analyses and analytic applications
Data integration and integrity
Reusability and agility to accommodate rapidly changing business requirements and long data life
Data volume: 500 TB
Source: Special Report "Big Data: What Does It Really Cost?", Wintercorp, 2013

20 COMPARISON & CONTRAST: TCOD EXAMPLE 2: BUILDING A DATA REFINERY
Objective: Refine the sensor output of large industrial diesel engines
Requirements:
Rapid, intensive processing of a small number of closely related data sets
Analysis reads the entire dataset
Life of the raw data is relatively short
Small group of experts collaborates on analysis
Data volume: 500 TB
Source: Special Report "Big Data: What Does It Really Cost?", Wintercorp, 2013

21 COMPARISON & CONTRAST: A COST COMPARISON: TCOD 5-YEAR SUMMARY (IN USD MILLIONS)

Example 1: Data Warehouse
Cost                      Data Warehouse Platform   Hadoop
System cost               $44.6                     $1.4
  Initial acquisition     $10.8                     $0.2
  Upgrades                $16.4                     $0.3
  Maintenance/support     $15.9                     $0.2
  Power/space/cooling     $1.5                      $0.6
Administration            $7.7                      $8.5
Application development   $16.5                     $36.0
ETL                       $
Complex queries           $88.7                     $475.0
Analysis                  $88.7                     $219.0
Total Cost of Data        $265.0 million            $740.0 million
HADOOP 3X MORE EXPENSIVE

Example 2: Data Refinery
Cost                      Data Warehouse Appliance  Hadoop
System cost               $22.7                     $1.4
  Initial acquisition     $5.5                      $0.2
  Upgrades                $8.4                      $0.3
  Maintenance/support     $8.2                      $0.2
  Power/space/cooling     $0.6                      $0.7
Administration            $0.8                      $0.8
Application development   $6.6                      $
Total Cost of Data        $30.0 million             $9.3 million
HADOOP 1/3 THE COST

Source: Special Report "Big Data: What Does It Really Cost?", Wintercorp, 2013
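As a quick check on the two headline claims, the arithmetic can be sketched as follows. Every figure is copied from the summary on this slide (USD millions); the variable and function names are illustrative only, and the sketch uses only the reported totals and system-cost line items rather than values the slide leaves blank.

```python
# Five-year totals in USD millions, as reported on the slide.
totals = {
    "example1_data_warehouse": {"data_warehouse": 265.0, "hadoop": 740.0},
    "example2_data_refinery": {"data_warehouse": 30.0, "hadoop": 9.3},
}

# System-cost line items for the data warehouse column of each example:
# initial acquisition, upgrades, maintenance/support, power/space/cooling.
dw_system_items = {
    "example1_data_warehouse": [10.8, 16.4, 15.9, 1.5],
    "example2_data_refinery": [5.5, 8.4, 8.2, 0.6],
}


def hadoop_to_dw_ratio(example):
    """Return Hadoop's total cost as a multiple of the data warehouse total."""
    costs = totals[example]
    return costs["hadoop"] / costs["data_warehouse"]


ratio_warehouse = hadoop_to_dw_ratio("example1_data_warehouse")
ratio_refinery = hadoop_to_dw_ratio("example2_data_refinery")

# The line items should add up to the reported system cost for each example.
system_cost_example1 = round(sum(dw_system_items["example1_data_warehouse"]), 1)
system_cost_example2 = round(sum(dw_system_items["example2_data_refinery"]), 1)

print(f"Example 1: Hadoop costs {ratio_warehouse:.2f}x the data warehouse")
print(f"Example 2: Hadoop costs {ratio_refinery:.2f}x the data warehouse")
```

The ratios come out to roughly 2.8 and 0.31, which the slide rounds to "3X more expensive" and "1/3 the cost". Most of the gap in Example 1 comes from the complex query and analysis rows, not from system cost, where Hadoop is far cheaper in both examples.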

22 KEY TAKEAWAYS

24 IT'S A BIG DATA WORLD OUT THERE. NOW LET'S BE SAFE.
Tamara Dull
sas.com