A NON-GEEK'S BIG DATA CHEAT SHEET: FIVE QUESTIONS FOR SAVVY TECHNOLOGY LEADERS

TAMARA DULL, DIRECTOR OF EMERGING TECHNOLOGIES
#SASGIS16 @tamaradull

Big data is not new.
[Figure: data types shown in a 20% / 80% split: POS data, CRM, financial data, loyalty card data, trouble tickets, word processing documents, spreadsheets, GPS, web log data, photos, satellite images, social media data, email, PDF files, RFID tags, clickstream data, blogs, forums, videos, XML data, mobile data, website content, RSS feeds, audio files, call center transcripts]

TODAY'S AGENDA

WHAT'S TRENDING? PART 1 OF 3

The market is growing.
SOURCE: http://wikibon.org/wiki/v/big_data_vendor_revenue_and_market_forecast_2013-2017

The success rate is okay, but not great.

People issues trump technology issues.

Analytics keep them coming back.

THE 5 QUESTIONS PART 2 OF 3

THE 5 QUESTIONS
1. What can Hadoop do that my data warehouse can't?
2. We're not doing big data, so why do we need Hadoop?
3. Is Hadoop enterprise-ready?
4. Isn't a data lake just the data warehouse revisited?
5. What are some of the pros and cons of a data lake?

QUESTION 1
WHAT CAN HADOOP DO THAT MY DATA WAREHOUSE CAN'T?
1. Store data more cheaply.
2. Process data more quickly (and cheaply).

QUESTION 2
WE'RE NOT DOING BIG DATA, SO WHY DO WE NEED HADOOP?
Stage structured data.
Process structured data.
Archive any data.
Process any data.
Access any data (via data warehouse).
Access any data (via Hadoop).

QUESTION 3
IS HADOOP REALLY ENTERPRISE-READY?
For your organization: Maybe
For all organizations: No
Are we there yet?

QUESTION 4
ISN'T A DATA LAKE JUST THE DATA WAREHOUSE REVISITED?
DATA WAREHOUSE vs. DATA LAKE
DATA: structured, processed vs. structured / semistructured / unstructured, raw
PROCESSING: schema-on-write vs. schema-on-read
STORAGE: expensive for large data volumes vs. designed for low-cost storage
AGILITY: less agile, fixed configuration vs. highly agile, configure and reconfigure as needed
SECURITY: mature vs. maturing
USERS: business professionals vs. data scientists et al.
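The PROCESSING row above (schema-on-write vs. schema-on-read) can be illustrated with a small sketch. This is not from the slides: the sample events, the `sales` table, and both function names are hypothetical, and an in-memory SQLite database merely stands in for a data warehouse while a list of raw JSON strings stands in for a data lake.

```python
import json
import sqlite3

# Hypothetical raw events. The second one has no "channel" field
# and carries an extra "device" field instead.
RAW_EVENTS = [
    '{"customer_id": 1, "amount": "19.99", "channel": "web"}',
    '{"customer_id": 2, "amount": "5.00", "device": "mobile"}',
]


def load_schema_on_write(raw_lines):
    """Warehouse style: enforce a fixed schema before the data is stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "customer_id INTEGER NOT NULL, "
        "amount REAL NOT NULL, "
        "channel TEXT NOT NULL)"
    )
    rejected = []
    for line in raw_lines:
        record = json.loads(line)
        try:
            row = (
                int(record["customer_id"]),
                float(record["amount"]),
                record["channel"],
            )
        except KeyError:
            # The record does not fit the schema, so it never reaches the table.
            rejected.append(record)
            continue
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", row)
    return conn, rejected


def read_schema_on_read(raw_lines, fields):
    """Lake style: keep raw lines as they are and project fields at query time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


if __name__ == "__main__":
    conn, rejected = load_schema_on_write(RAW_EVENTS)
    print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0], "row(s) stored")
    print(len(rejected), "record(s) rejected at write time")
    print(list(read_schema_on_read(RAW_EVENTS, ["customer_id", "device"])))
```

In the write-time path the nonconforming record is turned away before storage; in the read-time path both records stay available and each query decides which fields it cares about, which is one way to read the "less agile" vs. "highly agile" contrast in the same table.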

QUESTION 5
WHAT ARE SOME OF THE PROS AND CONS OF A DATA LAKE?
Strengths: lower costs; one-stop data shopping
Weaknesses: data management; security
Opportunities: discovery; advanced analytics
Threats: status quo; skills

A COMPARISON & CONTRAST EXERCISE PART 3 OF 3

COMPARISON & CONTRAST
A FUNCTIONAL COMPARISON: TRADITIONAL & BIG DATA
Columns: Business Requirements, Traditional, Big Data
Business requirements listed:
» Discovery of unexplored business questions
» Clean, transformed, high-quality aggregated data
» Low latency, interactive reports, OLAP
» High volumes of raw, highly granular, unstructured data
» Exploratory analysis of preliminary data

COMPARISON & CONTRAST
A COST COMPARISON: THE TCOD MODEL
Challenge: Which platform is the most cost-effective, EDW or Hadoop?
The Total Cost of Data (TCOD) model:
Calculates the cost of using data over a 5-year period
Includes these costs:
» System and data administration
» Data integration
» Query development
» Procedural program development
» Analytic application development
Free downloads:
Special Report: http://www.wintercorp.com/tcod-report
Spreadsheet: http://www.wintercorp.com/tcod-spreadsheet

COMPARISON & CONTRAST
TCOD EXAMPLE 1: BUILDING A DATA WAREHOUSE
Requirements:
» Large number of data sources, users, complex queries, analyses and analytic applications
» Data integration and integrity
» Reusability and agility to accommodate rapidly changing business requirements and long data life
Data volume: 500 TB
Source: Special Report "Big Data: What Does It Really Cost?", Wintercorp, 2013

COMPARISON & CONTRAST
TCOD EXAMPLE 2: BUILDING A DATA REFINERY
Objective: Refine the sensor output of large industrial diesel engines
Requirements:
» Rapid, intensive processing of a small number of closely-related data sets
» Analysis reads the entire dataset
» Life of the raw data is relatively short
» Small group of experts collaborate on analysis
Data volume: 500 TB
Source: Special Report "Big Data: What Does It Really Cost?", Wintercorp, 2013

COMPARISON & CONTRAST
A COST COMPARISON: TCOD 5-YEAR SUMMARY (IN USD)

Example 1: Data Warehouse
Cost                        Data Warehouse Platform   Hadoop
System cost                 $44.6                     $1.4
  Initial acquisition       $10.8                     $0.2
  Upgrades                  $16.4                     $0.3
  Maintenance/support       $15.9                     $0.2
  Power/space/cooling       $1.5                      $0.6
Administration              $7.7                      $8.5
Application development     $16.5                     $36.0
ETL                         $18.4                     --
Complex queries             $88.7                     $475.0
Analysis                    $88.7                     $219.0
Total Cost of Data          $265.0 million            $740.0 million
HADOOP 3X MORE EXPENSIVE

Example 2: Data Refinery
Cost                        Data Warehouse Appliance  Hadoop
System cost                 $22.7                     $1.4
  Initial acquisition       $5.5                      $0.2
  Upgrades                  $8.4                      $0.3
  Maintenance/support       $8.2                      $0.2
  Power/space/cooling       $0.6                      $0.7
Administration              $0.8                      $0.8
Application development     $6.6                      $7.2
ETL                         --                        --
Complex queries             --                        --
Analysis                    --                        --
Total Cost of Data          $30.0 million             $9.3 million
HADOOP 1/3 THE COST

Source: Special Report "Big Data: What Does It Really Cost?", Wintercorp, 2013
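The arithmetic behind the two headline ratios can be re-derived from the line items in the summary. The sketch below is only a cross-check under one assumption: that every line item is in millions of USD, as the totals are. The dictionary layout and the `total_cost` helper are hypothetical; the figures themselves are copied from the summary, with the "--" entries treated as zero.

```python
# Line items from the TCOD 5-year summary, assumed to be in millions of USD.
# "System cost" already includes acquisition, upgrades, maintenance/support
# and power/space/cooling, so those sub-items are not listed again here.
COSTS = {
    "example_1_data_warehouse": {
        "data_warehouse_platform": {
            "system": 44.6, "administration": 7.7, "application_development": 16.5,
            "etl": 18.4, "complex_queries": 88.7, "analysis": 88.7,
        },
        "hadoop": {
            "system": 1.4, "administration": 8.5, "application_development": 36.0,
            "etl": 0.0, "complex_queries": 475.0, "analysis": 219.0,
        },
    },
    "example_2_data_refinery": {
        "data_warehouse_appliance": {
            "system": 22.7, "administration": 0.8, "application_development": 6.6,
        },
        "hadoop": {
            "system": 1.4, "administration": 0.8, "application_development": 7.2,
        },
    },
}


def total_cost(line_items):
    """Sum the line items for one platform, rounded to one decimal place."""
    return round(sum(line_items.values()), 1)


if __name__ == "__main__":
    for example, platforms in COSTS.items():
        totals = {name: total_cost(items) for name, items in platforms.items()}
        warehouse_name = next(name for name in totals if name != "hadoop")
        ratio = totals["hadoop"] / totals[warehouse_name]
        print(example, totals, f"Hadoop / {warehouse_name} = {ratio:.2f}")
```

Summed this way, the line items land within rounding of the published totals, and the Hadoop-to-warehouse ratio comes out a little under 3 for the data warehouse example and a little under 1/3 for the data refinery example, consistent with the two captions.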

KEY TAKEAWAYS

IT'S A BIG DATA WORLD OUT THERE. NOW LET'S BE SAFE.
Tamara Dull
tamara.dull@sas.com
@tamaradull
sas.com