The Intersection of Big Data and DB2

The Intersection of Big Data and DB2
May 20, 2014
Mike McCarthy, IBM Big Data Channels Development, mmccart1@us.ibm.com

Agenda
What is Big Data? Concepts and characteristics
What is Hadoop?
Relational vs. Hadoop
Traditional vs. Big Data
Complementary solutions and Big SQL
Data warehouse augmentation
Summary
© 2013 IBM Corporation

What is Big Data?
All kinds of data, in large volumes, holding valuable insight that is difficult to extract and that may be extremely time sensitive.
Big Data is a hot topic because technology now makes it possible to analyze ALL available data.
"Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis." (Source: Matt Eastwood, IDC)

Characteristics of Big Data: the four Vs
Volume: cost-efficiently processing the growing volume of data, which grows 50x between 2010 and 2020 to reach 35 ZB.
Velocity: responding to the increasing velocity of data; there are 30 billion RFID sensors and counting.
Variety: collectively analyzing the broadening variety of data; 80% of the world's data is unstructured.
Veracity: establishing the veracity of big data sources; 1 in 3 business leaders don't trust the information they use to make decisions.
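The 50x volume figure can be turned into an implied yearly growth rate. This worked calculation assumes the growth compounds evenly over the ten years from 2010 to 2020. That is an assumption: the slide gives only the endpoints. Under it, the annual growth rate r, the implied 2010 baseline, and the doubling time work out as:

```latex
(1 + r)^{10} = 50 \;\Rightarrow\; r = 50^{1/10} - 1 \approx 1.479 - 1 \approx 0.48
\qquad
V_{2010} \approx \frac{35\ \text{ZB}}{50} = 0.7\ \text{ZB}
\qquad
t_{\text{double}} = \frac{\ln 2}{\ln 1.479} \approx 1.77\ \text{years}
```

In other words, the projection implies data volume growing roughly 48% per year and doubling about every 21 months.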

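Big Data Sources
Transactional and application data: structured; chiefly a volume challenge, demanding throughput.
Machine data: semi-structured; chiefly a velocity challenge, demanding fast ingestion.
Social data: highly unstructured; chiefly a variety challenge, raising veracity concerns.
Enterprise content: highly unstructured; a variety challenge at high volume.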

Where Is This Data Coming From?
12+ TB of tweet data every day
30 billion RFID tags today (1.3 billion in 2005)
4.6 billion camera phones worldwide
? TB of data every day
Hundreds of millions of GPS-enabled devices sold annually
500+ TB of log data every day
76 million smart meters in 2009; 200 million by 2014
2+ billion people on the Web by the end of 2011

Sources of Big Data
Big data sources: respondents were asked which data sources are currently being collected and analyzed as part of active big data efforts within their organization.

What is Hadoop?
Hadoop is an open source project written in Java. It is optimized to handle massive amounts of data through parallelism, and a variety of data (structured, unstructured, semi-structured), using inexpensive commodity hardware. It delivers great performance, with reliability provided through replication. It is not for OLTP and not for OLAP/DSS; it is good for Big Data. Current version at the time of this presentation: 2.2.0.
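"Reliability provided through replication" means that every block of a file is stored on several nodes. The sketch below is a minimal illustration of that idea. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). It places replicas by simple round-robin, which is a deliberate simplification and *not* HDFS's real rack-aware placement policy. The node names and the `place_blocks` and `readable_after_failure` helpers are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed Hadoop 2.x default for dfs.blocksize
REPLICATION = 3      # assumed HDFS default for dfs.replication


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to `replication`
    distinct nodes using round-robin (a simplification, not HDFS's rack-aware policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes, wrapping around, so the replicas of one block never share a node.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(any(node != failed_node for node in replicas) for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical 4-node cluster
layout = place_blocks(1000, nodes)            # a 1,000 MB file becomes 8 blocks
for block, replicas in layout.items():
    print(block, replicas)
print(readable_after_failure(layout, "node2"))  # True
```

Because each block lives on three different nodes, losing any single commodity machine leaves every block readable. This is the mechanism that lets Hadoop trade enterprise hardware for cheap hardware.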

Hadoop / MapReduce timeline

Two Key Aspects of Hadoop
MapReduce framework: how Hadoop understands and assigns work to the nodes (machines).
Hadoop Distributed File System (HDFS): where Hadoop stores data. HDFS is a file system that spans all the nodes in a Hadoop cluster. It links together the file systems on many local nodes to make them into one big file system.
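To make the MapReduce model concrete, here is the classic word-count example simulated in plain Python. This is only an illustration of the map → shuffle → reduce data flow. It does not use Hadoop's Java API, and everything runs in one process rather than across nodes. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one key."""
    return key, sum(values)


def word_count(lines):
    # In Hadoop, each mapper would run on the node holding its HDFS block;
    # here the "splits" are simply list elements processed in sequence.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


splits = ["big data and DB2", "DB2 and Hadoop", "big big data"]
print(word_count(splits))  # {'big': 3, 'data': 2, 'and': 2, 'db2': 2, 'hadoop': 1}
```

The mappers need no knowledge of each other, and each reducer sees all values for its key. That independence is what lets the framework spread the work across many nodes.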

Hadoop Is Not for All Types of Work
Hadoop is not suited to processing transactions (random access), to work that cannot be parallelized, to low-latency data access, to processing lots of small files, or to intensive calculations on little data. But the technology is evolving.

Big Data Core Use Cases Resulting from High-Value Initiatives
Big Data Exploration: find, visualize, and understand ALL big data to improve business knowledge.
Complete View of the Customer: achieve a true unified view, incorporating internal and external sources, to drive positive interactions.
Security/Intelligence: lower risk, detect fraud, and monitor cyber security in real time.
IT Operations Analysis: analyze a variety of machine data for improved business results.
Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency.

The IBM Big Data Platform
Process any type of data: structured, unstructured, in motion, at rest.
Built-for-purpose engines: designed to handle different requirements.
Analyze data in motion.
Manage and govern data in the ecosystem.
Enterprise data integration.
Grow and evolve on current infrastructure.
Platform components: Solutions; Analytics and Decision Management; Visualization & Discovery; Application Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance; Big Data Infrastructure.

RDBMS vs. Hadoop
Aspect: RDBMS / Hadoop
Data sources: structured data with known schemas / unstructured and structured
Data type: records, long fields, objects, XML / files
Data updates: updates allowed / only inserts and deletes
Language: SQL & XQuery / Pig (Pig Latin), Hive (HiveQL), Jaql
Processing type: quick response, random access / batch processing
Data integrity: data loss is not acceptable / data loss can happen sometimes
Security: security and auditing / partial
Compression: sophisticated data compression / simple file compression
Hardware: enterprise hardware / commodity hardware
Data access: random access (indexing) / access files only (streaming)
History: ~40 years of innovation / < 5 years old
Community: widely used, abundant resources / not widely adopted yet

Data Warehouse vs. Hadoop
Aspect: Data warehouse / Hadoop
Data sources: structured, high-value, pre-processed data / unstructured and structured
Data type: records, long fields, objects, XML / files
Data updates: updates allowed / only inserts and deletes
Language: vendor specific / Pig (Pig Latin), Hive (HiveQL), Jaql
Processing type: batch processing / batch processing
Data integrity: data loss is not acceptable / data loss can happen sometimes
Security: security and auditing / partial
Compression: sophisticated data compression / simple file compression
Hardware: enterprise hardware / commodity hardware
Data access: random access (indexing) / access files only (streaming)
History: ~20 years of innovation / < 5 years old
Community: widely used, abundant resources / not widely adopted yet

Merging the Traditional and Big Data Approaches
Traditional approach (structured and repeatable analysis): business users determine what question to ask, and IT structures the data to answer that question. Examples: monthly sales reports, profitability analysis, customer surveys.
Big Data approach (iterative and exploratory analysis): IT delivers a platform to enable creative discovery, and the business explores what questions could be asked. Examples: brand sentiment, product strategy, maximum asset utilization.

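Big Difference: Schema on Run
Regular database (schema on load): raw data → schema to filter → storage (pre-filtered data) → output.
Big Data/Hadoop (schema on run): raw data → storage (unfiltered, raw data) → schema to filter → output.
Schema on run is more commonly called schema on read: the structure is applied when the data is queried, not when it is stored.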
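The difference between the two pipelines can be shown in a few lines. In the minimal sketch below, the sample CSV records and both function names are hypothetical. `schema_on_load` parses and validates every row before storing it, so malformed rows are rejected up front. `schema_on_read` keeps the raw text untouched and applies a schema only when a particular query runs, so each query can decide how to treat bad values.

```python
import csv
import io

# Hypothetical raw sales records: date, store, amount. One amount is malformed.
RAW = "2014-05-20,store1,19.50\n2014-05-20,store2,oops\n2014-05-21,store1,5.25\n"


def schema_on_load(raw):
    """Regular database style: enforce the schema at load time; bad rows never reach storage."""
    stored = []
    for date, store, amount in csv.reader(io.StringIO(raw)):
        try:
            stored.append((date, store, float(amount)))
        except ValueError:
            pass  # row rejected during load
    return stored


def schema_on_read(raw_storage, store_filter):
    """Hadoop style: storage holds the raw text; the schema is applied per query."""
    total = 0.0
    for _date, store, amount in csv.reader(io.StringIO(raw_storage)):
        if store != store_filter:
            continue
        try:
            total += float(amount)
        except ValueError:
            continue  # this particular query chooses to skip malformed values
    return total


print(schema_on_load(RAW))            # two clean rows were stored
print(schema_on_read(RAW, "store1"))  # 24.75
```

The trade-off follows directly. Schema on load pays the parsing cost once and guarantees clean storage. Schema on read keeps everything, including data nobody knew how to model yet, and pushes interpretation to query time.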

Complementary Analytics
Traditional approach (structured, analytical, logical): traditional sources such as transaction data, internal application data, mainframe data, OLTP system data, and ERP data feed the data warehouse. This supports structured, repeatable, linear analysis such as monthly sales reports, profitability analysis, and customer surveys.
New approach (creative, holistic thought, intuition): new sources such as web logs, social data, text data (emails), sensor data (images), and RFID feed Hadoop and Streams. This supports unstructured, exploratory, iterative analysis such as brand sentiment, product strategy, and maximum asset utilization.
Enterprise integration connects the two approaches.

Big SQL Interface
Rich SQL query capabilities: SQL-92 and SQL:2011 features, including correlated subqueries and windowed aggregates.
Applications issue SQL through a JDBC/ODBC driver, gaining SQL access to all data stored in InfoSphere BigInsights. A JDBC/ODBC server provides robust JDBC/ODBC support.
The SQL engine takes advantage of key features of each data source, either leveraging MapReduce parallelism or achieving low latency.
Data sources: Hive tables, HBase tables, and CSV files in InfoSphere BigInsights.
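A correlated subquery, one of the SQL features listed above, is a subquery that refers to a column of the outer query and is therefore evaluated per outer row. The sketch below illustrates the concept with SQLite through Python's built-in `sqlite3` module, used purely as a stand-in. It is not Big SQL, which applications reach through its JDBC/ODBC driver, and dialect details differ between the two. The `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "ann", 100.0), ("east", "bob", 300.0),
     ("west", "cho", 50.0), ("west", "dee", 150.0)],
)

# Correlated subquery: the inner SELECT references s.region from the outer row,
# so it computes a different regional average for each row being tested.
query = """
SELECT s.region, s.rep, s.amount
FROM sales AS s
WHERE s.amount > (SELECT AVG(i.amount) FROM sales AS i WHERE i.region = s.region)
ORDER BY s.region
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('east', 'bob', 300.0), ('west', 'dee', 150.0)]
conn.close()
```

The query returns only the reps who beat their own region's average (200 in the east, 100 in the west). A plain uncorrelated subquery could not express this in a single statement.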

The Challenge
Spreading data transformation and analytic components across multiple platforms can increase data latency, cost, complexity, and governance risk.
Transactional data (DB2 for z/OS, IMS, VSAM, non-IBM sources) and customer interaction data come in and pass through several stages. First comes data movement, cleansing, and management (replicate, integrate, cleanse, manage). Next is data warehousing (data warehouse, operational data store, data mart). Last is data analysis (business intelligence, predictive analytics), which produces business insight. Some of these stages run on zEnterprise and some run off the z platform.

The Solution: DB2 Analytics Accelerator
Traditional approach to analytic systems: two separate systems exchange data.
Operational applications: transaction processing on a shared-everything database, with high-volume business transactions and batch processing running concurrently.
Analytic applications: a separate shared-nothing data store for business intelligence and predictive analytics, serving low-volume complex queries and batch reporting.
Moving data between them raises questions of latency, security, data governance, and complexity.
Hybrid approach: a hybrid database handles combined workloads (transactional processing, traditional analytics, and business-critical analytics). High-volume business transactions and batch reporting run concurrently with complex queries, delivering business-critical analytics.
Result: reduced latency, greater security, improved data governance, and reduced complexity.

Drivers for Enterprise Data Warehouse Augmentation
Need to leverage a variety of data: structured, semi-structured, unstructured, and streaming data; low-latency requirements (hours, not weeks or months); query access to the data is required; improved business insights.
Impractical to store all data in the EDW: organizations cannot afford to store Big Data in the EDW, and doing so could impact normal OLAP workloads.
EDW not optimized: EDW data volumes are reaching Big Data levels, with a lot of low-touch, cold data; a large portion of the data in the EDW is not accessed frequently.

New Architecture to Leverage All Data and Analytics
Data in many forms, both data in motion and data at rest (streams, video/audio, network/sensor data, raw data, structured data, master data), flows through these zones:
Information ingestion and operational information: stream processing, data integration, master data.
Real-time analytics: streams, entity analytics, predictive analytics.
Landing area, analytics zone, and archive: raw data, structured data, text analytics, data mining, entity analytics, machine learning.
Exploration, integrated warehouse, and mart zones: discovery, deep reflection, operational, predictive.
The results serve intelligence analysis, decision management, BI and predictive analytics, and navigation and discovery, all under information governance, security, and business continuity.

Filter and Summarize Big Data for the Warehouse
BigInsights can manage all enterprise data upon arrival, and organizations can manipulate, analyze, and summarize incoming data there. BigInsights can then serve as a source for a data warehouse. Big data analytic applications work against BigInsights, which filters, transforms, and aggregates the data before it reaches the data warehouse used by traditional analytic tools.
Benefits: sift through large volumes of data, broaden analytic coverage without undue burden on systems, and augment existing corporate data within warehouses.
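The filter → transform → aggregate flow can be sketched on a handful of records. This is a generic Python illustration under assumptions. The click-stream records, field names, and helper functions are hypothetical, and none of them are BigInsights APIs. Only the small summary at the end would be loaded into the warehouse.

```python
from collections import Counter

# Hypothetical raw click-stream records, as they might land in Hadoop.
raw_events = [
    {"ts": "2014-05-20T10:01", "page": "/home", "status": 200},
    {"ts": "2014-05-20T10:02", "page": "/cart", "status": 500},
    {"ts": "2014-05-20T11:15", "page": "/home", "status": 200},
    {"ts": "2014-05-21T09:00", "page": "/cart", "status": 200},
]


def filter_events(events):
    """Filter: keep only successful requests."""
    return (e for e in events if e["status"] == 200)


def transform(event):
    """Transform: reduce the timestamp to a calendar-day key."""
    return (event["ts"][:10], event["page"])


def aggregate(keys):
    """Aggregate: count hits per (day, page), the summary the warehouse receives."""
    return sorted((day, page, n) for (day, page), n in Counter(keys).items())


summary = aggregate(transform(e) for e in filter_events(raw_events))
print(summary)  # [('2014-05-20', '/home', 2), ('2014-05-21', '/cart', 1)]
```

Four raw events shrink to two summary rows. At real scale, that reduction is what lets the warehouse absorb big data insights without storing the big data itself.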

BigInsights as a Query-Ready Archive for a Data Warehouse
Using BigInsights as a query-ready archive lets firms manage the size of their existing data management platforms. Frequently accessed data is maintained in the warehouse, and cold or outdated information is offloaded to BigInsights. The enterprise can then better manage the size and usability of its data. Consumers shown: traditional analytic tools and big data analytic applications.
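The decision about what counts as "cold or outdated" has to be made by some policy. One simple possibility is sketched below, under stated assumptions. It uses a last-access cutoff, and the 365-day threshold, the partition names, and the `split_hot_cold` helper are all hypothetical. A real deployment would choose its own rule, possibly based on access frequency rather than recency.

```python
from datetime import date, timedelta


def split_hot_cold(partitions, today, keep_days=365):
    """Return (hot, cold) lists of partition names.

    Partitions last accessed within `keep_days` stay in the warehouse (hot);
    older ones become candidates for offloading to the archive (cold)."""
    cutoff = today - timedelta(days=keep_days)
    hot = [name for name, last_access in partitions.items() if last_access >= cutoff]
    cold = [name for name, last_access in partitions.items() if last_access < cutoff]
    return hot, cold


# Hypothetical warehouse partitions mapped to their last access date.
partitions = {
    "sales_2014_q1": date(2014, 4, 30),
    "sales_2012_q4": date(2013, 1, 15),
    "sales_2011_q2": date(2011, 8, 1),
}
hot, cold = split_hot_cold(partitions, today=date(2014, 5, 20))
print(hot, cold)  # ['sales_2014_q1'] ['sales_2012_q4', 'sales_2011_q2']
```

The cold partitions remain queryable once archived, so the warehouse shrinks without making older data unavailable.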

A New Architecture to Leverage All Data Has Emerged
To harness all data and all paradigms, the architecture is organized into zones. These are the information ingestion and operational information zone; the real-time analytics zone; the exploration, landing, and archive zone; the enterprise warehouse, data mart, and analytic appliances zone; and the information governance zone.

The Big Data Ecosystem: Interoperability Is Key
Streaming data → Streams: real-time analytic processing (RTAP), analytics on data in motion.
Internet-scale data sets and non-traditional/non-relational data feeds and sources → BigInsights: analytics on data at rest.
Data Explorer platform.
Traditional/relational data sources → traditional data warehouse: analytics on structured data.

Every Industry Can Leverage Big Data and Analytics
Banking: optimizing offers and cross-sell; customer service and call center efficiency.
Insurance: 360 view of domain or subject; catastrophe modeling; fraud and abuse.
Telco: pro-active call center; network analytics; location-based services.
Energy & Utilities: smart meter analytics; distribution load forecasting/scheduling; condition-based maintenance.
Media & Entertainment: business process transformation; audience and marketing optimization.
Retail: actionable customer insight; merchandise optimization; dynamic pricing.
Travel & Transport: customer analytics and loyalty marketing; predictive maintenance analytics.
Consumer Products: shelf availability; promotional spend optimization; merchandising compliance.
Government: civilian services; defense and intelligence; tax and treasury services.
Healthcare: measure and act on population health outcomes; engage consumers in their healthcare.
Automotive: advanced condition monitoring; data warehouse optimization.
Chemical & Petroleum: operational surveillance, analysis, and optimization; data warehouse consolidation, integration, and augmentation.
Aerospace & Defense: uniform information access platform; data warehouse optimization.
Electronics: customer/channel analytics; advanced condition monitoring.
Life Sciences: increase visibility into drug safety and effectiveness.

Resources
www.bigdatauniversity.com
www.ibmbigdatahub.com
www.ibm.com/developerworks Big Data community
Harnessing the Power of Big Data ebook: https://ibm.biz/bdxkc8 (authors: Paul Zikopoulos, Thomas Deutsch, Dirk deRoos, Krishnan Parasuraman, David Corrigan, James Giles)
Big Data Big Agriculture with IBM DB2 for z/OS: replay of John Deere webcast, URL TBA
Future of farming videos: http://www.youtube.com/watch?v=aeuprbxvfx8

Questions?

InfoSphere BigInsights v2.1: A Closer Look
More than Hadoop: the BigInsights engine builds on Apache Hadoop and adds the following.
User interfaces: built-in IDE (development tools) and admin console.
Visualization: spreadsheet-style visualization for data discovery and exploration.
Text analytics: unique text analytic engines.
Accelerators: application and analytical accelerators.
Integration: high-speed connectors for integration with other systems, such as databases and content management.
Performance and workload optimizations: MapReduce plus indexing, and workload management.
Enterprise-class security and information governance.

IBM InfoSphere Streams v3.1: A Platform for Real-Time Analytics on BIG Data
Volume: terabytes per second, petabytes per day.
Variety: all kinds of data (sensor, video, audio, text, and relational data sources) and all kinds of analytics.
Velocity: insights in microseconds; millions of events per second; microsecond latency.
Agility: dynamically responsive; rapid application development.
Powerful analytics enabling just-in-time decisions.

Big Data in Real Time with InfoSphere Streams
Typical operations on a stream: filter/sample, modify, annotate, analyze, fuse, classify, score, and compute windowed aggregates.
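A windowed aggregate computes a result over only the most recent slice of an unbounded stream. The minimal sketch below keeps a running average over a sliding time window. It is plain Python meant only to show the idea, not InfoSphere Streams code; real Streams applications are written in SPL (Streams Processing Language). The `SlidingWindowAverage` class, the 10-second width, and the sample readings are hypothetical.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the readings whose timestamps fall in the window (ts - width, ts]."""

    def __init__(self, width):
        self.width = width
        self.window = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def push(self, ts, value):
        """Add one reading and return the average over the current window."""
        self.window.append((ts, value))
        self.total += value
        # Evict readings that have slid out of the window; the running total
        # is updated incrementally, so the cost per reading stays small.
        while self.window and self.window[0][0] <= ts - self.width:
            _, old = self.window.popleft()
            self.total -= old
        return self.total / len(self.window)


avg = SlidingWindowAverage(width=10)
for ts, reading in [(0, 4.0), (5, 6.0), (12, 8.0), (30, 2.0)]:
    print(ts, avg.push(ts, reading))  # averages: 4.0, 5.0, 7.0, 2.0
```

Only the readings inside the window are kept in memory. That bounded state is what lets a streaming engine aggregate over data that never stops arriving.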

PureData System for Analytics: The Simple Appliance for Serious Analytics
Built-in expertise: no indexes or tuning; data model agnostic; fully parallel, optimized in-database analytics.
Integration by design: server, storage, and database in one easy-to-use package; automatic parallelization and resource optimization to scale economically; enterprise-class security and platform management.
Simplified experience: up and running in hours, with minimal ongoing administration. Standard interfaces connect to best-of-breed analytics, BI, and data integration tools. Built-in analytics capabilities let users derive insight from data quickly, with easy connectivity to other Big Data Platform components.

IBM Netezza Analytics Ecosystem
Platform: PureData for Analytics (AMPP platform).
Netezza in-database analytics: transformations, mathematical, geospatial (Esri / nzspatial), predictive, statistics, time series, data mining.
Third-party in-database analytics: Revolution Analytics R, Fuzzy Logix, SAS, Zementis, IBM SPSS, MathWorks.
Software development kit: user-defined extensions (UDF, UDA, UDTF, UDAP); language support (MapReduce, Java, Python, Lua, Perl, C, C++, Fortran, PMML).
Connected tools and systems: Tanay GPU Appliance by Fuzzy Logix, IBM InfoSphere BigInsights, Cloudera, Apache Hadoop, IBM InfoSphere Streams, IBM SPSS, SAS, Revolution Analytics, Eclipse, BI tools, Esri, visualization tools.

The Proper Foundation Can Optimize These New Capabilities
IBM Watson Foundations supports new and enhanced applications over all data. It spans the information ingestion and operational information zone; the real-time analytics zone; the exploration, landing, and archive zone; the enterprise warehouse, data mart, and analytic appliances zone; and the information governance zone.
Linked by a cognitive fabric, it addresses four questions:
What is happening? Discovery and exploration.
Why did it happen? Reporting, analysis, content analytics.
What could happen? Predictive analytics and modeling.
What action should I take? Decision management.
IBM Big Data & Analytics Infrastructure: systems, security, storage; on premise, cloud, or as a service.