Welcome! 2013 SAP AG or an SAP affiliate company. All rights reserved.

SAP Big Data Webinar Series
Big Data: Introduction to SAP Big Data Technologies
Big Data: Streaming Analytics
Big Data: Smarter Data Virtualization
Big Data: Gain New Insight from Hadoop
Big Data: Spatial Data Processing for Richer Insights
Big Data: Text Analytics

Speaker Introduction
Yuvaraj Athur Raghuvir is a Senior Director in SAP HANA Platform Solution Management at SAP. He leads the Big Data Analytics portfolio, including the SAP Real-time Data Platform. Yuvaraj has over 14 years of experience spanning business applications, business analytic solutions, architecture and engineering.

SAP Big Data Webinar Series: Gain New Insight from Hadoop
Presented by: Yuvaraj Athur Raghuvir, SAP HANA Platform
July 24, 2013

SHOPPERS GET FASHION ADVICE THAT FITS THEIR STYLE

Big Data Economics
Urgent needs:
- Streaming data, incl. sensors, social and mobile
- Predictive, incl. data mining and machine learning
- Unstructured analysis, incl. text, media, spatial, etc.
- Scalable storage

Open Source Community Gift: Apache Hadoop
Apache Hadoop: Commons, Projects, Distributions, Data Scientists
Shown against the same urgent needs: streaming data (incl. sensors, social and mobile); predictive (incl. data mining and machine learning); unstructured analysis (incl. text, media, spatial, etc.); scalable storage

Hadoop Possibilities
- Amazon has reported that it started 2 million Elastic MapReduce (EMR) clusters in a single year.
- Dynamic community, with more than 2,500 Hadoop-related projects on GitHub.
- Growing popularity among vendors: storage, compute, network and high-performance computing vendors are pushing new boundaries.
Sources:
1) Gartner Blog, http://blogs.gartner.com/merv-adrian/2013/02/23/hadoop-2013-part-three-platforms/, retrieved on July 17, 2013
2) Gartner Blog, http://blogs.gartner.com/merv-adrian/2013/03/08/hadoop-2013-part-four-players/, retrieved on July 17, 2013
3) GitHub query "Hadoop", retrieved on July 17, 2013

Hadoop Community Complexity!
- More than 2,500 community projects
- A highly dynamic and evolving ecosystem of projects and alternatives
Figures: Gartner infographic; GigaOm infographic
Sources: Gartner Blog, http://blogs.gartner.com/merv-adrian/2013/02/21/hadoop-2013-part-two-projects/; GitHub query "Hadoop", retrieved on July 17, 2013; GigaOm Infographic, Mar 5, retrieved on July 17, 2013

The Big Data Phenomenon: Big Data is more than just Hadoop
Business trends: exploding data volumes, increasing data variety, accelerating data velocity.
Technology trends: storage / memory / CPU advances, Hadoop and distributed MPP, data mining / predictive analysis, in-memory computing, complex event processing.
Result: Enterprise Big Data.

Big Data Challenges: Business and technical needs
Business needs:
- Quick insights from all business-relevant data
- Picking the right action among many choices (Action 1, Action 2, Action 3)
Technical challenges (from all relevant Big Data, to quick insight, to the right action):
- Cost-to-store vs. cost-to-process considerations
- Pressure to gain insight quickly from data
- A diversity of data formats makes data difficult to analyze
- Disparate technologies need to interoperate across the enterprise
- Determining the right action among many valuable insights across variety, volume, velocity and technology is difficult

How to Capitalize on the Big Data Opportunity and Address Big Data Technical Challenges?
- Deploy an integrated data processing framework: optimize data management in each phase of the information lifecycle, regardless of data source, processing technologies, latency challenges or number of user demands.
- Enable real-time, actionable insights in business-process context: combine business-process insights from structured data analysis with deep pattern and behavior analysis of unstructured data, and enable decision making based on multi-factor considerations, not just instinct or experience.
- Derive new value from information: enable new business and technology use cases that were previously not feasible, and augment existing business scenarios with new data insights to enable better decisions.

Enterprise Scenarios with Hadoop
Four scenarios: Hadoop as a flexible data store; Hadoop as a simple database; Hadoop as a processing engine; Hadoop for data analytics.
Diagram components:
- Data sources: streaming data, social media, reference data, transaction data, enterprise data
- Hadoop: computation engine(s), job management, data storage (Hadoop Distributed File System)
- SAP solutions: SAP Business Suite, other SAP solutions, data warehouse/database (SAP HANA, SAP Sybase IQ/SAP Sybase ASE), SAP Data Services, BI and analytics software from SAP on an in-memory analytic engine and/or a disk-based data warehouse (SAP Sybase IQ)
- Non-SAP solutions

Hadoop as a Simple Database
Focus: storage and retrieval of data in Hadoop, typically through the interfaces provided by Hive or through direct HDFS access.
Diagram: the enterprise scenarios diagram, highlighting "Hadoop as a simple database" between the data sources, Hadoop and SAP solutions (SAP Business Suite, SAP Data Services, SAP HANA, SAP Sybase IQ/SAP Sybase ASE).
Scenarios of use include:
- Extract-Transform-Load from other systems to Hadoop. SAP Data Services provides ETL support from Hadoop to SAP HANA.
- Store and retrieve structured data based on projects like Hive. Depending on the scenario, scalable analytical systems like Sybase IQ can also be considered.
- Handle large documents as blobs for later retrieval or analytics.
- Use as a near-line store for offloading data that is considered cold or frozen. Data lifecycle management is typically manual.
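
The near-line store scenario can be sketched in a few lines of Python. This is a minimal illustration and not an SAP or Hadoop API. It assumes a hypothetical record layout with a `last_access` date and a hypothetical cutoff of 365 days. Two in-memory lists stand in for the hot store (for example SAP HANA) and the Hadoop near-line store. The slide notes that lifecycle management is typically manual, so the cold-data rule appears here as an explicit function an operator would run.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    key: str
    last_access: date  # hypothetical field used to judge data "temperature"


def offload_cold(hot_store: list[Record], near_line: list[Record],
                 today: date, cold_after_days: int = 365) -> int:
    """Move records not accessed within cold_after_days from the hot store
    to the near-line store. Returns the number of records moved."""
    cutoff = today - timedelta(days=cold_after_days)
    cold = [r for r in hot_store if r.last_access < cutoff]
    # Keep only recently used records in the primary store (in-place update).
    hot_store[:] = [r for r in hot_store if r.last_access >= cutoff]
    # Append cold records to the near-line store (stand-in for files in HDFS).
    near_line.extend(cold)
    return len(cold)


if __name__ == "__main__":
    hot = [Record("order-1", date(2011, 5, 1)),
           Record("order-2", date(2013, 7, 1)),
           Record("order-3", date(2012, 1, 15))]
    archive: list[Record] = []
    moved = offload_cold(hot, archive, today=date(2013, 7, 24))
    print(moved, [r.key for r in hot], [r.key for r in archive])
    # 2 ['order-2'] ['order-1', 'order-3']
```

In practice, the cutoff rule is the part a team would agree on with the business, since "cold" and "frozen" are policy decisions rather than technical ones.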

Hadoop as a Processing Engine
Focus: distributed compute that leverages Hadoop's MapReduce computation framework on large distributed data sets.
Diagram: the enterprise scenarios diagram, highlighting "Hadoop as a processing engine" between the data sources, Hadoop and SAP solutions.
Scenarios of use include:
- Data enrichment. Pushing down text data transforms from SAP Data Services is an example.
- Data pattern analysis. This is an emerging space across new data forms, and the convergence between procedural data science and declarative access patterns is still evolving.
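
The MapReduce model can be made concrete with a single-process Python sketch of the classic word count. This is not Hadoop code. Hadoop distributes three phases across nodes (map, shuffle/sort and reduce), and here they are simulated with plain functions. The input "splits" are hypothetical strings.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group emitted values by key, as the framework does
    before handing each key to a reducer."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate all values for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster, each split would be mapped on the node holding its data.
    emitted = (pair for split in splits for pair in map_phase(split))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
    # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

A push-down enrichment job of the kind the slide mentions would mostly live in the map step. Each node transforms its own split, so only the enriched result has to leave the cluster.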

Hadoop for Data Analytics
Focus: a combination of storage and delegated analytics that supports two approaches:
- Two-phase analytics: a background processing engine refines the data and feeds it onward.
- Federated queries: client-side federation across data stores.
Diagram: the enterprise scenarios diagram, highlighting "Hadoop for data analytics", with BI and analytics software from SAP on an in-memory analytic engine and/or a disk-based data warehouse (SAP Sybase IQ).
Scenarios of use include:
- Cross-DM analytics. This is practical only when performance from Hadoop is acceptable.
- Stand-alone analytics. This is an emerging area that uses Hadoop as the direct data store for analytics.
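
A rough Python sketch of client-side federation follows. It assumes two hypothetical stores: a warehouse customer table and a clickstream summary. The summary could be the output of a Hadoop batch job in a two-phase setup. The client queries each store separately and then hash-joins the results itself. All table contents and column names are invented for illustration. The sketch also shows why the slide's performance caveat matters: the join cannot finish until the slower store has answered.

```python
from typing import Callable

Row = dict[str, object]


def federated_join(fetch_left: Callable[[], list[Row]],
                   fetch_right: Callable[[], list[Row]],
                   key: str) -> list[Row]:
    """Client-side federation: query each store separately, then
    hash-join the two result sets on `key` in the client."""
    # Assumes `key` is unique on the right-hand side (later rows would win).
    right_index: dict[object, Row] = {row[key]: row for row in fetch_right()}
    joined = []
    for row in fetch_left():
        match = right_index.get(row[key])
        if match is not None:  # inner-join semantics
            joined.append({**row, **match})
    return joined


# Hypothetical stand-ins for the two data stores.
def warehouse_customers() -> list[Row]:
    return [{"customer_id": 1, "name": "Acme"},
            {"customer_id": 2, "name": "Globex"}]


def hadoop_clickstream_summary() -> list[Row]:
    # e.g. the refined output of a batch job over raw web logs (two-phase)
    return [{"customer_id": 1, "page_views": 5400}]


if __name__ == "__main__":
    print(federated_join(warehouse_customers, hadoop_clickstream_summary,
                         "customer_id"))
    # [{'customer_id': 1, 'name': 'Acme', 'page_views': 5400}]
```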

SAP HANA SP6: Smart Data Access capability
Data virtualization for on-premise and hybrid cloud environments. SAP HANA holds new HANA tables (transactions + analytics) alongside SAP HANA virtual tables. The virtual tables point to heterogeneous data sources: SAP HANA, Hadoop (Hive), Teradata, SAP Sybase ASE and SAP Sybase IQ.
Benefits:
- Enables access to remote data just like a local table
- Provides SAP HANA to SAP HANA queries
- Smart query processing, including query decomposition with predicate push-down and functional compensation
- Supports data-location-agnostic development
- No special syntax to access heterogeneous data sources
- Non-disruptive evolution
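
Predicate push-down and functional compensation can be illustrated with a toy planner in Python. This is a hypothetical sketch, not how SAP HANA implements Smart Data Access. A `RemoteSource` class advertises which operators it supports. Supported predicates are sent to the remote side, so fewer rows travel back. The remaining predicates are evaluated locally after the fetch, which is the compensation step. The `LIKE_PREFIX` operator and the sample rows are invented for illustration.

```python
import operator
from dataclasses import dataclass

# Hypothetical operator table; LIKE_PREFIX is an invented example operator.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
       "LIKE_PREFIX": lambda value, prefix: str(value).startswith(prefix)}


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str
    value: object

    def matches(self, row: dict) -> bool:
        return OPS[self.op](row[self.column], self.value)


class RemoteSource:
    """Hypothetical remote table that can evaluate only some operators."""

    def __init__(self, rows: list[dict], supported_ops: set[str]):
        self.rows = rows
        self.supported_ops = supported_ops
        self.rows_shipped = 0  # rows that crossed the simulated network

    def scan(self, predicates: list[Predicate]) -> list[dict]:
        result = [r for r in self.rows if all(p.matches(r) for p in predicates)]
        self.rows_shipped += len(result)
        return result


def query_virtual_table(source: RemoteSource,
                        predicates: list[Predicate]) -> list[dict]:
    # Decompose the WHERE clause: push down what the remote side supports...
    pushed = [p for p in predicates if p.op in source.supported_ops]
    local = [p for p in predicates if p.op not in source.supported_ops]
    rows = source.scan(pushed)
    # ...and compensate locally for what it cannot evaluate.
    return [r for r in rows if all(p.matches(r) for p in local)]


if __name__ == "__main__":
    hive_table = RemoteSource(
        [{"id": i, "city": c} for i, c in
         enumerate(["Berlin", "Boston", "Bangalore", "Paris"])],
        supported_ops={"=", ">", "<"})
    where = [Predicate("id", ">", 0), Predicate("city", "LIKE_PREFIX", "B")]
    print(query_virtual_table(hive_table, where), hive_table.rows_shipped)
    # [{'id': 1, 'city': 'Boston'}, {'id': 2, 'city': 'Bangalore'}] 3
```

The caller writes one list of predicates regardless of where the data lives. The planner decides the split, which mirrors the "no special syntax" benefit listed on the slide.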

Example Reference Architecture: Machine-to-Machine Infrastructure (Run-Time Architecture)
- Device / edge: SQLA / UltraLite for data persistence and synchronization
- End-user apps: web app, mobile app, dashboard
- Cloud / backend: device management; M2M services; data acquisition and processing (stream and batch); M2M application server with core services, application and analytical services, and industry-specific services
- Real-Time Data Platform: ESP (event processing), SQLA (data synchronization), HANA (hot data, data models, predictive models), ASE / IQ (warm and cold data), Hadoop (big data sets)
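
In this architecture, ESP handles event processing on the incoming stream before data lands in the hot, warm and cold tiers. The Python sketch below is a rough, hypothetical illustration of that kind of stream logic. It does not use the ESP API or its query language. It keeps a sliding window of the last N readings per device and raises an alert when the window average crosses a threshold. The window size, threshold and device IDs are assumptions.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator


def windowed_alerts(events: Iterable[tuple[str, float]],
                    window: int = 3,
                    threshold: float = 80.0) -> Iterator[tuple[str, float]]:
    """Yield (device_id, window_average) whenever the average of the last
    `window` readings for a device exceeds `threshold`."""
    recent: dict[str, deque] = defaultdict(lambda: deque(maxlen=window))
    for device_id, reading in events:
        buf = recent[device_id]
        buf.append(reading)  # deque(maxlen=...) silently drops the oldest value
        if len(buf) == window:
            avg = sum(buf) / window
            if avg > threshold:
                yield device_id, avg


if __name__ == "__main__":
    # Hypothetical sensor stream of (device_id, temperature) events.
    stream = [("turbine-7", 70.0), ("turbine-7", 85.0), ("pump-2", 40.0),
              ("turbine-7", 90.0), ("turbine-7", 95.0)]
    for alert in windowed_alerts(stream):
        print(alert)
```

The function is a generator, so it processes one event at a time and never needs the whole stream in memory. That is the property that separates stream processing from the batch path into Hadoop.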

Beyond Business Networks: Internet of Things
Connected things across healthcare, industrial and public sector (picture: Beecham Research): meters, drills, MRI, PDAs, generators, turbines, windmills, implants, surgical equipment, UPS batteries, pumps, monitors, fuel cells, valves, vats, conveyors, pipelines, telemedicine, motors, drives, converting, fabrication, tanks, fighter jets, battlefield comms, jeeps, cars, ambulances, breakdown, lone worker, homeland security, tolls, environmental monitoring, planes, signage, assembly/packaging, vessels/tanks, vehicles, lights, ships, etc.
- Annual smart meter shipments will surpass 140 million units worldwide by 2016, representing a CAGR of 32.9% [3].
- Meter Data Management will exceed $420 million by 2020, with a CAGR of 16.8% [2].
- The cellular M2M revenue opportunity is projected to reach $1.2 trillion by 2020.
Sources: 1. GSMA; 2. Pike Research; 3. IDC Energy Insights

DRIVERS DIVERTED BEFORE FATAL ACCIDENTS HAPPEN

SAP Big Data Webinar Series
Thank You!
Presented by: Yuvaraj Athur Raghuvir, SAP HANA Platform
yuvaraj.athur.raghuvir@sap.com

Hadoop Commons: Core / Common Components
Hadoop Distributed File System: HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.
MapReduce: MapReduce is a software framework that serves as the compute layer of Hadoop. MapReduce jobs are divided into two parts, Map and Reduce. The Map function divides a query into multiple parts and processes data at the node level. The Reduce function aggregates the results of the Map function to determine the answer to the query.
Hive: Hive is a Hadoop-based, data-warehouse-like framework originally developed by Facebook. It allows users to write queries in a SQL-like language called HiveQL, which are then converted into MapReduce jobs. This allows SQL programmers with no MapReduce experience to use the warehouse, and it makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, Revolution Analytics, etc.
Pig: Pig Latin is a Hadoop-based language developed by Yahoo. It is relatively easy to learn and is adept at very deep, very long data pipelines (a limitation of SQL).
HBase: HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop. It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. eBay and Facebook use HBase heavily.
Flume: Flume is a framework for populating Hadoop with data. Agents are deployed throughout one's IT infrastructure (inside web servers, application servers and mobile devices, for example) to collect data and integrate it into Hadoop.
Oozie: Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages, such as MapReduce, Pig and Hive, and then intelligently link them to one another. Oozie allows users to specify, for example, that a particular query is only to be initiated after the specified previous jobs it relies on for data have completed.
Source: http://wikibon.org/wiki/v/hbase,_sqoop,_flume_and_more:_apache_hadoop_defined, retrieved on July 17, 2013
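
The Oozie behavior described above runs a query only after the jobs it depends on have completed. In effect, that means executing a dependency graph in topological order. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to derive one valid run order. The job names are hypothetical. Real Oozie workflows are defined in XML and executed by the Oozie server, not like this.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every job runs after all of its
    prerequisites. Raises CycleError if the dependencies are circular."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical workflow: job -> set of jobs it must wait for.
    workflow = {
        "ingest_logs_flume": set(),
        "import_orders_sqoop": set(),
        "clean_logs_pig": {"ingest_logs_flume"},
        "sales_report_hive": {"clean_logs_pig", "import_orders_sqoop"},
    }
    print(run_order(workflow))
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("circular dependency rejected")
```

Rejecting cycles up front matters for the same reason it does in a workflow engine. A job that ultimately waits on itself would otherwise never start.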

Hadoop Commons: Core / Common Components
Ambari: Ambari is a web-based set of tools for deploying, administering and monitoring Apache Hadoop clusters. Its development is being led by engineers from Hortonworks, which includes Ambari in its Hortonworks Data Platform.
Avro: Avro is a data serialization system that allows for encoding the schema of Hadoop files. It is adept at parsing data and performing remote procedure calls.
Mahout: Mahout is a data mining library. It takes the most popular data mining algorithms for performing clustering, regression and statistical modeling and implements them using the MapReduce model.
Sqoop: Sqoop is a connectivity tool for moving data from non-Hadoop data stores, such as relational databases and data warehouses, into Hadoop. Users specify the target location inside Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to that target.
HCatalog: HCatalog is a centralized metadata management and sharing service for Apache Hadoop. It provides a unified view of all data in Hadoop clusters and allows diverse tools, including Pig and Hive, to process any data elements without needing to know where in the cluster the data is physically stored.
BigTop: BigTop is an effort to create a more formal process or framework for packaging and interoperability testing of Hadoop's sub-projects and related components, with the goal of improving the Hadoop platform as a whole.
Source: http://wikibon.org/wiki/v/hbase,_sqoop,_flume_and_more:_apache_hadoop_defined, retrieved on July 17, 2013
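
Sqoop also offers an incremental import mode (`--incremental append` with `--check-column` and `--last-value`). In that mode, only rows whose check column exceeds the last value seen are imported. The Python sketch below mimics the idea and is not Sqoop's implementation. The standard library's `sqlite3` module stands in for the relational source, and a list stands in for the HDFS target. The table, columns and data are hypothetical.

```python
import sqlite3


def incremental_import(conn: sqlite3.Connection, table: str,
                       check_column: str, last_value: int,
                       target: list[tuple]) -> int:
    """Append rows with check_column > last_value to `target` and return
    the new last value to use for the next run."""
    # Identifiers cannot be bound as parameters; names are assumed trusted.
    query = (f"SELECT * FROM {table} WHERE {check_column} > ? "
             f"ORDER BY {check_column}")
    cur = conn.execute(query, (last_value,))
    rows = cur.fetchall()
    target.extend(rows)  # stand-in for writing new files into HDFS
    if not rows:
        return last_value  # nothing new; keep the previous watermark
    idx = [col[0] for col in cur.description].index(check_column)
    return rows[-1][idx]  # rows are ordered by check_column


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 9.5), (2, 20.0), (3, 7.25)])
    hdfs_files: list[tuple] = []
    last = incremental_import(conn, "orders", "id", 0, hdfs_files)  # rows 1-3
    conn.execute("INSERT INTO orders VALUES (4, 12.0)")
    last = incremental_import(conn, "orders", "id", last, hdfs_files)  # row 4
    print(last, hdfs_files)
    # 4 [(1, 9.5), (2, 20.0), (3, 7.25), (4, 12.0)]
```

Returning the new watermark keeps each run stateless. The caller stores `last` between runs, much as Sqoop can track the last value for saved jobs.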