IBM Big Data Summit 2012

IBM Big Data Summit 2012 12.10.2012

InfoSphere BigInsights Introduction
Wilfried Hoge, Leading Technical Sales Professional
hoge@de.ibm.com, twitter.com/wilfriedhoge
12.10.2012

IBM Big Data Strategy: Move the Analytics Closer to the Data
New analytic applications drive the requirements for a big data platform:
- Integrate and manage the full variety, velocity and volume of data
- Apply advanced analytics to information in its native form
- Visualize all available data for ad hoc analysis
- Development environment for building new analytic applications
- Workload optimization and scheduling
- Security and governance
[Diagram: IBM Big Data Platform. Analytic applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics) sit on the platform layers Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, and Information Integration & Governance.]

BigInsights: analytical platform for persistent Big Data
- Based on open source & IBM technologies
Distinguishing characteristics:
- Built-in analytics: enhances business knowledge
- Enterprise software integration: complements and extends existing capabilities
- Production-ready platform with tooling for analysts, developers, and administrators: speeds time-to-value and simplifies development and maintenance
IBM advantage:
- Combination of software, hardware, services and advanced research
[Diagram: IBM Big Data Platform, as on the previous slide.]

About the BigInsights Platform
Flexible, enterprise-class support for processing large volumes of data:
- Based on Google's MapReduce technology
- Inspired by Apache Hadoop; compatible with its ecosystem and distribution
- Well-suited to batch-oriented, read-intensive applications
- Supports a wide variety of data
Enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner:
- CPU + disks = node
- Nodes can be combined into clusters
- New nodes can be added as needed without changing data formats, how data is loaded, or how jobs are written
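The scaling claim above can be made concrete with back-of-the-envelope arithmetic. The sketch below is a hypothetical sizing calculation, assuming one map task per input split, one split per block, a 128 MiB block size, and 8 concurrent map tasks per node; none of these figures come from the slide, and real job runtimes also depend on scheduling, data locality, and slow tasks.

```java
// Hypothetical sizing sketch: one map task per input split, one split per block.
// Block size and tasks-per-node are assumptions, not BigInsights defaults.
public class SplitMath {
    // Number of input splits: ceiling of file size divided by block size.
    static long splits(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Number of "waves" of map tasks when each node runs slotsPerNode tasks at once.
    static long waves(long splits, int nodes, int slotsPerNode) {
        long concurrent = (long) nodes * slotsPerNode;
        return (splits + concurrent - 1) / concurrent;
    }

    public static void main(String[] args) {
        long blockBytes = 128L << 20; // assumed 128 MiB block size
        long fileBytes = 1L << 50;    // 1 PiB of input
        long s = splits(fileBytes, blockBytes);
        System.out.println("splits = " + s);
        // Adding nodes only changes how many tasks run at once; the job code is unchanged.
        System.out.println("waves on 1000 nodes = " + waves(s, 1000, 8));
        System.out.println("waves on 2000 nodes = " + waves(s, 2000, 8));
    }
}
```

Under these assumptions, 1 PiB of input yields 8,388,608 splits, and going from 1,000 to 2,000 nodes reduces the number of map-task waves from 1,049 to 525 without any change to how the job is written.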

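Hadoop Explained
Hadoop computation model:
- Data stored in a distributed file system spanning many inexpensive computers
- Bring function to the data: distribute the application to the compute resources where the data is stored
- Scalable to thousands of nodes and petabytes of data
[Diagram: a MapReduce application distributes map tasks to the Hadoop data nodes; their interim output is shuffled and reduced to a single result set.]
Sample MapReduce application (word count):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // Map phase: emit (word, 1) for every token in the input line
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text val, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(val.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  // Reduce phase: sum all counts shuffled to the same word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> val, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : val) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
}
```

MapReduce phases:
1. Map phase (break the job into small parts)
2. Shuffle (transfer interim output for final processing)
3. Reduce phase (boil all output down to a single result set)
The application returns a single result set.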
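To see the three phases without a cluster, here is a minimal in-memory sketch in plain Java with no Hadoop dependency. It reuses the word-count logic of the slide; the class name LocalWordCount and its helpers are hypothetical. The partition rule mirrors the one used by Hadoop's default HashPartitioner (non-negative hash modulo the number of reducers), while real Hadoop additionally sorts, spills, and transfers map output across the network, which this sketch collapses into one in-memory map per reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory simulation of map, shuffle, and reduce for word count.
// No Hadoop classes are used; "reducers" are just entries in a list.
public class LocalWordCount {
    // Same rule as Hadoop's default HashPartitioner: non-negative hash modulo reducer count.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static Map<String, Integer> run(List<String> inputSplits, int numReducers) {
        // One sorted bucket (key -> list of values) per reducer.
        List<TreeMap<String, List<Integer>>> buckets = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) buckets.add(new TreeMap<>());

        // 1. Map phase: each split is tokenized independently, emitting (word, 1).
        for (String split : inputSplits) {
            StringTokenizer itr = new StringTokenizer(split);
            while (itr.hasMoreTokens()) {
                String word = itr.nextToken();
                // 2. Shuffle: route each pair to the bucket of the reducer that owns its key.
                buckets.get(partition(word, numReducers))
                       .computeIfAbsent(word, k -> new ArrayList<>())
                       .add(1);
            }
        }

        // 3. Reduce phase: sum the values grouped under each key, then merge into one result set.
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> bucket : buckets) {
            for (Map.Entry<String, List<Integer>> e : bucket.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) sum += v;
                result.put(e.getKey(), sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("the quick brown fox", "the lazy dog", "the fox");
        System.out.println(run(splits, 3));
    }
}
```

Running main prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}. The result is the same for any reducer count, because every occurrence of a given word is routed to the same reducer.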

BigInsights Value Beyond Open Source
Technical differentiators:
- Built-in analytics
  - Text processing engine, annotators, Eclipse tooling
  - Statistical and predictive analysis
  - Interface to the R project (statistical platform)
- Enterprise software integration (DBMS, warehouse)
- Spreadsheet-style analytical tool for analysts
- Ready-made business process accelerators
- Integrated installation of supported open source and IBM components
- Web console for administration and application access
- Platform enrichment: additional security, performance features, ...
- Standard IBM licensing agreement and world-class support
Business benefits:
- Quicker time-to-value due to IBM technology and support
- Reduced operational risk
- Enhanced business knowledge with a flexible analytical platform
- Leverages and complements existing software assets

InfoSphere BigInsights: Embrace and Extend Hadoop
[Architecture diagram; each component is marked as either IBM or open source.]
- Analytics: ML Analytics, Text Analytics
- Interfaces: BigSheets, Web console (monitor cluster health, add / remove nodes, start / stop services, inspect job status, inspect workflow status, deploy apps, launch apps / jobs, work with the distributed file system, work with the spreadsheet interface, support for a REST-based API, ...)
- Application: Pig, Hive, Jaql
- MapReduce, with AdaptiveMR and FLEX
- Storage: HDFS, HBase, GPFS-SNC
- Other components: BigIndex, Lucene, Oozie, ZooKeeper, IBM LZO Compression, Avro
- Eclipse plug-ins: text analytics, MapReduce programming, Jaql development, Hive query development
- Data sources / connectors: Streams, Netezza, BoardReader, R, DataStage, DB2, CSV / XML / JSON, SPSS, Flume, JDBC, Web crawler

Web Installation Tool
- Seamless process for single-node and cluster environments
- Integrated installation of all selected components
- Post-install validation of IBM and open source components
- No need to iteratively download, configure, and test multiple open source projects and their prerequisite software

Web Console
- Manage BigInsights: inspect system health, add / drop nodes, start / stop services, run / monitor jobs (applications), explore / modify the file system
- Launch applications
- Spreadsheet-like analysis tool
- Pre-built applications (IBM-supplied or user-developed)
- Publish applications
- Leverage community resources

Quick start applications or apps
- Reusable software assets based on customer engagements
- Useful as a starting point for various applications
- Can be customized by BigInsights application developers as needed
- Accessible through the Web console
Available assets:
- Data export (to relational DBMS, files, HBase)
- Data import (from relational DBMS, files)
- Web crawler, Twitter crawler
- Boardreader.com support (Web forum search engine)
- Ad hoc queries for Jaql, Hive, Pig
- TeraGen-TeraSort, WordCount sample applications
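The TeraGen-TeraSort sample sorts a large generated data set so that the reducer outputs, read in order, form one globally sorted result. TeraSort does this by sampling the input to choose partition boundaries, so each reducer receives one contiguous key range. The plain-Java sketch below illustrates only that idea; the names RangePartitionSketch, cutPoints, and rangeSort are hypothetical, an in-memory list stands in for the distributed input, and it is not the TeraSort source code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of sampling-based range partitioning, the idea behind a total-order sort.
public class RangePartitionSketch {
    // Pick numReducers - 1 cut points at evenly spaced positions of the sorted sample.
    static String[] cutPoints(List<String> sample, int numReducers) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        String[] cuts = new String[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            cuts[i - 1] = sorted.get(i * sorted.size() / numReducers);
        }
        return cuts;
    }

    // Reducer index = number of cut points <= key (binary search for the first cut > key).
    static int partition(String key, String[] cuts) {
        int lo = 0, hi = cuts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cuts[mid].compareTo(key) <= 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Route keys to reducers by range, sort each range locally, then concatenate the ranges.
    static List<String> rangeSort(List<String> keys, List<String> sample, int numReducers) {
        String[] cuts = cutPoints(sample, numReducers);
        List<List<String>> reducers = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) reducers.add(new ArrayList<>());
        for (String k : keys) reducers.get(partition(k, cuts)).add(k);
        List<String> output = new ArrayList<>();
        for (List<String> part : reducers) {
            Collections.sort(part);
            output.addAll(part);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("pear", "apple", "fig", "kiwi", "banana",
                                    "cherry", "plum", "date", "grape");
        // Here the "sample" is the whole input; TeraSort samples only part of it.
        System.out.println(Arrays.toString(cutPoints(keys, 3)));
        System.out.println(rangeSort(keys, keys, 3));
    }
}
```

With the nine sample keys and three reducers, the cut points are [date, kiwi], each reducer receives three keys, and the concatenated output is fully sorted. The sketch assumes a non-empty sample; with an empty sample, cutPoints would fail.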

Running Applications from the Web Console

DEMO web console 12.10.2012

BigSheets
BigSheets is a visual tool for data manipulation and prototyping.
- Allows more users to do more work, more quickly
- Simply stated, growing an army of MapReduce developers is not cost-effective
- In your BI environments you have a ratio of 30+ report users for every complex SQL developer; BigInsights needs to support the same ratios
Sample uses:
- Data exploration and visualization
- Visual job creation

BigSheets Spreadsheet-style Data Analysis and Discovery

BigSheets Visualization

DEMO BigSheets 12.10.2012

Text Analytics in BigInsights
Text analytics: distill structured information from unstructured data
- Rich annotator library supports multiple languages
- Declarative Information Extraction (IE) system based on an algebraic framework
  - Richer, cleaner rule semantics
  - Better performance through optimization
- Developed at IBM Research since 2004
- Embedded in several IBM products: Lotus Notes, Cognos Consumer Insights, InfoSphere Streams
- Compose operators to build complex annotators

Text Analytics: highly accurate analysis of textual content
How it works:
- Parses text and detects meaning with annotators
- Understands the context in which the text is analyzed
- Hundreds of pre-built annotators for names, addresses, phone numbers, among others
Accuracy:
- Highly accurate in deriving meaning from complex text
Performance:
- AQL language optimized for MapReduce
[Diagram: unstructured text (document, email, etc.) is processed by annotators to produce classification and insight.]
Sample input text: "Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win."
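AQL itself is declarative, so the following is only a plain-Java analogy of composing simple annotators into more complex ones, applied to the sample text. Two primitive annotators (a first-name dictionary and a capitalized-word regular expression) are combined into a Person annotator, which is then combined with a Team dictionary into a team-person relation. The dictionaries, the character-gap rule, and all names (AnnotatorSketch, follows, teamPersonMentions) are hypothetical; AQL provides its own extraction statements and span predicates (such as FollowsTok, which counts tokens rather than characters) for this purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java analogy of annotator composition; this is NOT AQL.
public class AnnotatorSketch {
    // A span is a [begin, end) character range in the document.
    record Span(int begin, int end) {
        String text(String doc) { return doc.substring(begin, end); }
    }

    // Regex annotator: every match of the pattern becomes a span.
    static List<Span> regex(String doc, String pattern) {
        List<Span> out = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(doc);
        while (m.find()) out.add(new Span(m.start(), m.end()));
        return out;
    }

    // Dictionary annotator: whole-word matches of any entry, ordered by position.
    static List<Span> dictionary(String doc, Set<String> entries) {
        List<Span> out = new ArrayList<>();
        for (String e : entries) out.addAll(regex(doc, "\\b" + Pattern.quote(e) + "\\b"));
        out.sort((x, y) -> Integer.compare(x.begin(), y.begin()));
        return out;
    }

    // Composition: span a followed by span b with a character gap in [minGap, maxGap];
    // the resulting span covers both.
    static List<Span> follows(List<Span> as, List<Span> bs, int minGap, int maxGap) {
        List<Span> out = new ArrayList<>();
        for (Span a : as)
            for (Span b : bs) {
                int gap = b.begin() - a.end();
                if (gap >= minGap && gap <= maxGap) out.add(new Span(a.begin(), b.end()));
            }
        return out;
    }

    static final String DOC =
        "Football World Cup 2010, one team distinguished themselves well, losing to the eventual "
      + "champions 1-0 in the Final. Early in the second half, Netherlands striker, Arjen Robben, "
      + "had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres "
      + "Iniesta scored for Spain for the win.";

    static List<String> teamPersonMentions(String doc) {
        List<Span> firstNames = dictionary(doc, Set.of("Arjen", "Iker", "Andres"));
        List<Span> capsWords = regex(doc, "\\b[A-Z][a-z]+\\b");
        // Person = first name, then a capitalized word after exactly one character (a space here)
        List<Span> persons = follows(firstNames, capsWords, 1, 1);
        List<Span> teams = dictionary(doc, Set.of("Netherlands", "Spain"));
        // TeamPerson = team name followed by a person within 20 characters
        List<String> out = new ArrayList<>();
        for (Span s : follows(teams, persons, 0, 20)) out.add(s.text(doc));
        return out;
    }

    public static void main(String[] args) {
        teamPersonMentions(DOC).forEach(System.out::println);
    }
}
```

Running main prints "Netherlands striker, Arjen Robben" and "Spain, Iker Casillas". Andres Iniesta is recognized as a person but is not paired with Spain, because in this text the team name follows his name instead of preceding it, which shows how much the result of a composed annotator depends on the composition rule.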

BigInsights Text Analytics Development AQL

Text Analytics Tooling: AQL Editor, Result Viewer, Runtime Explain

DEMO Text Analytics 12.10.2012

Ways to get started with BigInsights
- In the cloud: via RightScale, or directly on Amazon, Rackspace, IBM Smart Enterprise Cloud, or on private clouds. Pay only for the resources used.
- In the virtual classroom: free Hadoop Fundamentals training course at www.bigdatauniversity.com, e.g. BD105EN - Text Analytics Essentials.
- On your cluster: download Basic Edition from ibm.com.
- In the classroom: enroll in the InfoSphere BigInsights Essentials course.

Visit the BigInsights technical portal: free links to papers, demos, a discussion forum, and more. http://www.ibm.com/developerworks/wiki/biginsights/
