Why Big Data Matters? Speaker: Paras Doshi

If you're wondering what Big Data is and why it matters to you and your organization, come to this talk to get an introduction to Big Data and learn about tools you can start using right away!

Goals: What is Big Data? Why does Big Data matter? Which Big Data tools are available to us?

About Paras:

My background with Big Data:
- Studied data-intensive computing in the cloud at the University of Washington, USA (2012)
- Independent study on Big Data topics and tools since October 2011
- Blogging about Hadoop on Azure and Hadoop on Windows since they came out (blog: ParasDoshi.com)
- Answer questions about Hadoop on Windows and Big Data on the MSDN and Stack Overflow forums
- Attended Dr. David DeWitt's presentations related to Big Data at SQL PASS Summit 2011 and PASS Summit 2012

[Figure: search interest in "Business Intelligence", "Hadoop", and "Big Data". Courtesy: Google Trends]

Evolution of Big Data

Advanced Analytics, Bigger Data

Bigger Data, Advanced Analytics

Is it just about Volume?

Why 3V? (Volume, Velocity, Variety)

Other Definitions

"Big data is a nickname for the recent increase in largely external and unstructured business and consumer information."
Source: Big Data Demystified: Using Unstructured Data for Competitive Advantage, www.deloitte.com/view/en_us/us/services/additional-services/deloitte-analyticsservice/217c19e69249b310vgnvcm2000003356f70arcrd.htm

"In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."
Source: http://en.wikipedia.org/wiki/big_data#cite_note-1

I collected such definitions from the Internet. Here's a summary:

Big Data:
- Hadoop and MapReduce
- Running computationally intensive algorithms
- Large-scale data analysis, or data-intensive computing
- Analyzing data using distributed systems
- Data mining on massive data sets
- Analyzing external, public, unstructured, or social network datasets
- Massively parallel processing databases

Working definition of Big Data: a company's data needs exceed its infrastructure.

The cost of data acquisition has decreased dramatically:
- 30 years ago: 1 KB = $1*, so 1 TB = ~$1 billion**
- Today: 1 TB is definitely not $1 billion. People generate data voluntarily on social networks, and machines generate data too.
*Assumed **Approximate
Source: Dave Campbell interview at Build 2012
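The "~$1 billion" figure follows directly from the assumed $1 per KB price. A minimal sketch of that arithmetic, assuming either decimal units (1 TB = 10^9 KB) or binary units (1 TiB = 2^30 KiB); the function name `cost_of_terabyte` is hypothetical:

```python
# Back-of-the-envelope check of the slide's figure, assuming the slide's
# hypothetical price of $1 per KB from 30 years ago.
PRICE_PER_KB = 1.0  # USD, assumed price from the slide


def cost_of_terabyte(price_per_kb: float, binary: bool = False) -> float:
    """Return the cost of 1 TB at a given per-KB price.

    Decimal units: 1 TB = 10**9 KB. Binary units: 1 TiB = 2**30 KiB.
    """
    kb_per_tb = 2**30 if binary else 10**9
    return price_per_kb * kb_per_tb


if __name__ == "__main__":
    print(f"Decimal units: ${cost_of_terabyte(PRICE_PER_KB):,.0f}")
    print(f"Binary units:  ${cost_of_terabyte(PRICE_PER_KB, binary=True):,.0f}")
```

Either way the result is roughly one billion dollars, which is why the slide marks the figure as approximate.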

Unstructured information accounts for more than 70% to 80% of all data in organizations and is growing 10 to 50x more than structured data. Source: http://en.wikipedia.org/wiki/unstructured_data
There's value in unstructured data: emails, social network data, images, videos, and audio.

Bigger Data, Advanced Analytics

Yay! We sold 10 Xboxes today! Who bought them? Who referred them? What was the context of the purchase? Can we recommend products to them?
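The last question ("Can we recommend products to them?") is what a recommendation engine answers. A minimal sketch of one simple approach, item co-occurrence ("customers who bought X also bought Y"), assuming a tiny hypothetical order history; the product names and the `recommend` helper are illustrative, not taken from the talk:

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each set is the products bought in one order.
orders = [
    {"Xbox", "Controller", "Game A"},
    {"Xbox", "Game A"},
    {"Xbox", "Controller"},
    {"Controller", "Headset"},
]


def co_occurrence(orders):
    """Count how often each (product, other_product) pair shares an order."""
    counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(product, orders, k=2):
    """Return up to k products most often bought together with `product`."""
    counts = co_occurrence(orders)
    scored = [(other, n) for (p, other), n in counts.items() if p == product]
    # Highest count first; break ties alphabetically so output is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [other for other, _ in scored[:k]]


if __name__ == "__main__":
    print(recommend("Xbox", orders))
```

Production recommenders use far richer signals (referrals, context, ratings), but counting co-purchases already turns raw sales into a "who else might want this" answer.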

There's value in doing advanced analytics. Almost everyone is doing business intelligence, so where's the competitive advantage? Examples of advanced analytics over: #1: unstructured data, #2: external data.

Big Data problems:
- Customer churn analysis
- Risk modeling
- Recommendation engine
- Ad targeting
- Point-of-sale transaction analysis
- Analyzing network data to predict failure
- Threat analysis
- Trade surveillance
- Search quality
- Data sandbox
Source: Cloudera's Top Hadoopable Problems resolved paper

Recap:
- Big(ger) Data: Volume, Velocity, Variety, Value
- Advanced Analytics examples: recommendation engine, sentiment analysis
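Sentiment analysis is named above as an example of advanced analytics over unstructured data such as social network posts. A minimal lexicon-based sketch, assuming a tiny hypothetical word list and made-up posts; real systems use much larger lexicons or trained models:

```python
import re

# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {"love": 1, "great": 1, "awesome": 1, "hate": -1, "broken": -1, "slow": -1}


def sentiment(text: str) -> int:
    """Sum the lexicon scores of the words in `text`.

    A positive total suggests positive sentiment, negative suggests negative,
    and 0 means no lexicon words were found (or they cancelled out).
    """
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


# Hypothetical social network posts about a product.
posts = [
    "I love my new Xbox, the games look great!",
    "Console arrived broken and support is slow.",
    "Picked one up today.",
]

if __name__ == "__main__":
    for post in posts:
        print(sentiment(post), post)
```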

"In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers." - Grace Hopper

Tools (thanks to Andrew Brust: https://twitter.com/andrewbrust/status/291611569924734976):
- HDInsight: Hadoop for Windows Server and Azure
- Massively parallel processing (MPP) databases: Parallel Data Warehouse (PDW)
- NoSQL: Windows Azure Tables
- Real-time analysis: StreamInsight

Quick numbers about PDW: capacity of Microsoft data warehouse offerings (in TB):
- BDWA: 5
- Fast Track DW: 14-40
- PDW: 80-500+
The effort to build such solutions is a function of capacity, concurrency, and query complexity.
Source: Data Warehousing: SQL Server Parallel Data Warehouse AU3 Update (SQL Server 2012), http://www.youtube.com/watch?feature=player_embedded&v=anxj4otmgsk

What is Hadoop?
Hadoop is a Big Data platform that:
- Distributes and replicates data
- Manages parallel tasks created by users
- Runs as several processes on a cluster
Hadoop is also a toolset.
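To make "distributes and replicates data" concrete: HDFS splits each file into fixed-size blocks and stores several copies of every block on different machines. A toy sketch, assuming a 128 MB block size (the Hadoop 2.x default; 1.x used 64 MB) and a replication factor of 3 (the default `dfs.replication`). The round-robin placement and the `place_blocks` helper are hypothetical simplifications; real HDFS uses a rack-aware placement policy:

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed block size (Hadoop 2.x default)
REPLICATION = 3      # assumed replication factor (HDFS default)


def place_blocks(file_size_mb: int, nodes: List[str],
                 block_size_mb: int = BLOCK_SIZE_MB,
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Split a file into blocks and assign each block to `replication`
    distinct nodes using simple round-robin (a toy model, not HDFS's policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # Shift the starting node per block so storage load is spread out.
        placement[block] = [nodes[(block + i) % len(nodes)] for i in range(replication)]
    return placement


if __name__ == "__main__":
    layout = place_blocks(300, ["node1", "node2", "node3", "node4"])
    for block, replicas in layout.items():
        print(block, replicas)
```

Because every block lives on three different nodes, losing any single machine loses no data, and parallel tasks can read different blocks from different machines at the same time.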

What is HDInsight? An Apache Hadoop-based solution for Windows Server and Windows Azure. The preview was announced on 24 October 2012 at the Strata & Hadoop World conference.

Hadoop Ecosystem: Hive, Pig, Sqoop, MapReduce, HDFS (Hadoop Distributed File System)

Hadoop Ecosystem:
- HDFS: a distributed file system
- MapReduce: a distributed framework for executing work in parallel
- Hive: a SQL-like language on top of MapReduce
- Pig: a scripting language to manipulate data
- Sqoop: enables data exchange between relational databases and Hadoop clusters
- Mahout: scalable machine learning libraries
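The MapReduce model above can be illustrated with the classic word-count example. A minimal single-process sketch in plain Python (not the Hadoop API): the mapper emits (word, 1) pairs, the shuffle groups values by key, and the reducer sums each group. In Hadoop the map and reduce calls run in parallel across the cluster; here they run sequentially, and all function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    # Sequential stand-in for mappers and reducers running on many nodes.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data matters", "big data tools"]))
```

Hive and Pig build on the same idea: a query such as a GROUP BY with a count is compiled into map, shuffle, and reduce steps much like these, so users do not have to write them by hand.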

Microsoft and the Hadoop ecosystem:
- Making Hadoop Windows-compatible
- Microsoft .NET SDK for Hadoop
- Hive ODBC driver
- Excel Hive Add-in
- JavaScript layer on Hadoop
- Hive and JavaScript web console
- Connectors for SQL Server and PDW
- Integration with Active Directory for access control
- Integration with System Center for administration and management

"Hadoop shouldn't replace your current data infrastructure, only augment it." Source: http://www.itworld.com/bigdatahadoop/280919/what-hadoop-can-and-cant-do

SQL Server + Hadoop:
- The Hive ODBC driver can be used to connect Excel, SSIS, and Tabular Models to Hive tables.
- Sqoop can be used to load operational data from SQL Server into Hadoop.

Future of Hadoop and Big Data tools:
- Microsoft's PolyBase mashes up SQL Server and Hadoop (demo shown at SQL PASS Summit 2012)
- Hadoop support for ad hoc and real-time queries
- HDInsight for Windows/Azure should become available for production use
- Hopefully, support for tools such as HBase and Mahout* will be added
- More use cases will emerge
*Mahout is included in HDInsight for Windows Azure.

Related Microsoft technologies and projects:
- Windows Azure Data Market
- Dryad (deprecated Microsoft Research project)
- Project Daytona (iterative MapReduce runtime for Windows Azure by MSR)
- SQL Azure Labs: Microsoft codename "Data Explorer", Microsoft codename "Cloud Numerics", Microsoft codename "Social Analytics"

Other Big Data vendors:
- Cloudera
- Hortonworks (Microsoft is using the Hortonworks Hadoop distribution)
- Amazon's Elastic MapReduce
- EMC's Greenplum
- IBM's InfoSphere
- Oracle Big Data
- Google BigQuery
- Aster Data, Vertica, and others

Concluding comments:
- Big Data means different things in different contexts. Your task: what does Big Data mean to you?
- A growing number of business users realize the value of unstructured and external datasets. Your task: identify external and/or unstructured data that can be used to find insights for your organization.
- Traditional BI tools were not designed for Big Data analytics. Your task: play with some of the Big Data tools.

Thank you! Special thanks to Rushabh Mehta and Nigel Sammy.
Contact information:
- Email: Contact@ParasDoshi.com
- Blog: www.parasdoshi.com
- Twitter: @Paras_Doshi
- SolidQ: www.solidq.com
Slides can be downloaded from: <link>