BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW


1 BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

2 TOPICS COVERED 1. Fundamentals of Big Data Platforms 2. Major Big Data Tools

3 Scaling Up vs. Out
SCALE UP (SMP): Upgrade components or buy a bigger server each time. Multiprocessor system where processors share resources: operating system (OS), memory, and I/O devices connected using a common bus.
SCALE OUT (MPP): Add nodes to the cluster (+ n). Multiple processing nodes, each with its own OS and RAM, connected over a network.

4 Innovation Timeline Doug Cutting & Mike Cafarella started working on Nutch; Doug Cutting adds DFS & MapReduce support to Nutch; NY Times converts 4 TB of image archives over 100 EC2 instances; Fastest sort of a TB, 3.5 mins over 910 nodes; Fastest sort of a TB, 62 secs over 1,460 nodes; Sorted a PB in hours over 3,658 nodes; Google publishes GFS & MapReduce papers; Yahoo! hires Cutting, Hadoop spins out of Nutch; Founded; Doug Cutting joins Cloudera; Facebook launches Hive: SQL support for Hadoop; Hadoop Summit 2009, 750 attendees

5 THE FUNDAMENTALS OF HADOOP Hadoop evolved directly from commodity scientific supercomputing clusters developed in the 1990s. Hadoop consists of: MapReduce and the Hadoop Distributed File System (HDFS).

6 WHAT'S NEW

7 BASICS OF MPP One counter: 400 bills at 1 bill/sec = 400 seconds

8 BASICS OF MPP Two counters in parallel: 200 bills at 1 bill/sec = 200 seconds each. Total = 200 seconds

9 BASICS OF MPP Four counters in parallel: 100 bills at 1 bill/sec = 100 seconds each. Total = 100 seconds
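
The bill-counting arithmetic on the three MPP slides can be sketched in a few lines of Python. This is only an illustration under an idealized assumption: work divides evenly and there is no coordination or network overhead. The function name `parallel_seconds` and its parameters are hypothetical, not part of any product.

```python
import math


def parallel_seconds(total_bills: int, workers: int, bills_per_sec: float = 1.0) -> float:
    """Wall-clock seconds when bills are split as evenly as possible across workers.

    Assumes every worker counts at the same rate and that splitting the
    pile and combining the results costs nothing (an idealization).
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # The job finishes when the worker with the largest share finishes.
    largest_share = math.ceil(total_bills / workers)
    return largest_share / bills_per_sec


# The three cases from the slides: 1, 2 and 4 workers counting 400 bills.
for n in (1, 2, 4):
    print(n, "worker(s):", parallel_seconds(400, n), "seconds")
```

Real clusters rarely scale this cleanly, because data movement and uneven partitions add time that the sketch ignores.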

10 HDFS & MAPREDUCE The main node runs the Job Tracker (MapReduce) and the Name Node (HDFS), which controls the files. Each worker node (1 to N) runs two processes: a Task Tracker and a Data Node. Diagram labels: MapReduce: Job Tracker, Task Trackers. HDFS cluster: Name Node, Data Nodes.

11 BASICS OF MAPREDUCE The main node runs the Job Tracker and the Name Node, which controls the files. Each worker node runs two processes: a Task Tracker and a Data Node. Diagram: Query -> Name Node/Job Tracker -> Data Nodes/Task Trackers -> Result
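
The map, shuffle and reduce steps that the Job Tracker coordinates across Task Trackers can be imitated in a single Python process. The sketch below is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample text are made up for illustration, and each string in `chunks` stands in for a file block held by one Data Node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one block of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(chunks: List[str]) -> Dict[str, int]:
    """Run map over every chunk, then shuffle, then reduce each key."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string plays the role of a block stored on a different Data Node.
chunks = ["big data big cluster", "data node data node"]
counts = run_job(chunks)
print(counts)
```

In a real cluster the map calls run on the nodes that already hold the data, which is the main reason the Task Tracker and Data Node processes share a machine.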

12 EXECUTION UNITS MAPREDUCE

13 SOME DISTRIBUTIONS OF APACHE HADOOP Apache Foundation

14 Sandbox Hortonworks

15 MAPREDUCE PIG & HIVE
MAPREDUCE: Java. Write many lines of code.
PIG: Mostly used by Yahoo. Most used for data processing. Shares some constructs with SQL. Is more verbose. Needs a lot of training for users with limited procedural programming background. Offers control over the flow of data.
HIVE: Mostly used by Facebook for analytic purposes. Used for analytics. Relatively easier for developers with SQL experience. Less control over optimization of data flows compared to Pig.
PIG and HIVE: Not as efficient as MapReduce. Higher productivity for data scientists and developers.

16 THE EXPLOSION OF HADOOP

17 THE HISTORY OF SPARK Timeline labels: MapReduce; Top Level; Spark Paper; BSD Open Source; Apache

18 SPARK SHARED LIBRARIES

19 SPARK: THE UNIFIED PLATFORM FOR BIG DATA APIs for: Scala, Java, Python, R. Libraries: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph). Engine: Spark Core.

20 SPARK BENEFITS
Performance: Using in-memory computing, Spark is considerably faster than Hadoop MapReduce (100x in some tests). Can be used for batch and real-time data processing.
Unified Engine: Integrated framework includes higher-level libraries for interactive SQL queries, processing streaming data, machine learning and graph processing. A single application can combine all types of processing.
Developer Productivity: Easy-to-use APIs for processing large datasets. Includes 100+ operators for transforming data.
Ecosystem: Spark has built-in support for many data sources such as HDFS, RDBMS, S3, Apache Hive, Cassandra and MongoDB. Runs on top of the Apache Hadoop YARN resource manager.

21 ANALYTICS CORTANA

22 SQL Server Big Data Optimizations

23 SQL Server APS

24 SQL Server APS Growth Topology Scale Unit Base Unit Base Unit Extension

25 SQL Server Azure SQL DW

26 SQL Server Deployment options and hybrid solutions

27 SQL Server Connecting Islands of Data with PolyBase
Provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL.
Uses the power of MPP to enhance query execution performance.
Supports Windows Azure HDInsight to enable new hybrid cloud scenarios.
Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera.
Diagram labels: Select, Result set; SQL Server Parallel Data Warehouse, PolyBase; Microsoft Azure HDInsight, Microsoft HDInsight, Hortonworks for Windows and Linux, Cloudera.

28 USE CASE: SUPPLY CHAIN MANAGEMENT Use Case 3: Supply Chain Management

29 USE CASE: SMART GRID MANAGEMENT

30 USE CASES SMART GRID: PREDICTIVE MAINTENANCE, DEMAND FORECASTING, GRID OPTIMIZATION, THEFT PREVENTION, DEMAND RESPONSE, CUSTOMER PROFILING

31

32 USE CASES SMART GRID

33 USE CASE: REAL TIME TRAFFIC ANALYSIS

34 REAL TIME TRAFFIC ANALYSIS

35 USE CASES STREAM ANALYTICS: Real-time fraud detection; Connected cars; Click-stream analysis; Real-time financial portfolio alerts; Smart grid, energy management; CRM alerting sales to customer case; Data and identity protection services; Real-time sales tracking

36 ML PROBLEMS SOLVED BY AZURE ML: Classification, Regression, Recommenders, Anomaly Detection, Clustering

37 INDUSTRY USE CASES MACHINE LEARNING

38 THE MACHINE LEARNING WORKFLOW Input data -> Data transformation -> Define model -> Split data -> Train model -> Score (prediction) -> Evaluate model
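
The seven workflow steps can be walked through with a toy example in plain Python. This is a stand-in, not Azure ML: the data points are invented, the "model" is a one-variable least-squares line, and the names `predict` and `fit` are hypothetical helpers chosen only for this sketch.

```python
from statistics import mean

# 1. Input data: invented (x, y) pairs lying roughly on y = 2x + 1,
#    plus one record with a missing label.
raw = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0),
       (5, 11.1), (6, 12.8), (7, 15.1), (8, 17.0), (9, None)]

# 2. Data transformation: drop records whose label is missing.
clean = [(x, y) for x, y in raw if y is not None]


# 3. Define model: a straight line y = slope * x + intercept.
def predict(params, x):
    slope, intercept = params
    return slope * x + intercept


# 4. Split data: alternate rows between training and test sets.
train = clean[0::2]
test = clean[1::2]


# 5. Train model: closed-form least squares for one input variable.
def fit(rows):
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


params = fit(train)

# 6. Score (prediction): apply the trained model to the held-out rows.
predictions = [predict(params, x) for x, _ in test]

# 7. Evaluate model: mean absolute error on the test set.
mae = mean(abs(p - y) for p, (_, y) in zip(predictions, test))
print("slope, intercept:", params)
print("mean absolute error:", mae)
```

Keeping the test rows out of the training step is the point of the split: the evaluation then measures how the model handles data it has not seen.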

39 AZURE DATA FACTORY

40 HD INSIGHT

41 BIG DATA & Advanced Analytics Roadshow Questions? Orion Gebremedhin

Analytics Platform System

Analytics Platform System Analytics Platform System Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com Ofc 425-538-0044, Cell 303-324-2860 Sean Mikha, DW & Big Data Architect semikha@microsoft.com

More information

BIG DATA AND HADOOP DEVELOPER

BIG DATA AND HADOOP DEVELOPER BIG DATA AND HADOOP DEVELOPER Approximate Duration - 60 Hrs Classes + 30 hrs Lab work + 20 hrs Assessment = 110 Hrs + 50 hrs Project Total duration of course = 160 hrs Lesson 00 - Course Introduction 0.1

More information

ADVANCED ANALYTICS & IOT ARCHITECTURES

ADVANCED ANALYTICS & IOT ARCHITECTURES ADVANCED ANALYTICS & IOT ARCHITECTURES Presented by: Orion Gebremedhin Director of Technology, Data & Analytics Marc Lobree National Architect, Advanced Analytics EDW THE RIGHT TOOL FOR THE RIGHT WORKLOAD

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Duration: 5 days; Instructor-led Implement Spark Streaming Using the DStream API. Develop Big Data Real-Time Processing Solutions with Apache

More information

Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect

Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect 2005 Concert de Coldplay 2014 Concert de Coldplay 90% of the world s data has been created over the last two years alone 1 1. Source

More information

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Roger Ding Cloudera February 3rd, 2018 1 Agenda Hadoop History Introduction to Apache Hadoop

More information

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA.

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA. ABOUT THIS TRAINING: The world of Hadoop and Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. This comprehensive training has been designed

More information

Modernizing Your Data Warehouse with Azure

Modernizing Your Data Warehouse with Azure Modernizing Your Data Warehouse with Azure Big data. Small data. All data. Christian Coté S P O N S O R S The traditional BI Environment The traditional data warehouse data warehousing has reached the

More information

20775: Performing Data Engineering on Microsoft HD Insight

20775: Performing Data Engineering on Microsoft HD Insight Let s Reach For Excellence! TAN DUC INFORMATION TECHNOLOGY SCHOOL JSC Address: 103 Pasteur, Dist.1, HCMC Tel: 08 38245819; 38239761 Email: traincert@tdt-tanduc.com Website: www.tdt-tanduc.com; www.tanducits.com

More information

Big data is hard. Top 3 Challenges To Adopting Big Data

Big data is hard. Top 3 Challenges To Adopting Big Data Big data is hard Top 3 Challenges To Adopting Big Data Traditionally, analytics have been over pre-defined structures Data characteristics: Sales Questions answered with BI and visualizations: Customer

More information

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. Course Content Course Description: The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. At Course Completion: After competing this course,

More information

ARCHITECTURES ADVANCED ANALYTICS & IOT. Presented by: Orion Gebremedhin. Marc Lobree. Director of Technology, Data & Analytics

ARCHITECTURES ADVANCED ANALYTICS & IOT. Presented by: Orion Gebremedhin. Marc Lobree. Director of Technology, Data & Analytics ADVANCED ANALYTICS & IOT ARCHITECTURES Presented by: Orion Gebremedhin Director of Technology, Data & Analytics Marc Lobree National Architect, Advanced Analytics EDW THE RIGHT TOOL FOR THE RIGHT WORKLOAD

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Course Details Course Code: Duration: Notes: 20775A 5 days This course syllabus should be used to determine whether the course is appropriate

More information

Intro to Big Data and Hadoop

Intro to Big Data and Hadoop Intro to Big and Hadoop Portions copyright 2001 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC, USA. SAS Institute Inc. makes no warranties

More information

20775 Performing Data Engineering on Microsoft HD Insight

20775 Performing Data Engineering on Microsoft HD Insight Duración del curso: 5 Días Acerca de este curso The main purpose of the course is to give students the ability plan and implement big data workflows on HD. Perfil de público The primary audience for this

More information

5th Annual. Cloudera, Inc. All rights reserved.

5th Annual. Cloudera, Inc. All rights reserved. 5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software

More information

BIG DATA & ADVANCED ANALYTICS ROADSHOW

BIG DATA & ADVANCED ANALYTICS ROADSHOW BIG DATA & ADVANCED ANALYTICS ROADSHOW 2 Copyright 2014, Neudesic. All rights reserved. CO-SPONSORS UPCOMING ROADSHOW STOPS Los Angeles: Wednesday, February 10 th Orange County: Thursday, February 11 th

More information

Insights to HDInsight

Insights to HDInsight Insights to HDInsight Why Hadoop in the Cloud? No hardware costs Unlimited Scale Pay for What You Need Deployed in minutes Azure HDInsight Big Data made easy Enterprise Ready Easier and more productive

More information

HDInsight - Hadoop for the Commoner Matt Stenzel Data Platform Technical Specialist

HDInsight - Hadoop for the Commoner Matt Stenzel Data Platform Technical Specialist HDInsight - Hadoop for the Commoner 10-1-2016 Matt Stenzel Data Platform Technical Specialist SQL Saturday #557 Thank you Sponsors! Please visit the sponsors and enter their end-of-day raffles. Event After

More information

Spark and Hadoop Perfect Together

Spark and Hadoop Perfect Together Spark and Hadoop Perfect Together Arun Murthy Hortonworks Co-Founder @acmurthy Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users Data Operating System

More information

Databricks Cloud. A Primer

Databricks Cloud. A Primer Databricks Cloud A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to

More information

EXAMPLE SOLUTIONS Hadoop in Azure HBase as a columnar NoSQL transactional database running on Azure Blobs Storm as a streaming service for near real time processing Hadoop 2.4 support for 100x query gains

More information

Transforming Analytics with Cloudera Data Science WorkBench

Transforming Analytics with Cloudera Data Science WorkBench Transforming Analytics with Cloudera Data Science WorkBench Process data, develop and serve predictive models. 1 Age of Machine Learning Data volume NO Machine Learning Machine Learning 1950s 1960s 1970s

More information

AZURE HDINSIGHT. Azure Machine Learning Track Marek Chmel

AZURE HDINSIGHT. Azure Machine Learning Track Marek Chmel AZURE HDINSIGHT Azure Machine Learning Track Marek Chmel SESSION AGENDA Understanding different scenarios of Hadoop Building an end to end pipeline using HDInsight Using in-memory techniques to analyze

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Why Big Data Matters? Speaker: Paras Doshi

Why Big Data Matters? Speaker: Paras Doshi Why Big Data Matters? Speaker: Paras Doshi If you re wondering about what is Big Data and why does it matter to you and your organization, then come to this talk and get introduced to Big Data and learn

More information

Alexander Klein. ETL meets Azure

Alexander Klein. ETL meets Azure Alexander Klein ETL meets Azure Thanks to our sponsors: Who am I? Independent BI Consultant > 15 years experience of SQL Server Focus on Microsoft BI Stack & AI & Azure a.klein@consulting-bi.de @SQL_Alex

More information

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK Are you drowning in Big Data? Do you lack access to your data? Are you having a hard time managing Big Data processing requirements?

More information

Big Data Introduction

Big Data Introduction Big Data Introduction Who we are Experts At Your Service Over 50 specialists in IT infrastructure Certified, experienced, passionate Based In Switzerland 100% self-financed Swiss company Over CHF8 mio.

More information

Berkeley Data Analytics Stack (BDAS) Overview

Berkeley Data Analytics Stack (BDAS) Overview Berkeley Analytics Stack (BDAS) Overview Ion Stoica UC Berkeley UC BERKELEY What is Big used For? Reports, e.g., - Track business processes, transactions Diagnosis, e.g., - Why is user engagement dropping?

More information

IBM Analytics Unleash the power of data with Apache Spark

IBM Analytics Unleash the power of data with Apache Spark IBM Analytics Unleash the power of data with Apache Spark Agility, speed and simplicity define the analytics operating system of the future 1 2 3 4 Use Spark to create value from data-driven insights Lower

More information

Apache Spark and R A (big data) love story?

Apache Spark and R A (big data) love story? Apache Spark and R A (big data) love story? Mark Sellors - Technical Architect @ Mango Solutions About me. Technical Architect Design and deploy analytic computing environments Not really an R user but

More information

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved. Apache Spark 2.0 GA The General Engine for Modern Analytic Use Cases 1 Apache Spark Drives Business Innovation Apache Spark is driving new business value that is being harnessed by technology forward organizations.

More information

Bringing the Power of SAS to Hadoop Title

Bringing the Power of SAS to Hadoop Title WHITE PAPER Bringing the Power of SAS to Hadoop Title Combine SAS World-Class Analytics With Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities ii Contents Introduction... 1 What

More information

Business is being transformed by three trends

Business is being transformed by three trends Business is being transformed by three trends Big Cloud Intelligence Stay ahead of the curve with Cortana Intelligence Suite Business apps People Custom apps Apps Sensors and devices Cortana Intelligence

More information

How In-Memory Computing can Maximize the Performance of Modern Payments

How In-Memory Computing can Maximize the Performance of Modern Payments How In-Memory Computing can Maximize the Performance of Modern Payments 2018 The mobile payments market is expected to grow to over a trillion dollars by 2019 How can in-memory computing maximize the performance

More information

Azure Data Analytics & Machine Learning Seminar. Daire Cunningham: BI Practice Area Manager

Azure Data Analytics & Machine Learning Seminar. Daire Cunningham: BI Practice Area Manager Azure Data Analytics & Machine Learning Seminar Daire Cunningham: BI Practice Area Manager AGENDA 09:00 AM 09:30 AM Registration & Refreshments 09.30AM 10:00 AM 10:00 AM 10:30 AM Welcome & Keynote, Ger

More information

Big Data Application Engineer/ Developer. Specialization in Apache Spark, Kafka, Airflow, HBase

Big Data Application Engineer/ Developer. Specialization in Apache Spark, Kafka, Airflow, HBase BIG DATA COURSE Big Data Application Engineer/ Developer Specialization in Apache Spark, Kafka, Airflow, HBase In Exclusive Association with 21,347+ Participants 10,000+ Brands 1200+ Trainings 45+ Countries

More information

INTRODUCTION TO R FOR DATA SCIENCE WITH R FOR DATA SCIENCE DATA SCIENCE ESSENTIALS INTRODUCTION TO PYTHON FOR DATA SCIENCE. Azure Machine Learning

INTRODUCTION TO R FOR DATA SCIENCE WITH R FOR DATA SCIENCE DATA SCIENCE ESSENTIALS INTRODUCTION TO PYTHON FOR DATA SCIENCE. Azure Machine Learning Data Science Track WITH EXCEL INTRODUCTION TO R FOR DATA SCIENCE PROGRAMMING WITH R FOR DATA SCIENCE APPLIED MACHINE LEARNING SCENARIOS HDInsight Certificate of DATA SCIENCE ORIENTATION QUERYING DATA WITH

More information

Microsoft Big Data. Solution Brief

Microsoft Big Data. Solution Brief Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,

More information

Cloudera, Inc. All rights reserved.

Cloudera, Inc. All rights reserved. 1 Data Analytics 2018 CDSW Teamplay und Governance in der Data Science Entwicklung Thomas Friebel Partner Sales Engineer tfriebel@cloudera.com 2 We believe data can make what is impossible today, possible

More information

Session 30 Powerful Ways to Use Hadoop in your Healthcare Big Data Strategy

Session 30 Powerful Ways to Use Hadoop in your Healthcare Big Data Strategy Session 30 Powerful Ways to Use Hadoop in your Healthcare Big Data Strategy Bryan Hinton Senior Vice President, Platform Engineering Health Catalyst Sean Stohl Senior Vice President, Product Development

More information

Jason Virtue Business Intelligence Technical Professional

Jason Virtue Business Intelligence Technical Professional Jason Virtue Business Intelligence Technical Professional jvirtue@microsoft.com Agenda Microsoft Azure Data Services Azure Cloud Services Azure Machine Learning Azure Service Bus Azure Stream Analytics

More information

MapR: Solution for Customer Production Success

MapR: Solution for Customer Production Success 2015 MapR Technologies 2015 MapR Technologies 1 MapR: Solution for Customer Production Success Big Data High Growth 700+ Customers Cloud Leaders Riding the Wave with Hadoop The Big Data Platform of Choice

More information

Preface About the Book

Preface About the Book Preface About the Book We are living in the dawn of what has been termed as the "Fourth Industrial Revolution" by the World Economic Forum (WEF) in 2016. The Fourth Industrial Revolution is marked through

More information

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics Nov.

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics   Nov. Outline of Hadoop Background, Core Services, and Components David Schwab Synchronic Analytics https://synchronicanalytics.com Nov. 1, 2018 Hadoop s Purpose and Origin Hadoop s Architecture Minimum Configuration

More information

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

E-guide Hadoop Big Data Platforms Buyer s Guide part 1 Hadoop Big Data Platforms Buyer s Guide part 1 Your expert guide to Hadoop big data platforms for managing big data David Loshin, Knowledge Integrity Inc. Companies of all sizes can use Hadoop, as vendors

More information

Common Customer Use Cases in FSI

Common Customer Use Cases in FSI Common Customer Use Cases in FSI 1 Marketing Optimization 2014 2014 MapR MapR Technologies Technologies 2 Fortune 100 Financial Services Company 104M CARD MEMBERS 3 Financial Services: Recommendation Engine

More information

SAP Predictive Analytics Suite

SAP Predictive Analytics Suite SAP Predictive Analytics Suite Tania Pérez Asensio Where is the Evolution of Business Analytics Heading? Organizations Are Maturing Their Approaches to Solving Business Problems Reactive Wait until a problem

More information

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica Accelerating Your Big Data Analytics Jeff Healey, Director Product Marketing, HPE Vertica Recent Waves of Disruption IT Infrastructu re for Analytics Data Warehouse Modernization Big Data/ Hadoop Cloud

More information

Analytics in Action transforming the way we use and consume information

Analytics in Action transforming the way we use and consume information Analytics in Action transforming the way we use and consume information Big Data Ecosystem The Data Traditional Data BIG DATA Repositories MPP Appliances Internet Hadoop Data Streaming Big Data Ecosystem

More information

Cloud Based Analytics for SAP

Cloud Based Analytics for SAP Cloud Based Analytics for SAP Gary Patterson, Global Lead for Big Data About Virtustream A Dell Technologies Business 2,300+ employees 20+ data centers Major operations in 10 countries One of the fastest

More information

Microsoft Developer Day

Microsoft Developer Day Microsoft Developer Day Dr Graham Williams Microsoft Developer Day Director of Data Science, Pacific Asia, Data Group, Cloud and Enterprise Data Scientists Transform Data into Information Data Scientists

More information

Evolution or Revolution: Top Ten Development Trends

Evolution or Revolution: Top Ten Development Trends Evolution or Revolution: Top Ten Development Trends Jim Lundy CEO and Lead Analyst IT Development Trends: Building a Fighter Jet Agenda What are the Top Ten Trends in Development? What are the Best Practices

More information

Investor Presentation. Second Quarter 2016

Investor Presentation. Second Quarter 2016 Investor Presentation Second Quarter 2016 Note to Investors Certain non-gaap financial information regarding operating results may be discussed during this presentation. Reconciliations of the differences

More information

Big Data & Hadoop Advance

Big Data & Hadoop Advance Course Durations: 30 Hours About Company: Course Mode: Online/Offline EduNextgen extended arm of Product Innovation Academy is a growing entity in education and career transformation, specializing in today

More information

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11 Top 5 Challenges for Hadoop MapReduce in the Enterprise Whitepaper - May 2011 http://platform.com/mapreduce 2 5/9/11 Table of Contents Introduction... 2 Current Market Conditions and Drivers. Customer

More information

red red red red red red red red red red red red red red red red red red red red CYS Rithu P Ravi CYS Saumya K

red red red red red red red red red red red red red red red red red red red red CYS Rithu P Ravi CYS Saumya K red red red red red red red red red red red red red red red red red red red red CYS14011 - Rithu P Ravi CYS14012 - Saumya K Why and What HADOOP?... Apache Hadoop is an open-source software framework A

More information

Data Lake Organization A Hadoop Eco-System. Jan Cordtz, Microsoft Denmark Cloud Solution Architect

Data Lake Organization A Hadoop Eco-System. Jan Cordtz, Microsoft Denmark Cloud Solution Architect Data Lake Organization A Hadoop Eco-System Jan Cordtz, Microsoft Denmark jcordtz@microsoft.com Cloud Solution Architect Hyper scale Infrastructure 100+ Datacenters across 42 Regions Worldwide Learn more:

More information

HP SummerSchool TechTalks Kenneth Donau Presale Technical Consulting, HP SW

HP SummerSchool TechTalks Kenneth Donau Presale Technical Consulting, HP SW HP SummerSchool TechTalks 2013 Kenneth Donau Presale Technical Consulting, HP SW Copyright Copyright 2013 2013 Hewlett-Packard Development Development Company, Company, L.P. The L.P. information The information

More information

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Contents Introduction...3 Hadoop s humble beginnings...4 The benefits of Hadoop...5

More information

New Big Data Solutions and Opportunities for DB Workloads

New Big Data Solutions and Opportunities for DB Workloads New Big Data Solutions and Opportunities for DB Workloads Hadoop and Spark Ecosystem for Data Analytics, Experience and Outlook Luca Canali, IT-DB Hadoop and Spark Service WLCG, GDB meeting CERN, September

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 1, 2017 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2457

More information

Azure ML Data Camp. Ivan Kosyakov MTC Architect, Ph.D. Microsoft Technology Centers Microsoft Technology Centers. Experience the Microsoft Cloud

Azure ML Data Camp. Ivan Kosyakov MTC Architect, Ph.D. Microsoft Technology Centers Microsoft Technology Centers. Experience the Microsoft Cloud Microsoft Technology Centers Microsoft Technology Centers Experience the Microsoft Cloud Experience the Microsoft Cloud ML Data Camp Ivan Kosyakov MTC Architect, Ph.D. Top Manager IT Analyst Big Data Strategic

More information

A World of Data. Raghu Ramakrishnan. CTO for Data, Technical Fellow Microsoft

A World of Data. Raghu Ramakrishnan. CTO for Data, Technical Fellow Microsoft A World of Data Raghu Ramakrishnan CTO for Data, Technical Fellow Microsoft Content Optimization Agrawal et al., CACM 56(6):92-101 (2013) Content Recommendation on Web Portals Key Features Package Ranker

More information

Enterprise Database Systems for Big Data, Big Data Processing, and Big Data Analytics. Sunnie Chung Cleveland State University 1

Enterprise Database Systems for Big Data, Big Data Processing, and Big Data Analytics. Sunnie Chung Cleveland State University 1 Enterprise Database Systems for Big Data, Big Data Processing, and Big Data Analytics Sunnie Chung Cleveland State University 1 Big Data: 3V s Sunnie Chung Cleveland State University 2 Volume (Scale) Data

More information

Realising Value from Data

Realising Value from Data Realising Value from Data Togetherwith Open Source Drives Innovation & Adoption in Big Data BCS Open Source SIG London 1 May 2013 Timings 6:00-6:30pm. Register / Refreshments 6:30-8:00pm, Presentation

More information

Pre-Requisites A good understanding of Azure data services A basic knowledge of the Microsoft Windows operating system and its core functionality

Pre-Requisites A good understanding of Azure data services A basic knowledge of the Microsoft Windows operating system and its core functionality [MS20776]: Performing Big Data Engineering on Microsoft Cloud Services Length : 5 days Audience(s) : Data Professionals Level : 300 Technology : SQL Server Delivery Method : Instructor-led (Classroom)

More information

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer Redefine Big Data: EMC Data Lake in Action Andrea Prosperi Systems Engineer 1 Agenda Data Analytics Today Big data Hadoop & HDFS Different types of analytics Data lakes EMC Solutions for Data Lakes 2 The

More information

Big Data with Azure: where to begin?

Big Data with Azure: where to begin? Big Data with Azure: where to begin? Concepts and best practices October 15 th 2016 Sofia Satya SK Jayanty Principal Architect & Managing Consultant consulting@dbia.uk Sponsors Gold sponsors: Silver sponsors:

More information

Nouvelle Génération de l infrastructure Data Warehouse et d Analyses

Nouvelle Génération de l infrastructure Data Warehouse et d Analyses Nouvelle Génération de l infrastructure Data Warehouse et d Analyses November 2011 André Münger andre.muenger@emc.com +41 79 708 85 99 1 Agenda BIG Data Challenges Greenplum Overview Use Cases Summary

More information

Introduction to Stream Processing

Introduction to Stream Processing Introduction to Processing Guido Schmutz DOAG Big Data 2018 20.9.2018 @gschmutz BASEL BERN BRUGG DÜSSELDORF HAMBURG KOPENHAGEN LAUSANNE guidoschmutz.wordpress.com FRANKFURT A.M. FREIBURG I.BR. GENF MÜNCHEN

More information

By: Shrikant Gawande (Cloudera Certified )

By: Shrikant Gawande (Cloudera Certified ) By: Shrikant Gawande (Cloudera Certified ) What is Big Data? For every 30 mins, a airline jet collects 10 terabytes of sensor data (flying time) NYSE generates about one terabyte of new trade data per

More information

Designing Business Intelligence Solutions with Microsoft SQL Server 2014

Designing Business Intelligence Solutions with Microsoft SQL Server 2014 Designing Business Intelligence Solutions with Microsoft SQL Server 2014 20467D; 5 Days, Instructor-led Course Description This five-day instructor-led course teaches students how to implement self-service

More information

Microsoft Azure Essentials

Microsoft Azure Essentials Microsoft Azure Essentials Azure Essentials Track Summary Data Analytics Explore the Data Analytics services in Azure to help you analyze both structured and unstructured data. Azure can help with large,

More information
