Hadoop Roadmap 2012 A Hortonworks perspective

Size: px
Start display at page:

Download "Hadoop Roadmap 2012 A Hortonworks perspective"

Transcription

1 Hadoop Roadmap 2012 A Hortonworks perspective Eric Baldeschwieler CTO February 2012 Page 1

2 About Eric Baldeschwieler Co-Founder and CTO of Hortonworks Prior to Hortonworks: VP Hadoop Software Engineering for Yahoo! Technology leader for Inktomi s web service engine (acquired by Yahoo! in 2003) Developed software for video games, video post production systems and 3D modeling systems Master s degree in Computer Science University of California, Berkeley Bachelor s degree in Mathematics and Computer Science Carnegie Mellon Universityeric14. Follow me on Architecting the Future of Big Data Page 2

3 Agenda Hortonworks Data Platform Community roadmap process Hortonworks development process & support model 2012 roadmap highlights Observed Trends and Anticipated Investment Areas Architecting the Future of Big Data Page 3

4 Apache Hadoop & Hortonworks Yahoo! embraced Apache Hadoop, an open source platform, to crunch epic amounts of data using an army of dirt-cheap servers 2006 Hadoop at Yahoo! 40K+ Servers 170PB Storage 5M+ Monthly Jobs Active Users 2011 Yahoo! spun off 22+ engineers into Hortonworks, a company focused on enabling Apache Hadoop to be next-generation data platform Page 4

5 Balancing Innovation & Stability Apache: Be aggressive - ship early and often Projects need to keep innovating and visibly improve Aim for big improvements Make early buggy releases Hortonworks: Be predictable - ship when stable We need to ship stable, working releases Make packaged binary releases available We need to do regular sustaining engineering releases HDP quarterly release trains sweep in stable Apache projects Enables HDP to stay reasonably current and predictable while minimizing risk of thrashing that coordinating large # of Apache projects can cause Page 5

6 The Hortonworks Development Process Apache Development process Hortonworks Development. The Hortonworks Data Platform is 100% open source We plan our engineering in the open We take feedback and use it to refine our plans We develop enhancements to Hadoop in collaboration with the Apache community, via the Apache processes & infrastructure Others contribute their own improvements and refine our work We iterate and we decide as a community when a release is ready Architecting the Future of Big Data Page 6

7 This presentation is part of that process Apache Hadoop and related projects are owned by the Apache foundation and worked on by a host of volunteers This presentation outlines what Hortonworks engineers currently plan to contribute to Apache projects We share our plans regularly with Apache contributors & the wider Apache Hadoop user community our business partners & customers We listen and we refine the plan Others share their plans & help us refine our thinking Our partners express their needs & requirements The plan evolves Architecting the Future of Big Data Page 7

8 Hortonworks Data Platform (HDP) Fully Supported Integrated Platform Challenge: Integrate, manage, and support changes across a wide range of open source projects that power the Hadoop platform; each with their own release schedules, versions, & dependencies. Time intensive, Complex, Expensive Hadoop Core Pig Zookeeper Hive HCatalog HBase Solution: Hortonworks Data Platform Integrated, certified platform distributions 100% Open Source Extensive Q/A process Industry-leading Support with clear service levels for updates and patches Continuity via multi-year Support and Maintenance Policy = New Version Page 8

9 Support & Distribution Model Hortonworks Data Platform Fully supported, integrated, tested, maintained 100% Apache license, or compatible: BSD, MIT/X11, NCSA, W3C Software license, X.Net Universe: Open Source Ecosystem Validated & interoperable with HDP Technical guidance support; work with OSS projects 100% OSI-compliant licenses Optionally installed Multiverse: Commercial Ecosystem Validated & interoperable with HDP Technical guidance support; work with TSANet 3 rd -party vendor licenses and support options Optionally installed Model and terminology conceptually similar to Ubuntu s model: Page 9

10 Hortonworks Data Platform (HDP) Key Components of Standard Hadoop Open Source Stack Hortonworks Data Platform Universe Zookeeper (Cluster Coordination) HBase (Columnar NoSQL Store) Pig (Data Flow) Hive (SQL) MapReduce (Distributed Programing Framework) HCatalog (Table & Schema Management) Ambari & Ganglia and Nagios Oozie Workflow scheduling Sqoop & Other Ingest, ETL tools HDFS (Hadoop Distributed File System) Mahout & Other libraries Page 10

11 Hadoop Now, Next, and Beyond Apache community, including Hortonworks investing to improve Hadoop: Make Hadoop an Open, Extensible, and Enterprise Viable Platform Enable More Applications to Run on Apache Hadoop Hadoop.Now (Hadoop 1.0) HDP1 Most stable Hadoop ever HBase, security, WebHDFS Hadoop.Next (Hadoop 0.23) HDP2 HA, Next-gen MapReduce Extension & Integration APIs HCatalog data APIs Hadoop.Beyond Future investments Page 11

12 Hortonworks Data Platform Timeline Q1 Q2 Q3 Q4 1.0 preview 1.0 preview 1.0 preview preview Hortonworks Data Platform preview Hortonworks Data Platform 2 36 Month support policy, from GA date Page 12

13 Hortonworks Data Platform 1 Consumable Hadoop.Now Platform Based on Hadoop 1.0 (a.k.a ) The most stable release of Hadoop, ever! Differentiators: Code straight from Apache Apache release process restarted after hiatus since 2010! First Apache line supporting Security, HBase, WebHDFS and many stability fixes Common table and schema management via HCatalog (M/R, Pig and Hive) Capacity scheduler (very stable, high RAM job support, multi-tenant protections) Components: Standard Hadoop stack: HDFS+M/R, Hive, Pig, HBase, HCatalog, ZooKeeper Universe items: Sqoop, Oozie, Mahout, Packaging (.tar, RPM, DEB, single-box VM, single-box AMI, ) Monitoring via Nagios and Ganglia Page 13

14 Continuing investment in 1.0 line Quarterly updates will include bug fixes and enhancements to all components, from Apache Hortonworks plans to invest substantially in: Ambari and other management components The Hadoop community has traditionally not provided good open source tooling in this area, we are committed to closing this gap HCatalog When fully realized, we think HCatalog will have a huge impact Vastly simplifying the management of data on Hadoop Architecting the Future of Big Data Page 14

15 Management in HDP1.0 and beyond Now: Focus on Monitoring capability (most often used) Package Nagios & Ganglia The most common Open Source Hadoop Monitoring Tools Web dashboard for unified Hadoop-specific view System Health: NetSNMP Hadoop Metrics: Ganglia plug-in, JMX, SNMP MIB (to add) Alerting: Nagios Collect & Display: Nagios, Ganglia, RRD Next: Deployment and Management Provisioning & Configuration with Puppet (Chef + others to follow) VM and cloud packaging LDAP, Active directory integration User metering and management API/GUI Page 15

16 HCatalog 0.3 and beyond Common Table, Schema, Metadata Management In HDP 1.0 Enable Hive, Pig, and MR to use the same tables / metadata Generalizes Hive s Table/Metadata System Pig/Hive/MR use same HCatalog file storage methods Pig/Hive/MR use common code for their IO Manages Data Format and Schema Changes Allows columns to be appended to tables in new partitions Allows storage format changes CLI to create table w DDL, change default format, add columns, change data location, compact data, register data as a table Templeton APIs layer on top Longer-term: A simple uniform API that supports a rich set of data management tools transparent migration of data between systems & formats document models and other non-relational data gracefully Easy lookup, addition & modification of objects / records Page 16

17 Related Hortonworks Webinars HCatalog, Table Management for Hadoop Wednesday, February 22 10:00am Pacific/1:00pm Eastern Other Topics Coming Soon Importing Data Into Hadoop Monitoring and Managing Hadoop Clusters HBase and Hive Architecting the Future of Big Data Page 17

18 Hortonworks Data Platform 2 Consumable Hadoop.Next Platform Based on Hadoop 0.23.* Next generation of Hadoop First release of a set of features under development since 2010 Highlights: Next Generation MapReduce architecture Refactor to provide enhanced scalability and performance Decouple MapReduce from resource management architecture Enables MapReduce to evolve quickly Enables new application types (streaming, graph, MPI, bulk sync, etc) HDFS Federation Improved scalability and isolation Extension API that will allow new storage services to share HDFS storage HDFS NameNode High Availability Automatic failover Multiple-options for failover (shared disk, Linux HA, Zookeeper) Include latest stable standard Hadoop components Page 18

19 Related Hortonworks Webinars What s In Store For Hadoop.Next Hadoop HDFS High Availability: HA NameNode Other Topics Coming Soon Extending Hadoop beyond Map-Reduce Wednesday March 7th HDFS Federation Architecting the Future of Big Data Page 19

20 Observed Trends, Anticipated Investments Page 20

21 Trend: Agile Data, Hadoop as data hub The old way Operational systems keep only current records, short history Analytics systems keep only conformed / cleaned / digested data Unstructured data locked away in operational silos Archives offline Inflexible, new questions require system redesigns The new trend Keep all data in Hadoop for a long time (raw inputs & processed) Able to produce a new analytics view on-demand Keep a new copy of data that was previously only in silos Can immediately do new reports, experiments in Hadoop New products / services can be added very quickly Agile outcome justifies new infrastructure Page 21

22 Traditional Enterprise Data Architecture Data Silos Serving Applications Traditional Data Warehouses, BI & Analytics Web Serving NoSQL RDMS EDW Data Marts BI / Analytics Serving Logs Social Media Sensor Data Text Systems Unstructured Systems Page 22

23 Agile Data Architecture w/hadoop Connecting All of Your Big Data Serving Applications Traditional Data Warehouses, BI & Analytics Web Serving NoSQL RDMS EDW Data Marts BI / Analytics ETL, Archive all data, Custom Analytics Serving Logs Social Media Sensor Data Text Systems Unstructured Systems Page 23

24 Hadoop as data hub implies ETL integration / data ingest Hadoop should work well with industry standard tools [Importing data webinar] Offsite backups / DR HDFS Snapshots, cloud backup, other tools Object / event level storage APIs Open Data APIs Non-relational model data HCatalog (for all 3 above) [HCatalog webinar] Efficient / low cost storage Compression, Raid / reed solomon No storage limits No file limits, scale beyond 10,000 computers / cluster Architecting the Future of Big Data Page 24

25 Trend: Data-centric Applications Limited runtime logic driven by huge lookup tables Data flows from application into Hadoop for processing Flows from Hadoop back into application for serving Complex computations in Hadoop Machine learning, other expensive computation offline Personalization, classification, fraud, value analysis Application development requires data science Huge amounts of actually observed data key to modern services Hadoop used as the science platform Architecting the Future of Big Data Page 25

26 CASE STUDY YAHOO! HOMEPAGE Serving Maps Users Interests Five Minute Produc on Weekly Categoriza on models USER BEHAVIOR SERVING MAPS (every 5 minutes) SCIENCE HADOOP CLUSTER PRODUCTION HADOOP CLUSTER USER BEHAVIOR» Machine learning to build ever better categorization models CATEGORIZATION MODELS (weekly)» Identify user interests using Categorization models SERVING SYSTEMS ENGAGED USERS Build customized home pages with latest data (thousands / second) Copyright Yahoo

27 Data-centric Applications imply New programming frameworks [YARN webinar] In RAM models (MPI, Spark, Giraph ) Services in Hadoop (HBase, Stream processing, Serving ) Huge Lookup tables HDFS enhancements for HBase HBase improvements, integrations with other distributed stores Predictable latency Continuous availability Core investments [HA webinar] Data Driven applications in Hadoop Investments in Oozie, integration with HCatalog data model 27

28 Trend: Hadoop goes to the Enterprise Hadoop used in production processes Hadoop used in a multi-tenant environments Hadoop used in audited environments Hadoop integrated into many more environments Forwards and backwards compatibility a must Architecting the Future of Big Data Page 28

29 Hadoop goes to the Enterprise implies Authentication, Authorization & Security Encryption, Active directory integration simplified security deployment Open administration Open provisioning Open monitoring [Monitoring and managing webinar] Cloud support Amazon and Azure support, Whirr / JCloud VM tuning and deployment support Engineered for compatibility New protocol buffer based protocols Architecting the Future of Big Data Page 29

30 Hortonworks Focus and Value Hortonworks is focused on: Technology Leadership within Apache Hadoop Community Customer and Partner Enablement around Hadoop Platform What value do we provide? 100% Open Source Hortonworks Data Platform Expert Role-based Training Full Lifecycle Support and Services Next Webinar: HCatalog, Table Management for Hadoop Wednesday, February 22 10:00am Pacific/1:00pm Eastern Page 30

31 Thank You! Questions? Serving EDW, BI, ++ Serving Logs Social Media Sensor Data Text Systems Page 31

Simplifying the Process of Uploading and Extracting Data from Apache Hadoop

Simplifying the Process of Uploading and Extracting Data from Apache Hadoop Simplifying the Process of Uploading and Extracting Data from Apache Hadoop Rohit Bakhshi, Solution Architect, Hortonworks Jim Walker, Director Product Marketing, Talend Page 1 About Us Rohit Bakhshi Solution

More information

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS) ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS) Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how Dell EMC Elastic Cloud Storage (ECS ) can be used to streamline

More information

Hadoop Course Content

Hadoop Course Content Hadoop Course Content Hadoop Course Content Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation Use case walkthrough ETL Log Analytics Real Time Analytics Hbase for Developers

More information

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

E-guide Hadoop Big Data Platforms Buyer s Guide part 1 Hadoop Big Data Platforms Buyer s Guide part 1 Your expert guide to Hadoop big data platforms for managing big data David Loshin, Knowledge Integrity Inc. Companies of all sizes can use Hadoop, as vendors

More information

AZURE HDINSIGHT. Azure Machine Learning Track Marek Chmel

AZURE HDINSIGHT. Azure Machine Learning Track Marek Chmel AZURE HDINSIGHT Azure Machine Learning Track Marek Chmel SESSION AGENDA Understanding different scenarios of Hadoop Building an end to end pipeline using HDInsight Using in-memory techniques to analyze

More information

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Roger Ding Cloudera February 3rd, 2018 1 Agenda Hadoop History Introduction to Apache Hadoop

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform An open-architecture platform to manage data in motion and at rest Highlights Addresses a range of data-at-rest use cases Powers real-time customer applications Delivers robust

More information

Hortonworks Connected Data Platforms

Hortonworks Connected Data Platforms Hortonworks Connected Data Platforms MASTER THE VALUE OF DATA EVERY BUSINESS IS A DATA BUSINESS EMBRACE AN OPEN APPROACH 2 Hortonworks Inc. 2011 2016. All Rights Reserved Data Drives the Connected Car

More information

MapR: Solution for Customer Production Success

MapR: Solution for Customer Production Success 2015 MapR Technologies 2015 MapR Technologies 1 MapR: Solution for Customer Production Success Big Data High Growth 700+ Customers Cloud Leaders Riding the Wave with Hadoop The Big Data Platform of Choice

More information

5th Annual. Cloudera, Inc. All rights reserved.

5th Annual. Cloudera, Inc. All rights reserved. 5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software

More information

Intro to Big Data and Hadoop

Intro to Big Data and Hadoop Intro to Big and Hadoop Portions copyright 2001 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC, USA. SAS Institute Inc. makes no warranties

More information

Big data is hard. Top 3 Challenges To Adopting Big Data

Big data is hard. Top 3 Challenges To Adopting Big Data Big data is hard Top 3 Challenges To Adopting Big Data Traditionally, analytics have been over pre-defined structures Data characteristics: Sales Questions answered with BI and visualizations: Customer

More information

Cask Data Application Platform (CDAP)

Cask Data Application Platform (CDAP) Cask Data Application Platform (CDAP) CDAP is an open source, Apache 2.0 licensed, distributed, application framework for delivering Hadoop solutions. It integrates and abstracts the underlying Hadoop

More information

BIG DATA AND HADOOP DEVELOPER

BIG DATA AND HADOOP DEVELOPER BIG DATA AND HADOOP DEVELOPER Approximate Duration - 60 Hrs Classes + 30 hrs Lab work + 20 hrs Assessment = 110 Hrs + 50 hrs Project Total duration of course = 160 hrs Lesson 00 - Course Introduction 0.1

More information

EXAMPLE SOLUTIONS Hadoop in Azure HBase as a columnar NoSQL transactional database running on Azure Blobs Storm as a streaming service for near real time processing Hadoop 2.4 support for 100x query gains

More information

Apache Hadoop in the Datacenter and Cloud

Apache Hadoop in the Datacenter and Cloud Apache Hadoop in the Datacenter and Cloud The Shift to the Connected Data Architecture Digital Transformation fueled by Big Data Analytics and IoT ACTIONABLE INTELLIGENCE Cloud and Data Center IDMS Relational

More information

Modernizing Your Data Warehouse with Azure

Modernizing Your Data Warehouse with Azure Modernizing Your Data Warehouse with Azure Big data. Small data. All data. Christian Coté S P O N S O R S The traditional BI Environment The traditional data warehouse data warehousing has reached the

More information

Analytics Platform System

Analytics Platform System Analytics Platform System Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com Ofc 425-538-0044, Cell 303-324-2860 Sean Mikha, DW & Big Data Architect semikha@microsoft.com

More information

Insights to HDInsight

Insights to HDInsight Insights to HDInsight Why Hadoop in the Cloud? No hardware costs Unlimited Scale Pay for What You Need Deployed in minutes Azure HDInsight Big Data made easy Enterprise Ready Easier and more productive

More information

THE CIO GUIDE TO BIG DATA ARCHIVING. How to pick the right product?

THE CIO GUIDE TO BIG DATA ARCHIVING. How to pick the right product? THE CIO GUIDE TO BIG DATA ARCHIVING How to pick the right product? The landscape of enterprise data is changing with the advent of enterprise social data, IoT, logs and click-streams. The data is too big,

More information

Sr. Sergio Rodríguez de Guzmán CTO PUE

Sr. Sergio Rodríguez de Guzmán CTO PUE PRODUCT LATEST NEWS Sr. Sergio Rodríguez de Guzmán CTO PUE www.pue.es Hadoop & Why Cloudera Sergio Rodríguez Systems Engineer sergio@pue.es 3 Industry-Leading Consulting and Training PUE is the first Spanish

More information

Architecture Optimization for the new Data Warehouse. Cloudera, Inc. All rights reserved.

Architecture Optimization for the new Data Warehouse. Cloudera, Inc. All rights reserved. Architecture Optimization for the new Data Warehouse Guido Oswald - @GuidoOswald 1 Use Cases This image cannot currently be displayed. This image cannot currently be displayed. This image cannot currently

More information

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud Datametica The Modern Data Platform Enterprise Data Hub Implementations Why is workload moving to Cloud 1 What we used do Enterprise Data Hub & Analytics What is Changing Why it is Changing Enterprise

More information

Taking Advantage of Cloud Elasticity and Flexibility

Taking Advantage of Cloud Elasticity and Flexibility Taking Advantage of Cloud Elasticity and Flexibility Fred Koopmans Sr. Director of Product Management 1 Public cloud adoption is surging 2 Cloudera customers are leading the way 3 Hadoop was born for the

More information

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics Nov.

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics   Nov. Outline of Hadoop Background, Core Services, and Components David Schwab Synchronic Analytics https://synchronicanalytics.com Nov. 1, 2018 Hadoop s Purpose and Origin Hadoop s Architecture Minimum Configuration

More information

E-guide Hadoop Big Data Platforms Buyer s Guide part 3

E-guide Hadoop Big Data Platforms Buyer s Guide part 3 Big Data Platforms Buyer s Guide part 3 Your expert guide to big platforms enterprise MapReduce cloud-based Abie Reifer, DecisionWorx The Amazon Elastic MapReduce Web service offers a managed framework

More information

Hadoop Integration Deep Dive

Hadoop Integration Deep Dive Hadoop Integration Deep Dive Piyush Chaudhary Spectrum Scale BD&A Architect 1 Agenda Analytics Market overview Spectrum Scale Analytics strategy Spectrum Scale Hadoop Integration A tale of two connectors

More information

Microsoft Big Data. Solution Brief

Microsoft Big Data. Solution Brief Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,

More information

Microsoft Azure Essentials

Microsoft Azure Essentials Microsoft Azure Essentials Azure Essentials Track Summary Data Analytics Explore the Data Analytics services in Azure to help you analyze both structured and unstructured data. Azure can help with large,

More information

20775 Performing Data Engineering on Microsoft HD Insight

20775 Performing Data Engineering on Microsoft HD Insight Duración del curso: 5 Días Acerca de este curso The main purpose of the course is to give students the ability plan and implement big data workflows on HD. Perfil de público The primary audience for this

More information

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia MapR: Converged Data Pla3orm and Quick Start Solu;ons Robin Fong Regional Director South East Asia Who is MapR? MapR is the creator of the top ranked Hadoop NoSQL SQL-on-Hadoop Real Database time streaming

More information

Architecting an Open Data Lake for the Enterprise

Architecting an Open Data Lake for the Enterprise Architecting an Open Data Lake for the Enterprise 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Today s Presenters Daniel Geske, Solutions Architect, Amazon Web Services Armin

More information

20775: Performing Data Engineering on Microsoft HD Insight

20775: Performing Data Engineering on Microsoft HD Insight Let s Reach For Excellence! TAN DUC INFORMATION TECHNOLOGY SCHOOL JSC Address: 103 Pasteur, Dist.1, HCMC Tel: 08 38245819; 38239761 Email: traincert@tdt-tanduc.com Website: www.tdt-tanduc.com; www.tanducits.com

More information

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. Course Content Course Description: The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. At Course Completion: After competing this course,

More information

StackIQ Enterprise Data Reference Architecture

StackIQ Enterprise Data Reference Architecture WHITE PAPER StackIQ Enterprise Data Reference Architecture StackIQ and Hortonworks worked together to Bring You World-class Reference Configurations for Apache Hadoop Clusters. Abstract Contents The Need

More information

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW TOPICS COVERED 1 2 Fundamentals of Big Data Platforms Major Big Data Tools Scaling Up vs. Out SCALE UP (SMP) SCALE OUT (MPP) + (n) Upgrade

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Duration: 5 days; Instructor-led Implement Spark Streaming Using the DStream API. Develop Big Data Real-Time Processing Solutions with Apache

More information

Make Business Intelligence Work on Big Data

Make Business Intelligence Work on Big Data Make Business Intelligence Work on Big Data Speed. Scale. Simplicity. Put the Power of Big Data in the Hands of Business Users Connect your BI tools directly to your big data without compromising scale,

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Course Details Course Code: Duration: Notes: 20775A 5 days This course syllabus should be used to determine whether the course is appropriate

More information

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake White Paper Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake Motivation for Modernization It is now a well-documented realization among Fortune 500 companies

More information

Common Customer Use Cases in FSI

Common Customer Use Cases in FSI Common Customer Use Cases in FSI 1 Marketing Optimization 2014 2014 MapR MapR Technologies Technologies 2 Fortune 100 Financial Services Company 104M CARD MEMBERS 3 Financial Services: Recommendation Engine

More information

Cask Data Application Platform (CDAP) Extensions

Cask Data Application Platform (CDAP) Extensions Cask Data Application Platform (CDAP) Extensions CDAP Extensions provide additional capabilities and user interfaces to CDAP. They are use-case specific applications designed to solve common and critical

More information

Hadoop in Production. Charles Zedlewski, VP, Product

Hadoop in Production. Charles Zedlewski, VP, Product Hadoop in Production Charles Zedlewski, VP, Product Cloudera In One Slide Hadoop meets enterprise Investors Product category Business model Jeff Hammerbacher Amr Awadallah Doug Cutting Mike Olson - CEO

More information

DataAdapt Active Insight

DataAdapt Active Insight Solution Highlights Accelerated time to value Enterprise-ready Apache Hadoop based platform for data processing, warehousing and analytics Advanced analytics for structured, semistructured and unstructured

More information

Confidential

Confidential June 2017 1. Is your EDW becoming too expensive to maintain because of hardware upgrades and increasing data volumes? 2. Is your EDW becoming a monolith, which is too slow to adapt to business s analytical

More information

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11 Top 5 Challenges for Hadoop MapReduce in the Enterprise Whitepaper - May 2011 http://platform.com/mapreduce 2 5/9/11 Table of Contents Introduction... 2 Current Market Conditions and Drivers. Customer

More information

Simplifying Your Modern Data Architecture Footprint

Simplifying Your Modern Data Architecture Footprint MIKE COCHRANE VP Analytics & Information Management Simplifying Your Modern Data Architecture Footprint Or Ways to Accelerate Your Success While Maintaining Your Sanity June 2017 mycervello.com Businesses

More information

Rapid Start with Big Data Appliance X6-2 Technical & Operational Overview

Rapid Start with Big Data Appliance X6-2 Technical & Operational Overview Rapid Start with Big Data Appliance X6-2 Technical & Operational Overview Dirk Augustin Solution Architect Hardware Presales Germany The Realities of Today s Data Center... Accelerating Customer Expectations

More information

Welcome! 2013 SAP AG or an SAP affiliate company. All rights reserved.

Welcome! 2013 SAP AG or an SAP affiliate company. All rights reserved. Welcome! 2013 SAP AG or an SAP affiliate company. All rights reserved. 1 SAP Big Data Webinar Series Big Data - Introduction to SAP Big Data Technologies Big Data - Streaming Analytics Big Data - Smarter

More information

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved. Apache Spark 2.0 GA The General Engine for Modern Analytic Use Cases 1 Apache Spark Drives Business Innovation Apache Spark is driving new business value that is being harnessed by technology forward organizations.

More information

Contents at a Glance COPYRIGHTED MATERIAL. Introduction... 1 Part I: Getting Started with Big Data... 7

Contents at a Glance COPYRIGHTED MATERIAL. Introduction... 1 Part I: Getting Started with Big Data... 7 Contents at a Glance Introduction... 1 Part I: Getting Started with Big Data... 7 Chapter 1: Grasping the Fundamentals of Big Data...9 Chapter 2: Examining Big Data Types...25 Chapter 3: Old Meets New:

More information

Jason Virtue Business Intelligence Technical Professional

Jason Virtue Business Intelligence Technical Professional Jason Virtue Business Intelligence Technical Professional jvirtue@microsoft.com Agenda Microsoft Azure Data Services Azure Cloud Services Azure Machine Learning Azure Service Bus Azure Stream Analytics

More information

Exploring Big Data and Data Analytics with Hadoop and IDOL. Brochure. You are experiencing transformational changes in the computing arena.

Exploring Big Data and Data Analytics with Hadoop and IDOL. Brochure. You are experiencing transformational changes in the computing arena. Brochure Software Education Exploring Big Data and Data Analytics with Hadoop and IDOL You are experiencing transformational changes in the computing arena. Brochure Exploring Big Data and Data Analytics

More information

Business is being transformed by three trends

Business is being transformed by three trends Business is being transformed by three trends Big Cloud Intelligence Stay ahead of the curve with Cortana Intelligence Suite Business apps People Custom apps Apps Sensors and devices Cortana Intelligence

More information

Hortonworks Powering the Future of Data

Hortonworks Powering the Future of Data Hortonworks Powering the Future of Simon Gregory Vice President Eastern Europe, Middle East & Africa 1 Hortonworks Inc. 2011 2016. All Rights Reserved MASTER THE VALUE OF DATA EVERY BUSINESS IS A DATA

More information

Big Business Value from Big Data and Hadoop

Big Business Value from Big Data and Hadoop Big Business Value from Big Data and Hadoop Page 1 Topics The Big Data Explosion: Hype or Reality Introduction to Apache Hadoop The Business Case for Big Data Hortonworks Overview & Product Demo Page 2

More information

Real World Use Cases: Hadoop & NoSQL in Production. Big Data Everywhere London 4 June 2015

Real World Use Cases: Hadoop & NoSQL in Production. Big Data Everywhere London 4 June 2015 Real World Use Cases: Hadoop & NoSQL in Production Ted Dunning Big Data Everywhere London 4 June 2015 1 Contact Information Ted Dunning Chief Applications Architect at MapR Technologies Committer & PMC

More information

SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform

SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform CREATE STREAMING ANALYTICS APPLICATIONS IN MINUTES WITHOUT WRITING CODE The increasing growth

More information

Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance

Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance 575 Market St, 11th Floor San Francisco, CA 94105 www.trifacta.com 844.332.2821 1 WHITEPAPER Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance 2 Introduction

More information

Spark, Hadoop, and Friends

Spark, Hadoop, and Friends Spark, Hadoop, and Friends (and the Zeppelin Notebook) Douglas Eadline Jan 4, 2017 NJIT Presenter Douglas Eadline deadline@basement-supercomputing.com @thedeadline HPC/Hadoop Consultant/Writer http://www.basement-supercomputing.com

More information

Datametica DAMA. The Modern Data Platform Enterprise Data Hub Implementations. What is happening with Hadoop Why is workload moving to Cloud

Datametica DAMA. The Modern Data Platform Enterprise Data Hub Implementations. What is happening with Hadoop Why is workload moving to Cloud DAMA Datametica The Modern Data Platform Enterprise Data Hub Implementations What is happening with Hadoop Why is workload moving to Cloud 1 The Modern Data Platform The Enterprise Data Hub What do we

More information

Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics

Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics Key takeaways Analytic Insights Module for self-service analytics Automate data ingestion into Isilon Data Lake Three methods

More information

Hadoop Administration Course Content

Hadoop Administration Course Content Hadoop Administration Course Content Weekend Batch (2 Months): SAT & SUN (8-12pm) Course Fee: 16,000/- New Batch starts on: Free Demo Session scheduled on : Ph : 8892499499 Web:www.dvstechnologies.in mail:dvs.training@gmail.com

More information

Big Data Job Descriptions. Software Engineer - Algorithms

Big Data Job Descriptions. Software Engineer - Algorithms Big Data Job Descriptions Software Engineer - Algorithms This position is responsible for meeting the big data needs of our various products and businesses. Specifically, this position is responsible for

More information

Azure ML Data Camp. Ivan Kosyakov MTC Architect, Ph.D. Microsoft Technology Centers Microsoft Technology Centers. Experience the Microsoft Cloud

Azure ML Data Camp. Ivan Kosyakov MTC Architect, Ph.D. Microsoft Technology Centers Microsoft Technology Centers. Experience the Microsoft Cloud Microsoft Technology Centers Microsoft Technology Centers Experience the Microsoft Cloud Experience the Microsoft Cloud ML Data Camp Ivan Kosyakov MTC Architect, Ph.D. Top Manager IT Analyst Big Data Strategic

More information

SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform

SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform CREATE STREAMING ANALYTICS APPLICATIONS IN MINUTES WITHOUT WRITING CODE The increasing growth of data, especially data-in-motion,

More information

OSIsoft Super Regional Transform Your World

OSIsoft Super Regional Transform Your World OSIsoft Super Regional Transform Your World Copyright 208 OSIsoft, LLC OSIsoft Vision & Roadmap Chris Nelson, VP Software Development 2 st August, 208 Copyright 208 OSIsoft, LLC Copyright 208 OSIsoft,

More information

Angat Pinoy. Angat Negosyo. Angat Pilipinas.

Angat Pinoy. Angat Negosyo. Angat Pilipinas. Angat Pinoy. Angat Negosyo. Angat Pilipinas. Four megatrends will dominate the next decade Mobility Social Cloud Big data 91% of organizations expect to spend on mobile devices in 2012 In 2012, mobile

More information

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION Oracle Data Integrator Enterprise Edition delivers high-performance data movement and transformation among enterprise platforms with its open and integrated E-LT

More information

Microsoft reinvents sales processing and financial reporting with Azure

Microsoft reinvents sales processing and financial reporting with Azure Microsoft IT Showcase Microsoft reinvents sales processing and financial reporting with Azure Core Services Engineering (CSE, formerly Microsoft IT) is moving MS Sales, the Microsoft revenue reporting

More information

GET MORE VALUE OUT OF BIG DATA

GET MORE VALUE OUT OF BIG DATA GET MORE VALUE OUT OF BIG DATA Enterprise data is increasing at an alarming rate. An International Data Corporation (IDC) study estimates that data is growing at 50 percent a year and will grow by 50 times

More information

Hortonworks Data Platform. Buyer s Guide

Hortonworks Data Platform. Buyer s Guide Hortonworks Data Platform Buyer s Guide Hortonworks Data Platform (HDP Completely Open and Versatile Hadoop Data Platform 2 2014 Hortonworks, Inc. All rights reserved. Hadoop and the Hadoop elephant logo

More information

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer Redefine Big Data: EMC Data Lake in Action Andrea Prosperi Systems Engineer 1 Agenda Data Analytics Today Big data Hadoop & HDFS Different types of analytics Data lakes EMC Solutions for Data Lakes 2 The

More information

Spotlight Sessions. Nik Rouda. Director of Product Marketing Cloudera, Inc. All rights reserved. 1

Spotlight Sessions. Nik Rouda. Director of Product Marketing Cloudera, Inc. All rights reserved. 1 Spotlight Sessions Nik Rouda Director of Product Marketing Cloudera @nrouda Cloudera, Inc. All rights reserved. 1 Spotlight: Protecting Your Data Nik Rouda Product Marketing Cloudera, Inc. All rights reserved.

More information

Modern Data Architecture with Apache Hadoop

Modern Data Architecture with Apache Hadoop Modern Data Architecture with Apache Hadoop Automating Data Transfer with Attunity Replicate Presented by Hortonworks and Attunity Executive Summary Apache Hadoop didn t disrupt the datacenter, the data

More information

Five Questions to Ask Before Choosing a Hadoop Distribution

Five Questions to Ask Before Choosing a Hadoop Distribution Five Questions to Ask Before Choosing a Hadoop Distribution SPONSORED BY CONTENTS Introduction 1 1. What does it take to make Hadoop enterprise-ready? 1 2. Does the distribution offer scalability, reliability,

More information

DATASHEET. Tarams Business Intelligence. Services Data sheet

DATASHEET. Tarams Business Intelligence. Services Data sheet DATASHEET Tarams Business Intelligence Services Data sheet About Business Intelligence The proliferation of data in today s connected world offers tremendous possibilities for analysis and decision making

More information

Cloudera Hadoop & Industrie 4.0 wohin mit dem Datenstrom?

Cloudera Hadoop & Industrie 4.0 wohin mit dem Datenstrom? Cloudera Hadoop & Industrie 4.0 wohin mit dem Datenstrom? Bernard Doering Regional Sales Director, Central Europe 1 Cloudera Hadoop Scalable Flexible Open Cost- EffecLve 2 2014 Cloudera, Inc. All rights

More information

Table of Contents. Are You Ready for Digital Transformation? page 04. Take Advantage of This Big Data Opportunity with Cisco and Hortonworks page 06

Table of Contents. Are You Ready for Digital Transformation? page 04. Take Advantage of This Big Data Opportunity with Cisco and Hortonworks page 06 Table of Contents 01 02 Are You Ready for Digital Transformation? page 04 Take Advantage of This Big Data Opportunity with Cisco and Hortonworks page 06 03 Get Open Access to Your Data and Help Ensure

More information

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA.

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA. ABOUT THIS TRAINING: The world of Hadoop and Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. This comprehensive training has been designed

More information

Hadoop and Analytics at CERN IT CERN IT-DB

Hadoop and Analytics at CERN IT CERN IT-DB Hadoop and Analytics at CERN IT CERN IT-DB 1 Hadoop Use cases Parallel processing of large amounts of data Perform analytics on a large scale Dealing with complex data: structured, semi-structured, unstructured

More information

1. Intoduction to Hadoop

1. Intoduction to Hadoop 1. Intoduction to Hadoop Hadoop is a rapidly evolving ecosystem of components for implementing the Google MapReduce algorithms in a scalable fashion on commodity hardware. Hadoop enables users to store

More information

Hadoop Stories. Tim Marston. Director, Regional Alliances Page 1. Hortonworks Inc All Rights Reserved

Hadoop Stories. Tim Marston. Director, Regional Alliances Page 1. Hortonworks Inc All Rights Reserved Hadoop Stories Tim Marston Director, Regional Alliances EMEA Page 1 @timmarston Page 2 Plans for Hadoop Adoption (Gartner, May 2015) Start within 1 year 11% Start within 2 years 7% Already doing 27% No

More information

SAS and Hadoop Technology: Overview

SAS and Hadoop Technology: Overview SAS and Hadoop Technology: Overview SAS Documentation September 19, 2017 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview.

More information

Two offerings which interoperate really well

Two offerings which interoperate really well Microsoft Two offerings which interoperate really well On-premises Cortana Intelligence Suite SQL Server 2016 Cloud IAAS Enterprise PAAS Cloud Storage Service 9 SQL Server 2016: Everything built-in built-in

More information

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB Data Analytics and CERN IT Hadoop Service CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB 1 Data Analytics at Scale The Challenge When you cannot fit your workload in a desktop Data

More information

Meta-Managed Data Exploration Framework and Architecture

Meta-Managed Data Exploration Framework and Architecture Meta-Managed Data Exploration Framework and Architecture CONTENTS Executive Summary Meta-Managed Data Exploration Framework Meta-Managed Data Exploration Architecture Data Exploration Process: Modules

More information

More information for FREE VS ENTERPRISE LICENCE :

More information for FREE VS ENTERPRISE LICENCE : Source : http://www.splunk.com/ Splunk Enterprise is a fully featured, powerful platform for collecting, searching, monitoring and analyzing machine data. Splunk Enterprise is easy to deploy and use. It

More information

Transforming Analytics with Cloudera Data Science WorkBench

Transforming Analytics with Cloudera Data Science WorkBench Transforming Analytics with Cloudera Data Science WorkBench Process data, develop and serve predictive models. 1 Age of Machine Learning Data volume NO Machine Learning Machine Learning 1950s 1960s 1970s

More information

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect (technical) Updates & demonstration Robert Voermans Governance architect Amsterdam Please note IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

IBM Spectrum Scale. Advanced storage management of unstructured data for cloud, big data, analytics, objects and more. Highlights

IBM Spectrum Scale. Advanced storage management of unstructured data for cloud, big data, analytics, objects and more. Highlights IBM Spectrum Scale Advanced storage management of unstructured data for cloud, big data, analytics, objects and more Highlights Consolidate storage across traditional file and new-era workloads for object,

More information

APAC Big Data & Cloud Summit 2013

APAC Big Data & Cloud Summit 2013 APAC Big Data & Cloud Summit 2013 Big Data Analytics & Hadoop Use Cases Eddie Toh Server Marketing Manager 21 August 2013 From the dawn of civilization until 2003, we humans created 5 Exabyte of information.

More information

Cask Data Application Platform (CDAP) The Integrated Platform for Developers and Organizations to Build, Deploy, and Manage Data Applications

Cask Data Application Platform (CDAP) The Integrated Platform for Developers and Organizations to Build, Deploy, and Manage Data Applications Cask Data Application Platform (CDAP) The Integrated Platform for Developers and Organizations to Build, Deploy, and Manage Data Applications Copyright 2015 Cask Data, Inc. All Rights Reserved. February

More information

Spark and Hadoop Perfect Together

Spark and Hadoop Perfect Together Spark and Hadoop Perfect Together Arun Murthy Hortonworks Co-Founder @acmurthy Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users Data Operating System

More information

Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect

Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect 2005 Concert de Coldplay 2014 Concert de Coldplay 90% of the world s data has been created over the last two years alone 1 1. Source

More information

Azure. Bruno Kovačić Axilis, Microsoft MVP

Azure. Bruno Kovačić Axilis, Microsoft MVP Azure Bruno Kovačić Axilis, Microsoft MVP Why the cloud? Game sessions hosted using Azure Hosted using >100,000 Azure Virtual Machines Why the cloud? Rapidly setup environments to drive business priorities

More information

Pentaho Technical Overview. Max Felber Solution Engineer September 22, 2016

Pentaho Technical Overview. Max Felber Solution Engineer September 22, 2016 Pentaho Technical Overview Max Felber Solution Engineer mfelber@pentaho.com September 22, 2016 Industry Leader in Self-Service Big Data Preparation Gartner recently completed a study on 36 selfservice

More information

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand Paper 2698-2018 Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand ABSTRACT Digital analytics is no longer just about tracking the number

More information

Realising Value from Data

Realising Value from Data Realising Value from Data Togetherwith Open Source Drives Innovation & Adoption in Big Data BCS Open Source SIG London 1 May 2013 Timings 6:00-6:30pm. Register / Refreshments 6:30-8:00pm, Presentation

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information