Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11


Table of Contents

Introduction
Current Market Conditions and Drivers
Customer Problems
Needs
Current Solutions
Five Challenges for Hadoop MapReduce in the Enterprise
  1. Lack of Performance and Scalability
  2. Lack of Flexible and Reliable Resource Management
  3. Lack of Application Deployment Support
  4. Lack of Quality of Service
  5. Lack of Multiple Data Source Support
Use Case Example
Example Scenarios
Conclusion

Introduction

Reporting and analysis drive businesses in making the best possible decisions. The source of all these decisions is data. There are two main types of data: structured and unstructured. Though IT has been able to deliver enterprise-class services for analysis and reporting on structured data (e.g., data warehouses), IT has struggled to deliver the same level of service for capturing, managing and processing information from unstructured data. IT organizations need to adopt new ways to deliver enterprise-class services to extract and analyze unstructured data. Though new methods such as MapReduce have been found to access, extract and organize data result sets, their delivery is becoming too expensive without enterprise-class delivery services.

To meet emerging business demands to extract knowledge from unstructured data, enterprises require an enterprise-class solution that can schedule and manage data analysis processes across an entire distributed file system with the robustness that enterprise IT requires. Platform Computing has managed enterprise-class distributed architectures for nearly two decades and is well suited to provide enterprise-class services across a distributed file system. Platform Computing's MapReduce distributed computing runtime engine meets this need.

Current Market Conditions and Drivers

According to the Market Strategy and BI Research group, data volumes are doubling every year:

- 42.6 percent of respondents are keeping more than three years of data for analytical purposes.
- New sources of data are emerging at huge volumes, in different industries, such as utilities.
- 80 percent of data is unstructured and not effectively used in the organization.
- Most of the unstructured data collected is driven by business value rather than need from a pure analytics perspective.

The key is to turn this data into usable information. However, with the rapid growth in data volumes, even the fastest systems cannot keep pace.
For the analysis of large data sets (i.e., "Big Data"), the system architecture has to be revisited and designed to scale linearly as the volume of data grows. In order to meet these big data conditions, both computational and storage solutions have evolved:

- Emergence of new programming frameworks to enable distributed computing on large data sets (e.g., MapReduce).
- New data storage techniques (e.g., file systems on commodity hardware, like the Hadoop Distributed File System, or HDFS) for structured and unstructured data.

Distributed storage systems made it more affordable to store large volumes of data using commodity disks. Big data distributed file systems used for storage support some enterprise-class capabilities such as data flexibility, adoption within the IT ecosystem, high scalability (up to petabytes), and reasonable cost.

- NYSE is generating 1TB of data per day.
- Facebook is generating 20TB of data per day, compressed.
- CERN is generating 40TB of data per day.

Customer Problems

So, the customer problem is not with the distributed file system, but with the ability to access, extract and organize the data using MapReduce with enterprise-class services. The common implementation of MapReduce based on open source code is not inherently designed for enterprise-class deployments. In fact, using MapReduce in an enterprise data center requires a highly scalable, highly available, and easily managed solution, which includes support for multiple MapReduce applications. These key capabilities do not exist within current open source solutions.

Needs

There is a need for an enterprise-class MapReduce computational solution to support distributed processing of the MapReduce programming model. Enterprise-class MapReduce computational engines need to:

- Enable deployment and operation of the extraction and analysis programs across the enterprise.
- Manage and monitor large-scale environments.
- Include a workload management system to ensure quality of service and prioritization of applications based on business objectives.
- Service multiple MapReduce users and lines of business, as well as potentially other distributed processing needs.
- Provide flexibility to choose the right storage/file system, based on the specific application need.
- Deliver SLAs that IT can commit to its business users.

Current Solutions

There are three current approaches to performing MapReduce operations on large amounts of data:

Open Source Apache Hadoop Project

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data within the Hadoop Distributed File System (HDFS). Hadoop was inspired by Google's MapReduce and Google File System (GFS). Hadoop is an Apache project being built and used by a global community of contributors, using the Java programming language. Yahoo!, the largest contributor to the project, uses Hadoop extensively across its businesses.

The design of the system is in flux, as the initial distribution suffers from multiple issues, including a monolithic architecture of the core scheduling subsystem. Because Hadoop is an open source distribution, customers wishing to implement MapReduce programs on it must do so at their own risk by supporting the entire deployment themselves. The Hadoop implementation is also Java-centric and primarily works with the Hadoop file system (HDFS) [2]. There is an assumption that the customer has internal expertise on how to operate the code. The solution offers no serious manageability, high availability, or performance capability. It is designed to be used by IT departments that have an army of developers to help fix any issues they encounter. The source code is constantly evolving, and managing the infrastructure lifecycle is quite complex and may require interruption of the environment to perform system updates.
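For readers new to the programming model that Hadoop implements, the following is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to a word count. The function names are illustrative assumptions and are not Hadoop's API; a framework such as Hadoop would run the map and reduce calls in parallel across many nodes rather than in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

Because each map call sees only one record and each reduce call sees only one key, both steps can be spread over many machines, which is where the scheduling and resource management concerns discussed in this paper arise.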
Commercial Open Source

Cloudera is one such commercial provider of a Hadoop stack software distribution, providing services in addition to add-on tools. Its distribution is based on open source code, which is still unproven as a large-scale, full-stack enterprise solution. There are many shortcomings in the open source distribution, including the workload management capabilities. Other open source commercial distributions are emerging, with IBM and EMC entering the marketplace. However, all of these offerings are based on open source code and inevitably inherit the strengths and weaknesses of that code base and architectural design. Therefore they cannot meet the enterprise-class requirements for big data problems as already mentioned.

In-data Warehouse Analytics

Some data warehouse vendors have implemented the MapReduce programming model on top of their data warehouses. These include EMC/Greenplum and Aster Data. Though the tight integration of MapReduce with their data warehouse is an attractive and reliable solution for their customers, it only works with their own data warehouse. Many customers will find this solution unappealing due to lack of choices.

[2] Contribution plug-ins for GPFS and CEPH have been offered by the community.

Five Challenges for Hadoop MapReduce in the Enterprise

1. Lack of Performance and Scalability

Current implementations of the MapReduce programming model do not provide a fast, scalable distributed resource infrastructure solution [3]. Organizations require a MapReduce distributed solution that can deliver a competitive advantage by solving a wide range of data-intensive analytic problems. They may also require the ability to harness resources from distributed clusters in remote data centers. A complete MapReduce implementation should help organizations run complex data simulations with sub-millisecond latency and a throughput of thousands of tasks per second. Current open source implementations have job startup times measured in seconds, not milliseconds. Applications should be able to scale to tens of thousands of cores and thousands of concurrent clients and/or applications.

2. Lack of Flexible and Reliable Resource Management

Current implementations of the MapReduce programming model are not able to react quickly to changes based on application and/or user demand. MapReduce distributed processing requires a flexible amount of computing power to support applications even when data streams to the distributed resources in real time. Based on volume, the MapReduce distributed resources should be able to grow or shrink by reallocating up to thousands of CPUs per second to adjust to the current workload, in order to reduce cost while maximizing results. The resource manager in the current open source solution is susceptible to being a single point of failure, and tasks will need to be resubmitted upon failure of this subsystem.
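The idea of growing or shrinking each application's share of the cluster as demand changes can be sketched as a weighted allocation. The following is only an illustration under stated assumptions: the function `allocate_slots`, the application names, and the priority weights are all hypothetical, weights are assumed to be positive integers, and no real scheduler's policy is being reproduced.

```python
def allocate_slots(total_slots, apps):
    """Split CPU slots across applications in proportion to priority weight.

    apps maps an application name to (priority_weight, demanded_slots).
    No application receives more than it demands; any unused share is
    redistributed among the applications that still want more.
    """
    allocation = {name: 0 for name in apps}
    active = {name for name, (_, demand) in apps.items() if demand > 0}
    remaining = total_slots
    while remaining > 0 and active:
        total_weight = sum(apps[name][0] for name in active)
        granted = 0
        # Highest weight first; ties are broken by name for determinism.
        for name in sorted(active, key=lambda n: (-apps[n][0], n)):
            weight, demand = apps[name]
            share = max(1, remaining * weight // total_weight)
            grant = min(share, demand - allocation[name], remaining - granted)
            allocation[name] += grant
            granted += grant
        remaining -= granted
        active = {n for n in active if allocation[n] < apps[n][1]}
        if granted == 0:
            break
    return allocation


if __name__ == "__main__":
    workload = {"fraud": (3, 80), "reports": (1, 10), "adhoc": (1, 50)}
    print(allocate_slots(100, workload))
```

Calling the function again whenever demand or cluster size changes yields a new allocation, which is the grow-or-shrink behavior described above in miniature. A production resource manager would also have to handle preemption, data locality, and its own failover.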

3. Lack of Application Deployment Support

Current implementations of the MapReduce programming model do not make it easy to manage multiple application integrations on production-scale distributed systems with automated application service deployment capability. An enterprise-class solution should have automated capabilities including application deployment, workload policies, tuning, and general monitoring and administration. This eliminates ongoing source code maintenance and simplifies IT operations.

4. Lack of Quality of Service

Current implementations of the MapReduce programming model do not run at optimal capacity to take advantage of multi-core servers. Organizations with Big Data challenges are looking for a solution that can dynamically match resources to non-uniform MapReduce workloads in order to maximize use of their IT infrastructure. The improved resource utilization also leads to higher application performance and faster time to results, and thereby delivers a higher quality of service to an organization. MapReduce implementations often overlook the capability of infrastructure lifecycle management. Because of this, the entire systems infrastructure has to be brought down in order to perform routine maintenance such as patching or upgrades.

5. Lack of Multiple Data Source Support

Current implementations of the MapReduce programming model only support one distributed file system for reading and writing data, the most common being HDFS. A complete implementation of the MapReduce programming model should be agile enough to provide simultaneous support for multiple distributed file systems. With the flexibility of being able to read input from one file system and write to a different file system, the task of data processing and data storage becomes far more efficient, and additional steps for data conversion are eliminated.

Platform Computing's MapReduce approach is to deliver enterprise-class distributed workload services for the MapReduce application programming model. It meets enterprise IT requirements when running MapReduce analytics, delivering availability, scalability, performance and manageability. The server is designed to work specifically with multiple data file systems, avoiding customer lock-in while offering a single MapReduce solution throughout the enterprise.

As an analytic distributed platform, Platform MapReduce supports an open, compatible application architecture. It can support multiple programming languages and multiple data storage techniques, and has consistent APIs with the open source Hadoop projects. This makes it easy to integrate with third-party software when moving current applications to Platform MapReduce.

The Platform Computing MapReduce enhanced approach is built around the company's core technologies in Platform LSF and Platform Symphony. Its enterprise-class capabilities include the ability to scale to thousands of cores per MapReduce application, to perform at very high execution rates, and to offer IT manageability and monitoring while controlling workload policies for multiple lines of business users. It has built-in high availability services to ensure the necessary quality of service.

[3] Apache Hadoop is limited to 4,000 nodes and 40,000 concurrent tasks. It also has a single point of failure.

Use Case Example

Data will continue to accumulate within IT organizations. Over time, this data can become extremely large and complex. It comprises multiple formats, including documents, web feeds, system logs, online forums, SharePoint, sensor data, and images/video content. The ability to analyze and make use of this data can dramatically assist in running any business.
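Challenge 5 above asks for the ability to read input from one file system and write results to another. A minimal sketch of that idea, applied to system logs (one of the formats just listed), is shown here. Everything in it is an assumption made for illustration: the `Storage` interface and the `InMemoryStorage` class are hypothetical stand-ins and do not correspond to the API of Hadoop, HDFS, or any Platform Computing product.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Storage(ABC):
    """Hypothetical minimal storage interface; not a real Hadoop API."""

    @abstractmethod
    def read_lines(self, path):
        ...

    @abstractmethod
    def write_lines(self, path, lines):
        ...


class InMemoryStorage(Storage):
    """Stand-in for a distributed file system, backed by a dict."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def read_lines(self, path):
        return list(self.files[path])

    def write_lines(self, path, lines):
        self.files[path] = list(lines)


def count_log_levels(source, in_path, sink, out_path):
    """Read log lines from one backend and write level counts to another."""
    counts = Counter(
        line.split()[0] for line in source.read_lines(in_path) if line.strip()
    )
    report = [f"{level}\t{n}" for level, n in sorted(counts.items())]
    sink.write_lines(out_path, report)
    return counts


if __name__ == "__main__":
    raw = InMemoryStorage("raw-logs")
    results = InMemoryStorage("results")
    raw.write_lines("/logs/app.log", ["ERROR disk full", "INFO started", "ERROR timeout"])
    count_log_levels(raw, "/logs/app.log", results, "/reports/levels.tsv")
    print(results.read_lines("/reports/levels.tsv"))
```

Because the job depends only on the small interface, the source and the sink can be different storage systems, and no intermediate copy or conversion step is needed between them.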

Example Scenarios

As a general purpose solution, for example, users may want to ask what-if questions from a graphical user interface against the data to determine customer buying patterns. Another example would be an application continuously performing queries to detect money laundering or credit card fraud by correlating the locations and timelines of financial transactions.

Solution

Platform MapReduce is a product designed to run MapReduce programs on a computational distributed system that provides enterprise-class services. As a computational distributed platform, it supports an open application architecture as well as multiple distributed file systems used by organizations today. Its enterprise-class capabilities include the ability to scale to thousands of cores per MapReduce application, to perform at very high execution rates, and to offer IT manageability and monitoring while controlling workload policies for multiple lines of business users. It also offers built-in high availability services to ensure the necessary quality of service.

Conclusion

Platform Computing's MapReduce approach provides development flexibility, operational maturity, better performance and higher scalability to meet the needs of the most complex environments. Platform Computing brings together two decades of distributed system management capabilities, providing a solution that allows linear scalability by balancing computation needs with the ever-growing volumes of data. Because the solution is designed to support multiple applications, organizations can dramatically increase their IT infrastructure utilization across all resources, resulting in a high return on investment.

Unlike other less sophisticated solutions that lack multiple MapReduce application support and scalability, Platform MapReduce's distributed workload services are designed for high scalability, fast performance, and extreme application compatibility through its low-latency SOA architecture. MapReduce applications can now run with high reliability under powerful central management, thereby meeting IT's SLAs with both reliability and consistency.

Platform Computing is the leader in cluster, grid and cloud management software, serving more than 2,000 of the world's most demanding organizations for over 18 years. Our workload and resource management solutions deliver IT responsiveness and lower costs for enterprise and HPC applications. Platform has strategic relationships with Cray, Dell, HP, IBM, Intel, Microsoft, Red Hat, and SAS.

World Headquarters: Platform Computing Corporation, Markham, Ontario, Canada L3R 3T7. Email: info@platform.com

Copyright 2011 Platform Computing Corporation. The symbols ® and ™ designate trademarks of Platform Computing Corporation or identified third parties. All other logos and product names are the trademarks of their respective owners, errors and omissions excepted. Printed in Canada. Platform and Platform Computing refer to Platform Computing Corporation and each of its subsidiaries.

Optimize your FLUENT environment with Platform LSF CAE Edition

Optimize your FLUENT environment with Platform LSF CAE Edition Optimize your FLUENT environment with Platform LSF CAE Edition Accelerating FLUENT CFD Simulations ANSYS, Inc. is a global leader in the field of computer-aided engineering (CAE). The FLUENT software from

More information

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

E-guide Hadoop Big Data Platforms Buyer s Guide part 1 Hadoop Big Data Platforms Buyer s Guide part 1 Your expert guide to Hadoop big data platforms for managing big data David Loshin, Knowledge Integrity Inc. Companies of all sizes can use Hadoop, as vendors

More information

StackIQ Enterprise Data Reference Architecture

StackIQ Enterprise Data Reference Architecture WHITE PAPER StackIQ Enterprise Data Reference Architecture StackIQ and Hortonworks worked together to Bring You World-class Reference Configurations for Apache Hadoop Clusters. Abstract Contents The Need

More information

Intro to Big Data and Hadoop

Intro to Big Data and Hadoop Intro to Big and Hadoop Portions copyright 2001 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC, USA. SAS Institute Inc. makes no warranties

More information

IBM Db2 Warehouse. Hybrid data warehousing using a software-defined environment in a private cloud. The evolution of the data warehouse

IBM Db2 Warehouse. Hybrid data warehousing using a software-defined environment in a private cloud. The evolution of the data warehouse IBM Db2 Warehouse Hybrid data warehousing using a software-defined environment in a private cloud The evolution of the data warehouse Managing a large-scale, on-premises data warehouse environments to

More information

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Contents Introduction...3 Hadoop s humble beginnings...4 The benefits of Hadoop...5

More information

IBM Spectrum Scale. Advanced storage management of unstructured data for cloud, big data, analytics, objects and more. Highlights

IBM Spectrum Scale. Advanced storage management of unstructured data for cloud, big data, analytics, objects and more. Highlights IBM Spectrum Scale Advanced storage management of unstructured data for cloud, big data, analytics, objects and more Highlights Consolidate storage across traditional file and new-era workloads for object,

More information

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve

More information

Building a Multi-Tenant Infrastructure for Diverse Application Workloads

Building a Multi-Tenant Infrastructure for Diverse Application Workloads Building a Multi-Tenant Infrastructure for Diverse Application Workloads Rick Janowski Marketing Manager IBM Platform Computing 1 The Why and What of Multi-Tenancy 2 Parallelizable problems demand fresh

More information

Realize More with the Power of Choice. Microsoft Dynamics ERP and Software-Plus-Services

Realize More with the Power of Choice. Microsoft Dynamics ERP and Software-Plus-Services Realize More with the Power of Choice Microsoft Dynamics ERP and Software-Plus-Services Software-as-a-service (SaaS) refers to services delivery. Microsoft s strategy is to offer SaaS as a deployment choice

More information

IBM i Reduce complexity and enhance productivity with the world s first POWER5-based server. Highlights

IBM i Reduce complexity and enhance productivity with the world s first POWER5-based server. Highlights Reduce complexity and enhance productivity with the world s first POWER5-based server IBM i5 570 Highlights Integrated management of multiple operating systems and application environments helps reduce

More information

Insights to HDInsight

Insights to HDInsight Insights to HDInsight Why Hadoop in the Cloud? No hardware costs Unlimited Scale Pay for What You Need Deployed in minutes Azure HDInsight Big Data made easy Enterprise Ready Easier and more productive

More information

BIG DATA TRANSFORMS BUSINESS. The EMC Big Data Solution

BIG DATA TRANSFORMS BUSINESS. The EMC Big Data Solution BIG DATA The EMC Big Data Solution THE JOURNEY TO BIG DATA Businesses that exploit Big Data to improve strategy and execution are distancing themselves from competitors. The Big Data solution from EMC

More information

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW TOPICS COVERED 1 2 Fundamentals of Big Data Platforms Major Big Data Tools Scaling Up vs. Out SCALE UP (SMP) SCALE OUT (MPP) + (n) Upgrade

More information

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake White Paper Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake Motivation for Modernization It is now a well-documented realization among Fortune 500 companies

More information

Evolution to Revolution: Big Data 2.0

Evolution to Revolution: Big Data 2.0 Evolution to Revolution: Big Data 2.0 An ENTERPRISE MANAGEMENT ASSOCIATES (EMA ) White Paper Prepared for Actian March 2014 IT & DATA MANAGEMENT RESEARCH, INDUSTRY ANALYSIS & CONSULTING Table of Contents

More information

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica Accelerating Your Big Data Analytics Jeff Healey, Director Product Marketing, HPE Vertica Recent Waves of Disruption IT Infrastructu re for Analytics Data Warehouse Modernization Big Data/ Hadoop Cloud

More information

IBM Accelerating Technical Computing

IBM Accelerating Technical Computing IBM Accelerating Jay Muelhoefer WW Marketing Executive, IBM Technical and Platform Computing September 2013 1 HPC and IBM have long history driving research and government innovation Traditional use cases

More information

5th Annual. Cloudera, Inc. All rights reserved.

5th Annual. Cloudera, Inc. All rights reserved. 5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software

More information

WELCOME TO. Cloud Data Services: The Art of the Possible

WELCOME TO. Cloud Data Services: The Art of the Possible WELCOME TO Cloud Data Services: The Art of the Possible Goals for Today Share the cloud-based data management and analytics technologies that are enabling rapid development of new mobile applications Discuss

More information

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN From Information to Insight: The Big Value of Big Data Faire Ann Co Marketing Manager, Information Management Software, ASEAN The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

Adobe Deploys Hadoop as a Service on VMware vsphere

Adobe Deploys Hadoop as a Service on VMware vsphere Adobe Deploys Hadoop as a Service A TECHNICAL CASE STUDY APRIL 2015 Table of Contents A Technical Case Study.... 3 Background... 3 Why Virtualize Hadoop on vsphere?.... 3 The Adobe Marketing Cloud and

More information

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud Datametica The Modern Data Platform Enterprise Data Hub Implementations Why is workload moving to Cloud 1 What we used do Enterprise Data Hub & Analytics What is Changing Why it is Changing Enterprise

More information

NEXT GENERATION PREDICATIVE ANALYTICS USING HP DISTRIBUTED R

NEXT GENERATION PREDICATIVE ANALYTICS USING HP DISTRIBUTED R 1 A SOLUTION IS NEEDED THAT NOT ONLY HANDLES THE VOLUME OF BIG DATA OR HUGE DATA EASILY, BUT ALSO PRODUCES INSIGHTS INTO THIS DATA QUICKLY NEXT GENERATION PREDICATIVE ANALYTICS USING HP DISTRIBUTED R A

More information

Cloud-Scale Data Platform

Cloud-Scale Data Platform Guide to Supporting On-Premise Spark Deployments with a Cloud-Scale Data Platform Apache Spark has become one of the most rapidly adopted open source platforms in history. Demand is predicted to grow at

More information

IBM Balanced Warehouse Buyer s Guide. Unlock the potential of data with the right data warehouse solution

IBM Balanced Warehouse Buyer s Guide. Unlock the potential of data with the right data warehouse solution IBM Balanced Warehouse Buyer s Guide Unlock the potential of data with the right data warehouse solution Regardless of size or industry, every organization needs fast access to accurate, up-to-the-minute

More information

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS) ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS) Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how Dell EMC Elastic Cloud Storage (ECS ) can be used to streamline

More information

HPC Workload Management Tools: Tech Brief Update

HPC Workload Management Tools: Tech Brief Update 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 HPC Workload Management Tools: Tech Brief Update IBM Platform LSF Meets Evolving High Performance Computing

More information

Cognizant BigFrame Fast, Secure Legacy Migration

Cognizant BigFrame Fast, Secure Legacy Migration Cognizant BigFrame Fast, Secure Legacy Migration Speeding Business Access to Critical Data BigFrame speeds migration from legacy systems to secure next-generation data platforms, providing up to a 4X performance

More information

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand Paper 2698-2018 Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand ABSTRACT Digital analytics is no longer just about tracking the number

More information

Datametica DAMA. The Modern Data Platform Enterprise Data Hub Implementations. What is happening with Hadoop Why is workload moving to Cloud

Datametica DAMA. The Modern Data Platform Enterprise Data Hub Implementations. What is happening with Hadoop Why is workload moving to Cloud DAMA Datametica The Modern Data Platform Enterprise Data Hub Implementations What is happening with Hadoop Why is workload moving to Cloud 1 The Modern Data Platform The Enterprise Data Hub What do we

More information

SYSPRO Integration SYSPRO Integration Framework

SYSPRO Integration SYSPRO Integration Framework SYSPRO Integration SYSPRO Integration Framework Framework Introducing SYSPRO SYSPRO is an internationally-recognized, leading provider of enterprise business solutions. Formed in 1978, SYSPRO was one of

More information

The ABCs of. CA Workload Automation

The ABCs of. CA Workload Automation The ABCs of CA Workload Automation 1 The ABCs of CA Workload Automation Those of you who have been in the IT industry for a while will be familiar with the term job scheduling or workload management. For

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

Business Insight at the Speed of Thought

Business Insight at the Speed of Thought BUSINESS BRIEF Business Insight at the Speed of Thought A paradigm shift in data processing that will change your business Advanced analytics and the efficiencies of Hybrid Cloud computing models are radically

More information

White paper A Reference Model for High Performance Data Analytics(HPDA) using an HPC infrastructure

White paper A Reference Model for High Performance Data Analytics(HPDA) using an HPC infrastructure White paper A Reference Model for High Performance Data Analytics(HPDA) using an HPC infrastructure Discover how to reshape an existing HPC infrastructure to run High Performance Data Analytics (HPDA)

More information

Simplifying Hadoop. Sponsored by. July >> Computing View Point

Simplifying Hadoop. Sponsored by. July >> Computing View Point Sponsored by >> Computing View Point Simplifying Hadoop July 2013 The gap between the potential power of Hadoop and the technical difficulties in its implementation are narrowing and about time too Contents

More information
