CA Workload Automation Advanced Integration for Hadoop: Automate, Accelerate, Integrate
Big Data. Big Deal.

The need to mine massive sets of information for unique insights into customer behaviors, competitive plays and market fluctuations has transformed big data initiatives into imperative, business-critical priorities. But just how big is big data? Big enough that the market for big data technologies and services will exceed $41 billion by 2018.1 And thanks to the significant emphasis organizations place on big data, as well as the budget they're dedicating to it, demand for IT professionals with big data expertise jumped by nearly 90 percent between December 2013 and December 2014.2 That said, a big data initiative is not without its challenges. In fact, most companies estimate they're analyzing a mere 12 percent of the data they have.3 So, what are businesses doing to get around these challenges and extract greater value from big data?

1 IDC, Workload Automation Emerges as Business Innovation Engine in the Era of Cloud, Big Data, and DevOps, April 2015.
2 Forbes, Where Big Data Jobs Will Be In 2015, December 29, 2014.
3 Forrester, The Forrester Wave: Big Data Hadoop Solutions, Q1 2014, February 28, 2014.
Hadoop Holds the Key to Better Big Data Analysis

The open-source Apache Hadoop platform has rapidly emerged as the dominant means by which businesses process, analyze and extract insights from their growing sets of data. It has become so popular, in fact, that the global Hadoop market is expected to reach $50.2 billion by 2020.4 So, what's behind Hadoop's rise in prominence? For starters, it's far less expensive than other data storage methods: companies can create Hadoop infrastructures built almost exclusively on cost-effective, scalable and resilient commodity hardware and software. As a result, it's easy for businesses to add more storage or processing power to their clusters without overtaxing IT budgets or diverting funds from other strategic initiatives. Plus, Hadoop allows organizations to deliver a more personalized experience that meets their specific needs around big data, helping them optimize customer-facing services, more effectively support changing business goals and provide better, more efficient resource utilization. Unfortunately, some challenges unique to the Hadoop environment can counteract the value the platform delivers. The largest Hadoop clusters can include upwards of 1,000 to 2,000 nodes.5

4 Allied Market Research, Global Hadoop Market - Industry Growth, Size, Share, Insights, Analysis, Research, Report, Opportunities, Trends and Forecasts Through 2020, March 2014.
5 Enterprise Tech, Systems Edition, Hadoop Finds Its Place In The Enterprise, October 29, 2014.
Managing Hadoop and Traditional Infrastructures: Critical, but not Always Easy

Because Hadoop and traditional infrastructures run independently of each other, operating a Hadoop environment means workflows will inevitably incorporate both Hadoop and traditional jobs. While Hadoop does include a basic scheduler that delivers some automation, it is focused primarily on jobs that run on Hadoop clusters, and doesn't integrate well with other workload automation engines. This makes identifying the relevant data and assembling and analyzing it across multiple platforms and Hadoop clusters, as well as managing dependencies on external schedulers and data sources, extremely difficult. This limitation can introduce several critical challenges. So, when it comes to managing a Hadoop environment alongside your existing infrastructure, are you struggling with:

>> Visualizing, monitoring and running multiple workflows across numerous, disparate IT environments?
>> Managing parallel, time-dependent jobs, as well as event-driven usage spikes?
>> Training users on multiple scheduling engines and writing and maintaining manual scripts for each Hadoop job?
Is it Hard for You to Visualize, Monitor and Run Multiple Workflows?

As you work to do more and more with big data, it's only natural to expect your end-to-end business workflows to include an increasingly intricate blend of Hadoop and traditional jobs. Although this is certainly a normal result of incorporating big data into your broader workflows, it also means you'll have to contend with greater complexity as you work to simultaneously orchestrate Hadoop and traditional jobs. In most instances, Hadoop users will have to run their jobs separately from traditional ones. This greatly increases the time and effort required to deliver big data services to the business. Moreover, it limits your overarching visibility into all jobs, both traditional and Hadoop, currently executing. And when this happens, it can create situations where confusion over the order in which jobs should be scheduled leads to slower response times and missed business opportunities. But what if you could monitor end-to-end workflows from a single console?
Is it Hard for You to Manage Time-Dependent Jobs or Usage Spikes?

In much the same way that they must run their jobs separately from traditional ones, the majority of Hadoop users also have no means of triggering traditional jobs that are dependent on a specific piece of the Hadoop workflow. This issue makes it quite challenging to manage parallel jobs, as doing so often requires users to toggle between Hadoop and traditional engines to manage the larger workflow in order to effectively meet key corporate objectives. Worse yet, because you must typically prioritize resources and activities according to shifting business needs or time-sensitive events, any inability to understand the dependencies between jobs or trigger them in the proper sequence can dramatically increase the time and cost associated with completing a workflow. But what if you could manage the dependencies between Hadoop and traditional jobs from a centralized location?
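To make the dependency problem concrete, here is a minimal, hypothetical sketch of the kind of hand-rolled glue script teams often end up maintaining today: it polls an upstream Hadoop job for completion before triggering a dependent traditional job. All names here (the fake cluster, the job ID, the downstream callable) are illustrative assumptions, not part of any CA or Hadoop API; in practice the status check might shell out to a Hadoop CLI or query a scheduler's REST interface. A centralized workload automation tool is designed to replace exactly this sort of per-job polling script.

```python
import time

# Hypothetical stand-in for a real status source; in real life this might
# shell out to a Hadoop CLI or call a scheduler's REST API.
class FakeHadoopCluster:
    def __init__(self, finishes_after_polls):
        self._polls_left = finishes_after_polls

    def job_status(self, job_id):
        # Reports "RUNNING" until the simulated job completes.
        self._polls_left -= 1
        return "SUCCEEDED" if self._polls_left <= 0 else "RUNNING"

def run_after_hadoop_job(cluster, job_id, downstream,
                         poll_interval=0.01, max_polls=100):
    """Poll the upstream Hadoop job; trigger the downstream (traditional)
    job only once the upstream job has succeeded."""
    for _ in range(max_polls):
        status = cluster.job_status(job_id)
        if status == "SUCCEEDED":
            return downstream()
        if status in ("FAILED", "KILLED"):
            raise RuntimeError(f"upstream job {job_id} ended as {status}")
        time.sleep(poll_interval)
    raise TimeoutError(f"gave up waiting on job {job_id}")

if __name__ == "__main__":
    cluster = FakeHadoopCluster(finishes_after_polls=3)
    result = run_after_hadoop_job(cluster, "job_2015_0001",
                                  downstream=lambda: "etl-load-started")
    print(result)  # downstream fires only after the Hadoop job succeeds
```

Multiply this pattern by every Hadoop-to-traditional handoff in a workflow and the maintenance burden described above becomes clear: each script encodes one dependency, invisibly to every other scheduler.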
Is it Hard for You to Dedicate Resources Toward Training Users on Multiple Schedulers?

As if jumping between Hadoop and traditional workload automation engines wasn't challenging enough, the fact that these schedulers employ dissimilar user interfaces introduces an extra set of considerations to overcome. Because you have no recourse but to schedule workflows using both approaches, you're left deciding between two unattractive options: invest time and money training users on both schedulers, or build completely separate teams for each job type. No matter which route you choose, you'll be making tradeoffs. The resources you invest in building and executing detailed training programs could have supported a strategic organizational initiative instead. Likewise, operating two unique teams adds unnecessary management burden and makes it difficult to orchestrate Hadoop and traditional jobs in a seamless manner. And, ultimately, either approach results in reduced productivity and a slower overall delivery of insights to the business. But what if you could schedule both Hadoop and traditional jobs from a single, easy-to-use, enterprise-class tool?
There Is a Way: CA Workload Automation Advanced Integration for Hadoop

Hadoop holds the key to unlocking your organization's potential. But what if there was a way to eliminate the challenges associated with Hadoop and more effectively utilize the platform for its true purpose? When you're able to effectively use it to align technology with business operations, Hadoop can deliver value across a number of use cases: everything from supporting ecommerce initiatives during the holiday season, to automatically delivering personalized promotions, to identifying potential acts of fraud or non-compliance. And when you implement CA Workload Automation Advanced Integration for Hadoop, it's possible. The solution allows you to add Hadoop jobs into existing CA Workload Automation workflows and monitor everything through a single console, including real-time and batch processes across multiple, disparate platforms, in a holistic, enterprise-wide manner. Here's how.
How It Works

CA Workload Automation Advanced Integration for Hadoop makes it possible to integrate Hadoop and traditional jobs by delivering:

>> Seamless application integration: Execute Hadoop jobs in sync with others throughout the enterprise, helping you reduce Hadoop job complexity while increasing overall reliability and flexibility.
>> Multi-platform scheduling: Visualize end-to-end business processes spanning Hadoop and other platforms from a central point of control.
>> Critical path analysis and forecasting: Identify and understand the business impact of Hadoop jobs, so you can better attain and uphold critical service-level agreements.6
>> A familiar interface for all jobs: Manage complete workflows without having to toggle between CA Workload Automation and Hadoop environments.
>> Resource optimization: Increase the efficiency with which you allocate resources by coordinating work based on what's currently available across physical, virtual and cloud environments.
>> The visibility to know when you should run a specific workload: Understand when a job is supposed to complete and quickly spot and triage any problems that may arise along the way.

6 Requires CA Workload Analytics.
Big Data Made Easy

With CA Workload Automation Advanced Integration for Hadoop, you'll be able to:

>> Gain unified visibility into your entire Hadoop environment.
>> Lower costs by eliminating the complexity associated with disconnected monitoring tools.
>> Easily integrate big data services into your existing technology landscape.
>> Achieve consistent service levels, stronger business integration and lower cost and risk for customers.
>> Support changing business goals and help the enterprise drive growth and competitive advantage.
>> Reduce the time and effort required to deliver big data business services.
>> Improve performance and uptime through proactive monitoring and alerts.
>> Accelerate big data projects and foster a more agile enterprise that is better positioned to deliver greater business value.
To learn more about how CA Workload Automation Advanced Integration for Hadoop can help your organization derive greater value from its Hadoop investments, please visit ca.com/wla.

CA Technologies (NASDAQ: CA) creates software that fuels transformation for companies and enables them to seize the opportunities of the application economy. Software is at the heart of every business, in every industry. From planning to development to management and security, CA is working with companies worldwide to change the way we live, transact, and communicate across mobile, private, and public cloud, distributed and mainframe environments. Learn more at ca.com.

Copyright 2015 CA. All rights reserved. Apache is a registered trademark and Hadoop is a trademark of the Apache Software Foundation in the United States and other countries. All other trademarks, trade names, service marks and logos referenced herein belong to their respective companies. This document is for your informational purposes only. CA assumes no responsibility for the accuracy or completeness of the information. To the extent permitted by applicable law, CA provides this document "as is" without warranty of any kind, including, without limitation, any implied warranties of merchantability, fitness for a particular purpose, or noninfringement. In no event will CA be liable for any loss or damage, direct or indirect, from the use of this document, including, without limitation, lost profits, business interruption, goodwill or lost data, even if CA is expressly advised in advance of the possibility of such damages. CA does not provide legal advice. Neither this document nor any CA software product referenced herein shall serve as a substitute for your compliance with any laws (including but not limited to any act, statute, regulation, rule, directive, policy, standard, guideline, measure, requirement, administrative order, executive order, etc. (collectively, "Laws")) referenced in this document.
You should consult with competent legal counsel regarding any Laws referenced herein. CS200-130318