White Paper: SAS and Apache Hadoop For Government
Unlocking Higher Value From Business Analytics to Further the Mission


Inside:
Using SAS and Hadoop Together
Design Considerations for Your SAS and Hadoop Project
Kick-Starting Your SAS and Hadoop Implementation

About This Paper

Government enterprises are awash in more data than they can make sense of. This has given rise to the current Big Data phenomenon, in which the opportunity to turn data into knowledge through analytics calls for new solutions. Challenges such as scalability, performance and the ability to handle new and different types of data make it difficult to unlock the value in the data while it is still current. One of the most important architectural trends enterprises should consider today is the integration of new Hadoop-centric Big Data approaches with user-focused business analytics capabilities. The combination of SAS and Hadoop for business analytics can provide a powerful way to address the many threats and opportunities government agencies face today.

What Is Business Analytics?

Tom Davenport defines business analytics as "the broad use of data and quantitative analysis for decision making within organizations. It encompasses query and reporting, but aspires to greater levels of mathematical sophistication. It includes analytics, of course, but involves harnessing them to meet defined business objectives. Business analytics empowers people in the organization to make better decisions, improve processes and achieve desired outcomes. It brings together the best of data management, analytic methods, and the presentation of results, all in a closed-loop cycle for continuous learning and improvement." (From: The New World of Business Analytics, March 2010)

SAS business analytics software is focused on delivering actionable value from enterprise data holdings. SAS's long-term, consistent vision and continuous innovation have kept it the market leader in business analytics. This remains true in the age of Hadoop, where SAS has brought the power of user-focused business analytics to big data.
Apache Hadoop

At its core, Hadoop is an open-source framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Hadoop is a free, Java-based programming framework that accomplishes two tasks: massive data storage and faster processing. The open-source Apache Hadoop framework has become the foundation of mission-focused data modernization activities throughout industry and government. The platform is widely adopted and has a large development community focused on continual improvement.

The power of the Hadoop framework is its ability to analyze data at scale. Hadoop uses distributed computing models, in which many processors work on different parts of the data at the same time, and this enables very fast analysis. The framework supports diverse types of data, and it runs on cost-efficient commodity hardware, which makes the approach particularly attractive.

The typical enterprise use of Hadoop today is as a comprehensive platform that stores and retains data in any state, form, or volume. This concept is known as an Enterprise Data Hub (EDH). With data storage costs declining to roughly $1.5K per terabyte, Hadoop has revolutionized data storage by making it more cost efficient. The attractiveness of this solution comes from its ability to meet mission needs economically and efficiently, and from an open design that ensures future missions can be supported without forklift upgrades.

Using SAS and Hadoop Together

SAS and Hadoop are a particularly good match for each other. Hadoop's ability to store and manage all data types, and to execute operations over the data in a distributed way, has brought new power to SAS business analytics applications. SAS is deeply committed to research and software development, with user experience always top of mind. Its engineers have built a technical connection to Hadoop that abstracts away complexity while bringing the full power of all the enterprise's data to users. Users see easy-to-use business analytics tools that provide more powerful support to missions.
All of this is done without the need for technologists to craft queries in programming languages. It simply works. Capabilities that were once available only to coders, such as creating reports, visualizing trends, identifying anomalies and outliers, and spotting correlations between variables, are now available to users through drag-and-drop features in the latest SAS software.

SAS and Hadoop work especially well in situations where advanced analytic techniques are applied to large volumes of data. Current use cases that leverage this approach include:

- Tracking down the ground zero and root cause of disease outbreaks such as Ebola and measles.
- Identifying likely drug traffickers at border crossings.
- Detecting fraudulent medical claims.
- Identifying money laundering and terrorist financing rings.
- Spotting insider threats by recognizing anomalous patterns of behavior.

The SAS Approach to Hadoop: From, With, and In

For the analyst using SAS, the business analytic tools work seamlessly and produce results. How this is done is something only the enterprise architects need to track. Architects see a design in which SAS can connect to Hadoop in three key ways:

From: SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes results back as required. SAS can move the right data from any source, including Hadoop, and for some analytical workloads this is the right approach: run a query and move the data to a SAS analytic tool.

With: SAS accesses and processes Hadoop data on SAS servers while keeping the data and computations massively parallel. In this more powerful mode, SAS works in conjunction with a Hadoop cluster. Some analytical tasks are performed by SAS and others are farmed out to the cluster, and results are presented dynamically so that analysts can iterate on and analyze them.

In: SAS processes data directly in the Hadoop cluster. SAS's embedded process agents, combined with Hadoop's own distributed data framework, make this even more powerful combination possible. This approach presents information to analysts quickly and enables fast iteration over results that take into account all of an organization's data holdings.

The Benefits of SAS and Hadoop Together

SAS support for big data implementations and Hadoop centers on one goal: helping the analyst know more, faster, so better decisions can be made in a more timely manner.
The engineering to achieve this goal has resulted in a SAS and Hadoop architecture that:

- Allows queries in the SAS business analytic tools to run faster than they would in Hadoop alone.
- Improves the performance of Hadoop to the point where queries are fast enough for analysts to iterate on their questions rapidly. Analysts see incredible speed from their SAS business analytic tools.
- Allows analytics on very large data sets in enterprise settings that other vendors just can't handle.
- Combines SAS predictive analytics, forecasting and data visualization capabilities with the power and large-data capabilities of Apache Hadoop, making SAS analytical procedures and applications even more powerful.
- Enables target identification, fraud detection, and other data-intensive analysis to run faster on far larger volumes of data, using the same user-friendly business analytic tools analysts already rely on.
- Makes direct contributions to operational decisions by using machine learning to process data in new ways.

All the power of Hadoop, and more, is brought to analysts through tools designed specifically for them. While SAS allows for coding, Java developers are not needed to write queries, and analysts do not need to write MapReduce jobs.
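The From and In patterns described earlier differ mainly in how much data has to cross the network. That same idea underlies the MapReduce jobs analysts no longer need to write. The following is a minimal Python sketch under stated assumptions. It is not SAS's embedded process and it is not a Hadoop API. An in-memory list of lists stands in for data spread across cluster nodes, and the names `mean_from`, `mean_in` and `summarize` are hypothetical. Both functions compute the same mean. The second value each one returns counts how many items had to be moved to the central server.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from statistics import mean

# Hypothetical stand-in for a cluster: each inner list holds the rows
# stored on one data node. Real SAS/Hadoop integration works differently.
Partition = list[float]


@dataclass
class Partial:
    """Per-node summary, small enough to ship over the network."""
    total: float
    count: int


def mean_from(partitions: list[Partition]) -> tuple[float, int]:
    """'From' pattern: copy every row to a central server, then compute there.

    Returns the mean and the number of rows that had to be moved.
    """
    rows = [x for part in partitions for x in part]  # every row crosses the network
    return mean(rows), len(rows)


def summarize(part: Partition) -> Partial:
    """Work that runs next to the data, as an in-cluster agent would."""
    return Partial(total=sum(part), count=len(part))


def mean_in(partitions: list[Partition]) -> tuple[float, int]:
    """'In' pattern: summarize each partition where it lives, then combine.

    Returns the mean and the number of partial summaries that had to be moved.
    """
    # Threads simulate the nodes working on their own partitions in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(summarize, partitions))
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count, len(partials)


if __name__ == "__main__":
    cluster = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    print(mean_from(cluster))  # (3.5, 6): same answer, six rows moved
    print(mean_in(cluster))    # (3.5, 3): same answer, three summaries moved
```

With six rows the saving is trivial. In this sketch, however, the number of rows moved grows with the size of the data, while the number of partial summaries grows only with the number of nodes. The With pattern falls between these two and is not modeled here.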

The SAS and Hadoop Ecosystem

Figure 1. SAS helps users manage data on Hadoop through an intuitive user interface, so it's easy to perform self-service data preparation tasks with minimal training.

The most critical part of the diagram in Figure 1 is the user interface. SAS business analytic tools are used not only because they are powerful but also because they are focused on the needs of humans, and that remains true in a combined SAS and Hadoop architecture. Architects will also appreciate the interoperability and functionality the diagram implies.

Design Considerations for Your SAS and Hadoop Project

We interviewed Doug Liming and John McCue, two of SAS's leading big data engineers, to gather insights that can help architects optimize their SAS and Hadoop implementations. The result of these interviews is a succinct list of the top principles for SAS and Hadoop project success. Our recommendations for planners include:

- Architect so humans do what humans do best and computers do what computers do best. Organizations are optimized for analysis when they design systems that empower their analysts to do what they do best and leverage IT to do what it does best. Analysts leverage the greatest processor on earth: their brains. They are paid to think and to generate knowledge that supports their organization's mission. Humans develop insights and inferences and produce actionable intelligence for decision makers to act upon. Humans are great at pattern recognition and sensemaking, up to a point. Even the most trained and experienced analyst can process only a limited number of objects at any one time, and once analysts pass that threshold, their processing ability degrades rapidly. Using SAS as the business analytics platform gives analysts very capable ways to access and interact with all the data in the enterprise, in a way that leverages the strengths of the human mind.
- Understand and focus on current use cases. The mission of your organization is key, and that is what your business analytic tools and your overall data architecture should support. Ensure this through dialog about well-thought-out, well-staffed use cases. This will help planners identify and clarify the most important objectives and design goals for the project. Determining the prioritized data flows for the first use cases will help ensure demonstrable success early in the project's lifecycle.
- Ensure the design focuses on outputs. Identify the analytical queries and algorithms required to generate the desired outputs. This captures the advanced analytics requirements and interactive query needs that the system must meet.
- Plan for future expansion of use cases. First successes will be measured by how well they meet current agency needs, but the power of a well-engineered SAS and Hadoop solution is that it can support many new use cases and future workloads. The key action in planning for expansion is to listen to the challenges mission owners face and to be prepared to iteratively incorporate new workloads and new data flows into the solution, informed by lessons learned.
- Consider the full design. Treat compute, networking, data storage and the software framework together as the data platform. SAS is the business analytics component, and Hadoop is the data framework. Optimizing them should include attention to communications and storage that perform to your expectations.
- Ask for design help. Repeatable patterns from other enterprises are available for reference. Engineers from SAS and its partners can help refine a functional reference architecture and turn it into a technical design that will rapidly bring new functionality to the agency mission.

Kick-Starting Your SAS and Hadoop Implementation

Ready to move out? Here are four steps to consider:

1) Evaluate your enterprise against the recommended criteria above, and use that evaluation to build your plan.
2) Enlist the aid of your analyst community to prioritize the analytical capabilities to deliver.
3) After prioritizing the analytical capabilities your mission requires, address the enterprise technology gaps that must be closed to enhance support to the mission.
4) Track improvements to your enterprise like a project: watch cost, schedule and performance, and use those metrics to drive toward goals.

Concluding Thoughts

Government organizations recognize the importance of big data, and they understand the value that analytics can bring by extracting insights from it. As the cost of storage has decreased, Hadoop has become an affordable means to accommodate these voluminous collections of data and to enable a level of analytic capability never before possible. The combination of SAS analytics and SAS data management tools for Hadoop brings analytics to a higher level of scalability and performance, and it overcomes many of the obstacles that prevent government organizations from extracting real, timely value from their data. SAS also reduces the burden on IT by allowing users to be more self-sufficient, and it offers tools that let users with minimal data skills access and prepare their own data for their own analysis.

SAS is flexible in working with any hardware or database vendor and integrates easily with legacy and new technologies across the government enterprise today, including Hadoop and data warehouse capabilities. The success of your big data analytics project will depend on the value it brings to the organization. Together, SAS and Hadoop can unlock value you are not realizing today. This is the driving reason to consider SAS and Hadoop together for your enterprise data mission needs.

For more information on SAS and Hadoop, visit: http://sas.com/hadoopvision

More Reading

For more on federal technology and policy issues, visit:
- CTOvision.com: a blog for enterprise technologists with a special focus on Big Data.
- CTOlabs.com: a reference for research and reporting on all IT issues.
- J.mp/ctonews: sign up for government technology newsletters, including the Government Big Data Weekly.

About the Authors

Bob Flores is a co-founder and partner at Cognitio. Bob spent 31 years at the Central Intelligence Agency, where he held various positions in the Directorate of Intelligence, the Directorate of Support, and the National Clandestine Service. He was the agency's Chief Technology Officer. Bob serves on numerous government and industry advisory boards.

Bob Gourley is a co-founder of Cognitio and editor-in-chief of CTOvision.com. He is a former federal CTO. His career included service in operational intelligence centers around the globe, where his focus was operational all-source intelligence analysis. He was the first director of intelligence at DoD's Joint Task Force for Computer Network Defense, served as director of technology for a division of Northrop Grumman, and spent three years as the CTO of the Defense Intelligence Agency. Bob serves on numerous government and industry advisory boards.

Roger Hockenberry is a co-founder, partner and CEO of Cognitio. He had a two-decade career in industry, first as a technology consultant and later as a management consultant and Managing Partner at Gartner. Roger then spent four years in government service in the intelligence community, where he was charged with driving the realization of the vision he had helped craft as a consultant.

For More Information

If you have questions or would like to discuss this report, please contact me. As an advocate for better IT use in enterprises, I am committed to keeping this dialogue open on technologies, processes and best practices that will keep us all continually improving our capabilities and our ability to support organizational missions.

Contact: Bob Gourley, bob.gourley@cognitiocorp.com, CTOlabs.com