Bringing Big Data to Life: Overcoming The Challenges of Legacy Data in Hadoop


No discussion of big data is complete without addressing mainframe data. According to IBM, about 80 percent of all the transactional data in the world is stored on mainframes. This transactional data is a gold mine of reference data that can be used to make sense of enterprise-wide data and drive your big data analytics. How big of a gold mine? Here's how significant mainframes really are in the age of IoT and streaming data: roughly 80% of the world's data either originates on or is stored on mainframes; the IBM z13 system can process up to 2.5 billion transactions per day; and 71% of Fortune 500 companies have mainframes. According to our recent survey of more than 250 IT decision-makers, accessing mainframe data in Hadoop is increasing in importance, with over 70% of respondents stating that integrating mainframe data with Hadoop is valuable. However, getting data off the mainframe is, well, challenging. That is especially true if you need to get it off the mainframe yet keep the mainframe data format. In this ebook, we'll explore the challenges associated with integrating mainframe data into Hadoop while letting organizations work with mainframe data in Hadoop or Spark in its native format, and how to solve them.

Challenge: Big Data Governance - Bridging the Gap between the Mainframe and Apache Hadoop

New data sources are easily captured in modern enterprise data hubs, but businesses also need to reference customer or transaction history data to make sense of these newer sources. Sensor or mobile data streamed through Apache Kafka still needs to be enriched and integrated with the transaction history or customer reference data, which are often stored on mainframes and in legacy databases. This is a complex process, fraught with governance and compliance challenges. Some of the most promising data analytics insights and initiatives are taking place in highly regulated industries such as finance, healthcare and insurance. In order to use data such as personal health records or financial transactions for advanced analytics, enterprises must be able to access it in a secure way, maintain and archive a copy in its original mainframe file format, and track where the data has been. Security and lineage become critical for cross-platform data access. To address the data governance and lineage requirements, Hadoop distributors introduced metadata management solutions such as Cloudera Navigator and Apache Atlas.
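To make the enrichment step concrete, here is a minimal sketch of joining a Kafka sensor stream with reference data previously offloaded from the mainframe, using Spark Structured Streaming. The broker, topic, paths and column names are hypothetical, and a production pipeline would add schema management, watermarking and error handling.

```python
# Minimal enrichment sketch: join streaming sensor events with mainframe-derived
# reference data. All names and paths below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("sensor-enrichment").getOrCreate()

# Customer reference data already landed in the data lake (hypothetical path).
customers = spark.read.parquet("/data/reference/customers")

event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
])

# Sensor events arriving on a Kafka topic (hypothetical broker and topic).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "sensor-events")
          .load()
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Enrich each event with customer reference attributes before analytics.
enriched = events.join(customers, on="customer_id", how="left")

query = (enriched.writeStream
         .format("parquet")
         .option("path", "/data/enriched/sensor-events")
         .option("checkpointLocation", "/tmp/checkpoints/sensor-events")
         .start())
query.awaitTermination()
```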

Solution: Companies need a utility that allows them to easily access and integrate mainframe data into Hadoop without having to convert the data into a different format for storage or processing in Hadoop. By using Syncsort DMX-h, you can easily get end-to-end data lineage across platforms, accessing and processing mainframe data in Hadoop or Spark, on premise or in the cloud. DMX-h securely accesses mainframe data, even in its original EBCDIC format, and makes it available to be processed on the cluster like any other data source. Better still, it doesn't take specialized mainframe or Hadoop skills to use DMX-h for offloading data from the mainframe to Hadoop securely. It assures data lineage for governance purposes while delivering the lowest possible latency. You can populate your Hadoop data lake in just a few clicks. Data scientists do not need to worry about understanding mainframe data and can focus on business insights. Syncsort DMX-h can make data from hundreds of VSAM and sequential files, or from databases such as DB2/z and IMS, available in Hadoop. It can also map complex COBOL copybook metadata to the Hive metastore automatically. Alternatively, the data can be kept in its original mainframe record format, fixed or variable, for archival purposes or simply to leverage the cluster for scalable and cost-effective computing. This data can then be written back to the mainframe without format changes, meeting audit and compliance requirements. In essence, Syncsort DMX-h makes mainframe data distributable for Hadoop and Spark processing. Syncsort DMX-h also secures the entire process with certified Apache Sentry and Apache Ranger integration, native Kerberos and LDAP support, and secure connectivity. The delivery of this flexibility and these capabilities was driven by the use cases of our joint customers.
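To give a sense of what converting mainframe data by hand involves, here is a minimal sketch that decodes fixed-length EBCDIC records according to a simplified, hypothetical copybook layout. Real mainframe files add packed-decimal (COMP-3) fields, variable-length records and redefines, which is the complexity DMX-h is described as handling automatically.

```python
# Minimal sketch of manual EBCDIC conversion for a hypothetical fixed-length layout.
import codecs

RECORD_LENGTH = 40  # bytes per record in this hypothetical layout

# Field name -> (offset, length), as it might be declared in a COBOL copybook.
LAYOUT = {
    "account_id": (0, 10),
    "customer_name": (10, 25),
    "branch_code": (35, 5),
}

def decode_record(raw: bytes) -> dict:
    """Decode one fixed-length EBCDIC (code page 037) record into text fields."""
    record = {}
    for name, (offset, length) in LAYOUT.items():
        field = raw[offset:offset + length]
        record[name] = codecs.decode(field, "cp037").strip()
    return record

with open("accounts.dat", "rb") as f:  # hypothetical mainframe extract
    while chunk := f.read(RECORD_LENGTH):
        if len(chunk) == RECORD_LENGTH:
            print(decode_record(chunk))
```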

Challenge: How to Assure Your Mainframe Data Is Secure in Hadoop

Data security is one of the topmost concerns for businesses and IT departments today. Last year businesses experienced the second-highest number of verified and tracked data breaches since these statistics first began to be tracked in 2005: the Identity Theft Resource Center tracked some 781 breaches in 2015, which does not include an unknown number that were either never detected or never reported. Data security on the mainframe is famously good. That's one of the reasons the mainframe still carries the lion's share of the world's most sensitive transactions, such as credit card payments, and stores so much consumer data. On the other hand, Hadoop is all but essential for getting the kind of business and operational intelligence today's organizations need to survive and remain competitive. In the early days, Hadoop wasn't exactly known for its high level of security. But over time, developers have built enterprise-class security features and measures into the system. Now it's as potent for securing your data as it is for processing it and delivering valuable business insight and intelligence. When accessing data on the mainframe, the process needs to be secured from the point of access, through the offloading process, and in the Hadoop cluster as well. Now that Hadoop has security support from the likes of Kerberos and LDAP, plus Hadoop-specific solutions such as Apache Sentry and Apache Ranger, organizations can have total confidence that their data is secure from beginning to end. This helps businesses stay in compliance, as well as protecting them from a legal and PR quagmire.

Solution: With Syncsort, your data can be as safe and secure in Hadoop (and during the ingestion process) as it is on the mainframe. Syncsort's DMX-h takes care of your security and compliance worries with support for FTPS and Connect:Direct data transfers, and also features native support for both Kerberos and LDAP. It also integrates seamlessly with all of the popular security systems, such as Apache Sentry, as it handles processing within the Hadoop cluster. Many businesses operate in industries, such as finance, that require data to be copied in its original format. DMX-h makes this possible, and it is the easiest way to access and integrate mainframe data into Hadoop, because DMX-h data integration tasks work directly with mainframe data without having to convert it into a different format for storage or processing in Hadoop. DMX-h is the ideal solution for heavily regulated industries like banking, insurance, and healthcare, which have struggled in the past to leverage Hadoop and Spark cost-effectively. These industries must deal with massive mainframe data sets while keeping the original EBCDIC format, which Hadoop cannot process natively. DMX-h is the only software able to make this happen.
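For readers who want to see what an encrypted transfer off the mainframe can look like at the protocol level, here is a minimal FTPS sketch using Python's standard library. The host, credentials and dataset name are hypothetical; DMX-h is described as supporting FTPS and Connect:Direct transfers natively, so this only illustrates the mechanism, not the product.

```python
# Minimal FTPS pull of a mainframe dataset over an encrypted channel.
# Host, credentials and dataset name are hypothetical.
from ftplib import FTP_TLS

HOST = "mainframe.example.com"          # hypothetical z/OS FTP host
DATASET = "'PROD.CUSTOMER.MASTER'"      # hypothetical fully qualified dataset

ftps = FTP_TLS(HOST)
ftps.login(user="batchuser", passwd="********")
ftps.prot_p()  # switch the data channel to encrypted (protected) mode

with open("customer_master.ebcdic", "wb") as out:
    # Retrieve in binary mode so the EBCDIC bytes arrive unchanged.
    ftps.retrbinary(f"RETR {DATASET}", out.write)

ftps.quit()
```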

Challenge: Addressing Hadoop Connectivity Issues with the Mainframe

It has been problematic to integrate mainframe data into Hadoop because Hadoop has no native connectivity or processing capabilities for mainframe data. It can take a frustrating amount of time and effort to load database tables into Hadoop, primarily because developers must build individual loads for each and every table. Access to mainframe data is limited to short windows in which users have to extract extremely large quantities of data, and attempting to translate and unpack the data in transit takes too much time.
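As an illustration of the "individual loads for each and every table" problem, here is a minimal sketch of the hand-rolled, per-table scripting teams typically resort to without a purpose-built tool. The JDBC URL, credentials and table names are hypothetical.

```python
# Minimal sketch of hand-rolled per-table ingestion: one Sqoop import per table.
# Connection details and table list are hypothetical.
import subprocess

JDBC_URL = "jdbc:db2://db2host.example.com:50000/PRODDB"   # hypothetical
TABLES = ["CUSTOMER", "ACCOUNT", "TRANSACTION_HISTORY"]    # one entry per table

for table in TABLES:
    subprocess.run([
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "batchuser",
        "--password-file", "/user/batchuser/.db2.password",
        "--table", table,
        "--target-dir", f"/data/raw/db2/{table.lower()}",
        "--num-mappers", "4",
    ], check=True)
```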

Solution: Syncsort DMX-h solves this issue, allowing organizations to work with mainframe data in Hadoop or Spark in its native format, which is essential for maintaining data lineage and compliance. Since Syncsort contributed the open-source libraries for accessing the mainframe in both Apache Sqoop and Apache Spark, DMX-h extends these connectors to offer additional support for file types, data types, and COBOL copybooks. Additionally, with DMX-h Data Funnel, you can easily ingest hundreds of DB2 tables into Hadoop in one fell swoop; it allows you to extract and migrate entire database schemas in a single invocation. As one customer put it: "Syncsort's utility has been a powerful tool in our Data Lake strategy. We were able to ingest into Hadoop over 800 tables from one source system with one press of the button, all while leveraging our existing DMX-h install. Its configuration-based approach provides great flexibility from source to target." With Syncsort DMX-h, data can be copied from the mainframe to Hadoop very efficiently while keeping the mainframe formatting. After the data is in Hadoop, DMX-h takes advantage of the cluster's distributed resources to access and integrate the data natively, without staging a translated copy. Alternatively, if you need your mainframe data in an open format such as ASCII, Parquet or Avro, DMX-h can translate your data in-flight or on the cluster, avoiding a bottleneck on the edge node.
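For context on the open-source baseline referred to above, here is a minimal sketch of the Sqoop mainframe import path (the import-mainframe tool), invoked from Python. The host, dataset and credentials are hypothetical; this path transfers partitioned-dataset members as text over FTP, which is the gap the additional file-type, data-type and copybook support is meant to close.

```python
# Minimal sketch of the open-source Sqoop mainframe import path.
# Host, dataset and credential paths are hypothetical.
import subprocess

subprocess.run([
    "sqoop", "import-mainframe",
    "--connect", "mainframe.example.com",        # z/OS host (hypothetical)
    "--dataset", "PROD.SALES.EXTRACT",           # partitioned dataset (hypothetical)
    "--username", "batchuser",
    "--password-file", "/user/batchuser/.mf.password",
    "--target-dir", "/data/raw/mainframe/sales_extract",
    "--num-mappers", "4",
], check=True)
```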

Summary

The significance of mainframe data is ever more apparent in our daily lives. Every time you swipe your credit card, you are accessing a mainframe; every time you make a payment with your mobile phone, you are accessing a mainframe; and of course, your Social Security checks are generated from data on mainframes. Leaving these critical data assets outside of big data analytics platforms and excluding them from enterprise data lakes is a missed opportunity. Making these data assets available in the data lake for predictive and advanced analytics opens up new business opportunities and significantly increases business agility. Syncsort's DMX-h software allows you to quickly access mainframe data unchanged and work with it like any other data source, without the need for specialized skills in either Hadoop or the mainframe. By ingesting or loading the data via DMX-h, you can preserve data lineage for governance purposes while eliminating much of the latency often associated with these tasks. It takes just a few simple clicks.

About Syncsort

Syncsort is a provider of enterprise software and the global leader in Big Iron to Big Data solutions. As organizations worldwide invest in analytical platforms to power new insights, Syncsort's innovative and high-performance software harnesses valuable data assets while dramatically reducing the cost of mainframe and legacy systems. Thousands of customers in more than 85 countries, including 87 of the Fortune 100, have trusted Syncsort to move and transform mission-critical data and workloads for nearly 50 years. Now these enterprises look to Syncsort to unleash the power of their most valuable data for advanced analytics. Whether on premise or in the cloud, Syncsort's solutions allow customers to chart a path from Big Iron to Big Data. Experience Syncsort at www.syncsort.com and syncsort.com/liberate. 2017 Syncsort Incorporated. All rights reserved. All other company and product names used herein may be the trademarks of their respective companies. DMXH-EB-011817US