Governing Big Data and Hadoop

Similar documents
Big Data Management Best Practices for Data Lakes Philip Russom, Ph.D.

Modernizing Data Integration

Sr. Sergio Rodríguez de Guzmán CTO PUE

Common Customer Use Cases in FSI

Exploring the Benefits of the Modernized Data Warehouse Philip Russom

Copyright - Diyotta, Inc. - All Rights Reserved. Page 2

How Data Science is Changing the Way Companies Do Business Colin White

Information Server: 11.x Information Governance Catalog. Marc Haber Senior Offering Manager, Governance Catalog & Tools

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

Modernizing Data Integration

Bringing Big Data to Life: Overcoming The Challenges of Legacy Data in Hadoop

PERSPECTIVE. Monetize Data

Bringing the Power of SAS to Hadoop Title

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia

SAP Predictive Analytics Suite

Analytics empowering clients to see farther & go faster

EBOOK: Cloudwick Powering the Digital Enterprise

Simplifying the Process of Uploading and Extracting Data from Apache Hadoop

From Data Discovery to Adaptive Decision Making Barry Devlin

Cognos 8 Business Intelligence. Evi Pohan

Middleware Modernization: lay the foundation to your digital success

MapR Pentaho Business Solutions

Pentaho 8.0 Overview. Pedro Alves

Oracle Big Data Cloud Service

Accelerating Business Agility with Boomi

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

Free Online Training: Talend Open Studio for ESB

Data Analytics. Nagesh Madhwal Client Solutions Director, Consulting, Southeast Asia, Dell EMC

Hortonworks Powering the Future of Data

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

Oracle Big Data Discovery The Visual Face of Big Data

Architecture Overview for Data Analytics Deployments

Evolution to Revolution: Big Data 2.0

Going Big Data? You Need A Cloud Strategy

Boundaryless Information PERSPECTIVE

An Enterprise Architect s Guide to API Integration for ESB and SOA

Hybrid Data Management

Roundtable Study: Analytic and Use Cases

Integration Competency Center Deployment

Operational Hadoop and the Lambda Architecture for Streaming Data

Data Governance and Data Quality. Stewardship

Cask Data Application Platform (CDAP)

Pentaho 8.0 and Beyond. Matt Howard Pentaho Sr. Director of Product Management, Hitachi Vantara

Trusted by more than 150 CSPs worldwide.

4/26. Analytics Strategy

LEVERAGING DATA ANALYTICS TO GAIN COMPETITIVE ADVANTAGE IN YOUR INDUSTRY

Microsoft Azure Essentials

PERSPECTIVE. Microservices A New Application Paradigm. Abstract

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Cloud Based Analytics for SAP

: Boosting Business Returns with Faster and Smarter Data Lakes

The Internet of Things Platform

Increase Value and Reduce Total Cost of Ownership and Complexity with Oracle PaaS

Oracle Big Data Discovery Cloud Service

Turn Your Business Vision into Reality with Microsoft Dynamics SL

Real-Time Streaming: IMS to Apache Kafka and Hadoop

Boomi Basics: Going Beyond Integration with APIs, Data Management and Workflow Automation

Delivering Governed Self-Service BI across the Enterprise

Embracing SaaS: A Blueprint for IT Success

BEST PRACTICES REPORT Q Data Lakes. Purposes, Practices, Patterns, and Platforms. By Philip Russom. Co-sponsored by

MapR Streams A global pub-sub event streaming system for big data and IoT

Test Environment Management. Full Lifecycle Delivery and Support

L approccio Accenture alla migrazione SAP S/4HANA. SAP Forum Fieramilanocity 20 ottobre 2016

THE DATA WAREHOUSE EVOLVED: A FOUNDATION FOR ANALYTICAL EXCELLENCE

Market Systems Enhancement

Assessing Your BI Maturity. Wayne Eckerson Director, TDWI Research

ERP Consolidation & Migration 5 Critical Steps to SAP S/4HANA Central Finance

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Analytics With Hadoop. SAS and Cloudera Starter Services: Visual Analytics and Visual Statistics

WHITE PAPER SPLUNK SOFTWARE AS A SIEM

Practical Steps to Building a Big Data & Analytics Business

Safe Harbor Statement

Harnessing the Power of Big Data to Transform Your Business Anjul Bhambhri VP, Big Data, Information Management, IBM

ARCHITECTURES ADVANCED ANALYTICS & IOT. Presented by: Orion Gebremedhin. Marc Lobree. Director of Technology, Data & Analytics

Leveraging Oracle Big Data Discovery to Master CERN s Data. Manuel Martín Márquez Oracle Business Analytics Innovation 12 October- Stockholm, Sweden

Master Data Management (MDM) and Data Quality with SAP Information Steward Kevin Hutto Consultancy by Kingfisher

Cloud Data Integration and Data Quality: Extending the Informatica Platform to the Cloud

A Guide for Application Providers: Choosing the Right Integration Partner

SAS Decision Manager

Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics

TDWI Self-Service Analytics Maturity Model Guide

Big Data The Big Story

Transforming your Business with SAP HANA Enterprise Cloud

InfoSphere Warehouse. Flexible. Reliable. Simple. IBM Software Group

Architected Blended Big Data With Pentaho. A Solution Brief

2018 Big Data Trends: Liberate, Integrate & Trust

E-Guide THE EVOLUTION OF IOT ANALYTICS AND BIG DATA

1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Taking an Intelligent Approach to Revenue Growth Management. Matthew Campbell, Managing Director Accenture CG&S Digital Customer Transformation

Data Lake or Data Swamp?

Symantec ediscovery Platform, powered by Clearwell

Business Intelligence & Analytics. 6 High Impact Business Intelligence Trends for 2016

Deloitte School of Analytics. Demystifying Data Science: Leveraging this phenomenon to drive your organisation forward

Bob Picciano Senior Vice President IBM ANALYTICS International Business Machines Corporation

Ventana Research Big Data and Information Management Research in 2017

Digital Transformation

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB

Oracle s Integration Strategy

Joerg Wiederspohn, Senior Development Architect, Financial Services, SAP AG Karin Fischenbeck (BIAN)

Transcription:

Governing Big Data and Hadoop Philip Russom Senior Research Director for Data Management, TDWI October 11, 2016

Sponsor 2

Speakers Philip Russom Senior Research Director for Data Management, TDWI Jean-Michel Franco Director of Product Marketing, Talend 3

Agenda Background Evolving Data, Management, Business Big Data, Hadoop, New Data Data Governance (DG) High-Priority Tasks for DG w/big Data 1. Self-service access to big data 2. Data exploration and discovery 3. Metadata for governed big data 4. Simplify DG with integrated tool set 5. Use Hadoop, but govern it carefully 6. DI/DQ Infrastructure to migrate and govern big data Concluding Summary PLEASE TWEET @prussom, #TDWI, @Talend, #BigData, #Hadoop

New Checklist Report from TDWI on Governing Big Data and Hadoop The report discusses challenges and solutions for governing Hadoop, big data, and other forms of new data. In this webinar, we ll discuss some of the report s findings. Stay tuned, to learn how to get a free copy of the whole report.

BACKGROUND Ongoing Changes in Data & its Management Data is evolving Data management is evolving Business use & leverage of data is evolving Yet, data must still be governed

WHAT S THE REAL POINT? New Data for New Analytic Insights Enable analytics that are new to you Logistics optimization; Sentiment analysis; Realtime business monitoring Patient outcomes in healthcare; Predictive maintenance in manufacturing; Precision farming in agriculture Sensor data and other machine data; Streaming data; Human language text Breathe new life into older analytics Extend existing customer views with additional insights drawn from new big data Enlarge data samples of existing analytic applications for fraud, risk, customer segmentation, products of affinity Modernize data warehouses, reports, analyses

DEFINING TWO SIDES OF Data Governance Business Compliance for Regulations, Data privacy, and Security Reducing risks by creating policies for data s access and use Identifying sensitive data and guarding it accordingly Certifying new big data, so it complies, like other datasets do Technical Standards for Data and Data Management Solutions Data quality metrics, for both traditional and new big data Preferred data modeling techniques Preferred interfaces and methods for integration Development standards for solution design and dev methods Peer review, quality assurance, deployment for DM solutions

NUMBER ONE Give people Self-Service Access to big data stores, but controlled via data governance. Biz & tech people want to work with new data Business people know that new big data can improve both analytics and operations Technology staff need to give a wide range of users access to the new data Many user types want self-service data access Including data scientists and data analysts, plus some business analysts and managers They want to work autonomously and quickly, without waiting for others to create datasets To govern self-service access, tools need: Security, data protection, auditing via operational metadata, analytic sandboxing for sensitive data

NUMBER TWO Integrate new data to enable broad but governed Data Exploration and Discovery. New big data must be profiled before using Technical structure, quality, data types Business value and compliance issues The point of data exploration is discovery Previously unknown facts about partners, products, customers, operations, financials, logistics New analytic insights from new data Data study and exploration must be governed Existing DG policies will apply to most new big data Some policies may need to be extended, revised Some new big data is inherently sensitive RE: customers, financials, health, employees, legal

NUMBER THREE Develop Metadata for Big Data, to get the fullest use and governance from it. Some new big data lacks obvious metadata You cannot access source & extract it Data has structure, but not described anywhere Files may lack header describing schema New tools can automatically infer metadata You need all three forms of metadata Technical automated software access Business biz-friendly for self-service data access Operational records access events for DG Metadata helps automate data governance Inventory of data to be governed Operational metadata tracks lineage & access

NUMBER FOUR Select a platform of Integrated Data Mgt Tools for simplified governance via modern solution designs. Consider an integrated tool platform Tools for integration, quality, profiling, metadata, event processing, master data, federation New stuff, too: self-service data access, data prep, ad hoc exploration, big data formats & platforms All integrated for interoperability in development and during production runs An integrated tool platform supports DG goals: One tool (instead of several) is easier to govern Consistent dev standards and data quality Metadata, shared broadly for complete views Modern solution designs: in a single auditable data flow, multiple tool types are integrated

NUMBER FIVE Consider Hadoop for persisting and processing new big data, but beware its governance challenges. Hadoop supports desirable use cases Data landing and staging area for many new big data types and data feed speeds Processing for big data, DI, and analytics Linear scalability, but at reasonable cost Offload your DI, DW, hub, etc. But Hadoop challenges data governance Data replication Metadata management Security Data store organization

NUMBER SIX Provide infrastructure and DG to Migrate Big Data across complex Data Ecosystems. Multi-platform data architectures are new norm. Most include Hadoop. Data warehouse environments Multi-channel marketing Global supply chains Web, social, mobile, etc. Requires substantial data integration and quality infrastructure for: Data movement and migration among platforms and ecosystems Broad connectivity and access Data lineage, audit, cataloging Guided migration, via metadata bridges and interfaces, for automation, standards, and governance for migrations common in today s ecosystems

CONCLUDING SUMMARY Governing Big Data and Hadoop 1. Self-service access to big data 2. Data exploration and discovery 3. Metadata for governed big data 4. Simplify DG with integrated tool set 5. Use Hadoop, but govern it carefully 6. DI/DQ Infrastructure to migrate and govern big data

New Checklist Report from TDWI on Governing Big Data and Hadoop The report discusses challenges and solutions for governing Hadoop, big data, and other forms of new data. Download the free report in a PDF file at: www.tdwi.org/checklists

Talend enables the data-driven enterprise 1 st to deliver a DI and AI platform 1 st ETL open source tool Data Integration Data Quality 1 st unified platform for data prep & integration 1 st to market for Spark 1 st on YARN & Hadoop 2.0 Master Data Management Application Integration Big Data Hadoop 2.0 Spark & Cloud Data Preparation Key Facts NASDAQ: TLND (2016) $100M Revenue (Projected) Leader in Gartner MQ for DI 600+ employees worldwide 16 countries 1300+ customers 2M+ open source downloads (Revenue Growth) Presentation Title Date Copyright 2016 Capgemini and Sogeti. All rights reserved. 17

Data Fabric A Modern Big Data and Cloud Integration Platform Presentation Title Date Copyright 2016 Capgemini and Sogeti. All rights reserved. 18

Product Investment Themes Big Data Hadoop becomes the dominant data processing platform Cloud Cloud overtakes on-premises investments Self-service Self-service will be a top IT priority for the next decade Data Governance is Critical to All Areas Presentation Title Date Copyright 2016 Capgemini and Sogeti. All rights reserved. 19

The five pillars for managing Metadata with Talend Deliver Metadata By Design Metadata-driven, with zero coding Single point of control for Metadata Dependency checks and propagation Manage and monitor information chains beyond Hadoop Risk and compliance with data transparency Increase agility across the data landscape Turn data into a business language Democratize Data inventory for self-service the Data access Lake Auto-discovery with with superior smart guidance Facet based search accessibility Talend Metadata Manager Self-Service & Collaborative Governance Talend Studio Talend Big Data With Cloudera Navigator Talend Metadata Bridge (*) Apache Atlas integration planned in Talend Winter 2017 Synchronize Metadata accross data platforms Design once, propagate across any tools Apply change and enforce standards Accelerate migrations Data Lineage with depth on Hadoop Tackle Hadoop Governance Challenges Track, classify, and locate data & data flows Data Governance on Hadoop Presentation Title Date Copyright 2016 Capgemini and Sogeti. All rights reserved. 20

Redefining patient experiences Health assistants provide guidance and personalized services to their customers based on individual needs, including data on their lifestyle and health condition, and these are optimized to orient patients to the right care. 75% reduction of efforts and time needed to onboard a new customer Presentation Title Date Copyright 2016 Capgemini and Sogeti. All rights reserved. 21

Questions? 22

Thank You to Our Sponsor 23

Contacting Speakers If you have further questions or comments: Philip Russom prussom@tdwi.org Jean-Michel Franco, Talend jfranco@talend.com