Executive Brief. 3 Keys to Self-Service Data Preparation

Similar documents
Top 3 Strategies for Modernizing Enterprise Data Management C L O U D A N A L Y T I C S D I G I T A L S E C U R I T Y

Application Migration to the Cloud C L O U D A N A L Y T I C S D I G I T A L S E C U R I T Y

Benefits of Application Migration to Azure CLOUD ANALYTICS DIGITAL INFRASTRUCTURE SECURITY

In search of the Holy Grail?

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

Cask Data Application Platform (CDAP) Extensions

How to Design a Successful Data Lake

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Enabling Self-Service BI Success: TimeXtender s Discovery Hub Bridges the Gap Between Business and IT

PERSPECTIVE. Monetize Data

Big Data for the Pharmaceutical Industry

Big Data Management Best Practices for Data Lakes Philip Russom, Ph.D.

Progressive Organization PERSPECTIVE

From Data Deluge to Intelligent Data

Advancing Information Management and Analysis with Entity Resolution. Whitepaper ADVANCING INFORMATION MANAGEMENT AND ANALYSIS WITH ENTITY RESOLUTION

Ways to Transform. Big Data Analytics into Big Value

: Boosting Business Returns with Faster and Smarter Data Lakes

Analytics in Action transforming the way we use and consume information

SELF-SERVICE FOR DATA PREPARATION AND ADVANCED ANALYTICS OVERVIEW

Analytics empowering clients to see farther & go faster

PORTFOLIO MANAGEMENT Thomas Zimmermann, Solutions Director, Software AG, May 03, 2017

Reimagining content value to deliver personalized experiences and drive growth.

Boundaryless Information PERSPECTIVE

Your Top 5 Reasons Why You Should Choose SAP Data Hub INTERNAL

4/26. Analytics Strategy

WHITE PAPER 5 SIGNS IT S TIME TO REPLACE YOUR LEGACY ERP

Executive Summary WHO SHOULD READ THIS PAPER?

Building a Single Source of Truth across the Enterprise An Integrated Solution

Hortonworks Connected Data Platforms

Database Migration to Azure CLOUD ANALYTICS DIGITAL INFRASTRUCTURE SECURITY

Adobe and Hadoop Integration

Value Stream Services

MODERNIZING THE MIDDLEWARE

Offering Overview. Managed Innovation

Oracle Big Data Discovery Cloud Service

ACCELERATING DIGITIZATION THROUGH NEXT-GENERATION INTEGRATION

Adobe and Hadoop Integration

Hadoop Stories. Tim Marston. Director, Regional Alliances Page 1. Hortonworks Inc All Rights Reserved

Oracle Product Hub Cloud

Cognitive Data Governance

Build Applications to Meet Individual Business Needs Quickly and Economically

Enterprise Data Marketplace: Democratizing Data within Organizations

Empowering insight-driven health care organizations with self-service analytics

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect

Data Cleansing - From Spreadsheets to Data Centres

SAS ANALYTICS AND OPEN SOURCE

Embark on Your Data Management Journey with Confidence

Cognizant BigFrame Fast, Secure Legacy Migration

SAP Cloud Platform Big Data Services EXTERNAL. SAP Cloud Platform Big Data Services From Data to Insight

Partnering with the business to create a successful self-service analytics framework

Why Machine Learning for Enterprise IT Operations

Going Big Data? You Need A Cloud Strategy

Information Server: 11.x Information Governance Catalog. Marc Haber Senior Offering Manager, Governance Catalog & Tools

The State of Cloud Analytics Research and analysis into enterprise cloud strategies and challenges in 2016 from EMA, Informatica, and Deloitte.

FUJITSU Transformational Application Managed Services

Responsive enterprise the future of the enterprise PERSPECTIVE

Integration Competency Center Deployment

Big Data Analytics with the PI System

Transform your life and annuities business to reduce expenses and promote business growth.

Transformation Enablement

Make Innovation Real with Unique, Leading-edge Software Solutions

Oracle Big Data Discovery The Visual Face of Big Data

Leverage Excel. Avoid the Pitfalls. How to embrace and extend Excel for Enterprise Planning

SUSiEtec The Application Ready IoT Framework. Create your path to digitalization while predictively addressing your business needs

ACCURACY DRIVES AGILITY IN FINANCIAL PLANNING AND ANALYSIS

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Analytics for All Your Data: Cloud Essentials. Pervasive Insight in the World of Cloud

Healthcare Data Management for Providers

Government Business Intelligence

Informatica Best Practice Guide for Salesforce Wave Integration: Building a Global View of Top Customers

Solution Brief. An Agile Approach to Feeding Cloud Data Warehouses

Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance

DIGITAL AGILITY. Four Data-Driven Strategies for Protecting Financial Services Revenues

The Internet of Things: Unlocking New Business Value. Let Oracle energize your business with IoT-enabled applications.

Collibra Catalog for Big Data Analytics Product Preview

Integrated Social and Enterprise Data = Enhanced Analytics

How a global consumer goods major transformed operations through a digital-first approach

Governing Big Data and Hadoop

Winning Your Customers by Creating a Smart and Connected Enterprises

AUTOMATING BIG DATA TESTING FOR PERFORMANCE

Aprimo Marketing Productivity

Microsoft Big Data. Solution Brief

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and

Supply Chain Innovation Fuels Success SAP ERP and Oracle Supply Chain Management: A Case for Coexistence. An Oracle White Paper


Front-to-back Architectural Re-design for a Global Universal bank

THE STATE OF CITIZEN DEVELOPMENT REPORT SEPTEMBER 2015

Access, Transform, and Connect Data with SAP Data Services Software

The IBM DataFirst Method Fueling your data-driven transformation

2 Business Processes and Forms with Office SharePoint Server 2007

The Future Moves Fast: Are You Ready to Respond?

Hortonworks Powering the Future of Data

SENPAI.

THE RISE OF THE DIGITAL CFO. Microsoft

GET MORE COMPETITIVE WITH SPEED AND AGILITY ACCENTURE LIFE INSURANCE & ANNUITY PLATFORM (ALIP) POLICY ADMINISTRATION

Fuel a New Digital Core with Complete and Accurate Data

CASE STUDY. Accelerate M&A with a Modern Information Management Platform

Cognizant Digital Media Services: One partner for all your content needs

EBOOK: Cloudwick Powering the Digital Enterprise

Business Insight and Big Data Maturity in 2014

Transcription:

Executive Brief 3 Keys to Self-Service Data Preparation CLOUD ANALYTICS DIGITAL SECURITY

3 Keys to Self-Service Data Preparation Liberate resources from time-consuming data chores to focus on high-impact data lake analytics The Black Hole of Data Preparation Suppose you spent 80% of your time having your car repaired, and just 20% of your time driving it? Or, if it took you six hours each workday to get your laptop up and running, leaving you just two hours to actually use it? That sort of time disparity is not uncommon with analytic data lakes. Data analysts and data scientists spend up to 80% of their time on essential data preparation working to generate accurate, complete and standardized data that s ready for analysis in a data lake. Alternatively, data analysts may offload data preparation to an already overworked IT team. But that introduces lengthy delays and additional costs. In either case, data preparation amounts to a black hole that consumes time and resources. It s a major obstacle for organizations that aim to use data lakes for data-driven insights that can transform customer experiences, supply chain efficiency, profitability analysis, fraud detection and more. To combat the problem, leading organizations are transitioning toward data preparation methodologies distinguished by self-service, collaboration and sound governance. The objective is to liberate resources from the time-consuming chore of data preparation to focus on value-add analytics, while improving the reliability and quality of data for analysis. Addressing the Growing Complexity of Data Preparation Data preparation comprises the nuts-and-bolts processes needed to get data ready for analytics in a data lake, typically based on the open source Apache Hadoop framework. The data preparation task is complicated by the fact that data lakes can include both structured and unstructured data from multiple sources. Those sources can range from conventional ERP, CRM, Excel and other business applications to mobile devices, social media, clickstream and server logs, and sensor-based equipment from the Internet of Things. According to Gartner, the challenges of larger and more diverse data have made data preparation one of the biggest roadblocks to pervasive and trusted modern analytics. 1 As data volumes and complexity grow, data analysts, data scientists and IT spend more time on such data preparation tasks as deduplicating records, addressing empty fields, incorporating metadata and converting currencies. They also need to standardize different treatments of dates, addresses and names, and transform data from multiple sources into a uniform format. Conventional manual approaches to data preparation through spreadsheets, custom coding and scripting are time-consuming and not scalable amid growing business demand for timely analytics. As it is, data analysts using traditional methods spend between four and six hours a day on data preparation, a survey by the analyst firm Blue Hill Research found. 2 Beside inordinate time and cost, manual data preparation can result in such problems as: Inability to derive analytic value from data Errors and inconsistencies resulting from manual work Poor accessibility and visibility into disparate data No trusted, authoritative set of data lake assets No reusability of manual, one-off data preparation efforts 1 Gartner, Market Guide for Self-Service Data Preparation, August 25, 2016. 2 Blue Hill Research, Quantifying the Case for Enhanced Data Preparation, February 2016. 2

Ultimately, organizations run the risk of turning data lakes into data swamps full of information that s incomplete, inaccurate and lacks context. Embracing Best-Practice Models and Technologies The barriers imposed by traditional approaches have given rise to a new strategic focus on more efficient and collaborative data preparation. Leading organizations are embracing best-practice data preparation models and a relatively new class of selfservice data preparation tools distinguished by straightforward ease of use in a cloud-based framework. As Gartner put it, The escalating challenges associated with larger and more diverse data are contributing to the demand for adaptive and easy-touse data preparation tools. Self-service data preparation addresses these challenges by accelerating the way users find, access, clean, combine, model and transform, and collaborate with data in an agile yet trusted way. Technology-driven self-service data preparation is increasingly in use at leading organizations, and delivering significant benefits. For instance: 25% of organizations now use selfservice data preparation tools 3 By 2020, more than 50% of new data integration efforts for analytics will utilize self-service data preparation tools 4 Business benefits increase 2x at organizations that provide agile, curated datasets for a range of content authors 5 79% of those using self-service data preparation tools accelerate their processes 6 Another key benefit is reducing the burden that data preparation imposes on IT. When data analysts are set up for self-service data preparation, IT can devote more time and resources to value-add initiatives. Organizations also eliminate the days or weeks that IT may require for data preparation that analysts were unable to perform themselves. 3 Key Focus Areas for Effective Data Preparation A robust data strategy goes hand in hand with purpose-built data preparation technology for organizations to eliminate the struggles of conventional data preparation and help maximize value from a data lake. A data strategy typically has two phases. The first is an evaluation and gap analysis of current internal and external data assets needed for a data lake. The second is planning and designing for a data lake that meets business objectives from a near-term, oneyear, three-year and longer term perspective. This effort should involve stakeholders in both business and IT in assessing the overall data landscape and refining goals for the data lake. It should be anchored by sound data governance and include evaluation of data management approaches, such as data catalogs, data lineage, data relationships, business glossaries and data protection. Organizations should baseline present-day data preparation efforts and quantify time investments by data analysts, data scientists and IT personnel. Identifying critical pain points and limitations is instrumental in mapping out a new approach, and tracking benefits accrued once the initiative is in place. 3 Blue Hill Research, Quantifying the Case for Enhanced Data Preparation, February 2016. 4 Gartner, Market Guide for Self-Service Data Preparation, August 25, 2016. 5 Gartner, Market Guide for Self-Service Data Preparation, August 25, 2016. 6 Blue Hill Research, Quantifying the Case for Enhanced Data Preparation, February 2016. 3

In both modeling data preparation processes and evaluating potential use of standalone data preparation technology, three key focus areas are: 1) Self-service 2) Collaboration 3) Governance Self-Service Equipping data analysts and scientists with selfservice capabilities can dramatically reduce the time spent on data preparation by both end-users and IT professionals. That frees resources to focus time and energy on analytics that impact business performance. Self-service data preparation is fast becoming a requirement for data lake analytics at speed and scale. Today s best data preparation tools provide an Excellike, browser-based interface that combines ease of use with robust functionality. Data analysts can explore, blend, enrich and standardize data. They can readily search and discover data, understand its provenance, and find related data. They can also identify data attributes and lineage, and correct errors, resolve duplicates, and validate names and addresses. Data may then be moved into a Hadoop-based data lake, or made accessible to business analytics tools like Qlik or Tableau. The result is more time for analytics, and information that is more complete, accurate and reliable. Collaboration Conventional data preparation often consists of ad hoc, tactical efforts by data analysts, data scientists and IT. These efforts are rarely well coordinated across the larger enterprise, and may rely on the email exchange of spreadsheets. The results can include duplication of data preparation efforts, and teams starting from scratch without the benefit of best practices and lessons learned by other practitioners. A collaborative environment for users to prepare, publish, iterate and share data should be a key element in self-service data preparation. All stakeholders in data preparation benefit with a community-oriented platform geared for knowledge and resource sharing. That can include common project workspaces that enable viewing and sharing of data assets, sources and data relationships across multiple domains and dimensions. Governance Data preparation governance is an essential focus area that rewards stakeholders with greater visibility, control and trust in data. Governance depends on establishing roles, rules and operationalized processes to streamline and continuously optimize data preparation, from start to finish. Governance in the context of data preparation should work in tandem with a broader governance framework in place for a data lake. Done right, data preparation governance surfaces and enriches the metadata of data assets for greater transparency. Data lineage will illustrate source, time of origin, revisions by whom and more. Governance will also reflect data usage and user activity, and will highlight gaps that should be addressed. In a governance framework, an information catalog can drive user adoption by enabling IT and business users to understand data lake contents and dive into data exploration. These insights are important to building confidence that data is accurate and ready for analytics. Transform Data Preparation with Trusted Guidance and Technology Trusted strategy advisors and implementation partners help organizations transform resourceintensive data preparation into a straightforward self -service discipline that provides a foundation for effective data lake analytics. With more than 2,000 client engagements over 15 years, Trianz partners with Informatica to provide proven best practices and market-leading technology to accelerate and enrich data preparation. Trianz helps organizations develop data strategy (capability assessment, roadmap planning, big data 4

and cloud readiness, operating model design) and implement data management capabilities that are foundational for self-service data preparation. The joint Trianz-Informatica solution combines Trianz s expertise in enterprise information management with Informatica Intelligent Data Lake, a self-service data prep tool, which supplies industry-leading capabilities to ingest, find, prepare and protect data for analysis inside an intelligent data lake. With Trianz and Informatica, your organization can eliminate the time and headaches of data preparation and speed your evolution into a datadriven business. About Informatica About Trianz Informatica is 100 percent focused on data because the world runs on data. Organizations need business solutions around data for the cloud, big data, real-time and streaming. Informatica is the world s No. 1 provider of data management solutions, in the cloud, on-premise or in a hybrid environment. More than 7,000 organizations around the world turn to Informatica for data solutions that power their businesses. For more information, visit www.informatica.com. Trianz enables digital transformations through effective strategies and excellence in execution. Collaborating with business and technology leaders, we help formulate and execute operational strategies to achieve intended business outcomes by bringing the best of consulting, technology experiences and execution models. Powered by knowledge, research, and perspectives, we serve Fortune 1000 and emerging organizations across industries and geographies to transform their business ecosystems and achieve superior performance by leveraging Cloud, Digital, Analytics and Security paradigms. As a professional services firm, our values and culture are focused on delivering measurable business impact, predictability in execution, and a unique partnership experience. Silicon Valley Washington DC Metro Jersey City Dubai Bengaluru Mumbai Delhi-NCR Chennai Hyderabad www.trianz.com info@trianz.com +1-408-387-5800 2017 Informatica LLC. All rights reserved. Informatica is a registered trademark of Informatica in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks. @ Copyright 2017, Trianz. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Trianz. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners. 5