Boosting Business Returns with Faster and Smarter Data Lakes


Empower data quality, security, governance and transformation with proven template-driven approaches

By Matt Hutton, Director R&D, Think Big, A Teradata Company

Executive Summary

Over the past several years, organizations building Hadoop-enabled data lakes have often struggled with the challenges that come with custom engineering open source solutions. Most organizations in this position are willing to absorb the pain of early adoption for the promise of big data insights. However, when data is lodged in silos, locked down by security and governance restrictions or bogged down by software release cycles, big data programs cannot deliver business value. To realize the value in organizational data, organizations must align forces and choose the right data lake approach to support business strategy and growth.

That's why in February 2017 Think Big introduced Kylo, a next generation open source data lake management software platform. Kylo has been built on years of global expertise involving over 150 successful engagements in leading financial services, retail, telecoms, manufacturing and insurance companies, to name a few. Kylo provides automated data pipelines in weeks rather than months, incorporating data quality, security, governance, operational monitoring and alerting, data preparation and profiling as well as user data access.

"At Discover Financial Services, we are focused on leveraging leading-edge technology that helps us quickly bring products to market while providing exceptional customer service. Kylo has a unique framework that has the potential to accelerate development and value on new data sources that leverage Apache NiFi," said Ka Tang, Director, Enterprise Data Architecture, Discover. He adds: "Kylo provides an opportunity to leverage open source innovations while allowing the opportunity to give back to the open source community."

Kylo is dedicated open source data lake software with a difference: it can deliver a scalable data lake at a fraction of the multi-million-dollar cost of custom engineered data lake projects while making high-quality data quickly and easily available to business as well as non-technical users.

Kylo: 7 Key Takeaways

1. Cost Reduction - Kylo can help your organization build custom engineered data lakes from the outset at a fraction of the typical multi-million-dollar cost.
2. Business Value - Improves the relevance and application of use cases selected by your business users.
3. Speed to Market - Enable direct user access through a UI and template framework, plus the ability to see and understand any data pipeline lifecycle by using 100% open source instead of proprietary technology.
4. Increased Efficiency - Empower self-service data accessibility, freeing your data lake engineering teams and data scientists from data source administration and allowing them to focus on building complex transformations and business value.
5. Improved Quality and Security - Improve data quality and security, enabling your teams to meet defined SLAs.
6. Scalability - Discover how Kylo works with governed feeds to help your business move faster, and scale up using big data more efficiently than ever before.
7. Extensibility - Enable data access across your organization. Through the open source community and the ability to add other Apache projects, Kylo develops along with the evolving market. Because templates can be upgraded easily, Kylo helps to cut development time considerably.

In this whitepaper, we discuss not only the key benefits of Kylo but also the potential it offers organizations looking to optimize business outcomes while reducing build costs significantly. We present real-life success stories to illustrate how Kylo has helped organizations accelerate speed to market and business value while reducing cost and risk.

The Power of Kylo: Overview and Benefits

Kylo is an extensible data lake management software platform designed to accelerate big data programs, enabling users across all departments to access data easily. Kylo is a 100% open source solution, freeing your organization from costly vendor lock-in and allowing for continued innovation and functionality extensions as provided by an open source community model.

                  Traditional Approach   With Kylo
Data Ingest       Weeks                  Minutes
Validation Rules  Weeks                  Minutes
Data Profiling    Weeks                  Automatic
Data Discovery    Months                 Seconds

Figure 1. Traditional vs. Kylo Approach

The Five-Tiered Data Swamp Challenge: How Kylo Can Help

Having worked on more than 150 projects over the past several years, Think Big comes across five common scenarios time and time again. In our experience, data lakes can be:

1. Expensive and Labor Intensive - Building a custom engineered data lake and closing gaps using low-level platforms within the Hadoop ecosystem is a multi-year effort. Kylo builds on Hadoop, but fills the gaps to enable a fully functional data lake, often in just a multi-week engagement.
2. Lacking Features to Excite and Enable Business Users - When building data lakes bespoke to enterprise needs, IT departments often spend extensive budget on a huge team that takes months or years to build a data lake and ingest data without really consulting the business about use cases. Kylo helps to move away from that because enterprises can quickly get to building the features that excite the business and solve real problems.
3. Lacking Governance and Security - Companies often don't have the experience, capability or skills to fully enable the governance and security they need to safely and productively maintain a data lake. Kylo helps to make security and governance a key part of the creation process, whilst leveraging integration with open source options such as Ranger and Sentry.
4. Devolving Data over Time without Data Quality Control - Data in data lakes that have been custom engineered tends to devolve over time. Automated data refresh and quality assurance are key. Using Kylo, SLAs can also be tied to data confidence, with alerts connected to any downgrade in data quality.
5. Lacking Self-Service Capabilities and Suffering Slow Release Cycles - Unfortunately, building a full self-service capability for non-technical users of data lakes is extremely costly and difficult because of the amount of engineering around the user interface that has to be built in order to make it easy to use. Kylo's self-service UI bypasses this problem, making it easy for non-technical users to get to the data quickly. No coding required.

Think Big has developed Kylo to help make data departments quickly available to your business, ensuring users can easily query, wrangle and analyse data using its intuitive graphical user interface and dashboard. Kylo's operations management monitors the health of data feeds, measuring quality, performance and even service levels, ultimately delivering data confidence to users. It therefore frees your data lake engineering team to work on business-critical, value-add projects. As a result, organizations can build, manage and govern disparate datasets faster and more efficiently, saving millions while accelerating speed to market.
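To make the idea in challenge 4 concrete, here is a minimal sketch of tying an SLA to a data-confidence score and raising an alert when quality drops. It is an illustration only and does not use or reproduce any Kylo API: the `FeedRun` type, the `data_confidence` and `sla_alerts` functions, the feed names, the row counts and the 98% threshold are all hypothetical, and "confidence" is assumed here to mean the share of rows that passed validation.

```python
from dataclasses import dataclass


@dataclass
class FeedRun:
    """Row counts from one run of a hypothetical feed (not a Kylo type)."""
    feed: str
    total_rows: int
    valid_rows: int


def data_confidence(run: FeedRun) -> float:
    """Share of rows that passed validation; 0.0 for an empty run."""
    if run.total_rows == 0:
        return 0.0
    return run.valid_rows / run.total_rows


def sla_alerts(runs: list[FeedRun], threshold: float) -> list[str]:
    """Return one alert message per run whose confidence is below the threshold."""
    alerts = []
    for run in runs:
        score = data_confidence(run)
        if score < threshold:
            alerts.append(
                f"{run.feed}: confidence {score:.1%} below SLA {threshold:.1%}"
            )
    return alerts


# Hypothetical runs: the second feed has too many invalid rows.
runs = [
    FeedRun("clickstream", total_rows=1000, valid_rows=995),
    FeedRun("call_center_logs", total_rows=200, valid_rows=150),
]
alerts = sla_alerts(runs, threshold=0.98)
for message in alerts:
    print(message)
```

In this sketch only the second feed falls below the assumed threshold, so it is the only one that triggers an alert. A real deployment would take the counts from the platform's validation step rather than from hard-coded values.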

Kylo: Driving Business Value and Growth through Innovation

Before

+ Slow Deployment - Poorly managed storage and processing of data slow the delivery cycles and time to market.
+ Lacking Compliance and Security - Convoluted processes and unsafe environments result in insufficient data quality, security and governance.
+ Ineffective Solution - A fragmented, expensive, non-scalable system that is hard to manage.
+ Limited Functionality - Based on available, commercial solution roadmaps.

After

+ Speed to Market - Flexible, smarter approach accelerating your big data program and helping you to stay ahead of the competition.
+ Improved Quality, Security and Governance - Kylo empowers the best-suited tools and boosts quality, security and governance to meet SLAs.
+ Cost Reduction - With Kylo you can build your data lake at a fraction of the typical cost.
+ Extensible Functionality - Flexible platform that can be extended through integration with existing processors and templates.

Three Steps to Generate Business Value in Weeks - The Kylo Journey

The Kylo journey is a three-step process that addresses the key stages of the data lake engineering opportunity.

Figure 2. Kylo - The 3-Step Journey (diagram: inputs and sources flow through the Kylo platform's Ingest, Prepare and Discover stages to tools and users, running on Hadoop and Spark, with security, metadata, lineage and operations throughout)

1. Ingest - There are many tools that ingest batch data, but few that work to ingest streaming or real-time data. Kylo supports a mixture of both. In fact, you can have data sources that come from a streamed source into Kylo before going to a batched source, and then back to streaming.

2. Prepare - Kylo helps companies wrangle data sources, pulling apart and understanding their data better. Kylo's Ingest and Prepare stages feature UI-guided feed creation, as shown in Figure 3. Users can take full advantage of Kylo's unique UI, as well as its data protection, cleanse, validation and profiling functionality to build data sources. Kylo makes building data sources for business users easier than ever before. Once data is ingested, Kylo features a visual SQL builder and a spreadsheet-like interface, exposing 180+ transformation features to help data analysts wrangle data before they publish and schedule their own feeds. From a business perspective, this means that big data programs will run faster and more smoothly, and that business end users will be able to find and model data to suit their needs easily and quickly.

3. Discover - Once your data has been ingested and cleansed, analysts and data scientists can begin to search and find what data is available to them. Kylo makes this data discovery quick and simple, with Google-like search functionality against both data and metadata, and allows users to build queries that access the data and create data products that support analysis. Users can quickly scan the schema catalogue for relevant resources. Kylo metadata entities can be enriched with useful and extensible business metadata.

Figure 3. Ingest & Prepare

Figure 4. Self-Service & Wrangle

Kylo: Real-Life Success Stories

Kylo has helped customers across all verticals to turn sinking data lake projects that have eaten development hours and budgets into tailored solutions that generate tangible results from day one. To give you a flavor of what can be achieved, this section features three success stories.

Financial Services: Kylo Improves Security and Fraud Detection while Driving Competitive Advantage

The VP of Analytics and Data Warehouse Architect of a large US credit card company had identified various big data challenges, including long lead times in introducing new sources to their Hadoop data lake, as well as pent-up demand from click stream data, call center logs and other third party data sources. The team selected Kylo to solve these issues and identified a set of strategic goals for their open source Hadoop project. The top Kylo benefits reported by the customer are:

+ New Data Access - As a by-product of introducing new data sets, the organization can now analyse, store and load data that was previously prohibitive (for example, more challenging data formats such as audio call centre recordings, plus data from social channels).
+ Embracing Innovation - The customer is more agile and has more time to innovate. The team is now free to focus on higher value work.
+ Reduced Costs and Quicker Time to Market - The customer team now finds operations management much easier with Kylo. Think Big built an operations console which might have taken the customer up to 6 months to build, saving a substantial amount of time and cost.

FMCG: Kylo Enabling Stock Optimization and Buying Pattern Analysis

For over two years, our experts have been working with a multinational beverage company to help it capitalize on its Hadoop investment and democratize data across the organization while applying advanced analytics. The top Kylo benefits reported by the customer are:

+ Strategic Opportunities - Kylo's emphasis on governance has allowed the customer to focus on the strategic nature of its big data program, including shelf optimization, buying patterns, loyalty card shopper data and even detailed insight into the potential impacts of the U.S. sugar tax.
+ Faster Delivery - The team is able to build pipelines faster than before. Kylo has lowered the bar for the skills required to build a pipeline for the database. Users no longer require Java backgrounds and can use Kylo's user-friendly interface to build the pipelines in days rather than weeks.
+ Improved Governance - Kylo has allowed the team to standardize data quality checks, while also offering the flexibility for custom coding. Standardization has enabled the business to govern and support its data much more easily, and the cost of supporting Kylo has proven much lower because of the technologies that are used.

Retail: Kylo Enables More Effective and Targeted Customer Marketing

For the past year, our team of data experts has been working with one of the largest retailers in the world to help it redirect the company's entire data strategy, and to use Kylo to create a single platform as one source of truth for all transactions happening online and in stores across the business. The top Kylo benefits reported by the customer are:

+ Data Consolidation - The customer has been able to consolidate data into one place where different parts of the business can query it, relying on the data they find.
+ Improved Data Quality - Because of the reduced administrative burden in ingesting and querying data, the customer has been able to focus on improving the quality of its data.
+ Improved Recommendation Engine - Working with Think Big, the customer now has the ability to use clickstream data from all their websites as well as point of sale transactions from the stores to enable multi-channel recommendation marketing online.

Following the initial Kylo engagement, Think Big is working with the client to continue to redefine this large business as a data organization through further consultancy and training around the analytics, processing and publishing of data.

Kylo: Building Your Team to Deliver Scalable Solutions and Tangible Outcomes

Effective analytic operations are about having cross-functional teams that collaborate to automate the lifecycle, from analytic data set generation and model scoring to testing and deployment.

How We Work in Collaboration with Clients

Working with the Think Big team, Kylo engagements are structured around a multi-week schedule focused on creating a production-capable data lake. During this time we will set up, install and configure Kylo in multiple environments, ingesting, cleansing, protecting and validating your data while negotiating a set of key data use cases. This covers your metadata capture, data security and the establishment of data confidence metrics. Think Big also provides a custom engineering service for Kylo data lakes that focuses on adding new bespoke templates, as well as system and security integration. In addition, we offer a further option to tailor the platform with add-ons designed around your needs to meet your unique data analytics lifecycle.

The Team: Who Needs to Get Involved in Your Organization

To maximize the returns from your organizational data, Think Big can help you build and train your team of experts, enabling them to focus on their areas of expertise within the analytics journey:

+ Data Analysts - Helping to streamline configuration, schema definition, scripts and transformations.
+ Data Stewards - Supporting data confidence, security, audit and data policies.
+ Data Scientists - Introducing user-friendly data wrangling, modelling, visualization and lineage capacity.
+ Operations - Helping to lead monitoring, controlling, scheduling and SLAs/alerts.

DEFINE             GOVERN            DISCOVER       MANAGE
Configuration      Data Confidence   Wrangle        Monitor
Schema Definition  Security          Model          Control
Scripts            Audit             Visualise      Scheduler
Transformations    Data Policies     Lineage        SLA/Alerts
DATA ANALYST       DATA STEWARD      DATA SCIENCE   OPERATIONS

Figure 5. The Roles of Kylo

Team Training and Support

With Kylo, Think Big is building advanced data lake capabilities and donating them back to the open source community. Companies can choose to use Kylo free of charge under the Apache licence, or they can purchase support agreements or a managed service to enhance the level of data lake capability and support they require. Full collateral support starts with building a roadmap and architecture and progresses through to building and engineering your data lake. Kylo operationalizes data science proof of values, as well as engineering use cases. We can follow this with mentoring, training, support and managed services if required.

Building Future and Growth with Full Lifecycle Velocity Services

Kylo Data Lake Foundation works as a key deliverable within your program to accelerate your time to value and to provide an easy-to-use, cost-effective and flexible approach. Kylo empowers modern data analytics by enabling data scientists to quickly catalogue, discover and qualify data for growth. Following your Kylo data lake engagement, our full lifecycle Velocity approach takes your company through the process of creating a big data strategy to achieve its technical and enterprise goals.

+ Strategy and Architecture - Prioritized implementation plan to maximize short and long-term returns.
+ Data Lake Solutions - Best practices and pre-built components to accelerate time to value.
+ Analytic Solutions - Building scalable big data infrastructures to meet your future needs.
+ Data Science - Boosting growth through innovation and modern analytics approaches.
+ Managed Services - 24/7 expert support empowering continuous improvement.
+ Training - A comprehensive selection of topical courses delivered by our certified experts, including 2 and 4 day training on Kylo and Apache NiFi.

Together, Kylo and Velocity Services are improving product and service speed to market while generating tangible returns.

Conclusion

The vision for data lakes has always been user-centric and straightforward, focusing completely on making data swiftly available to business users. However, the reality is that the data contained within most data lakes is not accessible to the average business user. Data lakes are failing to support the time-to-market requirements of innovation driven by big data analytics, and it's safe to say that in many companies, data lakes are widely perceived to be expensive and ineffective.

Using Kylo, Think Big is building data lakes based on several years of best practice and global expertise. What is embodied in Kylo is not just an arbitrary application - it is an open source management software platform strategically designed to solve the many challenges of building and maintaining data lakes that we've seen companies face for years. Think Big is working with firms worldwide to build custom engineered data lakes at a fraction of the typical multi-million-dollar cost. We create self-service data accessibility and free engineers and data scientists to focus on building complex big data transformations. Kylo helps companies that are starting out on their data lake journey to quickly get productive with a modern approach, as well as those looking to reboot failing data lake programs.

If you are considering a new data lake project, or if you would like to discuss an existing data lake that could benefit from optimization, get in touch today. Just email Kylo@.

The Teradata Velocity Services approach combines speed with the experience to proceed in the right direction. These services include Strategy & Architecture, Data Lake, Analytic Solutions, Data Science, Managed Services and Training to accelerate time to value and increase investment returns. Teradata has also developed innovative technology frameworks through our production work with enterprise customers to speed deployment of big data solutions. With Teradata Velocity, we deliver proven business outcomes in an expedient manner with minimal risk and maximum flexibility to support your ongoing business strategy.

Think Big, A Teradata Company
10705 South Jordan Gateway, Suite 100, South Jordan, Utah 84095