Solution Brief. An Agile Approach to Feeding Cloud Data Warehouses

Similar documents
Architecting an Open Data Lake for the Enterprise

EBOOK: Cloudwick Powering the Digital Enterprise

Cask Data Application Platform (CDAP) Extensions

Datameer for Data Preparation: Empowering Your Business Analysts

EXECUTIVE BRIEF. Successful Data Warehouse Approaches to Meet Today s Analytics Demands. In this Paper

Who is Databricks? Today, hundreds of organizations around the world use Databricks to build and power their production Spark applications.

Building a Single Source of Truth across the Enterprise An Integrated Solution

Cognizant BigFrame Fast, Secure Legacy Migration

Digitalizing the customer journey

Embark on Your Data Management Journey with Confidence

Integrated Social and Enterprise Data = Enhanced Analytics

GUIDEBOOK MICROSOFT DYNAMICS CRM FOR GOVERNMENT

Delivering Data Warehousing as a Cloud Service

Building a Data Lake on AWS

Safe Harbor Statement

SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform

Microsoft Azure Essentials

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

Fueling Innovation in Energy Exploration and Production with High Performance Computing (HPC) on AWS

Managing explosion of data. Cloudera, Inc. All rights reserved.

Building a Data Lake on AWS EBOOK: BUILDING A DATA LAKE ON AWS 1

Spotlight Sessions. Nik Rouda. Director of Product Marketing Cloudera, Inc. All rights reserved. 1

SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform

SAP BusinessObjects Business Intelligence

How to Design a Successful Data Lake

How to Build Your Data Ecosystem with Tableau on AWS

Government Business Intelligence

#mstrworld. A Deep Dive Into Self-Service Data Discovery In MicroStrategy. Vijay Anand Gianthomas Tewksbury Volpe. #mstrworld

Fueling innovation in energy exploration and production with High Performance Computing (HPC) on AWS

Goodbye Starts & Stops... Hello. Goodbye Data Batches... Goodbye Complicated Workflow... Introducing

Building the Foundation for Digital Insurance. An IDC InfoBrief, sponsored by CSC and EMC September 2016

Pega Upstream Oil & Gas Capabilities Overview

The business owner s guide for replacing accounting software

Why an Open Architecture Is Vital to Security Operations

The ITSM Buyer s Guide 5 Must Have-Criteria in a Next Generation ITSM Solution. Presented By:

ENABLING DATA DRIVEN DECISIONS WITH REALTIME INTELLIGENCE

Infor M3 Cloud. Establish a foundation for digital transformation

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect

ApiOmat. Case Study. Challenge

Quick Base s Third Annual Report State of Business Apps 2017: The Future of Problem Solving Fall 2017

MicroStrategy Positioned in the Leaders Quadrant of 2013 Magic Quadrant for Business Intelligence and Analytics Platforms Report

In search of the Holy Grail?

Achieving a Continuous State of Compliance and Readiness on AWS Unlock GRC Innovation on Public Cloud

Ways to Transform. Big Data Analytics into Big Value

Progressive Organization PERSPECTIVE

Are You Ready For a New Era in B2B Collaboration?

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Data Integration for the Real-Time Enterprise

Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance

Intelligent Process Automation. The key to automating, orchestrating, and optimizing the modern workplace

The Future Moves Fast: Are You Ready to Respond?

TechArch Day Digital Decoupling. Oscar Renalias. Accenture

When the Status Quo Means Getting Left Behind. Accelerating Analytics Platform Adoption through Evolving Technology

Meta-Managed Data Exploration Framework and Architecture

Enabling Self-Service BI Success: TimeXtender s Discovery Hub Bridges the Gap Between Business and IT

MIGRATING AND MANAGING MICROSOFT WORKLOADS ON AWS WITH DATAPIPE DATAPIPE.COM

TECHNICAL WHITE PAPER. Rubrik and Microsoft Azure Technology Overview and How It Works

Business Intelligence: Build it or Buy it?

Digital Asset Management. Build Better Applications.

PERSPECTIVE. Monetize Data

Migration To the Cloud Using AWS

An Introduction to An Introduction to. BIRST Birst

What is Search-Powered Analytics?

: Boosting Business Returns with Faster and Smarter Data Lakes

Oracle Autonomous Data Warehouse Cloud

Make Business Intelligence Work on Big Data

NEXT-GENERATION DATA MANAGEMENT FOR PUBLIC SAFETY ORGANIZATIONS. Next-Generation Data Management for Public Safety Organizations

How do I simplify, accelerate and scale application testing in my Microsoft Azure development environments?

Core modernization driving digital transformation

Mainframe Development Study: The Benefits of Agile Mainframe Development Tools

Your Business. The Cloud. Business Cloud.

Service Management Automation: Solutionas-a-Service. Brochure. Professional Services

IBM Sterling B2B Integrator

Cognitive enterprise archive and retrieval

Application Migration Patterns for the Service Oriented Cloud

I D C M A R K E T S P O T L I G H T. S i l o s a n d Promote Business Ag i l i t y

7 Steps to Virtualized Data Preparation

INSIDE THIS ISSUE. Whitepaper

NetSuite Software Case Studies. Copyright 2017, Oracle and/or its affiliates. All rights reserved.

TOP 10 REASONS TO MOVE YOUR CONTACT CENTER TO

On-Premises, Consumption- Based Private Cloud Creates Opportunity for Enterprise Out- Tasking Buyers

HOW AGILE DO YOU WANT IT?

Analytics for All Your Data: Cloud Essentials. Pervasive Insight in the World of Cloud

JOURNEY TO AS A SERVICE

Grow. Merchandizing. Digital Experience. Benefits

efiscal Networks Case Study

IBM Planning Analytics Express

CORVINA CORE VALUE INSURANCE ADMINISTRATION. Start Your Vision

IBM Digital Analytics Accelerator

Oracle Autonomous Data Warehouse Cloud

THE MAGIC OF DATA INTEGRATION IN THE ENTERPRISE WITH TIPS AND TRICKS

Efficiently Develop Powerful Apps for An Intelligent Enterprise

Table of Contents. Are You Ready for Digital Transformation? page 04. Take Advantage of This Big Data Opportunity with Cisco and Hortonworks page 06

Smarter Reporting Leads to Better Decisions:

Data warehouse and business intelligence developer

A Breakthrough Approach to Sales Enablement

Faster TIME TO MARKET for competitive campaigns. FASTER changes for rich, optimized content. Increase in CONVERSION with targeted web offers

Digital Transformation Built on Cloud ERP

The Key to Successful Business Transformation

Transcription:

Solution Brief An Agile Approach to Feeding Cloud Data Warehouses

The benefits of cloud data warehouses go far beyond cost savings for organizations. Thanks to their ease-of-use, speed and nearlimitless scalability, cloud data warehouses give business teams the ability to deliver more analytic value faster than ever. Traditional, on-premise data warehouses are expensive to procure and require complex IT setup. This limits an organization s ability capitalize on faster insights and scale ondemand. Cloud data warehouses provide a number of advantages over legacy data warehouses: Easy, rapid deployment Nearly limitless scalability Cost-effective elastic pricing The ability to expand quickly Cloud data warehouses increase the speed to insight for business teams and help organizations remain agile as their data and analytics programs scale. However, many organizations still rely on slow, traditional methods for feeding their cloud data warehouse. The mentality of cloud analytics is different than on-premise. Gone are fixed projects with highly static requirements. The new approach is iterative and exploratory. Business analysts spend more time in self-service mode, where they explore data looking for answers. To facilitate this new iterative and exploratory approach, organizations need a modern approach to delivering data to their cloud data warehouses. Data delivery and the cloud data warehouse need to complement each other to deliver the desired agility. PAGE 2

Data Movement Challenges Data is only useful when it is available for analysis. The faster analysts can get their hands on the data, the more valuable their insights will be for organizations. This is why legacy data movement methods are so detrimental. The agility of cloud data warehouses is undermined by legacy movement process like batch-loading ETL. ETL cannot handle the volume, variety, and velocity of data in today s big data world. Creating data pipelines with ETL-style tools can take weeks, if not months. And the technical nature of the tools renders them usable only for technical staff, making the business teams having to wait in the IT queue. There are four key problems with ETL tools and processes: Programmatic tools ETL tools are complex and programmatic which requires an IT-led project that takes weeks or even months Complex modelling the fixed schema approach of ETL tools makes modelling slow and complicated, causing projects of weeks or months Rigid modelling analysts must work within requirements of rigid schemas, limiting their ability to truly explore the data and boxing in the potential insights Locked-in Logic logic is tied to transformations between specific end-points, limiting the ability to re-use that logic. When these slow, complex ETL process get involved, the agility of cloud data warehouses is severely diminished. Business analysts wait weeks or months to get new datasets, limiting their ability deliver the analytic agility the business teams desire. PAGE 3

An Faster Modern Approach Businesses today need a process to feed their cloud data warehouse that matches the agility, flexibility, and scalability of it. A new generation of data preparation tools on the market today make this level of performance possible. Businesses can finally capture the full advantages of cloud data warehouses. A New Method Two key items have changed in the analytics world in the last couple years the cloud and agile methodologies. We explained the benefits of the cloud earlier and the speed benefits it brings. But agile development methodologies have also worked their way into the analytics world. If the old world ETL approach had highly fixed requirements and length projects, the new world consists of speed, agility and exploration: Speed very quick projects that rapid initial time to value Agility the ability to meet the need of new business requirements faster Exploration digging deeply into the data to find new, unknown insights Figure 1 shows a model that a describes how a new generation of data preparation offerings support this new iterative and exploratory world. Analysts model the data, explore it, model it more, etc. in an iterative way until they FIND what they are looking for. This facilitates faster analytic cycles to dramatically reduce time to insight. Using this approach, Datameer customers have seen up to 98% reduction in their analytic cycles from weeks and months to hours. Figure 1: A New Iterative Approach to Feeding the Cloud Data Warehouse PAGE 4

Feature Requirements To create an agile feed to a cloud data warehouse, there are several key features that data preparation tools should possess: 1. Self-Service 2. Rapid modeling 3. Easy operationalization 4. Deep governance and security 5. Integration with the cloud platform Therefore, new tools that will be successful in creating an agile feed to a cloud data warehouse will have to fulfill these requirements. Let s look at each feature more closely. 1. Self-Service Traditionally, business teams were required to submit data pipelining requests to IT, who would then collect, prepare, model, and finally feed the data into the data warehouse for analysis. The process could take weeks or even months depending on the size and complexity of the dataset. This slow data analytics cycle is damaging to an organization for two reasons: Outdated data. Markets and business environments can change drastically in the weeks or months it takes to feed data into the data warehouse using ETL. Business teams are unable to make real-time decisions and could very well be using outdated data entirely. Limited Exploration. The traditional data analytics process requires business teams to make specific requests for IT to build data pipelines. This forces analysts to work within the confines of what they already know, not giving them room to explore questions they never even thought to ask. To make full use of cloud data warehouses, business teams need self-service data preparation tools that quickly deliver new data models for analysis. The self-service approach gives business teams the power to build their own data models with no-code styling tools and intuitive interfaces. Business analysts can transform, blend, and enrich complex datasets to feed their cloud data warehouses and generate insights more quickly. With self-service tooling, organizations can shorten their analytics cycles from weeks or months to less than a day. Organizations can make faster decisions instead of reactionary ones based on old data. Analysts also have the freedom to explore the entire dataset. They no longer have to know what they re looking for ahead of time; they can explore far and wide for unique and valuable insights. PAGE 5

2. Rapid, schema-less modeling One massive bottleneck in the ETL process was the need to transform data using fixed schemas. The schema-on-write approach is a key reason why data requests take weeks or months to fulfill. This task has gotten infinitely more difficult thanks to the volume, velocity, and variety of data collected today. Preparing data this way takes exhaustive resources and limits the functionality of the data. New data preparation tools don t require data to fit a schema before being loaded into the data warehouse. This type of process is called schema-less, or schema-on-read, which means a virtual schema s are automatically derived as the analyst defines their transformations, and a final schema is only applied when the data is delivered to its target destination in this case, the cloud data warehouse. Schema-less modeling tools make the end to end modeling process faster, but it also makes the data more useful. With the wide variety of data sources and formats, it is nearly impossible to apply one schema that optimizes each individual dataset. With schema-on-read, data can stay in its original format until its ready to be used. Then it can be transformed to fit any number of schemas depending on what the analysts want to do with it. 3. Easy Operationalization ETL-based data feeds have traditionally been difficult to operationalize, forcing organizations to rely on third party programmatic tools for automating and scheduling data movement. Adding multiple tools makes feeding your cloud data warehouse even more complex and time-consuming. Cloud data warehouses require data preparation tools that make operationalization of data pipelines fast, easy and self-service. These tools have built-in automation and scheduling features so business teams can set it and forget it. No custom code is needed; analysts can point to a particular data pipeline and run analytics without the help of IT. This facilitates the iterative process analysts desire. And operationalized jobs need the scale of big data. While traditional ETL tools in the cloud will run on standard EC2 instances, a modern data preparation platform will leverage the power of AWS EMR to deliver high performant and scalable execution of the data pipelines that feed the cloud data warehouse. PAGE 6

4. Deep Governance and Security Governance and security are of critical importance for organizations today. The regulation environment around the world is becoming increasingly difficult to navigate as requirements change regularly and with little warning. Businesses need a data source of truth to ensure transparency and high levels of security. This is a must-have feature for modern data tools that feed your cloud data warehouse. Modern data preparation tools monitor all data movement, transformation, and usage across the entire organization. Using metadata descriptions attached to each dataset, governance leaders can track the entire data lifecycle to ensure compliance. If regulatory bodies ask to see records of data usage, this information is quickly available and always up to date. For security, essential features include encryption, masking, and fine-grained, role-based access control. Business analysts need the freedom explore datasets, but organizations must keep track of their actions. 5. Integration with the Cloud Platform To deliver on many of the aforementioned features, a modern data preparation and exploration platform needs to tightly integrate with and leverage critical underlying services of the cloud platform. This makes the data preparation platform cloudnative, helps deliver a seamless experience, and provides a foundation for the scale, performance and security necessary. For a platform like Datameer, being cloud-native on Amazon Web Services first means separating the compute and storage layers, which allows the two to be elastically scaled independently. It then involves leveraging compute, storage and security services, including: Use of S3 Cloud Object Storage Datameer for AWS stores and manages data on the S3 object storage. This provides persistent storage services and enables Datameer to separate compute and storage services it uses, enabling these to be scaled independently. Elastic compute with EMR clusters Datameer for AWS uses EMR for compute capacity. EMR clusters can be scaled independently by customers allowing Datameer to take advantage of burst computing capacity. Integration with AWS IAM In Datameer for AWS, security is integrated the AWS Identity and Access Management system. PAGE 7

Integration with AWS KMS for encryption key management, Datameer for AWS integrates with the AWS Key Management System. Integration with AWS Management Console Datameer for AWS instances and the underlying EMR and S3 services Datameer uses, can all be managed within the AWS Management Console. Beyond this, Datameer also provides deep integration with the key cloud data warehouse destinations, including Redshift and Snowflake. In these integrations, Datameer provides bi-directional data connections, and has optimizations to ensure the utmost performance when loading data. Datameer in Action Modern data preparation and analytics platforms like Datameer meet the above requirements for feeding cloud data warehouses today. Let s look at how one organization put these tools to work in their business. Agile CDW Data Feeds in Retail The Problem Facing a new stream of digital age buyers and new, specialty e-retail competition, this retailer needed to undergo digital transformation to grow their online business while maintaining balance and leadership with their brick-and-mortar and call center operations. This presented key challenges including: Faster time to insight to fuel data-driven business decisions Delivering wider self-service access to trusted data assets Using cloud economics to reduce analytic infrastructure time & cost In their previous environment, projects to deliver new datasets to business teams were slow and costly taking up to 6 months and costing up to $350,000. In some cases, business teams were not even allowed access to certain silo ed data sources. The retailer wanted to give self-service access to their business analysts so they could integrate and shape the data to their needs and load it into Redshift. This would alleviate the data engineering team, and let them focus on higher value projects to deliver more broad, yet governed access to the data. The company also needed to provide daily updated data feeds to certain business execs and teams. PAGE 8

The Solution The company decided to go with Datameer for their data pipelining needs. Using Datameer s enterprise-grade pipelining platform, the data engineering team was able to set up self-service access to over 350 unique, curated datasets. If new complex data workflows need to be added, these projects are now delivered by the data engineering team in a matter of days, not months. From there, business analysts have self-service access to the curated data, and create their own customized workflows specific to their analytic needs. With built-in data exploration at-scale, and spreadsheet-style tools, analysts could quickly and easily explore the data to find the parts of assets that meet their needs, create their own custom preparation workflows and deliver new insights to organization. Today, the company runs 3 million jobs a year that pump data into their warehouse, providing the daily data feeds that drive faster analytics. Analytic cycles have been reduced by up to 98% and business analysts have the self-service access to the data they need. Through these new insights, the company has improved the efficiency of their e-commerce operations, expanded. Getting Started Where is the slow point for your agile data organization? Even if you have adopted cloud data warehouses, it still isn t enough if you re still using old-school ETL processes to move your data. Organizations need agility, flexibility, and scalability across their entire big data stack, and that includes their data pipelines. Datameer can help you supercharge your business analytics teams by putting the power of data preparation in their hands. Self-service tools with rapid modeling lets business teams build and operationalize their own data pipelines without the need for IT. Datameer also monitors and secures data throughout its lifecycle, giving data engineering teams the tools that keep data secure and governed to match even the most stringent data privacy regulations. The modern Datameer approach using data preparation to feed your cloud data warehouse makes your data move faster, more powerful, and more secure than ever before. To learn more, visit www.datameer.com. PAGE 9