Meta-Managed Data Exploration Framework and Architecture


Contents

Executive Summary
Meta-Managed Data Exploration Framework
Meta-Managed Data Exploration Architecture
Data Exploration Process: Modules and Flow
Our Experience
Conclusion
References

Executive Summary

The geometric increase in the volume (amount), velocity (speed) and variety (range) of data has driven the evolution of new information architectures and tools that not only collect and store data, but also use it to create business value and provide innovative business insights. These architectures and tools have made it possible to harness data to generate near real-time insights.

Data exploration was traditionally a field with a limited number of expert users. It has come a long way since then and is now part and parcel of daily work. Through the democratization of information, data is being placed in the hands of many users, even though data exploration is still treated as a separate process. At the same time, the need to create insights on the fly, instead of going through standard IT processes, is on the rise.

A meta-managed architecture has the potential to enable this capability, since it provides a blueprint for the lineage, storage, data structures and functionality inherent to an enterprise-scale application. Metadata management can extend both application scope and functionality with minimal to no additional software development. When software development is required to enhance the platform or resolve an unforeseen complexity, the change can be made by extending the platform with additional metadata sets, without major difficulty.

Analysts who are unaware of these advances in data-related technologies spend an inordinate amount of time on data aggregation and interpretation. This process of data exploration is often the most tedious and time-consuming aspect of the analysis.

This paper explores how a meta-managed information architecture framework can track the different data sources that are ingested into an information hub, through well-defined data lineage and additional capabilities. These capabilities include interactive data curation, transformation, publication and collaboration.
They also help data analysts and agents focus their time and energy on making informed decisions using reliable data sets, instead of on non-productive activities.

Next Generation Information Architecture

1. Meta Managed: Ability to track data sources ingested into the information hub; track data lineage and provenance of storage and processing activities.
2. Governed: Centralized and coordinated management of projects / activities, managing change, and modern data management technologies used for cleansing, standardizing and integrating the data, leveraging a scalable computing platform.
3. Operability: Proper monitoring is critical to run a large-scale distributed cluster; near real-time operational dashboard to depict the health of operations, such as delivery effectiveness and sorting effectiveness.
4. Batch and Stream Oriented: Stream processor integrated with the analytics store to respond to queries for both real-time and historical data; high-volume data ingestion capabilities across multiple data formats.
5. Query Balanced: Data exploration features; exploratory queries with search capabilities enabled; analytical query processing.
6. Information Xchange: Capabilities for exchanging data with downstream / upstream systems in near real time; data APIs for integration with internal / external systems.

Figure 1: Six core strategic dimensions to build a strong future ready information architecture
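The lineage tracking described in the summary can be illustrated with a small in-memory catalog. This is only a sketch under stated assumptions: the class names (`DatasetRecord`, `MetadataCatalog`), the field names and the sample data set names are hypothetical and do not come from the paper, and a real platform would persist this metadata rather than hold it in a dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetRecord:
    """Metadata for one data set registered in the hub (hypothetical shape)."""
    name: str
    source: str                                   # where the data came from
    parents: list = field(default_factory=list)   # names of upstream data sets
    operation: str = "ingest"                     # step that produced this data set
    registered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class MetadataCatalog:
    """In-memory stand-in for a metadata catalog (assumption, not the paper's design)."""

    def __init__(self):
        self._records = {}

    def register(self, name, source, parents=(), operation="ingest"):
        # Refuse lineage links to data sets the catalog has never seen.
        for parent in parents:
            if parent not in self._records:
                raise KeyError(f"unknown parent data set: {parent}")
        record = DatasetRecord(name, source, list(parents), operation)
        self._records[name] = record
        return record

    def lineage(self, name):
        """Return every upstream data set of `name`, nearest first."""
        seen, queue = [], list(self._records[name].parents)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self._records[current].parents)
        return seen


catalog = MetadataCatalog()
catalog.register("crm_raw", source="crm_export.csv")
catalog.register("survey_raw", source="survey_feed.json")
catalog.register("customers_clean", source="hub",
                 parents=["crm_raw"], operation="deduplicate")
catalog.register("customer_360", source="hub",
                 parents=["customers_clean", "survey_raw"], operation="join")

print(catalog.lineage("customer_360"))
```

Because every derived data set records its parents at registration time, provenance questions such as "which sources feed this view?" become a lookup over metadata rather than an inspection of pipeline code.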

Meta-Managed Data Exploration Framework

Figure 2: Data Exploration and Visual Analysis Framework

An agile data wrangling capability enables deep analysis of data sets and broadens the range of insights derived from them. The data exploration workflow starts with one or more initial raw analytical data sets. Figure 2 describes a typical data wrangling framework. The salient characteristics of a data wrangling framework are:

- An overlap between the data exploration and analysis phases
- An integrated toolset with data exploration and visual analysis tools
- Multiple iterations of the steps involved (as in a typical data exploration process)
- Early visibility into the data, because data exploration takes place before the data is loaded into visualization and analysis tools
- Exploration at every stage of the analysis process, as users discover interesting insights from raw or new data (as and when it becomes available)

As a result, the data evolves from the raw to the usable stage through the data wrangling process, which leads to new and better business insights.

Industries with active adoption

Some industries in which the adoption of a meta-managed data exploration platform can deliver deeper business insights are:

Life Sciences: Research is an area in which the potential of data analysis can be demonstrated conspicuously. The problem is that physicians, biologists and life scientists often lack the required data analysis skills. A meta-managed data exploration platform can help overcome these and associated challenges. Key benefits include accelerating clinical trials, facilitating drug innovation, identifying adverse drug reactions, and optimizing supply chain management and marketing.

Financial Services and Insurance: Institutions can leverage data exploration platforms to detect fraud. These platforms can also help identify and manage risks, assess portfolio valuations, build a global 360-degree customer view while ensuring regulatory compliance, and improve investment management capabilities.

CPG and Retail: Institutions can derive valuable insights from diverse data sets sourced from different suppliers, production facilities, logistics, partners and retailers. They can use these platforms to improve their bottom lines by responding quickly to sales trends, thereby reducing costs.

Meta-Managed Data Exploration Architecture

Figure 3: Meta-Managed Data Exploration Platform Architecture

A brief description of the components shown in Figure 3 is provided in the table below:

Component: Description

Data Sources and Ingestion: Consists of enterprise internal and external data sources that feed input data sets to the data exploration platform.

Data Transformation: Comprises various interactive data preparation service modules used by end users / data analysts to align, harmonize, integrate and enrich input data sets as part of the data exploration process. This layer also provides interfaces, such as Web Services or APIs, which are used to interact with Big Data distribution and traditional data repository components.

Metadata Management, Data Security and Data Governance: Encompasses capabilities that apply to the whole data exploration platform. These include components responsible for managing metadata (business and technical), security control mechanisms such as data anonymization and role-based access, and overall platform data governance such as analyst onboarding, access provisioning and project portfolio management.

Data Exploration: Includes visual analytics tools which can connect to, stream and visualize data hosted on both Big Data and traditional structured data repositories. A data virtualization layer, which is not shown in the diagram, is leveraged to access data across different data repository tiers.

Technology and Tools: Embraces Big Data distribution components such as Interactive SQL, NoSQL data repositories, data indexing and search, and in-memory computation engines. These technical components enable the functionalities outlined in the Data Transformation layer.
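The security controls named in the table (role-based access and data anonymization) can be sketched as a policy that is itself metadata. Everything below is an illustrative assumption: the `POLICY` layout, the role names and the `pseudonymize` helper are hypothetical, and a salted hash is only pseudonymization, which is weaker than full anonymization.

```python
import hashlib

# Hypothetical role-to-column policy: "visible" columns pass through,
# "masked" columns are pseudonymized, everything else is dropped.
POLICY = {
    "analyst": {"visible": {"region", "spend"}, "masked": {"customer_id"}},
    "steward": {"visible": {"customer_id", "email", "region", "spend"}, "masked": set()},
}


def pseudonymize(value, salt="demo-salt"):
    """Replace a value with a short, stable SHA-256 digest (illustrative only)."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return digest[:12]


def apply_policy(rows, role):
    """Return a copy of `rows` restricted to what `role` is allowed to see."""
    if role not in POLICY:
        raise PermissionError(f"role not provisioned: {role}")
    rule = POLICY[role]
    result = []
    for row in rows:
        filtered = {}
        for column, value in row.items():
            if column in rule["masked"]:
                filtered[column] = pseudonymize(value)
            elif column in rule["visible"]:
                filtered[column] = value
            # Columns in neither set are silently dropped.
        result.append(filtered)
    return result


rows = [
    {"customer_id": "C-1001", "email": "a@example.com", "region": "EMEA", "spend": 120.0},
    {"customer_id": "C-1002", "email": "b@example.com", "region": "APAC", "spend": 75.5},
]

analyst_view = apply_policy(rows, "analyst")
steward_view = apply_policy(rows, "steward")
print(sorted(analyst_view[0]))
```

Keeping the policy as data means that onboarding a new analyst role is a metadata change, which is in the spirit of the governance layer described above.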

Data Exploration Process: Modules and Flow

Figure 2 describes a typical data wrangling framework and illustrates how a combination of exploration and visual analytics tools can jointly help data analysts to derive valuable business insights. Figure 4 complements the framework and explains how a typical data exploration platform built using modern age tools and technologies works.

Ingest -> Prepare -> Transform -> Publish

Figure 4: Typical data wrangling platform workflow using modern age tools and technologies

Each of these modules, along with its associated components, is listed as follows:

Module: Ingest
Definition: Ingest, manage and register data set in data exploration tool
Components / Features:
- Data set entitlement engine (authentication and authorization)
- Exploration project / workspace management
- Data ingestion: Connectors / APIs for source systems such as enterprise applications, S3, HDFS, NoSQL, etc.
- Data indexing, profiling and search
- Metadata catalog
- Data storage: NoSQL (Accumulo, HBase, etc.), HDFS
- Data and metadata life cycle management: On premise, Cloud

Module: Prepare
Definition: Preview, interact and define transformation workflow
Components / Features:
- Data sampling: Sample data set
- Data security: Role-based data access
- Data lineage: Processing steps and metadata (application and operational)
- Data preparation and processing services
- UX framework components: Data preparation (structure, split, enrich, filter, etc.)
- Interactive access and data cache
- User access management
- Machine learning recommendations

Module: Transform
Definition: Execute transformation on entire data set at scale
Components / Features:
- Data processing services: Entire data set
- Data lineage: Data processing steps and metadata (application and operational)
- Machine learning recommendations
- Off-heap data cache considerations
- Batch data processing using MapReduce (Pig, Hive, etc.), Spark, and so on

Module: Publish
Definition: Select transformation output format and target location
Components / Features:
- Output data generation and persistence: HDFS, Avro, ORC, Parquet, etc.
- Connectors for downstream applications: BI, data services, mobile, enterprise applications
- Visual analytics tool connectors: Hadoop connectors, Star Schema, etc.
- Analytic SQL: Impala, Hive, etc.
- Scheduling and delivery: s, SFTP, etc.
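The Ingest, Prepare, Transform and Publish flow can be sketched in a few lines: the workflow is defined against a sample, replayed on the entire data set, and published together with its lineage. The operation names, the `OPERATIONS` registry and the toy data are hypothetical, and plain Python lists stand in for the distributed engines (MapReduce, Spark) that the table mentions.

```python
import json
import random

# Each step is data, not code: an operation name plus its arguments.
# The registry of operations below is a hypothetical illustration.
OPERATIONS = {
    "filter_not_null": lambda rows, column: [
        r for r in rows if r.get(column) is not None
    ],
    "rename": lambda rows, old, new: [
        {(new if k == old else k): v for k, v in r.items()} for r in rows
    ],
    "uppercase": lambda rows, column: [
        {**r, column: r[column].upper()} for r in rows
    ],
}


def run_workflow(rows, workflow):
    """Apply the recorded steps in order; return (rows, lineage log)."""
    lineage = []
    for step in workflow:
        rows_in = len(rows)
        rows = OPERATIONS[step["op"]](rows, **step["args"])
        lineage.append({"op": step["op"], "rows_in": rows_in, "rows_out": len(rows)})
    return rows, lineage


# Ingest: the full raw data set (tiny here, for illustration).
raw = [
    {"cntry": c, "sales": s}
    for c, s in [("us", 10), ("in", None), ("de", 7), ("us", 3), ("fr", None), ("in", 12)]
]

# Prepare: define the workflow interactively against a small sample.
random.seed(0)
sample = random.sample(raw, 3)
workflow = [
    {"op": "filter_not_null", "args": {"column": "sales"}},
    {"op": "rename", "args": {"old": "cntry", "new": "country"}},
    {"op": "uppercase", "args": {"column": "country"}},
]
preview, _ = run_workflow(sample, workflow)

# Transform: replay the same workflow on the entire data set.
result, lineage = run_workflow(raw, workflow)

# Publish: persist the output together with the steps that produced it.
published = json.dumps({"data": result, "workflow": workflow, "lineage": lineage})
print(lineage)
```

Because the workflow is plain data, the steps that were tuned on the sample are exactly the steps that run at scale, and the published artifact carries its own processing history.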

A success story

Client brief: A large multinational company in the US

Background: ValueLabs designed a data exploration platform to support the client's data analysts with a workflow-based, interactive data preparation, transformation and visualization platform. Data sets were ingested from a variety of sources (legacy systems that had been running for years) and across geographies, involving high-volume data and complex, algorithm-driven data preparation recommendations.

Some key features that were implemented include:
- Compute various KPIs (to be reported to customers) based on the ingested survey data
- Load data into the target data model, with a mechanism for sending alerts to configured stakeholders after the final data load
- Create reports on top of the data store to report KPIs across store, product and time dimensions

Challenges:
- Handling data ingestion from the client's global data factory
- Dissimilar client data: data in different formats / schemas for different geographies and products
- Varied metadata across different products and geographies

Solution: The main challenge was to handle dynamic schemas and dynamic KPI calculation. We used the Avro framework to handle dynamic schemas, stored the rules in HBase, and ran the computation on Spark to achieve dynamic calculations.

Key considerations:
- Configurable source file schema: Changes to an existing source file schema, or the introduction of a new source file, are easy to handle by leveraging the flexible schema features of Avro
- Spark components: In-memory processing to speed up KPI calculation
- Avro schema file: Helps in identifying changes in the source file, and in appropriate error handling
- Individual suite of applications or instances: For better control and customization
- Lineage / job status: Lineage / job status information stored in a log file or HBase table to track the progress and status of components

Key technologies considered:
- Configurable source file schema (using Avro)
- Spark components (for in-memory processing)

Benefits:
- A platform that can be used to load as-is data from different data factories
- Linear scalability and performance improvement in data processing through dynamic schema handling and KPI calculation
- Quicker turnaround time for report generation while maintaining consistency in what gets reported

Our solution included developing a metadata-driven data exploration system with the ability to compute KPIs with minimal changes in the code base, based on ever-changing business needs.
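The idea of computing KPIs from rules and schemas held as metadata can be sketched without any of the named technologies. This is not the client implementation: plain dictionaries stand in for the Avro schema files, a Python list stands in for the rules stored in HBase, and ordinary loops stand in for Spark. All field names, rule names and sample records are hypothetical.

```python
from collections import defaultdict

# Per-source field mappings stand in for Avro schema files (assumption).
SOURCE_SCHEMAS = {
    "us": {"store_id": "store", "prod": "product", "units_sold": "units", "price": "price"},
    "in": {"shop": "store", "item": "product", "qty": "units", "unit_price": "price"},
}

# KPI rules are data; in the described solution they were stored in HBase.
KPI_RULES = [
    {"name": "total_units", "group_by": "store", "field": "units", "agg": "sum"},
    {"name": "avg_price", "group_by": "product", "field": "price", "agg": "avg"},
]

AGGREGATES = {
    "sum": lambda values: sum(values),
    "avg": lambda values: sum(values) / len(values),
}


def normalize(record, source):
    """Map a source-specific record onto the canonical field names."""
    mapping = SOURCE_SCHEMAS[source]
    missing = set(mapping) - set(record)
    if missing:
        # Schema drift is reported instead of being silently ignored.
        raise ValueError(f"{source} record is missing fields: {sorted(missing)}")
    return {canonical: record[name] for name, canonical in mapping.items()}


def compute_kpis(records, rules):
    """Evaluate every rule over the normalized records."""
    results = {}
    for rule in rules:
        groups = defaultdict(list)
        for rec in records:
            groups[rec[rule["group_by"]]].append(rec[rule["field"]])
        results[rule["name"]] = {
            key: AGGREGATES[rule["agg"]](values) for key, values in groups.items()
        }
    return results


incoming = [
    ("us", {"store_id": "S1", "prod": "soap", "units_sold": 4, "price": 2.0}),
    ("in", {"shop": "S2", "item": "soap", "qty": 6, "unit_price": 1.0}),
    ("us", {"store_id": "S1", "prod": "tea", "units_sold": 1, "price": 3.0}),
]
records = [normalize(rec, src) for src, rec in incoming]
kpis = compute_kpis(records, KPI_RULES)
print(kpis)
```

Adding a KPI or onboarding a new geography then means adding an entry to `KPI_RULES` or `SOURCE_SCHEMAS`; `normalize` and `compute_kpis` stay unchanged, which mirrors the "minimal changes in the code base" goal stated above.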

Conclusion

In this paper, we have covered the practical problems and challenges that come up regularly when a data analyst tries to work with a real-world data set. Visualization can aid in the detection of potential problems in the raw data, as a counterpart to fully algorithmic approaches. Integrated approaches that allow an individual to visually steer statistical algorithms are also possible. Visualization is useful in communicating data errors and uncertainties as well. When designing new visual interfaces, input data that may not be pristine needs to be taken into account, and the chosen visual interface should indicate any missing values and data uncertainties. Finally, when it comes to correcting data errors, visual approaches could integrate with automated approaches to allow an interactive editing cycle.

Ideally, the output of an exploration session should be more than a clean data set; it should also encompass the raw data coupled with a well-defined set of data operations and potentially some metadata indicating why these operations were performed. These operations should be auditable and editable by a user. Secondary benefits of a high-level data transformation language include easier reuse of previous formatting efforts and an increased potential for social, distributed collaboration around data exploration.

References

Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck and Paolo Buono, Research directions in data wrangling: Visualizations and transformations for usable and credible data, 2011
Trifacta, The Opportunity for Data Wrangling in Financial Services and Insurance
Trifacta, The Opportunity for Data Wrangling in Life Sciences and Biopharmaceuticals
Trifacta, Optimize Retail & CPG processes with Agile Data Wrangling on Hadoop
By
Shuvadeep Dutta (Director - Architecture and Consulting)
Manidipa Mitra (Director - Big Data)
Debasish Chatterjee (VP and Global Head - Big Data)

Plot No.41, Hitech City, Phase II, Cyberabad

EXAMPLE SOLUTIONS Hadoop in Azure HBase as a columnar NoSQL transactional database running on Azure Blobs Storm as a streaming service for near real time processing Hadoop 2.4 support for 100x query gains

More information

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. Course Content Course Description: The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight. At Course Completion: After competing this course,

More information

Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance

Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance 575 Market St, 11th Floor San Francisco, CA 94105 www.trifacta.com 844.332.2821 1 WHITEPAPER Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance 2 Introduction

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Duration: 5 days; Instructor-led Implement Spark Streaming Using the DStream API. Develop Big Data Real-Time Processing Solutions with Apache

More information

20775 Performing Data Engineering on Microsoft HD Insight

20775 Performing Data Engineering on Microsoft HD Insight Duración del curso: 5 Días Acerca de este curso The main purpose of the course is to give students the ability plan and implement big data workflows on HD. Perfil de público The primary audience for this

More information

Transforming Analytics with Cloudera Data Science WorkBench

Transforming Analytics with Cloudera Data Science WorkBench Transforming Analytics with Cloudera Data Science WorkBench Process data, develop and serve predictive models. 1 Age of Machine Learning Data volume NO Machine Learning Machine Learning 1950s 1960s 1970s

More information

SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform

SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform SOLUTION SHEET Hortonworks DataFlow (HDF ) End-to-end data flow management and streaming analytics platform CREATE STREAMING ANALYTICS APPLICATIONS IN MINUTES WITHOUT WRITING CODE The increasing growth

More information

20775A: Performing Data Engineering on Microsoft HD Insight

20775A: Performing Data Engineering on Microsoft HD Insight 20775A: Performing Data Engineering on Microsoft HD Insight Course Details Course Code: Duration: Notes: 20775A 5 days This course syllabus should be used to determine whether the course is appropriate

More information

20775: Performing Data Engineering on Microsoft HD Insight

20775: Performing Data Engineering on Microsoft HD Insight Let s Reach For Excellence! TAN DUC INFORMATION TECHNOLOGY SCHOOL JSC Address: 103 Pasteur, Dist.1, HCMC Tel: 08 38245819; 38239761 Email: traincert@tdt-tanduc.com Website: www.tdt-tanduc.com; www.tanducits.com

More information

SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform

SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform SOLUTION SHEET End to End Data Flow Management and Streaming Analytics Platform CREATE STREAMING ANALYTICS APPLICATIONS IN MINUTES WITHOUT WRITING CODE The increasing growth of data, especially data-in-motion,

More information

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect (technical) Updates & demonstration Robert Voermans Governance architect Amsterdam Please note IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

Adobe and Hadoop Integration

Adobe and Hadoop Integration Predictive Behavioral Analytics Adobe and Hadoop Integration DECEMBER 2016 SYNTASA Copyright 1.0 Introduction For many years large enterprises have relied on the Adobe Marketing Cloud for capturing and

More information

Cloud Integration and the Big Data Journey - Common Use-Case Patterns

Cloud Integration and the Big Data Journey - Common Use-Case Patterns Cloud Integration and the Big Data Journey - Common Use-Case Patterns A White Paper August, 2014 Corporate Technologies Business Intelligence Group OVERVIEW The advent of cloud and hybrid architectures

More information

Cask Data Application Platform (CDAP) Extensions

Cask Data Application Platform (CDAP) Extensions Cask Data Application Platform (CDAP) Extensions CDAP Extensions provide additional capabilities and user interfaces to CDAP. They are use-case specific applications designed to solve common and critical

More information

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Roger Ding Cloudera February 3rd, 2018 1 Agenda Hadoop History Introduction to Apache Hadoop

More information

: Boosting Business Returns with Faster and Smarter Data Lakes

: Boosting Business Returns with Faster and Smarter Data Lakes : Boosting Business Returns with Faster and Smarter Data Lakes Empower data quality, security, governance and transformation with proven template-driven approaches By Matt Hutton Director R&D, Think Big,

More information

Spark and Hadoop Perfect Together

Spark and Hadoop Perfect Together Spark and Hadoop Perfect Together Arun Murthy Hortonworks Co-Founder @acmurthy Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users Data Operating System

More information

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies DLT Stack Powering big data, analytics and data science strategies for government agencies Now, government agencies can have a scalable reference model for success with Big Data, Advanced and Data Science

More information

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud Datametica The Modern Data Platform Enterprise Data Hub Implementations Why is workload moving to Cloud 1 What we used do Enterprise Data Hub & Analytics What is Changing Why it is Changing Enterprise

More information

Big Data Cloud. Simple, Secure, Integrated and Performant Big Data Platform for the Cloud

Big Data Cloud. Simple, Secure, Integrated and Performant Big Data Platform for the Cloud Big Data Cloud Simple, Secure, Integrated and Performant Big Data Platform for the Cloud Big Data Platform engineered for the data-driven enterprise Oracle s Big Data Cloud delivers a Big Data Platform

More information

Common Customer Use Cases in FSI

Common Customer Use Cases in FSI Common Customer Use Cases in FSI 1 Marketing Optimization 2014 2014 MapR MapR Technologies Technologies 2 Fortune 100 Financial Services Company 104M CARD MEMBERS 3 Financial Services: Recommendation Engine

More information

Hortonworks Connected Data Platforms

Hortonworks Connected Data Platforms Hortonworks Connected Data Platforms MASTER THE VALUE OF DATA EVERY BUSINESS IS A DATA BUSINESS EMBRACE AN OPEN APPROACH 2 Hortonworks Inc. 2011 2016. All Rights Reserved Data Drives the Connected Car

More information

How to Design a Successful Data Lake

How to Design a Successful Data Lake KNOWLEDGENT WHITE PAPER How to Design a Successful Data Lake Information through Innovation Executive Summary Business users are continuously envisioning new and innovative ways to use data for operational

More information

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake White Paper Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake Motivation for Modernization It is now a well-documented realization among Fortune 500 companies

More information

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK Are you drowning in Big Data? Do you lack access to your data? Are you having a hard time managing Big Data processing requirements?

More information

Pentaho 8.0 and Beyond. Matt Howard Pentaho Sr. Director of Product Management, Hitachi Vantara

Pentaho 8.0 and Beyond. Matt Howard Pentaho Sr. Director of Product Management, Hitachi Vantara Pentaho 8.0 and Beyond Matt Howard Pentaho Sr. Director of Product Management, Hitachi Vantara Safe Harbor Statement The forward-looking statements contained in this document represent an outline of our

More information

Confidential

Confidential June 2017 1. Is your EDW becoming too expensive to maintain because of hardware upgrades and increasing data volumes? 2. Is your EDW becoming a monolith, which is too slow to adapt to business s analytical

More information

Solution Brief. An Agile Approach to Feeding Cloud Data Warehouses

Solution Brief. An Agile Approach to Feeding Cloud Data Warehouses Solution Brief An Agile Approach to Feeding Cloud Data Warehouses The benefits of cloud data warehouses go far beyond cost savings for organizations. Thanks to their ease-of-use, speed and nearlimitless

More information

Business is being transformed by three trends

Business is being transformed by three trends Business is being transformed by three trends Big Cloud Intelligence Stay ahead of the curve with Cortana Intelligence Suite Business apps People Custom apps Apps Sensors and devices Cortana Intelligence

More information

Adobe and Hadoop Integration

Adobe and Hadoop Integration Predictive Behavioral Analytics Adobe and Hadoop Integration JANUARY 2016 SYNTASA Copyright 1.0 Introduction For many years large enterprises have relied on the Adobe Marketing Cloud for capturing and

More information

Azure Data Analytics & Machine Learning Seminar. Daire Cunningham: BI Practice Area Manager

Azure Data Analytics & Machine Learning Seminar. Daire Cunningham: BI Practice Area Manager Azure Data Analytics & Machine Learning Seminar Daire Cunningham: BI Practice Area Manager AGENDA 09:00 AM 09:30 AM Registration & Refreshments 09.30AM 10:00 AM 10:00 AM 10:30 AM Welcome & Keynote, Ger

More information

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica Accelerating Your Big Data Analytics Jeff Healey, Director Product Marketing, HPE Vertica Recent Waves of Disruption IT Infrastructu re for Analytics Data Warehouse Modernization Big Data/ Hadoop Cloud

More information

Databricks Cloud. A Primer

Databricks Cloud. A Primer Databricks Cloud A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to

More information

PORTFOLIO AND TECHNOLOGY DIRECTION ARMISTEAD SAPP & RANDY GUARD

PORTFOLIO AND TECHNOLOGY DIRECTION ARMISTEAD SAPP & RANDY GUARD PORTFOLIO AND TECHNOLOGY DIRECTION ARMISTEAD SAPP & RANDY GUARD FOCUS MARKETS SAS Addressable Market Size $US Billions $14.7 2015 2019 $10.6 $9.6 $7.0 $7.9 $5.0 $2.6 $3.7 $5.7 $4.4 $3.0 $4.2 BUSINESS INTELLIGENCE

More information

DataAdapt Active Insight

DataAdapt Active Insight Solution Highlights Accelerated time to value Enterprise-ready Apache Hadoop based platform for data processing, warehousing and analytics Advanced analytics for structured, semistructured and unstructured

More information

Enterprise Analytics Accelerating Your Path to Value with an Open Analytics Platform

Enterprise Analytics Accelerating Your Path to Value with an Open Analytics Platform Enterprise Analytics Accelerating Your Path to Value with an Open Analytics Platform Federico Pozzi @fedealbpozzi Mathias Coopmans @macoopma Characteristics of a badly managed platform No clear data

More information

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand Paper 2698-2018 Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand ABSTRACT Digital analytics is no longer just about tracking the number

More information

Microsoft Azure Essentials

Microsoft Azure Essentials Microsoft Azure Essentials Azure Essentials Track Summary Data Analytics Explore the Data Analytics services in Azure to help you analyze both structured and unstructured data. Azure can help with large,

More information

Microsoft reinvents sales processing and financial reporting with Azure

Microsoft reinvents sales processing and financial reporting with Azure Microsoft IT Showcase Microsoft reinvents sales processing and financial reporting with Azure Core Services Engineering (CSE, formerly Microsoft IT) is moving MS Sales, the Microsoft revenue reporting

More information

Pentaho 8.0 Overview. Pedro Alves

Pentaho 8.0 Overview. Pedro Alves Pentaho 8.0 Overview Pedro Alves Safe Harbor Statement The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information

More information

AZURE HDINSIGHT. Azure Machine Learning Track Marek Chmel

AZURE HDINSIGHT. Azure Machine Learning Track Marek Chmel AZURE HDINSIGHT Azure Machine Learning Track Marek Chmel SESSION AGENDA Understanding different scenarios of Hadoop Building an end to end pipeline using HDInsight Using in-memory techniques to analyze

More information

WELCOME TO. Cloud Data Services: The Art of the Possible

WELCOME TO. Cloud Data Services: The Art of the Possible WELCOME TO Cloud Data Services: The Art of the Possible Goals for Today Share the cloud-based data management and analytics technologies that are enabling rapid development of new mobile applications Discuss

More information

GET MORE VALUE OUT OF BIG DATA

GET MORE VALUE OUT OF BIG DATA GET MORE VALUE OUT OF BIG DATA Enterprise data is increasing at an alarming rate. An International Data Corporation (IDC) study estimates that data is growing at 50 percent a year and will grow by 50 times

More information

Real-time IoT Big Data-in-Motion Analytics Case Study: Managing Millions of Devices at Country-Scale
