Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics

Key takeaways
- Analytic Insights Module for self-service analytics
- Automate data ingestion into Isilon Data Lake
- Three methods to ingest data from external data sources
- Fine-grained data control and governance

Agenda
- Introduction to Analytic Insights Module
- Personal On-demand Workspace Demo
- Automate Isilon HDFS Provisioning
- Data Ingestion Methods
- Fine-grained Data Access Policies
- Q&A

The new digital customer
- Rising and continuously changing expectations around experiences
- Always available
- Real-time updates
- Intelligent interactions
- Intelligent applications are the new face of business

Challenges to realizing business value from your data
- 80%: time spent discovering and preparing data [1]
- 60%: data analytics projects failing to move past exploratory stage [3]
- 25%: of unstructured data is used [2]
- 41%: of structured data is used [2]
- 90%: siloed data analytic efforts [5]
- 71%: employees have access to data they shouldn't [4]

Sources:
[1] Boost Your Business Insights By Converging Big Data And BI, March 25, 2015
[2] Business Technographics, Global Data and Analytics Survey, Forrester Research, 2014
[3] Predicts 2015: Big Data Challenges Move From Technology to the Organization, Gartner, 28 November 2014
[4] Corporate Data: A Protected Asset or a Ticking Time Bomb?, Ponemon Institute, Dec 2014
[5] Information Innovation Key Overview, April 22, 2014

Analytic Insights Module
Increases your speed through the virtuous cycle
- Gather the right data with deep awareness
- Analyze via self-service to create new insights
- Act on insights for monetizing new opportunities

ANALYTIC INSIGHTS MODULE: Personal On-demand Workspace
- Access data, create analytics, and collaborate
- Analyze: get to work in minutes

Analytic Insights Module engineered for speed
- Platform manager: Coming Soon!

Logical View of Analytic Insights Module
[Diagram] Components shown: Global UI app, Pivotal CF, Attivio, Bedrock, Analytic Insights Module Controller, Web client, Workspace, DAC applications, DAC Services, DAC client services, RabbitMQ, User tools & apps, Data scientist (SSH), Data containers (Hadoop, databases, etc.), Isilon, Published data/apps.

Automate Isilon HDFS provisioning for Hadoop
- ACCESS ZONE: creates an access zone within Isilon
- IP POOL: creates an IP pool for the new zone
- HADOOP USERS: creates Hadoop users within the access zone and assigns GID and UID
- USER MAPPING: maps hdfs user to root
- DIRECTORY CONFIGURATION: creates required directory structure
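The five provisioning steps above run in a fixed order, with one user-creation action per Hadoop user. A minimal sketch in Python of that ordering follows. It only builds a dry-run plan and does not call any Isilon interface; the `provision_hdfs_zone` helper, the zone name, IP range, user IDs, and directory path are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HadoopUser:
    name: str
    uid: int
    gid: int


@dataclass
class ZonePlan:
    """Ordered, dry-run description of the provisioning steps (hypothetical model)."""
    zone: str
    steps: list = field(default_factory=list)

    def add(self, step: str, detail: str) -> None:
        self.steps.append((step, detail))


def provision_hdfs_zone(zone, ip_range, users, base_dir):
    """Build the plan in the same order the slide lists the steps."""
    plan = ZonePlan(zone)
    plan.add("ACCESS ZONE", f"create access zone '{zone}'")
    plan.add("IP POOL", f"create IP pool {ip_range} for zone '{zone}'")
    for u in users:
        plan.add("HADOOP USERS", f"create user {u.name} (uid={u.uid}, gid={u.gid})")
    plan.add("USER MAPPING", "map hdfs user to root")
    plan.add("DIRECTORY CONFIGURATION", f"create directory structure under {base_dir}")
    return plan


plan = provision_hdfs_zone(
    zone="analytics",                      # illustrative value
    ip_range="10.0.0.10-10.0.0.20",        # illustrative value
    users=[HadoopUser("hdfs", 501, 501), HadoopUser("yarn", 502, 502)],
    base_dir="/ifs/analytics/hadoop",      # illustrative path
)
for step, detail in plan.steps:
    print(f"{step}: {detail}")
```

Keeping the plan as data makes it easy to review before anything is executed; a real controller would replace each entry with the corresponding storage-side call.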

Isilon NameNode registration
The Controller deploys Cloudera Manager or Ambari and registers Isilon as a NameNode.

Data ingest methods
Analytic Insights Module supports three methods to ingest data from external data sources.
- Data ingest to ODC: data engineers can bring commonly used data sets to ODC, thus avoiding multiple access requests from sources
[Diagram] Labels: External data, DSD, Bedrock, ODC, Workspace 1, Workspace 2, ... Workspace n, Analytic Insights Module; ingest paths numbered 1, 2, 3.

Data ingestion into a workspace
- Access Attivio DSD from workspace UI using user credential via LDAP integration
- Browse discovered data sources, review sample data, semantic search of data sources
- Choose data sets and prepare a custom data model
- Auto-provision custom data model to a data container in workspace
- No operating knowledge of Bedrock required
- Ingest process runs on-demand
[Diagram] Data path and control path between External data sources (EDW, Enterprise Apps, Social media, Cloud SaaS, Devices & sensors, Data store) and Analytic Insights Module (DSD, Bedrock, DAC, Workbench VM, Data containers), with the Data user. Step labels: Access data source definition & build marts; Provide metadata; Create & execute workflow; Register metadata; Ingest workflow.

Data source connection and ingestion
- Direct upload: single .csv, .xml, and .zip files
- Connectors: small/unstructured data sources
- DSD Spiders: large, complex, structured sources (e.g., data warehouse); ingest only a sample of the content from each detected table
- All connection types perform Metadata Collection and Content Ingestion
[Diagram] Other labels: CMS connection, Data Profiling.

Unify
- Automatically generates data models
- Correlates all structured data and unstructured content
- Enables Dynamic Modeling to create a data mart with multiple sources

Provision dataset to workspace data container
- Attivio DSD triggers Bedrock workflow to provision any dataset to user's workspace
- User can select destination workspace data container to provision into and Bedrock will execute the ingestion process
[Screenshot] Example dataset: SalesData.

Trigger Bedrock ingestion workflow from DSD
- Ingestion Driver is available out-of-the-box in Bedrock
- Workflow is triggered by Attivio provisioner plug-in
- Identifies the target container
- Branches to Hive/MongoDB/MySQL ingestion depending on request
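The branching step can be pictured as a dispatch on the target container type. The sketch below is plain Python and is not Bedrock's Ingestion Driver: the request shape (`dataset`, `container`, `container_type`) and the three handler functions are hypothetical stand-ins that only return a description of what would be loaded.

```python
def ingest_hive(dataset, container):
    # Placeholder for a Hive load; returns a description only.
    return f"hive: load {dataset} into {container}"


def ingest_mongodb(dataset, container):
    # Placeholder for a MongoDB load.
    return f"mongodb: load {dataset} into {container}"


def ingest_mysql(dataset, container):
    # Placeholder for a MySQL load.
    return f"mysql: load {dataset} into {container}"


# One handler per container type named on the slide.
INGESTERS = {
    "hive": ingest_hive,
    "mongodb": ingest_mongodb,
    "mysql": ingest_mysql,
}


def run_ingestion(request):
    """Identify the target container, then branch to the matching ingestion."""
    kind = request["container_type"].lower()
    try:
        handler = INGESTERS[kind]
    except KeyError:
        raise ValueError(f"unsupported container type: {kind}") from None
    return handler(request["dataset"], request["container"])


print(run_ingestion({
    "dataset": "SalesData",
    "container": "ws1_hive",          # illustrative container name
    "container_type": "Hive",
}))
```

A lookup table keeps the branch explicit and makes an unsupported target fail loudly instead of silently falling through.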

Dataset is automatically available in workspace
- Container URL provides a way to access the data from within the workbench data container
- SalesData appears in user workspace automatically
- User is able to view the URL to access the data using their own credential, which is integrated with LDAP

Dirt road ingestion into workspace (BYOD)
1. Register data source(s) in DAC
2. Ingest data from external sources
- Use ingestion applications from Hadoop clusters to source external data
- Build ingest workflow pipelines to gather and transform data before loading to workspace containers
- Support BYOD use cases wrangling in real-time or batch data feeds
- Develop machine learning algorithms using Spark platform on Hadoop
[Diagram] Data path and control path between External data sources (EDW, Enterprise Apps, Social media, Cloud SaaS, Devices & sensors, Data store) and the Workspace in Analytic Insights Module (DAC, Hadoop, Data store, Workbench VM), with the Data user.

Data ingestion into the Data Catalog
1. Design & run ingest workflow
2. Execute workflow
3. Register metadata
- Create batch or real-time process in Zaloni and execute to ingest data from external data sources
- Ability to support complex data transformations in Zaloni workflows
- Stage may contain lookups or master data for enrichment
- Ability to add ingestion rules and/or definitions to data elements
- Monitor and administer Zaloni workflow runs
[Diagram] Control path and data path between External data sources (EDW, Enterprise Apps, Social media, Cloud SaaS, Devices & sensors, Data store) and Analytic Insights Module (Bedrock, DAC, Stage, Ingest workflow, ODC), with the Data engineer.

Define file ingest in Bedrock
- File pattern associated with the source: Test[0-9]+.dat
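To see which file names such a pattern would select, the sketch below evaluates it with Python's `re` module. This assumes the pattern is interpreted as a regular expression matched against the whole file name; the slide does not say how Bedrock applies it, and the sample file names are made up. Under regex rules the unescaped `.` matches any character, so an escaped variant is shown for comparison.

```python
import re

# Pattern exactly as written on the slide; "." matches any single character.
PATTERN = re.compile(r"Test[0-9]+.dat")
# Stricter variant with the dot escaped, so only a literal "." is accepted.
STRICT = re.compile(r"Test[0-9]+\.dat")


def matches(pattern, name):
    """True when the whole file name matches the pattern."""
    return pattern.fullmatch(name) is not None


# Illustrative file names, not taken from the source.
names = ["Test1.dat", "Test042.dat", "Test.dat", "Test7xdat", "test1.dat"]
for name in names:
    print(f"{name:12} as-written={matches(PATTERN, name)} strict={matches(STRICT, name)}")
```

Only `Test7xdat` differs between the two variants: it is accepted by the pattern as written and rejected once the dot is escaped.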

Workflow Designer
- Orchestrate set of actions
- Supports simple/complex flows
- Hive, Spark, Spark-SQL, Shell, Java, MapReduce generic actions
- Built-in actions for CDC, Watermarking, Tokenization, Avro conversion, Parquet conversion

Transformation
- Transformation library
- Build transformations using drag-and-drop interface
- Supports Spark for efficient transformation
- Integrates with workflow module
- Metadata entity data flows

Data governor policy engine
- All data requests are intercepted and forwarded to policy engine.
- Policy engine evaluates request against policies and returns modified request that gives user policy-compliant results.
- Full audit trail at the user and data level is built automatically.
[Diagram] BlueTalon Enforcement Points. Numbered labels: (1) Security admins, BlueTalon policy console; (2) User request from business users, data scientists, developers, and applications (Active Directory); (3) User request to BlueTalon policy engine; (4) Modified, compliant request to ODC; (5) Compliant results; (6) BlueTalon audit engine, Security admins.
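The intercept, evaluate, and rewrite flow can be sketched in a few lines of Python. This is an illustration of the idea only, not BlueTalon's implementation or API: the `Request` shape, the role names, the policy table, and the choice to enforce policy by dropping disallowed columns are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    role: str
    table: str
    columns: tuple


# Hypothetical policy table: (role, table) -> columns that role may read.
POLICIES = {
    ("analyst", "customers"): {"name", "city"},
    ("security_admin", "customers"): {"name", "city", "ssn"},
}

# Audit trail at the user and data level, built as a side effect of enforcement.
AUDIT_LOG = []


def enforce(request):
    """Return a modified, policy-compliant request and record an audit entry."""
    allowed = POLICIES.get((request.role, request.table), set())
    kept = tuple(c for c in request.columns if c in allowed)
    denied = tuple(c for c in request.columns if c not in allowed)
    AUDIT_LOG.append({
        "user": request.user,
        "table": request.table,
        "granted": kept,
        "denied": denied,
    })
    return Request(request.user, request.role, request.table, kept)


original = Request("alice", "analyst", "customers", ("name", "ssn", "city"))
compliant = enforce(original)
print(compliant.columns)
print(AUDIT_LOG[-1])
```

Because the request is rewritten before it reaches the data store, the store only ever sees compliant requests, and a role with no matching policy gets no columns at all.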

Fine-grained data access policies
- Create rule to mask customer's Social Security Number based on user roles
- Example: 123-45-6789 becomes 123-45-XXXX
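A role-based masking rule of this kind might look like the following Python sketch. The role name `compliance_officer` and the `mask_ssn` helper are hypothetical; the slide only shows the before and after values.

```python
import re

# Social Security Number in NNN-NN-NNNN form.
SSN = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")

# Hypothetical set of roles allowed to see the full value.
UNMASKED_ROLES = {"compliance_officer"}


def mask_ssn(value, role):
    """Mask the last four digits unless the role is allowed to see them."""
    m = SSN.fullmatch(value)
    if m is None:
        raise ValueError("value is not in NNN-NN-NNNN form")
    if role in UNMASKED_ROLES:
        return value
    return f"{m.group(1)}-{m.group(2)}-XXXX"


print(mask_ssn("123-45-6789", "analyst"))
print(mask_ssn("123-45-6789", "compliance_officer"))
```

Validating the format before checking the role means malformed values are rejected for every caller rather than passed through unmasked.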

Analytic Insights Module
Increases your speed through the virtuous cycle
- Gather the right data with deep awareness
- Analyze via self-service to create new insights
- Act on insights for monetizing new opportunities

Questions?

Realize your next steps
- Attend Breakout Sessions
  - Secure IT's Seat At The Table: Deliver The Business Self-Service Data Analytics, Wed. (5/10), 1:30 PM - 2:30 PM, San Polo 3405
  - IoT Analytics: A Modern Manufacturing Surveillance Use Case, Wed. (5/10), 12:00 PM - 1:00 PM, Delfino 4003
- See the Blueprint solutions in action at the Expo: Kiosks and Customer Presentations in the Converged Platforms and Solutions booth #872
- Engage with our Dell EMC Big Data and Cloud subject matter experts to learn more
- Visit dellemc.com/aim