Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics
Key takeaways
- Analytic Insights Module for self-service analytics
- Automate data ingestion into Isilon Data Lake
- Three methods to ingest data from external data sources
- Fine-grained data control and governance
Agenda
- Introduction to Analytic Insights Module
- Personal On-demand Workspace Demo
- Automate Isilon HDFS Provisioning
- Data Ingestion Methods
- Fine-grained Data Access Policies
- Q&A
The new digital customer
- Rising and continuously changing expectations around experiences: always available, real-time updates, intelligent interactions
- Intelligent applications are the new face of business
Challenges to realizing business value from your data
- 80%: time spent discovering and preparing data [1]
- 60%: data analytics projects failing to move past the exploratory stage [3]
- 25%: of unstructured data is used [2]
- 41%: of structured data is used [2]
- 90%: siloed data analytic efforts [5]
- 71%: employees have access to data they shouldn't [4]
Sources:
[1] Boost Your Business Insights By Converging Big Data And BI, March 25, 2015
[2] Business Technographics, Global Data and Analytics Survey, Forrester Research, 2014
[3] Predicts 2015: Big Data Challenges Move From Technology to the Organization, Gartner, 28 November 2014
[4] Corporate Data: A Protected Asset or a Ticking Time Bomb?, Ponemon Institute, Dec 2014
[5] Information Innovation Key Overview, April 22, 2014
Analytic Insights Module
Increases your speed through the virtuous cycle
- Gather the right data with deep awareness
- Analyze via self-service to create new insights
- Act on insights for monetizing new opportunities
Analytic Insights Module: Personal On-demand Workspace
- Analyze: access data, create analytics, and collaborate
- Get to work in minutes
Analytic Insights Module engineered for speed
- Platform manager: coming soon
Logical View of Analytic Insights Module
[Diagram. Labeled components: Global UI app, Pivotal CF, Attivio, Bedrock, Analytic Insights Module Controller, web client, workspace, DAC applications, user tools & apps, RabbitMQ, DAC services, data scientist (SSH), data containers (Hadoop, databases, etc.), DAC client services, Isilon, published data/apps]
Automate Isilon HDFS provisioning for Hadoop
- ACCESS ZONE: creates an access zone within Isilon
- IP POOL: creates an IP pool for the new zone
- HADOOP USERS: creates Hadoop users within the access zone and assigns GID and UID
- USER MAPPING: maps hdfs user to root
- DIRECTORY CONFIGURATION: creates required directory structure
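The five provisioning steps on this slide can be pictured as one ordered plan. The following is a minimal Python sketch of such a plan, not the Controller's actual implementation: the class names, the `build_plan` function, the starting ID of 1000, and the directory layout under `/ifs/<zone>` are all assumptions made for illustration, and no real Isilon commands are issued.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HadoopUser:
    name: str
    uid: int
    gid: int


@dataclass
class ProvisioningPlan:
    zone: str                                   # access zone to create
    ip_range: Tuple[str, str]                   # IP pool for the new zone
    users: List[HadoopUser] = field(default_factory=list)
    user_mapping: Dict[str, str] = field(default_factory=dict)
    directories: List[str] = field(default_factory=list)


def build_plan(zone, ip_range, user_names, base_id=1000):
    """Assemble the slide's five steps into one plan (hypothetical layout)."""
    plan = ProvisioningPlan(zone=zone, ip_range=ip_range)

    # Hadoop users: assign a UID and a GID to each account in the zone.
    for offset, name in enumerate(user_names):
        plan.users.append(HadoopUser(name, base_id + offset, base_id + offset))

    # User mapping: the slide states that the hdfs user maps to root.
    plan.user_mapping["hdfs"] = "root"

    # Directory configuration: an assumed HDFS root plus one home per user.
    hdfs_root = f"/ifs/{zone}/hdfs"
    plan.directories = [hdfs_root] + [f"{hdfs_root}/user/{u.name}" for u in plan.users]
    return plan


plan = build_plan("aim-zone", ("10.0.0.10", "10.0.0.20"), ["hdfs", "hive", "spark"])
for directory in plan.directories:
    print(directory)
```

Keeping the plan as plain data makes it easy to review or log before anything is applied to the cluster.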
Isilon NameNode registration
- The Controller deploys Cloudera Manager or Ambari and registers Isilon as a NameNode
Data ingest methods
Analytic Insights Module supports three methods to ingest data from external data sources.
- Data ingest to ODC: data engineers can bring commonly used data sets to ODC, thus avoiding multiple access requests from sources
[Diagram: ingest paths numbered 1 to 3 from external data into Analytic Insights Module. Labels: DSD, Bedrock, ODC, Workspace 1, Workspace 2, ..., Workspace n]
Data ingestion into a workspace
- Access Attivio DSD from the workspace UI using user credentials via LDAP integration
- Browse discovered data sources, review sample data, and run semantic search of data sources
- Choose data sets and prepare a custom data model
- Auto-provision the custom data model to a data container in the workspace
- No operating knowledge of Bedrock required
- Ingest process runs on demand
[Diagram: control path and data path between the data user, external data sources (EDW, enterprise apps, social media, cloud SaaS, devices & sensors), and Analytic Insights Module (DSD, Bedrock, DAC, data containers, workbench VM, data store). Step labels: access data source definition & build marts; provide metadata; create & execute workflow; register metadata; ingest workflow]
Data source connection and ingestion
- Direct upload: single .csv, .xml, and .zip files
- Connectors: small/unstructured data sources
- DSD Spiders: large, complex, structured sources (e.g., data warehouse); ingest only a sample of the content from each detected table
- All connection types perform Metadata Collection and Content Ingestion
[Diagram labels: CMS connection, data profiling]
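The spider behavior described on this slide (discover every table in a large structured source, then pull only a sample of each) can be illustrated with a small stand-in. The sketch below uses Python's built-in `sqlite3` module against an in-memory database; it is not how Attivio DSD is implemented, and the `sample_tables` function and the `sales` table are hypothetical.

```python
import sqlite3


def sample_tables(conn, rows_per_table=5):
    """Return {table: (column_names, sample_rows)} for every user table."""
    samples = {}
    # Metadata collection: list the tables recorded in the SQLite catalog.
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # Content ingestion: fetch only the first few rows of each table.
        cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (rows_per_table,))
        columns = [description[0] for description in cursor.description]
        samples[table] = (columns, cursor.fetchall())
    return samples


# Stand-in for a data warehouse: one table with 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(i, "west") for i in range(100)])

samples = sample_tables(conn, rows_per_table=3)
print(samples["sales"])
```

Sampling keeps discovery cheap: the user can review column names and a few rows before requesting a full ingest.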
Unify
- Automatically generates data models
- Correlates all structured data and unstructured content
- Enables Dynamic Modeling to create a data mart with multiple sources
Provision dataset to workspace data container
- Attivio DSD triggers a Bedrock workflow to provision any dataset to the user's workspace
- The user can select the destination workspace data container to provision into, and Bedrock will execute the ingestion process
[Screenshot: example dataset SalesData]
Trigger Bedrock ingestion workflow from DSD
- Ingestion Driver is available out-of-the-box in Bedrock
- Workflow is triggered by Attivio provisioner plug-in
- Identifies the target container
- Branches to Hive/MongoDB/MySQL ingestion depending on request
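The last two bullets describe a dispatch step: identify the target container, then branch to the matching ingestion path. A minimal Python sketch of that branching follows. It does not reflect Bedrock's Ingestion Driver internals; the request fields (`container_type`, `container`, `dataset`) and the handler functions are hypothetical, and the handlers only return a description instead of loading data.

```python
def ingest_hive(dataset, container):
    return f"hive load of {dataset} into {container}"


def ingest_mongodb(dataset, container):
    return f"mongodb load of {dataset} into {container}"


def ingest_mysql(dataset, container):
    return f"mysql load of {dataset} into {container}"


# One handler per container type named on the slide.
INGESTERS = {
    "hive": ingest_hive,
    "mongodb": ingest_mongodb,
    "mysql": ingest_mysql,
}


def run_ingestion(request):
    """Pick the ingestion branch from the request's target container type."""
    container_type = request["container_type"].lower()
    handler = INGESTERS.get(container_type)
    if handler is None:
        raise ValueError(f"unsupported container type: {container_type}")
    return handler(request["dataset"], request["container"])


print(run_ingestion(
    {"container_type": "Hive", "dataset": "SalesData", "container": "workspace-1"}
))
```

A lookup table keeps the branch explicit, so adding another container type means adding one handler and one entry.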
Dataset is automatically available in workspace
- Container URL provides a way to access the data from within the workbench data container
- SalesData appears in the user's workspace automatically
- The user can view the URL and access the data using their own credentials, which are integrated with LDAP
Dirt road ingestion into workspace (BYOD)
- Step 1: register data source(s) in DAC
- Step 2: ingest data from external sources
- Use ingestion applications from Hadoop clusters to source external data
- Build ingest workflow pipelines to gather and transform data before loading to workspace containers
- Support BYOD use cases, wrangling in real-time or batch data feeds
- Develop machine learning algorithms using Spark platform on Hadoop
[Diagram: control path and data path between the data user, external data sources (EDW, enterprise apps, social media, cloud SaaS, devices & sensors), and the workspace in Analytic Insights Module (DAC, Hadoop, workbench VM, data stores)]
Data ingestion into the Data Catalog
- Step 1: design & run ingest workflow
- Step 2: execute workflow
- Step 3: register metadata
- Create a batch or real-time process in Zaloni and execute it to ingest data from external data sources
- Ability to support complex data transformations in Zaloni workflows
- Stage may contain lookups or master data for enrichment
- Ability to add ingestion rules and/or definitions to data elements
- Monitor and administer Zaloni workflow runs
[Diagram: control path and data path between the data engineer, external data sources (EDW, enterprise apps, social media, cloud SaaS, devices & sensors), and Analytic Insights Module (Bedrock, DAC, ingest workflows, stage, ODC, data store)]
Define file ingest in Bedrock
- File pattern associated with the source: Test[0-9]+.dat
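To see which file names a pattern such as `Test[0-9]+.dat` would pick up, here is a short Python check. It assumes the pattern is a regular expression that must match the whole file name; the slide does not say how Bedrock interprets or anchors the pattern, so treat this as an illustration only. The function name and sample file names are hypothetical.

```python
import re

# Pattern copied from the slide. Under regex rules the unescaped "." matches
# any single character, not only a literal dot.
FILE_PATTERN = re.compile(r"Test[0-9]+.dat")


def matches_source(file_name):
    """True when the whole file name matches the source's file pattern."""
    return FILE_PATTERN.fullmatch(file_name) is not None


names = [
    "Test1.dat",
    "Test042.dat",
    "Test.dat",       # no digits after "Test"
    "test7.dat",      # lowercase "t"
    "Test7.dat.bak",  # trailing extension
    "Test7xdat",      # "." matched the "x"
]
matched = [name for name in names if matches_source(name)]
print(matched)  # ['Test1.dat', 'Test042.dat', 'Test7xdat']
```

If only a literal dot should be accepted, the stricter form under regex semantics would be `Test[0-9]+\.dat`.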
Workflow Designer
- Orchestrate set of actions
- Supports simple/complex flows
- Hive, Spark, Spark-SQL, Shell, Java, MapReduce generic actions
- Built-in actions for CDC, Watermarking, Tokenization, Avro conversion, Parquet conversion
Transformation
- Transformation library
- Build transformations using drag-and-drop interface
- Supports Spark for efficient transformation
- Integrates with workflow module
- Metadata entity data flows
Data governor policy engine
- All data requests are intercepted and forwarded to the policy engine.
- The policy engine evaluates each request against policies and returns a modified request that gives the user policy-compliant results.
- A full audit trail at the user and data level is built automatically.
Diagram flow (BlueTalon Enforcement Points):
1. Security admins: BlueTalon policy console
2. User request from business users, data scientists, and developers (Active Directory, applications)
3. User request to the BlueTalon policy engine
4. Modified, compliant request to the ODC
5. Compliant results to the user
6. BlueTalon audit engine (security admins)
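The intercept, evaluate, rewrite, and audit sequence on this slide can be sketched in a few lines of Python. This is an illustrative toy, not BlueTalon's engine or API: the `Request` shape, the `DENIED_COLUMNS` policy table, and the `enforce` function are hypothetical, and the only policy shown is removing columns a role may not read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    user: str
    role: str
    table: str
    columns: List[str]


# Hypothetical policy store: columns a role may not read, keyed by (role, table).
DENIED_COLUMNS = {
    ("analyst", "customers"): {"ssn"},
}

# Audit trail: one entry per evaluated request.
AUDIT_LOG = []


def enforce(request):
    """Evaluate a request against policy and return a compliant copy of it."""
    denied = DENIED_COLUMNS.get((request.role, request.table), set())
    allowed = [column for column in request.columns if column not in denied]
    removed = [column for column in request.columns if column in denied]
    AUDIT_LOG.append(
        {"user": request.user, "table": request.table, "removed": removed}
    )
    # The data store only ever sees the modified request.
    return Request(request.user, request.role, request.table, allowed)


modified = enforce(Request("ana", "analyst", "customers", ["name", "ssn", "city"]))
print(modified.columns)  # ['name', 'city']
print(AUDIT_LOG[-1])
```

Because the request is rewritten before it reaches the data store, the user receives compliant results without the application needing policy logic of its own.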
Fine-grained data access policies
- Create a rule to mask a customer's Social Security Number based on user roles
- Example: 123-45-6789 is shown as 123-45-XXXX
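The masking rule in this example can be written as a small function. The sketch below is a plain Python illustration of role-based masking, not BlueTalon policy syntax; the role name `compliance_officer`, the function name, and the choice to fully mask malformed values are assumptions.

```python
import re

# Matches NNN-NN-NNNN and captures the part that stays visible.
SSN_PATTERN = re.compile(r"(\d{3}-\d{2})-\d{4}")

# Hypothetical roles that are allowed to see the full value.
UNMASKED_ROLES = {"compliance_officer"}


def apply_ssn_rule(ssn, role):
    """Return the SSN as the given role is allowed to see it."""
    if role in UNMASKED_ROLES:
        return ssn
    match = SSN_PATTERN.fullmatch(ssn)
    if match is None:
        # Unexpected format: hide everything rather than risk a leak.
        return "XXX-XX-XXXX"
    return f"{match.group(1)}-XXXX"


print(apply_ssn_rule("123-45-6789", "analyst"))             # 123-45-XXXX
print(apply_ssn_rule("123-45-6789", "compliance_officer"))  # 123-45-6789
```

Masking by default and listing only the roles that see the full value means a new or unknown role never exposes the raw number.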
Questions?
Realize your next steps
- Attend Breakout Sessions
  - Secure IT's Seat At The Table: Deliver The Business Self-Service Data Analytics, Wed. (5/10), 1:30 PM - 2:30 PM, San Polo 3405
  - IoT Analytics: A Modern Manufacturing Surveillance Use Case, Wed. (5/10), 12:00 PM - 1:00 PM, Delfino 4003
- See the Blueprint solutions in action at the Expo: Kiosks and Customer Presentations in the Converged Platforms and Solutions booth #872
- Engage with our Dell EMC Big Data and Cloud subject matter experts to learn more
- Visit dellemc.com/aim