MapR Pentaho Business Solutions: The Benefits of a Converged Platform to Big Data Integration. Tom Scurlock, Director, WW Alliances and Partners, MapR
Key Takeaways: 1. We focus on business value and business outcomes; Pentaho and MapR technologies follow from the solutions. 2. Business transformation is complex. Companies need to evaluate how data can be used to generate revenue streams using a converged data platform. 3. Companies that invest in one converged data platform for large data volumes, variety, and velocity will increase revenues, business agility, and productivity in today's digital economy.
Agenda: Why the data used in business transformation is complex; Many point solutions on multiple clusters; MapR Converged Data Platform; Infrastructure: optimized resource consumption; MapR builds a breakthrough technology; Data reference architecture; Next generation of enterprise applications: enable transformation through converged applications; To-be enterprise applications architecture; Pentaho and MapR solutions.
Why Is the Data Used in Business Transformation Complex? Data resides in multiple places. Data is messy and incomplete. Data is inconsistent. Regulatory requirements keep changing. Data is structured, semi-structured, and unstructured.
Next Generation of Enterprise Converged Applications: analytical applications (business insight) and operational applications (business performance) come together as next-gen applications, with complete access to real-time and historical data in one platform.
Business Transformation Challenges. Market forces: IT budget, C-levels, industry initiatives, cloud, mobile, big data. Business use cases: fraud detection in real time, 360 customer visibility, smart logistics, e-commerce, IoT predictive maintenance, omni-channel optimization. ESB / data integration platform: data synchronization, application integration, B2B integration, file integration, process integration, data integration. Data consumption, by interface and system: HDFS API (Hadoop & Spark cluster), REST API (classic data warehouse), HBase API (NoSQL), JSON API (document DB), JMS (message middleware), application server, ODBC/JDBC, ODBC (IBM mainframe). Business concern: data becomes expensive (I/O speed, scale/performance, security, H.A., complexity, single tenant, cluster sprawl). Expensive to stitch; fragile; limitations for speed, scale, reliability.
Many Point Solutions on Multiple Clusters vs. Converged Data Platform on a Single Cluster. Many point solutions: Hadoop cluster, stream processing, messaging platform, search server, classic data warehouse, document database, NoSQL database. These are connected, not converged; separate solutions per data type; operational concerns; expensive to operate; data replication, data movement; limited in scale. MapR Converged Data Platform: existing enterprise applications, batch & interactive analytics, and intelligent applications on top of a cloud-scale data store, analytics & ML engines, operational database, and event data streams; high availability, real time, unified security, multi-tenancy, disaster recovery, global namespace; on-premise, in the cloud, hybrid; data center, IoT/edge. It is engineered as a single platform; files, tables, documents, and streams; enterprise-grade capabilities; runs at a lower cost; converged application development platform; scale-out to any workload. (2017 MapR Technologies)
MapR Enterprise Converged Data Platform. Business use cases: supply chain visibility, omni-channel customer engagement, fraud detection in real time, predictive maintenance, smart logistics. Applications: legacy apps (ERP, CRM, BI); container apps; ML, streaming analytics; operational intelligence; complex event processing in real time; business rules engine, process automation, identifying alerts and patterns. APIs: REST API, HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API, JDBC/ODBC. Event-enabled intelligent applications. Platform: cloud-scale data store, analytics & ML engines, operational database, event data streams (pub/sub). Capabilities: high availability, real time, unified security, multi-tenancy, disaster recovery, global namespace. Deployment: on-premise, in the cloud, hybrid. Data producers.
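The slide above pairs pub/sub event data streams with a business rules engine that identifies alerts. As a minimal sketch of that idea, the Python below uses an in-memory stand-in for a stream, not the MapR or Kafka API; the `InMemoryStream` class, the `payments` topic, the event fields, and the 1000.0 threshold are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class InMemoryStream:
    """Toy publish/subscribe stream (a stand-in, not a real streaming client)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event synchronously to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(event)


def make_fraud_rule(limit: float, alerts: List[Event]) -> Handler:
    """Hypothetical business rule: flag any payment strictly above `limit`."""

    def rule(event: Event) -> None:
        if float(event["amount"]) > limit:
            alerts.append({"card": event["card"], "amount": event["amount"]})

    return rule


stream = InMemoryStream()
alerts: List[Event] = []
stream.subscribe("payments", make_fraud_rule(limit=1000.0, alerts=alerts))

# Three sample events; only the second exceeds the assumed threshold.
stream.publish("payments", {"card": "A-1", "amount": 25.0})
stream.publish("payments", {"card": "B-2", "amount": 4200.0})
stream.publish("payments", {"card": "A-1", "amount": 999.0})
```

In a real deployment the rule would subscribe to a durable topic through a streaming client; the in-memory class only shows how event producers stay decoupled from the rule.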
Optimized Resource Consumption. Hadoop: every layer contends for more CPU and memory. The stack consists of Kafka (separate cluster), HDFS (append-only), and HBase (excessive writes), each running on a Java Virtual Machine, over a Linux file system (general purpose, slower than MapR-FS, leaves HA up to other engines), over storage hardware. Changes marked on the stack: replace with less I/O and RAM consumption; eliminate layer; replace with full read-write; eliminate layer; replace with speed, connectivity, HA/DR. MapR: efficient architecture frees up resources through shared HA, DR, and I/O systems. MapR-FS + MapR-DB + MapR Streams provide fast, efficient, direct I/O over storage hardware.
MapR Built with Breakthrough Technology: Innovative architecture delivers uncompromising scale, speed, and availability. Optimized for Speed. Optimized for Scale: enables high-scale processing by organizing underlying data into large distributed containers to scale to trillions of files. Supports parallel processing of large-scale analytics and machine learning across data. Optimized for Availability: provides advanced capabilities including self-healing and disaster recovery to support continuous data access.
MapR Data Reference Architecture
Next Generation of Enterprise Converged Applications: analytical applications (business insight) and operational applications (business performance) come together as next-gen applications, with complete access to real-time and historical data in one platform.
To-Be Enterprise Application Architecture: Globally Distributed. Components: data centers worldwide; central data processing & aggregation. Flow: data ingesting into stream topics, replicating stream topics, streaming from other data sources. Outputs: ad-hoc analysis, real-time analysis, reporting.
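The flow on the slide above (regional ingestion, topic replication, central aggregation) can be sketched with a small in-memory model. This is an illustrative toy, not MapR Streams replication; the `DataCenter` class, the `sales` topic, the site names, and the `(product, units)` message shape are assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, int]  # (product, units_sold); a hypothetical payload


class DataCenter:
    """Toy data center holding append-only topics in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.topics: Dict[str, List[Message]] = defaultdict(list)
        # Per (source, topic): how many messages were already copied in.
        self._replicated: Dict[Tuple[str, str], int] = defaultdict(int)

    def ingest(self, topic: str, message: Message) -> None:
        self.topics[topic].append(message)

    def replicate_from(self, source: "DataCenter", topic: str) -> int:
        """Copy only messages not yet seen from `source`; return the count."""
        key = (source.name, topic)
        start = self._replicated[key]
        new_messages = source.topics[topic][start:]
        self.topics[topic].extend(new_messages)
        self._replicated[key] = start + len(new_messages)
        return len(new_messages)


def units_by_product(center: DataCenter, topic: str) -> Counter:
    """Central aggregation: total units per product across replicated data."""
    totals: Counter = Counter()
    for product, units in center.topics[topic]:
        totals[product] += units
    return totals


paris, tokyo, central = DataCenter("paris"), DataCenter("tokyo"), DataCenter("central")
paris.ingest("sales", ("widget", 3))
paris.ingest("sales", ("gadget", 1))
tokyo.ingest("sales", ("widget", 2))
copied = [central.replicate_from(dc, "sales") for dc in (paris, tokyo)]

# A later round picks up only what arrived since the previous replication.
tokyo.ingest("sales", ("gadget", 4))
recopied = [central.replicate_from(dc, "sales") for dc in (paris, tokyo)]
totals = units_by_product(central, "sales")
```

Tracking an offset per source keeps repeated replication rounds from duplicating messages, which is the property the central reporting and analysis steps rely on.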
Journey with MapR Converged Data Platform: Adoption + Business Value. Stage 1, Infrastructure Agility: offers ETL offloading and data optimization to reduce TCO. Stage 2, Applications Agility: delivers event-enabled apps, monitoring business decisions in real time. Stage 3, Business Agility: accelerates large-scale deployment. On-Prem/Cloud: data replication, HA between data centers in real time. Micro-Services: Docker, containers, images, Kubernetes, Docker Swarm. Next-Gen Apps: operational intelligence, streaming analytics, machine learning, IoT predictive maintenance. Business Intelligence: re-use of existing BI apps (Tableau, MicroStrategy, Cognos, Crystal, custom apps, etc.). Data Fabric: fast TCO reduction in data appliances and servers, data off-loading & optimization. Benefits: unified files, tables, and event streams under one admin; on-premise, any cloud; next-gen apps, micro-services; data normalization for structured, semi-structured, and unstructured data from multiple API sources; multi-tenancy; supports Hadoop Distributed File System & ecosystem; shared resources; data locality; supports NoSQL, HBase DB; massively parallel processing; full read/write in real time; extreme data scalability, HA, security, high performance & reliability; geo-replication. MapR Converged Data Platform.
Pentaho and MapR: Data integration, orchestration, and analytics. Data sources: DNA Variant Cloud Annotation, EHR/EMR, modality data, PACS/imaging, clinical data, patient satisfaction, administrative, financial data. Stages: Blend & Ingest (validate, cleanse, standardize); Process & Refine (MapReduce, Spark, machine learning); Deliver; Analyze; Optimize; Orchestrate. Delivery targets and outputs: analytical database, virtualized data, reports, visualization, discovery, predictive, data as a service, embedded applications. Platforms: Converged Data Platform, Redshift.
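The validate, cleanse, and standardize steps named on the slide above can be illustrated with plain Python functions. This is only a sketch under stated assumptions, not how Pentaho Data Integration implements them (Pentaho transformations are normally built in its own tooling); the field names `patient_id`, `sex`, and `source`, and the code mapping, are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical vocabulary mapping for the standardize step.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def validate(records: Iterable[Record]) -> List[Record]:
    """Keep only records that carry a non-empty patient_id."""
    return [r for r in records if (r.get("patient_id") or "").strip()]


def cleanse(records: Iterable[Record]) -> List[Record]:
    """Trim whitespace and turn empty strings into None."""
    cleaned: List[Record] = []
    for record in records:
        row: Record = {}
        for key, value in record.items():
            value = value.strip() if isinstance(value, str) else value
            row[key] = value or None
        cleaned.append(row)
    return cleaned


def standardize(records: Iterable[Record]) -> List[Record]:
    """Map free-text sex values onto one vocabulary; unknown values become None."""
    standardized: List[Record] = []
    for record in records:
        row = dict(record)
        sex = row.get("sex")
        row["sex"] = SEX_CODES.get(sex.lower()) if sex else None
        standardized.append(row)
    return standardized


def run_pipeline(records: Iterable[Record]) -> List[Record]:
    return standardize(cleanse(validate(records)))


raw = [
    {"patient_id": " 001 ", "sex": "male", "source": "EHR"},
    {"patient_id": "", "sex": "F", "source": "PACS"},
    {"patient_id": "002", "sex": " f", "source": "clinical"},
    {"patient_id": "003", "sex": "", "source": "EHR"},
]
result = run_pipeline(raw)
```

Keeping each step a separate function mirrors the staged layout of the slide and lets any one step be swapped out, for example replaced by a Spark job in the Process & Refine stage.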