Pentaho 8.0 and Beyond
Matt Howard, Pentaho Sr. Director of Product Management, Hitachi Vantara
Safe Harbor Statement
The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information purposes only and is not a commitment to deliver any new or enhanced product or functionality, or that we will pursue the product direction described. Facts and circumstances may occur which may impact current plans, resulting in changes to the information in this presentation. This information is current only as of the date it is made and should not be relied upon in making purchasing decisions. The development, release (if at all), and timing of any features or functionality described for the Pentaho products remains at the sole discretion of Pentaho.
Pentaho 8.0 and Beyond
1. Product Vision
2. Pentaho 8.0
3. Product Roadmap
Product Vision
The Power of Three
- Hitachi Data Systems: content platform, storage solutions
- Pentaho: data integration, business analytics
- Hitachi Insight Group: Lumada IoT
Pentaho Business Analytics Platform
Users: Data Engineer, Data Analyst / Data Scientist, Business Analyst, Consumer
- Pentaho Business Analytics: production reporting; interactive query and analysis; custom and self-service dashboards
- Pentaho Data Integration: data preparation; integrated machine learning
Open and embeddable, across operational data, big data, data streams, and public/private clouds
Future Vision: A Single Consistent Experience
Stages: Data Engineering, Data Prep, Analytics
Capabilities: ingestion, processing, blending, data delivery, data discovery/analysis, analysis and dashboards
Administration: security, lifecycle management, data provenance, dynamic data pipeline, monitoring, automation
Pentaho 8.0
Introducing Pentaho 8.0
Challenge #1: Data volumes and velocity are growing exponentially.
Pentaho 8.0 broadens connectivity to streaming data sources:
- Connect to Kafka streams
- Stream processing with Spark
- Big data security with Knox
Challenge #2: Processing and storage resources are constrained.
Pentaho 8.0 optimizes processing resources:
- Enhanced Adaptive Execution Layer (AEL)
- Native Avro and Parquet handling
- Worker nodes for scale-out
Challenge #3: There is a shortage of big data talent and a lack of productivity.
Pentaho 8.0 boosts team productivity across the pipeline:
- Data Explorer filters
- Improved repository UX
- Extended operations mart
Streaming for Time-Sensitive Insight
Enable use cases that require real-time processing, monitoring, and aggregation: real-time device monitoring, log-file aggregation, notifications, and more.
NEW in Pentaho 8.0:
- Kafka Producer step
- Kafka Consumer step
- Get Records from Stream step
- Spark Streaming via AEL
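Use cases such as log-file aggregation are often handled by cutting an unbounded stream into small batches and aggregating each one. The Python sketch below illustrates that idea without any Kafka client. The record shape, the function name `micro_batches`, and the count and time thresholds are hypothetical assumptions; the code is not how the PDI Kafka steps are implemented.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (event timestamp in seconds, log level).
Record = Tuple[float, str]


def micro_batches(records: Iterable[Record], max_records: int,
                  max_span_s: float) -> Iterator[List[Record]]:
    """Group a stream into batches (a sketch, not PDI's implementation).

    A batch is closed when it holds max_records records, or when the next
    record's timestamp is more than max_span_s after the batch's first record.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    batch: List[Record] = []
    for record in records:
        # Close the current batch if this record falls outside its time span.
        if batch and record[0] - batch[0][0] > max_span_s:
            yield batch
            batch = []
        batch.append(record)
        # Close the batch as soon as it reaches the size limit.
        if len(batch) >= max_records:
            yield batch
            batch = []
    if batch:
        yield batch  # flush whatever remains when the stream ends


if __name__ == "__main__":
    stream = [(0.0, "ERROR"), (0.5, "INFO"), (1.0, "ERROR"), (5.0, "WARN"), (5.1, "INFO")]
    # Aggregate log levels per batch, e.g. to decide whether to send a notification.
    for batch in micro_batches(stream, max_records=3, max_span_s=2.0):
        print(Counter(level for _, level in batch))
```

Closing batches on either size or elapsed time keeps latency bounded when traffic is light while capping memory when traffic is heavy.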
Pentaho 7.1: Adaptive Execution for Spark
PDI (Pentaho Kettle):
- No coding
- Build once
- Execute on any* engine
*Currently available engines
Enhanced Adaptive Execution
Simplified setup:
- Eliminated the ZooKeeper component
- Reduced number of setup steps
Hardened deployment:
- Fail-over at the edge
- Kerberos impersonation for the client
More flexible:
- Support for multiple run configurations
- Customize cluster settings per job type
Architecture components: PDI client; AEL-Spark daemon on edge nodes; AEL-Spark engine (Spark driver); Spark executors on the Spark/Hadoop processing nodes of the Hadoop cluster; Hadoop/Spark-compatible storage (HDFS, Azure Storage, Amazon S3, etc.)
Worker Nodes for Scaling Out
Distribute and scale:
- Scale work items across multiple nodes (containers)
- Easily add and remove resources as required
- Monitor and balance changing workloads
- Deploy on premises, in the cloud, or in hybrid environments
(Diagram: Worker Node (a), Worker Node (b), Worker Node (c))
NEW in Pentaho 8.0:
- Container framework
- Orchestration framework
- Node monitoring
- Enhanced HA implementation
Worker Nodes Architecture
- Pentaho clients and the Pentaho Server (with the Pentaho Repository)
- Orchestration framework: orchestration (scheduler, monitoring, security, etc.); a master (working), two masters (standby), and a controller (HA)
- Container framework: worker nodes WN 1 (e.g., a KJB job), WN 2 (e.g., a KTR transformation), … WN n (executors)
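The scheduling role of the orchestration framework, placing work items such as KJB jobs and KTR transformations on worker nodes, can be illustrated with a classic greedy heuristic. The Python sketch below assigns each item to the currently least-loaded node. The item costs, node names, and the function `assign_work_items` are hypothetical; Pentaho's actual orchestration logic is not described in this deck.

```python
import heapq
from typing import Dict, List, Tuple


def assign_work_items(items: List[Tuple[str, int]],
                      nodes: List[str]) -> Dict[str, List[str]]:
    """Greedy least-loaded assignment (a generic heuristic, not Pentaho's).

    Each (name, cost) item goes to the node with the smallest accumulated
    cost so far; ties are broken by node name via tuple ordering.
    """
    if not nodes:
        raise ValueError("at least one worker node is required")
    heap = [(0, name) for name in nodes]  # (current load, node name)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {name: [] for name in nodes}
    # Placing the largest items first usually spreads load more evenly
    # (the "longest processing time first" heuristic).
    for item, cost in sorted(items, key=lambda pair: pair[1], reverse=True):
        load, node = heapq.heappop(heap)
        plan[node].append(item)
        heapq.heappush(heap, (load + cost, node))
    return plan


if __name__ == "__main__":
    work = [("load.kjb", 5), ("clean.ktr", 3), ("agg.ktr", 3), ("report.kjb", 1)]
    print(assign_work_items(work, ["wn1", "wn2"]))
```

A real orchestrator would also react to node failures and live load measurements; the sketch only shows the static placement idea.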
Pentaho 7.0: Data Explorer
Access visualizations during data prep for inspection and prototyping.
Data Explorer Filters
Enhanced data inspection in PDI:
- Identify data to be cleaned or removed
- Deliver data to the business more quickly
ENHANCED in Pentaho 8.0:
- Numeric filters
- String filters
- Include/exclude data points
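The three filter types listed above can be illustrated with a minimal Python sketch. It assumes rows are plain dictionaries, that a numeric filter is an inclusive range check, and that a string filter is a case-insensitive substring match; the names `numeric_filter`, `string_filter`, and `apply_filters` are hypothetical and do not reflect PDI's internal API.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def numeric_filter(field: str, low: float, high: float) -> Predicate:
    """Hypothetical numeric filter: keep rows whose value lies in [low, high]."""
    def pred(row: Row) -> bool:
        value = row.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return pred


def string_filter(field: str, substring: str) -> Predicate:
    """Hypothetical string filter: case-insensitive substring match."""
    def pred(row: Row) -> bool:
        value = row.get(field)
        return isinstance(value, str) and substring.lower() in value.lower()
    return pred


def apply_filters(rows: Iterable[Row], predicates: List[Predicate],
                  include: bool = True) -> List[Row]:
    """Keep rows matching all predicates, or drop them when include=False."""
    return [row for row in rows if all(p(row) for p in predicates) == include]


if __name__ == "__main__":
    rows = [
        {"region": "North", "sales": 120},
        {"region": "South", "sales": -5},
        {"region": "north-east", "sales": 300},
    ]
    # Include: northern regions with sales in a plausible range.
    print(apply_filters(rows, [string_filter("region", "north"),
                               numeric_filter("sales", 0, 1000)]))
    # Exclude: remove suspect data points such as negative sales.
    print(apply_filters(rows, [lambda r: r["sales"] < 0], include=False))
```

Expressing each filter as a predicate lets include and exclude share one code path: excluding is simply keeping the rows that do not match.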
Pentaho 8.0 Complete
Data Integration
- Filters in Data Explorer for enhanced data inspection during prep
- New PDI repository dialogs for better usability
- Run configurations for jobs for a seamless user experience
Big Data
- Stream data processing to simplify near-real-time integration with Kafka
- Enhanced AEL for reliability, performance, and security
- Big data file formats to support crucial Hadoop use cases
- Big data security with the HDP Knox Gateway
- VFS improvements for named Hadoop clusters
Enterprise Platform
- Worker nodes scale-out to drive superior agility and TCO for enterprises
- Ruby theme: new platform branding
Additional Items
- Ops Mart for Oracle, MySQL, and SQL Server
- Big Data sandbox VM updates
- Platform password security improvements
- PDI Mavenization for infrastructure alignment
- Documentation improvements on help.pentaho.com
Product Roadmap
Roadmap Initiatives
Pentaho foundational investment areas:
- Visual Data Experience: data exploration, visual data prep, embedded analytics, data catalog
- Big Data Processing: adaptive execution, Spark execution, stream processing, machine learning
- Enterprise Platform: scale-out deployment, metadata management, operations management, cloud deployment
Emerging trends and technology: advanced analytics, real-time
Strengthening the Bridge Between Data and Insight
Data Explorer:
- Visual data inspection
- Intuitive data prep
- Advanced visualization
Catalog:
- Governed access
- Searchable metadata
- Collaboration
(Diagram: Data Explorer and Catalog spanning Source 1 through Source 5)
Inline Data Prep Vision
Intuitive, Excel-like transformation design:
- Inline model
- Inline transformation
- Integrated profiling
Mockup field statistics panel: Field Type: Integer; Records: 10,000; Cardinality: 273; Min <count>: 1; Max <count>: 23; Bin Size (%): Quintile; Merge Fields
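A field-statistics panel like the one in the mockup can be approximated in a few lines. The Python sketch below computes the record count, cardinality, the smallest and largest per-value occurrence counts, and quintile bins for an integer column. The function names are hypothetical, and reading "Min <count>" and "Max <count>" as per-value frequencies is an assumption about the mockup rather than a documented definition.

```python
import bisect
import statistics
from collections import Counter
from typing import Dict, List


def field_statistics(values: List[int]) -> Dict[str, object]:
    """Profile one column (a sketch). 'min_count' and 'max_count' are taken to
    mean the fewest and most occurrences of any distinct value (an assumption)."""
    if not values:
        raise ValueError("cannot profile an empty field")
    frequencies = Counter(values).values()
    return {
        "field_type": "Integer" if all(isinstance(v, int) for v in values) else "Mixed",
        "records": len(values),
        "cardinality": len(frequencies),
        "min_count": min(frequencies),
        "max_count": max(frequencies),
    }


def quintile_bins(values: List[int]) -> List[int]:
    """Label each value with a quintile bin from 0 (lowest) to 4 (highest).

    statistics.quantiles(n=5) returns the four cut points between quintiles;
    bisect_right places a value equal to a cut point in the higher bin.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute quintiles")
    cuts = statistics.quantiles(values, n=5, method="inclusive")
    return [bisect.bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    column = [3, 3, 3, 7, 7, 9, 12, 15, 15, 20]
    print(field_statistics(column))
    print(quintile_bins(column))
```

Quintile binning puts roughly a fifth of the values in each bin, which suits skewed columns better than fixed-width bins.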
Pentaho Machine Learning Orchestration
Roadmap projects that serve the emerging needs of data scientists:
- Catalog
- Data Explorer
- Notebook integrations
- Adaptive Execution
- Native algorithms
Pentaho Roadmap
Features and dates are subject to change.
Timeline: Nov 2017 (8.0), 1H18 (8.1), Future
- Visual Data Experience: Data Explorer Filters; Catalog I; Visual Profiling; Catalog Search; Data Prep from DET; Layout Manager; New User Console; Data Science Viz; Real-time Viz
- (Big) Data Processing: Kafka Interface; Spark Streaming; Parquet and Avro; Enhanced AEL; Streaming II; Enhanced JSON/XML/ORC; AEL: extend distros; Advanced Profiling; Rules Validator; Native ML algorithms; AEL Flink; Thin Kettle (Composer); Web Designer; Data Operations Mgr.; AEL Next
- Enterprise Platform: Scale-out Framework; Foundry Integration; Unified Monitoring; Harden Metadata Bridges; Enhanced Upgrade; Enhanced Security; New Content Lifecycle; Metadata Manager; Business Glossary; Multi-tenancy; Vantara Integrations (listed in all three periods)
- Ecosystem: AEL on HDP and MapR; Google Cloud Platform; Cassandra/NoSQL update; Multi-cloud Orchestration; Cloud App Connectors; Mainframe; Enhanced SAP and SFDC
Hitachi Vantara Portfolio
- Application framework: Studio, Dashboards, Visualization, Notifications, App Development, Edge Processing, Asset Management
- Data: asset registry, data catalog, metadata management, modeling and lineage, governance
- Data integration: data connectors, transformation engines, profiling and quality, data blending, data preparation
- Analytics: business analytics, content analytics, artificial intelligence, batch and stream, search
- Foundry software service platform: workflow, scheduling, security, clustering, repository, monitoring
- Storage and infrastructure: flash storage, storage, converged infrastructure, automated management, data protection
IoT Solutions from Edge to Outcomes
Target domains: Smart Data Center, Smart Business, Smart Industry, Smart City
Flow: Edge → Fog layer → Core → Insights → Outcomes
Inputs: sensors, things, and people
Pipeline functions: telemetry, edge filtering, asset registry, stream queues, ingest, process, visualize, model, predict, notify
Components: Lumada IoT Data Pipeline, IoT Analytic Processor
Unlock the Business Value in YOUR Data
Your strategy: the need for better insights to achieve better outcomes
Your insights: big data analytics (Pentaho) and content exploration (Hitachi Content Intelligence), with Hitachi Content Platform
Your data: transactional data; email and documents; video, image, and audio; social media; IT, sensor, and machine logs
Summary
What we covered today:
- Product Vision
- Pentaho 8.0 Release
- Product Roadmap
Next Steps
Want to learn more about Pentaho 8.0 and the product roadmap?
Other recommended breakout sessions:
- Processing Big Data with Pentaho (Rakesh Saha)
- Operating Pentaho at Scale (Jens Bleul)
Solution Expo: Pentaho 8.0 and Beyond, Lumada IoT Platform, Hitachi Content Platform, Spark Processing, and more.
Pentaho 8.1 Preview
Some candidate projects:
- Enhanced streaming
- Enhanced profiling
- Google Cloud Platform
- Unified monitoring and logging
- Enhanced metadata handling
Pentaho 8.1 expected availability: Q2 2018