Pentaho 8.0 Overview. Pedro Alves

Similar documents
Cask Data Application Platform (CDAP)

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Oracle Big Data Cloud Service

ETL challenges on IOT projects. Pedro Martins Head of Implementation

Microsoft Azure Essentials

Microsoft Big Data. Solution Brief

Hybrid Data Management

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

Data Analytics. Nagesh Madhwal Client Solutions Director, Consulting, Southeast Asia, Dell EMC

How to Build Your Data Ecosystem with Tableau on AWS

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

Stuck with Power BI? Get Pyramid Starting at $0/month. Start Moving with the Analytics OS

Deploying Microservices and Containers with Azure Container Service and DC/OS

Bringing the Power of SAS to Hadoop Title

Microsoft FastTrack For Azure Service Level Description

MapR Pentaho Business Solutions

Oracle Big Data Discovery Cloud Service

Simplifying the Process of Uploading and Extracting Data from Apache Hadoop

THE MAGIC OF DATA INTEGRATION IN THE ENTERPRISE WITH TIPS AND TRICKS

Fast Start Business Analytics with Power BI

Oracle Enterprise Data Quality Product Roadmap and Statement of Direction. October 2016

Copyright 2014, Oracle and/or its affiliates. All rights reserved. 2

Oracle Autonomous Data Warehouse Cloud

Innovate with Oracle Public Cloud Platform & Infrastructure Services

Oracle Big Data Discovery The Visual Face of Big Data

Data Center Operating System (DCOS) IBM Platform Solutions

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

Improving Business Value, Through better solutions.

Oracle Autonomous Data Warehouse Cloud

Analytics empowering clients to see farther & go faster

Hadoop and Analytics at CERN IT CERN IT-DB

Application Integrator Automate Any Application

Oracle Cloud Blueprint and Roadmap Service. 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Conclusion.

Oracle Enterprise Manager 13c Cloud Control

Integrating MATLAB Analytics into Enterprise Applications

Is SharePoint 2016 right for your organization?

Real-Time Streaming: IMS to Apache Kafka and Hadoop

Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics

Microsoft BI Product Suite

HP SummerSchool TechTalks Kenneth Donau Presale Technical Consulting, HP SW

Understanding the Business Value of Docker Enterprise Edition

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia

Hadoop in the Cloud. Ryan Lippert, Cloudera Product Cloudera, Inc. All rights reserved.

Solution Components Sugar 6.5 Release

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB

Expert Reference Series of White Papers. Microsoft Service Manager Simplified

POWER BI OVERVIEW & FEATURES JANUARY 2017, SINGAPORE. Khilitchandra Prajapati

Oracle Paas. Rino Weggers, Customer Success Manager Frank Brink, Customer Success Manager November 17, 2015

Microsoft Dynamics 365 and Columbus

Bringing Big Data to Life: Overcoming The Challenges of Legacy Data in Hadoop

Common Customer Use Cases in FSI

ETL on Hadoop What is Required

Mit Werkzeugen den DevOps Konflikt auflösen

20332B: Advanced Solutions of Microsoft SharePoint Server 2013

Hortonworks Powering the Future of Data

E-guide Hadoop Big Data Platforms Buyer s Guide part 3

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Middleware Modernization: lay the foundation to your digital success

Copyright - Diyotta, Inc. - All Rights Reserved. Page 2

Sr. Sergio Rodríguez de Guzmán CTO PUE

The Alpine Data Platform

Jason Virtue Business Intelligence Technical Professional

Faizer Feroz Director Enterprise Applications Herbalife. Scott Haaland Product Strategy Director Service Integration Product Management

IBM SPSS & Apache Spark

Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect

ARCHITECTURES ADVANCED ANALYTICS & IOT. Presented by: Orion Gebremedhin. Marc Lobree. Director of Technology, Data & Analytics

Cask Data Application Platform (CDAP) The Integrated Platform for Developers and Organizations to Build, Deploy, and Manage Data Applications

Oracle s Service-Oriented Architecture Strategy

WHITE PAPER SPLUNK SOFTWARE AS A SIEM

F5 Visualization and Analytics. Nishant Shah Sr. Product Manager

Service Catalog ATTOSOL TECHNOLOGIES.

The IBM Reference Architecture for Healthcare and Life Sciences

Product presentation. Fujitsu HPC Gateway SC 16. November Copyright 2016 FUJITSU

Turn Data into Business Value

Leveraging Oracle Big Data Discovery to Master CERN s Data. Manuel Martín Márquez Oracle Business Analytics Innovation 12 October- Stockholm, Sweden

TIBCO Live Datamart providing an operational command and control center in a virtual train application.

Creating an integrated plug-and-play supply chain with serverless computing

Setup HSP Ambari Cluster

POWER BI BOOTCAMP. COURSE INCLUDES: 4-days of instructor led discussion, Hands-on Office labs and ebook.

Big Data Analytics for Retail with Apache Hadoop. A Hortonworks and Microsoft White Paper

Cloud Based Analytics for SAP

Analyze Big Data Faster and Store it Cheaper. Dominick Huang CenterPoint Energy Russell Hull - SAP

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Architecture Overview for Data Analytics Deployments

IIOT Data Access with the PI System

Deloitte School of Analytics. Demystifying Data Science: Leveraging this phenomenon to drive your organisation forward

: 20776A: Performing Big Data Engineering on Microsoft Cloud Services

Gyors piacra jutás felhő platformon - hagyományos IT fejlesztés nélkül? Petrohán Zsolt

What s new in Teamcenter Service Pack

Building Your Big Data Team

Setting the Foundation for Improved Business Agility

High Speed Video Processing for Big Data Applications

Welcome to. enterprise-class big data and financial a. Putting big data and advanced analytics to work in financial services.

Take a Tour of Native Hybrid Cloud & Neutrino. Modern, cloud native platforms

Exelon Utilities Data Analytics Journey

: Boosting Business Returns with Faster and Smarter Data Lakes

Azure PaaS and SaaS Microsoft s two approaches to building IoT solutions

Transcription:

Pentaho 8.0 Overview Pedro Alves

Safe Harbor Statement The forward-looking statements contained in this document represent an outline of our current intended product direction. It is provided for information purposes only and is not a commitment to deliver any new or enhanced product or functionality, or that we will pursue the product direction described. Facts and circumstances may occur which may impact current plans, resulting in changes to the information in this presentation. This information is current only as of the date it is made and should not be relied upon in making purchasing decisions. The development, release (if at all), and timing of any features or functionality described for the Pentaho products remains at the sole discretion of Pentaho.

Pentaho Business Analytics Platform Data Engineer Data Analyst / Data Scientist Business Analyst Consumer Production Reporting Interactive Query and Analysis Custom and Self-Service Dashboards Pentaho Data Integration Data Preparation Integrated Machine Learning learning OPEN AND EMBEDDABLE Operational Data Big Data Data Stream Public/Private Clouds

Future Vision: A Single Consistent Experience Data Engineering Data Prep Analytics Ingestion Processing Blending Data Delivery Data Discovery Analysis & Dashboards Administration Security Lifecycle Management Data Provenance Dynamic Data Pipeline Monitoring Automation

Introducing Pentaho 8.0 Challenge #1 Data volumes and velocity are growing exponentially Pentaho 8.0 Broadens connectivity to streaming data sources Connect to Kafka streams Stream processing with Spark Big data security with Knox Challenge #2 Processing and storage resources are constrained Pentaho 8.0 Optimizes processing resources Enhanced Adaptive Execution (AEL) Native Avro and Parquet handling Worker nodes for Scale-out Challenge #3 Shortage of Big Data talent and lack of productivity Pentaho 8.0 Boosts team productivity across the pipeline Data explorer filters Improved repository UX Extended operations mart

Pentaho 8

Platform and Scalability Worker Nodes New theme

Worker Nodes for Scaling Out Scale work items across multiple nodes (containers) Easily add and remove resources as required Distribute and Scale Worker Node (a) Worker Node (b) Worker Node (c ) Monitor and balance changing workloads Deploy on premise, cloud and hybrid NEW in Pentaho 8.0 ü Container framework ü Orchestration framework ü Node monitoring ü Enhanced HA implementation

Worker Nodes Architecture WORKER NODES Orchestration Framework Orchestration (Scheduler, monitoring, security, etc.) Powered by Pentaho Clients Master (Working) Master (Standby) Controller (HA) Master (Standby) Pentaho Server Container Framework Pentaho Repository WN 1 e.g. KJB WN 2 e.g. KTR WN n Executor

New Theme - Ruby

Data Integration Streaming support! Run configurations for Jobs Filters in Data Explorer New Open / Save experience

Streaming for Time Sensitive Insight Enable use cases that require realtime processing, monitoring and aggregation Real-time device monitoring Log-file aggregation Notifications And more NEW in Pentaho 8.0 ü Kafka Producer Step ü Kafka Consumer Step ü Get records from stream Step ü Spark streaming via AEL

Run configurations for Jobs Run Configuration Execute on server Leverage worker nodes

Pentaho 7.0 Data Explorer Access visualizations during data prep for inspection and prototyping

Data Explorer Filters Enhanced data inspection in PDI Identify data to be cleaned or removed Deliver data to the business more quickly New in Pentaho 8.0 ü Numeric filters ü String filters ü Include/Exclude data points

New PDI Repository Dialogs Enhanced user experience for content repository Consolidated open and save dialogs Enhanced search Recently used files option Sticky directories Business Benefits Increased productivity

Big Data Improvements on AEL Big Data File Formats - Avro and Parquet Big Data Security - Support for Knox VFS improvements for Hadoop Clusters

Pentaho 7.1 Adaptive Execution for Spark PDI ü No Coding Pentaho Kettle ü Build Once ü Execute on Any* Engine *Currently Available Engines

Enhanced Adaptive Execution Simplified setup HADOOP CLUSTER Eliminated Zookeeper component Reduced number of setup steps Hardened deployment PDI Client AEL-Spark Daemon on Edge Nodes Spark/Hadoop Processing Nodes Spark Executors Fail-over at the edge Kerberos impersonation for client More flexible AEL-Spark Engine (Spark Driver) HDFS Hadoop/Spark Compatible Storage Cluster Azure Storage Amazon S3 Etc Support multiple run configurations Customize cluster settings per job type

Native Avro and Parquet Handling Visual handling of data files with common big data file formats Reading and writing files with specific steps Natively executes in Spark via AEL NEW in Pentaho 8.0 ü Avro Input and Output Steps ü Parquet Input and Output Steps

Knox Gateway for More Secure Clusters Pentaho can interact with data services in Knox gateway protected Hadoop clusters* Benefits Highly secured Knox-protected Hadoop clusters can now be integrated in ETL pipeline *Knox is only available for Hortonworks HDP distro.

VFS improvements for Hadoop Clusters In order to simplify the overall lifecycle of jobs and transformations we made the hadoop clusters available through VFS, on the format hc://hadoop_cluster/.

Others Ops Mart for Oracle, MySQL, SQL Server Platform password security improvements PDI mavenization Documentation changes on help.pentaho.com Feature Removals: Analyzer on MongoDB Mobile Plug-in (Deprecated in 7.1)

Product Documentation Updates MindTouch Documentation Improvements (continued) Streamlining of site content pages: Simpler site structure and click paths to accelerate user content access Eliminates pages with little content value Fun fact: 7.1 contained 1200+ pages; 8.0 contains 900+ pages with no elimination of valuable content Ongoing migration of big data steps from the wiki pages to the MindTouch Doc site Business Benefits Improved access to on-line documentation

Pentaho 8.0 - Detailed Data Integration Drag and drop numeric and non-numeric filters in Data Explorer Chart actions in Data Explorer to include or exclude data Cleaner and easier design/ux for PDI Repository dialogs Consolidated PDI Repository dialogs, replacing multiple pop-ups PDI Repository Dialogs remember last opened directory Jobs can leverage Run Configurations for seamless user experience Run Configuration option to run jobs on Pentaho Server Streaming Kafka Producer step for outputting streaming data Kafka Consumer step for streaming data input Get Records from Stream step for stream processing Ability to run AEL on Spark Streaming for transformations that use Kafka streaming ingest Enterprise Platform Worker Nodes Scale-Out solution to drive superior agility Ops Mart for Oracle, MySQL, SQL Server Ruby Theme new platform branding for browser and client tools Platform password security improvements PDI Mavenization for infra alignment Big Data Simplified Spark AEL setup Spark AEL fail-over and load balancing Support for Spark AEL on Hortonworks Enhanced security for Spark AEL including Kerberos impersonation and secure connectivity Customization of Spark AEL processing (i.e. memory and other settings) Parquet input and output steps Avro input and output steps Big Data Security with HDP Knox Gateway VFS Improvements for named Hadoop clusters Additional Items Big Data Sandbox VM updates Cumulative service pack framework Documentation improvements on help.pentaho.com

Q&A