Data Science, realizing the Hype Cycle. Luigi Di Rito, Director Data Science Team, SAP Center of Excellence
Data Science, Machine Learning and Artificial Intelligence

Artificial Intelligence: intelligence exhibited by machines.
  Broadly defined to include any simulation of human intelligence.
  Expanding and branching areas of research, development and investment.
  Includes robotics, rule-based reasoning, natural language processing (NLP) and knowledge representation techniques (knowledge graphs).

Data Science (aka Predictive / Advanced Analytics):
  Algorithmic and computational techniques and tools for handling large data sets.
  Increasingly focused on preparing and modeling data for ML and DL tasks.

Machine Learning: a subfield of AI which aims to teach computers to do tasks with data, without explicit programming.
  Uses numerical and statistical approaches, including artificial neural network techniques, to encode learning.
  Models are built using training computation runs and can also train through usage.

Deep Learning: a subfield of ML that uses specialized computational techniques, typically multi-layer (2+) artificial neural networks.
  Layering allows cascaded learning and abstraction levels (e.g. line recognition -> shape -> object -> scene).
  Computationally intensive; enabled by clouds, GPUs and increasingly specialized hardware such as FPGAs and new custom hardware.

Areas of AI (diagram labels): Deep Learning, Rule-based Reasoning, Machine Learning, Natural Language Processing, Translation, Machine Vision, Speech (Speech to Text, Text to Speech), Robotics, Autonomous Vehicles.
Is Machine Learning something new?

What is machine learning?
  Computers learn from data without being explicitly programmed.
  Machine learning is not a new concept, but it is now becoming mainstream.

Why now?
  Big Data (for example, cloud applications, the Internet of Things, data lakes)
  Massive improvements in hardware (graphics processing units [GPUs] and multicore)
  Deep learning and other new algorithms performing complex operations
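To make "learning from data without being explicitly programmed" concrete, here is a minimal sketch in Python. It is not from the original slides. The data values and the names `fit_line` and `predict` are invented for illustration. The relationship between input and output is never written into the code; it is estimated from example pairs by ordinary least squares.

```python
from statistics import mean


def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up training data (assumption): machine load versus observed temperature.
loads = [1.0, 2.0, 3.0, 4.0]
temps = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(loads, temps)


def predict(load):
    """Apply the learned parameters to an unseen input."""
    return slope * load + intercept


prediction = predict(5.0)
```

With other training pairs the same code would learn other parameters, which is the practical difference from a hand-written rule.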
Typical Use Cases for Machine Learning
  Predictive Maintenance
  Predictive Quality
  Fraud Detection
  Root-cause Analysis
  Risk Management
  Price / Volume Prediction
  Churn Prediction
  Demand / Revenue Forecasting
  Price Optimization
  Elasticity Understanding
  Customer Segmentation
Data Science is an iterative process
CRISP-DM: Cross Industry Standard Process for Data Mining
Phases of the cycle: Business Understanding, Data Understanding, Data Preparation, Data Modeling, Evaluation, Deployment, Monitoring
Is it all about creating models?
Only 20% of the time is spent on modeling / algorithms (source: Rexer Analytics).
Real scenarios require a Data Science Platform

1. Data Access: integrate and transform data from flat files, relational databases or Hadoop for consumption in statistical models.
2. Data Preparation: automate data acquisition and transformation to create input data sets for data science use cases.
3. Advanced Analytics: utilize SAP PA automated modelling, the HANA Predictive Analytics Library, R, 3rd party and open source ML, text analytics or geospatial engines to generate new business insight.
4. Business Process Integration: integrate results with existing business processes or develop new business applications using SCP, BI, HANA SQL and HTML5 (SAP UI5).

Diagram labels: Value; Access, Storage, Preparation, Visualization, Analysis, Process, Operate.
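The data access and data preparation steps can be sketched as follows. This is only an illustration in plain Python and not how an SAP platform implements them. The column names, the sample values and the `master` lookup (standing in for a relational table) are assumptions made up for the example.

```python
import csv
import io


def load_sensor_rows(csv_text):
    """Parse flat-file text into typed rows, skipping rows with missing values."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        if not record["pressure"] or not record["temperature"]:
            continue  # incomplete measurement, not usable as model input
        rows.append({
            "part_id": record["part_id"],
            "pressure": float(record["pressure"]),
            "temperature": float(record["temperature"]),
        })
    return rows


def join_with_master_data(rows, master):
    """Attach the material from a lookup keyed by part_id (hypothetical master data)."""
    return [
        {**row, "material": master[row["part_id"]]}
        for row in rows
        if row["part_id"] in master
    ]


# Made-up inputs (assumption): a small flat file and a master-data lookup.
csv_text = "part_id,pressure,temperature\nA,1.2,180\nB,,185\nC,1.4,190\n"
master = {"A": "PP", "C": "ABS"}

input_data_set = join_with_master_data(load_sensor_rows(csv_text), master)
```

Row B is dropped because its pressure value is missing, so the resulting input data set contains parts A and C only.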
The Data Scientist Profiles
  The Data Scientist
  The Data Architect
  The Data / Analytics Manager
  The Data / Business Analyst
  The Data Engineer / Administrator
Data Driven Application Development: Process Steps and Owners
Process steps: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, Deploy, Operate
Owners: Business Expert, Business Analyst, Data Scientist, IT
Data sources: SAP Data, Non-SAP Data
Platform stages: Data Access, Data Preparation, Analysis (Exploration), Analysis (Production), Visualization, Process Integration
Machine Learning @SAP: how do we explain it to customers?
Diagram axes: number of users, ML skills
Modes: Mode 1, Mode 2
Personas: Business Analyst, Citizen Data Scientist, Developer, Data Scientist
Intelligent apps: Embedded Intelligence (S/4, SAC, Hybris), New Intelligent Applications
Analytical tools: SAP Predictive Analytics, 3rd Party Analytical Tools
Platform capabilities: PAL, APL, Graph, Text, Streaming, PAI; Machine Learning Cloud Services
Open Source & 3rd Party Machine Learning: R, Python, TensorFlow, Spark ML, Hadoop
Automating predictive quality pipelines based on image features
SAP Data Hub use case
December 2017
Efficiently Identifying Faults Earlier
Diagram labels: ERP Data; Sensors (Pressure, Temperature); Raw material; Manufacturing; Molding press; IR cameras; IR image features
Business outcome:
  Reduce waste and rework due to higher accuracy of quality checks
  Early issue detection due to solution performance
2017 SAP SE or an SAP affiliate company. All rights reserved. | PUBLIC
Orchestration & Pipelines - High Level Architecture
Diagram labels: SAP Data Hub (Sensor Pipeline, Streaming Data, Image Processing Pipeline, Feature Extraction); SAP HANA Platform (Data Modeling, Predictive); XSA Application Server; HTML5 User Interface; ERP Data (business & logistics data)
Backend: Data Hub task workflow
Steps: 1. Stream Data, 2. Extract Features, 3. Classify
The workflow simulates the arrival of an image and a new sensor data stream.
The process continues with two operators: one streams the data to HANA, the other performs the Python image processing to extract the features.
Finally, a stored procedure in HANA using PAL applies a classification model to evaluate the quality.
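The three steps of this workflow can be sketched in plain Python. This is an illustrative stand-in, not the Data Hub operators or the PAL stored procedure described on the slide. The `Reading` type, the feature names, the thresholds and the sample values are all assumptions, and the fixed rule in `classify` only takes the place of a trained classification model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    part_id: str
    pressure: float
    temperature: float
    ir_image: list  # 2D list of pixel intensities (hypothetical format)


def stream_data(readings):
    """Step 1: hand over readings one at a time, simulating their arrival."""
    for reading in readings:
        yield reading


def extract_features(reading, hot_threshold=200):
    """Step 2: reduce the IR image to a few numeric features."""
    pixels = [p for row in reading.ir_image for p in row]
    return {
        "part_id": reading.part_id,
        "pressure": reading.pressure,
        "temperature": reading.temperature,
        "ir_mean": mean(pixels),
        "ir_max": max(pixels),
        "ir_hot_fraction": sum(p > hot_threshold for p in pixels) / len(pixels),
    }


def classify(features, max_hot_fraction=0.25):
    """Step 3: placeholder rule standing in for the trained model."""
    return "NOK" if features["ir_hot_fraction"] > max_hot_fraction else "OK"


def run_pipeline(readings):
    """Chain the three steps and return a quality verdict per part."""
    return {r.part_id: classify(extract_features(r)) for r in stream_data(readings)}


# Made-up sample data (assumption): two parts with tiny 2x2 IR images.
sample = [
    Reading("A", 1.2, 180.0, [[100, 120], [110, 130]]),
    Reading("B", 1.3, 185.0, [[100, 250], [240, 130]]),
]
results = run_pipeline(sample)
```

In this sample, part B has two of four pixels above the hot threshold and is therefore rated NOK, while part A is rated OK.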
Backend: Data Hub Pipelines, 1. Stream Data
Backend: Data Hub Pipelines, 2. Extract Features
Backend: Data Hub Pipelines, 3. Classify
Frontend: Monitoring UI
Track the products on the production line together with their quality check results.
An IR image of the production line is shown for optical validation.
The main contributing variables are shown with their values; a value over its limit is indicated in red font.
The operator can notify engineers if irregularities occur.
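The limit check behind the red-font indication could look roughly like this. It is a sketch under assumptions and not the actual UI logic. The variable names, limit values and function names are invented for illustration.

```python
def flag_variables(values, limits):
    """Mark each contributing variable whose value exceeds its configured limit."""
    rows = []
    for name, value in values.items():
        limit = limits.get(name)  # variables without a limit are never flagged
        over = limit is not None and value > limit
        rows.append({"name": name, "value": value, "over_limit": over})
    return rows


def needs_notification(rows):
    """True if at least one variable is over its limit."""
    return any(row["over_limit"] for row in rows)


# Made-up values and limits (assumption).
values = {"pressure": 1.5, "temperature": 210.0}
limits = {"pressure": 2.0, "temperature": 200.0}

rows = flag_variables(values, limits)
notify = needs_notification(rows)
```

A front end would render the rows with `over_limit` set in red. Here that applies to the temperature only, and `notify` tells the operator that engineers should be informed.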
Thank you.
Luigi Di Rito, Director Data Science Team, SAP EMEA Center of Excellence