How to build and deploy machine learning projects

Similar documents
GOVERNMENT ANALYTICS LEADERSHIP FORUM SAS Canada & The Institute of Public Administration of Canada. April 26 The Shaw Centre

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

DevOps and Machine Learning. Jasjeet Thind VP, Data Science & Engineering, Zillow

Evaluation of Machine Learning Algorithms for Satellite Operations Support

Is Machine Learning the future of the Business Intelligence?

Machine Learning Techniques For Particle Identification

MACHINE LEARNING IN THE OIL AND GAS INDUSTRY

Automated Embedded AI Asset Intelligence. Jean-Michel Cambot Founder & Chief Evangelist

COGNITIVE QA: LEVERAGE AI AND ANALYTICS FOR GREATER SPEED AND QUALITY. us.sogeti.com

Professor Dr. Gholamreza Nakhaeizadeh. Professor Dr. Gholamreza Nakhaeizadeh

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and

Machine Learning 2.0 for the Uninitiated A PRACTICAL GUIDE TO IMPLEMENTING ML 2.0 SYSTEMS FOR THE ENTERPRISE.

How to create an Azure subscription

VICE PRESIDENT, ARCHITECTURE GENERAL MANAGER, AI PRODUCTS GROUP - INTEL

Oracle Spreadsheet Add-In for Predictive Analytics for Life Sciences Problems

Predicting Customer Purchase to Improve Bank Marketing Effectiveness

Analytics with Intelligent Edge Sergey Serebryakov Research Engineer IEEE COMMUNICATIONS SOCIETY

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

DATA SCIENCE: HYPE AND REALITY PATRICK HALL

Analytics for Banks. September 19, 2017

TECHNOLOGY BRIEF Intapp and Applied AI. THE INTAPP PROFESSIONAL SERVICES PLATFORM Unified Client Life Cycle

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING

Forecasting Seasonal Footwear Demand Using Machine Learning. By Majd Kharfan & Vicky Chan, SCM 2018 Advisor: Tugba Efendigil

Medical device company ensures product quality while saving hundreds of thousands of dollars

Week 1 Unit 1: Intelligent Applications Powered by Machine Learning

Disruptive Technologies Data Science & AI-Machine Learning

AI in Marketing Supercharging your marketing pipeline with SaaSberry BI s AI-driven solutions.

SAS Machine Learning and other Analytics: Trends and Roadmap. Sascha Schubert Sberbank 8 Sep 2017

WiFi MSFTGUEST msevent439sh

Data Analytics for Semiconductor Manufacturing The MathWorks, Inc. 1

Artificial Intelligence in Automotive Production

What s new in Machine Learning across the Splunk Portfolio

Approaching an Analytical Project. Tuba Islam, Analytics CoE, SAS UK

Data Analysis in the Internet of Things: IoT capabilities with MATLAB/Simulink

SmartCare. SPSS Workshop. Rick Durham - North American Advanced Analytics Channel Team IBM Corporation. Date: 5/28/2014

TECHNICAL PAPER PRESENTATION

SAP Predictive Analytics Suite

MATLAB and the Internet of Things (IoT): Collecting and Analysing IoT Data Gabriele Bunkheila Technical Marketing, MathWorks

Big Data. Methodological issues in using Big Data for Official Statistics

Introducing Analytics with SAS Enterprise Miner. Matthew Stainer Business Analytics Consultant SAS Analytics & Innovation practice

Paul Chang Senior Consultant, Data Scientist, IBM Cloud tw.ibm.com

INTELLIGENT SYSTEM FOR PREDICTING BEHAVIOR OF ELECTRICAL ENERGY CONSUMPTION

TIBCO Data & Analytics Overview

ThingSpeak - IoT Platform with MATLAB Analytics

Get Started Today with Smart Connected Operations

Introduction to Data Science for PI Data for PI Professionals

Intel s Machine Learning Strategy. Gary Paek, HPC Marketing Manager, Intel Americas HPC User Forum, Tucson, AZ April 12, 2016

Machine Learning For Enterprise: Beyond Open Source. April Jean-François Puget

Text Categorization. Hongning Wang

Text Categorization. Hongning Wang

Using OSIsoft Cloud Services to Fuel Cognitive Computing and Machine Learning

Smart Mortgage Lending

Machine Learning - Classification

Implementing Instant-Book and Improving Customer Service Satisfaction. Arturo Heyner Cano Bejar, Nick Danks Kellan Nguyen, Tonny Kuo

STATEMENT OF WORK. Statement of Work for Clinical Reasoning and Prediction System Assessment Page 1 of 7

Artificial Intelligence and Machine Learning in IoT

SFS 2.0 IN ACTION DIGITAL EXCELLENCE

Real Applications of Big Data. Challenges and Opportunities

WIN BIG WITH GOOGLE CLOUD

Artificial Intelligence in Canada Mira V Perry

IBM SPSS & Apache Spark

Using Artificial Intelligence and Machine Learning for Healthcare Machines

Conclusion.

Machine Learning for Mobile Platforms

C3 Products + Services Overview

Developing an analytics everywhere framework for the Internet of Things. Ph.D. Research Proposal by Hung Cao

UltiPro Perception Collect and understand employee feedback with surveys and sentiment analysis

TDWI Analytics Fundamentals. Course Outline. Module One: Concepts of Analytics

Automated Security Reviews in a CI Environment. Richard Fry Information Security Manager

TABLE OF CONTENTS ix

Testing Solutions for Hyper-Connected Apps

Predictive Analytics and Machine Learning: An Overview

Welcome to Our Home. Anni Toner Managing Director The Data-Shack Home of Analytics in AME & APAC


INTRODUCTION TO INFORMATION SYSTEMS & DECISION MAKING

C3 IoT: Products + Services Overview

CASE STUDY Delivering Real Time Financial Transaction Monitoring

Agile Industrial Analytics

MacroBase: Prioritizing Attention in Fast Data. Peter Bailis DAWN Project, Stanford InfoLab

Integrated Predictive Maintenance Platform Reduces Unscheduled Downtime and Improves Asset Utilization

Real Data Analysis at PNC

Azure ML Studio. Overview for Data Engineers & Data Scientists

Machine Learning with Google Cloud Platform

Spotlight Sessions. Nik Rouda. Director of Product Marketing Cloudera, Inc. All rights reserved. 1

IBM Analytics Unleash the power of data with Apache Spark

WorkloadWisdom Storage performance analytics for comprehensive workload insight

Real-time crop mask production using high-spatial-temporal resolution image times series

ERPs and Enabling Technologies. July 2018

ARCHITECTURES ADVANCED ANALYTICS & IOT. Presented by: Orion Gebremedhin. Marc Lobree. Director of Technology, Data & Analytics

Mass-Scale, Automated Machine Learning and Model Deployment Using SAS Factory Miner and SAS Decision Manager

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Understanding Business Value of Life Cycle Management. Sumeet Mehra Application & Management Solution Specialist 6 th November 2008

Leading a Successful DevOps Transition Lessons from the Trenches. Randy Shoup Consulting CTO

Applying Big Data Analytics with AI into SK Telecom s OSS (TANGO)

Building a Data Lake on AWS

SOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM

Tools and technology usage in PFMS application lifecycle management process

Common Customer Use Cases in FSI

On various testing topics: Integration, large systems, shifting to left, current test ideas, DevOps

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other

Transcription:

How to build and deploy machine learning projects Litan Ilany, Advanced Analytics litan.ilany@intel.com

Agenda Introduction Machine Learning: Exploration vs Solution CRISP-DM Flow considerations Other key considerations Q&A 2

Introduction Litan ilany litan.ilany@intel.com Scientist Leader at Intel s Advanced Analytics team. Owns a M.Sc. degree in Information-Systems Engineering at BGU (focused on Machine-Learning and Reinforcement-Learning) Married + 2, Live in Kiryat Motzkin 3

Advanced analytics team Radical improvement of critical processes Help Building AI Competitive products Design Validation Product Dev Cost, quality, performance Sales Unlimited scale Health Smart Clinical Trials Industrial Ai IoT Analytics Edge-Fog-Cloud platform Breakthrough Technology that Scales 4

Machine Learning Statistics Pattern recognition Generalization / Inductive Inference Types of learning: Supervised vs Unsupervised Learning Passive vs Active & Reinforcement Learning Batch vs Online Learning 5

ML Algorithm vs Solution Given a data matrix does not exist in real life Pareto Principle (80/20 rule) Technical aspects Business needs Extreme cases 6

ML project - Go / No-Go Decision Business Feasibility Problem definition is clear Partner willing to invest / change Enough ROI / impact Feasibility measures what they care about ( signal ) Enough accessible & connected data is accurate Execution Feasibility Technology is accessible Model can be executed in a timely manner and size Model s I/O flow is reasonable 7

CRISP-DM Cross-Industry Standard Process for Mining A structed methodology for DM projects Based on practical, real-world experience Conceived in 1996-7 8

CRISP-DM what is the we are working with? what is the problem we are dealing with? Business how should we use the model we developed? Deployment what are the transformations and extractions to be done on the? Preparation does the model meet the project goals? Evaluation Modeling what is the data model we should use? 9

Crisp-dm: business understanding Deployment Business Preparation Modeling Determine business objective Evaluation Assess situation Determine data mining goals and success criteria Determine project plan 10

Crisp-dm: business understanding - example Deployment Business Preparation Example: Smart CI Each git- is integrated with the main repository after tests series passes Multi git- (can t check one-by-one) Bug in code causes entire integration to fail Evaluation Modeling Failed Git Integration CI tests 11

Crisp-dm: business understanding - example Deployment Business Preparation Example: Smart CI Each git- is integrated with the main repository after tests series passes Multi git- (can t check one-by-one) Bug in code causes entire integration to fail Evaluation Modeling Failed Git Personal CI tests Integration CI tests 12

Crisp-dm: business understanding - example Deployment Business Preparation Goals and success criteria: Reduce Turnaround Time (TAT) At least 20% time reduction Project plan Failed ML solution Evaluation Modeling Git Predict pass Predict fail Integration + CI tests Failed Personal CI tests 13

Crisp-dm: data understanding Deployment Business Preparation Collect initial data Example: Evaluation Modeling Describe data Git-log files (unstructured data): Explore data Commits numerical / binary Verify data quality Files, Folders numerical / binary Lines numerical Git DB (structured data): Users categorical Timestamps, etc. Historical tests results (labels) 14

Crisp-dm: data preparation Deployment Business Preparation Modeling Integrate data from multi sources Format data Example: Generate features from log Evaluation Feature extraction Clean data Construct data Derive attributes transformation Reduce imbalance data Fill in missing values Feature selection Generate and clean user-features Normalize counters Thousands of features remove unnecessary ones balancing (if needed) 15

Crisp-dm: modeling Deployment Business Preparation Select modeling technique Example: Evaluation Modeling Consider computer resources, computation time, number of features, business needs Generate test design Train/Test split, Cross validation Simulation (chronological order) We ll check various ML models with various hyperparameters Simulation, weekly training phase Build model Assess model 16

Crisp-dm: modeling example (smart ci) Model assessment: Which model to choose? How can we measure it? Model A Model B Predicted \ Actual failed Total Predicted \ Actual failed Total Predicted pass 55 18 73 Predicted fail 15 12 27 Total 70 30 100 Predicted pass 35 5 40 Predicted fail 35 25 60 Total 70 30 100 17

Crisp-dm: modeling example (smart ci) Model assessment: Which model to choose? How can we measure it? Measure A B Accuracy (55+12)/100 = 67% (35+25)/100 = 60% Model A Model B Predicted \ Actual failed Total Predicted \ Actual failed Total Predicted pass 55 18 73 Predicted fail 15 12 27 Total 70 30 100 Predicted pass 35 5 40 Predicted fail 35 25 60 Total 70 30 100 18

Crisp-dm: modeling example (smart ci) Model assessment: Measure A B Accuracy (55+12)/100 = 67% (35+25)/100 = 60% Which model to choose? Precision 55/73 = 75% 35/40 = 87% How can we measure it? Model A Model B Predicted \ Actual failed Total Predicted \ Actual failed Total Predicted pass 55 18 73 Predicted fail 15 12 27 Total 70 30 100 Predicted pass 35 5 40 Predicted fail 35 25 60 Total 70 30 100 19

Crisp-dm: modeling example (smart ci) Model assessment: Which model to choose? How can we measure it? Measure A B Accuracy (55+12)/100 = 67% (35+25)/100 = 60% Precision 55/73 = 75% 35/40 = 87% Recall 55/70 = 76% 35/70 = 50% Model A Model B Predicted \ Actual failed Total Predicted \ Actual failed Total Predicted pass 55 18 73 Predicted fail 15 12 27 Total 70 30 100 Predicted pass 35 5 40 Predicted fail 35 25 60 Total 70 30 100 20

Crisp-dm: modeling example (smart ci) Model assessment: Which model to choose? How can we measure it? Predicted \ Actual Model A failed Measure A B Accuracy (55+12)/100 = 67% (35+25)/100 = 60% Precision 55/73 = 75% 35/40 = 87% Recall 55/70 = 76% 35/70 = 50% FPR* 18/30 = 60% 5/30 = 17% *Lower is better Total Predicted \ Actual Model B failed Total Predicted pass 55 18 73 Predicted fail 15 12 27 Total 70 30 100 Predicted pass 35 5 40 Predicted fail 35 25 60 Total 70 30 100 21

Crisp-dm: modeling example (smart ci) Business ease of explanation SVM NN Random Forest GBT KNN Decision Tree Regression Complex Simple 22

Crisp-dm: modeling example (smart ci) Expected value (in simulations) Random Forest SVM NN GBT KNN Decision Tree Regression Business ease of explanation 23

crisp-dm: evaluation Deployment Business Preparation Modeling Evaluate results In terms of business needs Review Process Determine next steps Example: Predicted \ Actual TAT reduction: TP = 50% reduction (X2 faster) FN = 0% reduction failed Predicted pass TP FP Predicted fail FN TN Evaluation FP = -500-5000% reduction (X5-50 slower) 24

Crisp-dm: deployment Deployment Business Preparation Modeling Plan and deploy the model Example: Evaluation Plan monitoring and maintenance process Integrate with existing CI system Weekly automatic process that will train the model Weekly automatic process that will monitor the model s performance and suggest better hyper parameters (if needed) 25

CRISP-DM: data flow Business Flow Architecture Deployment flow validation Schema Architecture Preparation Evaluation flow Implementation Modeling 26

Other key considerations Use Git (or other version control platform) Automate the research process (trial-and-error) Use Docker containers TEST YOUR CODE (don t think of it as black box) ML Technical Debt code and data 27

references CRISP-DM (Wikipedia) 4 things DSs should learn from software engineers Machine Learning: The High Interest Credit Card of Technical Debt 28