Data Analytics with MATLAB Adam Filion Application Engineer MathWorks

Similar documents
Data Analytics for Engineers

Sharing and Deploying MATLAB Programs

Data Analytics for Semiconductor Manufacturing The MathWorks, Inc. 1

Enterprise-Scale MATLAB Applications

2015 The MathWorks, Inc. 1

MATLAB for Data Analytics The MathWorks, Inc. 1

Integrating MATLAB Analytics into Enterprise Applications The MathWorks, Inc. 1

Integrating MATLAB Analytics into Enterprise Applications

Introduction to Data Analytics with MATLAB

Building, Scaling, and Implementing Risk Model and Stress Test Frameworks By Bet Herrera Sucarrat, PhD Application Engineer MathWorks

ThingSpeak - IoT Platform with MATLAB Analytics


Data Analysis in the Internet of Things: IoT capabilities with MATLAB/Simulink

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration

The Rise of Engineering-Driven Analytics. Richard Rovner VP Marketing

Get More From Your Data with Data Analytics. Valerie Leung Application Engineering 1

MATLAB and the Internet of Things (IoT): Collecting and Analysing IoT Data Gabriele Bunkheila Technical Marketing, MathWorks

What s new in MATLAB and Simulink

Embracing Technical Computing Trends with MATLAB Accelerating the Pace of Engineering and Science

Problem-Based Learning: Data Analytics and Machine Learning Techniques for Solving Real- World Challenges

Developing Prognostics Algorithms: Data-Based and Model-Based Approaches

IBM SPSS & Apache Spark

Get More From Your Data with Data Analytics

Scaling up MATLAB Analytics with Kafka and Cloud Services

2015 The MathWorks, Inc. 1

Achieve Better Insight and Prediction with Data Mining

Scaling up MATLAB Analytics with Kafka and Cloud Services

DATA SCIENCE: HYPE AND REALITY PATRICK HALL

Intel s Machine Learning Strategy. Gary Paek, HPC Marketing Manager, Intel Americas HPC User Forum, Tucson, AZ April 12, 2016

Ampliando MATLAB Analytics con Kafka y Servicios en la Nube

Brian Macdonald Big Data & Analytics Specialist - Oracle

IBM SPSS Modeler Personal

SAP Predictive Analytics Suite

Advances in PI System Streaming Analytics

IBM SPSS Modeler Personal

SAS Machine Learning and other Analytics: Trends and Roadmap. Sascha Schubert Sberbank 8 Sep 2017

Streaming Calculation with the PI System and MATLAB

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Advances in PI System Streaming Analytics and Other External Calculation Engines

Hadoop Course Content

MATLAB Seminar for South African Reserve Bank

The Rise of Engineering-Driven Analytics

USING R IN SAS ENTERPRISE MINER EDMONTON USER GROUP

2016 INFORMS International The Analytics Tool Kit: A Case Study with JMP Pro

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

FINANCE + DEEP LEARNING SKYMIND / DEEPLEARNING4J 2015

Deep Dive into High Performance Machine Learning Procedures. Tuba Islam, Analytics CoE, SAS UK

Transforming Analytics with Cloudera Data Science WorkBench

What s New. Bernd Wiswedel KNIME KNIME AG. All Rights Reserved.

Agile Industrial Analytics

Big Data and Machine Learning for Predictive Maintenance

BRINGING AI TO ALL DEVELOPERS

National Occupational Standard

Building the In-Demand Skills for Analytics and Data Science Course Outline

Oracle Spreadsheet Add-In for Predictive Analytics for Life Sciences Problems

Machine Learning for Mobile Platforms

Developing and Deploying Analytics for IoT Systems

Business is being transformed by three trends

Business Analytics using R

Machine Learning For Enterprise: Beyond Open Source. April Jean-François Puget

USING SAS HIGH PERFORMANCE STATISTICS FOR PREDICTIVE MODELLING

C3 Products + Services Overview

Introducing Analytics with SAS Enterprise Miner. Matthew Stainer Business Analytics Consultant SAS Analytics & Innovation practice

MicroStrategy 10. Adam Leno Technical Architect NDM Technologies

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and

Machine Learning 101

What s New in MATLAB and Simulink

Spark and Hadoop Perfect Together

Azure ML Studio. Overview for Data Engineers & Data Scientists

GSAW 2018 Machine Learning

Data Mining and Knowledge Discovery in Large Databases

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

EA103 Extend Your Analytics Capabilities on SAP HANA Using SAP Predictive Analysis Vishwanath Belur, Product Manager, SAP PI BIT Oct 2013

Microsoft Big Data. Solution Brief

Data Mining Applications with R

MACHINE LEARNING OPPORTUNITIES IN FREIGHT TRANSPORTATION OPERATIONS. NORTHWESTERN NUTC/CCIT October 26, Ted Gifford

Predictive Maintenance and Condition Monitoring

Drive Better Insights with Oracle Analytics Cloud

zdata Solutions BI / Advanced Analytic Platform and Pilot Programs

POST GRADUATE PROGRAM IN DATA SCIENCE & MACHINE LEARNING (PGPDM)

Gene Expression Data Analysis

New Features in Enterprise Miner

SQLStarter Intro to Data Science. Dave

IBM SPSS Modeler Premium

1226A: ique on IBM BlueMix can increase your TM1 forecast accuracy. Michael Cowie Dallas Crawford QueBIT Consulting, LLC

Machine Learning Based Prescriptive Analytics for Data Center Networks Hariharan Krishnaswamy DELL

Product Roadmap. February Sebastien Cognet, Pre-Sales EMEA

InsideBIGDATA Guide to Predictive Analytics

Building a Multi-Tenant Infrastructure for Diverse Application Workloads

SAS Business Knowledge Series

The usage of Big Data mechanisms and Artificial Intelligence Methods in modern Omnichannel marketing and sales

Business Rules Modeling Studio

Preface to the third edition Preface to the first edition Acknowledgments

Actian DataConnect 11

Microsoft Azure Essentials

Developing Analytics and Deploying IoT Systems

The real life of Data Scientists. or why we created the Data Science Studio

IBM SPSS Modeler. Accelerate time to value with visual data science and machine learning. Highlights

What s new in Machine Learning across the Splunk Portfolio

BIG DATA AND HADOOP DEVELOPER

Transcription:

Data Analytics with Adam Filion Application Engineer MathWorks 2015 The MathWorks, Inc. 1

Case Study: Day-Ahead Load Forecasting Goal: Implement a tool for easy and accurate computation of dayahead system load forecast Requirements: Acquire and clean data from multiple sources Accurate predictive model Easily deploy to production environment 2

Challenges with Data Analytics Aggregating data from multiple sources Cleaning data Choosing a model Moving to production 3

NYISO Energy Load Data 4

Techniques to Handle Missing Data List-wise deletion Unbiased estimates Reduces sample size Implementation options Built in to many functions Manual filtering 5

Techniques to Handle Missing Data Substitution replace missing data points with a reasonable approximation Easy to model Too important to exclude 6

Merge Different Sets of Data Join along a common axis Popular Joins: Inner Full Outer Left Outer Right Outer Inner Join Full Outer Join Left Outer Join 7

Full Outer Join Key A B 1 1.1 4 1.4 7 1.7 9 1.9 First Data Set Key X Y Z 1 0.1 0.2 3 0.3 0.4 5 0.5 0.6 7 0.7 0.8 Second Data Set Key B Y Z 1 3 4 5 7 9 1.1 NaN 1.4 NaN 0.1 0.3 NaN 0.5 0.2 0.4 NaN 0.6 1.7 1.9 0.7 NaN 0.8 NaN Joined Data Set 8

Learn More: Big Data with www.mathworks.com/discovery/big-data-matlab.html www.mathworks.com/discovery/matlab-mapreduce-hadoop.html 9

Challenges with Data Analytics Aggregating data from multiple sources Cleaning data Choosing a model Moving to production 10

Machine Learning Characteristics and Examples Characteristics Lots of variables System too complex to know the governing equation (e.g., black-box modeling) Examples Pattern recognition (speech, images) Financial algorithms (credit scoring, algo trading) Energy forecasting (load, price) Biology (tumor detection, drug discovery) 11

Overview Machine Learning Type of Learning Categories of Algorithms Machine Learning Supervised Learning Develop predictive model based on both input and output data Classification Regression Unsupervised Learning Clustering Group and interpret data based only on input data 12

Supervised Learning Regression Neural Networks Decision Trees Ensemble Methods Non-linear Reg. (GLM, Logistic) Linear Regression Classification Support Vector Machines Discriminant Analysis Naive Bayes Nearest Neighbor 13

Unsupervised Learning k-means, Fuzzy C-Means Hierarchical Clustering Neural Networks Gaussian Mixture Hidden Markov Model 14

Learn More: Machine Learning with mathworks.com/machine-learning 16

Challenges with Data Analytics Aggregating data from multiple sources Cleaning data Choosing a model Moving to production 17

Deployment Highlights Database Servers Desktop Applications Hadoop / Big Data Spreadsheets Application Servers Client Applications Web Applications Batch/Cron Jobs Share with others who may not have Royalty-free deployment Encryption to protect your intellectual property 18

Deploying Applications with Compiler Compiler SDK Standalone Application Excel Add-in Hadoop C/C ++ Java.NET Production Server Compiler for sharing programs without integration programming Compiler SDK provides implementation and platform flexibility for software developers Production Server provides the most efficient development path for secure and scalable web and enterprise applications 19

Sharing Standalone Applications Application Author Toolboxes 1 2 Compiler End User Standalone Application Excel Add-in Hadoop 3 Runtime 20

Integrating -based Components Application Author Toolboxes Application author and software developer might be same person 1 Software Developer 2 Compiler SDK C/C ++ Java.NET Production Server 3 4 Runtime 21

Production Server Directly deploy analytic programs into production Centrally manage multiple programs & MCR versions Automatically deploy updates without server restarts Scalable & reliable Service large numbers of concurrent requests Add capacity or redundancy with additional servers Production Server(s) Use with web, database & application servers Lightweight client library isolates processing Access programs using native data types Integrates with Java,.NET, C and Python Web Server(s) HTML XML Java Script 22

Deployed Analytics Production Server Web Application Server Apache Tomcat Production Server Production Server Desktop Train in Web Server/ Webservice Request Broker Predictive Models CTF Weather Data Energy Data 23

Learn More: Application Deployment with www.mathworks.com/solutions/desktop-web-deployment/ 24

Learn More: Application Deployment Also www.mathworks.com/solutions/desktop-web-deployment/ 25

Data Analytics Products Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics with Systems Parallel Computing Toolbox, Distributed Computing Server Production Server Database Toolbox Statistics and Machine Learning Toolbox Compiler Data Acquisition Toolbox Curve Fitting Toolbox Neural Network Toolbox Compiler SDK Mapping Toolbox Signal Processing Toolbox Computer Vision System Toolbox Image Acquisition Toolbox Image Processing Toolbox Econometrics Toolbox Used in today s demo OPC Toolbox Additional Data Analytics products 26

Key Takeaways Data preparation can be a big job; leverage built-in tools and spend more time on the analysis Rapidly iterate through different predictive models, and find the one that s best for your application Leverage parallel computing to scale-up your analysis to large datasets Eliminate the need to recode by deploying your algorithms into production 27

2015 The MathWorks, Inc. 28