Data Analytics for Engineers

1 Data Analytics for Engineers. Will Wilson, Application Engineer, MathWorks. © 2016 The MathWorks, Inc.

2 Agenda: Definition; Common Challenges; Case Study; Wrap Up

3 What is Data Analytics? Data analytics is the process of leveraging the information in your data so you can take action. It helps companies and organizations make better business decisions. Scope: engineering and business data; may include machine learning.

4 Challenges with Data Analytics: aggregating data from multiple sources; cleaning data; choosing a model; moving to production

5 Case Study: Day-Ahead Load Forecasting. Goal: implement a tool for easy and accurate computation of the day-ahead system load forecast. Requirements: acquire and clean data from multiple sources; build an accurate predictive model; deploy easily to a production environment.

6 Source Data: Energy Load & Weather

7 Techniques to Handle Missing Data: List-wise deletion. Gives unbiased estimates (provided the data are missing completely at random) but reduces sample size. Implementation options: built in to many MATLAB functions; manual filtering.
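The manual-filtering option can be sketched in a few lines. This is a minimal illustration in plain Python, not the MATLAB functions the slide refers to; the record layout (hour, load, temperature) and the name `listwise_delete` are assumptions made for the example.

```python
import math

def listwise_delete(rows):
    """Keep only the rows in which every field is present (not None, not NaN)."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))
    return [row for row in rows if not any(is_missing(v) for v in row)]

# Hypothetical hourly records: (hour, load in MW, temperature in deg C)
records = [
    (0, 512.0, 3.1),
    (1, float("nan"), 2.8),   # load reading missing
    (2, 498.5, None),         # temperature reading missing
    (3, 505.2, 2.5),
]

clean = listwise_delete(records)
print(len(records), "rows in,", len(clean), "rows kept")
```

Half of the sample rows are discarded here, which is the sample-size cost the slide mentions.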

8 Techniques to Handle Missing Data: Substitution. Replace missing data points with a reasonable approximation. Appropriate when the data are easy to model or too important to exclude.
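One possible "reasonable approximation" is linear interpolation between the nearest known neighbours. The sketch below assumes that choice; the slide does not say which substitution method the talk used, and the function name `fill_linear` and the sample load values are hypothetical.

```python
import math

def fill_linear(values):
    """Replace NaN entries by linear interpolation between the nearest known
    neighbours; gaps at either end copy the nearest known value."""
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]          # leading gap
        elif right is None:
            filled[i] = values[left]           # trailing gap
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled

# Hypothetical hourly load series with three missing readings
load = [500.0, float("nan"), float("nan"), 530.0, float("nan")]
filled = fill_linear(load)
print(filled)
```

Interpolation suits slowly varying signals such as temperature; for data with strong daily cycles a different model of the gap may be a better fit.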

9 Merge Different Sets of Data. Join along a common axis. Popular joins: Inner; Full Outer; Left Outer; Right Outer. [Diagrams: Inner Join; Full Outer Join; Left Outer Join]

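10 Full Outer Join. [Figure: a first data set with columns Key, A, B and a second data set with columns Key, X, Y, Z are combined into a joined data set; cells with no matching key are filled with NaN. Most cell values are not recoverable from the transcription.]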
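The four join types differ only in which keys survive. The following is a minimal plain-Python sketch, not MATLAB's own join functions; the helper `join`, the column names, and every value in the two sample data sets are made up for illustration.

```python
import math

def join(left, right, left_cols, right_cols, how="outer"):
    """Join two keyed data sets (dict: key -> dict of column values).
    Columns with no matching key are filled with NaN."""
    if how == "inner":
        keys = set(left) & set(right)      # keys present in both
    elif how == "outer":
        keys = set(left) | set(right)      # keys present in either
    elif how == "left":
        keys = set(left)
    elif how == "right":
        keys = set(right)
    else:
        raise ValueError(f"unknown join type: {how}")
    joined = {}
    for key in sorted(keys):
        row = {col: left.get(key, {}).get(col, math.nan) for col in left_cols}
        row.update({col: right.get(key, {}).get(col, math.nan) for col in right_cols})
        joined[key] = row
    return joined

# Hypothetical data sets sharing the keys 2 and 3
first = {1: {"A": 1.1, "B": 0.5}, 2: {"A": 1.2, "B": 0.6}, 3: {"A": 1.3, "B": 0.7}}
second = {2: {"X": 7.0, "Y": 0.1}, 3: {"X": 8.0, "Y": 0.2}, 4: {"X": 9.0, "Y": 0.3}}

outer = join(first, second, ["A", "B"], ["X", "Y"], how="outer")
inner = join(first, second, ["A", "B"], ["X", "Y"], how="inner")
for key, row in outer.items():
    print(key, row)
```

The outer join keeps all four keys and introduces NaN for keys 1 and 4, so it usually needs a missing-data step afterwards; the inner join keeps only keys 2 and 3 and introduces no gaps.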

11 Challenges with Data Analytics: aggregating data from multiple sources; cleaning data; choosing a model; moving to production

12 Machine Learning Characteristics and Examples. Characteristics: lots of variables; system too complex to know the governing equation (e.g., black-box modeling). Examples: pattern recognition (speech, images); financial algorithms (credit scoring, algo trading); energy forecasting (load, price); biology (tumor detection, drug discovery). [Figure: matrix of percentages over the credit ratings AAA, AA, A, BBB, BB, B, CCC, D; cell values not recoverable from the transcription]

13 Overview: Machine Learning, by type of learning and category of algorithm. Supervised learning: develop a predictive model based on both input and output data (classification, regression). Unsupervised learning: group and interpret data based only on input data (clustering).
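The distinction comes down to whether the algorithm sees output data. The sketch below contrasts the two with deliberately simple stand-ins chosen for this example: an ordinary least-squares line fit for regression and a one-dimensional two-cluster k-means for clustering. Neither is claimed to be the model used in the talk, and all data values are invented.

```python
def fit_line(xs, ys):
    """Supervised: learn slope and intercept from inputs xs AND outputs ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def two_means(xs, iterations=10):
    """Unsupervised: group inputs xs into two clusters; no outputs are used."""
    centers = [min(xs), max(xs)]           # simple deterministic start
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]
    return centers, labels

# Supervised example: hypothetical temperature (input) and load (output) pairs
temps = [0.0, 10.0, 20.0, 30.0]
loads = [600.0, 550.0, 500.0, 450.0]
slope, intercept = fit_line(temps, loads)
predicted = intercept + slope * 15.0       # forecast for an unseen input

# Unsupervised example: hypothetical measurements with no labels attached
readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, labels = two_means(readings)
print(slope, intercept, predicted, labels)
```

`fit_line` cannot run without `loads`, while `two_means` discovers its two groups from `readings` alone.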

14 Challenges with Data Analytics: aggregating data from multiple sources; cleaning data; choosing a model; moving to production

15 Integrate analytics with your enterprise systems: MATLAB Compiler and MATLAB Coder. [Diagram: MATLAB Coder; MATLAB Compiler; MATLAB Compiler SDK; outputs .exe, .lib, .dll; MATLAB Runtime]

16 Scale up with MATLAB Production Server. Most efficient path for creating enterprise applications. Deploy MATLAB programs into production: manage multiple MATLAB programs and versions; update programs without server restarts; reliably service large numbers of concurrent requests. Integrate with web, database, and application servers. [Diagram: MATLAB Production Server(s); Web Server(s); HTML, XML, JavaScript]

17 [Architecture diagram: MATLAB Desktop (Train in MATLAB); Predictive Models; Weather Data; Energy Data; CTF; Deployed Analytics; MATLAB Production Server with Request Broker; Web Application Server (Apache Tomcat); Web Server/Webservice]

18 Data Analytics Products. Workflow stages: Access and Explore Data; Preprocess Data; Develop Predictive Models; Integrate Analytics with Systems. Products shown: MATLAB; Parallel Computing Toolbox; MATLAB Distributed Computing Server; MATLAB Production Server; Database Toolbox; Statistics and Machine Learning Toolbox; MATLAB Compiler; Data Acquisition Toolbox; Curve Fitting Toolbox; Neural Network Toolbox; MATLAB Compiler SDK; Mapping Toolbox; Signal Processing Toolbox; Computer Vision System Toolbox; Image Acquisition Toolbox; Image Processing Toolbox; Econometrics Toolbox; OPC Toolbox. Legend: used in today's demo; additional data analytics products. [The assignment of products to stages and to legend categories is not recoverable from the transcription.]

19 Key Takeaways. Data preparation can be a big job; leverage built-in MATLAB tools and spend more time on the analysis. Rapidly iterate through different predictive models, and find the one that's best for your application. Leverage parallel computing to scale up your analysis to large datasets. Eliminate the need to recode by deploying your MATLAB algorithms into production.

20 © 2016 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.