Integrating MATLAB Analytics into Enterprise Applications The MathWorks, Inc. 1

Size: px
Start display at page:

Download "Integrating MATLAB Analytics into Enterprise Applications The MathWorks, Inc. 1"

Transcription

1 Integrating Analytics into Enterprise Applications 2015 The MathWorks, Inc. 1

2 Agenda Example Problem Access and Preprocess Data Develop a Predictive Model Integrate Analytics with Production Systems Build a Dashboard to Visualize Results 2

3 Example Problem How s my driving? A group of MathWorks employees installed an OBD dongle in their car that monitors the on-board systems Data is streamed to the cloud where it is aggregated and stored I would like to use this data to score the driving habits of participants 3

4 Fleet Analytics Architecture Edge Devices Kafka Connector Production System Production Server Analytics Compiler SDK Analytics Development Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business Systems End Users 4

5 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics with Production Systems Files Working with Messy Data Model Creation e.g. Machine Learning Desktop Apps Databases Data Reduction/ Transformation Parameter Optimization Enterprise Scale Systems Sensors Feature Extraction Model Validation Embedded Devices and Hardware 5

6 Access and Preprocess Data Edge Devices Production System Analytics Development Algorithm Developers Storage Layer Business Decisions Business Systems End Users 6

7 The Data: Timestamped messages with JSON encoding Key Value { } "vehicles_id": {"$oid":"55a3fd d5b "} { } "time : {"$date":" t18:01:35.000z"}, "kc : , "kff1225" : , "kff125a" : ,.. { } "vehicles_id": {"$oid":"55a3fe d5c5c000020"} { } "time":{"$date":" t18:01:53.000z"}, "kc : , "kff1225" : , "kff125a" : ,.. { } "vehicles_id": "$oid":"55a d115b000001"} { } "time":{"$date":" t19:04:04.000z"} "kc":2200.0, "kff1225" : , "kff125a" : ,.. 7

8 Data Access and Preprocessing Challenges Challenges Data aggregation Different sources (files, web, etc.) Different types (images, text, audio, etc.) Data clean up Poorly formatted files Irregularly sampled data Redundant data, outliers, missing data etc. Data specific processing Signals: Smoothing, resampling, denoising, Wavelet transforms, etc. Images: Image registration, morphological filtering, deblurring, etc. Dealing with out of memory data (big data) Data preparation accounts for about 80% of the work of data scientists - Forbes 8

9 Access a Sample of Data and Develop a Preprocessing Function 10

10 Develop a Predictive Model Edge Devices MDCS Production System Analytics Development Algorithm Developers Storage Layer Business Decisions Business Systems End Users 11

11 Develop a Predictive Model in Label events Represent signals Train model Validate model Scale up 12

12 Integrate Analytics with Production Systems Edge Devices Kafka Connector Production System Production Server Analytics Compiler SDK Analytics Development Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business Systems End Users 13

13 A quick Intro to Stream Processing Batch Processing applies computation to a finite sized historical data set that was acquired in the past Historical Data Configure Resources Schedule and Run Job Output Data Files Storage Files Storage Reporting Data Exploration Training Models Stream Processing applies computation to an unbounded data set that is produced continuously Continuous Data Messaging Service Stream Analytics Dashboards Connected Devices f(x) Alerts Reporting Real Time Decision Support Storage 14

14 Stream processing exploits the fact that recent data tends to be more valuable Real time decisions Time critical decisions Queries on historical data, Model training Edge Processing with Coder Stream Processing with Production Server MDCS, Compiler with Hadoop/Spark Value of data to decision making Preventive / Predictive Real- Time Actionable Event Hub Kinesis Reactive Historical Seconds Minutes Hours Days Months Time 15

15 Streaming data is treated as an unbounded table Event Time Input Table Vehicle RPM Torque Fuel Flow 18:01:10 55a3fd :10:30 55a3fe :05:20 55a3fd :10:45 55a3fd :30:10 55a :35:20 55a :20:40 55a3fe :39:30 55a :30:00 55a3fe :30:50 55a3fe State Function State Function State Function State Output Table Time window Vehicle Score 18:00:00 18:10:00 55a3fd 55a3fe 55a419 18:10:00 18:20:00 55a3fd 55a3fe 55a419 18:20:00 18:30:00 55a3fd 55a3fe 55a419 18:30:00 18:40:00 55a3fd 55a3fe 55a

16 Introducing Production Server Data Analytics Business System Databases Cosmos DB Production Server Visualization Cloud Storage Azure Blob Web Streaming Amazon Kinesis Azure Event Hub Request Broker Custom App Public Cloud Platform Private Cloud 17

17 Introducing Production Server Server software Manages packaged programs and worker pool Runtime libraries Single server can use runtimes from different releases RESTful JSON interface Enterprise Application MPS Client Library Applications/ Database Servers RESTful JSON Production Server Request Broker & Program Manager Lightweight client libraries C/C++,.NET, Python, and Java Runtime 18

18 Introducing Apache Kafka Kafka is a high through-put distributed messaging system Producer Producer Producer Originally developed at LinkedIn and open sourced in 2011 Kafka is architected as a massively scalable publish/subscribe message queue Topic Partition Partition Partition Kafka Cluster Topic Partition Partition Partition Well suited for large scale streaming applications Consumer Consumer Consumer 19

19 Connecting Production Server to Apache Kafka Kafka client for Production Server feeds topics to functions deployed on the server Topic-0 Kafka Cluster Topic-1 Configurable batch of messages passed as a Timetable Partition Partition Partition Partition Partition Partition Each consumer process feeds one topic to a specified function Consumer Process feeds Topic-0 Async Java Client Consumer Process feeds Topic-1 Async Java Client Drive everything from a simple config file No programming outside of! Request Broker & Program Manager Production Server 20

20 Develop, Test, and Deploy a Stream Processing Function Edge Devices Kafka Connector Production System Compiler SDK Analytics Development Algorithm Developers Business Decisions Storage Layer Business Systems End Users 21

21 Develop a Stream Processing Function in DEMO 23

22 Test Your Stream Processing Function on Live Data 25

23 Complete Your Application Edge Devices Production System Production Server Analytics Compiler SDK Analytics Development Algorithm Developers Business Decisions Storage Layer Business Systems End Users 26

24 Complete Your Application Reference: 28

25 Go Live! 30

26 Key Takeaways Connects directly to your data so you can quickly design and validate algorithms s high-level language and apps enable fast design iterations Production Server enables easy integration of your algorithms with enterprise production systems This enables you to spend your time understanding the data and designing algorithms 31

27 Resources to learn and get started Data Analytics with Compiler SDK Production Server Database Toolbox 32

28 and Simulink are registered trademarks of The MathWorks, Inc. See for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders The MathWorks, Inc. 33