Enterprise-Scale MATLAB Applications

1 Enterprise-Scale MATLAB Applications
Sylvain Lacaze, Rory Adams
© 2018 The MathWorks, Inc.

2 Enterprise Integration
Workflow: Access and Explore Data; Preprocess Data; Develop Predictive Models; Integrate Analytics with Systems
[Diagram labels: Databases; Cosmos DB; MDCS; AWS Kinesis; Azure IoT Hub; Streaming; Cloud Storage; Azure Blob; Production Server; Request Broker; PI System; OT Platforms; Big Data / OT; Dashboards]

3 MATLAB at Scale
Scale with increasing:
- Access
- Computational Complexity
- Data Volume and Velocity

4 Key Takeaways
1. Share applications and algorithms with anyone.
2. Integrate functions into existing workflows and development platforms.
3. Deploy applications within enterprise systems in a scalable manner.

5 Write Once Then Share To Different Targets
[Diagram labels, products: MATLAB Compiler; MATLAB Compiler SDK; MATLAB Coder; GPU Coder]
[Diagram labels, targets: Apps; Files; Custom Toolbox; Standalone Application; Hadoop; C/C++; Python; Production Server; Excel Add-in; Web Apps; Java; .NET]

6 Easily share apps with your team using Web Apps
- Use apps in browser
- Easy deployment
- No required knowledge of web technologies

7 Integrate MATLAB-based Components With Your Own Software
- Royalty-free Sharing
- IP Protection via Encryption
[Diagram labels, steps 1 to 4: Application Author; Toolboxes; Software Developer; Compiler SDK; C/C++; .NET; Python; Java; Production Server; Runtime]

8 Customer examples: Financial customer advisory service
Global financial institution with European HQ
- Saved 2 million annually for an external system
- Quicker implementation of adjustments in source code by the quantitative analysts
- Knowledge + = Build your own systems
[Diagram labels: Algorithm Developers; Compiler SDK; Production Server; Request Broker]

9 Scaling up: Asset Allocation Demo
[Diagram labels: Production Server(s); Web Server(s); HTML; XML; JavaScript]

10 Production Server with Visualization Platforms
MATLAB analytics for use with desktop, browser, and mobile visualization dashboards
- Tableau: access models published on Production Server inside Tableau calculated fields
- Spotfire: access models published on Production Server inside Spotfire workbooks
[Diagram labels: Tableau Server; Tableau Interface; Production Server; Analytics]

11 Example: Travelling Salesman Problem with Tableau

12 MATLAB and MATLAB Production Server
The easiest and most productive environment to take your enterprise analytics solution from idea to a scalable production solution
[Diagram: Idea to Production]

13 Production Deployment Workflow
[Diagram labels, Development stage: Developer; Initial Test Application; Verify data handling and initial behavior; Debug Algorithm; Algorithm; Compiler SDK; Enterprise Application Developer; Web Application; Function Call; Deployable Archive; Production Server]
[Diagram labels, Production stage: Production Server; Web Application; Function Calls; Deployable Archives]

14 Scale Up with Production Server
- Scalable and reliable
  - Service large numbers of concurrent requests
  - Add capacity or redundancy with additional servers
- Directly deploy programs into production
  - Automatically deploy updates without server restarts
- Most efficient path for creating enterprise applications
[Diagram labels: Production Server(s); Web Server(s); HTML; XML; JavaScript]

15 Production Server: Enterprise Class Framework For Running Packaged Programs
- Server software
  - Manages packaged programs and worker pool
- Runtime libraries
  - Single server can use runtimes from different releases
- RESTful JSON interface and lightweight client library
  - Isolates the processing
  - Access using native data types
[Diagram labels: Enterprise Application; MPS Client Library; RESTful JSON; Production Server; Request Broker & Program Manager; Runtime]
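
The RESTful JSON interface named on this slide can be exercised from any language that speaks HTTP. The following is a minimal Python sketch, not taken from the slides: it assumes the documented request shape of a POST to `<server>/<archive>/<function>` with a JSON body holding `nargout` and `rhs`, and a reply holding `lhs`. The host, port, archive name `assetAllocation`, function name `optimizePortfolio`, and the sample reply are all hypothetical. The sketch only builds the request and parses a reply; it does not contact a server.

```python
import json
from urllib import request


def build_call(base_url, archive, function, args, nargout=1):
    """Build (but do not send) a POST request for a deployed function.

    Assumed body shape: {"nargout": <int>, "rhs": [<input arguments>]}.
    """
    url = f"{base_url.rstrip('/')}/{archive}/{function}"
    body = json.dumps({"nargout": nargout, "rhs": list(args)}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_outputs(response_text):
    """Extract the output arguments from a reply of the form {"lhs": [...]}."""
    return json.loads(response_text)["lhs"]


# Hypothetical server address, archive, and function names.
req = build_call(
    "http://localhost:9910",
    "assetAllocation",
    "optimizePortfolio",
    [[0.05, 0.07, 0.02], 0.1],
)

# Hypothetical reply text, used only to show the parsing step.
weights = parse_outputs('{"lhs": [[0.6, 0.3, 0.1]]}')
```

Sending the request would be a call to `urllib.request.urlopen(req)` against a running server. Keeping construction and parsing separate from the network call makes both easy to check in isolation.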

16 MATLAB at Scale
Scale with increasing:
- Access
- Computational Complexity
- Data Volumes

17 Key Takeaways
1. Leverage parallel computing.
2. Seamlessly scale to clusters or the cloud.

18 Commerzbank; Aberdeen Asset Management
Commerzbank (photo: Commerzbank headquarters in Frankfurt)
- Compute a variety of derived market data from raw market data
- "... can complete urgent change requests ourselves with MATLAB, often on the same day. Testing time has also been reduced, ... load data 8 times faster than we could do before." Julian Zenglein, Commerzbank
Aberdeen Asset Management
- Improve asset allocation strategies with machine learning techniques
- "... can develop prototypes to test machine learning techniques quickly ... get rapid, reliable results by running the algorithms with large financial data sets on a distributed computing cluster." Emilio Llorente-Cano, Aberdeen Asset Management

19 Accelerating Applications
Options, ordered from ease of use to greater control:
- Parallel-enabled toolboxes
- Simple programming constructs
- Advanced programming constructs

20 Parallel-enabled Toolboxes (MATLAB Product Family)
Enable acceleration by setting a flag or preference
- Statistics and Machine Learning: Resampling Methods, k-means clustering, GPU-enabled functions
- Optimization: Estimation of gradients
- Neural Networks: Deep Learning, Neural Network training and simulation
- Signal Processing and Communications: GPU-enabled FFT filtering, cross correlation, BER simulations
- Computer Vision: Bag-of-words workflow
- Image Processing: Batch Image Processor, Block Processing, GPU-enabled functions
- Other Parallel-enabled Toolboxes

21 Independent Tasks or Iterations
- Simple programming constructs: parfor, parfeval
- Examples: parameter sweeps, Monte Carlo simulations
- No dependencies or communications between tasks
[Diagram: a client hands tasks to workers; timelines compare elapsed time]
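
The independent-task pattern on this slide can be sketched outside MATLAB as well. The Python stand-in below is an illustration of the pattern, not MATLAB syntax and not the presenters' code: each Monte Carlo run depends only on its own seed, so a pool of workers can execute the runs in any order, which is the property that lets a loop be written as parfor. The function names, sample count, and worker count are assumptions chosen for the example. Threads are used to keep the sketch portable; a CPU-bound Python job would normally use processes instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(seed, samples=20_000):
    """One independent Monte Carlo run; it shares no state with other runs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


def sweep(seeds, workers=4):
    """Dispatch one task per seed to a worker pool; results keep seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(estimate_pi, seeds))


estimates = sweep(range(8))
mean_estimate = sum(estimates) / len(estimates)
```

Because no task reads another task's result, the pooled run returns exactly what a plain serial loop over the same seeds would return.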

22 Parallel Computing: Multicore Desktops
[Diagram: a multicore desktop with four cores (Core 1 to Core 4), each running one worker]

23 Parallel Computing: Scaling Up
[Diagram: a grid of workers on Clusters/Cloud; Parallel Computing Toolbox; Distributed Computing Server]

24 Summary - Scale your applications beyond the desktop
- Parallel Computing Toolbox. Description: Explicit desktop scaling. Maximum workers: No limit. Hardware: Desktop. Availability: Worldwide.
- Distributed Computing Server. Description: Scale to clusters. Maximum workers: No limit. Hardware: Any. Availability: Worldwide.
- Distributed Computing Server for Amazon EC2. Description: Scale to EC2 with some customization. Maximum workers: 256. Hardware: Amazon EC2. Availability: United States, Canada and other select countries in Europe.
- Distributed Computing Server for Custom Cloud. Description: Scale to custom cloud. Maximum workers: No limit. Hardware: Amazon EC2, Microsoft Azure, Others. Availability: Worldwide.
- Distributed Computing Server for Hadoop + Spark. Description: Scale to custom cloud. Maximum workers: No limit. Hardware: Hadoop + Spark. Availability: Worldwide.
Learn More: Parallel Computing on the Cloud

25 MATLAB at Scale
Scale with increasing:
- Access
- Computational Complexity
- Data Volume and Velocity

26 Big Data: Batch vs Stream Processing
- Batch processing applies computation to a finite historical data set.
  [Diagram labels: Historical Data (Files, Storage); Configure Resources; Schedule and Run Job; Output Data (Files, Storage); Reporting; Data Exploration; Training Models]
- Stream processing applies computation to an unbounded data set that is produced continuously.
  [Diagram labels: Continuous Data; Connected Devices; Messaging Service; Stream Analytics f(x); Dashboards; Alerts; Reporting; Real Time Decision Support; Storage]
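
The difference between the two modes shows up in the shape of the code. The Python sketch below is an illustration written for this transcript, with made-up readings: the batch function needs the whole finite data set before it can answer, while the stream function keeps a small running state and emits an updated answer after every reading, so it also works on input that never ends.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def batch_mean(history: List[float]) -> float:
    """Batch: the full, finite data set is available up front."""
    return sum(history) / len(history)


def stream_mean(readings: Iterable[float]) -> Iterator[float]:
    """Stream: keep running state and yield an updated mean per reading."""
    seen, total = 0, 0.0
    for value in readings:
        seen += 1
        total += value
        yield total / seen


history = [2.0, 4.0, 6.0, 8.0]            # made-up sample readings
final_answer = batch_mean(history)         # 5.0
running_answers = list(stream_mean(history))  # [2.0, 3.0, 4.0, 5.0]

# The stream version also accepts an unbounded source; take three results.
first_three = list(islice(stream_mean(count(1)), 3))  # [1.0, 1.5, 2.0]
```

On a finite input the last streamed value equals the batch result; the stream version simply never has to wait for the end of the data.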

27 Large Data Options
- Data fits in memory of pool
  - Distributed arrays: look like normal variables
- Data does not fit in memory (Big Data)
  - Tall arrays: look like normal variables
  - Custom map-reduce functions: can be painful to learn

28 Typical Workflow: Big Data
Workflow: Access and Explore Data; Preprocess Data; Develop Predictive Models; Integrate Analytics with Systems
[Diagram labels: Data; HDFS; Developer; Package code & run as automated batch; Spark or MapReduce job]

29 tall arrays
- Data doesn't fit into memory
  - Lots of observations (tall)
- Looks like a normal array
  - Numeric types, tables, datetimes, strings, etc.
  - Basic math, stats, indexing, etc.
  - Statistics and Machine Learning Toolbox (clustering, classification, etc.)
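
The idea behind working with data that does not fit into memory is to pass over it one chunk at a time and keep only small running totals. The Python sketch below illustrates that idea under stated assumptions; it is not how tall arrays are implemented, and the column names `trip_distance` and `fare`, the sample rows, and the chunk size are hypothetical. An in-memory text buffer stands in for a large file so that the example is self-contained.

```python
import csv
import io
from itertools import islice


def chunked_rows(handle, chunk_size):
    """Yield lists of at most chunk_size rows, so only one chunk is in memory."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def out_of_core_mean(handle, column, chunk_size=2):
    """Mean of one column, computed from per-chunk sums and counts."""
    total, rows = 0.0, 0
    for chunk in chunked_rows(handle, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        rows += len(chunk)
    return total / rows


# Hypothetical sample; a real data set would be a file too large to load whole.
data = io.StringIO(
    "trip_distance,fare\n"
    "1.0,5.0\n"
    "2.5,9.0\n"
    "0.5,4.0\n"
    "4.0,14.0\n"
    "2.0,8.0\n"
)
mean_fare = out_of_core_mean(data, "fare")  # 8.0
```

The caller writes what looks like an ordinary mean; the chunking stays hidden inside the helper, which is the convenience the slide attributes to tall arrays.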

30 Example: Taxi Data

31 Using Tall Arrays
- Tall arrays: 100s of functions supported, including Statistics and Machine Learning Toolbox
- Run in parallel: Parallel Computing Toolbox
- Run in parallel on compute clusters: Distributed Computing Server
- Run in parallel on Spark clusters: Distributed Computing Server
- Deploy applications as standalone applications on Spark clusters: Compiler
[Diagram labels: Local disk; Shared folders; Databases; HDFS; Spark + Hadoop]

32 Stream Processing: Fleet Analytics
[Diagram labels: Edge Devices; Kafka; Connector; Production System; Production Server; Analytics; Compiler SDK; Analytics Development; Algorithm Developers; API Gateway; AWS Lambda; Storage Layer; Business Decisions; Business Systems; End Users]

33 Connecting Production Server to Kafka
- Kafka client for Production Server feeds topics to functions deployed on the server
- Configurable batch of messages passed as a Timetable
- Each consumer process feeds one topic to a specified function
- Drive everything from a simple config file
  - No programming outside of MATLAB!
[Diagram labels: Publishers; Kafka Cluster with Topic-0 and Topic-1, each split into partitions; Consumer Process feeds Topic-0 (Async Java Client); Consumer Process feeds Topic-1 (Async Java Client); Production Server; Request Broker & Program Manager]

34 Develop a Stream Processing Function in MATLAB

35 Streaming Data: Treated as an Unbounded Timetable
[Figure: an Input Table keyed by Event Time, with columns Vehicle, RPM, Torque, and Fuel Flow, holds events for vehicles 55a3fd, 55a3fe, and 55a419 that do not arrive in time order. A Function processes the table in successive batches and passes State from one call to the next. The Output Table lists a Score per Vehicle for each Time window: 18:00:00 to 18:10:00, 18:10:00 to 18:20:00, 18:20:00 to 18:30:00, and 18:30:00 to 18:40:00.]
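
The function-plus-state arrangement in this figure can be sketched as a function that takes one batch of events and the state left by the previous call, and returns the updated state. The Python sketch below is an illustration under stated assumptions, not the function shown in the talk: the calendar date, the RPM readings, the threshold `RPM_LIMIT`, and the meaning of the score (a count of readings above the threshold) are all hypothetical, since the slide's cell values and scoring rule are not recoverable. Only the vehicle identifiers and the 10-minute windows come from the slide.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
RPM_LIMIT = 3000  # hypothetical threshold; the slide does not define the score


def window_start(event_time):
    """Floor an event time to the start of its 10-minute window."""
    return event_time - timedelta(
        minutes=event_time.minute % 10,
        seconds=event_time.second,
        microseconds=event_time.microsecond,
    )


def process_batch(batch, state):
    """Fold one batch of (event_time, vehicle, rpm) rows into the state.

    The state maps (window start, vehicle) to a score, so an event that
    arrives late still updates the window it belongs to.
    """
    new_state = dict(state)
    for event_time, vehicle, rpm in batch:
        key = (window_start(event_time), vehicle)
        new_state[key] = new_state.get(key, 0) + (1 if rpm > RPM_LIMIT else 0)
    return new_state


def output_table(state):
    """Rows of (window start, window end, vehicle, score), sorted."""
    return sorted((w, w + WINDOW, v, s) for (w, v), s in state.items())


def at(clock):
    """Build a timestamp on an arbitrary (hypothetical) date."""
    hour, minute, second = map(int, clock.split(":"))
    return datetime(2018, 1, 1, hour, minute, second)


# Hypothetical RPM readings for vehicle ids that appear on the slide.
batch_1 = [(at("18:01:10"), "55a3fd", 3200),
           (at("18:10:30"), "55a3fe", 2500),
           (at("18:05:20"), "55a3fd", 3100)]
batch_2 = [(at("18:10:45"), "55a3fd", 3500),
           (at("18:02:00"), "55a3fe", 3300)]  # arrives late

state = process_batch(batch_1, {})
state = process_batch(batch_2, state)
table = output_table(state)
```

Returning a new state instead of mutating the old one keeps each call a pure function of its inputs, which makes a batch easy to replay if a call fails.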

36 Typical Workflow
Workflow: Access and Explore Data; Preprocess Data; Develop Predictive Models; Integrate Analytics with Systems
[Diagram labels: Data; HDFS; Kafka; Azure IoT Hub; Other; Developer; Package code & run as automated batch; Spark or MapReduce job; Application Server: Streaming; Ad-hoc analysis via Spotfire, Tableau, Other; Production Server; Analytics]

37 Value of data to decision making
Time versus Value in decision making
[Chart: value of data (Preventive / Predictive, Real-Time, Actionable, Reactive, Historical) against Time to React (Seconds, Minutes, Hours, Days, Months)]
- Time critical decisions: Edge Processing with Generated Code, C/C++
- Near real-time decisions: Stream Processing (Event Hub, Kinesis)
- Big Data processing on historical data: Batch Processing

38 Enterprise Integration
Workflow: Access and Explore Data; Preprocess Data; Develop Predictive Models; Integrate Analytics with Systems
[Diagram labels: Databases; Cosmos DB; MDCS; AWS Kinesis; Azure IoT Hub; Streaming; Cloud Storage; Azure Blob; Production Server; Request Broker; PI System; OT Platforms; Big Data / OT; Dashboards]