2015 The MathWorks, Inc. 1

Size: px
Start display at page:

Download "2015 The MathWorks, Inc. 1"

Transcription

1 2015 The MathWorks, Inc. 1

2 엔터프라이즈, 빅데이터및 애널리틱솔루션활용을위한 적용기술소개 성호현부장 2015 The MathWorks, Inc. 2

3 Agenda 1 Access and Explore Data 2 Preprocess Data 3 Develop Predictive Models 4 Integrate with Production 5 Visualize Results Files Working with Messy Data Model Creation e.g. Machine Learning Desktop Apps 3 rd party dashboards Databases Data Reduction/ Transformation Parameter Optimization Enterprise Scale AWS Kinesis Web apps Sensors Feature Extraction Model Validation Embedded Devices and Hardware 3

4 The Need for Large-Scale Streaming Predictive Maintenance Increase Operational Efficiency Reduce Unplanned Downtime More applications require near real-time analytics Jet engine: ~800TB per day Turbine: ~ 2 TB per day Medical Devices Patient Safety Better Treatment Outcomes Connected Cars Safety, Maintenance Advanced Driving Features Car: ~25 GB per hour 4

5 Example Problem How s my driving? A group of MathWorks employees installed an OBD dongle in their car that monitors the on-board systems Data is streamed to the cloud where it is aggregated and stored We would like to use this data to score the driving habits of participants 5

6 Example: Fleet Analytics with 6

7 Fleet Analytics Architecture Edge Devices Kafka Connector Production System Production Server Analytics Compiler SDK Analytics Development Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business End Users 7

8 1 Access and Explore Data The first step is to clean up the incoming data Edge Devices Kafka Connector Production System Production Server Analytics Compiler SDK Analytics Development Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business End Users 8

9 1 Access and Explore Data The Data: Timestamped messages with JSON encoding { "vehicles_id": {"$oid":"55a3fd d5b "}, Key "time : {"$date":" t18:01:35.000z"}, Timestamp } "kc : , "kff1225" : , "kff125a" : , Values { } "vehicles_id": {"$oid":"55a3fe d5c5c000020"} "time":{"$date":" t18:01:53.000z"}, "kc : , "kff1225" : , "kff125a" : , { } "vehicles_id": {"$oid":"55a d115b000001"} "time":{"$date":" t19:04:04.000z"} "kc":2200.0, "kff1225" : , "kff125a" : , 9

10 1 Access and Explore Data Access a Sample of Data Raw Data Decode JSON data Create Timetable Timetable 10

11 2 Preprocess Data Develop a Preprocessing Function Timetable Clean up Enrich Restructure 11

12 1 Access and Explore Data Ad Hoc Access to Data from AWS Athena Service AWS S3 12

13 3 Develop Predictive Models Develop a Predictive Model Edge Devices Distributed Computing Server Production Server Kafka Connector Production System Analytics Compiler SDK Analytics Development Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business End Users 13

14 3 Develop Predictive Models Everything you need to develop a predictive model is found in Represent Signals Label Events Scale Up Train Model Validate Model 14

15 3 Develop Predictive Models Develop a Predictive Model in Demo 15

16 4 Integrate with Production Integrate Analytics with Production Edge Devices Kafka Connector Production System Production Server Analytics Compiler SDK Analytics Development Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business End Users 16

17 4 Integrate with Production A quick Intro to Stream Processing Batch Processing applies computation to a finite sized historical data set that was acquired in the past Historical Data Configure Resources Schedule and Run Job Output Data Files Storage Files Storage Reporting Data Exploration Training Models Stream Processing applies computation to an unbounded data set that is produced continuously Continuous Data Messaging Service Stream Analytics Dashboards Connected Devices f(x) Alerts Reporting Real Time Decision Support Storage 17

18 Value of data to decision making Preventive / Predictive 4 Integrate with Production Why stream processing? Edge Processing with Coder Near Real time decisions Time critical decisions Stream Processing with Production Server Kinesis Today s example Event Hub focuses here Big Data processing on historical data Distributed Computing Server, Compiler Real- Time Actionable Reactive Historical Seconds Minutes Hours Days Months Time 18

19 4 Integrate with Production Streaming data is treated as an unbounded Timetable Event Time Input Table Vehicle RPM Torque Fuel Flow 18:01:10 55a3fd :10:30 55a3fe :05:20 55a3fd :10:45 55a3fd :30:10 55a :35:20 55a :20:40 55a3fe :39:30 55a :30:00 55a3fe :30:50 55a3fe State Function State Function State Function State Output Table Time window Vehicle Score 18:00:00 18:10:00 55a3fd 5 55a3fe 55a419 18:10:00 18:20:00 55a3fd 7 55a3fe 3 55a419 18:20:00 18:30:00 55a3fd 55a3fe 4 55a419 18:30:00 18:40:00 55a3fd 55a3fe 5 55a

20 4 Integrate with Production Introducing Production Server Data Analytics Business System Databases Cosmos DB Production Server Dashboards Cloud Storage Azure Blob Web Streaming AWS Kinesis Request Broker Custom Apps PI System Azure IoT Hub Platform 20

21 4 Integrate with Production Production Server is an application server that publishes code as APIs Compiler SDK Analytics Development Deploy Package Code / test Access Integrate Data sources Production Server Worker processes Request Broker Enterprise Application < > Mobile / Web Application Scale and secure 3 rd party dashboard 22

22 4 Integrate with Production Connecting Production Server to Kafka Kafka client for Production Server feeds topics to functions deployed on the server Publisher Publisher Publisher Kafka Cluster Configurable batch of messages passed as a Timetable Topic-0 Partition Partition Topic-1 Partition Partition Partition Partition Each consumer process feeds one topic to a specified function Drive everything from a simple config file No programming outside of! Consumer Process feeds Topic-0 Async Java Client Production Server Consumer Process feeds Topic-1 Async Java Client Request Broker & Program Manager 23

23 4 Integrate with Production Develop and Deploy a Stream Processing Function Edge Devices Production System Production Server Analytics Development Kafka Connector Analytics Compiler SDK Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business End Users 24

24 4 Integrate with Production Develop a Stream Processing Function in Process each window of data as it arrives Current score Previous state Current window of data to be processed 25

25 4 Integrate with Production Develop a Stream Processing Function in Apply your pre-processing algorithm 26

26 4 Integrate with Production Develop a Stream Processing Function in Use the model you created with Classification Learner App 27

27 4 Integrate with Production Develop a Stream Processing Function in Update Mongo database Count of events by type and location Results of driver scoring 28

28 4 Integrate with Production Debug a Stream Processing Function in Edge Devices Production System Analytics Development Kafka Connector Compiler SDK Algorithm Developers Storage Layer Business Decisions Business End Users 29

29 4 Integrate with Production Debug a Stream Processing Function in 30

30 4 Integrate with Production Tie in your Dashboard Application Edge Devices Production System Production Server Analytics Development Kafka Connector Analytics Compiler SDK Algorithm Developers API Gateway AWS Lambda Storage Layer Business Decisions Business End Users 31

31 5 Visualize Results Complete Your Application 32

32 5 Visualize Results Scalable Analytics with Enterprise BI Tools TIBCO Spotfire Tableau 33

33 Key Takeaways connects directly to your data so you can quickly design and validate algorithms The language and apps enable fast design iterations Production Server enables easy integration of your algorithms with enterprise production systems You to spend your time understanding the data and designing algorithms 35

34 Resources to learn and get started Data Analytics with Production Server Compiler SDK Statistics and Machine Learning Toolbox Database Toolbox Mapping Toolbox with TIBCO Spotfire with Tableau with MongoDB 36

35 37