Efficient Troubleshooting Using Machine Learning in Oracle Log Analytics

1 Efficient Troubleshooting Using Machine Learning in Oracle Log Analytics
Nima Haddadkaveh, Director, Product Management, Oracle Management Cloud
October 2018

2 Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle's products may change and remains at the sole discretion of Oracle Corporation.

3 Agenda
- Oracle Management Cloud (OMC)
- Log Analytics (LA)
- Machine Learning in OMC & LA
- How to Use ML in LA

4 Oracle Management Cloud
Comprehensive, Intelligent Management Platform
- Zero-effort Operational Insights
- Automated Preventative & Corrective Actions
Tiers: END USER EXPERIENCE / ACTIVITY, APPLICATION, MIDDLE TIER, DATA TIER, VIRTUALIZATION TIER, INFRASTRUCTURE TIER
Data: Global threat feeds, Cloud access, Identity, Real users, Synthetic users, App metrics, Transactions, Server metrics, Diagnostics logs, Host metrics, VM metrics, Container metrics, Configuration, Compliance, Tickets & Alerts, Security & Network events
Services: Infrastructure Monitoring, Log Analytics, Configuration & Compliance, Application Performance Monitoring, Security Monitoring & Analytics, Orchestration, IT Analytics

5 Oracle Management Cloud - Log Analytics
Monitor, aggregate, analyze, search, explore, and correlate all log data from your applications and infrastructure (on-premises and cloud) in real time.
A cloud service that leverages a modern, secure, big-data platform.
Diagram labels: Data Center, Private Cloud, Oracle Public Cloud, Other Public Cloud; Application, Storage, Database, EMCC Repository; Logs + Oracle Operational Data

6 Key use cases in LA (OMC Log Analytics)
- IT Operation
- Operational Intelligence
- Troubleshooting
- Root-cause Analysis
- Business Process Analysis
- Product Analysis
- Digital Marketing
- Customer Experience

7 Agenda
- Oracle Management Cloud (OMC)
- Log Analytics (LA)
- Machine Learning in OMC & LA
- How to Use ML in LA

8 Humans are great at
- Searching for known things
- Remembering issues that occurred in the past
- Learning and expanding their troubleshooting knowledge
Source:

9 But what do you do when
- You have very limited information about a problem?
- Your product or service is part of a complex application with hundreds of other systems and devices?
- You run into intermittent problems?
- You don't know where to start troubleshooting?
Source:

10 ML/AI values in OMC?
- Digital Transformation Journey
- Greater Visibility into Complex IT Environments
- Reduced MTTD and Faster MTTR
- Real-time Analysis
- Smart Alerts and Notifications
Diagram labels: IT Ops, Machine Learning, Dev Ops

11 ML values in LA?
ML helps users make sense of the mass of their data by organizing it into cohesive, correlated categories.
- Eliminate noise
- Find context and unknowns
- Correlate events and detect anomalies
- Say goodbye to static thresholds
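To make the "goodbye to static thresholds" point concrete, here is a minimal sketch of an adaptive threshold in Python. It is an illustration only and not how Log Analytics detects anomalies: the function name, the window size, the factor k, and the sample counts are all assumptions made up for this example.

```python
from statistics import mean, stdev

def adaptive_anomalies(counts, window=5, k=3.0):
    """Return the indices whose value exceeds mean + k * stdev of the
    preceding `window` values (a simple moving baseline)."""
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        # If the baseline is perfectly flat, stdev is 0 and any increase is flagged.
        limit = mean(baseline) + k * stdev(baseline)
        if counts[i] > limit:
            anomalies.append(i)
    return anomalies

# Hypothetical error counts per minute for two services.
quiet = [2, 3, 2, 3, 2, 12, 3]                # small service with one spike
busy = [200, 210, 190, 205, 195, 212, 200]    # large service, normal variation

quiet_hits = adaptive_anomalies(quiet)
busy_hits = adaptive_anomalies(busy)
```

A single static limit of, say, 100 errors per minute would miss the spike in the quiet service and fire on every sample of the busy one. The moving baseline adapts to each series instead.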

12 LA Approach to Machine Learning
- Does not follow the common ML model in the market (Fit, Apply, Inspect, Summary, Compare)
- Provides ML-based solutions to specific problems
- Does not make IT Ops users focus on the algorithms or the process of data selection
- Lowest cost to operate and use, with faster time to value
- Requires no data scientist

13 Agenda
- Oracle Management Cloud (OMC)
- Log Analytics (LA)
- Machine Learning in OMC & LA
- How to Use ML in LA

14 Machine Learning - Cluster
Cluster log events by their physical structure and analyze the variable data that LA clustering has extracted.
- Clusters: Events grouped by similarity in their patterns
- Potential Issues: Events with a severity such as Error, Fault, Fatal, or Warning, or containing terms that are semantically similar to these
- Outliers: Events within the clustered results that have occurred only once
- Trends: The trend of each clustered group; clustered events that show similar trends are correlated
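The idea of grouping events by structure while setting aside the variable data can be sketched in a few lines of Python. This is a simplified illustration, not the clustering algorithm used by Log Analytics: the masking rules, the severity keyword list, the function names, and the sample messages are assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Assumed masking rule: hex literals and (dotted) numbers count as variable data.
_VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")
# Assumed keyword list for "potential issues", taken from the severities named above.
_SEVERITY = re.compile(r"\b(error|fault|fatal|warning)\b", re.IGNORECASE)

def signature(message):
    """Replace variable tokens with a placeholder so only the structure remains."""
    return _VARIABLE.sub("<*>", message)

def cluster(messages):
    """Group messages that share the same structural signature."""
    groups = defaultdict(list)
    for message in messages:
        groups[signature(message)].append(message)
    return dict(groups)

def summarize(messages):
    """Count clusters, clusters that look like issues, and single-occurrence clusters."""
    groups = cluster(messages)
    return {
        "clusters": len(groups),
        "potential_issues": sum(1 for sig in groups if _SEVERITY.search(sig)),
        "outliers": sum(1 for members in groups.values() if len(members) == 1),
    }

# Hypothetical log messages.
sample = [
    "Connection from 10.0.0.5 closed after 120 ms",
    "Connection from 10.0.0.9 closed after 98 ms",
    "ERROR: disk 3 is full",
    "ERROR: disk 7 is full",
    "Fatal: heap exhausted at 0x7ffe12",
    "User 42 logged in",
]
summary = summarize(sample)
```

Six messages collapse into four signatures here. Two of them carry a severity keyword, and two occur only once. Trend analysis would additionally need a timestamp per event, which this sketch leaves out.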

19 Machine Learning - Link Events
Deep analytics on log events, linked by common attribute value(s).
What Link does:
- Links events from millions of log records, across log sources, that share one or more common attributes such as Transaction ID, ECID, Flow ID, or User Name
- Computes statistics on the linked events and analyzes them for outliers
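A rough Python sketch of the same idea follows: group events by a shared attribute, compute per-group statistics, and flag groups that stand out. It is not the Link implementation. The field names (txn_id, time, source), the median-based outlier rule, and the sample events are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def link(events, key):
    """Group events from any source that share the same value for `key`."""
    groups = defaultdict(list)
    for event in events:
        if key in event:
            groups[event[key]].append(event)
    return dict(groups)

def group_stats(groups):
    """Per group: event count, elapsed time, and the sources that contributed."""
    stats = {}
    for value, members in groups.items():
        times = [e["time"] for e in members]
        stats[value] = {
            "count": len(members),
            "duration": max(times) - min(times),
            "sources": sorted({e["source"] for e in members}),
        }
    return stats

def duration_outliers(stats, factor=3.0):
    """Assumed rule: a group is an outlier if it lasts more than `factor` times the median."""
    typical = median(s["duration"] for s in stats.values())
    return sorted(v for v, s in stats.items() if s["duration"] > factor * typical)

# Hypothetical events from three log sources; time is in seconds.
EVENTS = [
    {"txn_id": "T1", "time": 0, "source": "app"},
    {"txn_id": "T1", "time": 1, "source": "db"},
    {"txn_id": "T1", "time": 2, "source": "engine"},
    {"txn_id": "T2", "time": 10, "source": "app"},
    {"txn_id": "T2", "time": 11, "source": "db"},
    {"txn_id": "T2", "time": 12, "source": "engine"},
    {"txn_id": "T3", "time": 20, "source": "app"},
    {"txn_id": "T3", "time": 21, "source": "db"},
    {"txn_id": "T3", "time": 50, "source": "engine"},
    {"txn_id": "T4", "time": 60, "source": "app"},
    {"txn_id": "T4", "time": 62, "source": "db"},
]

stats = group_stats(link(EVENTS, "txn_id"))
slow = duration_outliers(stats)
```

In this sample, transaction T3 takes 30 seconds while the others take 2, so it is the only group flagged. The same grouping works with any shared attribute, for example a user name instead of a transaction ID.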

20 When to use Link?
Log events from multiple applications, tiers, or hosts that are related and span time can be linked together.
Examples of events/transactions:
- A single purchase from an online store can span an application server, a database, and an e-commerce engine
- One message can create multiple events as it travels through various queues
- An out-of-memory problem could trigger several database events to be logged
- Visiting a single website normally generates multiple HTTP requests

21 SOA Workflow: Order Workflow (SOA Orchestration)
User submits an order (ID: 211)
1. Notify User (Messaging Server): Send an email to the user with order details
2. Warehouse Prepare Order (Cloud Service): Send an async message to the warehousing provider
3. Update Inventory (Database Server): Update inventory database
4. Pickup Shipment (Cloud Service): Web service of logistics provider; schedule pickup
5. Confirm Order Ship (Messaging Server): Send an email to the user with delivery tracking

30 Link Log Clustering

31 Summary
- Machine Learning makes troubleshooting efficient
- Machine Learning provides insight that would be difficult or impossible to gain from large volumes of logs
- Oracle's approach to Machine Learning is easy to operate and use, with faster time to value