Unified Monitoring for On-Premises and Cloud with Oracle Management Cloud


Speakers:
- Ana McCollum, Product Management, Oracle Management Cloud
- Henrique Arias, DBA, Spirit Airlines
- Rakesh JS, Managing Architect, Rubicon Red
- Erik Benner, VP Enterprise Transformation, Mythics, Inc.

October 22, 2018
Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle's products may change and remains at the sole discretion of Oracle Corporation.

Program Agenda
- Infrastructure Monitoring best practices
- Spirit Airlines
- Rubicon Red
- Mythics

Oracle Management Cloud
Tiers covered: end user experience/activity, application, middle tier, data tier, virtualization tier, infrastructure tier.
Data sources: global threat feeds, cloud access, identity, real users, synthetic users, app metrics, transactions, server metrics, diagnostics logs, host metrics, VM metrics, container metrics, configuration, compliance, tickets and alerts, security and network events.
Services on a unified SaaS platform: Infrastructure Monitoring, Log Analytics, Configuration & Compliance, Application Performance Monitoring, Security Monitoring & Analytics, Orchestration, IT Analytics.
A comprehensive, intelligent management platform: zero-effort operational insights and automated preventative and corrective actions.

Monitoring Visibility Across On-Premises and Cloud
Oracle Infrastructure Monitoring Cloud Service provides:
- Homepage and dashboards
- Alert rules that send notifications and tickets
- Baselines and anomaly detection
Cloud agents installed on on-premises infrastructure and on cloud services (IaaS/PaaS) send data to the service, optionally through a gateway. Administrators can also upload custom metrics through the REST APIs, for example from custom scripts.

Use Groups to Set Up Monitoring at Scale
Create dynamic groups using tags. For example, ProdGroup contains all entities with the tag Environment=Production:

    POST serviceapi/entitymodel/uds/groups
    {
      "groupname": "ProdGroup",
      "groupdisplayname": "ProdGroup",
      "tagcriteria": "Environment=Production"
    }

- Use dynamic groups in alert rules: the alert rule applies to all members of the group.
- Specify tags when adding an entity: the entity auto-joins the group, and the alert rule automatically applies to it.
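A minimal sketch of the dynamic-group idea, assuming a tag criterion is a single `key=value` pair as in the example above. It builds the request body shown on the slide and evaluates membership locally. The `Entity` class and the `matches` and `group_members` helpers are hypothetical illustrations, not part of the OMC API, and no HTTP request is sent.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a monitored entity and its tags."""
    name: str
    tags: dict = field(default_factory=dict)


def group_body(name: str, tag_criteria: str) -> str:
    """Build the JSON body for the group-creation request shown on the slide."""
    return json.dumps({
        "groupname": name,
        "groupdisplayname": name,
        "tagcriteria": tag_criteria,
    })


def matches(entity: Entity, tag_criteria: str) -> bool:
    """Hypothetical evaluator: assumes a single 'key=value' criterion."""
    key, _, value = tag_criteria.partition("=")
    return entity.tags.get(key.strip()) == value.strip()


def group_members(entities, tag_criteria: str):
    """Dynamic membership: every entity whose tags satisfy the criterion."""
    return [e.name for e in entities if matches(e, tag_criteria)]


if __name__ == "__main__":
    criteria = "Environment=Production"
    print(group_body("ProdGroup", criteria))
    fleet = [
        Entity("db01", {"Environment": "Production"}),
        Entity("db02", {"Environment": "QA"}),
    ]
    # A newly added entity that carries the tag joins the group with no extra step.
    fleet.append(Entity("db03", {"Environment": "Production"}))
    print(group_members(fleet, criteria))
```

Because membership is computed from tags rather than listed explicitly, tagging a new entity is the only step needed for group-scoped alert rules to cover it.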

Leverage Machine Learning: Anomaly-Based Alert Rules
- The ML engine determines a baseline (the expected range of values) from observed data.
- Create anomaly-based alert rules.
- The ML engine looks for seasonality patterns:
  - Daily seasonality: an expected range of data for each hour of the day. Example: performance at 9-10 am differs from performance at 5-6 pm.
  - Weekly seasonality: an expected range of data for each hour of each day of the week. Example: load on Monday 9-10 am is expected to be higher than on Friday 9-10 am.
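To make the seasonality idea concrete, here is a minimal sketch that buckets observations by hour of day (daily) or by weekday and hour (weekly) and treats the mean plus or minus k standard deviations as the expected range. This is a simple statistical stand-in chosen for illustration; it is not OMC's ML engine. The function names, the `k=3.0` default, and the example data are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def bucket(ts: datetime, weekly: bool):
    """Daily seasonality keys on the hour; weekly keys on (weekday, hour)."""
    return (ts.weekday(), ts.hour) if weekly else ts.hour


def build_baseline(samples, weekly=False, k=3.0):
    """Return {bucket: (low, high)} from (timestamp, value) samples.

    Simple mean +/- k*stdev stand-in, not OMC's actual model.
    """
    groups = defaultdict(list)
    for ts, value in samples:
        groups[bucket(ts, weekly)].append(value)
    baseline = {}
    for key, values in groups.items():
        m, s = mean(values), pstdev(values)
        baseline[key] = (m - k * s, m + k * s)
    return baseline


def is_anomaly(baseline, ts, value, weekly=False):
    """Flag a value outside its bucket's expected range; unseen buckets are not flagged."""
    expected = baseline.get(bucket(ts, weekly))
    if expected is None:
        return False
    low, high = expected
    return not (low <= value <= high)


# Hypothetical history: Monday 9 am load near 80, Friday 9 am load near 40.
# 2018-10-01 was a Monday and 2018-10-05 a Friday.
EXAMPLE_HISTORY = [(datetime(2018, 10, 1 + 7 * w, 9), 80 + w) for w in range(4)] + [
    (datetime(2018, 10, 5 + 7 * w, 9), 40 + w) for w in range(4)
]

if __name__ == "__main__":
    monday_9am = datetime(2018, 10, 29, 9)
    daily = build_baseline(EXAMPLE_HISTORY)
    weekly = build_baseline(EXAMPLE_HISTORY, weekly=True)
    # A Monday reading of 60 fits the pooled 9 am range but not Monday's own range.
    print(is_anomaly(daily, monday_9am, 60), is_anomaly(weekly, monday_9am, 60, weekly=True))
```

The example shows why weekly seasonality matters: pooling all 9 am readings produces a range wide enough to hide a sharp drop in Monday load, while the per-weekday range catches it.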

Automate Alert Resolution: Leverage the Orchestration Service
- Create a script to auto-remediate the alert.
- Deploy the script on the agent.
- Create credentials with the agent (the agent uses these credentials to run the script).
- Create an auto-remediation action:
  - Create an orchestration workflow containing the script.
  - Embed the workflow in the remediation action definition.
- Associate the remediation action with the alert rule.
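The slide does not say what a remediation script does. As a purely hypothetical example, a script deployed on the agent host to answer a filesystem-utilization alert might purge old log files. The directory, the `*.log` pattern, the seven-day retention, and the command-line form are assumptions for illustration, not OMC settings.

```python
import sys
import time
from pathlib import Path


def purge_old_logs(directory, max_age_days=7.0, pattern="*.log", now=None):
    """Hypothetical remediation for a disk-space alert: delete old log files.

    Removes files in `directory` matching `pattern` whose modification time is
    more than `max_age_days` before `now` (defaults to the current time), and
    returns the deleted paths so the workflow output can record them.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    deleted = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(str(path))
    return deleted


if __name__ == "__main__":
    # Hypothetical invocation from the orchestration workflow:
    #   python purge_old_logs.py /var/log/myapp 7
    if len(sys.argv) < 2:
        print("usage: purge_old_logs.py DIRECTORY [MAX_AGE_DAYS]", file=sys.stderr)
    else:
        days = float(sys.argv[2]) if len(sys.argv) > 2 else 7.0
        removed = purge_old_logs(sys.argv[1], days)
        print(f"removed {len(removed)} file(s) from {sys.argv[1]}")
```

Returning and printing what was removed gives the orchestration run a record to show alongside the alert history.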

Remediation Action Details
- The alert history shows that the remediation script has been submitted for execution.
- Details of the script run are shown in the Orchestration UIs.

Extending Monitoring Scope Using collectd
collectd is a daemon that periodically collects system and application performance metrics and can store the values in different ways. Through its read plugins, it can monitor a wide variety of infrastructure.

Integration with collectd
1. Configure the appropriate collectd read plugins, which collect the metrics (for example, the processes plugin, the redis plugin, or other plugins).
2. Configure collectd to send metrics to the cloud agent over HTTPS using its write_http plugin.
3. Add the collectd (metric collector) entity to OMC:
   omcli add_entity agent omc_generic_metric_collector.json
As data is ingested into OMC:
- New entity types and metrics are created automatically.
- collectd metrics are mapped to OMC metrics automatically.
- Metrics are collected and shown in OMC, and monitoring functionality (homepage, alerting, etc.) is available.
Doc reference: Using Oracle Infrastructure Monitoring.
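The slides say collectd metrics are mapped to OMC metrics automatically but do not say how. To illustrate what reaches the agent, here is a sketch that flattens collectd's JSON output format (a list of value-list records with `plugin`, `type`, `dsnames`, and `values` fields, as written by write_http when its format is JSON) into (host, metric name, time, value) tuples. The `plugin.type.dsname` naming scheme and the sample host name are assumptions for illustration, not OMC's actual mapping.

```python
import json


def flatten(payload: str):
    """Turn a collectd JSON batch into (host, metric_name, time, value) tuples.

    The 'plugin[-instance].type[-instance].dsname' naming is hypothetical.
    """
    rows = []
    for record in json.loads(payload):
        plugin = record["plugin"]
        if record.get("plugin_instance"):
            plugin += "-" + record["plugin_instance"]
        type_ = record["type"]
        if record.get("type_instance"):
            type_ += "-" + record["type_instance"]
        # One record can carry several data sources, e.g. rx and tx octets.
        for dsname, value in zip(record["dsnames"], record["values"]):
            rows.append((record["host"], f"{plugin}.{type_}.{dsname}", record["time"], value))
    return rows


# Hypothetical batch in collectd's JSON format.
SAMPLE = json.dumps([
    {"values": [1901474177], "dstypes": ["counter"], "dsnames": ["value"],
     "time": 1280959128, "interval": 10, "host": "host01.example.com",
     "plugin": "cpu", "plugin_instance": "0", "type": "cpu", "type_instance": "idle"},
    {"values": [123, 456], "dstypes": ["derive", "derive"], "dsnames": ["rx", "tx"],
     "time": 1280959128, "interval": 10, "host": "host01.example.com",
     "plugin": "interface", "plugin_instance": "eth0", "type": "if_octets", "type_instance": ""},
])

if __name__ == "__main__":
    for row in flatten(SAMPLE):
        print(row)
```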

Spirit Airlines

Unified Monitoring for On-Premises and the Cloud with Oracle Management Cloud

Henrique Arias, Database Administrator at Spirit Airlines

1. Environment
2. Goals
3. Accomplishments
4. Best Practices

1. Environment
- Microsoft SQL Server 2016
- Oracle Database 11g
- Pervasive PSQL v12
- Azure SQL Database
- Azure SQL Managed Instance
- SQL Server Integration Services
- Informatica PowerCenter 10.1

2. Goals
Improve the availability and reliability of the environment:
- Implement a 360-degree view of our applications and database systems
- Deliver 15-minute incident response and problem resolution
- Achieve 99% uptime by identifying issues before they cause disruptions
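For a sense of scale, even a 99% uptime target leaves a downtime budget. The short calculation below converts an availability percentage into allowed downtime per period; it is plain arithmetic, and the 30-day month is an assumption.

```python
def downtime_budget_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of allowed downtime for an availability target over a period."""
    return (1 - availability_pct / 100) * period_hours


if __name__ == "__main__":
    for label, hours in [("30-day month", 30 * 24), ("365-day year", 365 * 24)]:
        print(f"99% over a {label}: {downtime_budget_hours(99, hours):.1f} h")
```

At 99%, the budget works out to about 7.2 hours per 30-day month, or 87.6 hours per 365-day year.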

3. Accomplishments
- Consolidated view of database systems through the Enterprise Summary in Infrastructure Monitoring
- Improved incident response times through alert rules
- Identified recurrent database issues using Log Explorer in Log Analytics

4. Best Practices
- Configure different service instances for Development, QA, and Production
- Configure the Cloud Agent to point to the right service instance
- Organize entities into groups (e.g. Oracle Databases, SQL Server instances)
- Install the Cloud Agent as administrator (Windows Server)
- Grant the Cloud Agent service account only the permissions it requires

Rubicon Red

Implementing Enterprise Monitoring Using Oracle Management Cloud
Rubicon Red OMC Implementation Use Cases
Prepared by: Rakesh JS, Managing Architect, Rubicon Red

Business Functions
1. Managed Services: a lean and smart team managing many customers; a technical, automation-first approach; automated ticketing and notifications; versatile skills and capabilities.
2. Product (MyST, CI/CD software): looking at the business challenges through two lenses; active feature and enhancement streams; requires multiple cloud instances and resources; critical Test Drive and demo instances.
3. Consulting (cloud customers): integrations, AppDev, chatbots and DevOps; Oracle Platinum Partner; largest ANZ customers; worldwide software implementation.

Business Problem
01 Cloud infrastructure and application management
02 Consolidated infrastructure, performance, and log analysis
03 An integrated solution with current notification and alerting systems
04 Implementation cost for monitoring and management suites
05 Supporting the product and small/medium-sized customers

Business Cases
- Financial lending business: portal application; infrastructure availability, monitoring, and usage; early warning systems; integrated alerting systems. Figure on slide: "<2".
- Food services business: Oracle SOACS instances across multiple regions and accounts; organized entities; multi-customer management suite; proactive issue resolution. Figures on slide: "70% Incident Reassignment", "+No Meeting SLA".
- Rubicon Red engineering: Oracle and AWS cloud; cloud discovery; monitoring of critical Test Drive instances; optimized cloud resource usage. Figures on slide: "90% Proactive Mgmt", "+Capability", "Implementation Cost".

Business Benefits
- Unified view and no silos
- Multi-channel and integration support
- Log and performance analysis
- Modularized entities
- Entity licensing
- Trend and anomaly detection
- Monitoring and management control center
- Enabling small customers

Key Learnings
- Strategize the licensing model
- Standardize entity naming and tagging
- Analyze trends and tune alerts and thresholds
- Plan agent setups
- Use a configuration-as-code approach
- Plan external integrations
- Set up the AWS user for Cloud Discovery
Outcome: satisfied OMC customers.
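One way to combine "standardize entity naming and tagging" with "configuration as code" is to keep entity definitions in version control and check them before they are added with omcli. The sketch below validates hypothetical entity definitions against a hypothetical naming pattern and required-tag list. The field names (`name`, `tags`), the `<env>-<app>-<role><nn>` pattern, and the required tags are all assumptions, not an OMC schema.

```python
import re

# Hypothetical convention: <env>-<app>-<role><two digits>, e.g. prod-crm-db01.
NAME_PATTERN = re.compile(r"^(dev|qa|prod)-[a-z0-9]+-[a-z]+\d{2}$")
REQUIRED_TAGS = ("Environment", "Owner")
ENV_BY_PREFIX = {"dev": "Development", "qa": "QA", "prod": "Production"}


def validate(entity: dict) -> list:
    """Return a list of problems with one hypothetical entity definition."""
    problems = []
    name = entity.get("name", "")
    tags = entity.get("tags", {})
    if not NAME_PATTERN.match(name):
        problems.append(f"{name!r}: name does not follow <env>-<app>-<role><nn>")
    for tag in REQUIRED_TAGS:
        if tag not in tags:
            problems.append(f"{name!r}: missing required tag {tag}")
    # Keep the name prefix and the Environment tag consistent, so tag-based
    # dynamic groups and name-based views agree.
    expected = ENV_BY_PREFIX.get(name.split("-", 1)[0])
    if expected and tags.get("Environment") not in (None, expected):
        problems.append(f"{name!r}: Environment tag should be {expected}")
    return problems


def validate_all(entities) -> list:
    """Validate every definition; a CI job could fail the build if this is non-empty."""
    return [problem for entity in entities for problem in validate(entity)]


if __name__ == "__main__":
    definitions = [
        {"name": "prod-crm-db01", "tags": {"Environment": "Production", "Owner": "dba"}},
        {"name": "CRM DB", "tags": {}},
    ]
    for problem in validate_all(definitions):
        print(problem)
```

Running a check like this before onboarding keeps tags consistent, which matters because tag-based dynamic groups and alert rules depend on them.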

Thank You

Mythics

Erik Benner, VP Enterprise Transformation, Mythics
- TalesFromTheDatacenter.com
- Mythics.com/blog
- Published author
- RAC Attack Ninja
- Linux since 1992
- Solaris since 1996
- Database 12c beta user
- Prelaunch ODA comet
- First version of Oracle: Oracle 7, in 1994
- ZFS since Thumper
- OEM 12c since product launch
- OAUG EM for Apps SIG co-chair
- OEM 12c CAB member
- IOUG Solaris SIG leader

Complete Oracle Solutions
- Architecture
- Procurement
- Training
- Emergency response
- Implementations of Oracle technology

No Worries, It's Complex
- Users were having intermittent performance issues.
- APM detected that a specific app server showed intermittent database performance problems.
- The database was running fine, and the app teams could find no problem.
- Cisco switches were flapping ports.
- Machine learning matched the pattern: every time the ports flapped, the users had issues.
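The slide credits machine learning with matching port flaps to user issues. The underlying question, how often one kind of event is followed by another within a short window, can be sketched as below. This simple co-occurrence rate is a stand-in for illustration, not the technique OMC uses; the event timestamps and the 60-second window are hypothetical.

```python
from bisect import bisect_left


def follow_rate(causes, effects, window_s):
    """Fraction of `causes` followed by at least one of `effects` within `window_s` seconds.

    Both inputs are timestamps in seconds; `effects` need not be sorted.
    """
    if not causes:
        return 0.0
    effects = sorted(effects)
    hits = 0
    for t in causes:
        i = bisect_left(effects, t)  # index of the first effect at or after the cause
        if i < len(effects) and effects[i] - t <= window_s:
            hits += 1
    return hits / len(causes)


# Hypothetical timestamps (seconds since an arbitrary start).
PORT_FLAPS = [100, 900, 2500, 4000]
USER_ISSUES = [110, 130, 915, 2530, 4020, 6000]

if __name__ == "__main__":
    print(follow_rate(PORT_FLAPS, USER_ISSUES, 60))
```

A rate near 1.0, as with these made-up timestamps, is the kind of "every time" pattern the slide describes; comparing it with the rate for randomly chosen timestamps helps rule out coincidence.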

I Need to Buy What?
- The client was having performance issues with VMs: poor database performance, and WebLogic failing randomly.
- The server team and the hardware vendor concluded that more VMware servers were needed, at about 200K in expense.
- Infrastructure Monitoring showed 33% CPU utilization and 100% RAM utilization.
- The client added RAM instead and saved 160K.

Erik
TalesFromTheDatacenter.com
Mythics.com/blog
