Splunk IT Service Intelligence


1 Splunk IT Service Intelligence Presentation Subhead (on two lines, if you need it) Presenter's Name Title & Specialization Date Location

2 Forward-Looking Statements During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. Splunk Inc. All rights reserved.

3 Challenges Facing Today's IT High cost of IT Operation Inefficient use of resources Lower customer satisfaction Lost revenue

4 Desired Outcomes for IT Operations Reduce tool complexity and costs Become more predictive and preventative Use resources efficiently Optimize the consumer experience

5 How IT Operates Today: IT Stack POV Applications, business/mission services 2017 SPLUNK INC. The way many in IT think of their world Each layer is a silo A dedicated team of experts (with domain tools) focuses just on the health of that layer Their view of the health of that layer is based on the aggregated health of each component in the layer If 2 out of 100 DBs are struggling, you're still having a good day Web Server (Apache, Tomcat) App Server (WebLogic, JBoss EAP, WebSphere) Database (Oracle, SQL Server, MySQL) Guest OS (Windows/Linux/*Nix) Hypervisor (ESX, Hyper-V, Citrix) Physical Server (Dell, HP, Cisco blades or servers) SAN/NAS Storage (EMC, NetApp) Network

6 What's Needed: Service/App POV Service/App Claims Outage! The aggregated health of the layer is irrelevant Dependencies now matter The health of the app depends on the health of each component of each layer that the app depends upon If your app depends on 1 or more of those 2 struggling DB servers, you're about to have a bad day! What about those VMs that are red? Status by layer: Web Server (1..N) 100%, App Server (1..N) 100%, Database (1..100) 98%, Guest OS (1..N) 100%, VM/Hypervisor (1..N) 95%, Physical Server (1..N) 100%, SAN/NAS Storage (1..N) 100%, Network 100%
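The contrast between the layer view and the service view can be sketched in a few lines of Python. This is only an illustration: the component names, the two failing databases and the service dependency lists are hypothetical, chosen to mirror the "2 out of 100 DBs" example on the slides.

```python
from typing import Dict, List

# Hypothetical inventory: 100 databases, two of them unhealthy (illustrative data).
db_health: Dict[str, bool] = {f"db{i:03d}": True for i in range(1, 101)}
db_health["db017"] = False
db_health["db042"] = False


def layer_health_pct(components: Dict[str, bool]) -> float:
    """Layer view: percentage of healthy components in the layer."""
    return 100.0 * sum(components.values()) / len(components)


def service_is_healthy(dependencies: List[str], components: Dict[str, bool]) -> bool:
    """Service view: healthy only if every component the service depends on is healthy."""
    return all(components[name] for name in dependencies)


# Hypothetical dependency lists for two services.
claims_app_dbs = ["db017", "db050"]
billing_app_dbs = ["db001", "db002"]

layer_pct = layer_health_pct(db_health)                    # 98.0
claims_ok = service_is_healthy(claims_app_dbs, db_health)   # False
billing_ok = service_is_healthy(billing_app_dbs, db_health) # True
```

The database layer reports 98% health, yet the claims service is down because it happens to depend on one of the two struggling servers.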

7 Rethink and Improve How IT Operates Using Artificial Intelligence for IT Operations Traditional IT: Structured data; Brittle tools and integrations; Obsession with faults and traps; Focus on component parts; Search oriented. Data-Driven IT: Structured and unstructured data; Robust data integrations; Real-time insights from big data; Focus on the whole service; Machine learning-driven analytics.

8 What Is Service Intelligence? Enabling a business-aware IT Measuring and reporting on indicators that matter Unlocking operational efficiencies Collaborating across silos to improve service operations Data-based decision making Solving problems and anticipating pitfalls with sophisticated analytics and powerful insights

9 Connecting the Data Dots for Service Intelligence Incident detection Incident triage Investigation Service restoration Root-cause analysis Business-driven IT Data-driven decisions Maintain high service levels and availability, prevent outages and recover quickly when things break down Unlocking operational efficiencies Improve productivity and share understanding of business service criticality, impact and incident Business-aware IT Monitor, visualize and present real-time insights into service health against KPIs to drive operational and business decisions

10 Artificial Intelligence for IT Operations Powered by machine learning and analytics for real-time service insights, simplified operations and root-cause isolation

11 Splunk ITSI: Multiple Use Cases, One Solution SERVICE INSIGHTS EVENT ANALYTICS Baseline KPI trends based on operational patterns and identify abnormal conditions Service health scores calculated from KPIs Organized view of KPIs and trends for fast triage and analysis Service insights on events to prioritize triage and investigation Sophisticated analytics and incident workflow to automate managing events Machine learning to reduce noise and find alerts on root causes of issues Deep insights into technology domains to speed investigation Initiate incident response and remediation actions

12 Breadth of Machine Learning Capabilities Make IT Effective, Proactive and Predictive Dynamic Thresholding: Thresholds adapt in real time; Trend and alert on anomalous behavior; Prevent service degradation. Event Clustering: Detect and highlight the events that matter; Prioritize events that need action taken. Anomaly Detection: Alerts triggered automatically by anomalous activity; Incident responders can see across all silos to reach a quicker MTTR. Prediction: Predict outages and anomalies before they occur; Act on these predictions so your services are not affected. Platform for Machine Data

13 Predict and Prevent Time Hurts

14 Time Hurts [Chart labels: Events, Existing, NOC alerted, MTTR, $ Impact, Impacting Fault, Time]

15 Effective Clustering: Order from Chaos Effective Respond to alerts associated together using Machine Learning clustering Provide starting point or inference for business-impacting event cause Results Reduce employee churn Increase of time investment for strategic projects Example Leidos decreased event noise 95-98% 3,500-5,000 alerts per day down to actionable events

16 Event Analytics: Become More Effective [Chart labels: Events, Existing, NOC alerted, MTTR, Effective, Splunk Event Analytics, MTTR, $ Impact, Impacting Fault, Time]

17 Proactive Anomalies in the Now Proactive Respond to alerts with Service Context Engage the right IT partners the 1st Time for faster resolution Engage in the automation (self healing) of high fidelity/high confidence incident Results Respond to alerts with Service Context Engage the right IT partners the 1st Time for resolution Engage in the automation (self healing) of high fidelity/high confidence incident Example Molina Healthcare gained visibility and correlation across its stack, which reduced the number of IT incidents by 30-45% and MTTR by 70-90%.

18 Move to a Proactive Posture [Chart labels: Events, Existing, NOC alerted, MTTR, Effective, Proactive (add logs and metrics), Splunk ML Alert, MTTR, Automated Resolution, MTTR, $ Impact, Impacting Fault, Time]

19 Predictive: It's Like We Know the Future Predictive Predict your Service Health Score ~30 min into the FUTURE Leverage Key Performance Indicators (KPIs) and Dependency Modeling Respond to business-impacting events BEFORE they CAN occur Results Reduction in MTTR, problems and changes Provide the business early warning of revenue-impacting events Instill confidence in the business for operations teams Re-invest time given back to team in the organization's strategy Example Your organization!

20 Prevent Incidents From Occurring [Chart labels: Events, Existing, NOC alerted, MTTR, Effective, Proactive (add logs and metrics), Splunk ML Alert, MTTR, Automated Resolution, MTTR, Predictive, NO MTTR!!, $ Impact, Cost of Impact, Time, Return to Business, Time]

21 Machine Learning in ITSI KPIs Network logs Any Time Series in Splunk Metrics* ANOMALY DETECTION Adaptive Thresholds Anomaly Detection INTELLIGENCE Clustered Notable Events Automated Actions Server logs Machine Learning Cohesion Detection Machine Learning Assisted Deep Dive Investigation Application logs Custom from MLTK MLTK Customization Other Events & Alarms

22 Splunk Customer Examples Effective: 95-99% reduction in event noise, taking 3,500-5,000 down to actionable events. Proactive: Reduce the number of IT incidents by 30-40%, decrease MTTR by 70-90%. Predictive: Predict their Service Health Score's impact minutes into the future.

23 Splunk ITSI Demo

24 Personalized Visualizations of Your Services Visualize contextual inter-relationships across service delivery components Illustrate business and service activity using indicators aligned to strategic goals Drive decisions by monitoring service health against performance indicators Create sophisticated dashboards in minutes

25 Organized View of Performance Indicators Organize and correlate KPIs to speed up investigations and diagnosis Compare performance over time and in real time to understand trends and identify issues Enable broad and deep investigation with contextual drill-downs Investigate anomalous activity in your KPIs to proactively address emerging issues

26 Real-Time View of Service and KPI Health Scores Get early warning of emerging incidents with a heat map of service health and KPI scores, metrics, sparklines and alerts Drill down into service and entity details for in-depth triage

27 Insights Into the Origin of Service Disruptions Profile an entity to troubleshoot outages and service degradations Identify contributing services and entities of the worst performing KPIs

28 Correlation Rules Generate Notable Events Run predefined correlation searches against learned indicators to generate notable events based on status and composite scores

29 Sophisticated Event Analytics Reduce event clutter and false positives with multivariate anomaly detection Use machine learning Smart Mode to group related events and generate human-scale alerts Create custom aggregation policies to filter event noise Easily sift through events by filtering, tagging and sorting Enrich and add context to events to prioritize investigation and ensure business-service availability
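The idea of an aggregation policy that turns many raw events into a few actionable groups can be sketched as follows. This is a minimal rule-based illustration, not Splunk ITSI's Smart Mode or any ITSI API: the `time` and `host` field names, the five-minute window and the sample events are all assumptions.

```python
from typing import Dict, List

Event = Dict[str, object]


def group_events(events: List[Event], window_s: int = 300) -> List[List[Event]]:
    """Group events from the same host when each arrives within window_s
    seconds of the previous event in that host's current group."""
    groups: List[List[Event]] = []
    open_group: Dict[str, List[Event]] = {}  # host -> most recent group for that host
    for ev in sorted(events, key=lambda e: e["time"]):
        current = open_group.get(ev["host"])
        if current is not None and ev["time"] - current[-1]["time"] <= window_s:
            current.append(ev)
        else:
            current = [ev]
            groups.append(current)
            open_group[ev["host"]] = current
    return groups


# Hypothetical raw events (epoch seconds).
raw_events = [
    {"time": 0, "host": "db01"},
    {"time": 30, "host": "web01"},
    {"time": 60, "host": "db01"},
    {"time": 120, "host": "db01"},
    {"time": 1000, "host": "db01"},
]

groups = group_events(raw_events)
group_sizes = [len(g) for g in groups]  # [3, 1, 1]
```

Five raw events collapse into three groups; a real policy would typically key on more fields (service, severity, signature) than the single host field used here.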

30 Fast Incident Review and Investigation 1 Risk-based security Triage notable events by criticality, trigger new alert actions and automatically initiate defined incident and remediation responses

31 Machine Learning Made Mainstream Adaptive Thresholds: Manage and maintain KPI thresholds by dynamically adapting to changing operational patterns. Anomaly Detection: Catch issues that thresholds can't: baseline normal operations and alert on anomalous conditions. Event Correlation: Reduce event clutter, false positives and rules maintenance by auto-grouping related events.

32 Baseline Operational Patterns and Adapt Thresholds Use machine learning to dynamically adapt KPI thresholds by time Maintain and preserve learned thresholds to monitor KPI and service behavior
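One simple way to picture thresholds that adapt by time is to learn a separate upper bound for each hour of the day from historical KPI values. The sketch below uses mean plus k standard deviations per hour; that statistic, the `k` parameter and the sample history are assumptions for illustration and are not claimed to be the algorithm Splunk ITSI uses.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple


def learn_thresholds(history: Iterable[Tuple[int, float]], k: float = 2.0) -> Dict[int, float]:
    """Learn one upper threshold per hour of day: mean + k * population stdev."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: mean(vals) + k * pstdev(vals) for hour, vals in by_hour.items()}


def is_breach(hour: int, value: float, thresholds: Dict[int, float]) -> bool:
    """True if the value exceeds the learned threshold for that hour."""
    return value > thresholds[hour]


# Hypothetical history: (hour_of_day, requests_per_second).
history = [
    (3, 10), (3, 12), (3, 8), (3, 10),         # quiet overnight traffic
    (14, 100), (14, 110), (14, 90), (14, 100),  # busy afternoon traffic
]

thresholds = learn_thresholds(history)
night_breach = is_breach(3, 50, thresholds)   # 50 req/s at 03:00 is unusual
day_breach = is_breach(14, 50, thresholds)    # 50 req/s at 14:00 is not
```

A single static threshold would either miss the overnight spike or alert all afternoon; per-hour baselines avoid both.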

33 Detect Normal and Abnormal Behavior Baseline normal operations and alert on anomalous conditions Identify abnormal trends and patterns in KPI data

34 Reduce Event Clutter Elicit patterns and real-time correlations to cluster and group relevant events with easy-to-use and sophisticated machine learning algorithms

35 Integrate With Existing Incident Workflows Automatically initiate defined incident and remediation responses Leverage built-in integrations with ServiceNow, BMC Remedy, xMatters and PagerDuty to initiate incident resolution Easily build custom integrations, execute remedial actions and extend functionality with powerful APIs

36 Deep Service-Oriented Insights Into Technology Domains Fast-track data collection without costly add-ons, customizations and manual configurations Gain deep service-oriented insights with built-in dashboards Simplify creation and deployment of third-party and custom modules

37 Reduce the Administrative Hurdle ML-Powered AI: Eliminate manual rules management with built-in machine learning to group related events and establish normal vs. abnormal patterns. Fast Search Performance: Enable mass changes to thresholds and searches with templates, reducing the number of searches and improving performance. Maintenance Windows: Set services and entities into maintenance to suppress alerts and accurately reflect health scores. Backup and Restore: Create highly available Splunk ITSI environments, revert configurations to previous versions and ensure continuous delivery. Role-Based Access Controls: Manage granular permissions and authorize access to various views.

38 Splunk IT Service Intelligence Machine Learning Adaptive threshold automation to minimize false alerts Behavior anomaly alerts to proactively address issues Automatic correlation of data into intelligence, mitigating SME dependency Dynamic Service Model Visualize entire tech stack, bare metal through business layer View the entire ecosystem with customized views for execs Apply context to events to prioritize investigation based on impact Search-Based KPIs Accelerators minimize SPL coding Trend aggregation to enable rapid visualization Multi KPI Alerts for proactive irregularity identification Platform for Operational Intelligence Time Series Index Schema on Read Handle any and all data Search and Investigation Enterprise Scalability Operational Intelligence Proactive Monitoring Operational Visibility Real-Time Business Insights

39 What Makes Splunk ITSI Different

40 Built on a Scalable Platform Desktop to Datacenter: Operate in a single datacenter or globally across multiple datacenters, on-premises or in the cloud. Schema on-the-fly: Apply structure to data at search time, enabling customizable pivots on any and ALL data. Universal Data Platform: Reliably collect, index and store any type of data, at any volume, from tens of thousands of sources, in real time. Agile reporting, analytics and visualizations: Flexible, easy-to-use interface to create ad hoc reports and custom dashboards for IT and business users on-the-fly and on demand.

41 Unified Insights for Data-Driven Actions Full Fidelity Service Health: Move seamlessly from business service reports to investigation to remediation. Mathematical Sophistication: Apply data science and sophisticated algorithms for analytics-driven IT operations. From Data to Intelligence: Deliver actionable intelligence to IT and the business with service insights and event analytics. Reduced Complexity: Fewer tools, fewer administrators and reduced infrastructure capacity.

42 Unified Insights for Data-Driven Actions Service Context: Deliver context on events to prioritize alerts and events based on business impact. Machine Learning: Alert on anomalous conditions based on operational baselines to reduce event clutter. Simplified rules management: Eliminate command-line rules configurations and JavaScript vulnerabilities. Improved incident workflows: Use built-in integrations into incident management tools with powerful APIs to enable custom integrations.

43 Splunk ITSI for Event Analytics Simplify Your Operations With Artificial Intelligence and Service Context Service Context: Find and fix the most important issues; Contextualize and prioritize; Reduce time-to-resolution on business-critical services. Artificial Intelligence: Transform IT operations with machine learning; Separate valuable signal in noise; Enable IT with intelligence for data-driven decisions. Scalable Platform: Get a full view of your IT environment; Respond collaboratively and simplify operations; Share customized insights across the enterprise to enable business-centric IT.

44 Splunk IT Service Intelligence Data-driven service monitoring and analytics Dynamic Service Models At-a-Glance Problem Analysis Early Warning on Deviations Event Analytics Simplified Incident Workflows Splunk IT Service Intelligence Platform for Operational Intelligence Time-Series Index Schema-on-Read Data Model Common Information Model

45 Case Studies

46 ONLINE SERVICES CLOUD SOLUTIONS, IT OPERATIONS Real-Time Car Auctions Delivered With Intelligence "With Splunk ITSI, we have proactive infrastructure monitoring to ensure a consistent level of customer service for interested buyers to bid on cars." VP Technology Application Development & Operations, Cox Automotive Reduced time-to-investigate and resolution with real-time insights Reduced incidents across global auctions by 90% Improved end-user experience and service reliability Scaling the implementation with Splunk Cloud

47 HEALTHCARE IT OPERATIONS, BUSINESS ANALYTICS AdvancedMD: Strengthening Customer Satisfaction "Splunk ITSI ensures customer satisfaction by giving us service-centric health reporting, end-to-end visibility and advanced analytics to detect patterns, anomalies and trends." Director, Platform Operations, AdvancedMD Ability to monitor network resources leads to improved service delivery Greater customer satisfaction via service-centric health reporting, end-to-end visibility and advanced analytics to detect patterns, anomalies and trends More efficient IT operations with full visibility into complex processes

48 TECHNOLOGY IT OPERATIONS Improved Satellite Operations With Real-Time Infrastructure Visibility "Using Splunk ITSI has helped us to understand our IT network in a way we weren't able to previously. This has directly led to improvements in areas such as troubleshooting and security awareness." Daniel Nye, CTO, Surrey Satellite Improved service accessibility, reliability and security Enhanced ability to troubleshoot persistent service problems Gained end-to-end visibility into overall IT performance

49 FINANCIAL SERVICES IT OPERATIONS Modernizing Enterprise Monitoring at the International World Development Bank Financial Services Enhanced service reliability and incident response Ease and flexibility in creating business level dashboards ad hoc and on-the-fly Integrations with BMC Remedy to simplify incident response and action Tracing business transactions end to end

50 TECHNOLOGY IT OPERATIONS Supporting, Monitoring and Securing Services 24/7 Reduce time-to-resolution Consolidated services view across entire IT infrastructure Identify anomalous activity and ensure governance Adaptive thresholds and alerts improve security posture Proactively improve customer experience Comprehensive analytics to reduce service disruption

51 COMMUNICATIONS IT OPERATIONS Splunk IT Service Intelligence at Vodafone "Splunk IT Service Intelligence gives Vodafone a real-time understanding of how our services are performing overall and at the more granular level." Oliver Hoppe, solutions architect, Vodafone Unified insights: data integrations from other tools Reduced incident tickets Usage baselines to identify anomalies

52 FINANCIAL SERVICES IT OPERATIONS Splunk IT Service Intelligence at Fiserv Server-based to services-based monitoring Top-down and deep-dive service insights 200+ services and 1,500+ KPIs monitored Alerting on service KPIs instead of server performance Flexible creation and modification of services and KPIs Real-time, holistic and proactive client view

53 HEALTHCARE IT OPERATIONS Molina Healthcare: Splunk ITSI as Platform for Multiple Use Cases "You can derive value from Splunk at any level of the business, from the CEO down to any user the first day starting out." Enterprise Infrastructure Leader, Molina Healthcare Operational visibility and real-time views into enterprise infrastructure and application management Comprehensive insight into business intelligence and performance metrics Tracking call center management MTTR, customer service and troubleshooting

54 Splunk IT Service Intelligence Strategic, Business-Centric View of IT Accelerated Value for IT Data-Centric Approach to Service Mapping

55 How Do You Get Splunk ITSI? Online Sandbox: 7 days of access to a free, personal environment in the cloud, with prepopulated data. Value Assurance: Engage in a proof-of-concept to index your data and experience Splunk ITSI.

56 Splunk-Sponsored Guided Workshop What is it? 1-day on-site workshop Tightly linked with value Collaborative approach Build your own Splunk ITSI Glass Table Define methods for: Proactive service monitoring Reduced risk and failures Faster issue resolution Increased business performance

57 Thank You

58 Backup

59 Splunk is the Backbone of IT Broad ecosystem of integrations Infrastructure Network Applications Server Cloud Development Project & Issue Tracking Storage Code Repository Automation Applications

60 Troubleshooting Monitoring Remediation Solution Architecture ARTIFICIAL INTELLIGENCE PATTERN DETECTION CLUSTERING ANOMALY DETECTION PREDICTION SOLUTIONS Automation Tools (THIRD PARTY) Service Mgmt Tools (THIRD PARTY) Event Analytics Service Insights INFRASTRUCTURE MONITORING APPLICATION ANALYTICS Infrastructure Troubleshooting Cloud Monitoring & Optimization Custom App Troubleshooting Release Analytics Container Monitor & Troubleshoot Server Monitor & Troubleshooting Custom Experience Monitoring Build Analytics PLATFORM DATA SOURCES TOOLS & APIs METRICS LOGS Platform for Machine Data Cloud APM Open Source Database CMDB Automation Server Host Container Hypervisor Application Storage Network OS Application Mobile Wire Data

61 What We Hear From Our Customers! "My CIO is demanding we look at IT from a business service perspective." "I need everyone to be able to see the same thing at the same time." "Splunk is great for break/fix, but I need to show we're meeting SLAs." "I just want to throw data at Splunk and have it find problems for me. Show me what my data can do for me!"

62 Why Another Splunk Solution? A data-centric approach is needed Service context maximizes Splunk value An integrated solution accelerates customer success

63 Augment Conventional Monitoring Deliver Insights Based on Integrated Data, Not Integrated Products Splunk IT Service Intelligence APM NPM Operations and Infrastructure Management Domain Tools

64 Splunk IT Service Intelligence Get data Define services, entities and KPIs Monitor and troubleshoot Analyze and detect Data-Defined, Data-Driven Service Insights

65 Pricing 2017 SPLUNK INC.

66 Splunk ITSI $ $ Splunk Enterprise or Splunk Cloud Splunk ITSI

67 Volume Discounts Built In
Daily Peak Indexing Volume (GB) | Splunk IT Service Intelligence | $/GB | Built-in Volume Discount
1 | $5,000 | $ |
 | $7,500 | $ | %
5 | $12,500 | $ | %
10 | $18,000 | $ | %
20 | $27,000 | $ | %
50 | $47,500 | $950 | 81%
100 | $60,000 | $600 | 88%
200 | $90,000 | $450 | 91%
500 | $162,500 | $ | %
1000 | $300,000 | $300 | 94%
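The surviving rows of the pricing slide are consistent with a simple rule: $/GB is the list price divided by the daily volume, and the discount is measured against the 1 GB price of $5,000 per GB. The sketch below applies that inferred rule; the formula is an assumption reverse-engineered from the four rows where all values are legible, not something the slide states, and the $7,500 row is left out because its volume is missing.

```python
# List prices taken from the slide, keyed by daily peak indexing volume in GB.
list_prices = {
    1: 5_000,
    5: 12_500,
    10: 18_000,
    20: 27_000,
    50: 47_500,
    100: 60_000,
    200: 90_000,
    500: 162_500,
    1000: 300_000,
}

BASE_PER_GB = list_prices[1] / 1  # assumed undiscounted reference price per GB


def per_gb(volume_gb: int) -> float:
    """Effective price per GB at a given daily volume tier."""
    return list_prices[volume_gb] / volume_gb


def discount_pct(volume_gb: int) -> int:
    """Discount relative to the 1 GB tier, rounded to a whole percent (inferred rule)."""
    return round(100 * (1 - per_gb(volume_gb) / BASE_PER_GB))


# Check the rule against the rows that are fully legible on the slide.
legible = {v: (per_gb(v), discount_pct(v)) for v in (50, 100, 200, 1000)}
```

For the legible tiers this reproduces the slide's figures (for example $950/GB and 81% at 50 GB), which is the only evidence offered here that the inferred rule is right.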

68 Splunk Quick Start for Service Intelligence Enterprise License, Splunk ITSI License, Education, Professional Services, .conf Passes Value Assurance Edition*, Services Edition, Platform Edition * Splunk ITSI 6-month license

69 Key Terminology
Services: Logical grouping of operations. Examples: online banking, authentication, virtualization.
Business Processes: Set of actions performed with specific business goals. Examples: sell products, fulfill orders, process payroll.
Entities: Component required to deliver a service. Examples: hosts, users, OS processes.
Key Performance Indicators: Metrics used to evaluate success. Examples: service health, order revenue, latency.

70 Splunk IT Service Intelligence Core Concepts: Services [Diagram labels: Technical Services, Business Services, Services; Requests and Responses for Web, Customer Transactions, Mobile, API/Middleware, Support Desk, DNS]

71 Splunk IT Service Intelligence Core Concepts: Services In Splunk ITSI, a service is a logical group of technology components that a user deems need to be monitored together. [Diagram labels: Technical Services, Business Services, Web, Customer Transactions, Support Desk, DNS, API Services, Web Services, RDBMSs, Hypervisor and Hosts, Storage Tier, Mobile, API/Middleware, Packet Network; Requests and Responses]

72 What's an Entity? An entity is an optional sub-element of a KPI. A KPI can be filtered by entities and viewed on a per-entity basis or as an aggregate. For example, a web requests KPI might use web servers as entities; a user logins KPI could use accounts. Splunk ITSI can import entities from CMDBs and other sources.
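The per-entity and aggregate views of a KPI can be illustrated with a small sketch. The KPI (response time in milliseconds), the web server names and the use of a mean as the aggregation are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical KPI samples: (entity, response_time_ms).
samples: List[Tuple[str, float]] = [
    ("web01", 120),
    ("web02", 95),
    ("web01", 130),
    ("web03", 400),
]


def kpi_per_entity(data: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Per-entity view: one KPI value (mean) for each entity."""
    grouped = defaultdict(list)
    for entity, value in data:
        grouped[entity].append(value)
    return {entity: mean(values) for entity, values in grouped.items()}


def kpi_aggregate(data: Iterable[Tuple[str, float]],
                  entities: Optional[Set[str]] = None) -> float:
    """Aggregate view: one KPI value across all entities, or a filtered subset."""
    values = [v for e, v in data if entities is None or e in entities]
    return mean(values)


by_entity = kpi_per_entity(samples)                    # web03 stands out at 400 ms
overall = kpi_aggregate(samples)                       # 186.25
filtered = kpi_aggregate(samples, {"web01", "web02"})  # 115
```

The aggregate alone hides which server is slow; the per-entity breakdown points straight at `web03`.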

73 Service Health Scores A health score is a score from 0 to 100 (0 = critical, 100 = normal) that helps determine the health of a service. It is calculated once every minute, based on the importance and status (e.g., green, orange, red) of all KPIs.
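To make "based on importance and status" concrete, here is one plausible way such a score could be computed: an importance-weighted average of per-KPI status scores. The status-to-score mapping, the importance values and the KPI names are assumptions for illustration; the slide does not give Splunk ITSI's actual formula.

```python
from typing import Dict, List

# Assumed mapping from KPI status to a 0-100 score (not taken from the source).
STATUS_SCORE: Dict[str, int] = {"green": 100, "orange": 50, "red": 0}


def health_score(kpis: List[Dict[str, object]]) -> int:
    """Importance-weighted average of KPI status scores, on a 0-100 scale."""
    total_weight = sum(k["importance"] for k in kpis)
    if total_weight == 0:
        raise ValueError("at least one KPI must have non-zero importance")
    weighted = sum(k["importance"] * STATUS_SCORE[k["status"]] for k in kpis)
    return round(weighted / total_weight)


# Hypothetical KPIs for one service.
kpis = [
    {"name": "latency", "importance": 5, "status": "green"},
    {"name": "error_rate", "importance": 3, "status": "orange"},
    {"name": "queue_depth", "importance": 2, "status": "red"},
]

score = health_score(kpis)  # (5*100 + 3*50 + 2*0) / 10 = 65
```

In this sketch a red KPI with low importance pulls the score down less than a red KPI with high importance would, which is the behavior the slide's wording suggests.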

74 What's an Event? A self-descriptive message that tells a user that something happened. Usually contains some sort of title, severity and description. Used to determine in-the-moment health. Often very noisy. Think alarm data coming out of tools like Nagios, SolarWinds, APM, Netcool, etc. Example Event: src_host="splunk_sh-01" omd_site="SJC" perfdata="serviceperfdata" name="check_dhcp" severity="ok" attempt="1" statetype="hard" executiontime="0.000" latency="0.000" reason="ok: Received 1 DHCPOFFER(s), max lease time = 600 sec." result="ok"
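An event in this key="value" form is easy to break into fields. The sketch below parses a shortened copy of the example event with a regular expression; the parser is a generic illustration and does not represent how Splunk itself extracts fields.

```python
import re
from typing import Dict

# Matches key="value" pairs, tolerating stray whitespace around the equals sign.
PAIR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_event(raw: str) -> Dict[str, str]:
    """Return the event's key/value pairs as a dictionary of strings."""
    return dict(PAIR.findall(raw))


# Shortened copy of the example event from the slide.
raw = (
    'src_host="splunk_sh-01" omd_site="SJC" name="check_dhcp" '
    'severity="ok" attempt="1" '
    'reason="ok: Received 1 DHCPOFFER(s), max lease time = 600 sec."'
)

event = parse_event(raw)
title, severity = event["name"], event["severity"]
```

Because values are quoted, the `=` inside the `reason` text does not confuse the parser; the title, severity and description the slide mentions map to `name`, `severity` and `reason` here.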