System log analysis using InfoSphere BigInsights and IBM Accelerator for Machine Data Analytics

How to mine complex system logs for clues to performance issues

Vincent Cailly
01 October 2013

When understood, logs are a goldmine for debugging, performance analysis, root-cause analysis, and system health assessment. In this real business case, see how InfoSphere BigInsights and the IBM Accelerator for Machine Data Analytics are used to analyze system logs to help determine root causes of performance issues, and to define an action plan to solve problems and keep the project on track.

Introduction

As systems become more complex, it becomes increasingly difficult, without the right tooling, to quickly assess system health and to troubleshoot problems. This article shows how InfoSphere BigInsights and the IBM Accelerator for Machine Data Analytics can:

- Increase visibility, making it easier to gauge the health of systems and applications
- Greatly accelerate troubleshooting when problems occur

One of my customers has decided to deploy IBM Maximo Enterprise Asset Management (EAM), a global and effective system to monitor and manage the visibility, deployment, performance, reliability, availability, lifespan, and maintenance of assets worldwide. This is a large and complex project because the solution has to be deployed in about 80 plants across five continents. The deployment is 20 percent complete so far.

Recently, the customer experienced severe performance issues with this new system, which is constantly changing because the deployment is still in progress. The IT and IS operational teams were having trouble finding the root causes of the performance issues, and the customer asked for ideas that might accelerate the root-cause analysis and the resolution of these problems.

I suggested a proof-of-concept solution using InfoSphere BigInsights and the IBM Accelerator for Machine Data Analytics to analyze the logs of the system with two objectives:

- Help the customer resolve the performance issues

- Demonstrate the value of this IBM solution

When understood, logs are a goldmine for debugging, performance analysis, root-cause analysis, and system health assessment. Even knowing that, both the customer and I were surprised by how much this proof-of-concept solution revealed. We were able to quickly determine root causes of the performance issues and define an action plan to solve the problems and keep the project on track.

Technical environment for the proof-of-concept solution

The proof of concept includes an application based on Maximo Enterprise Asset Management and InfoSphere BigInsights, running IBM Accelerator for Machine Data Analytics applications.

Application based on Maximo Enterprise Asset Management

The customer is running two instances of the application: one for North American users and one for European users. All servers for both instances are located in Europe. Each instance of the application consists of the following components:

- One IBM HTTP Server instance
- Six IBM WebSphere Application Server instances to run the user interface, cron tasks, and on-demand reports
- One IBM WebSphere Application Server instance to run scheduled reports, cron tasks, and the Maximo integration framework used to integrate the application with an Enterprise Resource Planning (ERP) solution
- One Oracle database

InfoSphere BigInsights environment

InfoSphere BigInsights was installed on a stand-alone machine (a virtual machine running on an IBM ThinkPad W530), and the log files of the application were manually transferred to this virtual machine.

InfoSphere BigInsights Quick Start Edition

InfoSphere BigInsights Quick Start Edition is a complimentary, downloadable version of InfoSphere BigInsights, IBM's Hadoop-based offering. Using Quick Start Edition, you can try out the features that IBM has built to extend the value of open source Hadoop, like Big SQL, text analytics, and BigSheets. Guided learning is available to make your experience as smooth as possible, including step-by-step, self-paced tutorials and videos to help you start putting Hadoop to work for you. With no time or data limit, you can experiment on your own time with large amounts of data. Watch the videos, follow the tutorials (PDF), and download BigInsights Quick Start Edition now.

In this InfoSphere BigInsights environment, for each instance of the application we imported the following logs:

- The IBM HTTP Server access log (one semi-structured text file)

- The SystemOut and SystemErr logs of all the WebSphere Application Server instances (154 unstructured text files). These logs rotate when they reach 10MB: the current log file is closed and renamed, and a new log file is created. In the WebSphere configuration, the number of log files to rotate is set to 10. In this case, we have 22 log files per application server, which gives 154 files for the seven application servers of an instance:
  - SystemOut log files: one current log file plus 10 renamed files
  - SystemErr log files: one current log file plus 10 renamed files
- The Oracle database alert log (one semi-structured XML file)

In total, there are 312 log files (156 per instance). One can easily imagine the nightmare of having to manually analyze these 312 log files without the right tooling.

InfoSphere BigInsights MDA applications

We ran the following InfoSphere BigInsights MDA applications:

- A Distributed File Copy application to import the logs into the Hadoop file system
- An Extract application that uses text analytics to extract information from the batches of log files ingested into InfoSphere BigInsights
- An Index application to index the records of all log files. This index is required to use the faceted browsing interface, which quickly finds log entries based on multiple criteria and expedites troubleshooting.

Due to limited physical resources (mainly storage) of the virtual machine, we did not run the following InfoSphere BigInsights MDA applications:

- The Frequent Sequence Analysis application, which examines which pattern of events happens most commonly before an error condition
- The Significance Analysis application, which examines which specific events are the most likely cause for an error condition

We also used the BigSheets feature to produce specific reports and feed dashboards.

The next section describes how the proof-of-concept solution increases visibility into the health of the system and enables faster troubleshooting. It includes some examples of the outputs provided by InfoSphere BigInsights and the IBM Accelerator for Machine Data Analytics for this particular case.

Increased visibility and faster troubleshooting enabled by this solution

This solution makes it easier to see what's going on inside the interconnected systems and makes it faster to troubleshoot problems by providing these advantages:

- 360-degree view
- Faceted browsing
- Log analysis using dashboards
- Measure of number of Maximo EAM error messages

- Analysis of the Maximo BMXAA6720W warning message
- Measure of number of Oracle error messages

360-degree view

This solution makes it possible to get a 360-degree view of all of the events logged by different components. Logs from IBM HTTP Server, WebSphere Application Server, and Oracle database server have been transformed, aggregated, and indexed to enable an advanced search across the different log files (156 log files per instance of this Maximo EAM application, in this case). This enhanced view greatly facilitates troubleshooting and determination of root causes.

Figure 1. 360-degree view

Faceted browsing

The faceted browsing interface makes it easier to quickly find log entries based on multiple criteria. Figure 2 shows how easy it is to find log entries in the 156 log files of one instance by using multiple search criteria.
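As a rough illustration of what searching on multiple criteria means, the sketch below filters a handful of already-extracted log records on several facets at once and then counts the values of another facet. The field names (`instance`, `source`, `level`, `message`) and the sample records are hypothetical, and this is not a description of how the Accelerator's index works internally.

```python
from collections import Counter

# Hypothetical, already-extracted log records; field names are illustrative only.
RECORDS = [
    {"instance": "EU", "source": "http", "level": "INFO", "message": "GET /maximo/ui 200"},
    {"instance": "EU", "source": "was", "level": "WARN", "message": "BMXAA6720W long-running query"},
    {"instance": "EU", "source": "was", "level": "ERROR", "message": "application error"},
    {"instance": "NA", "source": "was", "level": "WARN", "message": "BMXAA6720W long-running query"},
    {"instance": "NA", "source": "oracle", "level": "ERROR", "message": "database error"},
]


def facet_filter(records, **criteria):
    """Return the records whose fields match every requested facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)


# Narrow down to WebSphere records of the EU instance, then break down by level.
eu_was = facet_filter(RECORDS, instance="EU", source="was")
level_counts = facet_counts(eu_was, "level")
```

Each extra keyword argument narrows the result set further, which is the essence of drilling down through facets.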

Figure 2. Using faceted browsing to locate log entries

HTTP log analysis using InfoSphere BigInsights dashboards

The InfoSphere BigInsights dashboard makes it easy to publish and share the output of the analysis. It facilitates communication and collaboration between the different IT and IS teams (development team, IT operational teams, etc.). Figure 3 shows a dashboard where we have published the results of the analysis of the HTTP access logs, including:

- HTTP status codes for all the HTTP requests received by the HTTP server. The status codes allow you to check:
  - The number of errors (HTTP status codes of 400 and above) logged by the HTTP server. This number helps gauge the health of the application.
  - The browser caching efficiency: the ratio of the number of HTTP requests with a 304 status code to the total number of HTTP requests.
- URL paths causing HTTP status code 404. These 404 errors often result in decreased performance for users, even though the decrease is sometimes invisible. Recommendation: To improve server performance, eliminate all 404 errors.
- The number of HTTP requests per IP address, which lets you spot any suspect IP addresses sending many more HTTP requests than other IP addresses.
- The version of the HTTP protocol used for all the HTTP requests.
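The indicators listed above can be sketched in a few lines of Python. This is a minimal sketch, not the BigInsights implementation: it assumes the access log uses an Apache-style Common Log Format (the article does not show the actual format), and the sample lines, paths, and IP addresses are made up for illustration.

```python
import re
from collections import Counter

# Assumed Apache-style Common Log Format; the real log format is not given in the article.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) \S+'
)


def summarize(lines):
    """Compute the dashboard indicators from an iterable of access log lines."""
    statuses, ips, protocols, not_found = Counter(), Counter(), Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        status = int(m["status"])
        statuses[status] += 1
        ips[m["ip"]] += 1
        protocols[m["protocol"]] += 1
        if status == 404:
            not_found[m["path"]] += 1
    total = sum(statuses.values())
    return {
        "total": total,
        "errors": sum(n for s, n in statuses.items() if s >= 400),
        "ratio_304": statuses[304] / total if total else 0.0,
        "requests_per_ip": ips,
        "protocols": protocols,
        "paths_404": not_found,
    }


# Made-up sample lines (documentation-range IP addresses), for illustration only.
SAMPLE = [
    '192.0.2.10 - - [25/Jun/2013:08:28:32 +0200] "GET /maximo/ui/ HTTP/1.1" 200 5120',
    '192.0.2.10 - - [25/Jun/2013:08:28:33 +0200] "GET /maximo/img/logo.gif HTTP/1.1" 304 -',
    '192.0.2.11 - - [25/Jun/2013:08:28:34 +0200] "GET /maximo/missing.js HTTP/1.0" 404 210',
    '192.0.2.10 - - [25/Jun/2013:08:28:35 +0200] "GET /maximo/ui/login HTTP/1.1" 500 87',
]

summary = summarize(SAMPLE)
```

Sorting `summary["requests_per_ip"]` with `most_common()` is one simple way to surface addresses that send far more requests than the rest.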

Figure 3. Dashboard where we have published the results of the analysis of the HTTP access log

Viewing this dashboard, we can make recommendations based on some preliminary conclusions about the performance problems:

Pie charts at the far left of Figure 3
Notice that only 7 percent of the HTML objects are fetched from the user agent cache (HTTP status code 304) for the EU instance, compared to 47 percent for the NA instance.

Bar graphs to the right of the pie charts in Figure 3
Some IP addresses are exhibiting suspect behaviors. For example, some IP addresses are sending many more HTTP requests than a standard user of the application.
Recommendation: After further investigation, we discovered that these IP addresses were allocated to machines running scripts to monitor response times of the application. Some advanced power users were trying to measure response times to provide evidence of slowness, but they did not realize that those scripts were degrading the overall performance of the system (in particular, server resource utilization and WAN bandwidth utilization). In addition, those scripts distorted the information about HTML objects fetched from the user agent cache (HTTP status code 304). For some of these scripts, HTML objects were always fetched from the user agent cache. We suggested stopping those scripts.

Tables to the right of center in Figure 3
In Europe, some client machines are using V1.0 of the HTTP protocol instead of V1.1. In terms of performance, using HTTP 1.0 generally leads to a bad experience because HTTP 1.0 does not, by default, allow multiple requests to use a single connection.
Recommendation: After additional investigation, we discovered that the HTTP 1.0 requests were sent by obsolete end-user workstations running Microsoft Windows XP and Microsoft Internet Explorer V6 (see Resources for a link to Microsoft Support). So we recommended either applying the solution proposed by Microsoft or implementing a snippet on the application authentication page to test the browser being used. If Microsoft Internet Explorer V6 is detected, we recommended asking the user to switch to another browser, such as Mozilla Firefox V3.5.

Tables at the far right of Figure 3
All the URL paths at the origin of HTTP status code 404 are displayed.
Recommendation: We suggested making the required changes to the application to get rid of all these 404 errors.

Error messages logged by IBM Maximo software

Another indicator to help assess the health of the application is the number of Maximo error messages logged in the WebSphere Application Server SystemOut log files. The WebSphere Application Server log extractor that comes with the IBM Accelerator for Machine Data Analytics does not give immediate information on these Maximo error messages, because the format of the records containing them is not always the same. To get this information we had two options:

- To develop our own extractor
- To use the BigSheets feature of InfoSphere BigInsights

We chose the BigSheets feature. Using only standard basic BigSheets functions (MID, SLICEITEM, PIVOT, FILTER, etc.), it took less than half an hour to produce reports on the number of Maximo error messages logged (see Figure 4 and Figure 5).

Some Maximo error messages showed up frequently in the logs. We suspect either technical issues at the application level or defects in the Maximo software. We recommended opening a PMR to request deeper analysis of the root causes so we can resolve the underlying problems.

Figure 4. EU instance

Figure 5. NA instance

Analysis of the Maximo BMXAA6720W messages

The Maximo BMXAA6720W warning message indicates long-running query execution and provides useful information about the performance of the system. With BigInsights, we can easily extract the relevant fields from these log records. WebSphere SystemOut log records containing the Maximo BMXAA6720W warning message look like this:

    [6/25/13 8:28:32:140 CEST] 000000ec SystemOut O 25 Jun 2013 08:28:32:140 [WARN] BMXAA6720W - USER = (UID00195) SPID = (2082) app (WOTRACK) object (WORKORDER) : select * from workorder where (workorderid = 4568) (execution took 1317 milliseconds)
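To make the field extraction concrete, here is a minimal Python sketch that pulls the user, SPID, application, object, SQL statement, and execution time out of a record like the sample above. The regular expression was written against that single sample only, so treat it as an assumption: other BMXAA6720W records may vary in ways it does not handle. The function and field names are illustrative.

```python
import re

# Pattern written against the one sample record in the article; an assumption,
# not a complete grammar for BMXAA6720W records.
RECORD_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] \S+ SystemOut\s+O .*?\[WARN\] BMXAA6720W - "
    r"USER = \((?P<user>[^)]*)\) SPID = \((?P<spid>\d+)\) "
    r"app \((?P<app>[^)]*)\) object \((?P<object>[^)]*)\) : "
    r"(?P<sql>.*) \(execution took (?P<millis>\d+) milliseconds\)"
)

SAMPLE = (
    "[6/25/13 8:28:32:140 CEST] 000000ec SystemOut O 25 Jun 2013 08:28:32:140 "
    "[WARN] BMXAA6720W - USER = (UID00195) SPID = (2082) app (WOTRACK) "
    "object (WORKORDER) : select * from workorder where (workorderid = 4568) "
    "(execution took 1317 milliseconds)"
)


def parse_slow_query(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = RECORD_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["spid"] = int(fields["spid"])
    fields["millis"] = int(fields["millis"])
    return fields


record = parse_slow_query(SAMPLE)
```

Once records are in this shape, grouping by `app` or `object` and summing `millis` shows where the slow queries concentrate.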

As with the Maximo error messages, we decided to use the BigSheets feature of InfoSphere BigInsights to extract these fields from BMXAA6720W log records. Then we produced reports highlighting problems using SQL queries (see Figure 6). Further and deeper analysis by a database specialist revealed several issues at the level of the database server:

- Lack of physical memory on the database server
- Data model design issues
- Problems with some indices
- Oracle database software bugs that are fixed in more recent versions of this software

Figure 6. Reports highlighting problems using SQL queries

Oracle error messages logged in the Oracle alert log

As we did for the Maximo error messages, we used the BigSheets feature to produce reports on the number of Oracle error messages logged in the Oracle alert log (see Figure 7). This is just another indicator that helps assess the health of the system.
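A count of Oracle error messages like the one described above can be approximated outside BigSheets with a simple pattern match. The sketch below assumes that error codes appear in the alert log text in the usual `ORA-` plus five digits form; the XML excerpt is made up for illustration and does not reproduce the customer's alert log.

```python
import re
from collections import Counter

# Assumption: error codes appear as "ORA-" followed by five digits.
ORA_RE = re.compile(r"\bORA-\d{5}\b")


def count_ora_errors(text):
    """Count occurrences of each ORA- error code found in the given text."""
    return Counter(ORA_RE.findall(text))


# Made-up excerpt; the real alert log layout is not shown in the article.
SAMPLE = """\
<msg><txt>ORA-00600: internal error code</txt></msg>
<msg><txt>ORA-01555: snapshot too old</txt></msg>
<msg><txt>ORA-00600: internal error code</txt></msg>
"""

ora_counts = count_ora_errors(SAMPLE)
```

Because the match runs on raw text, it works whether or not the XML is parsed first, at the cost of ignoring timestamps and other attributes.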

Figure 7. Oracle alert log

Conclusion

Used together, InfoSphere BigInsights and the IBM Accelerator for Machine Data Analytics are useful in debugging performance problems, specifically in the case of this Maximo EAM application. But this solution can be applied to other situations and systems as well.

We are now working with the customer to deploy the BigInsights solution for operations in the production environment. We will pilot three business-critical applications, and if the pilot is successful, the solution will be deployed for the 20 most critical applications for this customer. For the pilot and for each application, three use cases will be covered:

- Publication of dashboards providing information about the health of the system. The dashboards will be:
  - Automated to occur daily
  - Shared across operation teams to facilitate collaboration
  - Set up to enable the customer to act proactively when deviations are observed
- Validation of major application releases before moving them into production. This includes analyzing the logs of the QA environment to support the decision about whether to move into the production environment.
- Problem troubleshooting and resolution, using the advanced features of the proof-of-concept solution to accelerate root-cause analysis and problem resolution.

In short, this proof-of-concept solution can be applied in many contexts to increase visibility into the health of interconnected systems and to speed troubleshooting and root-cause analysis.

developerWorks (ibm.com/developerworks/)
Copyright IBM Corporation 2013 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)