Data Center Operation for Cloud Computing Age Implemented with MasterScope Ver. 8

KATO Kiyoshi, NISHIMURA Teruo, KATSUMI Junichi
NEC TECHNICAL JOURNAL Vol.5 No.2/2010

Abstract
Cloud computing services are attracting attention because they can optimize IT costs and respond quickly and flexibly to changes in business procedures. This paper introduces the requirements for the data center operations management that underpins enterprise use of cloud services. It also introduces a method of implementing data center operations using MasterScope, the integrated operations management software provided by NEC. The method is particularly helpful in providing a performance analysis technique, which is key to stable service provision because it minimizes system downtime. The paper also introduces the latest technologies for resolving silent faults, which are difficult to handle with traditional technologies.

Keywords
cloud computing, operations management, MasterScope, silent fault, invariant analysis technology, data center

1. Introduction
The deteriorating business environment surrounding enterprises has made it more necessary than ever to optimize costs and to respond flexibly to change. This trend has drawn increasing attention to cloud services, which can optimize IT costs by using only the required amount of resources at the required time and can deal quickly and flexibly with changes in business conditions. As the effectiveness of cloud services becomes more widely recognized, they are expected to assume a core role in business activities more often. Foreseeing such a future, NEC has announced REAL IT PLATFORM Generation 2 as its platform vision for the cloud-oriented data center (CODC), the foundation for providing cloud services.

We expect that the role of operations management in maintaining stable service provision from the CODC will grow with the importance of cloud services. In this paper, we summarize the requirements for operations management of the CODC and introduce MasterScope *1, the integrated operations management software for the CODC. We focus in particular on resolving silent faults, which cannot be detected by conventional system monitoring software.

2. CODC Operations Management
2.1 Issues for CODC
In order to provide stable services via the CODC, it is necessary to recognize the operations management issues caused by the convergence of IT resources. Convergence increases the scale of management and the labor it requires. Diverse knowledge and technical expertise are required to integrate previously dispersed multi-vendor/multi-platform resources and to run an environment in which the relationships between physical and logical devices change flexibly due to virtualization. Management procedures are therefore expected to become more complicated than ever, and power costs are expected to increase as servers are concentrated.

2.2 Effects of MasterScope Ver. 8
The integrated operations management software MasterScope has been addressing the issues caused by increased scale and complexity 1). Its latest version, MasterScope Ver. 8, inherits the same concept but is enhanced for total optimization by expanding the management scope from enterprise systems to the CODC.

*1 MasterScope is originally sold under the name of WebSAM in Japan.

Fig. 1 MasterScope Ver. 8.

MasterScope Ver. 8 offers an operations management platform that can maintain higher service quality with fewer administrators and at lower cost, even in an environment of increased scale and complexity. It does so by monitoring the status of the entire CODC and by automating operational judgments and improvements. MasterScope Ver. 8 addresses the issues of the CODC from three viewpoints: optimization of operations cost, optimization of IT resources and optimization of energy cost (Fig. 1).

For the optimization of operations cost, faults must be predicted and their causes resolved early in order to maintain stable service provision. However, in a cloud environment in which many resources are converged, the workload exceeds what methods based on human labor or management tools can process, so operational costs may rise and service quality may degrade. MasterScope Ver. 8 therefore monitors for potential faults by automatically discovering and mapping the physical and logical configurations of IT resources, without requiring a management function. In addition, it employs a performance analysis engine (described below) that takes a different approach from conventional products in order to detect faults before they occur and to analyze their causes quickly. It thus contributes to reducing the administrator's workload and cost.

Next, for the optimization of IT resources, stable services require that the resources underlying them be kept readily available and safe. MasterScope Ver. 8 pools physical and virtual resources for centralized management and automates maintenance work such as patch management, which helps reduce the management workload and cost.

Finally, for the optimization of energy cost, the data center requires thorough energy-saving measures. MasterScope Ver. 8 monitors power consumption and reduces power cost by improving cooling efficiency and applying power-saving controls.

Special Issue on Cloud Computing

3. Uncertainty of Fault Detection in the Use of Cloud Computing
3.1 Silent Faults Reducing the Service Quality
As described above for the optimization of operations cost, quick identification of faults is important for minimizing downtime in the cloud environment. However, some faults cannot be detected by existing system monitoring software.

In conventional data center operation, faults are detected through error messages, so identifying their cause does not take much time. However, some faults cannot be dealt with because no error message indicates them, and they become a source of service quality degradation. One example is a slowdown of the overall system while monitoring of the individual servers finds no problem. Because of this characteristic, such faults are called silent faults.

The problem with silent faults is twofold. Countermeasures are delayed, because the faults are not detectable by monitoring and go unnoticed until users make inquiries. Recovery also takes a long time, because even experienced administrators need time to identify the cause. Since the cause is not reported in an error message, the administrators and experts responsible for servers, networks, business applications, etc. must collect a huge amount of performance data and spend a long time analyzing it to locate the problem. Furthermore, if the system has a large number of components, the workload exceeds the limit of human analysis, even when aids such as graphs are used extensively to identify the cause. As a result, the only effective way to identify the cause of such faults has been to rely on highly skilled engineers, who pick out suspect points based on their experience and deductions.

3.2 Toward Solution of Silent Fault-related Issues
With the CODC, the traditional silo-type system configuration (in which each system's components, from hardware to applications (AP), are independent) is replaced by configurations using shared platforms. This separates the business AP administrator at the higher level from the platform administrator at the lower level. As a result, the traditional method, in which a system administrator analyzes performance by monitoring the entire system, is no longer appropriate, and operations management of a CODC system requires more time, labor and advanced skill than before.

Under these circumstances, hard-to-detect silent faults pose an important issue for guaranteeing stable service provision, because service quality drops as fault detection is delayed. To prevent this, techniques for silent fault detection and analysis are needed that existing system monitoring software cannot provide.

4. Resolution of Silent Faults
4.1 Invariant Analysis Technology
At NEC, we have developed the invariant analysis technology to solve the silent fault issue. It is a system performance analysis technology that is unique worldwide and was developed and patented by NEC Laboratories America.

An invariant is an unvarying relationship, such as the one between the quantity of traffic input to web servers and the CPU load of the servers that process it (AP servers). In this relationship, an increase in traffic leads to an increase in the CPU load of the AP servers, and a decrease in traffic leads to a decrease in the CPU load. Such relationships hold invariantly regardless of daily changes in the amount of business.
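To make the traffic and CPU example concrete, the following minimal Python sketch uses made-up samples. The figures, the variable names and the choice of a straight-line relationship are assumptions for illustration and are not taken from the product. It fits a line to a quiet day's data and checks that the same line still predicts the CPU load on a much busier day, which is what "invariant regardless of the amount of business" means in practice.

```python
# Hypothetical samples: (requests/sec at the web tier, CPU % on the AP server).
quiet_day = [(100, 12.0), (150, 17.1), (200, 21.9), (250, 27.0)]
busy_day = [(400, 42.1), (600, 61.8), (800, 82.0), (700, 72.1)]


def slope_intercept(samples):
    """Least-squares line cpu = a * traffic + b through the samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Learn the relationship from the quiet day only.
a, b = slope_intercept(quiet_day)

# Apply it unchanged to the busy day: the business volume differs,
# but the relationship between the two metrics should still hold.
worst_error = max(abs(cpu - (a * traffic + b)) for traffic, cpu in busy_day)
print(f"cpu ~ {a:.4f} * traffic + {b:.2f}; worst busy-day error {worst_error:.2f} points")
```

With these sample values the worst prediction error on the busy day stays well under one percentage point, even though the traffic is several times higher than in the data used for fitting.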
The invariant analysis technology is a new technology that makes use of the strong correlation between two pieces of performance data (an invariant relationship), as in the above example. Invariant analysis acquires performance data over a certain period from a system in actual operation and exhaustively searches for invariant relationships. The invariant relationships are obtained as mathematical expressions of the form y = f(x) and are modeled as the behavior of normal operation. These models are then compared with the actual performance data in order to identify behavior that differs from normal operation.

4.2 MasterScope Invariant Analyzer
The MasterScope Invariant Analyzer is system performance analysis software that applies the invariant analysis technology 2).
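The modeling and comparison steps of Section 4.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used by MasterScope Invariant Analyzer: it assumes that f is a straight line, uses the coefficient of determination R² as the test for a strong correlation, and treats a residual larger than three times the worst one seen during normal operation as a broken invariant. The metric names, thresholds and sample values are hypothetical.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Least-squares fit ys ~ a * xs + b. Returns (a, b, r_squared) or None."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series cannot form a meaningful model
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x, (sxy * sxy) / (sxx * syy)


def extract_invariants(history, min_r2=0.95):
    """Try every pair of metrics and keep the strongly correlated ones."""
    models = {}
    for x_name, y_name in combinations(history, 2):
        xs, ys = history[x_name], history[y_name]
        fit = fit_line(xs, ys)
        if fit is None or fit[2] < min_r2:
            continue
        a, b, _ = fit
        # Remember the largest residual seen in normal operation.
        worst = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
        models[(x_name, y_name)] = (a, b, worst)
    return models


def broken_invariants(models, sample, slack=3.0, floor=1e-6):
    """Return the metric pairs whose learned relationship no longer holds."""
    broken = []
    for (x_name, y_name), (a, b, worst) in models.items():
        error = abs(sample[y_name] - (a * sample[x_name] + b))
        if error > max(slack * worst, floor):
            broken.append((x_name, y_name))
    return broken


# Hypothetical performance data collected during normal operation.
history = {
    "web_requests_per_sec": [100, 200, 300, 400, 500],
    "ap_cpu_percent": [12.0, 21.8, 32.1, 42.0, 51.9],
    "db_disk_queue": [3, 1, 4, 1, 5],  # unrelated to the other two metrics
}
models = extract_invariants(history)

normal = {"web_requests_per_sec": 350, "ap_cpu_percent": 37.1, "db_disk_queue": 2}
faulty = {"web_requests_per_sec": 350, "ap_cpu_percent": 55.0, "db_disk_queue": 2}

print(broken_invariants(models, normal))
print(broken_invariants(models, faulty))
```

In this example the faulty sample has a CPU load of 55%, which a fixed threshold such as 90% would not flag, yet its relationship with the request rate is clearly broken. That is the kind of condition the paper calls a silent fault.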

Fig. 2 Invariant analysis technology.

This product first generates models from performance data collected during normal operation. It then compares the models with current data: it judges the system to be normal as long as the correlations in the normal-operation models are maintained, and detects any differences as abnormalities. If the relationship of any model is lost, it analyzes whether the behavior is unusual and judges whether a fault may have occurred (Fig. 2).

The analysis results show how many unusual behaviors exist in the overall system and specifically which performance data is unusual. From these results, the user can ascertain when faults occur, how severe they are, and which components cause them. In addition, the product automatically compares the proportion of unusual behavior in the system with past fault cases. If the current fault pattern resembles a past case, it identifies the similar faults so that the countermeasures taken then can be referenced.

4.3 Effects of Invariant Analysis Technology
Since the invariant analysis technology does not depend on the experience and deductions of individuals, it can detect abnormalities exhaustively and accurately and discover unusual behavior within a practical computation time, even in a large-scale system. Because this abnormality detection method takes a different approach from threshold monitoring, it can find silent faults that are not reported by error messages. For instance, in one case MasterScope Invariant Analyzer analyzed and resolved in half a day a silent fault that used to take about two weeks from occurrence to cause identification.

Furthermore, the invariant analysis technology can extract invariant relationships from any kind of data and discover unusual elements, provided that the data consists of numerical time series. There is no restriction on the devices or performance data that can be supported. One of its appealing features is that it can bring together data from different levels and analyze the entire system uniformly across domains, which increases analysis accuracy.

5. Application of Invariant Analysis Technology
5.1 Real-time Performance Analysis
Analysis by MasterScope Invariant Analyzer makes it possible to find and specifically identify silent faults in a system as needed. We have incorporated the same technology into the integrated management software MasterScope MISSION CRITICAL OPERATIONS in order to monitor for silent faults continuously by analyzing the collected performance data in real time. This makes it possible to optimally monitor and handle both ordinary faults accompanied by error messages and silent faults.

5.2 Application to Capacity Management
At present, we are also studying the application of the invariant analysis technology to capacity management. As described above, the technology generates accurate models, in the form of mathematical functions, from a system in actual operation. By substituting values into these models, it is possible to simulate the effect that a change in one item of performance data has on the other performance data related to it (Fig. 3).

Fig. 3 Capacity planning.

When a system input value, for example the number of HTTP requests/sec., is increased step by step, the performance data of one of the related constituent elements eventually reaches its limit value (e.g. 100% in the case of the CPU usage rate). The input value at that moment is the maximum load the system can tolerate, and the constituent element that reaches its limit value is the bottleneck. In this way, it is possible to exhaustively and accurately identify the elements that will need to be reinforced in the future.

6. Conclusion
As data centers grow in scale and complexity with the implementation of cloud services, improving their operational efficiency is expected to become an important issue. The invariant analysis technology introduced here can automate the performance analysis work that used to be a heavy workload for operations managers. In addition, when it is combined properly with MasterScope operations management products, the workload and cost of operations can be reduced even further. In the coming age of cloud computing, we will continue to contribute to the improvement of data center operational efficiency by continuing to adopt new technologies.

References
1) MasterScope, http://www.nec.com/global/prod/software/
2) Invariant Analyzer, http://www.nec.com/global/prod/invariantanalyzer/

Authors' Profiles
KATO Kiyoshi, Manager, 2nd IT Software Division, IT Software Operations Unit
NISHIMURA Teruo, IT Software Operations Unit
KATSUMI Junichi, Shizuoka Branch Division, NEC Soft, Ltd.