three performance rules with CA MICS Resource Management

TECHNOLOGY BRIEF, December 2010
three performance rules with CA MICS Resource Management
Denise P. Kalm
CA Technologies, Mainframe Solution Group

Table of Contents
Executive summary
The performance rules
Rule 1: All the data, all the time
Rule 2: Understand your metrics (and statistics)
Rule 3: Expect the best, but plan for the worst
Conclusions

Executive summary

Challenge
As our work lives get more complex, it is easy to get buried in the details and bemused by technical wizardry and advances. No matter the platform, performance analysts can struggle to manage all the data, all the time, deliver reports to all stakeholders, and still meet service level agreements (SLAs). There are fewer people and more systems to manage, and complexity just keeps increasing. The result can be poorly tuned systems, which translate to lost business and lost opportunities.

"The obscure we see eventually. The completely obvious, it seems, takes longer." (Edward R. Murrow)

Opportunity
In a tough economic cycle, everyone is getting back to basics and focusing on the fundamentals. Though it might seem counter-intuitive, the best way to get on top of a difficult challenge is often to get back in touch with the practices at the heart of performance management and to reinstate the discipline they dictate. Don't reinvent; reinvigorate your CA MICS Resource Management (CA MICS) installation instead. Discover how CA MICS provides the tools you need to automate and simplify your challenging task and return to the fundamental best practices.

Benefits
How best can a performance analyst do his or her job? Proactive performance management is the answer; CA MICS is the catalyst that helps speed you to your goal. What would it be worth to you to have all the data you need, whenever and however you need it, with calculated fields designed to highlight what matters? As an example, you can audit the 95th percentile of response time rather than looking only at averages. That data helps you prevent performance problems and plan for future growth and changes. CA MICS Resource Management is a performance analysis and capacity planning tool designed by performance analysts and capacity planners. This brief highlights just a few of the ways you can benefit by taking full advantage of your solution.

The performance rules

Rule 1: All the data, all the time

No matter what platform you are on, you need to collect all the data you can obtain, and your collectors need to be running 24x7. The old rule of thumb is that you cannot manage what you do not measure, so you need to be sure you are capturing all the metrics you may need. If you believe that you can afford to wait until after a problem occurs to turn on data capture, you are ensuring that you will have that problem, and that you will need the data. How acceptable is it to tell your manager that you will be happy to resolve the issue when it recurs?

Even though some data collection costs resources, imagine the cost of a lost customer or a lost deal. In this web-enabled world, where your customers interact directly with your systems and where the competition is waiting to tempt them away, can you afford to give even one customer a bad experience? How many millions of dollars per hour (or per minute) does an outage or slowdown cost your employer? Many cite the cost of data collection (fairly significant for CICS 110 or DB2 100 and 101 records). Others are trying to save on disk costs. But in the end, how do those costs compare to the business costs when you do not have the data you need?

For too many, only certain key resources are tracked. As more mission-critical information moved to disk, many people stopped focusing on tape performance, but today tape, especially virtual tape, is becoming more and more important. Are you collecting tape data? Is it the right data? However you collect data today, do you have an easy way to correlate metrics and combine information from multiple sources into the components of a business transaction? It does little good to report that the performance of one component of the system is acceptable when the end-to-end performance, as observed by a customer, isn't.
And telling management that you do not have the data is likely to be a career-limiting move.

"The goal is to transform data into information, and information into insight." (Carly Fiorina)

On a mainframe, data is collected automatically by RMF, as long as you enable all the records you need. But the collected data is not easy to mine; it requires knowledge of the very complex architecture of RMF and the layout of the fields. You also don't get value-added metrics, that is, metrics that have been calculated to help you improve your understanding of the data. CA MICS Resource Management organizes and manages the data, so you can reach the information you need more quickly, whether you want to mine it directly on the mainframe or through the web-based Q&R Workstation. CA MICS also provides the automation to help you manage the data, allowing you to roll up detailed daily data into weekly, monthly and yearly summaries for trend spotting.

If you also own CA SMF Director, you can transfer the SMF data to your CA MICS database more quickly and completely, while minimizing duplicate and missing data: all the data, all the time, accurately. SMF Director can also create split files for CA MICS raw data input processing, which makes processing of the high-volume data sources that CA MICS reads more efficient. This helps save the time and system resources required to build and update your CA MICS databases.

Rule 2: Understand your metrics (and statistics)

What is CPU busy? If you have been doing this job a long time, it becomes obvious that although new metrics arrive all the time, the meaning of old metrics may have changed. When life was simpler and you reported on CPU busy on a uniprocessor with no LPARs, it was easy to understand what 90% busy meant. Now there are literally hundreds of CPU parameters to monitor, and you need that data along with allied metrics to understand what you are looking at. CA MICS provides an impressive number of CPU metrics, with a clarity of design that helps you understand the difference between CPU busy for an LPAR and for the entire CEC. The built-in reporting function helps you report on resource utilization in a way that is truly meaningful. The graphics capability of Q&R Reporting makes it easy for stakeholders to understand when capacity is constrained, helping you justify changes and upgrades.

Response time is another key metric. CA MICS provides something much more useful than average response time, which rarely represents the user experience accurately. By letting you select the 90th or 95th percentile, it lets you assess more quickly what the majority of your users experience, giving you a much more valuable picture from which to tune.

Use CA MICS to define and document your resource configuration. Which LPARs are mission-critical, and which are test and development? What resources are shared? Once you understand this, and with the help of CA MICS, it is a lot easier to move workloads around to balance demand, borrow resources temporarily and understand the impact of any change. Do you know how DB2 CPU is allocated?
Do you have the right metrics on your VTS? How much CPU should a transaction or process consume?

When your infrastructure was less complex, it might have been acceptable to use a homegrown performance database (CDB or CMIS in ITIL terms). Now the complexity dictates a database that brings in all the data from all the resources you manage, makes meaning of all those metrics, allows ready correlation of metrics to display a picture of business application performance, and creates useful calculated fields. Too often, default data or calculations are not all that helpful.

Of the thousands of possible metrics, there are just a few that you really need to understand, but you need to understand them very well. If you don't understand it, you cannot manage it. Which transactions are actually long-running background transactions? Which ones will cost you money if they slow down? And while watching today's metrics is important, you also need to watch the trends and be able to map your forecasted demand easily to your actuals, so you can detect unplanned changes to your system. You need an easy-to-visualize reporting schema that lets you track the information you need at a detail level, while providing web-based reports at a higher level to your stakeholders.
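To see why an average rarely represents the user experience, it can help to work a small example by hand. The following is a minimal sketch in plain Python, not a depiction of how CA MICS calculates its percentile fields; the `percentile` helper (a simple nearest-rank method) and the sample response times are assumptions made up for illustration.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in seconds: most transactions are fast,
# a few are very slow.
response_times = [0.2] * 90 + [0.4] * 5 + [6.0] * 5

avg = mean(response_times)
p90 = percentile(response_times, 90)
p95 = percentile(response_times, 95)
print(f"average={avg:.2f}s  p90={p90:.2f}s  p95={p95:.2f}s")
```

With this made-up sample, the average works out to about 0.5 seconds, a response time that no user actually saw: nine out of ten transactions finished in 0.2 seconds, while a handful took 6 seconds. The percentiles describe what most users experienced, and the gap between them and the maximum shows how heavy the slow tail is.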

Part of this rule involves understanding the limits of a resource: how hard can you drive it? In many cases, the answer is workload dependent. A network may handle a very high volume of online transactions but bog down under the load of too many large downloads. The better you understand the characteristics of workloads and the demands they put on the various resources, the better you can manage your capacity and provide performance at an acceptable cost.

Since you will only rarely be studying individual data points, an understanding of statistics is critical. What is the summarization level for your detail data? For many installations, one-hour granularity is sufficient, but you can reduce it to your SMF recording interval if your systems show a high degree of variance. By understanding statistics and what your tool provides, you will understand the granularity you need. Averaging 100,000 transactions over an 8-hour period yields results that are largely meaningless; consider setting a reasonable interval for summarization and creating meaningful groupings of data, such as shifts (day, evening, night). This is particularly critical as you try to make sense of historical data. Fortunately, once you decide what you want to do, CA MICS provides the functionality to customize summarization to help meet your reporting needs. It also offers robust historical and archiving processes so that you can more easily examine the past to understand resource utilization trends.

Your goal is to detect and understand change and respond to it. Thus, you must be able to tell when something has really changed and is not just an outlier. Again, CA MICS has the information you need to track and compare data points, offering you a deeper understanding of your systems. You can also opt to look at maximums and minimums and calculate standard deviations, if your knowledge of statistics is good enough.
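The idea of summarizing by shift and keeping minimums, maximums and standard deviations alongside the average can be sketched in a few lines. This is an illustration only, assuming hypothetical shift boundaries (08:00, 16:00, midnight) and made-up hourly CPU-busy values; it does not represent CA MICS summarization logic.

```python
from collections import defaultdict
from statistics import mean, pstdev


def shift_for(hour):
    """Map an hour of day (0-23) to a shift. Boundaries are hypothetical."""
    if 8 <= hour < 16:
        return "day"
    if 16 <= hour < 24:
        return "evening"
    return "night"


def summarize_by_shift(samples):
    """samples: iterable of (hour, cpu_busy_percent) pairs."""
    groups = defaultdict(list)
    for hour, value in samples:
        groups[shift_for(hour)].append(value)
    return {
        shift: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": mean(vals),
            "stdev": pstdev(vals),  # population standard deviation
        }
        for shift, vals in groups.items()
    }


# Hypothetical hourly CPU-busy percentages.
hourly = [(2, 20), (5, 30), (9, 80), (11, 90), (14, 70), (18, 50), (21, 60)]
summary = summarize_by_shift(hourly)
overall = mean([value for _, value in hourly])

for shift in ("day", "evening", "night"):
    s = summary[shift]
    print(f"{shift:8s} n={s['count']} min={s['min']} max={s['max']} "
          f"mean={s['mean']:.1f} stdev={s['stdev']:.1f}")
print(f"overall mean={overall:.1f}")
```

In this sample the overall average is about 57 percent busy, which hides the fact that the day shift averages 80 percent and peaks at 90 while the night shift averages 25. Grouping by shift before averaging keeps that difference visible.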
All of these help you validate the meaning of your data, so that your tuning decisions are sound. Use statistics when you are trying to correlate metrics: which metrics are truly related? It is easy to note that CPU demand is correlated with transaction volume, but watch out for the extremes. At maximum load, dispatcher thrashing can distort that relationship; at minimum load, low-utilization effects can do the same. CA MICS typically creates and maintains minimums, maximums, true averages and other derived statistical measurements in summarized views of the data, so that you don't have to calculate them while reporting.

Rule 3: Expect the best, but plan for the worst

Data analysis and proactive performance management are good practices, but you also need to look into the future and plan for it. This requires two things: business forecasts and knowledge of business trends. CA MICS provides you with the data to help you understand key business trends across months and even years. As an example, those in banking know that credit card volumes peak right around Thanksgiving and fall off after December 24th. They also know that by June or July of the next year, those peaks become the norm, and so on. This information helps you calibrate business expectations and manage when you don't have the best forecasts. After all, business projections are just educated guesses.

The way to manage this is to track forecasts versus actuals using CA MICS. Then it will be easier to know which lines of business tend to be very optimistic and which tend to project low and thus fall short of the actual volumes. It is critical to always look back at these projections; the business will not get blamed for SLA violations. You will.

Look at all key resources. Most projections focus only on CPU utilization, and while it is critical, it isn't the only component that can cause resource constraints. You must also look at I/O performance (disk and tape), memory and your network. There are often correlations and tradeoffs in managing these resources; you can use memory to avoid I/O, for example. But only a database that helps you understand these relationships will suffice. You also have to look at contention for resources, something that is difficult to do if you haven't mapped out your resources and business applications.

Create 18-month capacity plans and keep rolling them up every quarter. But why plan so far out and so carefully? In most cases, you are pushing the limits of your resources now, with very little white space. And budget cycles are difficult; if you wait until you absolutely need a resource, the money may not be there for you. You will also lose negotiating power with your vendor if you come to them when your need is more desperate. You want to be able to plan, so you can negotiate from strength and obtain the best price and performance you can.

Capacity planning can be handled with the CA MICS Resource Management Capacity Planning Option (CA MICS CPO). With this tool, you can enter your forecasts and plot the utilization of your resources, determining when you will need more. These same forecasts will be compared against actuals as time proceeds. One recommendation is always to test a few forecasts with higher volumes than the business projects.
It is not unusual to find that just past the forecast, you will run out of gas, and it is better to know this in the planning phase, so you can go back to the business and show them your results. Together, you can determine whether you need to budget for the higher forecast. Keep your disaster recovery plans in mind as well. If you increase the capacity of a primary resource, will its backup be sufficient?

Once you have a plan, you want to keep a close eye on the system, looking for signs that the actuals aren't on track with the forecast. This gives you a chance to make midcourse corrections and go back to the business for updated plans. The CA MICS Capacity Planner provides a tool to evaluate your forecasts after some time has passed. It allows you to set a threshold (e.g., 10 percent) and can automatically alert you if newer data shows that actuals are deviating from the original projection. These observations will also add to your data on whose projections are good and whose aren't reliable.

Business leaders planning campaigns to attract customers often make the mistake of overestimating who will respond to an offer. The DMA (Direct Marketing Association) has surveyed many industries and notes an average 2.1 percent response rate to mailed ads. But even with this well-known statistic, many business people think that their campaign will be different. In sharp contrast, web site launches too often underestimate demand. The challenge is that when a site goes viral, it can happen overnight. Too many times, companies have lost business when their site could not respond to sudden spikes in demand (a good argument for cloud computing for these kinds of sites).

Part of this rule is that you must create internal SLAs that are more aggressive than the official ones. This gives you time to react and tune when you see problems, before they become problems for your users. By using this technique, I bought myself time for tuning and rarely had my pager go off. How great would that be!

Conclusions

Poorly tuned systems cost your company money. Customers abandon you for better performing web sites. Poorly allocated resources force premature capacity upgrades or degrade other applications. While dealing with new challenges and trying to keep current on technology, it is easy to forget some of the basic ways to make systems run fast and efficiently. CA MICS Resource Management can help you quickly go back to basics, netting improved performance and better price/performance for your key business applications.

To learn more about the CA MICS Resource Management architecture and technical approach, visit ca.com/us/mainframe-resource-management

CA Technologies is an IT management software and solutions company with expertise across all IT environments, from mainframe and distributed to virtual and cloud. CA Technologies manages and secures IT environments and enables customers to deliver more flexible IT services. CA Technologies innovative products and services provide the insight and control essential for IT organizations to power business agility. The majority of the Global Fortune 500 rely on CA Technologies to manage their evolving IT ecosystems. For additional information, visit CA Technologies at ca.com.

Copyright 2010 CA. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. ITIL is a Registered Trade Mark, and a Registered Community Trade Mark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. This document is for your informational purposes only. To the extent permitted by applicable law, CA provides this document "As Is" without warranty of any kind, including, without limitation, any implied warranties of merchantability or fitness for a particular purpose, or non-infringement. In no event will CA be liable for any loss or damage, direct or indirect, from the use of this document including, without limitation, lost profits, business interruption, goodwill or lost data, even if CA is expressly advised of such damages. CS0284_1210