z/VM Capacity Planning Overview


Making systems practical and profitable for customers through virtualization and its exploitation. - z/VM
z/VM Capacity Planning Overview
Bill Bitner
z/VM Customer Care and Focus
bitnerb@us.ibm.com

Trademarks

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries.

IBM*, IBM Logo*, DB2*, Dynamic Infrastructure*, GDPS*, HiperSockets, Parallel Sysplex*, RACF*, System z*, System z10*, Tivoli*, z10 BC, z9*, z/OS*, z/VM*, z/VSE, zEnterprise*, System z196, System z114

* Registered trademarks of IBM Corporation

The following are trademarks or registered trademarks of other companies.

OpenSolaris, Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
INFINIBAND, InfiniBand Trade Association and the INFINIBAND design marks are trademarks and/or service marks of the INFINIBAND Trade Association.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
All other products may be trademarks or registered trademarks of their respective companies.

Notes:

Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

Introduction
Efficiency of one. Flexibility of Many. 40 years of virtualization.
Objectives:
- Provoke thought, and ultimately action, about capacity planning
- Concepts and approaches will be covered, not so much the mechanics
- Companion piece: z/VM Performance Metrics, also available from the author
- Time permitting, a dialogue on what IBM can do to help in this space

Companion Piece: z/VM Performance Metrics
- Lists the top 50+ metrics that we find useful, along with descriptions of them
- Where appropriate, includes which:
  - monitor record contains the information
  - Performance Toolkit report displays the information
  - OMEGAMON XE workspace manages the information
- Rules of thumb are given in some cases

More to Performance
- Performance Management (Tuning)
- Performance Engineering (Test Analysis)
- Capacity Planning
- Business Planning

Real vs. Virtual Planning
- Capacity planning for 'real' resources is one thing, but how do we incorporate 'virtual' resources?
- Need to address overhead or management costs
  - Define 'overhead'
    - z/VM Control Program processor time?
    - System management virtual machines?
  - Virtual does not mean free
- Need to address peaks
  - Averages alone are not sufficient
  - Define what is an acceptable capacity lag or impact to the SLA
- Acceptable overcommitment of resources is very dependent on:
  - Workload
  - Environment
  - SLA
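One way to see why averages alone are not sufficient is to compare the average of a set of interval samples with a high percentile and the peak. The sketch below is illustrative only: the sample values, the choice of the 90th percentile, and the function name `utilization_summary` are assumptions, not part of the presentation.

```python
from statistics import mean


def utilization_summary(samples, percentile=90):
    """Return (average, nearest-rank percentile, peak) for utilization samples."""
    ordered = sorted(samples)
    # Nearest-rank method: ceiling of n * percentile / 100, at least 1.
    rank = max(1, -(-len(ordered) * percentile // 100))
    return mean(ordered), ordered[rank - 1], ordered[-1]


# Hypothetical processor utilization (percent) for ten collection intervals.
samples = [35, 40, 38, 42, 95, 37, 41, 39, 90, 36]
avg, p90, peak = utilization_summary(samples, percentile=90)
print(f"average={avg:.1f}%  p90={p90}%  peak={peak}%")
```

With these made-up samples the average sits near 49% while two intervals run at 90% or more, which is the kind of gap that has to be weighed against the acceptable impact to the SLA.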

Looking at Resources
- Utilization Metrics: metrics of interest to determine utilization and distribution of resources
- Indicator Metrics: metrics that relate to thresholds or the degree of constraint and pain the system is expressing
- Quality Measures: something that indicates workload and response time and whatever is important to the business
  - Well defined and defined throughout all the disciplines
  - Something that can be mapped to other metrics to indicate a sweet spot
[Chart: Quality Measure plotted against Resource or Trigger Metric]

Real Processor Resources
- Real processor resources are perhaps the easiest to measure and manage
- Utilization Metrics:
  - LPAR Overhead Time
  - System CPU Time
  - CP CPU time associated with virtual machines
  - Virtual CPU time associated with virtual machines
- Indicator Metrics:
  - System Spin Time (wall clock, not a processor measure)
  - LPAR Suspend Time
  - CPU Wait
- Need to handle Specialty Engines
  - Measure each type
  - Mixed speeds? Changing speeds?
- Keep in mind that processor resource limits can pop up elsewhere
  - e.g. both ends of a HiperSockets connection require processor
- Compare or prorate based on workload (transaction rates).

Virtual Processor Resources
- Utilization Metrics:
  - CP CPU Time
  - Virtual CPU Time
  - Total CPU Time
- Potentially also include:
  - Processes within Linux
  - Linux Steal time
  - At the very least, have the above available from Performance Engineering for comparison if z/VM totals look abnormal.
- Again, make accommodations for Specialty Engines
  - Real and Virtual
- Indicator Metrics:
  - CPU Wait
  - Diagnose x'44'
  - Diagnose x'9C'

Linux View: %Steal
- Current Linux distributions (RHEL 5 & SLES 10) report %Steal as well as %User and %System
- %Steal: the Linux view of the percent of time that it had work to run but was unable to run. Possible reasons:
  - z/VM was dispatching other virtual machines; compare to %CPU Wait in z/VM state sampling.
  - z/VM was executing on behalf of the Linux virtual processor; compare to CP CPU usage of the Linux virtual machine.
  - Linux yielded its time slice to z/VM via diagnose 0x9C instead of spinning on a formal spin lock. Examine diagnose rates.
  - The z/VM partition was unable to run due to another logical partition being dispatched at the LPAR level.
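For readers who want to derive %Steal themselves, Linux exposes cumulative CPU counters on the aggregate `cpu` line of `/proc/stat`, where the eighth value is steal time. The following is a minimal sketch that computes the steal percentage between two snapshots of that line. The snapshot strings are invented sample data and `steal_percent` is a hypothetical helper name; tools such as `vmstat` or `sar` report an equivalent figure.

```python
def steal_percent(before: str, after: str) -> float:
    """Percent of elapsed CPU time counted as steal between two 'cpu' lines."""

    def fields(line):
        # Layout: cpu user nice system idle iowait irq softirq steal guest ...
        # Guest time is already included in user, so only the first 8 are summed.
        return [int(v) for v in line.split()[1:9]]

    deltas = [a - b for b, a in zip(fields(before), fields(after))]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0


# Invented snapshots; on a real guest, read the first line of /proc/stat twice.
before = "cpu 1000 0 500 8000 100 0 50 350 0 0"
after = "cpu 1600 0 800 8900 150 0 50 500 0 0"
pct = steal_percent(before, after)
print(f"%steal over the interval: {pct:.1f}")
```

The number by itself does not say which of the reasons above applies, so it is best read alongside the z/VM state sampling, CP CPU usage, and diagnose rates mentioned on the slide.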

Other Processor Planning Thoughts
- Need to have some measure of work or throughput
  - Best to determine cost per <something meaningful to everyone>
[Diagram: Performance Engineering, Business Planning, Performance Management, Capacity Planning]
- Establish in Performance Engineering testing what the target cost per transaction is
- Bring in Business Planning to determine the target or range of transaction load
- Capacity Planning projects requirements based on the above two
- Performance Management folks can help identify problems when things do not track.
- This is a continuous process
- I prefer computing CPU seconds, but if you want to convert to some MIPS number or IFLs or Computing Units, feel free. Just make sure everyone uses the same conversion numbers.
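The projection step described above can be sketched as simple arithmetic: CPU seconds per transaction (from Performance Engineering) times transactions per second (from Business Planning) gives CPU seconds consumed per second, which is the number of busy engines. Everything in this sketch is an assumption for illustration, including the function name `required_engines`, the 80% planning utilization, and the sample cost and load.

```python
def required_engines(cpu_sec_per_txn, txn_per_sec, target_utilization=0.8):
    """Engines needed to carry the load while staying at the target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # CPU-seconds consumed per wall-clock second equals fully busy engines.
    busy_engines = cpu_sec_per_txn * txn_per_sec
    return busy_engines / target_utilization


# Hypothetical inputs: 4 ms of CPU per transaction, 500 transactions per second.
engines = required_engines(cpu_sec_per_txn=0.004, txn_per_sec=500)
print(f"engines required at 80% utilization: {engines:.2f}")
```

Staying in CPU seconds keeps the calculation independent of any MIPS conversion; if a conversion is applied afterwards, it should be the same one everyone else uses.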

Real Memory
- Utilization Metrics:
  - NonPageable
  - Pageable
  - Minidisk Cache
  - Misc
  - Don't forget Expanded Storage
- Indicator Metrics:
  - Emergency Scan on Demand Scan
  - Emergency Scan failures
  - Available List(s) going empty

Virtual Memory
- Types of Virtual Memory:
  - Virtual Machines
  - NSS/DCSS
  - Virtual Disks in Storage
  - PTRM and other System Utility Spaces
- Virtual Machine Utilization Metrics:
  - Defined virtual memory
  - Backed virtual memory
  - Resident virtual memory
  - Estimated WSS
- Indicator Metrics:
  - Paging Rates (Reads and Writes)
  - Loading User
  - Page Wait (Asynchronous and Synchronous)
- A guest's pages may exist both on DASD and in real memory
- Private DCSSs are considered part of the virtual machine for most metrics, while shared DCSSs have their own metrics.

Memory Overcommitment
- Gather data to determine a curve such as the one below
  - Performance Engineering
  - Tracking production
  - Artificially limiting the amount of real memory
- The result is a virtual-to-real ratio for your workload that is at the edge of green/yellow.
- For example, let's say it is 1.8. If you are going to increase the workload by adding 30 GB of virtual memory, then you need to add real memory to keep the ratio at 1.8 or lower.
[Chart: Quality Measure plotted against Virtual to Real Ratio]
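The 1.8 example can be worked through numerically. The sketch below assumes a system that currently sits exactly at the 1.8 ratio, with 180 GB of virtual memory backed by 100 GB of real memory; those starting values and the function name `additional_real_gb` are hypothetical, while the 1.8 ratio and the 30 GB increase come from the slide.

```python
def additional_real_gb(virtual_gb, real_gb, added_virtual_gb, target_ratio):
    """Real memory (GB) to add so virtual/real stays at or below target_ratio."""
    required_real = (virtual_gb + added_virtual_gb) / target_ratio
    # No memory needs to be added if the current real memory already suffices.
    return max(0.0, required_real - real_gb)


# Hypothetical starting point: 180 GB virtual on 100 GB real (ratio 1.8).
extra = additional_real_gb(virtual_gb=180, real_gb=100,
                           added_virtual_gb=30, target_ratio=1.8)
print(f"real memory to add: {extra:.1f} GB")
```

Under these assumptions, adding 30 GB of virtual memory calls for roughly 16.7 GB of additional real memory (30 / 1.8). A system that currently runs below the target ratio would need less.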

Other Thoughts on Memory Planning
- What is the right 'overcommitment' number? It depends.
  - Definition of virtual and real in the question
  - Constraints of SLAs
    - All transactions sub-second vs. 99% of transactions sub-second
  - How much performance features can improve things
    - Workloads that are sized poorly at the start have more room for improvement
  - Paging configuration
- See http://www.ibm.com/vm/perf/tips/memory.html for additional information

Network
- A session of its own
- Real Level
- Virtual Level
- Link Aggregation
  - Aggregates total sessions across multiple OSD CHPIDs
  - Does not spread the load of a single TCP/IP application session across those CHPIDs.
  - Beware of measuring a single session, as multiple sessions are where the bandwidth value is.
- Limits/Thresholds/Quality Measures

Real I/O
- Utilization Metrics:
  - Channel Utilization
  - Device I/O Rates
    - System I/Os
    - User-Driven I/Os
  - Device Utilization
  - Access Density (I/Os per GB of space)
- Indicator Metrics:
  - Device Queuing
  - Error Rates
  - IOP Statistics

Virtual I/O
- Utilization Metrics:
  - Virtual I/O per Guest
  - I/Os avoided due to MDC or VDisk
- Indicator Metrics:
  - Various levels in the software stack where I/O queuing can occur
  - Virtual I/O to Virtual CPU Ratio
- Caution: %IOA (Asynchronous I/O Wait) in Performance Toolkit, and the similar field in Linux, includes the time of I/O processing.
  - I/O is relatively slow
  - %IOA may only show up when there is CPU activity or wait on CPU, so a high %IOA isn't necessarily bad.

SSI: Capacity Planning
- Great flexibility in managing multiple LPARs
  - Previously, if you split work across LPARs and had an imbalance, it was more difficult to rebalance
  - With SSI, virtual machines can run anywhere in the cluster without a lot of additional work
- Greater responsibility in planning, at two levels
  - Individual members
    - Need to ensure sufficient capacity and resources for the workload on each member
    - Track growth in requirements against the limits of the member
  - Cluster-wide
    - Track growth in requirements of the overall cluster against the limits of that cluster
    - Need to ensure sufficient white space for planned outages where LGR will be used to move workload out of a given member.
- The Getting Started With Linux book has been updated with SSI and LGR planning tips.

SSI & LGR: Planning White Space
- Need white space for planned outages where you move work off of a given member.
- How will work move off the member?
  - Use existing HA solutions to redirect work to existing servers on other members or elsewhere in the enterprise.
  - Use LGR to move to another member.
  - Log off and then log on to another member.
  - Shut down non-critical virtual machines for the duration of the unplanned outage.
- To where do you move the virtual machines?
  - To a single member or multiple members?
  - To a member on the same CEC or another CEC?
  - To a member held in reserve (such as a DR LPAR)?
- It's not just one z/VM image anymore
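A first-pass white space check can be sketched as follows: for a member that is about to be taken down, compare its load with the spare capacity left on the other members. This is a deliberately coarse illustration. The member names, the single "capacity"/"used" figure per member (think of it as GB of memory), and the `limit` planning threshold are all assumptions, and an aggregate check like this ignores whether each individual virtual machine actually fits on a particular destination.

```python
def can_absorb(members, leaving, limit=1.0):
    """True if the other members have enough spare capacity for the leaving member's load."""
    load_to_move = members[leaving]["used"]
    # Spare capacity on each remaining member, up to the planning limit.
    spare = sum(m["capacity"] * limit - m["used"]
                for name, m in members.items() if name != leaving)
    return spare >= load_to_move


# Hypothetical three-member cluster; units could be GB of real memory.
members = {
    "MEMBER1": {"capacity": 100, "used": 60},
    "MEMBER2": {"capacity": 100, "used": 50},
    "MEMBER3": {"capacity": 100, "used": 70},
}
print(can_absorb(members, "MEMBER1"))             # plan to fill members to 100%
print(can_absorb(members, "MEMBER1", limit=0.8))  # plan to stay under 80%
```

The same check would be repeated per resource (CPU, memory, paging, I/O) and per candidate outage, since the answer can differ for each.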

Other Considerations for Planning
- The bucket gets heavier as you add water.
  - The destination system may become more constrained as you continue to relocate virtual machines to it.
- Get the big rocks in first.
  - In general, it is better to move the virtual machines generating the greatest memory load first.
    - Larger virtual machines
    - Virtual machines with a higher page change rate
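The "big rocks first" guidance amounts to ordering the relocation list by memory load. The sketch below sorts by size first and page change rate second; that tie-breaking rule, the guest names, and the field names `resident_gb` and `page_change_rate` are assumptions made for illustration, since the slide does not say how the two factors should be weighed against each other.

```python
# Hypothetical guests; the numbers are invented, not measurements.
guests = [
    {"name": "LINUX01", "resident_gb": 8, "page_change_rate": 200},
    {"name": "LINUX02", "resident_gb": 8, "page_change_rate": 50},
    {"name": "LINUX03", "resident_gb": 32, "page_change_rate": 10},
]

# Largest guests first; among equal sizes, the higher page change rate first.
order = sorted(guests,
               key=lambda g: (g["resident_gb"], g["page_change_rate"]),
               reverse=True)

for position, guest in enumerate(order, start=1):
    print(position, guest["name"])
```

Because the destination becomes more constrained with every relocation, it may be worth re-checking its headroom after each move rather than fixing the whole order up front.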

SSI & LGR: Planning White Space
- CPU
  - Shared logical processors?
  - Adjust LPAR weight settings?
  - Vary on additional engines?
- I/O
  - Ensure sufficient resources at all levels: channel, switch, control unit, device
  - Shared channels?
- Memory white space is not as easy to manage
  - Ensure sufficient paging space and concurrency or data rate capability
  - Increase real memory overcommitment?
  - Temporarily decrease the size of some virtual machines?
  - Use Dynamic Memory Upgrade? No downgrade available

Data Collection Considerations
- Keep all groups in mind and in agreement
- The value of your data increases when it can be combined with other data.
- Volume of data
- Retention time
- Granularity or interval of data
- Correlation with other data
- Time zone considerations
- Terminology

Methods of Data Collection
- Do It Yourself
- Performance Toolkit Summary/Trend/Histlog
- OMEGAMON XE
- Shipping to z/OS

IBM Techline: Complete topology from z/VM data

IBM Techline: Example of CPU Analysis

IBM Techline: Memory Summary Example

IBM Techline Support
- IBM Techline Support z/VM Capacity Planning: http://www.ibm.com/support/techdocs/atsmastr.nsf/webindex/prs2875
- Contact your IBMer for in-depth analysis
- See free tools such as zPCR for processor sizing: http://www.ibm.com/support/techdocs/atsmastr.nsf/webindex/prs1381
- Thanks to the following for info on Techline and ATS offerings:
  - Gretchen Frye
  - Liz Holland

Summary
- You have to plan to do capacity planning if you want to do it successfully
  - Otherwise, it becomes Capacity Scrambling
- Lots of resources are available to help, from IBM and others

Contact Information
Bill Bitner
IBM Endicott
bitnerb@us.ibm.com
+1.607.429.3286
SESSION 11891