High-Volume Web Site Performance Simulator for WebSphere


Authors: High-Volume Web Site Team
Web address: ibm.com/websphere/developer/zones/hvws
Technical Contact: Noshir Wadia
Management Contact: Larry Hsiung
Date: December 16, 2002
Status: Version 1.0a

Abstract: This paper introduces the High-Volume Web Site Performance Simulator for WebSphere, an analytic queuing model that estimates the performance of a Web server based on workload patterns, performance objectives, and specified hardware and software. The results can be used as guidelines for configuration sizing.

Executive summary

More and more of your customers and employees are doing business on the Internet. Will your Web site ever be fast enough? Is there an affordable combination of hardware and software that will help you meet your response-time targets? How can you know, and can you know ahead of time? How can you plan capacity when volumes are increasing but remain unpredictable?

And it's not just about response time. Today's sites are typically multi-tiered and employ both horizontal and vertical scaling techniques. Each tier can contain different hardware from multiple vendors, and the software and middleware can be as diverse as the hardware. Relating the performance of these different combinations is a considerable challenge.

To address these questions, IBM uses its High-Volume Web Site (HVWS) Performance Simulator for WebSphere. The HVWS Simulator is an analytic queuing model that estimates the performance and capacity of a Web server based on workload patterns, performance objectives, and specified hardware and software. It contains predefined workloads built from measurements of a variety of actual online customer applications, including shopping, trading, banking, and others. The simulator includes special algorithms for sites with highly variable traffic, can perform "what if" analyses of performance and capacity, and has algorithms to recommend the optimum configuration for a given workload and specified objectives.

The HVWS Simulator is updated regularly to support the most current hardware and software and additional workloads. IBM customers validate the simulator's models and algorithms during the design, development, and test of each version. Many IBMers worldwide are trained to use the simulator to help customers estimate which configuration will perform best for their specific workload type and volume. This paper introduces the technology and use of the simulator.

12/16/2003

Contents

Executive summary
Contents
Introduction
Using the HVWS Simulator
Appendix A. Simulator input panels and sample output
References
Notices
Contributors

Introduction

The hardware and software structure of large Web sites is increasingly complex, and the behavior characteristics of the workloads are at best poorly understood, or at worst essentially unknown because the workload has yet to be implemented. Even with this growing complexity, typical IT infrastructures can be analyzed and models developed to assist in predicting and planning how to meet future requirements. IBM developed the High-Volume Web Site (HVWS) Performance Simulator for WebSphere to estimate the performance of complex configurations. The simulator has these key features:

- Applications are defined based on the intended uses of the Web site, using the workload patterns associated with high-volume sites. Detailed knowledge of the workload characteristics is not necessary, although it can be used to increase the accuracy of the simulation.
- The simulator workload library contains measurements of real customer applications for these typical workload patterns: shopping, banking, brokerage, auction, portal, B2B, reservation system, and inventory management system. An additional pattern, user-defined, lets users enter the characteristics of their specific application, supported either by measurement data or by documented (and verifiable) assumptions.
- The simulator includes built-in performance characteristics of selected pSeries, xSeries, zSeries, and Sun models.
- The simulator displays performance results in sufficient detail to let users assess the adequacy of a given configuration for their requirements, and to provide insight into where bottlenecks are likely to occur.

These features make the simulator useful for planning capacity, evaluating infrastructure and workload changes, projecting Web site scalability, and reducing the cost of prototyping. The HVWS Simulator supports performance modeling based on a generalized view of the infrastructure options shown in Figure 1.
The simulator enables users to adjust the number of tiers to allow a more accurate model of the site configuration under consideration.

Figure 1. Web server topology (clients and an edge server connecting through the network to a three-tier configuration: Web presentation server, Web application server, and database server)

The HVWS Simulator estimates performance using an analytic model based on an enhanced version of the G/M/K queuing model. In this technique, users are added to the system iteratively, in increments. Two complete sets of calculations are performed in the simulator. The base set of calculations uses built-in (or user-provided) measured data for CPU and disk I/O. A more conservative "base plus contingency" set of calculations is also performed by adding a user-supplied contingency factor to each of the measured data values before the calculation. Both results are then transformed to the target

configuration using built-in scaling coefficients taken from industry-standard benchmarks and measurements. At each step, the calculated results are compared against the user-selected performance target(s) to determine whether a target has been reached or an early resource depletion (CPU or disk bandwidth) has occurred in any component of the infrastructure being evaluated. A resource-depletion event signals the need for configuration adjustments and brings all calculations to a stop. Examination of the displayed results tells the user where the bottleneck occurred and what the aggregated load and performance indicators were.

Using the HVWS Simulator

You complete a series of steps on separate panels to define the application, hardware, software, and performance objectives, as depicted in Figure 2.

Figure 2. Overview of using the HVWS Performance Simulator (flowchart: start; either select a workload from the library or define your own workload and provide the required measurement data; select performance objectives; define the architecture; either select the hardware configuration yourself or let the simulator estimate the optimum configuration and performance; calculate results; analyze the results and bottlenecks; if the result is acceptable, exit, otherwise iterate)

Appendix A contains the primary panels used to input information. The model is designed to estimate the configuration needed to meet specific performance objectives. The estimated configuration does not include servers such as backup machines; you adjust the configuration by adding any other servers that may be required.

Examples of how the HVWS Simulator is used

This section describes four customer engagements in which the HVWS Simulator was used to guide the customer's selection.

Online shopping

A large retailer forecast a six-fold increase in demand for their online shopping application during the next holiday season. They asked IBM to estimate what would be needed to meet that demand, and also to predict how much more workload the existing configuration would support. Their application-server hardware was near end of life and their database server was at capacity. The current login rate was 30 per second. Using the HVWS Simulator's online shopping model against a target login rate of 180 per second, we determined that the current configuration would handle a login rate of only 45 per second. We recommended new equipment (pSeries 690s) that provided the capacity needed, reduced the footprint size, and reduced the total cost of ownership.

Online betting

Our customer sought IBM's advice in selecting among three platforms they were considering for their online betting application (AIX on pSeries versus Linux or Win32 on xSeries). The platform must handle as many as 7,400 concurrent users and makes heavy use of caching. Seventy percent of the pages are classified as static or periodically changing static; the remaining thirty percent are dynamic or represent work needed to address cache-invalidation events. We based the data collection on assumptions rather than actual measurements. The workload was a composite of publish/subscribe, shopping, self-service, and brokerage, so we used a three-scenario user-defined workload pattern. The table below shows the specified workload characteristics and demonstrates how the effects of cache hits were removed from the workload.
Scenario                              Pg/Sec  % Page Hits  Cache Hit %  App Pg/Sec  % App Pages  CPU Secs  SSL
Login, Betting and Account Details       -         -            -           -            -           -      Y
Informational                            -         -            -           -            -           -      N
User Customized                          -         -            -           -            -           -      N

The CPU seconds and SSL values were used in the three scenario definitions, and the % App Pages values were used to specify the scenario ratios of the workload. Think time was elongated to compensate for cache satisfaction of up to 70% of the page hits. We applied the workload to each platform to determine how many components each required, for a cost-of-ownership evaluation. We tuned each configuration to provide a two-second response time (90th percentile) with SSL turned on. We ran follow-on simulations using the pSeries infrastructure to determine equipment needs for environments with partial and no caching.
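The cache adjustment behind the table can be reproduced arithmetically. Pages satisfied by the cache never reach the application tier, so the application-visible page rate is the raw rate scaled by the miss ratio; elongating think time by the same factor is one way to keep the per-user visit shape while lowering the offered load. The function and the input numbers below are illustrative only; the engagement's measured values are not given in this paper.

```python
def remove_cache_hits(page_rate, cache_hit_ratio, think_time_secs):
    """Return (app_page_rate, elongated_think_time).

    page_rate        -- total page hits per second reaching the site
    cache_hit_ratio  -- fraction of page hits satisfied by the cache (0..1)
    think_time_secs  -- the user's nominal think time between pages
    """
    miss_ratio = 1.0 - cache_hit_ratio
    # Only cache misses generate application-tier work.
    app_page_rate = page_rate * miss_ratio
    # Stretching think time by 1/miss_ratio slows each modeled user down
    # by the same factor, compensating for the pages the cache absorbs.
    elongated_think = think_time_secs / miss_ratio
    return app_page_rate, elongated_think

# Illustrative numbers: 70% cache hits, as in the betting engagement.
app_rate, think = remove_cache_hits(page_rate=100.0,
                                    cache_hit_ratio=0.70,
                                    think_time_secs=30.0)
print(round(app_rate, 1))   # 30.0 -- only 30% of pages hit the app tier
print(round(think, 1))      # 100.0
```

With a 70% hit ratio, the application tier sees less than a third of the raw page traffic, which is why the cached and uncached sizings in this engagement differed by up to two additional servers.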

The simulator-based results recommended 2 x p630 4-way machines with 4 GB RAM to handle the top end of the workload with less than 80% CPU utilization and subsecond response at over 100 page views per second. The uncached solutions required up to two more p630s.

Fast-food chain

A fast-food chain with thousands of stores worldwide asked IBM to size an in-store order-processing application based on WebSphere. The application runs in standalone mode except when linking occasionally to regional servers for inventory and human-resource applications. The customer required that the application run on a single xSeries machine. For the simulator, we specified the online shopping workload because the chain's workload pattern is similar to the buy scenario in online shopping. We used a measure of concurrent users in 1.8 minutes. We populated the model as follows:

- Workload pattern: online shopping with 100% buy transactions
- 0.15 minutes per page view, 12 page views in total, and a 1.8-minute think time
- Performance target: user arrival rate of 0.1 to 0.2 user visits per second

Based on the results returned from the simulator, we recommended an xSeries machine with 2 GB RAM to handle the top end of the workload with less than 70% CPU utilization.

Online voting

The customer was considering a design for an online voting application and asked IBM to estimate the cost of a WebSphere implementation. We based the data collection on assumptions rather than actual measurements. Using expected voter turnout, the customer estimated that 230 voters per second would visit the Web site, with an anticipated surge up to ten times greater during short periods. We selected the quote scenario of the online trading workload pattern as appropriate for the proposed application. The recommended configuration: 3 x 4-way pSeries 640-B80s for the Web/WAS tier and 1 x 4-way pSeries 660 for the database tier.
This configuration was estimated to satisfy the anticipated voter turnout and give subsecond response time to voters' actions.
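The simulator's actual G/M/K algorithms are not published, but the iterative calculation described in the introduction (add load step by step, recompute response time, stop at a performance target or at resource depletion) can be sketched in simplified form. The sketch below uses a single-queue M/M/1-style response-time approximation and a user-supplied contingency factor; all parameter names and numbers are invented for illustration.

```python
def estimate(service_secs, n_cpus, arrival_rate_start, arrival_step,
             target_resp_secs, contingency=0.0, max_util=0.95):
    """Raise the offered load step by step until a response-time target
    or a resource-depletion point (CPU saturation) is reached.

    service_secs -- measured CPU seconds per request
    contingency  -- fraction added to measured data ("base plus contingency")
    Returns the last sustainable (arrival_rate, response_secs, utilization),
    or None if even the starting load cannot be sustained.
    """
    s = service_secs * (1.0 + contingency)   # conservative service time
    rate = arrival_rate_start
    best = None
    while True:
        util = rate * s / n_cpus
        if util >= max_util:                 # early resource depletion: stop
            break
        resp = s / (1.0 - util)              # M/M/1-style response time
        if resp > target_resp_secs:          # performance target reached
            break
        best = (rate, resp, util)
        rate += arrival_step
    return best

# Invented inputs: 20 ms of measured CPU per request, 4 CPUs,
# a 0.5-second response target, and a 20% contingency factor.
print(estimate(0.02, 4, 10.0, 10.0, target_resp_secs=0.5, contingency=0.2))
```

As a sanity check against the fast-food engagement above, Little's law (N = lambda x T) links the same quantities: one reading of that engagement's figures, 0.2 visits per second against a 1.8-minute (108-second) visit, implies roughly 0.2 x 108, or about 22 concurrent in-flight visits.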

Appendix A. Simulator input panels and sample output

This appendix contains the primary simulator input panels and sample output panels.

Select workload pattern

When you start the simulator, the panel below appears. Application type and workload pattern default to online shopping. You specify your project name, select your application type and workload pattern, and characterize your workload. Selecting User Defined displays the panel on the next page.
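The user-defined panels collect per-scenario characteristics such as page ratios, CPU service times, and SSL use. The simulator's internal representation is not published; as a rough illustration (all names and numbers here are invented), the information these panels gather could be captured in a structure like this:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One scenario of a user-defined workload (fields are illustrative)."""
    name: str
    pct_of_pages: float        # share of all page views, 0..1
    cpu_secs_per_page: float   # measured CPU service time per page view
    uses_ssl: bool

@dataclass
class Workload:
    pattern: str               # e.g. "shopping" or "user-defined"
    page_views_per_visit: int
    think_time_secs: float
    scenarios: list

    def weighted_cpu_secs(self):
        """Average CPU seconds per page view across scenarios."""
        return sum(s.pct_of_pages * s.cpu_secs_per_page
                   for s in self.scenarios)

# Invented example shaped like the three-scenario betting workload:
betting = Workload(
    pattern="user-defined",
    page_views_per_visit=10,
    think_time_secs=30.0,
    scenarios=[
        Scenario("login/betting", 0.30, 0.040, True),
        Scenario("informational", 0.50, 0.010, False),
        Scenario("user-customized", 0.20, 0.020, False),
    ],
)
print(round(betting.weighted_cpu_secs(), 4))  # 0.021
```

The weighted CPU time per page is the kind of aggregate a queuing calculation would consume as its service-time input, with the SSL flags selecting any additional per-page overhead.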

Pressing CPU Serv. Time displays this panel.

Specify performance objectives

Specify the hardware used or projected for use

Specify the software components used or projected to be used

Calculate results

Clicking Calculate Results calculates the estimated performance based on the workload, objectives, hardware, and software, and displays the results in separate windows as shown below.


Graph results

Clicking Graph Results displays a graph of the calculated results.

Display pie chart

Clicking either pie chart tab displays a pie chart of the components of the total response time for a typical Web interaction at the performance objective.

References

1. See all the IBM High-Volume Web Site white papers at ibm.com/websphere/developer/zones/hvws.
2. The WebSphere Application Server Performance Web site provides centralized access to many helpful performance reports, tools, and downloads.
3. Arnold O. Allen, Probability, Statistics, and Queueing Theory with Computer Science Applications, Academic Press, 1978.
4. Stacey Joines, Ruth Willenborg, Ken Hygh, Performance Analysis for Java Web Sites, Pearson Education, 2003.
5. Leonard Kleinrock, Queueing Systems, Volume 2: Computer Applications, John Wiley and Sons, 1976.
6. M. Ajmone Marsan, G. Balbo, G. Conte, Performance Models of Multiprocessor Systems, The MIT Press, 1986.
7. IBM Redbooks have additional WebSphere performance information at ibm.com/redbooks.nsf/portals/websphere.

Notices

Trademarks

The following are trademarks of International Business Machines Corporation in the United States, other countries, or both: AIX, DB2, IBM, iSeries, pSeries, RS/6000, WebSphere, xSeries, zSeries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Other company, product, and service names may be trademarks or service marks of others.

Contributors

The High-Volume Web Site team is grateful to the major contributors to this article: Yin Chen, Susan Holic, Mike Ignatowski, Noshir Wadia, Jack Woodson, Peng Ye.

Special Notice

The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.