Session 40A: Application Performance and How Apdex Makes it Better


Application Performance and How Apdex Makes it Better
Session 40A
CMG International Conference, San Diego, California, December 5, 2007
Peter Sevcik, NetForecast, Inc.
955 Emerson Drive, Charlottesville, VA

Slide 2: Apdex Symposium
  8:00-9:00    40A  Application Performance and How Apdex Makes it Better
  9:15-10:15   41A  Apdex Process
  1:15-2:15    43A  Measurement Tools and Reports
  2:45-3:45    44A  Setting Performance Objectives Using Apdex
  4:00-5:00    45A  Apdex Case Studies
We thank the Contributing Members for their financial support of the Alliance.
2007, Apdex Alliance, Inc. All rights reserved.

Slide 3: Outline
  BSM = TSM + APM
  Apdex
Peter Sevcik and NetForecast, Inc., All rights reserved.

Slide 4: Application Delivery Complexity
What Matters: Task = user hits Enter to system responds
[Figure: a single user's traffic flow passing through a flow processing point in each technology silo. Silo labels: WANs, Access, Firewalls & IDS, LAN Switches & Routers, Load Balancing, System, Servers, Resource Pools, Virtual Machines, Applications]

Slide 5: Many Columns
The datacenter now houses:
  Access system (firewalls, intrusion detection)
  LAN (many layers of Gigabit Ethernet switches)
  Load balancers (server directors, TCP shaping, SSL offload)
  Web servers
  Application servers
  Database servers
  SAN (storage area network)
  Storage
However, many of the servers in fact operate within virtual machines, which adds the complexity of:
  Virtual machines
  VM resource pools
Let us not forget the:
  Client (any kind of machine running any kind of browser)
  Network (private or Internet)

Slide 6: Many Rows
Tasks: Users interact with an application one task at a time
  From the Apdex Specification: Task time is measured from the moment the user enters an application query, command, function, etc., that requires a server response to the moment the user receives the response such that they can proceed with the application.
  Often called the user wait time or application response time.
Turns: Tasks comprise many turns and a lot of payload
  Each application client-server software interaction needed to generate a user response or task.
  These software-level client-server interactions add to the time it takes for the software to complete a task.
  A turn is a client-server request-driven round-trip.
  Often called application chattiness.
Web-based B-B application turns have grown
  1995: 20 turns per task
  2007: 88 turns per task
  Turns per task are on a 13% compound annual growth rate with no end in sight
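The 13% figure on Slide 6 can be checked from the two data points given there. The sketch below is only an arithmetic check, assuming the growth is compounded annually over the 12 years from 1995 to 2007; the variable names are illustrative and do not come from the deck.

```python
# Compound annual growth rate of turns per task, from the Slide 6 figures.
turns_1995 = 20   # turns per task in 1995 (from the slide)
turns_2007 = 88   # turns per task in 2007 (from the slide)
years = 2007 - 1995

# CAGR = (end / start) ** (1 / years) - 1
cagr = (turns_2007 / turns_1995) ** (1 / years) - 1

print(f"{cagr:.1%}")  # prints 13.1%
```

The result rounds to the 13% stated on the slide.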

Slide 7: Application Profiles Sample
[Figure: Application Turns per Task (count) plotted against Payload per Task (Bytes). Plotted applications: SAP Web, SAP GUI, MAPI, CIFS, WEB 2002, and a second WEB series whose label is truncated]

Slide 8: How Applications Are Delivered
Columns: Technologies, Resources, Silos, Assets
  Many Columns: A typical user may touch 12 subsystems to execute a task
Rows: Flows, Traffic, Turns (app round trips), User Experience
  Many Rows: A typical task may require [value missing] turns to execute

Slide 9: Visualizing How an Application Works for One User (Rows and Columns)
[Figure: grid of 12 Technologies (columns) by 30 Turns (rows)]
Each column represents the device used within each technology
Each row represents a turn within each task within the flow

Slide 10: Two Views of Performance
[Figure: Performance Delivered, shown on the same grid of 12 Technologies by 30 Turns, divided into Task A, Task B, and Task C]
Note: Each task requires 10 turns, or rows
Row view: This task had significantly poorer performance with 4-times the number of reds
Column view: Every column delivered the same performance

Slide 11: Real World Examples
City Traffic Lights
  Column: Each light is operating, no roads are congested, all must be well.
  Row: The lights are synchronized to maximize traffic flow and each car gets through the city faster.
Package Delivery
  Column: The postal service keeps their offices open and their carriers are punctual. You post a parcel and it arrives at the other end.
  Row: FedEx provides on-line 24-hour detailed tracking of your package from door-to-door.
Credit Card Service
  Column: The credit card company checks your credit, processes debits and payments, keeps a strict set of rules on how you must pay.
  Row: The card company keeps track of your buying habits. It calls you when there is a suspicious charge and cancels the charge.
Airline Service
  Column: Private pilots fly in and out of general aviation terminals that are well maintained. They do not have to file a flight plan.
  Row: Commercial flights have strict flight rules. The FAA tracks every plane throughout the flight.

Slide 12: Service Delivery Frameworks
ITIL and ITSM (Columns)
  The IT Infrastructure Library (ITIL) describes the organisation of IT resources to deliver business value, and documents processes, functions and roles in IT Service Management (ITSM)
  ITSM (IT Service Management) is focused on managing technologies and software, not user flows
Application Performance Management, APM (Rows), is missing in ITIL and ITSM
  APM is the art and science of making applications run well by linking application performance goals to business objectives
  Applied during ongoing operations of a business application
  APM provides process and tools to manage user flows
  APM is required for an end-user experience SLA

Slide 13: ITIL Processes Related to APM
Service Operation
  Event Management
  Incident Management
  Request Fulfillment
  Problem Management
  Access Management
  Operational Activities of Processes
    Change Management
    Configuration Management
    Release and Deployment Management
    Capacity Management
    Continuity Management
    Availability Management
    Knowledge Management
    Financial Management for IT Services
Continual Service Improvement
  Service Level Management
Callout: Minimal set of processes that relate to ongoing APM delivery quality
ITIL v3, published in May 2007, comprises 5 volumes:
  Service Strategy
  Service Design
  Service Transition
  Service Operation
  Continual Service Improvement

Slide 14: ITIL Service Delivery Applied to Flows
Incident Management
  Loss of user access, loss of service within a geographic region, slow performance, software incompatibility (client and server failing to communicate), missing cookies, can't acquire address or credentials, etc.
Availability Management
  All authorized user access methods (wireline, cell service, WiFi) are working, client-server connections can be made, all devices on a flow path are operating, SSL keys are installed and certified, alternate routing, etc.
Capacity Management
  Sufficient bandwidth for each flow, QoS and precedence handling, load balancing, traffic control, sufficient TCP connection pools, latency within application needs, proper application acceleration techniques applied, etc.
Service Level Management
  Flow characteristics are known and supported, user response time supports the business function, voice services meet quality standards, videoconferencing supports business function, etc.

Slide 15: Different Reports on the Same Infrastructure
[Figure: the same Core Devices and Systems reported two ways, each with Service Level, Capacity, Availability, and Incident reports]
  Technology Service Management: per technology, device, subsystem, system
  Application Performance Management: per flow, application, user group, location

Slide 16: The Views Diverge With Sophistication
[Figure: starting from Common Attributes, Technology Service Management (Asset Centric) and Application Performance Management (User Centric) diverge through Incident, Availability, Capacity, and Srvc Level]

Slide 17: Diverging Granularity
Asset Centric: Detail helps capacity planning and ROI
  Utilization of each subsystem (memory, disk, etc)
  Utilization of each Device
  Actual load
User Centric: Detail helps link to business and SLA
  Individual user flows (resp time, VoIP quality, etc)
  Source-Destination flows by location
  Actual traffic
Common Attributes: Predicted aggregate load
[Figure: both views gain granularity as they move away from the Common Attributes]

Slide 18: Stages of Maturity
Reactive Diagnosis
  Diagnostic tools, problem triage, fault identification
Proactive Intervention
  Ongoing measurement, trend analysis, resource planning
Quality Warranty
  Service objectives are stated
Portfolio Management
  Managing many business applications as a group

Slide 19: APM Value Framework
[Figure: matrix of ITIL processes (Incident, Availability, Capacity, Service Level) against stages of maturity (Reactive Diagnosis, Proactive Intervention, Quality Warranty, Portfolio Management)]

Slide 20: BSM
BSM: Business Service Management
  TSM: Technology Service Management (Technologies)
  APM: Application Performance Management (Flows)
[Figure: TSM and APM side by side, each covering Incident, Availability, Capacity, and Service Level across Reactive Diagnosis, Proactive Intervention, Quality Warranty, and Portfolio Management]

Slide 21: Stereoscopic View
Stereoscopic views of IT
  ITSM assumes that if service is available, the user experience is satisfactory
  APM assumes that if response time is satisfactory, the service is available
  Each image is convincing but each is incomplete
  The 3-D effect adds clarity and new information not seen in each image alone
Good BSM needs both
  ITSM is a necessary but insufficient service model
  APM can't exist without a good TSM foundation

Slide 22: BSM Framework Supported
  TSM: Technology Service Management (Technologies)
  APM: Application Performance Management (Flows)
[Figure: the TSM and APM frameworks side by side, each covering Incident, Availability, Capacity, and Service Level across Reactive Diagnosis, Proactive Intervention, Quality Warranty, and Portfolio Management. Annotations: Apdex Applies; App SLA; More Value on each side; Should be balanced]

Slide 23: Outline
  BSM = TSM + APM
  Apdex

Slide 24: Today's Problem: Many Numbers, Little Insight
Which application is in trouble?
[Table: Measured Response Time (seconds) for App A, App B, App C, App D, and App E. Rows: Day Average, Best Hour, Worst Hour, 95th Percentile. The cell values are missing from this transcription]

Slide 25: Example: 100 Numbers
Start with what you have
  Your measurement tool produced 100 samples
  The samples are
    Single application
    User-level response time measurements
    One hour period of observation
Is the application operating well?

Slide 26: Numbers Beget Numbers
[Figure: histogram of Number of Samples in Time Period against Incremental Time Period (sec)]
Summary statistics (sec):
  Average             7.6
  Median              3.9
  Mode                4.8
  Standard Deviation  [value missing]
  [rank missing]th Percentile  22.5
  Minimum             1.3
  Maximum             59.6
Now you have 137 numbers. Can you answer the question, "Is the application operating well?"

Slide 27: Apdex Defined
Apdex is a numerical measure of user satisfaction with the performance of enterprise applications
It defines a method that converts many measurements into one number
  Uniform 0-1 scale: 0 = no users satisfied, 1 = all users satisfied
  Standardized method
It is a comparable metric
  Across all applications, and
  Across enterprises

Slide 28: Deconstructing Application Transactions
Session = Period of time that a user is connected to an application
  Start the application
  End or suspend the application
Process = A group of user interactions that accomplish a goal
  Get new [word missing], add an employee, check on inventory status, etc.
Task = Each interaction with the application during the session
  Type or choose
  User waits
  User reads or thinks
[Figure: timeline of a session with idle periods between processes. Each task runs Type, Wait, Read: the user presses Enter or clicks, and the system responds]

Slide 29: Deconstructing Application Transactions (con't)
Turn = Each application client-and-server software interaction needed to generate a system response
Protocol = Each TCP Open, ACK, retransmission, etc., required to operate a Turn and move Payload
Packet = Each packet as seen on the wire in support of the above
[Figure: the Wait portion of a task broken down into turns, protocol events, and packets]

Slide 30: The Task Defined
Task response time is the elapsed time required for an application system to respond to a human user input such that the user can effectively proceed with the process they are trying to accomplish
  Time when the user is waiting in order to proceed
  User feels the responsiveness of the application
  Long Task time makes the user less productive
The Task is what a user can time with a stopwatch

Slide 31: How Users View Application Task Performance
Satisfied
  User maintains concentration
  Performance is not a factor in the user experience
  Time limit threshold is unknowingly set by users and is consistent
Tolerating
  Concentration is impaired
  Performance is now a factor in the user experience
  User will notice how long it is taking
Frustrated
  Performance is typically called unacceptable
  Casual user may abandon the process
  Production user is very likely to stop working

Slide 32: Example
[Figure: Probability of Experiencing the Time (0% to 16%) against Load Time of a Typical Business Page (sec). Zones: 52% Satisfied, 42% Tolerating, 6% Frustrated]

Slide 33: How Apdex Works
Start with a sufficient number of Task measurement samples
Target response time T defines the satisfied zone (0-T sec)
  T is shown as a subscript of all Apdex values (for example 0.80 T, with T as the subscript)
Count the number of samples within three performance zones
  Satisfied, Tolerating, Frustrated
Given
  Target response time T, and
  Sufficient response time measurement samples
Then

$$\mathrm{Apdex}_T = \frac{\text{Satisfied count} + \dfrac{\text{Tolerating count}}{2}}{\text{Total samples}}$$

Note
  Frustrated samples are not in the numerator but are counted in total samples
  Index: 0 = Failure; 1 = Perfection (all users satisfied)

Slide 34: Putting it All Together
1. Define T for the application.
   T = the application target time (threshold between satisfied and tolerating users).
   F = threshold between tolerating and frustrated users, which is calculated (F = 4T).
2. Define a Report Group (details available are tool dependent).
   Report Group: Application, User Group, Time Period
3. Extract data set from existing measurements for Report Group.
   Existing Task Response Time Measurement Samples
4. Count the number of samples in three performance zones.
   Satisfied (up to T), Tolerating (T to F), Frustrated (beyond F)
5. Calculate the Apdex formula.

$$\mathrm{Apdex}_T = \frac{\text{Satisfied} + \dfrac{\text{Tolerating}}{2}}{\text{Total samples}}$$

6. Display Apdex result (T is always shown as part of the result).
   Excellent     1.00 T to 0.94 T
   Good          0.94 T to 0.85 T
   Fair          0.85 T to 0.70 T
   Poor          0.70 T to 0.50 T
   Unacceptable  0.50 T to 0.00 T
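The six steps on Slide 34 can be sketched in a few lines of Python. This is a minimal illustration, not the Alliance's reference implementation: the function names `apdex` and `rating` are hypothetical, the sample data is invented for the example, and two boundary choices are assumptions the slides do not spell out (a sample exactly equal to T counts as Satisfied, one exactly equal to F counts as Tolerating, and a score exactly on a rating boundary takes the higher rating).

```python
from typing import Iterable


def apdex(samples: Iterable[float], t: float) -> float:
    """Apdex score for task response times (seconds) with target time t."""
    if t <= 0:
        raise ValueError("target time T must be positive")
    f = 4 * t  # F = 4T, the tolerating/frustrated threshold from Slide 34
    satisfied = tolerating = total = 0
    for sample in samples:
        total += 1
        if sample <= t:          # assumption: boundary value counts as Satisfied
            satisfied += 1
        elif sample <= f:        # assumption: boundary value counts as Tolerating
            tolerating += 1
        # Frustrated samples add nothing to the numerator, only to the total.
    if total == 0:
        raise ValueError("at least one sample is required")
    return (satisfied + tolerating / 2) / total


def rating(score: float) -> str:
    """Map a score onto the Slide 34 scale (assumption: lower bound inclusive)."""
    if score >= 0.94:
        return "Excellent"
    if score >= 0.85:
        return "Good"
    if score >= 0.70:
        return "Fair"
    if score >= 0.50:
        return "Poor"
    return "Unacceptable"


# Hypothetical samples: three satisfied, one tolerating, one frustrated at T = 8 s.
example_samples = [1.0, 2.0, 3.0, 9.0, 40.0]
example_score = apdex(example_samples, t=8.0)
print(f"{example_score:.2f} (T = 8 s): {rating(example_score)}")  # 0.70 (T = 8 s): Fair
```

Reporting T alongside the score follows step 6: a score is only comparable with another when the target time behind it is known.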

Slide 35: The Apdex View of the 100 Numbers
User productivity is impaired if the application responds in more than 8 seconds
  T = 8 sec
Apdex for the 100 measurements = [value missing]
The application is barely providing Good performance
Rating scale:
  Excellent     1.00 T to 0.94 T
  Good          0.94 T to 0.85 T
  Fair          0.85 T to 0.70 T
  Poor          0.70 T to 0.50 T
  Unacceptable  0.50 T to 0.00 T

Slide 36: Apdex Highlights the Long-Tail
Major ecommerce site ($4B annual on-line sales)
North American broadband users accessing the San Francisco data center
[Figure: Probability of Experiencing the Time (0% to 16%) against Load Time of a Typical Business Page (sec). Zones: 52% Satisfied, 42% Tolerating, 6% Frustrated]
This site had an average Keynote response time of 4 seconds, so it looked like all was well
But: Apdex = [value missing] = Fair
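The Slide 36 rating can be cross-checked from the zone percentages alone, because the Apdex formula depends only on the share of samples in each zone. The sketch below assumes the 52% / 42% / 6% split shown in the figure is the complete distribution; the value printed on the original slide is not recoverable here, so the number computed is a reconstruction from those percentages, and the variable names are illustrative.

```python
# Zone shares read from the Slide 36 figure, in percent of all samples.
satisfied_pct = 52
tolerating_pct = 42
frustrated_pct = 6

total_pct = satisfied_pct + tolerating_pct + frustrated_pct

# Apdex = (Satisfied + Tolerating / 2) / Total; Frustrated counts only in the total.
score = (satisfied_pct + tolerating_pct / 2) / total_pct

# Fair band on the deck's rating scale (assumption: lower bound inclusive).
is_fair = 0.70 <= score < 0.85

print(f"{score:.2f}", "Fair" if is_fair else "not Fair")  # 0.73 Fair
```

A score of 0.73 sits in the 0.70 to 0.85 band, which agrees with the Fair rating on the slide even though the 4-second average looked healthy.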

Slide 37: Apdex Benefits
Process
  Forces a process by which the enterprise becomes focused on the important performance management issues
Simplicity
  Converts mountains of existing response time measurements into a simple value that can be easily understood by non-technical managers
Business Linkage
  Offers a clear picture of how well the IT infrastructure is really performing in support of specific business objectives
Open Standard
  Processes and results that can be applied across industries and applications

Slide 38: Getting Started with Apdex
Start measuring and reporting
  Pick any application and use any tool you can
    17+ vendors can supply the data necessary to calculate Apdex reports
    Currently 5 vendors generate Apdex reports within their product
  Getting some data is better than no data
  It serves as a foundation for planning and budgeting
Full benefit of Apdex requires a process
  Involve key stakeholders
  Formal dialog on Apdex terms
  Benchmark your APM capabilities before and after an Apdex pilot
  Plan for application SLAs based on Apdex
  Set the stage for continual APM quality improvement

Slide 39: Apdex Methodology
Phases: Gather, Agree, Improve
Gather
  Measure
  Gather baseline data
  Test
Agree
  Apdex Parameters
  1) Work with users to set T
  2) Work with business managers to define the service objective for A (Apdex with subscript T)
Improve
  Apdex reports & trends
  Improve performance where needed
  Continual performance & process improvement

Slide 40: The Apdex Alliance
Apdex Alliance is a non-profit industry alliance
  Open collaborative approach
  Contributing Member
    Annual dues
  Supporting Member
    Free
    Individual interested in applying Apdex within their organization
    Currently more than 700 members
Apdex is FREE
  You can use Apdex in your organization or a commercial product
  You or your organization do not need to join the Alliance to use Apdex
  You join to learn more and be plugged into the community

Slide 41: Leveraging the Apdex Alliance
Contributing Members
  Are promoted as sponsors of the Alliance
  Receive support in implementing Apdex reporting
All Members (Contributing and Supporting)
  Receive a quarterly newsletter
  Have access to formal documents and news on the web site
  Can participate in the Apdex Exchange (Google Group) discussion board
    All members can choose to join the discussion board
    You can receive all postings, summaries, or only see material when you log in
    There are more than 100 people in the Exchange
The numbers: 7 Contributing, 700 Supporting, 100 in the Exchange

Slide 42: Apdex Alliance Member Survey 2007
Web-based survey
70 responses
Number of employees
  15%: less than [value missing]
  [value missing]%: 500 to 10,000
  48%: more than 10,000
[Figure: respondents by industry. Categories: Other, Manufacturing, Business & IT Services, IT Product Manufacturing, Carriers & Utilities, Healthcare, Education & Research, Transportation, Government, Financial Services]

Slide 43: Who Should Lead the Apdex Alliance?
  Vendors      4%
  Enterprises  11%
  Balanced     86%

Slide 44: Apdex In Use
45% Using Apdex Now
  Our Standard Report  5%
  In Regular Use       10%
  Pilot Project        9%
  Experimenting        21%
Likely in 1 Year       29%
No Plans               26%

Slide 45: The Big Picture
  APM: Deliver User Value
  TSM: Manage Asset Efficiency

Thank You
Articles and reports on performance measurement, analysis, and management are available for free at
Information about Apdex and joining the Apdex Alliance is at