Ch. 6: Understanding and Characterizing the Workload


1 Ch. 6: Understanding and Characterizing the Workload Kenneth Mitchell School of Computing & Engineering, University of Missouri-Kansas City, Kansas City, MO Kenneth Mitchell, CS & EE dept., SCE, UMKC p. 1/2

2 Introduction
The performance of a system depends heavily on the characteristics of its load.
Understanding and characterizing the workload is the first step.
Workload: the set of all inputs that a system receives from its environment over a given period of time.
If the system is a database server, its workload consists of all transactions (e.g., queries and updates) processed by the server during an observation interval.

3 Introduction Cont.
Workload characteristics are represented by a set of information for each request (e.g., arrival and completion time, CPU time, number of I/O operations, and size of the object requested).
One needs to reduce and summarize the information that characterizes the workload.
Model parameters can be changed to gain insight into the behavior of the system.
The choice of characteristics and parameters is important.

4 Example
To study the cost-benefit of creating a proxy server for a Web site, the relevant workload characteristics are: frequency of document reference, concentration of references, document sizes, and inter-reference times.
To study the impact of a faster CPU on the response time of a Web server, the relevant workload characteristics are: average CPU time per request, average number of I/O operations per request, average request response time, etc.

5 Common Steps
1. Specification of a point of view from which the workload will be analyzed
2. Choice of the set of parameters that capture the most relevant characteristics
3. Monitoring the system to obtain the raw performance data
4. Analysis and reduction of performance data
5. Construction of a workload model
6. Verification that the characterization captures all the important performance information

6 Workload of a Corporate Portal
1,000 employees have access, with plans for 3,500 next year.
Applications range from simple text to audio and video.
Services: employee directory, human resources, health insurance payments, quality management, on-demand interactive training.
The intranet consists of 5 Web servers (Fig. 6.1).
Users complain about the response time of the human resources service (server B).

7 Workload
How do we characterize the workload of the corporate portal?
1. Define the workloads we need to characterize. Client or server? Usually described in terms of traffic characteristics (packet size distribution and interarrival time).
2. Level of workload description. High level (the user's point of view, such as Web transactions) or low level (CPU time per request or packets exchanged).
During a 1-second interval, 10 HTTP requests are observed to arrive at server B. What is the workload presented to server B?

8 First Approach
Identification of basic components.
A basic component may be a job, a transaction, an interactive command, a process, an HTTP request, etc.
In C/S environments: client request or database transaction.
In a banking system: transaction (account balance inquiry, account update, loan status inquiry).
File server: request for service (read or write).
There are many different types for many different purposes.

9 First Approach
Different levels of characterization:
1. Business characterization
2. Functional characterization
3. Resource-oriented characterization
See Fig. 6.3.

10 Example
An online bookstore has the following functions: Search, Browse, Select, Register, Login, Add, and Pay.
Consecutive and related requests are called a session.
What is the average session length?
Which functions do customers visit the most?
What is the percentage of images in the workload?
Answering these questions requires a layered approach (Fig. 6.4).
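
The first two questions above can be answered directly from a request log. The following is a minimal sketch, assuming a hypothetical log of (session id, function) pairs; the log contents and the names `LOG` and `session_stats` are made up for illustration and do not come from Fig. 6.4.

```python
from collections import Counter, defaultdict

# Hypothetical request log: (session_id, function) pairs, made up for illustration.
LOG = [
    ("s1", "Search"), ("s1", "Browse"), ("s1", "Select"), ("s1", "Add"), ("s1", "Pay"),
    ("s2", "Browse"), ("s2", "Search"), ("s2", "Browse"),
    ("s3", "Login"), ("s3", "Search"),
]

def session_stats(log):
    """Return (average session length in requests, visit count per function)."""
    sessions = defaultdict(list)
    for session_id, function in log:
        sessions[session_id].append(function)
    avg_length = sum(len(reqs) for reqs in sessions.values()) / len(sessions)
    visits = Counter(function for _, function in log)
    return avg_length, visits

avg_length, visits = session_stats(LOG)
print(f"average session length: {avg_length:.2f} requests")
print("most visited functions:", visits.most_common(2))
```

Session length is counted here in requests; counting it in elapsed time instead would need timestamps in the log.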

11 Simple Example
Return to server B.
Assume all documents are the same size, 15 KB.
This implies the same CPU and I/O times for all 10 HTTP requests.
Each request is then represented by the pair (CPU time, disk service time) = (0.013, 0.09).
However, Web documents vary greatly in size, which leads to great variability in the above representation.
Actual times are shown in Table 6.1.

12 Simple Example
Which values should be used?
Representativeness: accuracy in representing the workload.
Typical Web request: average all CPU times and I/O times (Table 6.2).
How close is the model? Compare the server executing the workload model with the server executing the real workload (Fig. 6.5).
Examining Table 6.2, one can divide the requests into 3 classes based on document size (Table 6.3).

13 Arrivals
What about the arrival rate of requests? It depends on:
1. The number of users that request service
2. How often each user interacts with the server (inverse of think time)
Real workloads: all original programs, transactions, and requests processed during a given period of time.
Workload models are used instead.

14 Workload Model
Natural models: trace-driven simulations from logged data using real programs.
Artificial models: abstract programs, including instruction mixes, kernels, synthetic programs, artificial benchmarks, and drivers.
Analytic models require a new representation of the workload.

15 Parameters
Component interarrival times (e.g., transactions and requests)
Service demands
Component sizes
Execution mix (percentage of each component class)

16 Parameters for File Server
Frequency distribution of each type of request (read, write, create, rename) in the total workload
Request interarrival time distribution
File referencing behavior (percentage of accesses made to each file on the disk subsystem)
The above items completely specify the workload model. They are capable of driving synthetic programs that accurately represent real workloads.
Other models of I/O devices consider spatial locality.
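
The three parameters above are enough to drive a synthetic request generator. The following is a minimal sketch under stated assumptions: the request mix, the per-file access percentages, and the mean interarrival time are hypothetical values, and exponential interarrival times are an assumption, since the slide does not name a distribution.

```python
import random

# Hypothetical parameter values, made up for illustration.
REQUEST_MIX = {"read": 0.60, "write": 0.25, "create": 0.10, "rename": 0.05}
FILE_ACCESS = {"a.dat": 0.5, "b.dat": 0.3, "c.dat": 0.2}
MEAN_INTERARRIVAL = 0.05  # seconds; exponential interarrivals are an assumption

def synthetic_requests(count, seed=0):
    """Generate (arrival_time, request_type, file) tuples from the three parameters."""
    rng = random.Random(seed)
    types, type_weights = zip(*REQUEST_MIX.items())
    files, file_weights = zip(*FILE_ACCESS.items())
    clock = 0.0
    requests = []
    for _ in range(count):
        # Advance the clock by one sampled interarrival time.
        clock += rng.expovariate(1.0 / MEAN_INTERARRIVAL)
        req_type = rng.choices(types, weights=type_weights)[0]
        target = rng.choices(files, weights=file_weights)[0]
        requests.append((clock, req_type, target))
    return requests

trace = synthetic_requests(1000)
print(trace[:3])
```

Request type and target file are drawn independently here, which ignores any correlation between them and any spatial locality.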

17 Graph-Based Models
Graphs can also be used to represent workloads.
The Customer Behavior Model Graph (CBMG) is used to capture navigational behavior through a Web site (Fig. 6.6).
It has transitional (transition probabilities) and temporal (server-perceived think time) aspects.
Let $V_j$ be the number of visits to state $j$. For example, $V_{Add} = V_{Select} \times 0.2$.
In general, $V_j = \sum_{k=1}^{n-1} V_k \, p_{k,j}$.

18 Graph-Based Models
State 1 is the entry state, so $V_1 = 1$. Solve the following system of equations:
$V_1 = 1$
$V_j = \sum_{k=1}^{n-1} V_k \, p_{k,j}, \quad j = 2, \ldots, n-1$
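
The system of equations above is linear in the visit counts and can be solved by Gaussian elimination. The following is a minimal sketch, assuming a hypothetical four-state CBMG plus an Exit state; the transition probabilities are made up for illustration and are not those of Fig. 6.6, except that Select goes to Add with probability 0.2 as in the example.

```python
def visit_counts(states, p, entry):
    """Solve V_entry = 1 and V_j = sum_k V_k * p[k][j] for the listed states."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    # Row j encodes V_j - sum_k V_k * p[k][j] = b_j.
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for j, sj in enumerate(states):
        a[j][j] = 1.0
        if sj == entry:
            b[j] = 1.0  # V_1 = 1 for the entry state
            continue
        for sk in states:
            a[j][idx[sk]] -= p.get(sk, {}).get(sj, 0.0)
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / a[r][r]
    return dict(zip(states, x))

# Hypothetical CBMG; Exit is state n and is left out of the unknowns.
STATES = ["Entry", "Browse", "Select", "Add"]
P = {
    "Entry": {"Browse": 1.0},
    "Browse": {"Browse": 0.5, "Select": 0.3, "Exit": 0.2},
    "Select": {"Add": 0.2, "Browse": 0.5, "Exit": 0.3},
    "Add": {"Browse": 0.5, "Exit": 0.5},
}
visits = visit_counts(STATES, P, entry="Entry")
print(visits)
```

For larger graphs, a linear algebra library would be the more usual choice; the hand-written solver only keeps the sketch free of dependencies.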

19 Workload Characterization Methodology
Steps required to construct a workload model as input to an analytic model.
The focus is on resource-oriented characterization of workloads.
Define the subsystems analyzed and the reference points where measurements are taken (i.e., site point of view, server point of view, or client (end-to-end) point of view).
Identification of basic components, usually transactions and requests.

20 Workload Characterization Methodology
Choice of characterizing parameters:
1. Workload intensity (arrival rate, number of clients, think time)
2. Service demands, specified by the K-tuple $(D_{i1}, D_{i2}, \ldots, D_{iK})$, where $K$ is the number of resources and $D_{ij}$ is the service demand of basic component $i$ at resource $j$

21 Data Collection
Assigns values to each component of the model.
Generates as many tuples as there are components in the workload.
Data collection includes the following:
1. Identify the time windows of the measurement sessions
2. Monitor and measure system activities during the defined time windows
3. From the collected data, assign values to each characterizing parameter of every component of the workload
This may not be possible to do directly.

22 Partitioning the Workload
A request for a video clip is different from a request for an HTML document.
Partitioning the workload into classes improves representativeness and increases predictive power.

23 Partitioning the Workload
Which attributes are used to determine similarity?
1. Resource usage, such as CPU and I/O
2. Applications: accounting, inventory, customer service, streaming video, HTML, etc.
3. Objects: document types
4. Geographical orientation: local vs. remote
5. Functional: different low-level functions (cp, ls, date, find)
6. Organizational units
7. Mode:
(a) Interactive: alternates waiting and thinking
(b) Transaction: requests
(c) Continuous: batch processing (DNS server)

24 Calculating Class Parameters
Averaging: $D_j = \frac{1}{p} \sum_{l=1}^{p} D_{lj}, \quad j = 1, 2, \ldots, K$
May also include variance measures for homogeneity.
Clustering: centroids, outliers.
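
The averaging formula above takes the mean of each resource's demand over the p components in a class. The following is a minimal sketch, assuming hypothetical (CPU, disk) service demands for four requests of one class; the values are made up for illustration, and population variance is used as one possible homogeneity measure.

```python
from statistics import mean, pvariance

# Hypothetical service demands (CPU, disk) for p = 4 requests of one class.
DEMANDS = [(0.010, 0.08), (0.012, 0.09), (0.014, 0.10), (0.016, 0.09)]

def class_parameters(demands):
    """Return per-resource means D_j and population variances over the p tuples."""
    columns = list(zip(*demands))  # one column per resource j = 1..K
    means = [mean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances

means, variances = class_parameters(DEMANDS)
print("class means:", means)
print("class variances:", variances)
```

A large variance relative to the mean suggests the class is not homogeneous and may need to be split further.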

25 Data Analysis
Sampling drawing
Parameter transformation: logarithmic scaling
Outlier removal
One method of scaling to lessen outlier effect is trimming:
$D_i^t = \frac{\text{measured } D_i - \min\{D_i\}}{\max\{D_i\} - \min\{D_i\}}$
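
The scaling formula above maps every measured value into the range [0, 1], and a logarithmic transformation compresses a wide range of values. The following is a minimal sketch of both, assuming hypothetical document sizes in KB; the function names are illustrative only.

```python
import math

def min_max_scale(values):
    """Apply D_t = (D - min) / (max - min) to every value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(values):
    """Base-10 logarithmic scaling; values must be positive."""
    return [math.log10(v) for v in values]

# Hypothetical document sizes in KB, with one very large document.
sizes_kb = [2, 15, 15, 40, 1500]
scaled = min_max_scale(sizes_kb)
logged = log_transform(sizes_kb)
print(scaled)
print(logged)
```

With one extreme value, the min-max scaled values of the remaining documents are crowded near 0, while the logarithmic transformation keeps them distinguishable.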

26 Distance Measures
Use the Euclidean metric: $d = \sqrt{\sum_{n=1}^{K} (D_{in} - D_{jn})^2}$
Problems created by parameters with different values and ranges are addressed by scaling.
Use the z-score: $z = \frac{\text{measured value} - \text{mean value}}{\text{standard deviation}}$
See example.
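
Both measures above are short to compute. The following is a minimal sketch; the sample values are made up for illustration, and the population standard deviation is an assumption, since the slide does not say which form is intended.

```python
import math
from statistics import mean, pstdev

def euclidean(a, b):
    """Euclidean distance between two K-tuples of service demands."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def z_scores(values):
    """(measured value - mean value) / standard deviation, for each value."""
    mu, sigma = mean(values), pstdev(values)  # population std dev: an assumption
    return [(v - mu) / sigma for v in values]

distance = euclidean((0.0, 0.0), (3.0, 4.0))
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(distance)
print(scores)
```

Applying `z_scores` to each parameter before computing distances keeps a parameter with a large numeric range, such as bytes transferred, from dominating one with a small range, such as CPU seconds.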

27 Clustering Algorithms
Identify natural groups of components.
Hierarchical and non-hierarchical algorithms.
Minimum spanning tree (hierarchical): starting from n clusters of size 1, the two clusters at minimum distance are fused to create n - 1 clusters.
Linkage distance: the farthest distance between a component in one cluster and a component in another cluster.
k-means algorithm: find k points in the workload that act as an initial estimate of the k centroids.
How many clusters?
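
The k-means idea above can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: it assumes the first k points serve as the initial centroid estimate, uses Euclidean distance, and runs on hypothetical (CPU, disk) service demands made up for the example.

```python
import math

def kmeans(points, k, iterations=100):
    """Return (centroids, assignment) after iterating until assignments stabilize."""
    centroids = [tuple(p) for p in points[:k]]  # first k points: initial estimate
    assignment = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

# Hypothetical (CPU, disk) demands: three small requests and three large ones.
POINTS = [
    (0.010, 0.050), (0.012, 0.060), (0.009, 0.045),
    (0.200, 0.900), (0.220, 1.000), (0.210, 0.950),
]
centroids, assignment = kmeans(POINTS, k=2)
print(centroids)
print(assignment)
```

The result depends on the initial centroids and on k, which is why the number of clusters is an open question on the slide; in practice the parameters would also be scaled first.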

28 Web Workloads
Much study has been done in this area.
File sizes are heavy tailed (Pareto).
Popularity is Zipf-like.
Power law: $y \propto x^{\alpha}$
Heavy tail: $P[X > x] = k x^{-\alpha} L(x)$
Pareto: $P[X > x] = k x^{-\alpha}$
Zipf's law: $f(r) = C / r^{\alpha}$
Large events are rare, but small ones are quite common.
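
The Zipf and Pareto forms above can be explored numerically. The following is a minimal sketch under stated assumptions: C is chosen so that the Zipf frequencies over n documents sum to 1, the Pareto constant is taken as k = x_min^alpha with x_min = 1, and the value alpha = 1.2 for the sampled file sizes is hypothetical.

```python
import random

def zipf_frequencies(n, alpha=1.0):
    """f(r) = C / r**alpha for ranks 1..n, with C chosen so the values sum to 1."""
    weights = [1.0 / r ** alpha for r in range(1, n + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]

def pareto_tail(x, alpha, x_min=1.0):
    """P[X > x] = (x_min / x)**alpha for x >= x_min, i.e. k = x_min**alpha."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

freqs = zipf_frequencies(100, alpha=1.0)
tail = pareto_tail(10.0, alpha=1.0)

# Sample hypothetical file sizes (in units of x_min) from a Pareto distribution.
rng = random.Random(0)
samples = [rng.paretovariate(1.2) for _ in range(10000)]
large = sum(1 for s in samples if s > 100)
print(f"top-ranked document share: {freqs[0]:.3f}")
print(f"samples above 100: {large} of {len(samples)}")
```

Most samples stay close to the minimum while a few are orders of magnitude larger, which is the "large events are rare, but small ones are quite common" behavior.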

29 Bursty Workloads
Self-similar traffic
Elastic applications
Streaming applications