A Metric of Fairness for Parallel Job Schedulers


CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2008; 05:1–7 [Version: 2002/09/19 v2.02]

A Metric of Fairness for Parallel Job Schedulers

John Ngubiri, Mario van Vliet
Nijmegen Institute for Informatics and Information Science, Radboud University Nijmegen, Toernooiveld 1, 6525 ED, Nijmegen, The Netherlands.

SUMMARY

Fairness is an important aspect of queuing systems. Several fairness measures have been proposed for queuing systems in general and parallel job scheduling in particular. Generally, a scheduler is considered unfair if some jobs are discriminated against while others are favored. Some of the metrics used to measure fairness for parallel job schedulers can imply unfairness where there is no discrimination (and vice versa). This makes them inappropriate. In this paper, we show how the existing approaches misrepresent fairness in practice. We then propose a new approach for measuring fairness for parallel job schedulers. Our approach is based on two principles: (i) since jobs have different resource requirements and find different queue/system states, they need not have the same performance for the scheduler to be fair, and (ii) to compare two schedulers for fairness, we compare how the schedulers favor/discriminate individual jobs. We use performance and discrimination trends to validate our approach. We observe that our approach can deduce discrimination more accurately. This is true even in cases where the most discriminated jobs are not the worst performing jobs.

Key words: Fairness, Scheduler, Net benefit

1. Introduction

Parallel job schedulers are mostly evaluated using performance metrics [6]. A performance metric is a representation of how quality of service from the system is interpreted. The metric can be system based or user based. System based metrics include utilization and capacity loss.
Common user based performance metrics include average waiting time (AWT), average response time (ART), average job slow-down (AJSD) and throughput. Performance metrics, in some cases, may not accurately represent the users' needs. They may misrepresent them in specific circumstances, leading users to draw wrong deductions. AJSD, for example, may exaggerate poor performance in short jobs. A job stream with many short jobs will show misleadingly poor performance if the AJSD metric is used. The implication of a performance metric also depends on the system set up. Differences in system set ups may call for differences in the deductions drawn from the metric values. For example, in space-slicing systems, AWT and ART can be used interchangeably. This is because ART = AWT + τ (where τ is the mean execution time and is independent of the scheduler). In time-slicing systems, however, the two metrics do not lead to related conclusions. This is because a job's response time cannot be deduced from the time it starts processing. A lot of care, therefore, has to be taken when choosing a performance metric [5]. Even when an appropriate performance metric is used, the average metric value can give misleading results. This is because it gives a global view of performance but does not show internal discrimination/favoritism among the jobs. A scheduler may have an impressive (average) metric value yet some jobs perform well at the expense of others. Such a scheduler is unfair. Unfair schedulers may have impressive performance metric values which hide the underlying discrimination, leading to user dissatisfaction [16]. There are many performance metrics [6] and specific metrics are appropriate in specific scenarios. Fairness metrics [9][17][19] also exist. However, they misrepresent fairness (discrimination/favoritism) in some cases. This may lead to counter-intuitive deductions. In this paper, we study how fairness/discrimination is represented in three common approaches used to evaluate fairness for parallel job schedulers. The approaches considered are (i) the dispersion approach, (ii) the fair start time approach and (iii) the Resource Allocation Queuing Fairness Measure (RAQFM). We show scenarios in which implied unfairness is not necessarily unfairness in practice.

(E-mails: {ngubiri, mario}@cs.ru.nl. Copyright 2008 John Wiley & Sons, Ltd. Revised June 29, 2008.)
Motivated by the identified weaknesses in the existing approaches, we propose a new approach to fairness evaluation for parallel job schedulers. Our underlying principles are: (i) since jobs have different service requirements and find different queue/system states, they do not have to perform equally for the scheduler to be fair, and (ii) since scheduler decisions can have far-reaching effects, discrimination/favoritism needs to be evaluated from a global rather than an individual-decisions perspective. We propose metrics that use our approach and experimentally compare selected schedulers for fairness. We use performance and discrimination trends within the job stream to intuitively interpret the fairness deductions. We then choose the most appropriate (generic) metric for fairness evaluation. We show that our approach represents unfairness more accurately. This is true even in cases where the most discriminated jobs are not the worst performing jobs. The rest of the paper is organized as follows. We discuss related work in Section 2. This includes the ways fairness is envisaged and evaluated. In Section 3, we identify cases where (using existing metrics) a fair scenario may be interpreted as unfair (and vice versa). In Section 4, we describe our approach to fairness evaluation. We then describe an experiment to validate our approach in Section 5. We present results from our experiments in Section 6 and discuss the results in Section 7. Finally, we make our conclusions and suggestions for future research in Section 8.

2. Related Work

In this section, we discuss the ways fairness is envisaged and evaluated in parallel job scheduling.

2.1. Perceptions of fairness in parallel job scheduling

Fairness, like performance, can be viewed from the system and the user point of view. From the system point of view, fairness is mostly studied in multi-site processing systems (like in [10]). It considers how the load is distributed among the participating processing sites. It is assumed that joining the different sites is of benefit to all sites. In case jobs from a certain site perform worse than they would if the site were standalone, they are considered to be unfairly treated. From the users' point of view (like in [9][26]), fairness is looked at in terms of relative favoritism/discrimination among the jobs/users. This arises from the way the scheduler allocates resources to the competing jobs. Some may be favored while others are discriminated against. In this paper, we look at fairness from the users' point of view. We therefore restrict ourselves to it in subsequent sections.

2.2. Approaches to fairness evaluation

Fairness in parallel job scheduling has largely been measured using dispersion [9][26], the fair start time approach [18] and RAQFM [3][17].

2.2.1. Measures of dispersion

In this approach, the extent of dispersion among the performance of the different jobs is used to measure fairness. Let us consider a job stream of N jobs J_1, J_2, ..., J_N. If the performance metric used is x and J_i has a metric value x_i, then the mean performance µ of the job stream is given by:

    µ = (1/N) ∑_{i=1}^{N} x_i    (1)

The dispersion measures of variance (σ²), standard deviation (σ) and coefficient of variation (C_V) are defined as:

    σ² = (1/N) ∑_{i=1}^{N} (x_i − µ)²    (2)

    σ = √[ (1/N) ∑_{i=1}^{N} (x_i − µ)² ]    (3)

    C_V = σ / µ    (4)

Jain et al. [9] proposed that a good fairness measure should be continuous, bounded, scale independent and population size independent. They proposed the fairness coefficient κ, where

    κ = 1 / (1 + C_V²)    (5)
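As a quick illustration, the dispersion measures and the fairness coefficient κ of Equation (5) can be computed directly from a list of per-job metric values. The sketch below uses the population (1/N) forms given above; the sample values are hypothetical:

```python
import statistics

def fairness_coefficient(xs):
    """Fairness coefficient kappa = 1/(1 + C_V^2), where C_V = sigma/mu
    is the coefficient of variation of the per-job metric values."""
    mu = statistics.fmean(xs)          # Equation (1)
    sigma = statistics.pstdev(xs)      # population standard deviation, Equations (2)-(3)
    cv = sigma / mu                    # Equation (4)
    return 1.0 / (1.0 + cv * cv)       # Equation (5)

# Identical per-job performance gives kappa = 1 (ideally fair under this view);
# dispersed performance drives kappa towards 0.
print(fairness_coefficient([4.0, 4.0, 4.0]))  # 1.0
print(fairness_coefficient([1.0, 4.0, 7.0]))
```

Note that the population forms (dividing by N, not N − 1) match the definitions above; `statistics.pstdev` provides exactly that.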

This approach is based on the principle that in an ideally fair scenario, jobs are expected to have the same performance. A high dispersion measure (low κ) implies that some jobs are discriminated against at the expense of others. A low dispersion therefore implies a fair scheduler.

2.2.2. Fair start time analysis

Fair start time analysis [18][19] uses the principle that in an ideally fair environment, a job's processing should not be delayed by jobs that arrived after it. A job is considered unfairly treated if its processing is delayed by jobs that arrived after it. The average delay for all the unfairly treated jobs is used to measure fairness. Let us consider a job stream J_i; i = 1, 2, 3, ..., N. The fair start time t_i^f of J_i is the time it starts to process if no job arrived after it. To get t_i^f, the job stream is truncated at J_i and scheduled; the time J_i starts processing is t_i^f. The actual start time t_i^s of J_i is the time J_i starts processing when the jobs that arrived after it are included. To get t_i^s, the entire job stream is scheduled. The job J_i is unfairly treated if t_i^f < t_i^s. The average of t_i^s − t_i^f over all unfairly treated jobs is used to measure fairness.

2.2.3. RAQFM

The RAQFM was proposed in [17] and analyzed in [3]. It was meant to measure fairness on single server systems (the case for multi-server systems was reported as ongoing work). The underlying principle is that at any time, all jobs in the system deserve an equal share of the resources. This implies that if, at time t, the system has N(t) jobs, then each job is fairly entitled to 1/N(t) of the resources. In practice, jobs do not necessarily get a fair share of the system resources.
If a job J_i (arriving at t_i^a and departing at t_i^d) is granted a proportion s_i(t) of the system resources at time t, then its temporal discrimination d_i(t) is given by

    d_i(t) = s_i(t) − 1/N(t)    (6)

The total discrimination D_i throughout the time J_i spends in the system is given by

    D_i = ∫_{t_i^a}^{t_i^d} ( s_i(t) − 1/N(t) ) dt    (7)

If D_i is positive, then J_i is treated better than it would have been treated fairly (favored), and if D_i is negative, then J_i is treated worse (discriminated against). To measure the overall fairness of the scheduler, the variance Var[D] is used (the mean E[D] = 0 for non-idling systems). Low Var[D] implies high scheduler fairness.

3. Fairness Metrics' Representation of Discrimination

We now study the ways the existing approaches represent (un)fairness in a parallel job scheduling set up. We relate the representations to what would be felt in practice.
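Before turning to the individual approaches, note that the RAQFM discrimination integral of Equation (7) is easy to evaluate numerically for simple schedules. The sketch below uses a hypothetical two-job scenario on a 10-processor machine (not taken from the paper) and a midpoint-rule approximation:

```python
def total_discrimination(share, n_in_system, t0, t1, steps=5000):
    """Approximate D_i = integral of (s_i(t) - 1/N(t)) dt over [t0, t1]
    with the midpoint rule; share(t) and n_in_system(t) describe the schedule."""
    dt = (t1 - t0) / steps
    D = 0.0
    for k in range(steps):
        t = t0 + (k + 0.5) * dt
        D += (share(t) - 1.0 / n_in_system(t)) * dt
    return D

# Two jobs run side by side for 5 time units on 10 processors:
# J1 holds 2 processors (share 1/5), J2 holds 6 (share 3/5); N(t) = 2 throughout.
D1 = total_discrimination(lambda t: 2 / 10, lambda t: 2, 0.0, 5.0)
D2 = total_discrimination(lambda t: 6 / 10, lambda t: 2, 0.0, 5.0)
print(round(D1, 6), round(D2, 6))  # -1.5 0.5
```

Here J1 is "discriminated" (negative D_i) and J2 "favored" even though both get all the processors they asked for, which anticipates the criticism of RAQFM made later in this section.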

3.1. Dispersion approach

In the dispersion approach, the ideally fair case is when there is no variation in the performance of the jobs. An increase in dispersion implies that some jobs are favored at the expense of others. In practice, this is not always the case. There are cases where a high dispersion in job performance is not caused by discrimination/favoritism [15]. To illustrate this, we compare performance and dispersion for a job stream scheduled by First Come First Served (FCFS) and by Conservative Backfilling (CB). Let us consider an online job stream J_1, J_2, ..., J_N scheduled on a supercomputer. Let us assume that the average waiting time and variance when the job stream is scheduled by FCFS are µ and σ² respectively, and that when it is scheduled by CB they are µ′ and σ′² respectively. Job J_i has waiting time t_i^w when scheduled by FCFS and t_i^{w′} when scheduled by CB. If the job stream is scheduled by CB and no job is able to leapfrog to get scheduled, then CB is procedurally the same as FCFS. In such a case, t_i^{w′} = t_i^w ∀ J_i : i = 1, 2, ..., N. However, in case some jobs are able to leapfrog and get scheduled, they do so in such a way that no job gets a reservation setback (the condition for CB). The jobs that jump get scheduled earlier than they would in FCFS. Their waiting times are therefore lower than they would be in FCFS. The jobs initially behind the backfilled jobs may get an improvement in reservation since the queue is now shorter. This implies that:

    t_i^{w′} ≤ t_i^w  ∀ J_i : i = 1, 2, ..., N    (8)

By definition, µ = (1/N) ∑ t_i^w and µ′ = (1/N) ∑ t_i^{w′}. From Inequality (8), we can deduce that

    µ′ ≤ µ    (9)

We now compare the variances of the two job streams. By definition, σ² = (1/N) ∑ (t_i^w − µ)² and σ′² = (1/N) ∑ (t_i^{w′} − µ′)². Subtracting the two variances gives

    σ′² − σ² = (1/N) ∑_{i=1}^{N} (t_i^{w′} − µ′)² − (1/N) ∑_{i=1}^{N} (t_i^w − µ)²    (10)

             = (1/N) ∑_{i=1}^{N} (t_i^{w′} − µ′ + t_i^w − µ)(t_i^{w′} − µ′ − t_i^w + µ)    (11)

Let us define two non-negative variables δt_i = t_i^w − t_i^{w′} and δµ = µ − µ′. We substitute t_i^{w′} = t_i^w − δt_i and µ′ = µ − δµ into Equation (11) to get

    σ′² − σ² = (1/N) ∑ (2t_i^w − 2µ + δµ − δt_i)(δµ − δt_i)    (12)

             = (1/N) ∑ (2(t_i^w − µ) + (δµ − δt_i))(δµ − δt_i)    (13)

             = (1/N) ∑ (δµ − δt_i)² + (2/N) ∑ (t_i^w − µ)(δµ − δt_i)    (14)

             = (1/N) ∑ (δµ − δt_i)² + (2/N) ∑ (t_i^w δµ − t_i^w δt_i − µδµ + µδt_i)    (15)

             = (1/N) ∑ (δµ − δt_i)² + (2/N) [ δµ ∑ t_i^w − ∑ t_i^w δt_i − Nµδµ + µ ∑ δt_i ]    (16)

             = (1/N) ∑ (δµ − δt_i)² + (2/N) [ Nµδµ − ∑ t_i^w δt_i − Nµδµ + µ ∑ δt_i ]    (17)

             = (1/N) ∑ (δµ − δt_i)² + (2/N) ∑ (µ − t_i^w) δt_i    (18)

The first term of the RHS of Equation (18) is always non-negative. We need to determine whether the second term is also positive. Let us break the job stream into two disjoint sets S_1 and S_2 such that J_m ∈ S_1 iff t_m^w ≤ µ and J_n ∈ S_2 iff t_n^w > µ. Since µ is the mean of t_i^w : i = 1, 2, ..., N, we have

    ∑_{J_m ∈ S_1} (µ − t_m^w) = − ∑_{J_n ∈ S_2} (µ − t_n^w)    (19)

In FCFS, small jobs perform better than large jobs [14]. This is because they do not wait for long once they reach the head of the queue. Since S_1 contains the best performing jobs, it is dominated by small jobs. This implies that S_2 is dominated by large jobs. Srinivasan et al. [21] observed that small jobs perform better under backfilling than large jobs. This is because they are more likely to leapfrog and get scheduled. By definition, δt_i is the benefit in performance a job gets by being scheduled by CB instead of FCFS. Due to their ability to leapfrog, small jobs get a bigger benefit than large jobs. Hence the LHS of Equation (19) is weighted by comparatively larger values δt_m, which makes it grow (absolutely) faster than the RHS, which is weighted by smaller values δt_n. This implies that

    ∑_{J_m ∈ S_1} (µ − t_m^w) δt_m > − ∑_{J_n ∈ S_2} (µ − t_n^w) δt_n    (20)

Equation (18) can be rewritten as

    σ′² − σ² = (1/N) ∑ (δµ − δt_i)² + (2/N) ∑_{J_m ∈ S_1} (µ − t_m^w) δt_m [> 0] + (2/N) ∑_{J_n ∈ S_2} (µ − t_n^w) δt_n [< 0]    (21)

Inequality (20) implies that the RHS of Equation (21) is positive. This implies that

    σ′² > σ²    (22)

Inequality (22) implies that FCFS is fairer than CB. The same deduction is made from σ, C_V and κ. However, Inequality (8) shows that no job in FCFS has a better performance than it has in CB. This implies that the high dispersion in Inequality (22) is not a result of discrimination by CB. Dispersion, therefore, may misrepresent unfairness. In practice, there are circumstances other than starvation that can increase dispersion in parallel job schedulers. These include:

1. Changes in traffic: Supercomputers normally have daily and weekly peaks in traffic. A job that arrives at a peak time gets poorer performance than one that arrives in off-peak hours. Dispersion in performance is therefore inevitable. This, however, does not imply that jobs are starved during peak hours or favored in off-peak hours.

2. Job schedulability: Different jobs have different levels of schedulability [14]. Since small jobs require fewer resources, they are easy to fit into the system. This makes them perform better than large jobs. This also translates into performance dispersion. But this dispersion is not caused by starvation.

One possible way of reducing dispersion is by imposing a performance disadvantage on better performing jobs. For example, jobs arriving in off-peak hours can be intentionally delayed. Much as this reduces dispersion, it is counter-intuitive to propose that it improves fairness. Measures of dispersion therefore sometimes misrepresent starvation and hence fairness.

3.2. Fair time analysis

Fair time analysis takes the unfair treatment down to the individual job level. It uses the average of the difference between the fair start time and the actual start time for all unfairly treated jobs.
The weakness of this approach is that it takes into account only the delay a job suffers relative to its neighbors. It does not take into consideration the global benefits/setbacks the job gets from decisions taken while it was still deep inside the queue. Since the delay towards the end of the queue is not necessarily the net delay caused by scheduler decisions, the approach evaluates the scheduler only partially. As an illustration, let us consider a six-node computer system which has scheduled up to the (k−1)-th job (Figure 1). Let us consider a queue where the k-th, (k+1)-th and (k+2)-th jobs have sizes 4, 3 and 2 and execution times 5, 2 and 4 respectively. Let us examine what happens in a case where either the k-th or the (k+1)-th job is to be scheduled next. The state of the system in each of the cases is shown in Figure 2.
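The fair start time t_i^f discussed here can be computed by replaying the truncated job stream through the scheduler. A minimal sketch (pure space-sharing, FCFS replay; the job tuples are hypothetical) is shown below. Under FCFS itself, truncation cannot change a job's start time; the difference only appears for schedulers, such as backfilling ones, whose decisions let later arrivals influence the schedule.

```python
import heapq

def fcfs_start_times(jobs, total_procs):
    """Replay a stream of (arrival, size, runtime) tuples under FCFS on a
    space-shared machine with total_procs processors; return start times."""
    free, clock, starts, running = total_procs, 0.0, [], []  # running: heap of (finish, size)
    for arrival, size, runtime in jobs:        # strict arrival order
        clock = max(clock, arrival)
        while free < size:                     # wait for enough processors to free up
            finish, s = heapq.heappop(running)
            clock = max(clock, finish)
            free += s
        starts.append(clock)
        free -= size
        heapq.heappush(running, (clock + runtime, size))
    return starts

def fair_start_time(jobs, i, total_procs):
    """t_i^f: start time of job i when the stream is truncated at job i."""
    return fcfs_start_times(jobs[: i + 1], total_procs)[i]

jobs = [(0.0, 4, 10.0), (0.0, 3, 5.0), (1.0, 2, 2.0)]
print(fcfs_start_times(jobs, 6))   # [0.0, 10.0, 10.0]
print(fair_start_time(jobs, 2, 6)) # 10.0 (equal to the actual start under FCFS)
```

To evaluate a backfilling scheduler the same way, the FCFS replay would be swapped for that scheduler's own replay logic.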

Figure 1. System status before scheduling jobs J_k, J_{k+1}, ...

Figure 2. System states for Case 1 (top) and Case 2 (bottom)

(i) Case 1: If J_k is scheduled next, J_{k+1} cannot start processing within the shown timeframe.

(ii) Case 2: If J_{k+1} is scheduled next, then both J_k and J_{k+2} can start processing within the shown timeframe.

Let us now focus on the reservations of the jobs after J_{k+2}. They have better reservation times in Case 2 than in Case 1. This is because in Case 1, J_{k+1} is still in the queue. The differences in reservations for jobs that arrived after J_{k+2} are caused by jobs that arrived before them. This shows that a job can be unfairly treated (delayed) due to a decision made on jobs that arrived before it. The effect of these decisions, therefore, should contribute to the fairness of the scheduler with respect to a certain job. This is not the case for the fair start time approach described in [18]: only delays caused by the jobs that arrived after the job of interest are considered. Since all the decisions are made by the scheduler, and we are measuring the fairness of the scheduler, all their effects need to be taken into consideration. Overall, the decisions of a scheduler can affect the reservation of a certain job in many ways. If we consider a job J_i, the decisions that can affect its reservation time include:

(i) The job J_i is made to leapfrog (like in backfilling), leading to an improvement in its reservation time;

(ii) Jobs ahead of J_i in the queue are made to leapfrog and get processed (like in backfilling). This shortens the queue and may improve the reservation time of J_i;

(iii) Jobs initially ahead of J_i are badly packed during processing, which leads to system fragmentation and hence delays the processing of J_i; and

(iv) Jobs initially behind J_i in the queue leapfrog, get processed and delay the reservation time of J_i.

The effects of each of the possible options need to be taken into consideration so that the scheduler is evaluated in totality. The fair start time approach largely deals with effect (iv), which is not necessarily the net effect of the scheduler on the reservation.

3.3. RAQFM

RAQFM also uses the fairness experiences of individual jobs and aggregates them into the fairness measure. However, it looks at discrimination in a different way from fair time analysis. It also uses the variance, rather than the mean discrimination of the jobs, to compute the entire job stream's discrimination. It makes the assumption that all jobs in the queue are entitled to equal shares of the available resources. This, in parallel job scheduling, creates an unfair scenario. Jobs have different levels of seniority (arrival time) and service requirements (size) [17]. It is counter-intuitive to assume fairness when jobs having different requirements have the same resource entitlement. The same applies to jobs that have spent different times in the queue. In some cases, a job that gets the ideal service from the system can be considered favored or discriminated against. We illustrate this in the example below.

Example 3.1. Let us consider a case where two jobs arrive simultaneously at a 10-processor system with no running jobs. Jobs J_1 and J_2 need 2 and 6 processors respectively and each runs for 5 units of time. Under the RAQFM, each job is entitled to 1/2 of the system resources. Job J_1 gets a share of 1/5 while J_2 gets 3/5.
The total discrimination for J_1 is (1/5 − 1/2) × 5 = −3/2 and that for J_2 is (3/5 − 1/2) × 5 = 1/2. In Example 3.1, since the two jobs get ideal service, there is no favoritism/discrimination towards either of them. However, the RAQFM implies otherwise. The RAQFM, nevertheless, adequately handles practically challenging cases of extreme differences in service requirements but small differences in seniority [3].

4. The Net Benefit Approach

We now describe our approach to fairness evaluation.

4.1. Motivation

In Section 3, we showed how some of the existing approaches may imply discrimination (and hence unfairness) where there is none. Their implied unfairness is not necessarily unfairness in practice. They can, therefore, lead to unrealistic conclusions. We therefore propose a new approach that takes the identified weaknesses into consideration. Specifically, the weaknesses addressed by our approach are:

(i) Reference points: The fair start time approach uses the fair start time of a job as a reference point. It uses it to decide whether a job is unfairly treated or not. The fair start time approach rightly considers every job to have its own fair start time (which depends on circumstances like load). However, the fair start time of a job depends on the scheduler as well. A job that suffers the same discrimination from two different schedulers does not necessarily have the same performance, because the fair start times are not necessarily the same. We consider each job to have its own reference point. However, the reference point for every job is fixed and independent of the scheduler(s) being studied.

(ii) The effects of schedulability and queue/system states: In the dispersion approach, jobs are expected to have the same performance, irrespective of job schedulability and queue/system states. This, in itself, is unfair. Even without discrimination, a job arriving in off-peak hours gets better performance than one which arrives during peak hours. In our approach, we expect jobs to have different performances dictated by their schedulability and the system/queue states they find. However, if the performance of a certain job is unfairly altered, then the alterations contribute to the fairness.

(iii) Evaluation of unfair treatment: We observed that the difference between the actual and fair start time of a job implied in the fair start time approach is not necessarily the net benefit/loss in performance a job gets. We therefore use the net benefit a job gets by being scheduled by a certain scheduler instead of another. We compare the performance of a job under different schedulers.
Our approach, therefore, is motivated by and seeks to address the weaknesses identified in existing approaches.

4.2. Description

4.2.1. The approach

Like in [9][18][19][26], we need an appropriate performance metric in order to measure scheduler fairness. We take job waiting time to be appropriate. Other metrics can be used in a similar way. Let us consider a job J_i in a job stream. If the job stream is scheduled by an ideal fair scheduler S_f, the waiting time of J_i is t_i^f. Let us also consider a case where the job stream is scheduled by two other schedulers S_1 and S_2. Let us assume that the waiting times of J_i when the job stream is scheduled by S_1 and S_2 are t_i^1 and t_i^2 respectively. If the net benefit J_i gets by being scheduled by S_1 (S_2) rather than the fair scheduler is b_i^1 (b_i^2), then b_i^1 = t_i^f − t_i^1 and b_i^2 = t_i^f − t_i^2. In case of a positive benefit, J_i is favored by the scheduler; in case it is negative, J_i is discriminated against. If b_i^1 > b_i^2, then S_1 is fairer than S_2 with respect to J_i. Since we do not know the value of t_i^f, we cannot get the numerical values of b_i^1 and b_i^2. However, if we compute the difference between the benefits, we get

    b_i^1 − b_i^2 = (t_i^f − t_i^1) − (t_i^f − t_i^2) = t_i^2 − t_i^1

which is independent of t_i^f. This implies that, for a job J_i, we can compare two schedulers for fairness without necessarily knowing the ideal fair scheduler. If we are to compare many schedulers, the process of comparison can be tedious. If, for example, we are to compare n schedulers, we need to make n(n−1)/2 subtractions for each job. To reduce the number of subtractions to be made, we can choose a base scheduler and compare all schedulers to it. We then use the benefits with respect to the base scheduler to compare the schedulers.

4.2.2. Implementation of the approach

To implement the net benefit approach, we first generalize the computation of the benefits to the entire job stream. Let us consider a job stream J_1, J_2, ..., J_N and a case where we are to evaluate the fairness of a scheduler S. First, we schedule the job stream by a base scheduler and then by S. We define b_i as the benefit job J_i gets by being scheduled by S instead of the base scheduler. The value of b_i is got by computing the difference between J_i's performance when scheduled by S and when scheduled by the base scheduler. The subtraction has to be done in such a way that a job that performs better with S than with the base scheduler has a positive value of b_i. Let us group the jobs into three disjoint sets: (i) one in which jobs have registered a benefit (S_b), (ii) one where jobs have registered a deterioration (S_d) and (iii) one where there is no change in performance (S_n). We define the total benefit B and discrimination D of S as:

    B = ∑_{J_i ∈ S_b} b_i  and  D = ∑_{J_i ∈ S_d} |b_i|

To measure the fairness of the scheduler S, we can use the discrimination or the discrimination in excess of the benefits. This can be computed for the entire sets S_d and S_b or for the most affected jobs (extremes). We define the associated fairness metrics as:

1. Total Discrimination (D): The sum of all the discriminations in the job stream.

2. Marginal Discrimination (MD): The discrimination in excess of the benefits in the job stream; MD = D − B.

3. Average Discrimination (AD): The average discrimination over all discriminated jobs in the job stream; AD = D / n(S_d).

4. Extreme Discrimination (D^x): The total discrimination for the most discriminated proportion x of the job stream. If the set of the most discriminated proportion x of the job stream is S_d^x, then D^x = ∑_{J_i ∈ S_d^x} |b_i|.

5. Average Extreme Discrimination (AD^x): The average discrimination for all jobs in the most discriminated proportion x of the job stream; AD^x = D^x / n(S_d^x).

6. Extreme Marginal Discrimination (MD^x): Obtained like the marginal discrimination, but using the extreme proportion x (of the job stream) for both discriminated and favored jobs. If S_b^x is the set of the most favored proportion x of the job stream, then MD^x = ∑_{J_i ∈ S_d^x} |b_i| − ∑_{J_k ∈ S_b^x} b_k.

7. Average Extreme Marginal Discrimination (AMD^x): The average marginal discrimination for the extreme proportion x of the job stream; AMD^x = MD^x / (xN).
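Given the per-job benefits b_i relative to the base scheduler, the metrics defined above are straightforward to compute. A minimal sketch (the sample benefit values are hypothetical, not those of Example 4.1):

```python
def net_benefit_metrics(benefits, x=0.1):
    """Compute D, MD, AD and the extreme variants D^x, AD^x, MD^x, AMD^x
    from per-job benefits (positive = favored, negative = discriminated)."""
    discriminated = sorted(b for b in benefits if b < 0)            # most discriminated first
    favored = sorted((b for b in benefits if b > 0), reverse=True)  # most favored first
    B = sum(favored)
    D = -sum(discriminated)                       # total discrimination (positive)
    MD = D - B                                    # marginal discrimination
    AD = D / len(discriminated) if discriminated else 0.0
    k = max(1, round(x * len(benefits)))          # size of the extreme proportion, k = xN
    Dx = -sum(discriminated[:k])
    MDx = Dx - sum(favored[:k])
    return {"D": D, "MD": MD, "AD": AD,
            "Dx": Dx, "ADx": Dx / k, "MDx": MDx, "AMDx": MDx / k}

m = net_benefit_metrics([10, 9, 8, 7, 0, 0, -9, -9, -8, -5], x=0.2)
print(m["D"], m["MD"], m["Dx"], m["MDx"])  # 31 -3 18 -1
```

Since k = xN here, dividing MD^x by k matches the AMD^x = MD^x/(xN) definition above.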

For each metric, a lower value implies more scheduler fairness. We use D, MD, AD, AD^x and AMD^x in our studies. This is because D^x and MD^x are directly deducible from AD^x and AMD^x respectively. We illustrate the metrics by the example below.

Example 4.1. Let us consider a job stream of 20 jobs J_1, J_2, ..., J_20 with benefits b_1, b_2, ..., b_20. We are required to compute the metric values together with the corresponding extreme values for x = 0.1 and x = 0.2. Let us first compute the metric values for the non-extreme cases. We first identify the appropriate sub-sets:

    S_d = {J_16, J_17, J_18, J_19, J_20}
    S_n = {J_12, J_13, J_14, J_15}
    S_b = {J_1, J_2, J_3, J_4, J_5, J_6, J_7, J_8, J_9, J_10, J_11}

From the definitions of B and D,

    B = ... = 58 and D = ... = 35

The values of the defined metrics are:

    D = 35
    MD = 35 − 58 = −23
    AD = 35/5 = 7

We now identify the sub-sets for the 0.1 and 0.2 extremes: S_d^0.1 = {J_19, J_20} and S_d^0.2 = {J_17, J_18, J_19, J_20}; S_b^0.1 = {J_1, J_2} and S_b^0.2 = {J_1, J_2, J_3, J_4}. We then compute the metric values based on the extremes:

    D^0.2 = ... = 33 and AD^0.2 = 33/4 = 8.25
    D^0.1 = 9 + 9 = 18 and AD^0.1 = 18/2 = 9
    MD^0.2 = (...) − (...) = −1
    MD^0.1 = (9 + 9) − (10 + 9) = −1
    AMD^0.2 = −1/4
    AMD^0.1 = −1/2

We observe that the different metrics have different implications. We look deeper into these differences and their appropriateness in Section 7.

5. Experimental Evaluation

In this section, we describe the experimental set up used to evaluate our approach. Our evaluation studies how the metrics in our approach represent discrimination for selected schedulers. We consider a case where the scheduler (i) does not know job runtimes and (ii) can schedule a job on different/multiple sites. The scheduler therefore has to use non-reservation approaches to control starvation, but can choose to process a job on a different site or on multiple sites.

5.1. The model

We consider a system made up of M homogeneous clusters C_1, C_2, ..., C_M with n processors each. The clusters are linked by a backbone. The communication speed on the backbone is slower than that within the clusters. The system allows co-allocation of multi-component jobs. Components of co-allocated jobs start (and end) processing at the same time. Clusters process jobs by pure space-slicing. Migration of running jobs is not possible on the system.

5.2. Scheduling algorithms

We use three scheduling algorithms: First Come First Served (FCFS), Fit Processors First Served (FPFS) [1] and the Greedy scheduler [11][13].

5.2.1. FCFS

In FCFS, jobs start processing in the (strict) order of their arrival. In case the job at the head of the queue cannot fit in the system, the scheduler waits until enough space has been created. When enough processors are freed, it starts processing. We use FCFS as the base scheduler.

5.2.2. FPFS

FPFS is like FCFS with backfilling when the scheduler does not know the job runtimes. Instead of using a job's reservation to protect it from starvation, it uses the number of times the job is jumped by leapfrogging jobs. Jobs are queued in their arrival order. The scheduler starts from the head of the queue and searches deeper for the first job that fits into the system. In case one is found, it jumps all jobs ahead of it and starts processing.
If none is found, the scheduler waits either for a job to finish execution or for a job to arrive, and the search is done again. To minimize possible starvation of some jobs, the scheduler limits (to maxjumps) the number of times the job at the head of the queue can be jumped. After being jumped maxjumps times, no other job is allowed to jump it until enough processors have been freed (by terminating jobs) to have it scheduled. We use FPFS(x) to represent FPFS with maxjumps = x.
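The FPFS selection rule can be sketched as follows (the data structures are hypothetical; only the job-picking step is shown, not the full event loop):

```python
def fpfs_pick(queue, free_procs, jumps, maxjumps):
    """Pick the next job to start under FPFS.
    queue: list of (job_id, size) in arrival order; jumps counts how often
    the job at the head has been jumped. Returns a job_id or None."""
    if not queue:
        return None
    head_id, head_size = queue[0]
    if jumps.get(head_id, 0) >= maxjumps:
        # Head has been jumped maxjumps times: nothing may overtake it any more.
        return head_id if head_size <= free_procs else None
    for job_id, size in queue:               # first job that fits wins
        if size <= free_procs:
            if job_id != head_id:
                jumps[head_id] = jumps.get(head_id, 0) + 1
            return job_id
    return None

jumps = {}
print(fpfs_pick([("a", 6), ("b", 2)], 4, jumps, maxjumps=5))  # b
print(jumps)  # {'a': 1}
```

Although a chosen job leapfrogs every job ahead of it, only the head job's jump count matters for the maxjumps protection described above, and the sketch reflects that.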

5.2.3. The Greedy scheduler

The Greedy scheduler was proposed in [11]. It incorporates job seniority and schedulability in prioritization. In the Greedy scheduler, jobs are queued in arrival order. Each job has a priority indicator which is the product of the time it has spent in the queue and its hardness value. The hardness value is an approximation of job schedulability; in this work, a job's hardness value is its size. The scheduler picks, from the first depth jobs in the queue, the highest-priority job that fits into the available free processors. If none is available, it waits until enough jobs terminate or other jobs arrive (in case the queue length is less than depth). As in FPFS, the job at the head of the queue is protected from starvation by limiting the number of times it can be jumped to maxjumps. A job jumped maxjumps times while at the head of the queue has to be scheduled next. We use Greedy(j, d) to represent the scheduler when maxjumps = j and depth = d.

5.3. Placement policy

We use the Worst Fit (WFit) placement policy to map jobs/components onto clusters. In the WFit policy, the k-th widest component is processed on the k-th freest cluster. WFit distributes free processors evenly among the clusters.

5.4. The job stream

We use two workload traces for our evaluation.

(a) TR-1: The first trace is from the second version of the Distributed ASCI Supercomputer (DAS-2) [22], archived at the Grid Workloads Archive [23]. It consists of jobs processed between 2005 and ...

(b) TR-2: The second trace is from the DAS cluster at Vrije University Amsterdam, archived at the Parallel Workloads Archive [24]. It consists of jobs processed between January and December ...

We make some modifications to the traces:

(i) We only use jobs up to size 64. This is done to have only jobs that fit in our modeled multi-cluster system (68 processors);

(ii) We removed all jobs that appear repeated.
Jobs with the same size, arrival time and execution time are considered duplicates; we only keep one of them. This is done to minimize the effect of workload flurries [25]. (iii) We eliminate jobs whose execution time is 0. The execution time value of 0 arises because job execution times are rounded off to integers prior to archiving: jobs with run times less than 0.5 were assigned a run time of 0 in the traces.

Large jobs, whose size is greater than a threshold thres, are broken into components and co-allocated. Co-allocated jobs suffer a (fixed) execution time penalty. If a job of size s is broken into n components, n − 1 of the components are made to have a width of ⌈s/n⌉ each and the n-th component has a width of s − (n − 1)⌈s/n⌉. Though we get job runtimes from the trace, we assume they are unknown before the jobs finish execution. We model this by not using runtime information to make scheduling decisions.

Set up instances

We use a system of 4 clusters with 17 nodes each. This is done so as to have the system run at a fairly high load without altering job arrival and execution times. We take thres = 10 and break up jobs whose size is greater than thres into 4 components and co-allocate them. Co-allocated jobs suffer a 30% execution time penalty. This caters for the slower inter-cluster links and job communication intensity [4][12]. We use AWT as a performance metric and use the FPFS(5), FPFS(20), Greedy(5,5), Greedy(5,20), Greedy(20,5) and Greedy(20,20) scheduler instances.

Evaluation

Scheduler differences in performance (and fairness) show up when the system is running at a high utilization. Due to daily (and weekly) fluctuations, it is hard to achieve a stable state when jobs from traces are used. We therefore take measurements for jobs where the system runs at a high utilization. For JS-1, we take readings between the 7400th and the …th job. For JS-2, we take readings between the 6500th and the 8500th job. The trends for utilization are shown in Figure 3.

Figure 3. Utilization variations for JS-1 (top) and JS-2 (bottom)
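The component-splitting rule and the WFit placement described above can be sketched as follows. This is a hedged illustration: whether the rule uses ceilings or floors is our assumption (reconstructed from the garbled source), and the function names are our own.

```python
def split_job(size, n):
    """Split a job of `size` processors into `n` component widths.

    Assumed rule: n - 1 components of width ceil(size / n) and a final
    component taking the remainder, so the widths sum to `size`.
    """
    w = -(-size // n)                  # ceiling division
    widths = [w] * (n - 1) + [size - (n - 1) * w]
    assert all(x > 0 for x in widths) and sum(widths) == size
    return widths

def wfit_place(widths, free):
    """Worst Fit: the k-th widest component goes to the k-th freest
    cluster, evening out free processors across the clusters."""
    order = sorted(range(len(free)), key=lambda c: free[c], reverse=True)
    placement = {}
    for width, cluster in zip(sorted(widths, reverse=True), order):
        placement[cluster] = width     # assign component to cluster
        free[cluster] -= width
    return placement
```

For instance, a job of size 14 split 4 ways gives widths [4, 4, 4, 2]; WFit then sends the widest components to the clusters with the most free processors.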

Figure 4. Fairness by D and MD for JS-1 (left) and JS-2 (right)

Figure 5. Fairness by AD, AD_10 and AD_5 for JS-1 (left) and JS-2 (right)

6. Results

We now present the fairness of the six scheduler instances for the two job streams. We also study the trend of the worst performing jobs as well as the most discriminated jobs.

6.1. Relative fairness among the schedulers

We compute the fairness of the different schedulers using the different metrics. We present the fairness of the different schedulers (i) by D and MD in Figure 4, (ii) by AD (with 5% and 10% extremes) in Figure 5 and (iii) by AMD (with 5% and 10% extremes) in Figure 6. We summarize the relative fairness of all the schedulers for JS-1 and JS-2 in Table I and Table II respectively. For each metric, the scheduler in column 1 is the fairest while the one in column 6 is the most unfair. If two or more schedulers are next to

each other in a row and carry asterisks, then they are equally fair as viewed by that particular metric.

Figure 6. Fairness by AMD_5 and AMD_10 for JS-1 (left) and JS-2 (right)

Table I. Order of fairness for JS-1

D       Greedy(20,20)  Greedy(20,5)   Greedy(5,20)   Greedy(5,5)     FPFS(20)       FPFS(5)
AD      Greedy(20,20)  Greedy(20,5)   Greedy(5,20)   Greedy(5,5)     FPFS(20)*      FPFS(5)*
AD_10   Greedy(20,20)  Greedy(20,5)   Greedy(5,20)   FPFS(20)*       FPFS(5)*       Greedy(5,5)
AD_5    Greedy(20,20)  Greedy(20,5)   Greedy(5,20)   FPFS(20)        FPFS(5)        Greedy(5,5)
MD      Greedy(20,20)  FPFS(20)       Greedy(20,5)   Greedy(5,20)    FPFS(5)        Greedy(5,5)
AMD_5   Greedy(20,5)   Greedy(20,20)  Greedy(5,20)   Greedy(5,5)     FPFS(5)        FPFS(20)
AMD_10  Greedy(20,5)   Greedy(20,20)  Greedy(5,20)   Greedy(5,5)     FPFS(5)        FPFS(20)

Table II. Order of fairness for JS-2

D       Greedy(5,20)   FPFS(5)        FPFS(20)       Greedy(20,20)   Greedy(20,5)   Greedy(5,5)
AD      Greedy(5,20)   Greedy(20,20)  Greedy(20,5)*  FPFS(20)*       FPFS(5)        Greedy(5,5)
AD_10   Greedy(5,20)   FPFS(5)        FPFS(20)       Greedy(20,5)    Greedy(20,20)  Greedy(5,5)
AD_5    Greedy(5,20)   FPFS(5)        FPFS(20)       Greedy(20,5)    Greedy(20,20)  Greedy(5,5)
MD      Greedy(5,20)   FPFS(5)*       FPFS(20)*      Greedy(20,5)    Greedy(20,20)  Greedy(5,5)
AMD_5   Greedy(5,20)   FPFS(20)       FPFS(5)*       Greedy(20,20)*  Greedy(20,5)   Greedy(5,5)
AMD_10  Greedy(5,20)   FPFS(20)       FPFS(5)        Greedy(20,20)   Greedy(20,5)   Greedy(5,5)

We observe that:

1. The trend of relative fairness among the schedulers highly depends on the job stream being scheduled.

2. For a given job stream, there is a close to similar trend in relative fairness among the schedulers for certain metrics. Similar trends are observed in D, AD, AD_5 and AD_10, as well as in AMD_5 and AMD_10.

The differences in relative fairness among schedulers for the different job streams can be attributed to the effect of job stream composition. Studies of the performance of parallel job schedulers [7][8] showed that the performance of a scheduler highly depends on the job stream used. The same seems to apply to fairness. Studying the effect of job stream characteristics on scheduler fairness is outside the scope of this work. However, irrespective of the job stream effect, fairness metrics need to accurately reflect the discrimination within the job stream. We therefore analyze the fairness for both job streams, but each job stream is analyzed independently.

For a given job stream, the differences in relative fairness can be attributed to differences in metric appropriateness (accuracy in representing discrimination). Some differences can also be due to circumstantial differences in the way the discrimination in the job stream is aggregated by the metric. We seek to identify which metrics accurately or inaccurately represent discrimination/fairness. To do this, we first examine the trend of the worst performing jobs as well as the trend of the most discriminated jobs.

6.2. Trend for worst performing and most discriminated jobs

We present the trend of the worst performing jobs and most discriminated jobs for all the schedulers in Figure 7. We note that for a given scheduler, the i-th worst performing job is not necessarily the i-th most discriminated job. We observe that the i-th worst performing jobs of the different schedulers have different waiting times. The same holds for discrimination. We also observe that there may exist values of i where the relative performance between two schedulers changes.
For example, when we consider JS-1, at i ≈ 40, the performance of the i-th worst job for FPFS(5) and FPFS(20) is the same. For i < 40, the FPFS(5) job outperforms the FPFS(20) job and for i > 40, the FPFS(20) job outperforms the FPFS(5) job. This trend also applies to other schedulers. We also observe that there are i values beyond which the relative discrimination among schedulers changes.

7. Discussion

In this section, we discuss the trends of performance and discrimination. We then intuitively deduce the appropriate metrics for evaluating fairness in parallel job schedulers.

7.1. Performance and discrimination trends

From Figure 7, we observe that different schedulers have different trends of performance and discrimination. In terms of performance, we observe a smooth improvement in performance for the worst performing jobs. Some schedulers (like FPFS(20) in JS-1) register high rates of improvement.
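Crossings of the per-rank curves (such as FPFS(5) vs. FPFS(20) at i ≈ 40 in JS-1) can be located mechanically. The sketch below is illustrative only; the function name and the invented waiting-time lists in the test are our own assumptions.

```python
def crossover_index(waits_a, waits_b):
    """Return the rank i at which the ordering of the i-th worst jobs
    of two schedulers flips, or None if one scheduler's i-th worst job
    is at least as good at every rank."""
    a = sorted(waits_a, reverse=True)   # worst job (largest wait) first
    b = sorted(waits_b, reverse=True)
    sign = None
    for i, (x, y) in enumerate(zip(a, b)):
        s = (x > y) - (x < y)           # sign of a[i] - b[i]
        if s == 0:
            continue                    # equal at this rank: no information
        if sign is None:
            sign = s                    # remember which scheduler led first
        elif s != sign:
            return i                    # the ordering has flipped
    return None
```

The same routine applies unchanged to per-job discrimination values, for locating the ranks where relative discrimination among schedulers changes.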

Figure 7. Trends of worst performing jobs (left) and most discriminated jobs (right) for JS-1 (top) and JS-2 (bottom)

In terms of discrimination, we also observe a reduction. Some schedulers have sharp changes at specific intervals. The sharp changes in discrimination occur mostly for the greedy schedulers where d = 5. This implies that the sharp changes are attributable to the effect of a small depth value.

The scheduler with the worst performing jobs is not necessarily the one with the most discriminated jobs. In JS-1 for example, while Greedy(5,5) has the most discriminated jobs, it does not have the worst performing jobs. The good performance of the worst performing jobs is in line with previous studies [13]. However, the small jobs are blocked by the small depth, which makes them unable to jump and get processed. They therefore suffer the largest discrimination despite not having the worst performance.
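The divergence between the worst-performing and most-discriminated rankings, and an AD_x-style aggregation over the most discriminated x% of jobs, can be sketched as follows. The numbers are invented for illustration, and the exact AD_x definition in this paper may differ in detail from this sketch.

```python
def ad_x(discriminations, x):
    """Mean discrimination of the most discriminated x% of jobs
    (an AD_x-style aggregation; a sketch, not the paper's formula)."""
    k = max(1, round(len(discriminations) * x / 100))
    worst = sorted(discriminations, reverse=True)[:k]
    return sum(worst) / k

# Invented (wait_time, discrimination) pairs for four jobs.
jobs = [(120, 5), (90, 40), (200, 10), (60, 2)]
by_wait = sorted(jobs, key=lambda j: j[0], reverse=True)   # worst performing first
by_disc = sorted(jobs, key=lambda j: j[1], reverse=True)   # most discriminated first

# The worst performing job (wait 200) is not the most discriminated
# one (discrimination 40): the two rankings diverge.
assert by_wait[0] != by_disc[0]
```

Note that taking x large enough to cover all discriminated jobs reduces AD_x to the plain average AD, which anticipates the observation in Section 7.3.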

7.2. Metric appropriateness

The underlying principle of fairness (from a user point of view) is that jobs should not be unnecessarily discriminated. From Figure 7, we observe that for JS-1, Greedy(20,20) has the lowest discrimination and the best performance of the worst performing jobs. For JS-2, Greedy(5,20) has the lowest discrimination and the best performance of the worst performing jobs. It can therefore be deduced that Greedy(20,20) and Greedy(5,20) are the fairest schedulers for JS-1 and JS-2 respectively. From Figure 4, Figure 5 and Figure 6, we observe that this trend is followed by all metrics apart from AMD_5 and AMD_10. We can therefore conclude that AMD_5 and AMD_10 are inappropriate for measuring fairness.

On a closer look at Figure 4, MD can also be considered inappropriate despite ranking Greedy(20,20) as the fairest scheduler. For the JS-1 job stream, the MD metric considers FPFS(20) to be fairer than all schedulers except Greedy(20,20), which is not deducible from Figure 7. We can therefore generalize that marginal values are inappropriate for measuring fairness. This can be attributed to the fact that taking the discrimination in excess of the benefit assumes equality in the resource requirements of the favored and discriminated jobs, which is not necessarily correct. We therefore consider non-marginal metrics. The differences in the other metrics are circumstantial rather than metric based. They may therefore be applicable in some circumstances depending on the set up of the scheduling environment.

7.3. Circumstantial appropriateness of metrics

The metrics D, AD and AD_x have approximately similar trends. However, there are some contradictions. These contradictions are largely due to the way the metric itself is made up. We describe two of the apparent contradictions and how they can both be applied in practice.

1. D vs. AD: We observe that FPFS(20) is fairer than FPFS(5) using the D metric, but they are equally fair by AD.
This implies that while the two offer equal average discrimination, FPFS(5) discriminates more jobs.

2. D vs. AD_5: We observe that while Greedy(5,5) has a fairly low D, it has the highest AD_5 of all the schedulers. Since the number of discriminated jobs is not the same for all the schedulers, having the same mean discrimination does not imply having the same total discrimination. A scheduler that offers fairly low discrimination to many jobs will have a high total discrimination (TD), unlike a scheduler that offers high discrimination to a few jobs.

In practice, if many jobs suffer a lot of discrimination, this shows up in the performance metric as poor performance, because the large number of jobs can substantially affect the job stream performance. However, if a few jobs suffer a lot of discrimination, it cannot be inferred from performance, because the few jobs cannot substantially affect the entire job stream performance. Measures of fairness need to unearth the discrimination that cannot be implied by the average performance values. The fairness metric therefore needs to identify this hidden discrimination. It is therefore more realistic to use AD_x as a generic measure of fairness. The choice of x is also important. It needs

to be low enough to reveal the discrimination that cannot be revealed by performance metrics, but high enough to give a clear picture. Setting it too low (say 0.001% of the jobs) may not reveal the true picture of the extremes. However, in cases where small details matter a lot, it may need to be that low. It is therefore at the discretion of the system owners to choose the appropriate value of x. If x is high enough to include all jobs in S_d, then AD_x = AD.

8. Conclusion and Future Work

We have proposed a new approach to fairness evaluation for parallel job schedulers. Our approach is based on the fact that since jobs do not have the same resource requirements and arrive when the queue and system are in different states, they are not expected to have the same performance in a fair set up. Every job therefore needs a time at which it would start if scheduled by an ideal fair scheduler. Since the ideal fair scheduler is unknown, it is bypassed by comparing schedulers to the base scheduler. We use the performance of the worst performing jobs and the most discriminated jobs to validate our approach. We have observed that our approach can represent unfair treatment of jobs. Unfair treatment is implied even in cases where the worst performing jobs are not the most discriminated jobs. We have observed that since unfairness mostly goes undetected when the discriminated jobs are too few to substantially affect the average performance, it is better to use the average discrimination of the most discriminated proportion x of the job stream. The choice of x should depend on what the users consider big enough to show generality but small enough to represent the extreme.

Our work also opens up some avenues for future research. There is a need to study the effect of the base scheduler on the relative fairness of schedulers.
Our consideration of selected schedulers has been basically for demonstration purposes rather than comparing them for fairness. There is therefore a need to study the relative fairness among schedulers and how scheduler fairness varies with scheduler/job stream parameters. Finally, there is a need to study the group-wise fairness/discrimination of schedulers. Job groups can be formed by job-based characteristics like size and memory requirements.

REFERENCES

1. Aida K, Kasahara H, Narita S. Job scheduling scheme for pure space sharing among rigid jobs. In Proceedings of the 4th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 1459, Feitelson D G, Rudolph L (Eds.). Orlando, Florida, 1998.
2. Avi-Itzhak B, Levy H. On measuring fairness in queues. Advances in Applied Probability 2004; 36(3).
3. Avi-Itzhak B, Levy H, Raz D. A resource-allocation queueing fairness measure: properties and bounds. Queueing Systems 2007; 56(2).
4. Bucur A I D, Epema D H J. The influence of communication on the performance of co-allocation. In Proceedings of the 7th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 2221, Feitelson D G, Rudolph L (Eds.). Cambridge, MA, 2001.
5. Frachtenberg E, Feitelson D G. Pitfalls in parallel job scheduling evaluation. In Proceedings of the 11th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 3834, Feitelson D G, Frachtenberg E, Rudolph L, Schwiegelshohn U (Eds.). Cambridge, MA, 2005.

6. Feitelson D G, Rudolph L, Schwiegelshohn U. Parallel job scheduling - a status report. In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 3277, Feitelson D G, Rudolph L, Schwiegelshohn U (Eds.). New York, NY, 2004.
7. Feitelson D G. The forgotten factor: facts on performance evaluation and its dependence on workloads. In Proceedings of the 8th International Euro-Par Conference, LNCS 2400, 2002.
8. Feitelson D G. Metric and workload effects on computer systems evaluation. Computer 2003; 36(9).
9. Jain R, Chiu D, Hawe W R. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. Technical Report TR-301, Digital Equipment Corporation.
10. Jones W M, Ligon III W B. Ensuring fairness among participating clusters during multi-site parallel job scheduling. In Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006.
11. Ngubiri J, van Vliet M. Using the greedy approach to schedule jobs in a multi-cluster system. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Arabnia H R (Ed.). Las Vegas, Nevada, 2006.
12. Ngubiri J, van Vliet M. Co-allocation with communication considerations in multi-cluster systems. In Proceedings of Euro-Par 2008 (to appear).
13. Ngubiri J, van Vliet M. The greedy multi-cluster scheduler: performance bounds and parametric sensitivity. International Journal of ICT Research 2008 (to appear).
14. Ngubiri J, van Vliet M. Group-wise performance evaluation of processor co-allocation in multi-cluster systems. In Proceedings of the 13th Workshop on Job Scheduling Strategies for Parallel Processing, Frachtenberg E, Schwiegelshohn U (Eds.). Seattle, WA, 2007.
15. Ngubiri J, van Vliet M. Performance, fairness and effectiveness in space-slicing multi-cluster schedulers. In Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing Systems, Cambridge, MA.
16. Rafaeli A, Kedmi E, Vsluli D, Barron G. Queues and fairness: a multiple study experimental investigation.
17. Raz D, Levy H, Avi-Itzhak B. A resource-allocation queueing fairness measure. SIGMETRICS Performance Evaluation Review 2004; 32(1).
18. Sabin G, Kochhar G, Sadayappan P. Job fairness in non-preemptive job scheduling. In Proceedings of the 2004 International Conference on Parallel Processing, Montreal, Canada, 2004.
19. Sabin G, Sadayappan P. Unfairness metrics for space-sharing parallel job schedulers. In Proceedings of the 11th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 3834, Feitelson D G, Frachtenberg E, Rudolph L, Schwiegelshohn U (Eds.). Cambridge, MA, 2005.
20. Schwiegelshohn U, Yahyapour R. Fairness in parallel job scheduling. Journal of Scheduling 2000; 3(5).
21. Srinivasan S, Kettimuthu R, Subramani V, Sadayappan P. Characterization of backfilling strategies for parallel job scheduling. In Proceedings of the International Conference on Parallel Processing Workshops, 2002.
22. The Distributed ASCI Supercomputer.
23. The Grid Workloads Archive.
24. The Parallel Workloads Archive.
25. Tsafrir D, Feitelson D G. Instability in parallel job scheduling simulation: the role of workload flurries. In Proceedings of the 20th International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, 2006.
26. Vasupongayya S, Chiang S-H. On job fairness in non-preemptive parallel job scheduling. In Proceedings of the 17th IASTED Conference on Parallel and Distributed Computing and Systems, Zheng S Q (Ed.). Phoenix, AZ, 2005.
Performance, Fairness and Effectiveness in space-slicing multi-cluster schedulers, In Proceedings of the 19 th IASTED International Conference on Parallel and Distributed Computing Systems, Cambridge, MA, Rafaeli A, Kedmi E, Vsluli D, Barron G. Queues and Fairness: A multiple study experimental investigation, Raz D, Levy H, Avi-itzak B. A Resource-allocation queuing fairness measure, SIGMETRICS Performance Evaluation Review 2004; 32(1): Sabin G, Kochhar G, Sadayappan P. Job fairness in non-preemptive job scheduling, In Proceedings of the 2004 International Conference on Parallel Processing, Montreal, Canada, 2004, Sabin G, Sudayappan P. Unfairness metrics for space-sharing parallel job schedulers. In Proceedings of the 11 th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 3834, Feitelson D G, Frachtenberg E, Rudolph L, Schwiegelshohn U (Eds.). Cambridge, MA, 2005, Schwiegelshohn U, Yahyapour R. Fairness in parallel job scheduling, Journal of Scheduling 2000; 3(5): Srinivasan, S., Kettimuthu, R., Subrarnani, V. and Sadayappan, P. (2002). Characterization of backfilling strategies for parallel job scheduling. In Proceedings of the International Conference on Parallel Processing Workshops, 2002, The Distributed ASCI Supercomputer The Grid Workloads Archive The Parallel Workloads Archive Tsafrir D, Feitelson D G. Instability in parallel job scheduling simulation: The role of workload flurries, In Proceedings of the 20 th Parallel and Distributed Processing Symposium, Rhodes Island, Greece, 2006, Vasupongayya S, Chiang S -H. On job fairness in non-preemptive parallel job scheduling. In Proceedings of the 17 th IASTED Conference on Parallel and Distributed Computing and Systems, Zheng S Q (Ed.). Phoenix, AZ, 2005.