Analysis of Round-Robin Variants: Favoring Newly Arrived Jobs


Feng Zhang, Sarah Tasneem, Lester Lipsky and Steve Thompson
Department of Computer Science and Engineering, University of Connecticut, Storrs, CT
(fzhang@engr.uconn.edu; tasneems@easternct.edu; lester@engr.uconn.edu; sat@engr.uconn.edu)

Keywords: Processor sharing (PS), round-robin (RR), last-come-first-served with preemptive resume (LCFSPR), foreground-background (FB), shortest remaining processing time (SRPT)

Abstract

Highly varying job demands generally consist of many short jobs mixed with a few long jobs. In principle, without foreknowledge of the exact service times of individual jobs, processor sharing is an effective theoretical strategy for handling such demands. In practice, however, processor sharing must be implemented by time-slicing, which incurs non-negligible job-switching overhead for small time-slices. A research issue is then how well time-slicing performs if large time-slices have to be used. In this paper, we investigate several round-robin variants; the results from Discrete Event Simulation show that, by favoring newly arrived jobs, the performance of round-robin with large time-slices can be better than that of ideal processor sharing. A simple immediate-preemption scheme, which serves a new job immediately by preempting the currently active job, is shown to improve the performance of round-robin further.

1. INTRODUCTION

Consider a single server where jobs arrive according to a Poisson process with arrival rate λ, that is, an M/G/1 queue. The individual jobs have service-time demands which can be described by a Probability Distribution Function (PDF), F(x), with mean x̄ := E[X], variance σ², and squared coefficient of variation Cv² := σ²/x̄².
If job demands vary widely, then Cv² ≫ 1 (see the Appendix for a full discussion of the PDFs used here). In this case it is important to favor short jobs in order to reduce the mean system time (response time). If the individual service times are known exactly, the Shortest Remaining Processing Time (SRPT) strategy is optimal [1, 2]: the currently executing job is preempted if a newly arrived job has a smaller service-time requirement than the time remaining for the present job. If the CPU-time requirement of each job is not available beforehand, other strategies, such as processor sharing (PS) and last-come-first-served with preemptive resume (LCFSPR), can be used to favor short jobs implicitly. The following formulas show that an M/G/1 queue using PS (or LCFSPR) has the same mean system time as an M/M/1 queue [3], which outperforms an M/G/1 queue using FCFS if Cv² > 1:

    PS/LCFSPR:  E[T] = x̄/(1 − ρ),                        (1)

    FCFS:       E[T] = x̄ + x̄ρ(1 + Cv²)/(2(1 − ρ)).       (2)

Here, T is the random variable denoting system time, and ρ = λx̄ (0 < ρ < 1) is the utilization parameter. It is clear from Eqs. (1, 2) that E[T] of PS (as well as that of LCFSPR) does not depend on Cv², whereas E[T] of FCFS grows linearly with Cv².

In practice, however, PS must be implemented by time-slicing. For example, round-robin, a common time-slicing scheme, serves a queue of awaiting jobs in turns. Specifically, a job is chosen from the front of the queue and is served for at most one time-slice, Δ. If it completes its service-time requirement during the time-slice, it releases the processor and leaves the system immediately. Otherwise, it is preempted after the allocated time-slice and put at the end of the queue, awaiting further service. As Δ → 0, the performance of round-robin approaches that of PS if the job-switching overhead is ignored. However, the cumulative overhead grows unboundedly as Δ → 0, so Δ cannot be set to a value close to zero. Instead, a finite, non-negligible Δ has to be used, which raises a question: how well does round-robin perform for nonzero Δ?
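The contrast between the PS and FCFS mean-system-time formulas above is easy to check numerically. The following sketch (our own illustration; it assumes x̄ = 1 and ρ = 0.7, and the function names are not from the paper) evaluates both expressions:

```python
def mean_time_ps(xbar, rho):
    # PS / LCFSPR in an M/G/1 queue: same mean system time as M/M/1,
    # insensitive to the variability Cv^2 of the service distribution
    return xbar / (1.0 - rho)

def mean_time_fcfs(xbar, rho, cv2):
    # Pollaczek-Khinchine mean system time for an M/G/1 queue under FCFS
    return xbar + xbar * rho * (1.0 + cv2) / (2.0 * (1.0 - rho))

# With exponential service (Cv^2 = 1) the two disciplines coincide;
# for Cv^2 > 1, FCFS falls behind, and the gap grows linearly in Cv^2.
print(mean_time_ps(1.0, 0.7))          # 3.33...
print(mean_time_fcfs(1.0, 0.7, 1.0))   # 3.33...
print(mean_time_fcfs(1.0, 0.7, 25.0))  # 31.33...
```

This makes the insensitivity property concrete: only the FCFS expression moves when Cv² changes.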
While the analytical solution for the mean system time is known for Δ = 0 (i.e., processor sharing, under the assumption of negligible overhead) and for Δ → ∞ (i.e., FCFS), little is known about how round-robin performs for intermediate Δ values, except that if newly arrived jobs are always put at the end of the queue, the performance of round-robin degrades monotonically as Δ increases for Cv² > 1 (again assuming no overhead), as illustrated in Figure 1. In [4], it was demonstrated that, by serving a newly arrived job right after the expiration of the current time-slice, the performance of round-robin in handling highly varying job demands is better than that of PS for a range of Δ values.

Figure 1. Illustration of the performance curve of round-robin in an M/G/1 queue as a function of the time-slice Δ, with G being hyper-exponential-2 (H2), exponential (Exp), or Erlangian (En). Newly arrived jobs are always put at the end of the queue. It is assumed that E[X] = 1 and ρ = 0.7. In the case of the exponential distribution, (1 − ρ)E[T] is simply one. For both H2 and En, as Δ → 0, i.e., in the case of processor sharing, (1 − ρ)E[T] approaches 1, from Eq. (1). As Δ → ∞, the performance of round-robin is essentially the same as that of FCFS. Clearly, FCFS instead of round-robin should be used for En, or more generally for any distribution with Cv² < 1. Only for Cv² > 1 is round-robin preferable to FCFS.

In this paper, we show by Discrete Event Simulation that the performance of round-robin can be improved further, in terms of mean system time, by preempting the currently active job and serving a newly arrived job immediately. Moreover, the range of Δ values that outperforms the case Δ = 0 is much wider, even though no overhead is charged to the case Δ = 0. Such a strategy is simple to use, since insertion of jobs happens only at the front or the end of the queue, and no extra preemption scheme is needed. Note that applying this strategy does not require the service-time information of individual jobs.

The rest of this paper is organized as follows: Section 2 summarizes related work; Section 3 describes three round-robin variants favoring newly arrived jobs; Section 4 presents the simulation results; Section 5 concludes the paper.

2. RELATED WORK

Since, in general, we do not know the exact service times of individual jobs until their completion, SRPT is inapplicable. More practical strategies should not depend on this assumption, yet should still favor short jobs.
Over time, in addition to PS (and round-robin) and LCFSPR, several such strategies have been proposed, including foreground-background (FB, also called least-attained-time) [5] and shortest-residual-time (SRT, also called shortest-expected-remaining-processing-time) [6]. A common feature of these strategies is that preemption is employed to favor possibly short jobs. For example, both LCFSPR and FB preempt the currently active job whenever a new job arrives, whereas SRT favors jobs expected to finish soon. Some of these strategies are simple to use and incur little overhead, while others incur more. For example, round-robin and LCFSPR only require insertion of jobs at the front or back of the queue (with complexity O(1)), whereas FB and SRT need to maintain a sorted queue (with complexity O(log n)) and to keep track of the attained times and residual times of individual jobs (residual-time computation can be time-consuming). Due to their simplicity, round-robin and LCFSPR are often used, although their performance may be worse than that of more complicated strategies like FB and SRT (ignoring overhead). Our focus in this paper is therefore on simple but effective schemes.

Various round-robin strategies, such as prioritized round-robin [7], cycled round-robin [8], deficit round-robin [9], and group round-robin [10], have been developed for scheduling jobs in computer and communication systems. In studying these strategies, fairness is often considered a major goal. In this paper, we mainly consider the issue of handling newly arrived jobs when implementing a round-robin strategy. We briefly discuss fairness at the end of Section 4.1.

3. ROUND-ROBIN VARIANTS

In general descriptions of round-robin, the handling of newly arrived jobs is either not mentioned or the new jobs are simply appended at the end of the queue [7, 8]. Does this mean that the handling of newly arrived jobs is not important?
Is there any performance difference depending on how newly arrived jobs are handled? Is it better to serve the newly arrived jobs first? Or is putting newly arrived jobs at the end indeed the best strategy? To answer these questions, we start from the following observations. In the case of Cv² ≫ 1, a newly arrived job is more likely to be a short job, while in the case of Cv² < 1, jobs tend to have similar service times. In the former case, each newly arrived (and likely short) job could be penalized greatly by other (possibly long) jobs in front of it if it is appended at the back and Δ is big. Overall, short jobs may be delayed significantly by long jobs. In the latter case, while each new job is penalized, other jobs of similar size can get through early. As a result, the decision to put newly arrived jobs at the back seems appropriate (keeping in mind that, for the exponential distribution, it does not matter where the new jobs are inserted, due to its memoryless feature).

Based on this observation, it is clear that the handling of newly arrived jobs is important if Cv² ≫ 1, and putting newly arrived jobs at the back may not be the best strategy. To avoid unnecessary delay of short jobs in the case of Cv² ≫ 1, it seems critical to serve newly arrived jobs first. In the following, three round-robin variants are identified based on this idea.

The first round-robin variant (denoted RR1) has been studied in [4] (denoted as Var. B there). In this variant, a newly arrived job is inserted at the front of the queue so that it is served right after the currently executing job finishes or uses up its time-slice. If the new job is shorter than Δ, it can finish in one time-slice, and its response time is then at most the remainder of the current time-slice plus its own service time. For a not-too-small Δ, most short jobs can get through quickly. Only if a new job is a long job will it take multiple time-slices. Each time it gets a time-slice, it moves to the end of the queue and will not get another one until all the jobs in front of it (including newly arrived jobs) have been served. As a result, its delay on short jobs is alleviated.

While no preemption is performed on the currently active job (before expiration of its time-slice) in RR1, the second and third variants (denoted RR2 and RR3) favor a newly arrived job by preempting the currently active job and serving the new job immediately. The preempted job is put at the front of the queue. The difference between these two variants is that in RR2 the preempted job gets another full time-slice when resumed, while in RR3 the preempted job gets only its remaining time-slice (i.e., the unused portion of Δ) when resumed. In both variants, if the currently active job cannot finish in its allocated time-slice and no preemption happens before the time-slice expires (i.e., no new job arrives), the job moves back to the end of the queue and gets a full time-slice in the next round.
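To make the bookkeeping concrete, here is a minimal sketch (our own illustration, not the authors' simulator) of how the three insertion policies manipulate the ready queue. We label the variants RR1 (front insertion, no preemption), RR2 (immediate preemption, full time-slice on resume), and RR3 (immediate preemption, remaining time-slice on resume); `DELTA` and the `Job` layout are assumptions of this sketch:

```python
from collections import deque

DELTA = 1.0  # time-slice length (the Delta of the paper); illustrative value

class Job:
    def __init__(self, remaining):
        self.remaining = remaining     # unfinished service demand
        self.slice_left = DELTA        # unused portion of the current slice

def on_arrival(queue, running, new_job, variant):
    """Handle a new arrival under RR1/RR2/RR3; returns the job to run next."""
    if variant == "RR1":
        # Front insertion only: the new job runs after the current slice ends.
        queue.appendleft(new_job)
        return running
    # RR2 / RR3: preempt the running job and park it at the front.
    if running is not None:
        if variant == "RR2":
            running.slice_left = DELTA  # RR2: full slice when resumed
        queue.appendleft(running)       # RR3: slice_left keeps its unused part
    return new_job

def on_slice_expiry(queue, running):
    """Slice used up without completion: back of the queue, fresh slice."""
    running.slice_left = DELTA
    queue.append(running)
    return queue.popleft() if queue else None
```

In all three cases the queue is touched only at its two ends, which is the O(1) property contrasted above with the sorted queues needed by FB and SRT.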
All these variants are simple to implement, since insertion of jobs is needed only at the front and end of the queue, and no exact service times of individual jobs need to be known. While RR2 and RR3 require immediate preemption of the currently active job, no special preemption scheme needs to be defined. Instead, the existing preemption mechanism can be reused, since jobs must be preempted anyway at the ends of their time-slices, and they will also be interrupted temporarily upon job arrivals and the occurrence of other events (unless some secondary processor is used to handle such events). The only extra information that needs to be managed in RR3 is the remaining time-slice of each job, which is easily computed. In the next section, we compare the performance of these three variants using simulation. To facilitate the following discussion, Table 1 lists brief descriptions of the variants.

Table 1. Description of round-robin variants.
  RR1: Front insertion of new jobs; no immediate preemption of the currently active job.
  RR2: Immediate preemption of the currently active job; a full Δ is allocated when it is resumed.
  RR3: Immediate preemption of the currently active job; the remaining Δ is allocated when it is resumed.

4. SIMULATION AND ANALYSIS

In this section, through Discrete Event Simulation, we study the performance of the above round-robin variants in M/G/1 queues as a function of Δ under the assumption of negligible overhead. Since round-robin is inferior to FCFS in the case of Cv² < 1, we focus on several job service-time distributions with Cv² > 1. Hyper-exponential-2 (H2) and hyper-Erlangian-2 (HE2) are considered because of their simplicity; the Cv² of these two distributions is set to the same value in this study. As many practical computer-communication applications have power-tailed job service times [11], a power-tail distribution (PT) based on the truncated-power-tail (TPT) distribution model is also included.
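Since the TPT model described in the Appendix is just a geometric mixture of exponentials, drawing samples from it takes only a few lines. The sketch below is our own illustration; the parameter values (theta, gamma, and the branch count) are our choices, not the paper's:

```python
import random

def sample_tpt(rng, mu=1.0, theta=0.5, gamma=2.0, branches=8):
    """Draw one service time from a truncated power tail (TPT):
    branch i (i = 0..branches-1) is chosen with probability
    proportional to theta**i and has exponential rate mu / gamma**i."""
    weights = [theta ** i for i in range(branches)]
    u = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u <= acc:
            return rng.expovariate(mu / gamma ** i)
    return rng.expovariate(mu / gamma ** (branches - 1))
```

With theta * gamma = 1 and mu = 1, every branch contributes equally to the mean, which then equals branches * (1 - theta) / (1 - theta**branches); adding branches stretches the tail while keeping each draw cheap.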
A short description of each of the distributions can be found in the Appendix. We assume that the mean service time of each distribution is 1 (i.e., x̄ = 1) without loss of generality.

We find that in all variants there are some finite, non-negligible Δ values that outperform the case Δ = 0 (i.e., corresponding to PS). The best variant is RR3, followed by RR2. These two demonstrate the importance of using immediate preemption, especially for large Δ values.

4.1. Improved Performance for Finite Δ

Figures 2, 3, and 4 show the results of using the three round-robin variants in M/G/1 queues (with G being H2, HE2, or PT) under different load conditions. A finite set of Δ values across a large range is selected to show the behavior of the variants near the origin (i.e., for relatively small Δ) as well as their trends as Δ increases. For each chosen Δ and a given distribution, one simulation run over a large number of job samples is conducted. We plot (1 − ρ)E[T] instead of just E[T] for the purpose of comparison with M/M/1, which of course is the same as pure processor sharing. Note that (1 − ρ)E[T] is really the ratio of the mean system time of a round-robin strategy to that of M/M/1, since x̄ = 1.

From the figures, it can be seen that for very small Δ all the variants perform similarly and approach PS. This is expected, since most jobs cannot finish in one time-slice and have to share the processor with each other. It also makes little difference whether immediate preemption is used or not: even without such preemption, the extra waiting time experienced by new jobs is negligible as Δ → 0.

Figure 2. The performance of the round-robin variants under the lightest load considered: (a) H2 (a mix of many short jobs and a small fraction of long jobs of much larger mean service time); (b) HE2 (a similar mix); (c) PT. In each case, the best performance of RR1, RR2, and RR3 is reached at intermediate Δ values.

The second similarity is that as Δ increases, the performance of all the variants first improves and then eventually degrades for large Δ. Our explanation for the improvement is that, as Δ increases, more and more short jobs can get through in one time-slice without being delayed much by the long jobs. When Δ is of the order of the mean service time (i.e., x̄), almost all the short (and therefore comparably sized) jobs are effectively served LCFS, while processor sharing still occurs for mixtures of many short jobs with a few long ones. As Δ is increased further, long jobs benefit more from the allocation of large time-slices than short jobs do. This is why the best performance is achieved for Δ of the order of the mean service time. Note that none of the three variants requires knowing the individual service times before the jobs finish.

Figure 3. The performance of the round-robin variants for ρ = 0.7: (a) H2; (b) HE2; (c) PT. In each case, the best performance of RR1, RR2, and RR3 is reached at intermediate Δ values.
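A Discrete Event Simulation of this kind can be sanity-checked against the closed-form FCFS mean system time quoted in the Introduction. The sketch below (our own minimal check, not the authors' simulator) estimates the mean system time of an M/G/1 FCFS queue with a two-branch hyper-exponential service law via the Lindley recursion; all parameter values are illustrative:

```python
import random

def sample_h2(rng, p1=0.9, mu1=2.0, mu2=1.0 / 5.5):
    """Hyper-exponential-2 sample: branch 1 w.p. p1. The rates are chosen
    so the mean is p1/mu1 + (1-p1)/mu2 = 0.9*0.5 + 0.1*5.5 = 1.0."""
    return rng.expovariate(mu1 if rng.random() < p1 else mu2)

def mg1_fcfs_mean_time(lam, sample_service, n=200_000, seed=1):
    """Estimate the mean system time of an M/G/1 FCFS queue using the
    Lindley recursion W' = max(0, W + S - A)."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n):
        s = sample_service(rng)
        total += wait + s                 # system time = waiting + service
        inter = rng.expovariate(lam)      # Poisson arrivals
        wait = max(0.0, wait + s - inter)
    return total / n

# For this H2, E[X^2] = 2*p1/mu1**2 + 2*(1-p1)/mu2**2 = 6.5, so at
# rho = 0.7 the Pollaczek-Khinchine value is 1 + 0.7*6.5/(2*0.3) = 8.58...
print(mg1_fcfs_mean_time(0.7, sample_h2))
```

The estimate should hover near the analytical value of about 8.58; an RR or PS discipline over the same arrival and service streams would then be compared on the same footing.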
Moreover, the performance of all three variants improves as the job load increases. As an illustration, we plot the results of RR3 under different load conditions (i.e., different ρ values) in Figure 5. It is clear that the range of Δ values that outperform the case Δ = 0 gets wider, and a more significant reduction of the mean system time is often achieved. Such an improvement is expected, since with an increased job load it is more likely to have lots of relatively short jobs mixed with a couple of extremely long jobs. So there are more opportunities to favor short jobs across a wide range of Δ values and to reduce the mean system time as much as possible.

Figure 4. The performance of the round-robin variants for ρ = 0.9: (a) H2; (b) HE2; (c) PT. In each case, the best performance of RR1, RR2, and RR3 is reached at intermediate Δ values.

By always giving priority to newly arrived jobs, and thereby favoring short jobs, long jobs may be treated unfairly. We know that for PS the expected response time of any individual job is proportional to its service-time requirement, i.e., E[T(x)] = x/(1 − ρ), where T(x) is a random variable denoting the response time of a job of service time x. Clearly, the slowdown ratio (= E[T(x)]/x) is the same for jobs with different service-time requirements and equals 1/(1 − ρ). Therefore, PS can be considered a fair scheme. To judge whether the three variants are fair to long jobs, we examine their performance under different load conditions for several chosen Δ values. By measuring the slowdown ratios of jobs in different service-time bins (i.e., partitioned service-time intervals), we find that the slowdown ratios of long jobs are not far from 1/(1 − ρ) for a given ρ, usually only several percent more. For example, under ρ = 0.7 the slowdown ratios of long jobs are at most a few percent worse than under PS among the considered Δ values for H2, while under ρ = 0.9 they are at most about 7% worse. On the other hand, the slowdown ratios of short jobs can be reduced by half or more. This tells us that long jobs are not punished much, while short jobs gain a lot. Note that there are other fairness measures. For example, if fairness is defined as a similar amount of waiting time for every individual job, then FCFS is fair. Apparently long jobs prefer FCFS, while short jobs suffer a lot. With short jobs being the majority, such a fairness measure is actually unfair from the viewpoint of short jobs.

Figure 5. The performance of RR3 under different load conditions: (a) H2; (b) HE2. It can be seen that the performance improves with load, in terms of both a wider range of good Δ values and a more significant reduction of the mean system time.
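The fairness measurement described above, mean slowdown per service-time bin, is a simple post-processing step over the simulation output. A sketch (the bin edges and the data layout are our assumptions):

```python
import bisect

def slowdown_by_bins(jobs, edges):
    """jobs: iterable of (service_time, response_time) pairs.
    Returns the mean slowdown T(x)/x in each service-time bin; under
    ideal PS every bin would show the same value, 1/(1 - rho)."""
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for x, t in jobs:
        i = bisect.bisect_right(edges, x)   # bin index for service time x
        sums[i] += t / x
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Two synthetic jobs: a short one slowed 2x and a long one slowed 4x
print(slowdown_by_bins([(0.5, 1.0), (2.0, 8.0)], edges=[1.0]))  # [2.0, 4.0]
```

Comparing the per-bin values against the flat PS line 1/(1 − ρ) shows directly how much the long-job bins are penalized and how much the short-job bins gain.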
4.2. Effectiveness of Immediate Preemption

Another observation drawn from the figures is that RR2 and RR3 outperform RR1 under all the load conditions. This can be explained by noting that, on one hand, without immediate preemption, newly arrived short jobs cannot avoid the delay caused by the execution of long jobs, and this delay becomes significant as Δ increases. On the other hand, with immediate preemption, newly arrived short jobs can be served immediately, and if a newly arrived job turns out to be a long job, it is likely to be preempted by a short job soon. Hence the immediate-preemption scheme is helpful in avoiding this drawback. While short jobs in execution may be preempted by newly arrived jobs, they are not delayed much, since they are resumed right after the preempting jobs are processed. It is important to note that the performance is worse if the preempted jobs are denied the rest of their time-slices and put at the end of the queue (not shown here).

Without immediate preemption, RR1 degenerates to LCFS (which has the same mean system time as FCFS) as Δ → ∞. For H2 and HE2, whose Cv² is finite, the asymptotic value of E[T] is finite according to Eq. (2). However, for PT, whose Cv² is infinite, RR1 has infinite E[T] in the limit. This disadvantage is overcome by using immediate preemption in RR2 and RR3: as Δ → ∞, both RR2 and RR3 become LCFSPR, which is known to have the same mean system time as M/M/1.

4.3. RR2 vs. RR3

From the results, it can be seen that RR3 performs no worse than RR2 under all the examined load conditions. The difference is small under light load, as shown in Figure 2. This is because jobs are then less likely to compete with each other and to be preempted by others; in addition, most jobs are short. Hence it does not matter much whether a preempted job receives a full time-slice or its remaining time-slice. Under moderately heavy load conditions (as shown in Figures 3 and 4), however, more preemptions tend to occur. On one hand, by always giving a full time-slice to a preempted job, RR2 may serve a long job for more than one full time-slice in each round (i.e., before putting it at the end of the queue), so that short jobs may be delayed. On the other hand, RR3 gives only one full time-slice to a long job in each round, so that the delay of short jobs is reduced. This is why, for RR3, the range of Δ values that outperform the case Δ = 0 is quite large. Given such a large range, there is often no need to find the optimal Δ; instead, we can safely pick a value in the range, e.g., of the order of the mean service time, and expect good performance.

The above discussion explains the results for H2 and PT. However, as illustrated in the figures, RR2 and RR3 perform similarly under all the examined load conditions for HE2. One possible explanation is that, unlike H2, which has a monotonically increasing residual-time (i.e., expected-remaining-time) function, HE2 does not have this characteristic: longer attained times do not necessarily correspond to longer residual times. For H2, the preempted jobs are more likely to have longer residual times than the newly arrived ones, while this may not be true for HE2, so RR3 may not help much by giving only their remaining time-slices to the preempted jobs.
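The residual-time argument for H2 can be checked directly: its mean residual life has a closed form obtained from the survival function. A short sketch (parameter values are illustrative, not the paper's):

```python
import math

def h2_mean_residual(x, p1, mu1, mu2):
    """Mean residual life E[X - x | X > x] of a hyper-exponential-2:
    the integral of the survival function beyond x, divided by R(x)."""
    p2 = 1.0 - p1
    surv = p1 * math.exp(-mu1 * x) + p2 * math.exp(-mu2 * x)
    tail_area = (p1 / mu1) * math.exp(-mu1 * x) + (p2 / mu2) * math.exp(-mu2 * x)
    return tail_area / surv

# At x = 0 this is just the overall mean; it then climbs monotonically
# toward 1/min(mu1, mu2), the mean of the slowest branch -- the
# "increasing residual time" property referred to above.
```

For a two-branch HE2, by contrast, the analogous function need not be monotone, which is what weakens the case for keeping preempted jobs on short remaining slices.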
Another factor is that RR3 may not be effective if the total workload of the short jobs (i.e., jobs in the short-job branch) accounts for only a small percentage. With only a small fraction of jobs coming from the long-job branch, the total workload of short jobs is far smaller for HE2 than for H2; under such settings, RR3 turns out to work well for H2, but not for HE2. In view of this, we investigate the performance of RR2 and RR3 under the assumption that each of the distributions has balanced branches (i.e., p₁T₁ = p₂T₂, where pᵢ and Tᵢ are the branching probability and mean service time of the i-th branch, for i = 1, 2). We find that RR3 outperforms RR2 for each of the distributions, as Figure 6 shows. In a balanced setting, there are more short jobs for H2 and HE2 in comparison to the unbalanced case; this could be a possible reason why RR3 outperforms RR2 for HE2 in a balanced setting.

Figure 6. The performance of the round-robin variants for ρ = 0.7 in a balanced setting: (a) H2 (predominantly short jobs with a small fraction of long jobs); (b) HE2 (likewise). RR3 outperforms RR2 for both H2 and HE2.

5. CONCLUSIONS AND FUTURE WORK

In this paper, we addressed the issue of improving the performance of round-robin in handling highly varying job demands. The key idea is to favor newly arrived jobs, which are more likely to be short jobs. Through simulation, it was demonstrated that there exist intermediate Δ values that outperform pure processor sharing (Δ = 0), and that the optimal Δ is typically of the order of the mean job service time. Moreover, the immediate-preemption scheme, which preempts the currently active job upon a job arrival, improves the performance of round-robin effectively. By giving only the remaining time-slice to a preempted job, the range of good intermediate Δ values can be made quite large.
We showed that the overall performance of RR3 is the best among the three examined round-robin variants. These results suggest that a large Δ value can be used in practice, thereby keeping the job-switching overhead small. Another advantage of using these round-robin variants is their simplicity in comparison to other queueing disciplines, such as FB and SRT. As future work, we are gathering simulation results that investigate the delay experienced by two classes of jobs, and we also plan to include the context-switching overhead factor.

6. APPENDIX: SERVICE-TIME DISTRIBUTIONS USED HERE

For any service-time distribution, the probability that the service time is no more than x is F(x) = Pr(X ≤ x), which is called the Probability Distribution Function (PDF) in probability theory. Its derivative, f(x) = dF(x)/dx, is called the probability density function (pdf). In the following, we describe the service-time distributions examined in this paper and give their pdfs.

The well-known exponential distribution has pdf f(x) = µ e^{−µx}, where µ is the service rate. All the other distributions examined here use the exponential distribution as a building block.

The Erlangian-n distribution (i.e., the n-phase Erlangian distribution) describes the distribution of the sum of n mutually independent, identically distributed exponential random variables. Its pdf is known to be

    f(x) = µ (µx)^{n−1} / (n−1)! · e^{−µx}.

The hyper-exponential-2 distribution describes the service times of jobs whose execution takes one of two exponential branches, with service rates µ₁ and µ₂, respectively. The first branch is taken with probability p₁ and the other with probability p₂ (p₁ + p₂ = 1). Its probability density function is

    f(x) = p₁ µ₁ e^{−µ₁x} + p₂ µ₂ e^{−µ₂x}.

The hyper-Erlangian-n distribution also describes a two-branch process; however, instead of being exponential, each branch is an E_n. By reference to the pdfs of E_n and H2, the probability density function of HE_n is

    f(x) = p₁ µ₁ (µ₁x)^{n−1} / (n−1)! · e^{−µ₁x} + p₂ µ₂ (µ₂x)^{n−1} / (n−1)! · e^{−µ₂x}.

The truncated-power-tail (TPT) distributions are a special class of generalized hyper-exponential distributions in which both the branching probabilities and the branch service rates are geometric. Specifically, let θγ^α = 1, where 0 < θ < 1 and γ > 1. The probability of taking the i-th branch is p_i = p₁ θ^{i−1}, and the service rate of the i-th branch is µ/γ^{i−1}, where the first branch has service rate µ. The variance of a TPT distribution is finite but can be made arbitrarily large by increasing the number of branches as well as by varying θ and/or α. A nice property of TPT is that its behavior over a finite range is similar to that of a power-tail distribution. Furthermore, by removing the constraint on the maximum number of branches, a power-tail distribution with α < 2 (and hence infinite variance) can be represented. For a more complete discussion of distributions in Queueing Theory, please refer to [12].

REFERENCES

[1] N. Bansal and M. Harchol-Balter. Analysis of SRPT scheduling: Investigating unfairness. In Proc. ACM Sigmetrics, 2001.
[2] L. E. Schrage and L. W. Miller. The queue M/G/1 with the shortest remaining processing time discipline. Operations Research, 14:670–684, 1966.
[3] F. Baskett, K. M. Chandy, R. R. Muntz, and F. G. Palacios. Open, closed, and mixed networks of queues with different classes of customers. J. ACM, 22(2):248–260, 1975.
[4] Feng Zhang and Lester Lipsky. Simulation of several round-robin variants in M/G/1 queues. In 2006 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS'06), 2006.
[5] Misja Nuyens and Adam Wierman. The foreground-background queue: A survey. Performance Evaluation, 65:286–307, 2008.
[6] Kenneth C. Sevcik. Scheduling for minimum total loss using service time distributions. J. ACM, 21(1):66–75, 1974.
[7] Edward G. Coffman and Leonard Kleinrock. Feedback queueing models for time-shared systems. J. ACM, 15(4):549–576, 1968.
[8] William G. Bulgren and Lee-Ho Hwang. A simulation study of time-slicing in non-exponential service environments. In Proceedings of the Annual Symposium on Simulation (ANSS).
[9] M. Shreedhar and George Varghese. Efficient fair queueing using deficit round robin. In Proc. ACM SIGCOMM '95, 1995.
[10] Bogdan Caprita, Jason Nieh, and Wong Chun Chan. Group Round Robin: Improving the fairness and complexity of packet scheduling. In ANCS '05: First ACM/IEEE Symposium on Architectures for Networking and Communications Systems, 2005.
[11] M. Greiner, M. Jobmann, and L. Lipsky. The importance of power-tail distributions for modeling queueing systems. Operations Research, 47, 1999.
[12] Lester Lipsky. Queueing Theory: A Linear Algebraic Approach. Springer, 2008.
Furthermore, by removing the constraint on the maximum number of branches, a power-tail distribution with α < can be represented. For a more complete discussion of distributions in Queueing Theory, please refer to []. [] F. Baskett, K. M. Chandy, R. R. Muntz, and F. G. Palacios. Open, closed, and mixed networks of queues with different classes of customers. J. ACM, ():8 6, 97. [] Feng Zhang and Lester Lipsky. Simulation of several round-robin variants in M/G/ queues. In 6 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS6), 6. [] Misja Nuyens and Adam Wierman. The foregroundbackground queue: A survey. Perform. Eval., 6(- ):86 7, 8. [6] Kenneth C. Sevcik. Scheduling for minimum total loss using service time distributions. J. ACM, ():66 7, 97. [7] Edward G. Coffman and Leonard Kleinrock. Feedback queueing models for time-shared systems. J. ACM, ():9 76, 968. [8] William G. Bulgren and Lee-Ho Hwang. A simulation study of time-slicing in non-exponential service environment. In ANSS 8: Proceedings of the th annual symposium on Simulation, pages 6 8, 98. [9] M. Shreedhar and George Varghese. Efficient Fair Queueing Using Deficit Round Robin. In Proc. of ACM SIGCOMM 9, pages, 99. [] Bogdan Caprita, Jason Nieh, and Wong Chun Chan. Group Round Robin: Improving the Fairness and Complexity of Packet Scheduling. In ANCS : First ACM/IEEE Symposium on Architectures for Networking and Communications Systems, pages 9,. [] M. Greiner, M. Jobmann, and L. Lipsky. The importance of power-tail distributions for modeling queueing systems. Operations Research, 7(): 6, 999. [] Lester Lipsky. QUEUEING THEORY: A Linear Algebraic Approach. Springer, 8. REFERENCES [] N. Bansal and M. Harchol-Balter. Analysis of srpt scheduling: Investigating unfairness. In Proc. ACM Sigmetrics,. [] L.E. Schrage and L.W. Miller. The Queue M/G/ with the Shortest Processing Remaining Time Discipline. Operations Research, ():67 68, 966.