Task Scheduling in Grid Computing: A Review

Size: px
Start display at page:

Download "Task Scheduling in Grid Computing: A Review"

Transcription

1 Advances n Computatonal Scences and Technology ISSN Volume 10, Number 6 (2017) pp Research Inda Publcatons Task Schedulng n Grd Computng: A Revew Manjot Kaur Bhata Professor, Department of Informaton Technology Jagan Insttute of Management Studes, Delh, Inda. Abstract Grd computng s a category of hgh performance computng envronment facltates to solve large scale computatonal demands. It nvolves varous research ssues such as resource management, task schedulng, securty problems and nformaton management. Task schedulng s an mportant element of parallel, dstrbuted computng and grd computng. The man am of task schedulng n grd computng s to ncrease system throughput, performance and also to satsfy resource requrements of the task. Grd scheduler n grd computng envronment s an applcaton responsble for allocatng the sutable resources to ndependent tasks to maxmze the system utlzaton. Keywords: Grd Computng, Grd schedulng, Task schedulng, Grd ssues, Schedulng algorthms 1. INTRODUCTION Grd s an ntegraton of heterogeneous resources, termed to be an deal nfrastructure f t s able to provde dfferent types of resources such as: processng unts, storage unts and communcaton unts (Foster, Kesselman, 1999). However, n realty grd mplementatons usually work on the collaboraton of certan type of resources. Generally grd can be followng types: Computatonal grd: Hgh performance computng needs hardware and software wth hgh end computatonal capabltes to meet the processng demands of complex scentfc problems. A computatonal grd shares processng power as the man computng resource among ts nodes and provdes dependable, consstent, pervasve and nexpensve computatonal facltes to tasks. It provdes

2 1708 Manjot Kaur Bhata computatonal power to process large scale jobs, to better utlze resources and to satsfy requrement for nstant access to resources on demand. Data grd: Data grd can be vewed as the storage component of a grd envronment assembled from portons of a large number of storage devces. It acts as a massve data storage system that deals wth the storage, sharng and management of large amount of dstrbuted data. Data grd helps scentsts and engneers to access local or remote data to carry out extended calculatons for applcatons that requre large amount of dstrbuted data. 2. GRID APPLICATIONS Ths hgh end technology can be used to solve computatonally complex scentfc, mathematcal and fnancal problems (Joseph, 2004): In medcal professon doctors need to analyze and compare large amount of mages comng from varous resources and work on problems lke drug dscovery. Team of meteorologsts can utlze grd envronment for analyss and securty of petabytes of clmate data and for weather forecastng. Problems lke dsaster response, urban plannng and economc modelng are tradtonally assgned to natonal governments. v Engneers and accountants share resources to analyze terabytes of structural data. Understandng the requrement of the above lsted grd computng applcatons, some of the common needs of these applcatons can be summarzed (Joseph, 2004): Applcaton parttonng that nvolves breakng the problem nto dscrete peces Dscovery and schedulng of tasks and workflow Data communcatons dstrbutng the problem data where and when t s requred v Provsonng and dstrbutng applcaton codes to specfc system nodes v Results management assstng n the decson processes of the envronment v Autonomc features such as self-confguraton, self-optmzaton, self-recovery, and self-management 3. GRID SCHEDULERS Grd scheduler s an applcaton responsble for managng the selecton and executon of jobs on sutable machne avalable n Grd system. The mportant functons performed by schedulers are: Managng large queue of global jobs, parttonng of jobs nto tasks and to schedule parallel executon of tasks, provdng best match to jobs from the avalable

3 Task Schedulng n Grd Computng: A Revew 1709 resources, advance reservaton of resources for jobs, storng jobs and data related to jobs, convertng global jobs nto local jobs, servce-level agreement valdaton and enforcement, examnng job executons and status, reschedulng and correctve actons of partal falover stuatons, mantan accountng records for all the jobs and transactons. These Grd schedulers then form the centralzed, herarchcal and decentralzed structure (Rodero, Gum & Corbalan, 2009) In centralzed scheduler, all tasks are sent to one entty n the system. Ths entty acts as a central server or central scheduler and need to have the knowledge of all the resources wth more control on resources. The man drawback wth ths scheduler s ts lack of scalablty and centre of falure. Performance and avalablty of ths type of schedulng system depends upon the accessblty and speed of the scheduler. The herarchcal scheduler, s organsed on dfferent levels havng a tree structure (Kurowsk, 2008). The meta scheduler at the top access larger set of resources through the local schedulers at lowest level n the herarchy. Here, local schedulers have the data of the resources. Herarchcal schedulers can handle the problem of scalablty and the sngle pont of falure up to some extent as compare to centralzed schedulng. The decentralzed scheduler has multple components that work ndependent and collaborate for obtanng the schedule (Lu, 2007). In decentralzed or dstrbuted schedulng each ste n the grd has local scheduler that manages ts computatonal resources. There s no central entty to control the resources. The job schedulng requests by the grd users or local users are managed and processed by the local schedulers or transferred to other local scheduler. Therefore, n comparson to centralzed scheduler decentralzed scheduler provdes better relablty and fault tolerance but absence of global scheduler that mantans the records of all resources and jobs could results n low effcency. The avalable Grd Resource Management system lke: Condor(Raman et.al., 1998), Globus(Czajkowsk,1998), NetSolve(Casanova et. al.,1997), Nmrod/G(Buyya et. al., 2000), AppLeS(Casanova et. al., 2000) uses dfferent task schedulng approaches. The Condor uses centralzed scheduler and desgned to mprove overall throughput of the system n a controlled network envronment. Its schedulng algorthm does not consder any QoS requrement of tasks. The AppLeS schedulng algorthm focuses on effcent co-locaton of data and experments as well as adaptve schedulng. The Nmrod uses decentralzed scheduler and ts schedulng approach s based on predctve prcng model and Grd economy model. The Netsolve has decentralzed scheduler and schedulng approach focuses on fxed applcaton orented polcy consderng soft QoS.

4 1710 Manjot Kaur Bhata 4. TASK SCHEDULING IN GRID COMPUTING Task schedulng s an essental component of parallel, dstrbuted computng and grd computng. Task schedulng n parallel computng ams to reduce the turnaround tme of jobs whle task schedulng n grd computng targets to maxmze resource utlzaton. Many researchers have proposed schedulng algorthms for parallel system (Zhou et al., 1997, Fetelson et al., 1997, Krallmann et al., 1999) but schedulng n grd computng s more complcated. Therefore, varous researchers (Hamscher et al.,2000, L et al.,2005, Ernemann,2002) have shown nterest n t. Krauter et al. (2002), surveyed and analyzed grd resource management based on classfcaton of scheduler organzaton, system status, schedulng and reschedulng polces. The am of task schedulng n grd computng s not only to fnd an optmal resource to mprove the overall system performance but also to utlze the exstng resources more effcently. Many researchers have proposed several task schedulng algorthms n grd envronment. Queue based task schedulng lke Frst Come Frst Served (FCFS), uses queues where tasks are stored untl they are scheduled for executon. FCFS schedules tasks accordng to ther submsson order and checks the avalablty of the resources requred by each task. The schedulng crtera n FCFS do not consder executon tmes of the submtted tasks that lead to low utlzaton of the system resources, because the watng task cannot be executed f older tasks are n the queue, even f the resources t requres are all avalable (Techouba et. al, 2007). Task schedulng systems such as Sun Grd Engne, (SGE) (Gentzsch, W, 2001), Condor (Than, D., Tannenbaum, T. and Lvny, M, 2002), glte WMS (Burke, S., Campana, S., Pers, A., Donno, F., Lorenzo, P., Santnell, and Scaba, 2007), GrdWay (Huedo,E., Montero,R. and Llorente, I., 2001), Grd Servce Broker (Venugopal, S., Buyya, R. and Wnton, L., 2004) etc., are all queueng systems. These systems use queues to store the tasks submtted to the scheduler untl they are scheduled for executon. Tasks schedulng algorthms based on heurstc approach can be dvded nto two categores onlne mode and the other batch mode. In onlne mode, whenever a job arrves to the scheduler t s allocated to the frst free machne. In ths method, the arrval order of the job s mportant. Each task s consdered only once for matchng and schedulng. In case of batch mode, the jobs are collected n a set and are examned for mappng at prescheduled tmes called mappng events. Ths ndependent event uses heurstc approach to make better decson. Ths mappng heurstcs do better task/resource mappng because the heurstcs have the resource requrement nformaton for the meta-task, and know the actual executon tme of a larger number of tasks. Several heurstcs approaches lke Mn-mn, Max-mn, UDA and Suffrage are proposed for schedulng ndependent tasks. UDA(User Defned Assgnment): UDA(Fdanova, S., Durchova, M., 2006) assgns each task n an arbtrary order, to the machne, wth the best expected

5 Task Schedulng n Grd Computng: A Revew 1711 v v executon tme for that task, regardless of the machne avalablty. It assgns too many tasks to a sngle grd node. Ths leads to overloadng and the ncreases the response tme of the tasks. OLB(Opportunstc Load Balancng): OLB (Maheswaran, M., Al, S., Segel, H. J., Hensgen, D., Freund, R. F., 1999) assgns each task n an arbtrary order, to the next avalable machne, regardless of the task s expected executon tme on that machne. Ths algorthm produces the poor results as t s not consderng the expected executon tme. Mn-Mn Algorthm: Mn-Mn (Wu, M., Shu, W., Zhang, H., 2000) begns wth the batch of all unassgned tasks(ut) and s dvded nto two phases. In the frst phase, the set of mnmum expected completon tme for each task n UT s found. In the second phase, the task wth the overall mnmum expected completon tme from UT s chosen and assgned to the correspondng machne. Ths process s repeated untl the machnes are assgned to all the tasks. However, the Mn- Mn algorthm s unable to balance the load well as t usually does the schedulng of small tasks ntally. Max-Mn Algorthm: Max-Mn (Maheswaran et. al., 1999) dffers from Mn- Mn n second phase, where tasks wth overall maxmum expected completon tme from UT s chosen and assgned to correspondng machne. In comparson to Mn-Mn, Max-Mn gves prorty to longer task and schedules t frst. Suffrage Algorthm: In Suffrage algorthm, dfference between the mnmum and second mnmum completon tme for each task s calculated and ths value s defned as suffrage value. The task wth maxmum suffrage value s assgned to the machne wth mnmum completon tme. Schedulng s a process to fnd the optmal solutons, some characterstcs of the nature can be used to fnd the optmal soluton for Grd schedulng from large soluton set. Some of the nature nspred tasks schedulng algorthms are: Genetc Algorthm(GA): GA (Braun et. al., 2001) s a technque used for large soluton space. It works on a populaton of chromosomes for a gven set of UT where each chromosome represents a possble soluton. The ntal populaton can be generated ether randomly from a unform dstrbuton or by applyng other heurstc algorthms, such as Mn-Mn. Generatng ntal populaton by Mn-Mn s called seedng the populaton wth a mn-mn chromosome. Smulated Annealng (SA): SA (Zomaya, A. Y. and Kazman, R, 1999 and Lu, Y., 2004), s an teratve technque whch s based on the physcal process of annealng. It consders only one possble mappng for each unassgned task at a tme. Smulated annealng uses a procedure smlar to thermal process of obtanng low-energy crystallne states of a sold that probablstcally accept poor solutons

6 1712 Manjot Kaur Bhata n an attempt to obtan a better search of the soluton space based on a system temperature. The temperature s the total completon tme and better soluton s the optmal task machne mappng. The Genetc Smulated Annealng(GSA): GSA(P. Shroff, D. Watson, N. Flann, R. Freund, 1996) heurstcs s a combnaton of the GA and SA technques. GSA follows the procedures smlar to the GA. For the selecton process, GSA uses the SA coolng schedule and system temperature. v Tabu search (Tabu): Tabu search(falco, I. Balo, R., Tarantno, E. and Vaccaro, R., 1994 and Glover, F. and Laguna, M., 1997) s a strategy that saves prevous regons of the soluton space and avod repeatng a search near the areas that have already been searched. A mappng of meta-tasks uses the same representaton as a chromosome n the GA approach. Tabu search s mplemented begnnng wth a random mappng as the ntal soluton, generated from a unform soluton. These algorthms have several advantages and have some drawbacks also. The algorthms GA, SA and GSA are dffcult to mplement. The heurstc algorthms proposed for task schedulng n (Braun et al., 2001) and (Rtche, 2003) depend on statc envronment of system load and the expected value of executon tmes. The schedulng algorthm proposed by L,Yang n 2006 descrbes that the task can search very large spaces of canddate and wll be moved from one machne to another machne based on the solutons, t ncreases the traffc n the grd system that leads to degradaton n performance. The paper proposed by Yan et al. n 2005 consders communcaton cost and dfferent ant agents. CONCLUSION Grd computng s useful n varous ranges of applcatons such as medcal, weather forecastng, engneerng, research and others. Task schedulng and resource schedulng are the mportant areas of research n grd computng. Ths paper dscusses the task schedulng algorthms proposed by varous researchers. The algorthms proposed by researchers can be categorzed nto heurstc approach algorthms and nature nspred algorthms. The man goal of task schedulng algorthms s to mnmze the executon tme of each job or to mprove the processng capacty of the avalable resources. The advantages and dsadvantages of the algorthms are dscussed. The future work wll nclude a new schedulng algorthm to optmze the watng tme of the jobs. REFERENCES [1] Buyya, R., Abramson, D., & Gddy, J. (2000, May). Nmrod/G: An archtecture for a resource management and schedulng system n a global

7 Task Schedulng n Grd Computng: A Revew 1713 computatonal grd. In Hgh Performance Computng n the Asa-Pacfc Regon, Proceedngs. The Fourth Internatonal Conference/Exhbton on (Vol. 1, pp ). IEEE. [2] Braun, T. D., Segel, H. J., Beck, N., Bölön, L. L., Maheswaran, M., Reuther, A. I.,... & Freund, R. F. (2001). A comparson of eleven statc heurstcs for mappng a class of ndependent tasks onto heterogeneous dstrbuted computng systems. Journal of Parallel and Dstrbuted computng, 61(6), [3] Burke, S.,Campana, S., Pers, A. D., Donno, F., Lorenzo, P., Santnell, R. and Scaba, A. (2007). glte 3 user gude. Worldwde LHC Computng Grd. [4] Fdanova, S., & Durchova, M. (2006). Ant algorthm for grd schedulng problem. In Large-Scale Scentfc Computng (pp ). Sprnger Berln Hedelberg. [5] Casanova, H. and Dongarra, J., (1997). Netsolve: A Network Server for Solvng Computatonal Scence Problems, The Internatonal Journal of Supercomputng Applcatons and Hgh Performance Computng, Vol 11, Number 3, pp , [6] Czajkowsk, K., Foster, I., Karons, N., Kesselman, C., Martn, S., Smth, W., & Tuecke, S. (1998, January). A resource management archtecture for metacomputng systems. In Job Schedulng Strateges for Parallel Processng (pp ). Sprnger Berln Hedelberg. [7] Foster, I., & Kesselman, C. (1999). The Globus project: A status report. Future Generaton Computer Systems, 15(5), [8] Falco, I., Del Balo, R., Tarantno, E., & Vaccaro, R. (1994, June). Improvng search by ncorporatng evoluton prncples n parallel tabu search. InEvolutonary Computaton, IEEE World Congress on Computatonal Intellgence., Proceedngs of the Frst IEEE Conference on (pp ). [9] Lu, K., Subrata, R. and Zomaya, A.Y.(2007). On the performance drven load dstrbuton for heterogeneous computatonal grds. Journal of Computer and System Scences, Vol. 73, Issue 8, pp [10] Rodero, I., Gum, F., and Corbalan, J., (2009). Evaluaton of coordnated grd schedulng strateges. In proceedngs of the 11th IEEE nternatonal conference on Hgh Performance Computng and Communcatons(HPCC) (pp. 1-10). Washngton,DC,USA. [11] Kurowsk, K., Nabrzysk, J., Oleksak, A., Węglarz, J.(2008). Multcrtera approach to two-level herarchy schedulng n Grds. Journal of Schedulng 11(5), [12] Techouba, A., Capannn, G., Baragla, R., Puppn, D. and Pasqual, M.(2007). Back_llng strateges for schedulng streams of jobs on computatonal farms. In CoreGRID Workshop on Grd Programmng Model, Grd and P2P Systems Archtecture,Grd Systems, Tools and Envronments. Sprnger. [13] Zhou, X., Da. Y.S. and Rana, X. (July,2007) Dual-Level Key Management for secure grd Communcaton n dynamc and herarchcal groups, Future

8 1714 Manjot Kaur Bhata Generaton Computer Systems, Scence Drect, Vol. 23 No.6, pp [14] Hamscher, V., Schwegelshohn, U., Stret, A., & Yahyapour, R. (2000). Evaluaton of job-schedulng strateges for grd computng. In Grd Computng GRID 2000 (pp ). Sprnger Berln Hedelberg. [15] Gentzsch, W. (2001). Sun grd engne: Towards creatng a compute power grd. In Cluster Computng and the Grd, Proceedngs. Frst IEEE/ACM Internatonal Symposum on (pp ). IEEE. [16] Than, D., Tannenbaum, T. and Lvny, M.(2002). Condor and the Grd. In Fran Berman, Geo_rey Fox, and Tony Hey, edtors, Grd Computng: Makng the Global Infrastructure a Realty. John Wley & Sons Inc. [17] Huedo, E., Montero, R., & Llorente, I. M. (2005). The GrdWay framework for adaptve schedulng and executon on grds. Scalable Computng: Practce and Experence, 6(3). [18] Venugopal, S., Buyya, R., & Wnton, L. (2004, October). A grd servce broker for schedulng dstrbuted data-orented applcatons on global grds. In Proceedngs of the 2nd workshop on Mddleware for grd computng (pp ). ACM. [19] Wu, M. Y., Shu, W., & Zhang, H. (2000, May). Segmented mn-mn: A statc mappng algorthm for meta-tasks on heterogeneous computng systems. InHeterogeneous Computng Workshop (pp ). IEEE Computer Socety. [20] Maheswaran, M., Al, S., Segal, H. J., Hensgen, D., & Freund, R. F. (1999). Dynamc matchng and schedulng of a class of ndependent tasks onto heterogeneous computng systems. In Heterogeneous Computng Workshop, 1999.(HCW'99) Proceedngs. Eghth (pp ). IEEE. [21] Zomaya, A. Y., & Kazman, R. (1999). Smulated annealng technques. In Algorthms and theory of computaton handbook (pp ). (M. J. Atallah, Ed.), pp. 37-1_37-19, CRC Press, Boca Raton, FL. [22] Shroff, P., Watson, D. W., Flann, N. S., & Freund, R. F. (1996, Aprl). Genetc smulated annealng for schedulng data-dependent tasks n heterogeneous envronments. In 5th Heterogeneous Computng Workshop (HCW 96) (pp ).