A static jobs scheduling for independent jobs in Grid Environment by using Fuzzy C-Mean and Genetic algorithms

Size: px
Start display at page:

Download "A static jobs scheduling for independent jobs in Grid Environment by using Fuzzy C-Mean and Genetic algorithms"

Transcription

1 Proceedngs of the Postgraduate Annual esearch Semnar A statc obs schedulng for ndependent obs n Grd Envronment by usng Fuzzy C-Mean and Genetc algorthms SIILUCK LOPUMAEE, MOHD OO MD SAP, ABDUL HAA ABDULLAH 3, SUAT SIOY 4,4 Faculty of Scence and Technology, Suan Dust aabhat Unversty 95 aasrma d., Dust, Bangkok, Thaland, Tel: (66)-44544, Fa: (66) Emal: { srluck_lor, 4 surat_sr@dust.ac.th Faculty of Computer Scence and Informaton Systems, Unversty Technology of Malaysa, 830 Skuda, Johor, Malaysa, Tel: (607) , Fa: (607) Emal: { mohdnoor, 3 hanan@fsksm.utm.my Abstract The concept of Grd computng s becomng a more mportant for the hgh performance computng world. Such fleble resource request could offer the opportunty to optmze several parameters, such as coordnated resource sharng among dynamc collectons of ndvduals, nsttutons, and resources. Specfcally, we nvestgate the statc ob schedulng algorthm for ndependent obs. In ths paper we propose and evaluate epermentally a statc schedulng for ndependent obs that rely on determnng ob characterstcs at runtme and obs allocate to resources. We present a statc ob schedulng algorthm by usng Fuzzy C-Mean and Genetc algorthms. Our model presents the strateges of allocatng obs to dfferent nodes, whch we have developed the model by usng Fuzzy C-Mean algorthm for predcton the characterstcs of obs that run n Grd envronment and Genetc algorthm for obs allocated to large scale sharng of resources. The performance of our model n a statc ob schedulng have researchers wll be dscussed. Our model has shown that the schedulng system wll allocate obs effcently and effectvely. Key-Words: - Grd computng, Fuzzy C-Mean algorthm, Genetc algorthm, a statc obs schedulng, Heterogeneous computng system. I. ITODUCTIO rd Computng s the prncple that occurs for a long G perod of tme by focusng on vrtual organzatons [] to share large-scale resources, nnovatng applcatons and n some cases gettng hgh-performance orentaton. Under ths prncple, Grd has problem n fleble, secure, coordnated resource sharng among dynamc collectons of ndvduals, nsttutons, and resources. In Grd [] concept s a new generaton of technologes combne physcal resources and applcatons that provde vastly more effectve solutons to comple problems (e.g., scentfc, engneerng and busness). These new technologes must be bult on secure dscovery, obs allocates to resources, ntegraton resources and servces from the others. In [3] s a formal defnton of Grd concepts. They defne conceptual models s abstract machnes that support applcatons and servces. Fg. taken from [3], are formally defned (e.g., Organzaton, Vrtual Organzaton, Vrtual Machne, Programmng System, etc.). Currently, Global Grd Forum [4] formulated and provded standards documents of vrtual organzaton. Grd Computng goes beyond dstrbutng and sharng resources to more applcatons, whch are adaptng to use Grd resources. Although, usng dstrbuted resources are useful but ths s possble only f the Grd resources are scheduled as well. The optmal scheduler wll result n hgh performance n Grd computng but the poor scheduler wll be makng contrast result. ow, the Grd schedulng s bg topc n Grd envronment for new algorthm model. The Grd schedulng s responsble for resource dscovery, resources selecton, and ob assgnment over a decentralzed heterogeneous system, whch resources belong to multple admnstratve domans. ormally, the resources are requested by a Grd applcaton, whch use to computng, data and network resources etc. However, Grd schedulng of applcatons s absolutely more comple than schedulng an applcatons of a sngle computer. Because resources nformaton of sngle computer schedulng s easy to get nformaton, such as CPU frequency, number of CPU a machne, memory sze, memory confguraton and network bandwdth etc. But Grd envronment s dynamc resources sharng and dstrbutng. Then an applcaton s hard to get resources nformaton, such as CPU load, avalable memory, avalable network capacty etc. And Grd envronment also hard to classfy obs characterstc, that run n Grd. There are bascally two approaches to solve ths problems, the frst

2 Proceedngs of the Postgraduate Annual esearch Semnar 006 s based on obs characterstc and second s based on a dstrbuted resources dscovery and allocaton system. It should optmze the allocaton of a ob allowng the eecuton on the optmzaton of resources. The schedulng n Grd envronment has to satsfy a number of constrants on dfferent problems. We have defned a set of them to study the feasblty and the usefulness of applyng Fuzzy C-Mean algorthm and genetc algorthm to ths feld. The model s desgned for multple obectves n the obs schedulng system n whch often nvolve a lot of hstorcal data and many comple obectve. Our model consder n ob submt tme, run tme, dle tme and obs end tme that each ob runnng on the Grd. Fuzzy C-Mean algorthm and Genetc algorthms are appled to solve the obs schedulng system wth multple obectves and the results have shown that the obs schedulng system can mprove the performance and can allocate obs effcently and effectvely. In ths paper we dscussed Fuzzy C-Mean clusterng and Genetc algorthm n schedulng system. To ths end, we are revewng a related work n Secton II. et, Secton III we bref our desgn and archtecture. Secton IV, concludes the paper by summarzng ths work, and provdng a prevew of future research n ths area, s gven n Secton V. II. ELATED WOKED Most of the ob schedulng n Grd envronment based on ob eecute tme and ob run tme has been proposed. In [5], the module predcton engne s a part of schedulng and offer a hstory based approach for estmatng the run tme of ob submsson. Intellgence wll be a key feature n the net generaton of Grd envronment. In [6] proposed two modules for predctng the completon tme of obs n a servce Grd and applyng genetc algorthm to ob schedulng. The problems of schedulng system on the Grd envronment have been n [7], [8], [9], [0]. All of them adopt the method of genetc algorthm for obs schedulng by applyng dfferent obs characterzaton to mprove performance. We notced that ther methods focused on optmzaton or sub-optmzaton for schedulng system. Effcacous and effectve ob schedulng n Grd requres to model can allocate the avalable resources on Grd nodes to compute obs, determne the current workload and predct the ob eecuton tme. In [], was ob schedulng n parallel and cluster computng; ther goals are to acheve performance and load balancng across the entre system. Facng varyng stuatons, ntellgent Grd envronments need complcated schedulng strateges and algorthms to handle dfferent knds of obs. Heurstc algorthms are often used n Grd envronment for schedulng system. The algorthms use hstorcal data of workload and eplct constrants to schedulng obs [], [3]. In [4] proposed models for schedulng system by usng genetc algorthm wth multple goals. It s consdered problems n parallel machne schedulng. Our approach, especally eamne the mplcatons of the fact that workload of obs s epected to have an mpact on the resources utlzaton and, even more nterestngly for researcher on the performance qualty. We use nformaton about statc workload data from the Standard Workload Archve [4] and t has been epermented n several publcatons [5], [6]. These workload traces conssts of nformaton about all ob submssons on a machne for a certan perod of tme whch usually ranges over several months and several thousands of obs. Therefore, t s reasonable to start wth the avalable workload traces nformaton from the compute centers to evaluate the mpact of obs characterzaton n Grd envronment. III. DESIG AD ACHITECTUE A. Clusterng Clusterng algorthms refer to automatc unsupervsed classfcaton method n data set and t can be classfcaton nto dfferent categores data set base on the type of nput parameters, type of clusters they are nterested. Clusterng algorthms for data sets can be found n dfferent felds, such as statstcs, computer scence, bonformatcs and machne learnng. The most famous clusterng algorthm s the Fuzzy C-Mean algorthm. Ths paper ams to ntroduce cluster analyss for classfcaton of obs characterstc, or obects, accordng to smlartes among them, and organzng obects nto groups. A cluster s a group of obects that are more smlar to each other than to obects n other clusters. Smlarty s often defned by means of dstance based upon the length from a data vector to some prototypcal obect of the cluster. The data are typcally observatons of some phenomenon. Each obect conssts of m measured varables, grouped nto an m-dmensonal column vector {,,, n. A set of n obects s denoted by U {,,, p,,, n,,, n X [ ] p, p, p, n To dstngush between labeled and unlabeled patterns we wll ntroduce a two-valued (Boolean) ndcator vector b [ b ], k,,.. wth 0 - entres n the followng k manner f pattern s labeled k b k 0 otherwse A-. Fuzzy c-means (FCM) clusterng The Fuzzy C-Means s one of the estng clusterng

3 Proceedngs of the Postgraduate Annual esearch Semnar 006 methods for buldng fuzzy parttons. Ths method wll be used n ths paper as the basc tool for buldng obs characterzaton n Grd envronment. Fuzzy C-Means (FCM) algorthm, also known as fuzzy ISODATA, was ntroduced by Bezdek [5] as etenson to Dunn s [6] algorthm to generate fuzzy sets for every observed feature. Fuzzy clusterng methods allow for uncertanty n the cluster assgnments. ather that parttonng the data nto a collecton of dstnct sets (where each data pont s assgned to eactly one set), fuzzy clusterng creates a fuzzy pseudo partton, whch conssts of a collecton of fuzzy sets. Fuzzy sets dffer from tradtonal sets n that membershp n the set s allowed to be uncertan. A fuzzy set s formalzed by the followng defntons. Let X {,,, n be a set of gven data, where n s a set of feature data. The mnmzaton obectve functon of FCM algorthm s frequently used n pattern recognton follow as; C m J ( U, V ) ( μ ) v ; m < () Where m s any real number greater than, V { v, v,, v C are the cluster centers. U ( μ ) C s the degree of membershp of vector n the cluster. The values of matr U should satsfy the followng condtons; μ [ 0,] ;,,, ;,,,C () C μ ;,,, (3) The Eucldean dstance d v s any norm epressng the smlarty between any measured data and the center and calculate the cluster centers V accordng to the equaton: v m ( μ ) ( ) m μ ;,,,C (4) Fuzzy parttonng s carred out through an teratve optmzaton of the obectve functon shown above, wth the update the fuzzy partton matr U by: u (5) C d m k d k Ths algorthm wll stop f ma ( k ) ( ) μ μ < φ ; φ [0,] and k s the teraton step. All equaton shown above, converge to Fuzzy C-Mean algorthm by followng step: Algorthm Fuzzy C-Means Step : Generate an ntal U and V Step : At k-step: calculate the centers vectors C (k) [v ] wth U (k) : v ( μ ) ( ) m μ m Step 3: Update U (k), U (k+) u C d m k d k Step 4: If U (k+) - U (k) < φ then STOP; otherwse return to step. Fgure : Fuzzy c-means [7] B. Genetc Algorthms Genetc algorthms (GA) are a class of stochastc search algorthms whch based on bologcal evoluton [8], [9]. A basc GA can be represented as n Fg 7. taken from [0]. Genetc algorthms combne the eplotaton of past results wth the eploraton of new areas of the search space by usng survval of the fttest technques combned wth a structured of randomzed nformaton echange, a GA can mmc some of the nnovatve ntellgence of human search. A generaton s a collecton of artfcal creatures (strngs). In every new generaton a set of strngs s created usng nformaton from the prevous tmes. Occasonally a new part s tred for good measure. GAs are randomzed, but they are not smple random walks. They effcently eplot hstorcal nformaton to speculate on new search ponts wth epected mprovement. The approach used n ths work generates a set of ntal schedulng, evaluates the schedulng to obtan a measure of ftness, selects the most approprate and combnes then together usng operators (crossover and mutaton) to formulate a new set of solutons. The basc type of GAs, known as the smple GA (SGA), uses a populaton of bnary strngs, sngle pont crossover, and proportonal selecton [8], [9]. Many other modfcatons to the SGA have been proposed; some of these are adopted n our work. The followng subsectons eplan the steps n our proposed approach; GA( ) { Generate ntal populaton of Jobs ndvduals Evaluate ndvduals accordng to ftness functon; Whle stoppng condton s satsfed. { Count from to amount generaton; Select two parents from ntal populaton (Populaton Parent and Parent );

4 Proceedngs of the Postgraduate Annual esearch Semnar Crossover (Parent and Parent ) Chld; Mutaton (Parent p, Parent q, Chld); Ftness (Chld c, Best Chromosome bc); Improvement (Chld c); eplace (Chromosome chrom, Chld c); Schedulng( chromosome); return set of the chromosome n populaton for ob schedulng. C. Desgn and Algorthm Ths secton wll brefly descrbe how workload data wll separate to three classfcatons and the strategc of them wll allocate to optmal Grd nodes. Let us consder a set of obs of workload, a set of heterogeneous computng system. Jobs are subect to constrants on the Grd envronment then we selected a set of propertes of obs and descrptons relevant to real world stuatons, an eample dataset s presented as n Table. Table :data set of obs propertes representaton Then G n n m m Fg. shows our model by usng Fuzzy C-Mean clusterng technque for predcton obs characterstc to three classfcatons, frst Heavy obs workload, second Medum obs workload, fnally Low obs workload. And we desgned GA algorthm, whch was the strateges of allocatng obs to dfferent nodes. Hgh obs Workload Jobs Workload Fuzzy C Mean Algorthm Medum obs Workload nm Low obs Workload GA Algorthm Instance Propertes type Decson feld Schedulng Job umber Yes Submt Tme o Grd odes 3 un Tme Yes 4 umber of allocated processors o Fgure : Our Model 5 User ID Yes 6 Group ID o 3 4 m J s, r, Let J [ ] y y y y be a set of gven obs, where y s the yth of User ID and s the th of Job umber. As a Grd envronment conssts of the vector value of n nodes and a node conssts of m resources then followng as: G and [ 3 m ] 3 n 3 n Fgure 3: The strategc of allocaton obs to Grd nodes Fg., 3 shows how the ob wll compute to optmal Grd nodes, whch we separate obs to three classfcatons, after that we allocate each obs n dfferent classfcaton to optmal Grd nodes. D. Our schedulng algorthms Fuzzy C-Mean and SJF( ) { Input: J [ ] y where as each of obs order n ob number. y y y,

5 Proceedngs of the Postgraduate Annual esearch Semnar Fnd classfcaton of workload by usng Fuzzy C-Mean, such that. Heavy workload: Group of obs s large ob run-tme. Such as t H,t H,,t Ha ; a y. Medum workload: Group of obs s medum ob runtme. Such as t M,t M,,t Mb ; b y 3. Lght workload: Group of obs s short ob run-tme. Such as t L,t L,,t Lc ; c y whle (obs y ) { f (Heavy workload) { Mappng obs to processor by usng GA: th th t t H 3 H f (Medum workload) { Mappng obs to processor by usng GA: tm tm t t M 3 M f (Lght workload) { Mappng obs to processor by usng GA: tl tl t t L 3 L Output: WatngTm e WatngTme WatngTme 3 WatngTme y Mn(Makespan n ), where as n s each of node n Grd envronment. y WatngTm e Mn y IV. EXPEIMETAL SETUP AD ESULTS In ths eperment, we used obs workload data from the Standard Workload Archve [8]. Ths data conssts of,000 obs, each of whch has 8 propertes feld, however we focused on some propertes that prevous menton. In our eperments we assumed that each of obs s allowed to run n each node by usng space-sharng mechansm. Our smulaton, we smulated 50 dfferent performance nodes n Grd envronment. Our eperments showed classfcaton of obs workload to three groups, frst s heavy workload, second s medum workload and thrd s lght workload. Fgure 4 shows the obs characterzaton by usng Fuzzy C-Mean algorthm separated workload data to three groups, (a) s heavy workload, (b) s medum workload and (c) s lght workload. Fgure 4: Jobs characterzaton In ths eperment, the obs n workload data s predcted to three classfcaton, heavy workload s un-tme from 6,959 to 9,76 and total obs are obs, Medum workload from,790 to 6,540 and total obs are 69 obs and Lght workload from to,759 and total obs are 909, see Table. Table : obs classfcatons n Grd envronment. o. Classfcaton Total Heavy Workload Medum Workload 69 3 Lght Workload 909 In Fgure 5, 6 shows the dstrbuton of make-span tme and average watng tme of all obs n Grd envronment. Our smulaton defne crossover rate (P c ) s 0.9, mutaton rate (P m ) s 0.9 and 500 generaton wth each type of workload.

6 Proceedngs of the Postgraduate Annual esearch Semnar Ftness.00E E E E E E E Generaton Fgure 5: Make-span tme of all obs n Grd envronment Ftness.60E+04.40E+04.0E+04.00E E E E+03.00E E Generaton Fgure 6: Average watng tme of all obs n Grd envronment V. COCLUSIOS We have studed the ob schedulng problem for Grd envronment as a combnatoral predcton and optmzaton. Several observatons are lsted;. We have proposed the ntellgence schedulng n Grd envronment and used t for our algorthm by usng Fuzzy C-Mean clusterng technque for predctng three classfcatons and GA algorthm for allocate them to dfferent Grd nodes.. We used obs workload from the Standard Workload Archve [8] on space-sharng mechansm. We show that the proposed model captures the obs characterzaton of real workload n three dfferent classfcatons. Ths model can be used to our algorthm for smulatng and evaluatng schedulng polces for smulatng Grd envronment. 3. Many schedulng algorthms for Grd envronment depend on statc nformaton provded by the Standard Workload Archve [8] and the dfferent performance nodes n Grd, smulatng by us. We observe that ths nformaton s often unrelable. However, t s a useful way. 4. The eperment results on the make-span tme and average watng tme have shown that the schedulng system usng our algorthm can allocate the optmal results. 5. The results provded here suggest that the researcher look forward to new method for such problems should consder combne them wth ther method. VI. FUTUE WOK The followng wll be our future works;. Our smulaton envronment wll nclude crtcal parameters, such as submt tme, Grd network cost, ob mgratons overhead, faults tolerance.. We plan to nvestgate the swarm ntellgence mechansm, such as ant colony algorthm for Grd schedulng. 3. We wll nclude a more comple characterzaton of the constrants for Grd schedulng and wll mprove the complety problems n Grd envronment. VII. ACKOWLEDGMET We would lke to thank Suan Dust aabhat Unversty for supported us. EFEECES [] Foster I, Kesselman C, Tuecke S. The anatomy of the Grd: Enablng scalable vrtual organzatons. Internatonal Journal of Supercomputer Applcatons 00. [] I. Foster and C. Kesselman, Eds., The Grd : Blueprnt for a ew Computng Infrastructure. San Francsco, CA: Morgan Kaufmann, 004. [3] M. Parashar, J. Browne, Conceptual and Implementaton Models for the Grd. [4] Global Grd Forum [Onlne]. Avalable: [5] ArshadAl, Ashq Anum, Julan Bunn,. Cavanaugh, Frank van Lngen,. McClatchey, Muhammad Atf Mehmood, H. ewman, C. Steenberg, M. Thomas, I. Wllers. PEDICTIG THE ESOUCE EQUIEMETS OF A JOB SUBMISSIO. [6] Y. Gao, H. ong, Joshua Zheue Huang. Adaptve grd ob schedulng wth genetc algorthms. [7] Downey, A, Predctng Queue Tmes on Space-Sharng Parallel Computers, n Internatonal Parallel Processng Symposum, 997. [8] Gbbons,., A Hstorcal Profler for Use by Parallel Schedulers, Master's thess, Unversty of Toronto, 997. [9] Vraalsen, F.,.A. Aydt, C.L. Mendes, and D.A. eed, Performance Contracts: Predctng and Montorng Grd Applcaton Behavor, GID 00: Proceedngs of the Second Internatonal Workshop on Grd Computng, 00. [0] MEHA, P., SCHULBACH, C. H., AD YA, J. C. A Comparson of Two Model-Based Performance-Predcton Technques for Message-Passng Parallel Programs. In Proceedngs of the ACM Conference on Measurement & Modelng of Computer Systems - SIGMETICS 94 (ashvlle, May 994), pp [] SAAVEDA-BAEA,. H., SMITH, A. J., AD MIYA, E. Performance predcton by benchmark and machne characterzaton. IEEE Transactons on Computers 38, (December 989), [] KAPADIA,., FOTES, J., AD BODLEY, C. Predctve Applcaton-Performance Modelng n a Computatonal Grd Envronment. In Proceedngs of the Eght IEEE Symposum on Hgh- Performance Dstrbuted Computng (edondo Beach, Calforna, August 999), pp [3] PETITET, A., BLACKFOD, S., J.DOGAA, ELLIS, B., FAGG, G., OCHE, K., AD VADHIYA, S. umercal Lbrares and The Grd: The GrADS Eperments wth ScaLAPACK. Tech. ep. UT- CS-0-460, Unversty of Tennessee, Aprl 00. [4] Parallel Workloads Archve. labs/parallel/workload/, Jun 004. [5] J. Bezkek, Pattern ecognton wth Fuzzy Obectve Functon Algorthms, Plenum Press, USA, 98. [6] J.C. Dunn, A Fuzzy elatve of the ISODATA process and ts Use n Detectng Compact, Well Separated Clusters, Journal of Cybernetcs, Vol. 3, o.3, 974, pp

7 Proceedngs of the Postgraduate Annual esearch Semnar [7] A Tutoral on Clusterng Algorthms: Fuzzy C-Means Clusterng Clusterng/tutoral_html/nde.html. [8] J. Jann, P. Pattnak, H. Franke, F. Wang, J. Skovra, and J. ordan. Modelng of Workload n MPPs. In D. Fetelson and L. udolph, edtors, IPPS 97 Workshop: Job Schedulng Strateges for Parallel Processng, pages Sprnger Verlag, Lecture otes n Computer Scence LCS 9, 997. [9] C. Ernemann, B. Song, and. Yahyapour. Scalng of Workload Traces. In D. G. Fetelson, L. udolph, and U. Schwegelshohn, edtors, Job Schedulng Strateges for Parallel Processng: 9th Internatonal Workshop, JSSPP 003 Seattle, WA, USA, June 4, 003, volume 86 of Lecture otes n Computer Scence (LCS), pages Sprnger-Verlag Hedelberg, October 003. [0] J. Krallmann, U. Schwegelshohn, and. Yahyapour. On the Desgn and Evaluaton of Job Schedulng Systems. In D. G. Fetelson and L. udolph, edtors, IPPS/SPDP 99 Workshop: Job Schedulng Strateges for Parallel Processng. Sprnger, Berln, Lecture otes n Computer Scence, LCS 659, 999.