Availability based Scoring Model for Resource Grouping in Desktop Grid Computing

Size: px
Start display at page:

Download "Availability based Scoring Model for Resource Grouping in Desktop Grid Computing"

Transcription

1 Indan Journal of Scence and Technology, Vol 8(30), DOI: /jst/2015/v830/87393, November 2015 ISSN (Prnt) : ISSN (Onlne) : Avalablty based Scorng Model for Resource Groupng n Desktop Grd Computng Soodeh Peyvand 1*, Rohza Ahmad 2 and M. Nordn Zakara 1 1 Hgh Performance Computng (HPC) Center, Unverst Teknolog PETRONAS, Tronoh, Perak, Malaysa; soodehpeyvand@gmal.com, 2 Department of Computer and Informaton Scences, Unverst Teknolog PETRONAS, Tronoh, Perak, Malaysa Abstract Many scentfc projects such as SETI@home are leveragng on enormous numbers of dstrbuted nternet-based dle desktop computers to execute ther jobs n a Desktop Grd (DG) computng envronment. These dle desktop computers (volunteers) are volatle,.e., they are free to jon and leave the DG systems. Besdes, the volunteers mght also have dfferent resources whch are heterogeneous n characterstcs and capabltes. Due to the above facts, volunteers n a DG system have dfferent levels of relablty and avalablty whch makes schedulng of jobs more challengng. Thus, ths paper presents a groupng method whch assgns these heterogeneous volatle volunteers nto dfferent levels of relablty groups, accordng to ther avalablty score. In order to derve the avalablty score model for the groupng purposes, ths study uses statstcal analyss of real trace data whch are taken from SETI@home project. By havng the proper groupng, better assgnment of jobs can be acheved n a DG system. Smulaton runs of the approach shows that t s feasble to volunteers are able to be grouped usng the scorng model. Keywords: Avalablty Scorng, Desktop Grd Computng, Groupng, Volatlty 1. Introducton Desktop grd computng systems use computng resources of dle desktop computers smultaneously n order to solve problems of scentfc and engneerng applcatons 1. Ths s acheved by breakng enormous problems nto smaller ndependent unts (tasks) and dstrbutng them to some avalable dle desktop computers for concurrent processng 2 6. The partcpatng desktop computers, whch are also known as volunteers or hosts, have dverse characterstcs n ther resources such as the number of CPU cores and the amount of memory space avalable. Besdes, the volunteers are also free to jon and leave the executon of tasks wthout any restrcton. Hence, volunteers n DG systems have dfferent behavor n executon such as dfferent partcpaton tme, avalablty and volatlty. Varaton of hardware and behavor of volunteers make t dffcult for a resource manager and job scheduler n the DG system to schedule tasks as well as control the allocated tasks and volunteers 7,8. Consequently, ths has led to the delay of jobs executon and also the degradaton of the entre DG s performance. In order to acheve the goals of resource management and job schedulng, avalablty of volatle volunteers s one of the key ssues whch need to be addressed n DG systems. Hence, a more relable resource management and schedulng polcy can be acheved by obtanng some predcted avalablty nformaton of the resources nvolved n the DG systems 9. To understand more about the avalablty of volatle volunteers n DG systems, characterstcs of trace data wth regards to avalablty would have to be analyzed 10. For that, statstcal analyss s one technque whch can be used to analyze the data and predct avalablty of the volatle volunteers 11. The technque makes predctons by analyzng past observatons of the volunteers. Hence, determnaton of smlartes among the past observatons and somehow relate them to the *Author for correspondence

2 Avalablty based Scorng Model for Resource Groupng n Desktop Grd Computng current stuaton are a major challenge n conductng statstcal analyss on the DG systems. Besdes avalablty, relablty s another concern of DG schedulng. Relablty s defned as the probablty that a DG system or scheduler performed ts task n certan condtons durng a specfed perod of tme 12. To mprove relablty of DG systems, methods such as reputaton or score based schedulng can be appled The method bascally chooses the most sutable volunteer (based on ts reputaton) to perform specfc job assgnment. Furthermore, resource/volunteer groupng can also be mplemented as a method for mprovng schedulng effcency n term of relablty 15,16. The method classfes heterogeneous resources nto homogeneous groups based on certan crtera By havng done so, jobs can be assgned to proper group for hgher chances of successful executon. Ths paper presents our dea of constructng groups of volunteers accordng to ther avalablty score by usng smple groupng method. Ultmately, each group wll represent a dfferent level of relablty n a DG system. To elaborate the dea, ths paper s organzed n the followng manner: secton 2 begns wth related works, secton 3 contnues wth the presentaton of an avalablty scorng model, secton 4 provdes a technque whch can be used for groupng the volunteers, secton 5 presents and dscusses the results of our work, and Secton 6 ends the paper wth concluson and future works. 2. Related Works In order to classfy the volatle and heterogeneous volunteers n desktop grd computng system, one should study the dynamc nature of volunteers wth uncertan avalablty. Therefore, one way to get more knowledge from volatle volunteers s by analysng trace data set n DG envronment regards to volunteers avalablty 9,10. Cho et al. 14 used the groupng method to construct groups of volunteers accordng to ther attrbutes such as dedcaton, volatlty, host avalablty and credblty. The authors dd not elaborate other state propertes such as tme zone, RAM sze and CPU avalablty nterval. Some exstng DG systems provde statc based groupng accordng to dsk space or operatng system 2, Furthermore, Neary et al. 20, Chakravart et al. 21, Zhou et al. 22, Montresor et al. 23 and Zhong et al. 24 proposed dynamc based groupng n terms of regstraton tme, tme zone, performance and workload rrespectve avalablty, reputaton or score, duraton of partcpaton and locaton whch are effectve on completon tme, relablty and result correctness. In concluson, due to the volatlty of DG s volunteers, both dynamc and statc nformaton of volunteers are mportant n order to ensure hgher chance of the job beng completed. Ths nformaton can be combned together as hybrd based groupng whch ncluded both dynamc nformaton of volunteers such as avalablty duraton (hstorcal nformaton) and statc nformaton such as tme zone, hardware capacty and so on. 3. Avalablty Scorng Model Before the volatle and heterogeneous volunteers n a DG system can be assgned to varous relablty groups, the crtera or factors whch can be used to classfy them wll have to be determned frst. In ths study, these factors were derved from some statstcal analyses whch have been performed on real trace data. The real trace data set used was downloaded from edu.au; and the data collected were related to a project named SETI@home. Snce, the study was amed to fnd nformaton on avalablty, as mentoned n our prevous publshed work 25, the trace data collected were focusng manly on CPU avalablty of the volunteers nvolved n the project for the perod of 8 months (Aprl 1, 2007 to February 1, 2008). In the paper, we have proved usng Spearman correlaton method, Pearson s ch-square and Kruskall-Walls test that the avalablty of a volunteer has an assocaton and relaton wth some factors of the volunteer. The factors dentfed were average CPU avalablty (ACPUA), RAM sze, number of processors or CPU cores, and tme zone and locaton of the volunteers 25. For each of the dentfed factors, Frequency Occurrence of Host/volunteer (FoH) was calculated for each possble value and range of each factor. For example, out of the volunteers nvolved n the SETI@home project, the numbers of them whch have 1 CPU core, 2 CPU cores and so on were calculated. To lessen the gaps between the resultng values, a weght fracton between 0 to 1 was assgned to each range. The value was derved by dvdng the range s FoH by the total number of hosts (ToH) as shown n Equaton (1). WeghtFrac ton = FoH ToH (1) 2 Vol 8 (30) November Indan Journal of Scence and Technology

3 Soodeh Peyvand, Rohza Ahmad and M. Nordn Zakara In addton, the ranges for each factor were ranked accordng to ther executon capablty. For example, a bgger RAM sze would be consdered better as compared to a smaller RAM sze, and therefore has a hgher rank. Smlarly, hgher ACPUA s more relable n terms of avalablty as compared to a lower value. Thus, t should be ranked hgher. For the rankng determnaton, the values were calculated usng Rank Sum method as n Equaton (2). The equaton calculates the weght W k for range k where n s the number of ranges 26. n + 1 k (2) W = k n = 1 n + 1 To further lead to the formulaton of the avalablty scorng model, the derved factors above were grouped accordng to ther dynamcty. In other words, ACPUA s consdered as dynamc factor snce ts value changes over tme. RAM sze and number of CPU cores on the other hand, are farly statc, hence they were dentfed as statc factors. Meanwhle the dynamc factor, statc factor as well as locaton and tme zone when combned together were consdered as hybrd factors. Usng the dynamc factor,.e., ACPUA value of a volunteer, the avalablty weght of volunteer at tme t was calculated usng Equaton (3). The equaton gves the rankng weght of the ACPUA nterval that was extracted from the trace data of volunteer from tme 0 untl tme t. A = RWA (3) Furthermore, the statc factors,.e., RAM sze and CPU cores, were combned together nto the calculaton of computaton power for a volunteer. The formula gven n Equaton (4) produces the rankng weght of RAM sze and number of processors of volunteer at tme t, where RWP s the rankng weght of CPU and RWR s the rankng weght of RAM sze. Both values have been derved from earler Equaton (2). CPU cores (WFP), weght fracton of RAM sze (WFR), weght fracton of combnaton of tme zone and locaton (WFE), and weght fracton of ACPUA (WFA) were all ncluded n the formula. F = WFP WFR WFE WFA (5) Consequently, after each of the values n fgure 1 have been derved usng Equaton (3), Equaton (4) and Equaton (5) respectvely, Equaton (6) combnes them nto a mathematcal hybrd equaton named as Avalablty Scorng Model (ASM). S = A F C (6) + + The model says that avalablty of volunteer at tme t s the weghted sum of dynamc factor A wth hybrd factor F and statc factor C. The model modfes our prelmnary model whch was publshed n Groupng Volunteers Once the avalablty scorng model has been formulated, each volunteer s avalablty can now be predcted. Hence, resource/volunteer groupng can organze volunteers wth smlar weght n terms of ther RAM sze, number of processors, ACPUA, locaton and tme zone nto the same group. Ths secton defnes the groupng approach used to group avalable hosts at tme t nto four groups accordng to the avalablty scorng model. In order to categorze the avalable volunteers, we adopted the two levels classfcaton approach whch was reported n 15. In the frst level, volunteers were classfed nto four classes accordng to F and C as shown n Fgure 2. The class FC1 s a set of volunteers that have hgh computaton power and hgh weght fracton, whle the class FC2 s a set of volunteers wth low computaton power and hgh weght fracton. Meanwhle, the class FC3 Computaton Power (C) C RWP RWR = (4) + The weght fracton of volunteer at tme t s another value that needs to be calculated. The value whch can be derved usng Equaton (5) comprses both the dynamc and statc factors of the volunteer. In other words, the equaton sums up all weght fractons whch have been calculated usng Equaton (1). Hence, weght fracton of Avalablty Weght (A) Weght Fracton (F) Fgure 1. Dynamc, statc and hybrd factors of avalablty scorng model n DG system. Vol 8 (30) November Indan Journal of Scence and Technology 3

4 Avalablty based Scorng Model for Resource Groupng n Desktop Grd Computng Weght Fracton (F) Avalablty Weght (A) FC2 FC1 AFC2 AFC1 FC4 FC3 AFC4 AFC3 Fgure 2. C. Computaton Power (C) Volunteers groupng accordng to F and Fgure 4. and C. Computng Power & Weght Fracton (FC) Volunteers groupng accordng to A, F s a set of volunteers whch have hgh computaton power and low weght fracton, and the class FC4 ncludes the volunteers wth low computaton power and low weght fracton. Fgure 3 shows the groupng algorthm for frst level of volunteers classfcaton. In the second level, the classfed volunteers are further organzed nto four groups accordng to dynamc factor A as shown n fgure 4. The AFC1 group keeps volunteers wth hgh avalablty fracton, hgh weght fracton and hgh computaton power. Therefore, ths Classfy n Level 1; If (F hgh) then If (C = hgh) then = FC1 Else = FC2 Else If (C = FC3 Else = FC4 //Fnsh level 1 hgh) then Fgure 3. Algorthm for volunteers groupng constructon accordng to F and C group s consdered as hgh relablty level group. The AFC2 group ncludes volunteers wth low computaton power, but hgh weght fracton and hgh avalablty weght. Hence, ths group s as medum relablty level group. Furthermore, the AFC3 group nvolves volunteers wth low avalablty weght and low weght fracton, but hgh computaton power. Ths group s called low relablty level group. Fnally, the AFC4 group comprses the volunteers wth low computaton power, low weght fracton and low avalablty weght. Therefore, ths group s least relable group and was addressed as very low relablty level group. The rankng of the groups can be seen n Table 1 where the second column shows the rank order of each group and the thrd column shows the rankng weght value (RW ) or relablty level of group. The rankng weght values were derved usng the same Equaton (2). The volunteer s groups were constructed usng the algorthm of groupng as shown n Fgure 5. In the algorthm, we can see that the volunteers are assgned nto ts group accordng to ther F, C and A scores whch can be ether hgh or low. Furthermore, hgh value and low value n groupng algorthm wll be derved by expermental results whch wll be dscussed n the next secton. 5. Result and Dscusson In order to carry out the volunteers groupng constructon algorthm n prevous Fgure 4, hgh, mddle and low values for average avalablty, weght fracton and computng power should be defned frst. Ths was accomplshed through smulaton and testng. In order to smulate the groupng algorthm, the hgh and low bounded values for A, F and C were ntally assumed to be the maxmum and mnmum values whch were derved from statstcal analyss that we have done n 4 Vol 8 (30) November Indan Journal of Scence and Technology

5 Soodeh Peyvand, Rohza Ahmad and M. Nordn Zakara Classfy n Level 2; If ( FC1) then If (A = hgh) then =AFC1 Else If ( If (A = low) then =AFC3 Else If ( If (A = hgh) then =AFC2 Else =AFC4 //Fnsh level 2 FC3) then FC2) then Fgure 5. Algorthm of volunteers groupng constructon accordng to A, F and C. our prelmnary works n 25,27. The avalablty weght was assumed to be zero due to the lack of avalablty hstory n the frst smulaton run. Based on our works reported n 25,27, the maxmum value of RWP s and the maxmum value of RWR s equal to Hence, usng Equaton (4), the maxmum value for C wll be (nearly 0.5). Moreover, maxmum values of WFR, WFP, WFA and WFE are respectvely: , , and Hence, maxmum value of F wll be when calculated usng Equaton (5). Table 1. Level) Rankng of Volunteer s Group (Relablty Group of Volunteer Rank Order Rank Weght (RW ) AFC AFC AFC AFC Furthermore, the groupng algorthm also assumed that 1.9 s the upper bound and a value whch s above zero but lesser than the upper bound as the lower bound of F. As for C, the upper bound s 0.5 and the lower bound should be above zero but lesser than the upper bound. Then, mddle bound values for F and C should be extracted from smulaton results. The reason for ths s to fnd the most balanced dstrbuton of groups. Table 2 dsplays dfferent testng results of groupng by varous mddle bound values. These testng assumed dfferent number of avalable volunteers at tme t. In the frst smulaton run,.e., case 1, 99 volunteers were assumed to be avalable n the DG system, whle n the second run,.e., case 2, 3997 volunteers were assumed avalable. As for the thrd run,.e., case 3, volunteers were consdered. In order to dscover the best mddle bound values for F and C, the rato of volunteer s number n each group should be almost balanced n all of the dfferent cases. For the smulaton, the testng started wth the ntal mddle bound for F and C to be the mddle value between ther upper and lower bound. The next subsequent tests decreased the mddle bound of ether one of F or C untl one of the group was assgned zero host. In ths case, another run wll be done wth mddle of C s decreased. Ths s to see f a dfferent value can make the group nonzero. If the result s stll the same then the smulaton wll stop. Then, we wll check backwards for the most balanced groupng assgnment. Ths wll help us to choose F and C values to fnd mddle bound for both. Therefore, across three cases of smulaton, the mddle bound values for F and C were derved through 9 tests n case 1, 8 tests for case 2 and case 3. All cases stopped decreasng the mddle value of F when the value was 0.4; also check the dfferent two values for mddle bounded of C (0.2 and 0.15). Therefore, the best results are derved by 0.15 for mddle bound of C and 0.6 as mddle bound of F n term of balanced groups. At ths tme, groupng s algorthm s fnalzed n order to organze volunteers n sutable group. 6. Concluson and Future Works Ths paper shows the scored-based-groupng algorthm whch uses the mathematcal avalablty scorng model based on certan factors of volunteers n order to classfy them nto four dfferent groups. These groups have dfferent relablty level. Usng smulatons, t can be seen that a Vol 8 (30) November Indan Journal of Scence and Technology 5

6 Avalablty based Scorng Model for Resource Groupng n Desktop Grd Computng Table 2. Groupng results by dfferent values of lmtaton for F and C crtera Mddle bound of F Mddle bound of C AFC1 AFC2 AFC3 AFC4 Case1 Test Test Test Test Test Test Test Test Test Case2 Test Test Test Test Test Test Test Test Case3 Test Test Test Test Test Test Test Test balanced dstrbuton of volunteers can be acheved usng the scorng model and the groupng approach proposed. As future work, the relablty level groupng wll be used n order to propose a relable-job-schedulng polcy n DG envronment to mprove performance of the DG systems. 7. References 1. Balaton Z, Farkas Z, Gombas G, Kacsuk P, Lovas R, Maros AC, et al. EDGeS: the common boundary between servce and desktop grds. Parallel Processng Letters. 2008; 18(03): Fedak G, German C, Ner V, Cappello F. Xtremweb: A generc global computng system Proceedngs Frst IEEE/ACM Internatonal Symposum on Cluster Computng and the Grd; IEEE; Sarmenta LF. Volunteer computng: Cteseer; Kondo D, Taufer M, Brooks CL, Casanova H, Chen AA. Characterzng and evaluatng desktop grds: An emprcal study. 18th Internatonal IEEE Proceedngs of Parallel and Dstrbuted Processng Symposum; Desell T, Magdon-Ismal M, Szymansk B, Varela CA, Newberg H, Anderson DP. Valdatng evolutonary algorthms on volunteer computng grds. Dstrbuted applcatons and nteroperable systems, Sprnger Kan W, Masaru F, Mchtaka K. Adaptve Group-Based Job Schedulng for Hgh Performance and Relable Volunteer Computng. 2011; 52(2): Cho S, Km H, Byun E, Hwang C. A taxonomy of desktop grd systems focusng on schedulng. Department of Computer Scence and Engeerng, Korea Unversty, Tech Rep KU-CSE Foster I, Kesselman C. The Grd 2: Blueprnt for a new computng nfrastructure. Elsever; Vol 8 (30) November Indan Journal of Scence and Technology

7 Soodeh Peyvand, Rohza Ahmad and M. Nordn Zakara 9. Kondo D, Fedak G, Cappello F, Chen AA, Casanova H. Characterzng resource avalablty n enterprse desktop grds. Future Generaton Computer Systems. 2007; 23(7): Heen EM, Anderson DP, Haghara K. Computng low latency batches wth unrelable workers n volunteer computng envronments. J Grd Computng. 2009; 7(4): Lerda JL, Solsona F, Hernandez P, Gne F, Hanzch M, Conde J. State-based predctons wth self-correcton on Enterprse Desktop Grd envronments. Journal of Parallel and Dstrbuted Computng. 2013; 73(6): Pham H. Recent studes n software relablty engneerng. Sprnger; Cho S, Bak M, Gl J, Park C, Jung S, Hwang C. Dynamc schedulng mechansm for result certfcaton n peer to peer grd computng. Grd and Cooperatve Computng- GCC 2005: Sprnger; p Cho S, Bak M, Gl J, Jung S, Hwang C. Adaptve group schedulng mechansm usng moble agents n peer-to-peer grd computng envronment. Appled Intellgence. 2006; 25(2): Cho S, Bak M, Gl J, Park C, Jung S, Hwang C. Groupbased dynamc computatonal replcaton mechansm n peer-to-peer grd computng CCGRID 06 Sxth IEEE Internatonal Symposum on Cluster Computng and the Grd; IEEE; Venugopal S, Buyya R, Ramamohanarao K. A taxonomy of data grds for dstrbuted data sharng, management, and processng. ACM Computng Surveys (CSUR). 2006; 38(1): Anderson DP. Bonc: A system for publc-resource computng and storage Proceedngs Ffth IEEE/ACM Internatonal Workshop on Grd Computng; IEEE; Sarmenta LF, Hrano S. Bayanhan: Buldng and studyng web-based volunteer computng systems usng Java. Future Generaton Computer Systems. 1999; 15(5): Nsan N, London S, Regev O, Camel N. Globally dstrbuted computaton over the nternet-the popcorn project IEEE Proceedngs 18th Internatonal Conference on Dstrbuted Computng Systems; Neary MO, Chrstansen BO, Cappello P, Schauser KE. Javeln: Parallel computng on the nternet. Future Generaton Computer Systems. 1999; 15(5): Chakravart AJ, Baumgartner G, Laura M. The organc grd: self-organzng computaton on a peer-to-peer network. IEEE Transactons on Systems, Man and Cybernetcs, Part A: Systems and Humans. 2005; 35(3): Zhou D, Lo V. Cluster computng on the fly: resource dscovery n a cycle sharng peer-to-peer system CCGrd 2004 IEEE Internatonal Symposum on Cluster Computng and the Grd; IEEE; Montresor A, Melng H, Babaoglu O. Messor: Loadbalancng through a swarm of autonomous agents. Agents and Peer-to-Peer Computng: Sprnger; p Zhong L, Wen D, Mng ZW, Peng Z. Paradropper: A general-purpose global computng envronment bult on peer-to-peer overlay network IEEE Proceedngs 23rd Internatonal Conference on Dstrbuted Computng Systems Workshops; Peyvand S, Ahmad R, Zakara MN. Assocaton among ndependent and dependent factors of host n volunteer computng IEEE Internatonal Conference on Computer and Informaton Scences (ICCOINS); Wllam H, Kruskal WAW. Use of ranks n one-crteron varance analyss. Journal of the Amercan Statstcal Assocaton. 1952; 47(260): Peyvand S, Ahmad R, Zakara Mn. Scorng model for avalablty of volatle hosts n volunteer computng envronment. Journal of Theoretcal and Appled Informaton Technology. 2014; 70(2). Vol 8 (30) November Indan Journal of Scence and Technology 7