Performance Evaluation of Selected Job Management Systems


1 Performance Evaluation of Selected Job Management Systems
Kris Gaj 1, Tarek El-Ghazawi 2, Nikitas Alexandridis 2, Frederic Vroman 2, Nguyen Nguyen 1, Jacek R. Radzikowski 1, Preeyapong Samipagdi 2, and Suboh A. Suboh 2
1 George Mason University, 2 The George Washington University
kgaj@gmu.edu, tarek@seas.gwu.edu, alexan@seas.gwu.edu

2 Our Testbed
Machines in the science.gmu.edu and gmu.edu domains:
alicja: Solaris 8, UltraSparc IIi, 360 MHz, 512 MB RAM
palpc2: Linux, PII, 400 MHz, 128 MB RAM
pallj / m0: Linux RH 7.0, PIII, 450 MHz, 512 MB RAM
anna: Solaris 8, UltraSparc IIi, 440 MHz, 128 MB RAM
magdalena: Solaris 8, UltraSparc IIi, 440 MHz, 512 MB RAM
m1 to m4: 4 x Linux RH 6.2, 2 x PIII, 500 MHz, 128 MB RAM
m5 to m7: 3 x Linux RH 6.2, 2 x PIII, 450 MHz, 128 MB RAM
redfox: Solaris 8, UltraSparc IIi, 330 MHz, 128 MB RAM

3 NSA HPC Benchmarks
Classes of benchmarks: NAS Parallel Benchmarks, UPC Benchmarks, Cryptographic Benchmarks.

4 Typical experiment
Jobs 1..N are submitted with pseudorandom delays between consecutive job submissions, following a Poisson distribution of the job submission rate.
N = 150 for medium and small jobs, 75 for long jobs.
Total time of an experiment: 2 hours.
(Timeline figure: job submissions 1..N and jobs finishing execution over the course of the experiment.)
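Below is a minimal sketch, in C, of how such pseudorandom submission delays could be generated: exponentially distributed inter-arrival times give a Poisson arrival process at the chosen average rate. The 15 s mean interval and N = 150 are values taken from the experiment list; the program itself is an illustration, not the authors' actual submission tool.

    /* Sketch: generate pseudorandom delays between consecutive job submissions
     * so that submissions form a Poisson process with a given average rate.
     * The mean interval and job count below are assumptions from the slides. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <time.h>

    int main(void)
    {
        const double mean_interval = 15.0;  /* average time between submissions [s] */
        const int N = 150;                  /* number of jobs in the experiment */

        srand((unsigned) time(NULL));
        for (int i = 0; i < N; i++) {
            /* Exponentially distributed inter-arrival time: -mean * ln(U), U in (0,1] */
            double u = (rand() + 1.0) / (RAND_MAX + 1.0);
            double delay = -mean_interval * log(u);
            printf("job %3d: wait %.1f s before submission\n", i + 1, delay);
            /* a real script would sleep for 'delay' and then invoke the JMS
             * submission command (e.g. bsub, qsub, or condor_submit) */
        }
        return 0;
    }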

5 List of experiments
Experiment 1: Medium job list; average CPU time per job 7 min 22 s; time intervals between job submissions 30 s, 15 s, 5 s; 150 jobs; one job per CPU.
Experiment 2: Medium job list; average CPU time per job 7 min 22 s; interval 15 s; 150 jobs; two jobs per CPU.
Experiment 3: Long job list; average CPU time per job 16 min 51 s; intervals 2 min, 30 s; 75 jobs; one job per CPU.
Experiment 4: Short job list; average CPU time per job 22 s; intervals 15 s, 10 s, 5 s, 2 s, 1 s; 150 jobs; one job per CPU.
Experiment 5: I/O job list; average CPU time per job 1 min 36 s; interval 15 s; 150 jobs; one job per CPU.

6 Definition of timing parameters
t_s: submission time
t_b: beginning of execution
t_e: end of execution
t_d: delivery time
T_R: response time
T_EXE: execution time
T_D: delivery time
T_TA: turn-around time
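The slide's timing diagram is not reproduced here; the relations below are a reconstruction assuming the standard definitions, consistent with slide 7, where T_D = 0 in the typical scenario and the turn-around time spans submission to delivery.

    T_R    = t_b - t_s                          % response time
    T_{EXE} = t_e - t_b                          % execution time
    T_D    = t_d - t_e                          % delivery time
    T_{TA}  = t_d - t_s = T_R + T_{EXE} + T_D    % turn-around time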

7 Typical scenario
The timestamps t_s (submission), t_b (beginning of execution), and t_e (end of execution) are determined using the gettimeofday() function.
In the typical scenario the delivery time is zero (T_D = 0), so the turn-around time T_TA consists of the response time T_R followed by the execution time T_EXE.
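A minimal sketch of how the three timestamps could be taken with gettimeofday(), as named on the slide; the surrounding program structure and output format are illustrative assumptions, not the authors' measurement code.

    #include <stdio.h>
    #include <sys/time.h>

    /* Current time as seconds since the Epoch, with microsecond resolution. */
    static double timestamp(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        double t_s = timestamp();      /* taken at job submission          */
        /* ... job waits in the JMS queue ... */
        double t_b = timestamp();      /* taken when execution begins      */
        /* ... benchmark runs ... */
        double t_e = timestamp();      /* taken when execution ends        */

        printf("T_R   = %.6f s\n", t_b - t_s);   /* response time          */
        printf("T_EXE = %.6f s\n", t_e - t_b);   /* execution time         */
        printf("T_TA  = %.6f s\n", t_e - t_s);   /* turn-around (T_D = 0)  */
        return 0;
    }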

8

9 Throughput
Throughput(k) = k / T_k
where T_k is the time necessary to execute the first k of the N submitted jobs.

10 Total Throughput
Total Throughput = N / T_N
where T_N is the time necessary to execute all N submitted jobs.
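A short C sketch of how Throughput(k) and the total throughput could be computed from job completion times; the completion-time values (seconds from the start of the experiment, assumed sorted) are made-up numbers for illustration, not measured data.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative completion times [s] from the start of the experiment. */
        double t_complete[] = { 450.0, 520.0, 610.0, 700.0, 820.0 };
        int N = sizeof(t_complete) / sizeof(t_complete[0]);

        for (int k = 1; k <= N; k++) {
            double T_k = t_complete[k - 1];      /* time needed to finish k jobs */
            printf("Throughput(%d) = %.1f jobs/hour\n", k, 3600.0 * k / T_k);
        }
        printf("Total throughput = %.1f jobs/hour\n",
               3600.0 * N / t_complete[N - 1]);
        return 0;
    }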

11

12

13

14 Utilization
For each machine j (j = 1..M), U_avr_j is the average CPU utilization of that machine over the experiment; the instantaneous CPU utilization varies between 0% and 100% as jobs start and finish.
Overall utilization = average CPU utilization across all machines:
U = (1/M) * sum_{j=1..M} U_avr_j
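A minimal sketch computing each machine's average CPU utilization from periodic samples (as could be extracted from the top log files) and then the overall utilization U = (1/M) * sum_j U_avr_j; the sample values and counts are illustrative assumptions.

    #include <stdio.h>

    #define M 3   /* number of machines                      */
    #define S 5   /* utilization samples taken per machine   */

    int main(void)
    {
        /* Sampled CPU utilization in percent for each machine (made-up values). */
        double samples[M][S] = {
            { 90, 100, 100,  80, 95 },
            { 60,  70, 100, 100, 90 },
            { 100, 100, 95,  85, 90 },
        };

        double overall = 0.0;
        for (int j = 0; j < M; j++) {
            double u_avr = 0.0;
            for (int i = 0; i < S; i++)
                u_avr += samples[j][i];
            u_avr /= S;                    /* average utilization of machine j */
            printf("U_avr[%d] = %.1f%%\n", j + 1, u_avr);
            overall += u_avr;
        }
        printf("Overall utilization U = %.1f%%\n", overall / M);
        return 0;
    }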

15 Software used to collect performance measures
Submission script and job submission program: submit jobs to the Job Management System with a given average interval between jobs.
Timing log files: record the submission, beginning of execution, and end of execution of each job.
top log files: record the list of jobs reported by top.
Timing postprocessing program: computes the timing parameters from the timing log files.
Utilization postprocessing program: computes the utilization parameters from the top log files.
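A minimal sketch of the timing-postprocessing step. The log layout assumed here (one line per job: job id, then t_s, t_b, t_e as seconds since the Epoch) and the file name timing.log are hypothetical; the slide does not show the actual log format.

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("timing.log", "r");   /* hypothetical timing log file */
        if (!f) { perror("timing.log"); return 1; }

        int id;
        double t_s, t_b, t_e;
        while (fscanf(f, "%d %lf %lf %lf", &id, &t_s, &t_b, &t_e) == 4) {
            printf("job %d: T_R = %.2f s, T_EXE = %.2f s, T_TA = %.2f s\n",
                   id, t_b - t_s, t_e - t_b, t_e - t_s);
        }
        fclose(f);
        return 0;
    }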

16 Medium jobs, Average Throughput (chart): Throughput [jobs/hour] vs. average job submission rate (2, 4, 12 jobs/min) for LSF, PBS, Codine, and Condor.

17 Medium jobs, Turn-around Time (chart): Turn-around Time [s] vs. average job submission rate (2, 4, 12 jobs/min) for LSF, PBS, Codine, and Condor.

18 Long jobs, Average Throughput (chart): Throughput [jobs/hour] vs. average job submission rate (0.5 and 2 jobs/min) for LSF, PBS, Codine, and Condor.

19 Long jobs, Turn-around Time (chart): Turn-around Time [s] vs. average job submission rate (0.5 and 2 jobs/min) for LSF, PBS, Codine, and Condor.

20 Short jobs, Average Throughput (chart): Throughput [jobs/hour] vs. average job submission rate (4, 6, 12, 30, 60 jobs/min) for LSF, PBS, Codine, and Condor.

21 Short jobs, Turn-around Time (chart): Turn-around Time [s] vs. average job submission rate (4, 6, 12, 30, 60 jobs/min) for LSF, PBS, Codine, and Condor.

22 LSF Small jobs, 60 jobs/min

23 PBS Small jobs, 60 jobs/min

24 Codine Small jobs, 60 jobs/min

25 Condor Small jobs, 60 jobs/min

26 Medium jobs, Total Throughput (chart): Throughput [jobs/hour] vs. maximum number of jobs per CPU (1 job/CPU and 2 jobs/CPU, at 4 jobs/min) for LSF, PBS, Codine, and Condor.

27 Medium jobs, Response Time (chart): Response Time [s] vs. maximum number of jobs per CPU (1 job/CPU and 2 jobs/CPU, at 4 jobs/min) for LSF, PBS, Codine, and Condor.

28 Throughput - Summary (B: BEST, W: WORST; submission rates Low / Medium / High, job sizes Large / Medium / Small)
Large jobs:
  Low rate:    B: LSF, PBS          W: Codine (-14%)
  Medium rate: B: Condor, LSF, PBS  W: Codine (-29%)
Medium jobs:
  Low rate:    B: LSF, PBS          W: Condor (-21%)
  Medium rate: B: LSF, PBS          W: Codine (-11%)
  High rate:   B: PBS, LSF          W: Codine (-17%)
Small jobs:
  Low rate:    B: LSF, Codine       W: Condor (-57%)
  Medium rate: B: LSF, Codine       W: Condor (-69%)
  High rate:   B: LSF, Codine       W: Condor (-71%)

29 Turn-around Time - Summary (B: BEST, W: WORST; submission rates Low / Medium / High, job sizes Large / Medium / Small)
Large jobs:
  Low rate:    B: PBS, LSF      W: Condor, Codine (+78%)
  Medium rate: B: PBS, LSF      W: Codine (+57%)
Medium jobs:
  Low rate:    B: PBS           W: Codine (+31%)
  Medium rate: B: PBS           W: Codine (+37%)
  High rate:   B: PBS           W: Codine (+33%)
Small jobs:
  Low rate:    B: Codine        W: Condor (+72%)
  Medium rate: B: Codine        W: PBS (+119%)
  High rate:   B: PBS, Codine   W: LSF (+275%)

30 Summary
The best performance in the experimental study:
In terms of throughput: LSF the best for all kinds of jobs and submission rates.
In terms of turn-around time: PBS the best for large and medium jobs; Codine the best for small jobs.
The worst performance in the experimental study:
In terms of throughput: Codine the worst for medium and large jobs; Condor the worst for small jobs.
In terms of turn-around time: Codine the worst for medium and large jobs; LSF the worst for small jobs at high submission rates.