Micro-Virtualization. Maximize processing power use and improve system/energy efficiency


1 Micro-Virtualization Maximize processing power use and improve system/energy efficiency

2 Disclaimers: We don't know everything, but we know there is a problem and we're solving (at least part of) it. And we are willing to take input!

3 What we do: Increase throughput, ensure service level agreements, enhance reliability, improve efficiency, and allow you to steer system behaviors.

4 What we don't do: Accelerate processor or memory speeds (or anything else), or make you rewrite applications.

5 Age of concurrency: tablets, mobile devices, servers. The dilemma: how to safely run many apps?

6 More and more cores! Xeon, MIC/Xeon Phi, ARM, NVIDIA, Calxeda.

7 Parallelism & resource management: Parallelism alone can't solve the problem; most parallelized apps use only a fraction of a system's processing resources. Better OS-level resource management is needed to fit many concurrent tasks together.

8 Macro resource management: Macro-schedulers decide when jobs run and where they run (which node), shaped by 20 years of trial and error, but they provide no OS-level control over timings, conflicts, or priorities.

9 Micro resource management: The micro-scheduler manages the cluster-in-a-box, i.e. how jobs run within an OS instance: real-time, need-based scheduling onto cores and memory; preventing multithreaded apps from monopolizing resources; fine-grained control over timings, conflicts, and priorities.
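The deck does not show how this control is implemented, but the kind of decision a micro-scheduler makes can be sketched with standard Linux interfaces. A minimal illustration, assuming already-running job PIDs and using the stock CPU-affinity syscall rather than anything MCOpt-specific:

```python
# Illustrative sketch only (not exludus code): confine each job to a
# disjoint slice of the node's logical CPUs, so that one multithreaded
# job cannot monopolize every core.
import os

def partition_cores(job_pids, cores_per_job):
    """Assign each job PID its own slice of the machine's logical CPUs."""
    all_cpus = sorted(os.sched_getaffinity(0))      # CPUs visible to this process
    for i, pid in enumerate(job_pids):
        start = i * cores_per_job
        cpu_slice = all_cpus[start:start + cores_per_job]
        if not cpu_slice:
            raise RuntimeError("more jobs than available core slices")
        os.sched_setaffinity(pid, cpu_slice)         # pin the job to its slice
        print(f"job {pid} confined to CPUs {cpu_slice}")

# Example: three already-running jobs, four cores each (PIDs are hypothetical).
# partition_cores([1201, 1202, 1203], cores_per_job=4)
```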

10 Efficiency? User resource hints inject inefficiency: leave headroom to buffer problems. User/app conflict avoidance: isolate and leave resources underused. No effective kernel-level prioritization: isolate and leave resources underused. A large-job/small-job mix can create problems: don't mix jobs, leave resources underused. OOM killer: lose work (a server crash may be the best case).

11 Micro-Virtualization: Maximize system utilization and throughput via new kernel resource-management intelligence. Lightweight, real-time application/system monitoring and control safely loads the system to get more work done in less time, prioritized by organizational objectives; non-intrusive and efficient.

12 Micro-Virtualization™ Product Suite: intelligent resource management. MCOpt Multicore Manager optimizes throughput: it analyzes conditions, synchronizes allocations, prevents core and memory oversubscription, prioritizes work based on individual job value, reduces job runtimes, improves reliability, and improves energy efficiency. Dynamic Project Containers guarantee service levels: policy-based core and memory project shares, dynamic resource allocations, maximized utilization, and prevention of ill-behaving tasks from degrading the performance of other tasks. JobProfiler provides resource reporting: fine-grained resource-consumption data, actual memory profiles plotted for each task, identification of each file accessed by each task, and improved compliance reporting.

13 Easy implementation: The workload manager (LSF/GE/PBS, etc.) inserts an environment variable at the cluster-management layer; no changes are needed to any existing resource hints. On each processing node the MCOpt kernel module sits alongside the operating system, beneath the applications. This makes for a simple, low-risk implementation: a loadable module that is easily installed, transparent to applications, the OS, and WLMs (no changes needed), with little administrative learning curve.
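The slides state only that the workload manager inserts an environment variable and that nothing else changes. A hypothetical job-starter wrapper along those lines is sketched below; the variable name MCOPT_PROJECT is an assumption for illustration, not a documented setting.

```python
# Hypothetical job-starter wrapper: the WLM (LSF/GE/PBS) can invoke this in
# place of the user command so that one environment variable is injected
# before the application starts, leaving all existing resource hints intact.
import os
import sys

def main():
    if len(sys.argv) < 2:
        sys.exit("usage: mcopt_starter.py <command> [args...]")
    env = dict(os.environ)                      # keep every existing hint as-is
    env.setdefault("MCOPT_PROJECT", "default")  # hypothetical project tag
    os.execvpe(sys.argv[1], sys.argv[1:], env)  # replace ourselves with the job

if __name__ == "__main__":
    main()
```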

14 Dynamic Project Containers: policy-based system shares = guaranteed SLAs. Cores and memory become a central resource pool, dynamically allocated when a defined project enters the system: from a default 100% share of total processing capacity, shares shift as projects arrive (e.g. default 65%, Project A 10%, Project B 25%; then default 15%, Project A 10%, Project B 25%, Project C 50%). Important projects predictably get the resources they need, with no hypervisor overhead or complex administration.
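As a rough analogy (not the MCOpt kernel module itself), the effect of policy-based project shares can be approximated with stock cgroup v2 CPU weights. The sketch below mirrors the 15/10/25/50 split from the slide and assumes cgroup v2 is mounted at the usual path.

```python
# Analogy using stock cgroup v2: give each defined project a CPU weight
# proportional to its policy share, so the kernel scheduler divides the
# core pool accordingly whenever those projects are active.
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")   # assumes cgroup v2 mounted here

def apply_project_shares(shares):
    """shares: mapping of project name -> percentage of the machine."""
    for project, pct in shares.items():
        cg = CGROUP_ROOT / project
        cg.mkdir(exist_ok=True)
        # cpu.weight accepts 1..10000; scale percentages onto that range.
        (cg / "cpu.weight").write_text(str(max(1, pct * 100)))

# Example mirroring the slide: default 15%, A 10%, B 25%, C 50%.
# apply_project_shares({"default": 15, "projectA": 10,
#                       "projectB": 25, "projectC": 50})
```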

15 User SLAs: Four jobs execute on a 16 GB system with an expected 25% memory usage per job, and one job exceeds its expected usage. Under Linux: all jobs are negatively impacted (30-50% degradation), the well-behaved jobs are each penalized differently, and re-runs exhibit different penalty behaviors. Under Dynamic Project Containers: only the ill-behaved job is negatively impacted (and it actually does better than under Linux because there is less contention for swap), and re-runs exhibit repeatable, predictable behavior.
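The containment idea behind this comparison can be sketched with stock cgroup v2 memory controls (again, an analogy rather than the product's mechanism): each job gets a soft cap at its expected share, so an overrunning job is reclaimed against its own budget instead of dragging the other three into swap.

```python
# Sketch of per-job memory containment with cgroup v2: one group per job,
# memory.high set to the job's expected share.
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")               # assumes cgroup v2
GIB = 1024 ** 3

def cap_job(job_name, pid, expected_bytes):
    cg = CGROUP_ROOT / job_name
    cg.mkdir(exist_ok=True)
    (cg / "memory.high").write_text(str(expected_bytes))  # soft limit
    (cg / "cgroup.procs").write_text(str(pid))            # move job into group

# 16 GiB node, four jobs expected at 25% each -> 4 GiB soft cap per job
# (PIDs are hypothetical).
# for name, pid in [("job1", 2001), ("job2", 2002),
#                   ("job3", 2003), ("job4", 2004)]:
#     cap_job(name, pid, 4 * GIB)
```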

16 Profiling: drill-down visualization tools.

17 Optimized resource allocation: the MCOpt™ Multicore Manager performs dynamic resource synchronization based on current conditions; it synchronizes resources and prevents congestion.

18 Faster results: system throughput gains of 20-50%, depending on workload requirements. Examples: manufacturing, EDA, scientific, oil & gas, life sciences. [Figure: resource usage over time before and after MCOpt™; each color represents one of eleven jobs, white space represents unused resource.]

19 The memory problem: [Figure: per-job memory used and the aggregate memory of all jobs over time.] Running 3 jobs concurrently results in thrashing, even though their steady-state memory use would fit nicely. Static resource hints can't help, because they can't map a changing memory-usage profile.

20 Memory optimization: MCOpt recognizes when a job uses less than its allocated memory and puts that memory to effective use. In effect, these 3 jobs end up being staggered so that they fit within the available resources.
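A minimal sketch of the staggering idea, assuming per-job peak-memory estimates are available (the commands and estimates below are hypothetical): start the next job only when its estimated peak fits into the memory that is currently available.

```python
# Start jobs one by one, waiting until each job's estimated peak memory
# fits in currently available memory, so concurrent peaks never exceed
# the machine.
import subprocess
import time

def free_bytes():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024   # value is reported in kB
    raise RuntimeError("MemAvailable not found")

def run_staggered(jobs):
    """jobs: list of (command_list, estimated_peak_bytes)."""
    running = []
    for cmd, peak in jobs:
        while free_bytes() < peak:                   # wait until the peak fits
            time.sleep(5)
        running.append(subprocess.Popen(cmd))
    for p in running:
        p.wait()

# Example: three jobs with ~6 GiB peaks on a 16 GiB node end up staggered.
# run_staggered([(["./sim", "a"], 6 * 1024**3),
#                (["./sim", "b"], 6 * 1024**3),
#                (["./sim", "c"], 6 * 1024**3)])
```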

21 Memory management: automate timing, fit, and placement; control Linux LRU prioritization; steer OOM-killer behaviour to protect high-priority work.
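One concrete way to steer OOM-killer behaviour on stock Linux, offered as a sketch of the idea rather than the MCOpt mechanism, is to bias /proc/&lt;pid&gt;/oom_score_adj so that low-priority work is sacrificed before high-priority work:

```python
# Bias the Linux OOM killer: raise oom_score_adj on low-priority jobs and
# lower it on high-priority ones, so that if memory runs out the kernel
# sacrifices the work that matters least.
def set_oom_priority(pid, protect):
    # -1000 makes a process essentially exempt; +1000 makes it the preferred
    # victim. Values in between bias the kernel's existing heuristic.
    adj = -900 if protect else 900
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(adj))

# Hypothetical PIDs: protect the critical solver, expose the scratch job.
# set_oom_priority(3101, protect=True)
# set_oom_priority(3102, protect=False)
```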

22 Job prioritization: preferential resource allocation (prioritize mission-critical processing) while maintaining WLM queue priority. Four jobs of varying priorities (P15, P10, P5, P1) are running in the system's primary slots, and one low-priority job (P1) is currently suspended in the MCOpt queue.

23 Job prioritization (continued): A higher-priority job (P10) enters the system, forces the suspension of the lower-priority job (P1), and takes its resources.

24 Job prioritization (continued): The primary slots now hold P15, P10, P5, and the new P10. The lower-priority job is suspended and pushed into the MCOpt hold queue until resources become available to it again.
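The suspend-and-hold mechanic walked through on these three slides can be sketched with ordinary signals (SIGSTOP/SIGCONT) in place of the MCOpt kernel module. The PIDs and priorities below are illustrative, and the same logic with an effectively infinite priority gives the exclusive mode shown on the following slides.

```python
# Preempt by suspension, not by killing: when a higher-priority job arrives
# and all slots are busy, SIGSTOP the lowest-priority running job and park
# it in a hold queue; SIGCONT it when a slot frees up.
import os
import signal

running = {1201: 15, 1202: 10, 1203: 5, 1204: 1}   # pid -> priority
hold_queue = []                                    # suspended (pid, priority)

def admit(new_pid, new_prio):
    """Admit a job whose process already exists (e.g. launched by the WLM)."""
    victim = min(running, key=running.get)         # lowest-priority running job
    if running[victim] < new_prio:
        os.kill(victim, signal.SIGSTOP)            # suspend, don't kill
        hold_queue.append((victim, running.pop(victim)))
        running[new_pid] = new_prio

def on_slot_free():
    """Resume the highest-priority suspended job when resources return."""
    if hold_queue:
        pid, prio = max(hold_queue, key=lambda entry: entry[1])
        hold_queue.remove((pid, prio))
        os.kill(pid, signal.SIGCONT)               # resumes where it left off
        running[pid] = prio

# admit(1205, 10)   # a new P10 job suspends the running P1, as on slide 23
```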

25 Job exclusivity: run critical jobs now, without killing other jobs. Three P10 jobs are running in the system's primary slots, and one processing slot is available.

26 Job exclusivity (continued): An exclusive job enters the system, forcing the suspension of all currently executing jobs and taking their resources (the strongest form; other options are available when combined with priority settings).

27 Job exclusivity (continued): The exclusive job now occupies all four primary slots, while the three P10 jobs wait in the MCOpt queue of suspended jobs until it completes.

28 EDA mixed-workload performance results: ~30% throughput gain, without taking advantage of backfilling or core scavenging.

29 Parallel throughput, MCOpt vs. Linux: Throughput of parallel jobs is significantly improved versus Linux, and MCOpt eliminates the risk of resource-oversubscription problems that can manifest under Linux. Average runtimes by number of concurrent jobs:

Model                     One        Two        Three      Four
car full (Linux)          9:45:28    14:15:13   -          -
car full (MCOpt)          9:24       11:49:25   -          -
Delta (MCOpt benefit)     4%         21%        -          -
car reduced (Linux)       1:27:27    1:44:51    1:49:54    3:25:00
car reduced (MCOpt)       1:10:18    1:01:33    1:16:17    2:47:37
Delta (MCOpt benefit)     24%        70%        44%        22%
Mid-sized model (Linux)   6:16:36    8:44:38    -          -
Mid-sized model (MCOpt)   6:21:33    6:16:36    -          -
Delta (MCOpt benefit)     -1%        33%        -          -

Notes: 1. A dash indicates that the configuration was not tested. 2. For the full car model, more than 2 concurrent instances would saturate the disk subsystem, which would then become the limiting factor in any test. 3. For the reduced car model, with 4 concurrent instances running, the disk subsystem became saturated, which limited the measured gains between the Linux and MCOpt runs.

30 What we don't do: Accelerate processor or memory speeds (or anything else), or make you rewrite applications.

31 What we do: Increase throughput, ensure service level agreements, enhance reliability, improve efficiency, and allow you to steer system behaviors.

32 For more information: exludus Technologies, (514) x21.