Resource management in IaaS cloud. Fabien Hermenier
1 Resource management in IaaS cloud Fabien Hermenier
2 where to place the VMs? how much to allocate?
3 1. What can a resource manager do for you?
4 2. Models of VM schedulers
5 3. Implementing VM schedulers
6 1. What can a resource manager do for you?
7 a central piece of code: provides the expected Quality of Service; addresses the management objectives
8 the RM should solve that (diagram: VM6 waiting for a node; VM1 to VM5 hosted on nodes N1 to N4)
9 the RM should prevent that (diagram: VM1 to VM6 hosted on nodes N1 to N4)
10 Service Level Agreement: a contract where a service is formally defined; performance indicators: MTTR, MTBF, availability = MTBF / (MTTR + MTBF); pricing; penalties
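A minimal sketch of the availability indicator above, assuming MTBF and MTTR are expressed in the same time unit. The function name and the sample figures are hypothetical.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the service is up: MTBF / (MTTR + MTBF)."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mttr + mtbf)


# Hypothetical service: one failure every 999 hours on average, 1 hour to repair.
print(f"availability: {availability(999.0, 1.0):.3%}")
```

Lowering the MTTR raises availability even when the failure rate stays the same, which is why repair time matters to the scheduler later in the deck.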
11 uptime as the only Key Performance Indicator (KPI)?
12 Cloud Performance SLAs. What customers want: guaranteed capacities (CPU, memory, bandwidth); guaranteed latencies/throughput (network, load balancer, disk). What providers give: -
14 the objective, provider side: min(x) or max(x)
15 atomic objectives: min(penalties), min(Total Cost of Ownership), min(unbalance)
16 composite objectives using weights: min(αx + βy). How to estimate the coefficients? Useful to model something you don't understand? min(α TCO + β VIOLATIONS). As a common quantifier: max(revenues)
17 Should a client ask for an objective?
18 Should a client ask for an objective? No, if the provider cannot prove it is achieved: one can check that a latency is below a threshold, but not prove that a latency is minimal
19 SLAs and objectives are evolving Dynamic Power Management (DRS 3.1) 2009? VM-VM affinity (DRS) 2010? Dedicated instances (EC2) mar apr VM-host affinity (DRS 4.1) there is no holy grail MaxVMsPerServer (DRS 5.1) The constraint needed in 2012 sep. 2012?? 2013
20 2. Models of VM schedulers
21 VM scheduler (diagram): VM queue, cloud model, monitoring data, decisions, actuators
22 scheduling models: static or dynamic schedulers; static or dynamic resource allocation
23-25 static schedulers (animated diagram: VM1 to VM7 on nodes N1 to N4)
26 Combinatorial problem! Past decisions impact future ones
27-31 static schedulers: buffering might help; /!\ incoming rate, starvation (animated diagram: VM1 to VM7 on nodes N1 to N4)
32 static schedulers: manipulate the VM queue only; react on VM arrival
33 static schedulers, pros: easy to model; simple standard actions; manage a few elements
34 static schedulers, cons: hard to perform fine-grain optimisation
35-39 dynamic schedulers: consider every VM (animated diagram: VM1 to VM7 on nodes N1 to N4)
40 dynamic schedulers: manipulate the VM queue; reconfigure the schedule with migrations
41 dynamic schedulers: a migration takes time
42 dynamic schedulers: a migration impacts performance
43-47 dynamic schedulers: dependency management (animated diagram: VM1 to VM6 moving between nodes N1 to N4, from the current placement to the target one)
48 dynamic schedulers: cyclic dependencies, with anti-affinity(vm3,vm4) and min(#onlinenodes) (diagram: VM1, VM3, VM4, VM5 on nodes N1, N3, N4)
49-52 dynamic schedulers: cyclic dependencies; a pivot to break the cycle; fix or prevent the situation? (animated diagram: VM1, VM3, VM4, VM5 on nodes N1, N3, N4)
53-62 dynamic schedulers: quality at a price; min(#onlinenodes) = 3; sol #1: 1m,1m,2m; sol #2: 1m,1m 1m, lower MTTR (faster) (animated diagram: VM1 to VM6 on nodes N1 to N4)
63 dynamic schedulers: quality at a price; the objective should reflect reconfiguration costs: min(mttr), min(#migrations), ...
64 dynamic schedulers, pros: continuous optimisation through reconfiguration
65 dynamic schedulers, cons: harder to model; harder to scale; technically expensive; costly reconfiguration
66 dynamic scheduling for the win? In theory yes, but that's theory. Benefits depend on the workload, the objective/SLAs, the infrastructure. Dynamic scheduling is rare in public or large clouds
67 scheduling models: static or dynamic schedulers; static or dynamic resource allocation
68 static resource allocation: ram, i/o, bandwidth allocated once and for all; allocation != utilisation; no sharing: conservative allocation (diagram: VM1 to VM3 on node N1)
69 static resource allocation: sharing (overbooking); performance loss with concurrent accesses; acceptable if stated in the SLA (diagram: VM1 to VM3 on node N1)
70-72 dynamic resource allocation to fix violations: ram, i/o, bandwidth allocated can be revised (animated diagram: VM1 to VM3 on node N1)
73-76 dynamic resource allocation to support vertical elasticity (animated diagram: VM1 to VM3 on node N1)
77 dynamic resource allocation for the win? Common on CPU + overbooking (inc. hosting capacity); exceptional for memory (huge performance loss). Benefits depend on the workload, the objective/SLAs, the infrastructure
78 RECAP
79 The VM scheduler makes cloud benefits real
80 no holy grail
81 think about what is costly
82 static scheduling for a peaceful life
83 dynamic scheduling to seize the day
84 with great power comes great responsibility
85 3. Coding a VM scheduler
87 VM scheduling is hard: static or dynamic scheduler, allocation? does the workload/SLA/objective require migration? maximum duration to schedule?
88 VM scheduling is NP-hard: issues with large infrastructures or hard problems; fast ad hoc heuristics, despite corner cases; some use biased complete approaches (linear programming, constraint programming)
89 vector packing problem: items with a finite volume to place inside finite bins; a generalisation of the bin packing problem; the basis to model the infrastructure; 1 dimension = 1 resource (diagram: VM1 and VM3 on N1, VM2 and VM4 on N2)
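The vector packing view can be sketched as a viability check: a placement is acceptable when, on every node and for every dimension, the summed VM demands stay within the node capacity. This is only an illustration; the names (`is_viable`, `NODES`, `VMS`) and the figures are hypothetical, with dimension 0 standing for CPU and dimension 1 for memory.

```python
from typing import Dict, Sequence

Vector = Sequence[int]  # one entry per dimension, e.g. (cpu, memory)


def is_viable(placement: Dict[str, str],
              vms: Dict[str, Vector],
              nodes: Dict[str, Vector]) -> bool:
    """True if no node is overloaded on any dimension."""
    used = {name: [0] * len(capacity) for name, capacity in nodes.items()}
    for vm, node in placement.items():
        for dim, demand in enumerate(vms[vm]):
            used[node][dim] += demand
    return all(
        u <= c
        for name, capacity in nodes.items()
        for u, c in zip(used[name], capacity)
    )


# Hypothetical infrastructure: 2 nodes with 8 CPUs and 16 GB each.
NODES = {"N1": (8, 16), "N2": (8, 16)}
VMS = {"VM1": (4, 8), "VM2": (4, 4), "VM3": (2, 8), "VM4": (6, 6)}

good = {"VM1": "N1", "VM2": "N1", "VM3": "N2", "VM4": "N2"}
bad = {"VM1": "N1", "VM3": "N1", "VM2": "N2", "VM4": "N2"}  # N2 runs out of CPU
print(is_viable(good, VMS, NODES), is_viable(bad, VMS, NODES))
```

Both placements fit on the memory dimension; only the CPU dimension tells them apart, which is why every resource needs its own dimension.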
90 Which resource can be modeled as a packing dimension?
91 Which resource can be modeled as a packing dimension? CPU, memory, disk IO, licences cardinality, network boundaries, end to end network
92 how to support migrations: temporarily, resources are used on both the source and the destination nodes
93-99 how to support migrations, a simple way: n-phases vector packing; VM duplication between 2 phases to simulate the migration; easy to implement; hard to manipulate to compute long-term predictions (animated diagram: VM1 to VM6 on nodes N1 to N4, migrating VMs shown on both their source and destination nodes)
100-103 how to support migrations, the alternative way: offline(n2) + no CPU sharing; 1-phase + compute dependencies step by step; ok to implement; /!\ cycles, #migrations. 1) migrate VM2, migrate VM4, migrate VM5; 2) shutdown(n2), migrate VM7 (animated diagram: VM1 to VM7 on nodes N1 to N5)
104 coarse-grain staging delays actions (timeline: stage 1: mig(vm2), mig(vm4), mig(vm5); stage 2: off(n2), mig(vm7))
105-110 Resource-Constrained Project Scheduling Problem: the clean way to model space and time (animated diagram: VM1 to VM7 as tasks on nodes N1 to N5 over time, with a node turned off)
111 Resource-Constrained Project Scheduling Problem, the pure way to support migrations: 1 resource per (node x dimension), with a bounded capacity; tasks to model the VM lifecycle: height to model a consumption, width to model a duration; at any moment, the cumulative task consumption on a resource cannot exceed its capacity; comfortable to express continuous optimisation; very hard to implement (properly)
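The cumulative rule above can be sketched as a checker rather than a solver: each task has a start, an end and a height on one resource, and a sweep over start/end events verifies that the stacked heights never exceed the capacity. The `Task` type, the resource naming scheme ("N1/cpu") and the figures are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Task(NamedTuple):
    resource: str  # one resource per (node x dimension), e.g. "N1/cpu"
    start: int     # inclusive
    end: int       # exclusive
    height: int    # consumption while the task runs


def respects_capacities(tasks: Iterable[Task], capacity: Dict[str, int]) -> bool:
    """True if, at any moment, no resource is used beyond its capacity."""
    events = defaultdict(list)
    for task in tasks:
        events[task.resource].append((task.start, task.height))
        events[task.resource].append((task.end, -task.height))
    for resource, changes in events.items():
        level = 0
        # Sorting puts releases before allocations that happen at the same time.
        for _, delta in sorted(changes):
            level += delta
            if level > capacity[resource]:
                return False
    return True


# Hypothetical plan: a VM leaves N1 at t=5 while its copy already uses N2 from t=3.
CAPACITY = {"N1/cpu": 4, "N2/cpu": 4}
PLAN = [
    Task("N1/cpu", 0, 5, 2),   # migrating VM, source side
    Task("N2/cpu", 3, 10, 2),  # migrating VM, destination side
    Task("N2/cpu", 0, 10, 2),  # VM already running on N2
]
print(respects_capacities(PLAN, CAPACITY))
```

The overlap between t=3 and t=5 is what models the migration: the VM consumes resources on both nodes during that window.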
112 From a theoretical schedule to a practical one: durations may be longer; convert to an event-based schedule. Time-based: 0:3 - migrate VM4; 0:3 - migrate VM5; 0:4 - migrate VM2; 3:8 - migrate VM7; 4:8 - shutdown(n2). Event-based: - : migrate VM4; - : migrate VM5; - : migrate VM2; !migrate(VM2) & !migrate(VM4): shutdown(n2); !migrate(VM5): migrate VM7
113 Back to vector packing based approaches
114-116 First Fit Decrease (FFD), basic VM scheduling: sort VMs in descending order (difficult VMs first!); for each VM, pick the first suitable node (enough free resources)
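The heuristic above can be sketched in a few lines. One assumption is made loudly here: "descending order" is taken as the sum of the demands over all dimensions, which is only one of several possible sort keys. The function name and the sample data are hypothetical.

```python
from typing import Dict, Sequence

Vector = Sequence[int]  # one entry per dimension


def first_fit_decrease(vms: Dict[str, Vector],
                       nodes: Dict[str, Vector]) -> Dict[str, str]:
    """Place the biggest VMs first, each on the first node with enough room."""
    free = {name: list(capacity) for name, capacity in nodes.items()}
    placement: Dict[str, str] = {}
    # Assumption: a VM is "bigger" when the sum of its demands is higher.
    for vm in sorted(vms, key=lambda v: sum(vms[v]), reverse=True):
        for node in nodes:  # nodes are scanned in declaration order
            if all(d <= f for d, f in zip(vms[vm], free[node])):
                for dim, demand in enumerate(vms[vm]):
                    free[node][dim] -= demand
                placement[vm] = node
                break
        else:
            raise ValueError(f"no suitable node for {vm}")
    return placement


# Hypothetical 1-dimension instance: 2 nodes of capacity 4.
NODES = {"N1": (4,), "N2": (4,)}
VMS = {"V1": (1,), "V2": (2,), "V3": (3,), "V4": (2,)}
print(first_fit_decrease(VMS, NODES))
```

Placing V3 first leaves a hole on N1 that only the small V1 can fill; handling V1 first would have wasted that slot on a VM that fits anywhere.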
117-121 First Fit Decrease (FFD), example (animated diagram: V1 to V4 placed one by one on nodes N1 to N4)
122-123 multi-dimension sorting? Easy when 1 dimension is varying; easy with a uniform variation; otherwise: sort by dimension, aggregate dimensions, find the most critical one (diagrams: V1 to V4 with varying shapes)
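The three options can be compared on a small instance. The sketch below is one possible reading of each: lexicographic order for "sort by dimension", a sum of capacity-normalised demands for "aggregate dimensions", and the dimension with the highest total demand over capacity for "the most critical one". Function names and figures are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[int]  # one entry per dimension


def sort_by_dimension(vms: Dict[str, Vector]) -> List[str]:
    """Lexicographic order: compare on dimension 0, then 1, and so on."""
    return sorted(vms, key=lambda vm: tuple(vms[vm]), reverse=True)


def sort_by_aggregate(vms: Dict[str, Vector], capacity: Vector) -> List[str]:
    """Sum the demands once each is normalised by the node capacity."""
    def weight(vm: str) -> float:
        return sum(d / c for d, c in zip(vms[vm], capacity))
    return sorted(vms, key=weight, reverse=True)


def sort_by_critical_dimension(vms: Dict[str, Vector], capacity: Vector) -> List[str]:
    """Sort on the dimension with the highest total demand / capacity ratio."""
    pressure = [
        sum(demand[i] for demand in vms.values()) / capacity[i]
        for i in range(len(capacity))
    ]
    critical = pressure.index(max(pressure))
    return sorted(vms, key=lambda vm: vms[vm][critical], reverse=True)


# Hypothetical VMs as (cpu, memory) demands, nodes of 8 CPUs and 16 GB.
VMS = {"V1": (1, 12), "V2": (4, 2), "V3": (2, 4)}
CAPACITY = (8, 16)
print(sort_by_dimension(VMS))
print(sort_by_aggregate(VMS, CAPACITY))
print(sort_by_critical_dimension(VMS, CAPACITY))
```

The three keys give three different orders on this instance, so the choice of the sort key is part of the scheduler design, not a detail.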
124 Balancing VMs
125 Why?
126 Why? to reduce losses when failures occur; to reduce hotspots; to absorb load spikes
127 balancing in theory: min(stddev([load(n_1), ..., load(n_i)]))
128 Worst Fit Decrease (WFD), balancing in practice: sort VMs in descending order; for each VM, pick the suitable node with the highest remaining space
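A minimal one-dimension sketch of the heuristic above, together with the standard deviation of the node loads as the balancing measure. The function names and the sample figures are hypothetical, and ties between nodes are broken by declaration order.

```python
from statistics import pstdev
from typing import Dict


def worst_fit_decrease(vms: Dict[str, int], nodes: Dict[str, int]) -> Dict[str, str]:
    """Place the biggest VMs first, each on the suitable node with the most free space."""
    free = dict(nodes)
    placement: Dict[str, str] = {}
    for vm in sorted(vms, key=vms.get, reverse=True):
        candidates = [node for node in free if free[node] >= vms[vm]]
        if not candidates:
            raise ValueError(f"no suitable node for {vm}")
        target = max(candidates, key=free.get)
        free[target] -= vms[vm]
        placement[vm] = target
    return placement


def unbalance(placement: Dict[str, str], vms: Dict[str, int], nodes: Dict[str, int]) -> float:
    """Standard deviation of the node loads, each expressed as a ratio of its capacity."""
    load = {node: 0 for node in nodes}
    for vm, node in placement.items():
        load[node] += vms[vm]
    return pstdev([load[node] / nodes[node] for node in nodes])


# Hypothetical instance: 2 nodes of capacity 10.
NODES = {"N1": 10, "N2": 10}
VMS = {"V1": 5, "V2": 4, "V3": 3, "V4": 2}
spread = worst_fit_decrease(VMS, NODES)
print(spread, unbalance(spread, VMS, NODES))
```

On this instance both nodes end up with the same load, so the measured unbalance is zero; on harder instances WFD only approximates the minimum.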
129-130 balancing over multiple dimensions? dimension normalisation; worst dimension; dimension aggregation
131 energy efficient VM scheduler
132 1.5% of the USA budget in 2005, 3% in 2010? (Environmental Protection Agency, 2007)
134 servers are not energy efficient (plot: node consumption statistics, cluster edel, 128GB HDD; consumption in watts against the number of VMs; 110 W and 175 W marked)
135 how to reduce consumption with identical nodes?
136 The principles: place VMs on the minimum number of nodes; turn off idle nodes; rinse/repeat for the dynamic version
137 Best Fit Decrease (BFD), a simple heuristic to pack VMs: sort VMs in descending order; for each VM, pick the suitable node with the least remaining space; efficiency decreases when the number of dimensions increases
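A minimal one-dimension sketch of the heuristic above, which also lists the nodes left without any VM since those are the candidates to turn off. The function name and the sample figures are hypothetical, and ties between nodes are broken by declaration order.

```python
from typing import Dict, List, Tuple


def best_fit_decrease(vms: Dict[str, int],
                      nodes: Dict[str, int]) -> Tuple[Dict[str, str], List[str]]:
    """Place the biggest VMs first, each on the suitable node with the least free space.

    Returns the placement and the nodes that host nothing.
    """
    free = dict(nodes)
    placement: Dict[str, str] = {}
    for vm in sorted(vms, key=vms.get, reverse=True):
        candidates = [node for node in free if free[node] >= vms[vm]]
        if not candidates:
            raise ValueError(f"no suitable node for {vm}")
        target = min(candidates, key=free.get)
        free[target] -= vms[vm]
        placement[vm] = target
    idle = [node for node in nodes if node not in placement.values()]
    return placement, idle


# Hypothetical instance: 3 nodes of capacity 10.
NODES = {"N1": 10, "N2": 10, "N3": 10}
VMS = {"V1": 6, "V2": 5, "V3": 4, "V4": 3}
placement, idle = best_fit_decrease(VMS, NODES)
print(placement, idle)
```

The only difference with the worst-fit variant is `min` instead of `max` when picking the node, which turns a balancing heuristic into a packing one.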
138 2-pass dynamic VM packing to manage the running VMs: 1) address performance issues: on saturated nodes, BFD to move VMs away to non-saturated nodes; 2) address energy efficiency issues: on low-loaded nodes, BFD to move VMs away to heavily loaded nodes. Many variations using threshold-based systems, load predictions
139-140 worthy? (plots: domain and node CPU usage, in %, over time, in sec., for Domain-1 to Domain-4, with the node boot time marked; no packing vs. 2-passes packing)
141 2-pass VM packing prototype: interesting gains; the node boot time was the bottleneck; simplistic model (identical nodes) (plot: energy, in watts, over time, in min; baseline: native execution; IVS: IVS execution)
142 Consequences of having heterogeneous nodes?
143 Consequences of having heterogeneous nodes? different performance; different Power Usage Effectiveness; how to estimate the benefits of a migration?
144 vector packing problem + performance model + power model + cost model + a specialised model
145 power models: estimate the energy consumption; model the static and the dynamic energy profile of the components; usually linear equations: Wh(node) = α Hw + β
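A sketch of such a linear model, under a loud assumption: the 110 W and 175 W figures from the consumption plot are read here as the idle and the fully loaded draw of a node, and the load is a ratio between 0 and 1. The function name and parameters are hypothetical.

```python
def node_power(load: float, idle_watts: float = 110.0, peak_watts: float = 175.0) -> float:
    """Linear power model: alpha * load + beta.

    beta is the static part (idle draw), alpha the dynamic part (peak - idle).
    The default figures are an assumed reading of the consumption plot.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be a ratio between 0 and 1")
    alpha = peak_watts - idle_watts
    beta = idle_watts
    return alpha * load + beta


# Two half-loaded nodes versus one fully loaded node with the other turned off.
spread_watts = 2 * node_power(0.5)
packed_watts = node_power(1.0)
print(spread_watts, packed_watts)
```

With a static part this large, an idle node costs most of what a busy one costs, which is the argument for packing VMs and turning idle nodes off.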
146 VM performance models: a neutral performance unit (mips, ECU, GCU, ...); a model to map VM performance with their host: perf(vm, N) = α % +β n
147 hw. particularities multi-core CPUs, DDR3 memory, spinning HD, PUE / CUE, boot/shutdown time workload particularities a fine-grain power model VM template migration duration migration payback time
148 coarse to fine-grain optimisation: -27%, -7%, -16%, -47%
149 RECAP
150 no holy grail
151 master the problem