Resource management in IaaS cloud. Fabien Hermenier

Size: px
Start display at page:

Download "Resource management in IaaS cloud. Fabien Hermenier"

Transcription

1 Resource management in IaaS cloud Fabien Hermenier

2 where to place the VMs? how much to allocate

3 1What can a resource manager do for you 3

4 2 Models of VM scheduler 4

5 3Implementing VM schedulers 5

6 1What can a resource manager do for you 6

7 central piece of code provide the awaited Quality of Service address the management objectives

8 the RM should solve that VM6? N1 VM2 VM1 N2 VM3 VM4 VM5 N3 N4

9 the RM should prevent that VM2 VM1 VM3 N1 N2 VM6 VM4 VM5 N3 N4

10 Service Level Agreement a contract where a service is formally defined performance indicator MTTR MTBF availability (MTBF/MTTR + MTBF) pricing penalties

11 (KPI)? uptime as the only Key Performance Indicator

12 Cloud Performance SLAs What customers want guaranteed capacities (CPU, ory, bw) guaranteed latencies/throughput (network, load balancer, disk) What providers give -

13

14 the objective provider side min(x) or max(x)

15 atomic objectives min(penalties) Ownership) min(total Cost min(unbalance)

16 composite objectives using weights min(αx + β y) How to estimate coefficients? useful to model sth. you don t understand? min(α TCO + β VIOLATIONS) as a common quantifier: max(revenues)

17 ? Should a client ask for an objective

18 ? Should a client ask for an objective no if the provider cannot prove it is achieved one can check the latency is below a threshold proving a latency is minimal

19 SLOs and objectives are evolving ynamic Power Management (DRS 3.1) 2009? VM-VM affinity (DRS) 2010? Dedicated instances (EC2) mar apr VM-host affinity (DRS 4.1) there is no holy grail MaxVMsPerServer (DRS 5.1) The constraint needed in 2012 sep. 2012?? 2013

20 2 Models of VM schedulers 20

21 VM queue VM scheduler decisions actuators cloud model monitoring data

22 scheduling models static dynamic static dynamic schedulers resource allocation

23 static schedulers VM1 VM2 VM3 N1 N2 VM6 VM4 VM5 N3 N4

24 static schedulers VM2 VM6 VM3 VM1 N1 N2 VM4 VM5 N3 N4

25 static schedulers VM2 VM6 VM3 VM1 N1 N2 VM7 VM4 VM5 N3 N4

26 Combinatorial problem! past decisions impact the future ones

27 static schedulers VM1 buffering might help N1 VM2 N2 VM3 VM6 VM5 VM4 N3 N4

28 static schedulers VM1 buffering might help N1 VM2 N2 VM3 VM6 VM5 VM4 N3 N4

29 static schedulers VM1 buffering might help N1 VM2 N2 VM3 VM7 VM6 VM4 VM5 N3 N4

30 static schedulers VM1 VM7 buffering might help N1 VM2 N2 VM3 VM6 VM4 VM5 N3 N4

31 static schedulers VM1 VM7 buffering might help N1 VM2 N2 VM3 VM6 VM5 VM4 /!\ incoming rate starvation N3 N4

32 static schedulers manipulate the VM queue only react on VM arrival

33 pros static schedulers easy to model simple standard actions manage a few elements

34 static schedulers cons hard to perform fine grain optimisation

35 dynamic schedulers VM1 consider every VMs VM2 VM3 N1 N2 VM6 VM4 VM5 N3 N4

36 dynamic schedulers consider every VMs VM2 VM6 VM3 VM1 N1 N2 VM4 VM5 N3 N4

37 dynamic schedulers consider every VMs VM2 VM6 VM3 VM1 N1 N2 VM7 VM4 VM5 N3 N4

38 dynamic schedulers VM1 consider every VMs VM2 VM3 N1 N2 VM7 VM6 VM4 VM5 N3 N4

39 dynamic schedulers VM1 VM7 consider every VMs VM2 VM3 N1 N2 VM6 VM4 VM5 N3 N4

40 dynamic schedulers manipulate the VM queue reconfigure the schedule with migrations

41 dynamic schedulers a migration takes time

42 dynamic schedulers a migration impacts performance

43 dynamic schedulers dependency management VM1 VM2 VM4? VM2 VM6 N1 N2 N1 N2 VM6 VM4 VM1 N3 VM3 N4 VM5 N3 VM3 N4 VM5

44 dynamic schedulers dependency management VM4 VM1 VM2 N1 N2 VM6 N3 VM3 N4 VM5

45 dynamic schedulers dependency management VM1 VM2 N1 N2 VM4 VM6 N3 VM3 N4 VM5

46 dynamic schedulers dependency management VM6 VM1 VM2 N1 N2 VM4 N3 VM3 N4 VM5

47 dynamic schedulers dependency management VM6 VM2 N1 N2 VM4 VM1 N3 VM3 N4 VM5

48 dynamic schedulers cyclic dependencies anti-affinity(vm3,vm4) min(#onlinenodes) anti-affinity(vm3,vm4) min(#onlinenodes)? N1 N1 VM4 VM1 VM5 VM1 VM3 VM5 VM3 VM4 N3 N4 N3 N4

49 dynamic schedulers cyclic dependencies a pivot to break the cycle N1 fix or prevent the situation? VM4 VM1 VM3 VM5 N3 N4

50 dynamic schedulers cyclic dependencies N1 VM4 a pivot to break the cycle fix or prevent the situation? VM1 VM3 VM5 N3 N4

51 dynamic schedulers cyclic dependencies N1 VM4 a pivot to break the cycle fix or prevent the situation? VM5 VM1 VM3 N3 N4

52 dynamic schedulers cyclic dependencies a pivot to break the cycle N1 fix or prevent the situation? VM5 VM1 VM3 VM4 N3 N4

53 dynamic schedulers quality at a price VM4 VM1 VM2 N1 N2 VM6 N3 VM3 N4 VM5 min(#onlinenodes) = 3

54 dynamic schedulers quality at a price VM1 VM2 N1 N2 VM4 VM6 N3 VM3 N4 VM5 min(#onlinenodes) = 3

55 dynamic schedulers quality at a price VM6 VM1 VM2 N1 N2 VM4 N3 VM3 N4 VM5 min(#onlinenodes) = 3

56 dynamic schedulers quality at a price VM6 VM2 N1 N2 VM4 VM1 N3 VM3 N4 VM5 min(#onlinenodes) = 3

57 dynamic schedulers quality at a price VM6 sol #1: 1m,1m,2m VM2 N1 N2 VM4 VM1 N3 VM3 N4 VM5 min(#onlinenodes) = 3

58 dynamic schedulers quality at a price VM4 sol #1: 1m,1m,2m VM1 VM2 N1 N2 VM6 N3 VM3 N4 VM5 min(#onlinenodes) = 3

59 dynamic schedulers quality at a price VM4 sol #1: 1m,1m,2m VM1 VM2 N1 N2 VM5 VM6 VM3 N3 N4 min(#onlinenodes) = 3

60 dynamic schedulers quality at a price VM4 VM6 sol #1: 1m,1m,2m VM1 VM2 N1 N2 VM5 VM3 N3 N4 min(#onlinenodes) = 3

61 dynamic schedulers quality at a price VM4 VM6 sol #1: 1m,1m,2m N1 VM1 N2 VM2 sol #2: 1m,1m 1m VM5 VM3 N3 N4 min(#onlinenodes) = 3

62 dynamic schedulers quality at a price VM4 VM6 sol #1: 1m,1m,2m N1 VM1 N2 VM2 sol #2: 1m,1m 1m VM5 lower MTTR (faster) VM3 N3 N4 min(#onlinenodes) = 3

63 dynamic schedulers quality at a price the objective should reflect reconfiguration costs min(mttr), min(#migrations),

64 dynamic schedulers pros continuous optimisation through reconfiguration

65 dynamic schedulers cons harder to model harder to scale technically expensive costly reconfiguration

66 ? dynamic scheduling for the win in theory yes but that s theory benefits depends on the workload the objective/ SLAs the infrastructure dynamic scheduling is rare in public or large clouds

67 scheduling models static dynamic schedulers static dynamic resource allocation

68 static resource allocation VM3, ram, i/o, bandwidth allocated once for all allocation!= utilisation VM2 N1 VM1 p no sharing conservative allocation

69 static resource allocation sharing (overbooking) VM3 performance loss with concurrent accesses VM2 VM1 acceptable if stated in the SLA N1 p

70 dynamic resource allocation to fix violations VM3 VM2, ram, i/o, bandwidth allocated can be revised VM1 N1 p

71 dynamic resource allocation to fix violations VM2, ram, i/o, bandwidth allocated can be revised VM1 N1 p

72 dynamic resource allocation to fix violations VM2, ram, i/o, bandwidth allocated can be revised VM1 N1 p

73 dynamic resource allocation to support vertical elasticity VM3 VM2 VM1 N1 p

74 dynamic resource allocation to support vertical elasticity VM3 VM2 VM1 N1 p

75 dynamic resource allocation to support vertical elasticity VM2 VM1 N1 p

76 dynamic resource allocation to support vertical elasticity VM2 VM2 VM1 N1 p

77 ? dynamic resource allocation for the win common on CPU + overbooking (inc. hosting capacity) exceptional for ory (huge performance loss) benefits depends on the workload the objective/ SLAs the infrastructure

78 RECAP 50

79 The VM scheduler makes cloud benefits real 51

80 no holy grail 52

81 think about what is costly 53

82 static scheduling for a peaceful life 54

83 dynamic scheduling to cease the day 55

84 with great power comes great responsibility 56

85 3 Coding a VM scheduler

86

87 VM scheduling is hard static or dynamic scheduler allocation? does the workload/sla/objective requires migration? maximum duration to schedule?

88 VM scheduling is NP-Hard issues with large infrastructures or hard problems fast adhoc heuristics despite corner cases some use biased complete approaches (linear programming, constraint programming) like him

89 vector packing problem N1 VM3 VM1 items with a finite volume to place inside finite bins a generalisation of the bin packing problem N2 VM4 VM2 the basic to model the infra. 1 dimension = 1 resource

90 ? Which resource can be modeled as a packing dimension

91 ? Which resource can be modeled as a packing dimension CPU, ory, disk IO, licences cardinality, network boundaries, end to end network

92 how to support migrations temporary, resources are used on the source and the destination nodes

93 how to support migrations a simple way VM4 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

94 how to support migrations a simple way VM4 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

95 how to support migrations a simple way n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

96 how to support migrations a simple way VM6 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

97 how to support migrations a simple way VM6 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

98 how to support migrations a simple way VM6 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM1 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

99 how to support migrations a simple way VM6 n-phases vector packing N1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM1 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

100 how to support migrations the alternative way N1 N3 VM1 VM7 VM3 N2 N4 VM2 VM5 VM6 VM4 N5 offline(n2) + no CPU sharing 1-phase + compute dependencies step by steps /!\ ok to implement cycles #migrations

101 how to support migrations the alternative way VM6 VM7 VM4 VM1 VM5 VM3 VM2 N1 N4 N3 N2 N5

102 how to support migrations the alternative way VM5 VM6 VM7 VM1 VM3 N1 N4 N3 N2 VM4 VM2 1) migrate VM2, migrate VM4, migrate VM5 N5

103 how to support migrations the alternative way VM5 VM6 VM1 VM7 VM3 N1 N4 N3 N2 VM4 N5 VM2 1) migrate VM2, migrate VM4, migrate VM5 2) shutdown(n2), migrate VM7

104 coarse grain staging delay actions mig(vm2) mig(vm4) mig(vm5) off(n2) mig(vm7) stage1 stage2 time

105 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

106 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

107 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

108 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

109 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

110 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

111 Resource-Constrained Project Scheduling Problem the pure way to support migrations 1 resource per (node x dimension), bounded capacity tasks to model the VM lifecycle. height to model a consumption width to model a duration at any moment, the cumulative task consumption on a resource cannot exceed its capacity comfortable to express continuous optimisation very hard to implement (properly)

112 From a theoretical schedule to a practical one duration may be longer convert to an event based schedule 0:3 - migrate VM4 0:3 - migrate VM5 0:4 - migrate VM2 3:8 - migrate VM7 4:8 - shutdown(n2) - : migrate VM4 - : migrate VM5 - : migrate VM2!migrate(VM2) &!migrate(vm4): shutdown(n2)!migrate(vm5): migrate VM7

113 Back to vector packing based approaches

114 Fit Fit Decrease (FFD) basic VM scheduling sort VMs in desc order for each VM pick the first suitable node

115 Fit Fit Decrease (FFD) basic VM scheduling sort VMs in desc order for each VM pick the first suitable node difficult VMs first!

116 Fit Fit Decrease (FFD) basic VM scheduling sort VMs in desc order for each VM pick the first suitable node difficult VMs first! enough free resources

117 Fit Fit Decrease (FFD) V4 V3 V2 V1 example N1 N2 N3 N4

118 Fit Fit Decrease (FFD) V4 V1 V3 V2 example N1 N2 N3 N4

119 Fit Fit Decrease (FFD) V4 V1 V2 V3 example N1 N2 N3 N4

120 Fit Fit Decrease (FFD) V1 V2 V3 V4 example N1 N2 N3 N4

121 Fit Fit Decrease (FFD) example V1 V2 V3 V4 N1 N2 N3 N4

122 multi-dimension sorting? V4 V3 V1 V2 easy, 1 dimension is varying V4 V1 V3 easy, uniform variation V4 V1 V3?

123 multi-dimension sorting? V4 V3 V1 V2 easy, 1 dimension is varying V4 V1 V3 easy, uniform variation V4 V1 V3 sort by dimension aggregate dimensions find the most critical one

124 Balancing VMs

125 Why?

126 to reduce loss in terms of failures to reduce hotspots to absorb load spikes Why?

127 balancing in theory min(stddev([load(n ),, load(n )])) min( 1 i max([load(n ),, load(n )]), min([load(n ),, load(n )]), ) OR 1 i 1 i

128 Worst Fit Decrease (WFD) balancing, a practice sort VMs in desc order for each VM pick the suitable node with the highest remaining space

129 balancing over? multiple dimensions

130 balancing over? multiple dimensions dimension normalisation worst dimension dimension aggregation current state vs. next state

131 What about the SLOs? ynamic Power Management (DRS 3.1) 2009? VM-VM affinity (DRS) 2010? Dedicated instances (EC2) mar apr VM-host affinity (DRS 4.1) MaxVMsPerServer (DRS 5.1) The constraint needed in 2012 sep. 2012?? 2013

132 SLOs inject additional rules placement constraint (anti)affinity rules fault tolerance, hw. compatibility, security VM-VM, VM-PM, relative, absolute temporal constraint precedences, parralelism, sequence control, performance control between actions counting constraint performance, licensing restriction per node, group of nodes, resource

133 discrete constraints >>spread(vm[1,2]) ban(vm1, N1) ban(vm2, N2) N1 N2 N3 VM1 VM2 simple spatial problem continuous constraints N1 N2 VM1 spread(vm[1,2]) ban(vm1, N1) ban(vm2, N2) N3 VM2 harder scheduling problem (think about actions interleaving)

134 hard constraints spread(vm[1..50]) must be satisfied all or nothing approach not always meaningful mostlyspread(vm[1..50], 4, 6) soft constraints satisfiable or not internal or external penalty model harder to implement/scale hard to standardise?

135 energy efficient VM scheduler

136 1.5% USA of the 2005 budget 3 % in 2010? (Environmental Protection Agency, 2007)

137

138 1Consume less 2003

139 VM # Consumption (Watt) Node consumption statistics (Cluster edel, 128GB HDD) 110 W 175 W servers are not energy efficient

140 how to reduce consumption with identical nodes?

141 The principles place VMs on the minimum number of nodes turn off idle nodes rince/repeat for the dynamic version

142 Best Fit Decrease (BFD) a simple heuristic to pack VMs sort VMs in desc order for each VM pick the suitable node with the least remaining space efficiency decreases when nb. of dimensions increase

143 2-pass dynamic VM packing to manage the running VMs 1) address performance issues on saturated nodes, BFD to move away VMs on non-saturated nodes 2) address energy efficiency issues on low-loaded nodes, BFD to move away VMs on heavily loaded nodes many variations using threshold based systems, load predictions

144 100 Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Time (in sec.) Time (in sec.) boot time Domain-4 Domain-3 Domain-2 Domain-1 No packing 2-passes packing

145 100 Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node worthy? Domain % CPU Node % CPU Node Time (in sec.) Time (in sec.) boot time Domain-4 Domain-3 Domain-2 Domain-1 No packing 2-passes packing

146 baseline IVS 2-pass VM packing Energy (Watts) Time (min) Native execution IVS execution prototype interesting gains, the node boot time was the bottleneck simplistic model (identical nodes)

147 Consequences of having heterogeneous nodes?

148 Consequences of having heterogeneous nodes? different performance different Power Usage Effectiveness how to estimate the benefits of a migration?

149 vector packing problem +performance model +power model +cost model + a specialised model

150 power models estimate the energy consumption model the static and the dynamic energy profile of the components usually linear equations Wh(node) = α Hw +β

151 VM performance models a neutral performance unit mips, ECU, GCU, a model to map VM performance with their host perf(vm, N) = α % +β n

152 hw. particularities multi-core CPUs, DDR3 ory, spinning HD, PUE / CUE, boot/shutdown time workload particularities a fine-grain power model VM template migration duration migration payback time

153 - 27% -7% -16% -47% coarse to fine grain optimisation

154 2 Consume better 2012 from a spatial to a temporal point of view

155 DC4Cities let existing and new data centres become energy adaptive

156 grid renewable part renewable power (%) /01 17/01 18/01 19/01 20/01 Date sunroof PV production Power (W) /01 17/01 18/01 19/01 20/01 Date renewable energies are intermittent

157 so are elastic and batch-oriented applications 6 to 20 moonshot cartridges Request/s :00 05:00 11:00 17:00 23:00 Time SLO Penalty (euros) Performance degradation (Req/s) penalty model

158 align workload to renewable energies availability DC4Cities

159 shape EASCs for sustainable profitability energy providers smart city authority forecasts sustainable objective Carver application descriptor elected working modes energy adaptive applications Easc Easc web service... video transcoder pick WMs such as min(penalty(slo) + penalty(sma) + price(e))

160 Energy Adaptive Software Components attached to an application exhibit - working modes - SLO (cumulative or instant) - actuators (see UCC 15 paper)

161 An automaton for each EASC model the behavior, + penalty functions for the SMA, the SLA

162 application baseline (satisfy perf ) Watts /01 17/01 18/01 19/01 20/01 Date E learning G indexing Website carver Watts application E learning G indexing Website 0 16/01 17/01 18/01 19/01 20/01 Date green (max renewable) Watts application E learning G indexing Website 0 16/01 17/01 18/01 19/01 20/01 Date

163 Resulting running costs 17/01/15 18/01/15 19/01/15 20/01/15 Running cost (euros) expense energy SLO SMA 0 perf green Carver perf green Carver perf green Carver perf green Carver

164 RECAP 114

165 no holy grail 115

166 master the problem 116