Resource management in IaaS cloud. Fabien Hermenier

Size: px
Start display at page:

Download "Resource management in IaaS cloud. Fabien Hermenier"

Transcription

1 Resource management in IaaS cloud Fabien Hermenier

2 where to place the VMs? how much to allocate

3 1What can a resource manager do for you 3

4 2 Models of VM scheduler 4

5 3Implementing VM schedulers 5

6 1What can a resource manager do for you 6

7 central piece of code provide the awaited Quality of Service address the management objectives

8 the RM should solve that VM6? N1 VM2 VM1 N2 VM3 VM4 VM5 N3 N4

9 the RM should prevent that VM2 VM1 VM3 N1 N2 VM6 VM4 VM5 N3 N4

10 Service Level Agreement a contract where a service is formally defined performance indicator MTTR MTBF availability (MTBF/MTTR + MTBF) pricing penalties

11 (KPI)? uptime as the only Key Performance Indicator

12 Cloud Performance SLAs What customers want guaranteed capacities (CPU, ory, bw) guaranteed latencies/throughput (network, load balancer, disk) What providers give -

13

14 the objective provider side min(x) or max(x)

15 atomic objectives min(penalties) Ownership) min(total Cost min(unbalance)

16 composite objectives using weights min(αx + β y) How to estimate coefficients? useful to model sth. you don t understand? min(α TCO + β VIOLATIONS) as a common quantifier: max(revenues)

17 ? Should a client ask for an objective

18 ? Should a client ask for an objective no if the provider cannot prove it is achieved one can check the latency is below a threshold proving a latency is minimal

19 SLAs and objectives are evolving ynamic Power Management (DRS 3.1) 2009? VM-VM affinity (DRS) 2010? Dedicated instances (EC2) mar apr VM-host affinity (DRS 4.1) there is no holy grail MaxVMsPerServer (DRS 5.1) The constraint needed in 2012 sep. 2012?? 2013

20 2 Models of VM schedulers 20

21 VM queue VM scheduler decisions actuators cloud model monitoring data

22 scheduling models static dynamic static dynamic schedulers resource allocation

23 static schedulers VM1 VM2 VM3 N1 N2 VM6 VM4 VM5 N3 N4

24 static schedulers VM2 VM6 VM3 VM1 N1 N2 VM4 VM5 N3 N4

25 static schedulers VM2 VM6 VM3 VM1 N1 N2 VM7 VM4 VM5 N3 N4

26 Combinatorial problem! past decisions impact the future ones

27 static schedulers VM1 buffering might help N1 VM2 N2 VM3 VM6 VM5 VM4 N3 N4

28 static schedulers VM1 buffering might help N1 VM2 N2 VM3 VM6 VM5 VM4 N3 N4

29 static schedulers VM1 buffering might help N1 VM2 N2 VM3 VM7 VM6 VM4 VM5 N3 N4

30 static schedulers VM1 VM7 buffering might help N1 VM2 N2 VM3 VM6 VM4 VM5 N3 N4

31 static schedulers VM1 VM7 buffering might help N1 VM2 N2 VM3 VM6 VM5 VM4 /!\ incoming rate starvation N3 N4

32 static schedulers manipulate the VM queue only react on VM arrival

33 pros static schedulers easy to model simple standard actions manage a few elements

34 static schedulers cons hard to perform fine grain optimisation

35 dynamic schedulers VM1 consider every VMs VM2 VM3 N1 N2 VM6 VM4 VM5 N3 N4

36 dynamic schedulers consider every VMs VM2 VM6 VM3 VM1 N1 N2 VM4 VM5 N3 N4

37 dynamic schedulers consider every VMs VM2 VM6 VM3 VM1 N1 N2 VM7 VM4 VM5 N3 N4

38 dynamic schedulers VM1 consider every VMs VM2 VM3 N1 N2 VM7 VM6 VM4 VM5 N3 N4

39 dynamic schedulers VM1 VM7 consider every VMs VM2 VM3 N1 N2 VM6 VM4 VM5 N3 N4

40 dynamic schedulers manipulate the VM queue reconfigure the schedule with migrations

41 dynamic schedulers a migration takes time

42 dynamic schedulers a migration impacts performance

43 dynamic schedulers dependency management VM1 VM2 VM4? VM2 VM6 N1 N2 N1 N2 VM6 VM4 VM1 N3 VM3 N4 VM5 N3 VM3 N4 VM5

44 dynamic schedulers dependency management VM4 VM1 VM2 N1 N2 VM6 N3 VM3 N4 VM5

45 dynamic schedulers dependency management VM1 VM2 N1 N2 VM4 VM6 N3 VM3 N4 VM5

46 dynamic schedulers dependency management VM6 VM1 VM2 N1 N2 VM4 N3 VM3 N4 VM5

47 dynamic schedulers dependency management VM6 VM2 N1 N2 VM4 VM1 N3 VM3 N4 VM5

48 dynamic schedulers cyclic dependencies anti-affinity(vm3,vm4) min(#onlinenodes) anti-affinity(vm3,vm4) min(#onlinenodes)? N1 N1 VM4 VM1 VM5 VM1 VM3 VM5 VM3 VM4 N3 N4 N3 N4

49 dynamic schedulers cyclic dependencies a pivot to break the cycle N1 fix or prevent the situation? VM4 VM1 VM3 VM5 N3 N4

50 dynamic schedulers cyclic dependencies N1 VM4 a pivot to break the cycle fix or prevent the situation? VM1 VM3 VM5 N3 N4

51 dynamic schedulers cyclic dependencies N1 VM4 a pivot to break the cycle fix or prevent the situation? VM5 VM1 VM3 N3 N4

52 dynamic schedulers cyclic dependencies a pivot to break the cycle N1 fix or prevent the situation? VM5 VM1 VM3 VM4 N3 N4

53 dynamic schedulers quality at a price VM4 VM1 VM2 N1 N2 VM6 N3 VM3 N4 VM5 min(#onlinenodes) = 3

54 dynamic schedulers quality at a price VM1 VM2 N1 N2 VM4 VM6 N3 VM3 N4 VM5 min(#onlinenodes) = 3

55 dynamic schedulers quality at a price VM6 VM1 VM2 N1 N2 VM4 N3 VM3 N4 VM5 min(#onlinenodes) = 3

56 dynamic schedulers quality at a price VM6 VM2 N1 N2 VM4 VM1 N3 VM3 N4 VM5 min(#onlinenodes) = 3

57 dynamic schedulers quality at a price VM6 sol #1: 1m,1m,2m VM2 N1 N2 VM4 VM1 N3 VM3 N4 VM5 min(#onlinenodes) = 3

58 dynamic schedulers quality at a price VM4 sol #1: 1m,1m,2m VM1 VM2 N1 N2 VM6 N3 VM3 N4 VM5 min(#onlinenodes) = 3

59 dynamic schedulers quality at a price VM4 sol #1: 1m,1m,2m VM1 VM2 N1 N2 VM5 VM6 VM3 N3 N4 min(#onlinenodes) = 3

60 dynamic schedulers quality at a price VM4 VM6 sol #1: 1m,1m,2m VM1 VM2 N1 N2 VM5 VM3 N3 N4 min(#onlinenodes) = 3

61 dynamic schedulers quality at a price VM4 VM6 sol #1: 1m,1m,2m N1 VM1 N2 VM2 sol #2: 1m,1m 1m VM5 VM3 N3 N4 min(#onlinenodes) = 3

62 dynamic schedulers quality at a price VM4 VM6 sol #1: 1m,1m,2m N1 VM1 N2 VM2 sol #2: 1m,1m 1m VM5 lower MTTR (faster) VM3 N3 N4 min(#onlinenodes) = 3

63 dynamic schedulers quality at a price the objective should reflect reconfiguration costs min(mttr), min(#migrations),

64 dynamic schedulers pros continuous optimisation through reconfiguration

65 dynamic schedulers cons harder to model harder to scale technically expensive costly reconfiguration

66 ? dynamic scheduling for the win in theory yes but that s theory benefits depends on the workload the objective/ SLAs the infrastructure dynamic scheduling is rare in public or large clouds

67 scheduling models static dynamic schedulers static dynamic resource allocation

68 static resource allocation VM3, ram, i/o, bandwidth allocated once for all allocation!= utilisation VM2 N1 VM1 p no sharing conservative allocation

69 static resource allocation sharing (overbooking) VM3 performance loss with concurrent accesses VM2 VM1 acceptable if stated in the SLA N1 p

70 dynamic resource allocation to fix violations VM3 VM2, ram, i/o, bandwidth allocated can be revised VM1 N1 p

71 dynamic resource allocation to fix violations VM2, ram, i/o, bandwidth allocated can be revised VM1 N1 p

72 dynamic resource allocation to fix violations VM2, ram, i/o, bandwidth allocated can be revised VM1 N1 p

73 dynamic resource allocation to support vertical elasticity VM3 VM2 VM1 N1 p

74 dynamic resource allocation to support vertical elasticity VM3 VM2 VM1 N1 p

75 dynamic resource allocation to support vertical elasticity VM2 VM1 N1 p

76 dynamic resource allocation to support vertical elasticity VM2 VM2 VM1 N1 p

77 ? dynamic resource allocation for the win common on CPU + overbooking (inc. hosting capacity) exceptional for ory (huge performance loss) benefits depends on the workload the objective/ SLAs the infrastructure

78 RECAP 50

79 The VM scheduler makes cloud benefits real 51

80 no holy grail 52

81 think about what is costly 53

82 static scheduling for a peaceful life 54

83 dynamic scheduling to cease the day 55

84 with great power comes great responsibility 56

85 3 Coding a VM scheduler

86

87 VM scheduling is hard static or dynamic scheduler allocation? does the workload/sla/objective requires migration? maximum duration to schedule?

88 VM scheduling is NP-Hard issues with large infrastructures or hard problems fast adhoc heuristics despite corner cases like him some use biased complete approaches (linear programming, constraint programming)

89 vector packing problem N1 VM3 VM1 items with a finite volume to place inside finite bins a generalisation of the bin packing problem N2 VM4 VM2 the basic to model the infra. 1 dimension = 1 resource

90 ? Which resource can be modeled as a packing dimension

91 ? Which resource can be modeled as a packing dimension CPU, ory, disk IO, licences cardinality, network boundaries, end to end network

92 how to support migrations temporary, resources are used on the source and the destination nodes

93 how to support migrations a simple way VM4 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

94 how to support migrations a simple way VM4 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

95 how to support migrations a simple way n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

96 how to support migrations a simple way VM6 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM6 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

97 how to support migrations a simple way VM6 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

98 how to support migrations a simple way VM6 n-phases vector packing N1 VM1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM1 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

99 how to support migrations a simple way VM6 n-phases vector packing N1 N2 VM2 VM duplication between 2 phases to simulate the migration VM4 VM1 easy to implement N3 VM3 N4 VM5 hard to manipulate to compute long term previsions

100 how to support migrations the alternative way N1 N3 VM1 VM7 VM3 N2 N4 VM2 VM5 VM6 VM4 N5 offline(n2) + no CPU sharing 1-phase + compute dependencies step by steps /!\ ok to implement cycles #migrations

101 how to support migrations the alternative way VM6 VM7 VM4 VM1 VM5 VM3 VM2 N1 N4 N3 N2 N5

102 how to support migrations the alternative way VM5 VM6 VM7 VM1 VM3 N1 N4 N3 N2 VM4 VM2 1) migrate VM2, migrate VM4, migrate VM5 N5

103 how to support migrations the alternative way VM5 VM6 VM1 VM7 VM3 N1 N4 N3 N2 VM4 N5 VM2 1) migrate VM2, migrate VM4, migrate VM5 2) shutdown(n2), migrate VM7

104 coarse grain staging delay actions mig(vm2) mig(vm4) mig(vm5) off(n2) mig(vm7) stage1 stage2 time

105 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

106 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

107 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

108 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

109 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

110 Resource-Constrained Project Scheduling Problem the clean way to model space and time N1 N2 N3 N4 N5 VM1 VM2 VM4 VM7 VM3 VM6 VM off VM5 VM1 VM3 VM6 VM7 VM2 VM4 time

111 Resource-Constrained Project Scheduling Problem the pure way to support migrations 1 resource per (node x dimension), bounded capacity tasks to model the VM lifecycle. height to model a consumption width to model a duration at any moment, the cumulative task consumption on a resource cannot exceed its capacity comfortable to express continuous optimisation very hard to implement (properly)

112 From a theoretical schedule to a practical one duration may be longer convert to an event based schedule 0:3 - migrate VM4 0:3 - migrate VM5 0:4 - migrate VM2 3:8 - migrate VM7 4:8 - shutdown(n2) - : migrate VM4 - : migrate VM5 - : migrate VM2!migrate(VM2) &!migrate(vm4): shutdown(n2)!migrate(vm5): migrate VM7

113 Back to vector packing based approaches

114 Fit Fit Decrease (FFD) basic VM scheduling sort VMs in desc order for each VM pick the first suitable node

115 Fit Fit Decrease (FFD) basic VM scheduling sort VMs in desc order for each VM pick the first suitable node difficult VMs first!

116 Fit Fit Decrease (FFD) basic VM scheduling sort VMs in desc order for each VM pick the first suitable node difficult VMs first! enough free resources

117 Fit Fit Decrease (FFD) V4 V3 V2 V1 example N1 N2 N3 N4

118 Fit Fit Decrease (FFD) V4 V1 V3 V2 example N1 N2 N3 N4

119 Fit Fit Decrease (FFD) V4 V1 V2 V3 example N1 N2 N3 N4

120 Fit Fit Decrease (FFD) V1 V2 V3 V4 example N1 N2 N3 N4

121 Fit Fit Decrease (FFD) example V1 V2 V4 V3 N1 N2 N3 N4

122 multi-dimension sorting? V4 V3 V1 V2 easy, 1 dimension is varying V4 V1 V3 easy, uniform variation V4 V1 V3?

123 multi-dimension sorting? V4 V3 V1 V2 easy, 1 dimension is varying V4 V1 V3 easy, uniform variation V4 V1 V3 sort by dimension aggregate dimensions find the most critical one

124 Balancing VMs

125 Why?

126 to reduce loss in terms of failures to reduce hotspots to absorb load spikes Why?

127 balancing in theory min(stddev([load(n ),, load(n )])) 1 i

128 Worst Fit Decrease (WFD) balancing, a practice sort VMs in desc order for each VM pick the suitable node with the highest remaining space

129 balancing over? multiple dimensions

130 balancing over? multiple dimensions dimension normalisation worst dimension dimension aggregation

131 energy efficient VM scheduler

132 1.5% USA of the 2005 budget 3 % in 2010? (Environmental Protection Agency, 2007)

133

134 VM # Consumption (Watt) Node consumption statistics (Cluster edel, 128GB HDD) 110 W 175 W servers are not energy efficient

135 how to reduce consumption with identical nodes?

136 The principles place VMs on the minimum number of nodes turn off idle nodes rince/repeat for the dynamic version

137 Best Fit Decrease (BFD) a simple heuristic to pack VMs sort VMs in desc order for each VM pick the suitable node with the least remaining space efficiency decreases when nb. of dimensions increase

138 2-pass dynamic VM packing to manage the running VMs 1) address performance issues on saturated nodes, BFD to move away VMs on non-saturated nodes 2) address energy efficiency issues on low-loaded nodes, BFD to move away VMs on heavily loaded nodes many variations using threshold based systems, load predictions

139 100 Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Time (in sec.) Time (in sec.) boot time Domain-4 Domain-3 Domain-2 Domain-1 No packing 2-passes packing

140 100 Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node Domain % CPU Node % CPU Node worthy? Domain % CPU Node % CPU Node Time (in sec.) Time (in sec.) boot time Domain-4 Domain-3 Domain-2 Domain-1 No packing 2-passes packing

141 baseline IVS 2-pass VM packing Energy (Watts) Time (min) Native execution IVS execution prototype interesting gains, the node boot time was the bottleneck simplistic model (identical nodes)

142 Consequences of having heterogeneous nodes?

143 Consequences of having heterogeneous nodes? different performance different Power Usage Effectiveness how to estimate the benefits of a migration?

144 vector packing problem +performance model +power model +cost model + a specialised model

145 power models estimate the energy consumption model the static and the dynamic energy profile of the components usually linear equations Wh(node) = α Hw +β

146 VM performance models a neutral performance unit mips, ECU, GCU, a model to map VM performance with their host perf(vm, N) = α % +β n

147 hw. particularities multi-core CPUs, DDR3 ory, spinning HD, PUE / CUE, boot/shutdown time workload particularities a fine-grain power model VM template migration duration migration payback time

148 - 27% -7% -16% -47% coarse to fine grain optimisation

149 RECAP 99

150 no holy grail 100

151 master the problem 101