Self-Aware Multidimensional Auto-Scaling


1 Master's Thesis Presentation

2 Motivation
Public Cloud [1] (Master's thesis, André Bauer)

3 In a Nutshell
PROBLEM:
- Limited multi-tier support in existing auto-scalers
- Hidden bottlenecks, oscillations
- Decisions based on CPU utilization are not reliable in public clouds
- Costs in the public cloud increase dramatically
IDEA:
- Extend Chameleon with multi-tier support
- Change the decision basis
- Add a cost-efficiency component
BENEFIT:
- Only one Chameleon instance required
- Stable behavior
- Cost reduction
ACTION:
- Upgrade the decision logic
- Develop the cost component

4 Related Work
Multi-Tier Auto-Scaler | Method | Proactive / Reactive | Scaling Direction | Cost-Efficiency
AutoMAP '15 [3] | Thresholds | Reactive | Horizontal + Vertical | Deployment
AGILE '13 [4] | Wavelets | Proactive | Horizontal | No
CloudScale '11 [5] | Fast Fourier Transformation | Proactive | Vertical | No
W. Iqbal '11 [6] | Regression | Proactive | Horizontal | No
Q. Zhu '10 [7] | Control Theory | Reactive | Vertical | No
B. Urgaonkar '08 [8] | Queueing Theory | Proactive | Horizontal | No
U. Sharma '12 [9] | Queueing Theory | Reactive | Horizontal + Vertical | Deployment
J. Bi '10 [10] | Queueing Theory | Proactive | Horizontal | No
P. Lama '09 [11] | Queueing Theory | Proactive | Horizontal | No

5 Multi-Tier Approach

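6 Decision Logic Change
To date: decisions based on CPU utilization
Change to queueing theory: one M/M/n queue per tier ($\lambda$: arrival rate, $\mu$: service rate of one VM, $n$: number of VMs)
- Utilization: $\rho = \frac{\lambda}{n \mu}$
- Required number of virtual machines: $n = \frac{\lambda}{\mu \rho}$
Proactive cycle:
- Forecast of the arrival rate at the first tier
- Decision for the first tier
- Decisions for later tiers based on the predecessor decisions [2] (Master's thesis, Marwin Züfle)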
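A minimal Python sketch of the per-tier sizing rule n = λ/(μρ), assuming each tier's per-VM service rate μ and a target utilization ρ are known. The `Tier` class, the function names, the rounding up to whole VMs, and the rule for passing load on to the next tier (a hypothetical visit ratio, capped by the predecessor's planned capacity) are illustrative assumptions, not Chameleon's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    service_rate: float        # mu: requests/s one VM can serve (assumed known)
    target_utilization: float  # rho: desired utilization per VM, 0 < rho < 1
    visit_ratio: float = 1.0   # hypothetical: requests sent onward per request received


def utilization(arrival_rate: float, vms: int, service_rate: float) -> float:
    """M/M/n utilization: rho = lambda / (n * mu)."""
    return arrival_rate / (vms * service_rate)


def required_vms(arrival_rate: float, service_rate: float, target_utilization: float) -> int:
    """n = lambda / (mu * rho), rounded up to a whole VM (at least one)."""
    return max(1, math.ceil(arrival_rate / (service_rate * target_utilization)))


def plan(tiers: list[Tier], forecast_arrival_rate: float) -> list[tuple[str, int]]:
    """Size the first tier from the forecast, then each later tier from its predecessor."""
    decisions = []
    arrival = forecast_arrival_rate
    for tier in tiers:
        vms = required_vms(arrival, tier.service_rate, tier.target_utilization)
        decisions.append((tier.name, vms))
        # Hypothetical propagation: the next tier sees what this tier can pass on
        # with its planned VMs, scaled by the visit ratio.
        arrival = min(arrival, vms * tier.service_rate) * tier.visit_ratio
    return decisions


if __name__ == "__main__":
    tiers = [
        Tier("presentation", service_rate=10.0, target_utilization=0.8),
        Tier("business", service_rate=5.0, target_utilization=0.8),
        Tier("database", service_rate=20.0, target_utilization=0.8),
    ]
    print(plan(tiers, forecast_arrival_rate=100.0))
    # prints [('presentation', 13), ('business', 25), ('database', 7)]
```

Only the first tier needs a forecast; every later decision is derived from the one before it, which is what allows a single auto-scaler instance to cover all tiers.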

7 Cost-Efficiency Component
Two cost models:
- Amazon EC2: hourly charging
- Google Cloud Platform: the first ten minutes are charged as a fixed block, then minute-by-minute charging
Idea:
- Do not stop virtual machines that are required in the near future
- Near future: within the current charging interval
- Stop the VMs that are nearest to their next charging interval
Submitted to ICPE 2018 as FOX
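A minimal Python sketch of the downscaling rule, assuming VM uptimes are known in seconds and a single forecast value says when the released capacity will be needed again. The two charging schemes are modeled exactly as described (hourly for EC2; a ten-minute block, then per minute for GCP). All names are hypothetical; this is not the FOX implementation.

```python
from typing import Optional

EC2_INTERVAL_S = 3600  # hourly charging
GCP_MINIMUM_S = 600    # first ten minutes charged as a fixed block
GCP_INTERVAL_S = 60    # then minute-by-minute charging


def seconds_to_next_charge(uptime_s: int, model: str) -> int:
    """Seconds until the VM enters its next (not yet paid) charging interval."""
    if model == "ec2":
        return EC2_INTERVAL_S - uptime_s % EC2_INTERVAL_S
    if model == "gcp":
        if uptime_s < GCP_MINIMUM_S:
            return GCP_MINIMUM_S - uptime_s
        return GCP_INTERVAL_S - (uptime_s - GCP_MINIMUM_S) % GCP_INTERVAL_S
    raise ValueError(f"unknown cost model: {model}")


def select_vms_to_stop(uptimes_s: dict[str, int], count: int, model: str,
                       needed_again_in_s: Optional[int] = None) -> list[str]:
    """Pick up to `count` VMs to stop, nearest to their next charging interval first.

    needed_again_in_s: forecast time until the capacity is required again
    (None means it is not required in the near future).
    """
    by_deadline = sorted(uptimes_s, key=lambda vm: seconds_to_next_charge(uptimes_s[vm], model))
    to_stop = []
    for vm in by_deadline:
        if len(to_stop) == count:
            break
        paid_for_s = seconds_to_next_charge(uptimes_s[vm], model)
        # The current interval is already paid: keep the VM if it will be
        # needed again before that interval ends.
        if needed_again_in_s is not None and needed_again_in_s < paid_for_s:
            continue
        to_stop.append(vm)
    return to_stop


if __name__ == "__main__":
    uptimes = {"vm-a": 3500, "vm-b": 1000, "vm-c": 3590}
    print(select_vms_to_stop(uptimes, 2, "ec2"))                        # ['vm-c', 'vm-a']
    print(select_vms_to_stop(uptimes, 2, "ec2", needed_again_in_s=50))  # ['vm-c']
```

With per-minute charging after the first ten minutes, the remaining paid time is at most one minute, so the GCP model leaves far less room for keeping VMs than hourly EC2 charging.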

8 Testbed
Private cloud environment: CloudStack
Applications:
- Single-tier: reimplementation of the LU worklet of SPEC SERT
- Multi-tier: own implementation of a typical 3-tier architecture
Workloads (48 hours, replayed in real time):
- German Wikipedia, December 2013 (regular, seasonal)
- BibSonomy, April 2017 (noisy, seasonal)
Load driver: multi-tier Bungee
Competing auto-scalers: React [13], Adapt [14], Reg [6], Hist [8], ConPaaS [15]

9 Metrics
User metrics:
- Average number of VMs
- Number of adaptations
- Mean and median response time
- SLO violation rate
System-oriented metrics:
- Provisioning accuracy θ
- Wrong provisioning timeshare τ
- Instability υ
Aggregation metrics:
- Auto-scaler deviation δ (Minkowski distance to a theoretically optimal auto-scaler)
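A minimal Python sketch of the user metrics, assuming a sampled series of VM counts and a list of per-request response times. Two definitions are assumptions, not taken from the thesis: every change of the sampled VM count counts as one adaptation, and a request violates the SLO when its response time exceeds a hypothetical threshold `slo_ms`.

```python
import statistics
from typing import Sequence


def user_metrics(vm_counts: Sequence[int], response_times_ms: Sequence[float],
                 slo_ms: float) -> dict[str, float]:
    """User metrics from a sampled VM count series and per-request latencies."""
    return {
        "avg_vms": statistics.fmean(vm_counts),
        # Assumption: every change of the sampled VM count is one adaptation.
        "adaptations": sum(1 for prev, cur in zip(vm_counts, vm_counts[1:]) if prev != cur),
        "mean_rt_ms": statistics.fmean(response_times_ms),
        "median_rt_ms": statistics.median(response_times_ms),
        # Assumption: a request violates the SLO if its response time exceeds slo_ms.
        "slo_violation_rate": sum(rt > slo_ms for rt in response_times_ms) / len(response_times_ms),
    }


if __name__ == "__main__":
    print(user_metrics([2, 2, 3, 3, 1], [100, 200, 300, 900], slo_ms=500))
```

Reporting both mean and median response time helps here because a few very slow requests in underprovisioned phases pull the mean up while barely moving the median.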

10 Metrics: Instability

11 Experiments
- Single-Tier Wikipedia
- Single-Tier BibSonomy
- Multi-Tier Wikipedia: Standard Setup, Large Setup, Reproducibility
- Multi-Tier BibSonomy
- Side-Evaluation Forecasting: Multi-Tier BibSonomy
- Cost-Efficiency: Multi-Tier BibSonomy

12 Single-Tier Wikipedia
Adapt:
- Upscaling too late
- Anticipates stable phases quite well
- Late downscaling at the end
- SLO violations in underprovisioned phases
Chameleon:
- Upscaling in time
- Late downscaling at the start
- Nearly no SLO violations

13 Single-Tier Wikipedia
Metric | Chameleon | Adapt | React | Reg | Hist | ConPaaS
Prov Accuracy | 6% | 6% | 10% | 10% | 18% | 18%
Prov Timeshare | 39% | 33% | 42% | 39% | 42% | 43%
Instability | 16% | 32% | 27% | 44% | 61% | 68%
AS Deviation | 86% | 90% | 92% | 96% | 102% | 104%
Avg Number of VMs | 14.75 | 13.76 | 14.87 | 11.9 | 15.73 | 12.16
Mean Resp Time [ms] | | | | | |
Median Resp Time [ms] | | | | | |
SLO Violations | 0% | 12% | 5% | 33% | 9% | 44%

14 Multi-Tier Wikipedia, Large Setup: Reg
Tier configuration:
- Presentation tier: 15
- Business tier: 25
- Database tier: 10
Reg:
- Bottleneck-shifting effect appears
- Drops at random times
- Late downscaling
- High SLO violation rate in underprovisioned phases

15 Multi-Tier Wikipedia, Large Setup: Chameleon
Tier configuration:
- Presentation tier: 15
- Business tier: 25
- Database tier: 10
Chameleon:
- No bottleneck-shifting effect
- Upscaling in time
- Good anticipation of stable phases
- Few SLO violations

16 Multi-Tier Wikipedia: Metrics
Tier | Metric | Chameleon | Reg
1 | Prov Accuracy | 8% | 11%
1 | Prov Timeshare | 43% | 36%
1 | Instability | 12% | 21%
1 | AS Deviation | 86% | 87%
2 | Prov Accuracy | 6% | 16%
2 | Prov Timeshare | 44% | 45%
2 | Instability | 44% | 45%
2 | AS Deviation | 92% | 95%
3 | Prov Accuracy | 8% | 22%
3 | Prov Timeshare | 36% | 45%
3 | Instability | 39% | 19%
3 | AS Deviation | 93% | 91%
Overall | SLO Violations | 6% | 29%
Overall | AS Deviation | 271% | 273%
- Chameleon has lower values for most of the metrics
- Reg performs better in wrong provisioning timeshare at the first tier and in instability and auto-scaler deviation at the third tier
- Significantly lower SLO violation rate for Chameleon

17 Multi-Tier BibSonomy: Reg
Tier configuration:
- Presentation tier: 10
- Business tier: 10
- Database tier: 10
Reg:
- No anticipation of fluctuations
- Smoothing of the demand curve
- Sudden peaks and drops occur
- High SLO violation rate in underprovisioned phases

18 Multi-Tier BibSonomy: Chameleon
Tier configuration:
- Presentation tier: 10
- Business tier: 10
- Database tier: 10
Chameleon:
- Fluctuations are anticipated
- Tries to match the demand as closely as possible
- Sudden peaks caused by wrong forecasts
- Few SLO violations

19 Multi-Tier BibSonomy: Metrics
Tier | Metric | Chameleon | Reg
1 | Prov Accuracy | 12% | 13%
1 | Prov Timeshare | 29% | 29%
1 | Instability | 28% | 20%
1 | AS Deviation | 87% | 84%
2 | Prov Accuracy | 11% | 11%
2 | Prov Timeshare | 32% | 33%
2 | Instability | 46% | 32%
2 | AS Deviation | 94% | 90%
3 | Prov Accuracy | 9% | 9%
3 | Prov Timeshare | 19% | 18%
3 | Instability | 20% | 11%
3 | AS Deviation | 79% | 74%
Overall | SLO Violations | 7% | 11%
Overall | AS Deviation | 261% | 249%
- The metrics show close results for both auto-scalers
- Reg performs slightly better in the metrics
- The behavior plots show peaks and drops for Reg
- Lower SLO violation rate for Chameleon

20 Multi-Tier BibSonomy: Cost Efficiency
Model | No Cost | EC2 | No Cost | GCP
Charged hours | | | |
Accounted hours | | | |
SLO Violations | 7% | 2.6% | 7% | 1.8%
Amazon EC2 cost model:
- Charged hours reduced
- Used hours doubled
Google GCP cost model:
- Charged hours increased
- Used hours doubled

21 Key Take-Aways
Problems when using a single-tier auto-scaler:
- Overheads, bottleneck shifting, oscillations
- Solution: upgrade the existing Chameleon to multi-tier support
Problems when running in a public cloud:
- CPU utilization is not a reliable decision basis; costs increase
- Solution: change the decision basis to queueing theory and add a cost-efficiency component
Results:
- Chameleon works best for single-tier and multi-tier Wikipedia
- For BibSonomy, Chameleon performs stably with a lower SLO violation rate
- The cost-efficiency component saves money and decreases the SLO violation rate

22 THANK YOU!

23 Bibliography
[1] A. Bauer, "Design and Evaluation of a Proactive, Application-Aware Elasticity Mechanism," Master's thesis, University of Würzburg, Am Hubland, Informatikgebäude, Würzburg, Germany, September.
[2] M. Züfle, "Dynamic Hybrid Forecasting for Self-Aware Systems," Master's thesis, University of Würzburg, Am Hubland, Informatikgebäude, Würzburg, Germany, October.
[3] M. Beltran, "Automatic provisioning of multi-tier applications in cloud computing environments," The Journal of Supercomputing, vol. 71, no. 6.
[4] H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes, "Agile: Elastic distributed resource scaling for Infrastructure-as-a-Service."
[5] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, "CloudScale: Elastic resource scaling for multi-tenant cloud systems," in Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM, 2011, p. 5.
[6] W. Iqbal, M. N. Dailey, D. Carrera, and P. Janecek, "Adaptive resource provisioning for read intensive multi-tier applications in the cloud," Future Generation Computer Systems, vol. 27, no. 6.
[7] Q. Zhu and G. Agrawal, "Resource provisioning with budget constraints for adaptive applications in cloud environments," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. ACM, 2010.

[8] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal, "Dynamic provisioning of multi-tier internet applications," in Proceedings of the Second International Conference on Autonomic Computing (ICAC). IEEE, 2005.
[9] U. Sharma, P. Shenoy, and D. F. Towsley, "Provisioning multi-tier cloud applications using statistical bounds on sojourn time," in Proceedings of the 9th International Conference on Autonomic Computing. ACM, 2012.
[10] J. Bi, Z. Zhu, R. Tian, and Q. Wang, "Dynamic provisioning modeling for virtualized multi-tier applications in cloud data center," in 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD). IEEE, 2010.
[11] P. Lama and X. Zhou, "Efficient server provisioning with end-to-end delay guarantee on multi-tier clusters," in 17th International Workshop on Quality of Service (IWQoS). IEEE, 2009.
[12] Q. Zhang, L. Cherkasova, and E. Smirni, "A regression-based analytic model for dynamic resource provisioning of multi-tier applications," in Fourth International Conference on Autonomic Computing (ICAC'07). IEEE, 2007.

[13] T. C. Chieu, A. Mohindra, A. A. Karve, and A. Segal, "Dynamic scaling of web applications in a virtualized cloud computing environment," in IEEE International Conference on e-Business Engineering (ICEBE'09). IEEE, 2009.
[14] A. Ali-Eldin, J. Tordsson, and E. Elmroth, "An adaptive hybrid elasticity controller for cloud infrastructures," in 2012 IEEE Network Operations and Management Symposium (NOMS). IEEE, 2012.
[15] H. Fernandez, G. Pierre, and T. Kielmann, "Autoscaling web applications in heterogeneous cloud infrastructures," in 2014 IEEE International Conference on Cloud Engineering (IC2E). IEEE, 2014.

26 3-Tier Application

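27 System-Oriented Metrics
Provisioning accuracy θ, wrong provisioning timeshare τ, instability υ, auto-scaler deviation δ
With demand $d_t$ and supply $s_t$ at time $t$, experiment duration $T$, time step $\Delta t$, $\Delta d_t = d_t - d_{t-1}$ and $\Delta s_t = s_t - s_{t-1}$:
$\theta_U = \frac{1}{T} \sum_{t=1}^{T} \frac{\max(d_t - s_t, 0)}{d_t} \Delta t$
$\theta_O = \frac{1}{T} \sum_{t=1}^{T} \frac{\max(s_t - d_t, 0)}{d_t} \Delta t$
$\tau_U = \frac{1}{T} \sum_{t=1}^{T} \max(\operatorname{sgn}(d_t - s_t), 0) \Delta t$
$\tau_O = \frac{1}{T} \sum_{t=1}^{T} \max(\operatorname{sgn}(s_t - d_t), 0) \Delta t$
$\upsilon = \frac{1}{T - t_1} \sum_{t=2}^{T} \min(|\operatorname{sgn}(\Delta s_t) - \operatorname{sgn}(\Delta d_t)|, 1) \Delta t$
$\delta = (\theta^2 + \tau + \upsilon)^{1/4}$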
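A minimal Python sketch of the system-oriented metrics, assuming equally spaced samples of demand d_t and supply s_t at t = Δt, 2Δt, ..., T and strictly positive demand. θ_U and θ_O average the relative under- and over-provisioned amount, τ_U and τ_O the share of time spent under- or over-provisioned, and υ the share of steps (normalized by T − t_1) in which supply and demand do not change in the same direction. Function names are hypothetical; results are fractions (multiply by 100 for the percentages on the slides).

```python
from typing import Sequence


def _sgn(x: float) -> int:
    """Sign function: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def system_metrics(demand: Sequence[float], supply: Sequence[float],
                   dt: float = 1.0) -> dict[str, float]:
    """theta_U, theta_O, tau_U, tau_O and upsilon as fractions.

    Assumes equally spaced samples at t = dt, 2*dt, ..., T and demand > 0 everywhere.
    """
    n = len(demand)
    if n != len(supply) or n < 2:
        raise ValueError("need two equally long series with at least two samples")
    T = n * dt
    pairs = list(zip(demand, supply))
    theta_u = sum(max(d - s, 0) / d for d, s in pairs) * dt / T
    theta_o = sum(max(s - d, 0) / d for d, s in pairs) * dt / T
    tau_u = sum(max(_sgn(d - s), 0) for d, s in pairs) * dt / T
    tau_o = sum(max(_sgn(s - d), 0) for d, s in pairs) * dt / T
    # Instability: steps in which supply and demand do not move in the same
    # direction, normalized by T - t_1 (t_1 = dt is the first sample time).
    mismatches = sum(
        min(abs(_sgn(supply[t] - supply[t - 1]) - _sgn(demand[t] - demand[t - 1])), 1)
        for t in range(1, n)
    )
    upsilon = mismatches * dt / (T - dt)
    return {"theta_u": theta_u, "theta_o": theta_o, "tau_u": tau_u,
            "tau_o": tau_o, "upsilon": upsilon}


if __name__ == "__main__":
    print(system_metrics(demand=[4, 4, 6, 6], supply=[4, 3, 5, 7]))
```

In the usage example, the supply drops while the demand stays flat (step 2) and rises while the demand stays flat (step 4), so two of three steps count as unstable and υ = 2/3.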

28 Auto-Scaler Deviation
Idea: compare the performance of an auto-scaler to that of a theoretically optimal auto-scaler
Use the Minkowski distance: $d_p(x, y) = \|x - y\|_p = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
Take into account:
- mean provisioning accuracy $\theta_{mean}$
- mean wrong provisioning timeshare $\tau_{mean}$
- instability $\upsilon$
Double weight for the mean provisioning accuracy, so that it weighs as much as the two time-based metrics
Auto-scaler deviation: $\delta = (\theta_{mean}^2 + \tau_{mean} + \upsilon)^{1/4}$
Overall auto-scaler deviation: $\delta_{overall} = \sum_{tiers} \delta_{tier}$
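A minimal Python sketch of the aggregation, assuming the deviation is δ = (θ_mean² + τ_mean + υ)^(1/4) with all inputs as fractions, and that the overall deviation of a multi-tier run is the plain sum of the per-tier deviations. As a plausibility check it reuses values from the single-tier Wikipedia table (slide 13) and the Chameleon per-tier deviations from the multi-tier Wikipedia table (slide 16). Function names are hypothetical.

```python
def auto_scaler_deviation(theta_mean: float, tau_mean: float, upsilon: float) -> float:
    """delta = (theta_mean**2 + tau_mean + upsilon) ** (1/4); inputs as fractions."""
    return (theta_mean ** 2 + tau_mean + upsilon) ** 0.25


def overall_deviation(per_tier: list[float]) -> float:
    """Overall deviation of a multi-tier run: sum of the per-tier deviations."""
    return sum(per_tier)


if __name__ == "__main__":
    # Single-tier Wikipedia (slide 13): theta, tau, upsilon of Chameleon and Adapt
    print(round(auto_scaler_deviation(0.06, 0.39, 0.16), 2))  # 0.86 (slide: 86%)
    print(round(auto_scaler_deviation(0.06, 0.33, 0.32), 2))  # 0.9 (slide: 90%)
    # Multi-tier Wikipedia, Chameleon (slide 16): per-tier deviations
    print(round(overall_deviation([0.86, 0.92, 0.93]), 2))    # 2.71 (slide: 271%)
```

Because the overall value is a sum, it grows with the number of tiers; it is therefore only comparable between auto-scalers evaluated on the same tier setup.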

29 Instability

30 Single-Tier BibSonomy
Metric | Chameleon | Adapt | React | Reg | Hist | ConPaaS
Prov Accuracy | 13% | 14% | 17% | 15% | 21% | 23%
Prov Timeshare | 41% | 45% | 44% | 43% | 44% | 45%
Instability | 60% | 57% | 61% | 58% | 61% | 67%
AS Deviation | 101% | 101% | 102% | 101% | 102% | 104%
Mean Resp Time [ms] | | | | | |
Median Resp Time [ms] | | | | | |
SLO Violations | 57% | 58% | 11% | 61% | 45% | 33%

31 Multi-Tier Reproducibility
Tier | Metric | Baseline Day 1 | Baseline Day 2 | Chameleon Day 1 | Chameleon Day 2
1 | Prov Accuracy | 17% | 11% | 9% | 3%
1 | Prov Timeshare | 46% | 25% | 36% | 13%
1 | Instability | 3% | 3% | 6% | 6%
1 | AS Deviation | 85% | 73% | 81% | 66%
2 | Prov Accuracy | 20% | 11% | 5% | 3%
2 | Prov Timeshare | 48% | 26% | 33% | 16%
2 | Instability | 5% | 5% | 9% | 9%
2 | AS Deviation | 87% | 75% | 81% | 71%
3 | Prov Accuracy | 20% | 12% | 8% | 3%
3 | Prov Timeshare | 41% | 22% | 25% | 9%
3 | Instability | 2% | 2% | 5% | 5%
3 | AS Deviation | 83% | 71% | 74% | 61%
Overall | AS Deviation | 254% | 219% | 236% | 197%

32 Side-Evaluation: Forecasting
Tier | Metric | Telescope | TBATS
1 | Prov Accuracy | 12% | 23%
1 | Prov Timeshare | 29% | 39%
1 | Instability | 28% | 22%
1 | AS Deviation | 87% | 90%
2 | Prov Accuracy | 11% | 20%
2 | Prov Timeshare | 32% | 42%
2 | Instability | 46% | 35%
2 | AS Deviation | 94% | 95%
3 | Prov Accuracy | 9% | 18%
3 | Prov Timeshare | 19% | 34%
3 | Instability | 20% | 17%
3 | AS Deviation | 79% | 86%
Overall | SLO Violations | 7% | 7%
Overall | AS Deviation | 261% | 271%

Method | MASE | MAPE | Duration (s)
Telescope | | |
TBATS | | |
ARIMA | | |