Talk Outline. Cloud Datacentre Reliability Cloud Resource Provisioning 8/2/2018. Smart Living


Cloud Datacentre Reliability
Cloud Resource Provisioning

Talk Outline
- Cloud computing applications
- Hybrid Cloud Resource Provisioning Problem Overview
- Formal and informal definition of the problem
- Security and Reliability Challenges
- Addressing Reliability Problem
- Service Selection Algorithms
- Cloud Selection Algorithms
- Scheduling (Mapping VMs to PM) Algorithms
- Addressing Security Problem
- Trust Management
- Performance Evaluation
- Result and Discussion
- Conclusion

Smart Living

Social Network Data
- Social network data is rich in content and relationships that are valuable to many third-party consumers:
  - sociologists (e.g., for studying social structure)
  - epidemiologists (e.g., to understand infectious disease dynamics)
  - businesses (e.g., to drive marketing campaigns and to enable better social targeting of advertisements)
  - criminologists (e.g., identifying insurgent networks, determining leaders and active cells, fraud detection)
- SN operators routinely publish sanitized versions of the social network data they collect.

Smart Agriculture
Source: M. Imran, R. Zurita-Milla, R. de By, ITC Univ. Twente, AGILE Conference 2011

- Networked physical objects (devices, vehicles, buildings, etc.) capable of collecting and exchanging data.
- Cloud serves as a backend infrastructure.
- Imagine your network with 1,000,000 more devices.
- Any compromised device is a foothold on the network.

Biplob R. Ray, Jemal H. Abawajy, Morshed U. Chowdhury, Abdulhameed Alelaiwi: Universal and secure object ownership transfer protocol for the Internet of Things. Future Generation Comp. Syst. 78: (2018)

Application of Cloud in Agriculture
- Exploiting cloud computing together with technologies such as wireless sensor networking and mobile computing
- Cloud computing based livestock monitoring and disease forecasting system
- Cloud based autonomic information system for delivering agriculture related information as a service to improve sustainability, efficiency and quality
- Taking the market to smallholder farmers: formal markets face challenges with quality, quantity, and high transaction costs

Application of Cloud in Health
- Cloud computing is used for a wide variety of health applications.

Tahsien Al-Quraishi, Jemal H. Abawajy, Morshed U. Chowdhury, Sutharshan Rajasegarar, Ahmad Shaker Abdalrada: Breast Cancer Recurrence Prediction Using Random Forest Model. SCDM 2018:

Application of Cloud in Health
- Taking hospitals and clinics to the people, not the people to the hospitals

Federated Internet of Things and Cloud Computing Pervasive Patient Health Monitoring System. IEEE Communications Magazine 55(1): (2017)

Sara Ghanavati, Jemal H. Abawajy, Davood Izadi, Abdulhameed Alelaiwi: Cloud-assisted IoT-based health status monitoring framework. Cluster Computing 20(2): (2017)

Cloud Computing
- Businesses: discovering valuable new insights (e.g., consumer purchasing trends to better target marketing).
- Security: supporting decision making (e.g., fraud detection, disaster management).
- Medical: revealing new trends and patterns that were previously hidden (e.g., the likelihood of being predisposed to an incurable disease).
- Agriculture (e.g., spotting unusual changes in land use patterns).
- E-governance: delivering personalised and streamlined services that accurately and specifically meet an individual's needs in a timely manner.
- Future applications where users and machines will need to collaborate in intelligent ways (e.g., smart city).
- Intelligence and scientific discovery.

Introduction
- Data centres incur high energy costs and leave large carbon footprints.
- Financial issues: costs in excess of $11 billion in 2010, doubling every five years.
- Reliability issues: for every 10 °C increase in temperature, the failure rate of a system doubles.
- Environmental issues: close to 2% of CO2 emissions when data center systems are factored into the equation.

Problem Statement
- How can we improve energy efficiency without reducing QoS?

Problem Overview
- A shared pool of personal (private) clouds
- A set of cloud computing users
- A shared pool of rentable (public) clouds
- A hybrid cloud is a partially sharable pool that systematically integrates the private and public clouds

Hybrid Cloud Computing
- The most widely used Cloud computing model.
- The broker is the core component for selecting suitable resource providers.
- We developed a broker.

[Figure: InterGrid Gateway broker architecture. Components: Communication Module (message passing); Management & Monitoring (JMX); Scheduler (provisioning policies & peering); Persistence (Java Derby DB); Virtual Machine Manager (local resources, IaaS provider, Grid middleware, emulator)]

Workload
- A set of tightly coupled jobs submitted by private cloud computing users (examples: gene sequencing, matrix multiplication)
- Each job is described by a tuple of:
  - type of required virtual machines
  - estimated service demand
  - deadline of the request
  - arrival time
  - number of VMs needed (all VMs must be available for the whole required duration)
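The job tuple above maps naturally onto a small record type. The following is a minimal sketch only: the field names (`vm_type`, `service_demand`, `deadline`, `arrival_time`, `num_vms`), the use of seconds as the time unit, and the `vm_seconds` helper are illustrative assumptions, not notation from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One tightly coupled job (all field names are hypothetical)."""
    vm_type: str           # type of required virtual machines
    service_demand: float  # estimated service demand, assumed in seconds
    deadline: float        # deadline of the request, assumed absolute seconds
    arrival_time: float    # arrival time, assumed absolute seconds
    num_vms: int           # number of VMs needed for the whole duration

    def __post_init__(self):
        # Basic sanity checks on the tuple.
        if self.num_vms < 1:
            raise ValueError("a job needs at least one VM")
        if self.service_demand <= 0:
            raise ValueError("service demand must be positive")
        if self.deadline < self.arrival_time:
            raise ValueError("deadline cannot precede arrival")

    @property
    def vm_seconds(self) -> float:
        # All VMs are held for the whole duration, so the reserved
        # capacity is the product of the two dimensions.
        return self.num_vms * self.service_demand


job = Job("small", 3600.0, 10000.0, 0.0, 4)
print(job.vm_seconds)
```

Making the record immutable keeps a job description stable while it moves between the broker and the schedulers.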

Problem Overview
- Informally, the problem is how to execute user applications on the hybrid cloud such that both users and cloud owners are satisfied.
  - Where to execute the application (public or private cloud)?
  - How can we select the best public cloud to execute the application?
  - How should jobs be scheduled locally (mapping VMs to Physical Machines)?
- NP-hard problem
- The parties' objectives and requirements differ: (a) cloud owners want to maximize their return on investment; (b) cloud customers want to minimize their cost.
- For example, cloud providers are interested in the following:
- Selecting the best service from many functionally similar services is a cumbersome task for the user.
- Services might differ by the quality of service offered, by the semantics for data access (both for reading and writing), and by the interfaces or security mechanisms implemented.

Hybrid Cloud Reliability Challenges
- Resource failure is inevitable.
- Public clouds have redundant components (a much more reliable service than a private cloud).
- Resource failure leads to service failure in private clouds.
- Temporal correlation: the failure rate is time-dependent, and some periodic failure patterns can be observed at different timescales.
- Spatial correlation: multiple failures occur on different nodes within a short time interval.

Hybrid Cloud Security Challenges
- Problem 1: How can we improve the usability and security of end-entity credential management on the Cloud? The security of a system often depends on how securely user credentials are managed.
- Problem 2: Hybrid Cloud opens up the possibility of misusing information to a degree never seen before. Insider threats are a very serious problem.
- Problem 3: Detecting and mitigating HX-DoS attacks against Cloud Web Services

Reliability-aware Hybrid Cloud Resource Provisioning

Cloud Resource
- A shared pool of rentable clouds
- Each cloud consists of a pool of shared resources
- Each resource is described by a tuple of:
  - available service capacity (0 to 100%)
  - service unit price
  - service type {CPU, Storage, Network}
  - service status {failed, working}

Resource allocation problem formulation
- Given:
  - a shared pool of rentable cloud computing resources
  - a set of jobs
- Objective:
  - min ... (user-centric)
  - min ... (user-centric)
  - m... (system-centric)

Cloud Service Cost
- The cost of provisioning a VM on a public cloud is computed per resource type from:
  - the amount of each resource type (i.e., CPU, Storage, Network) needed by the VM
  - the unit price of that service type charged by the cloud service provider

Job Deadline
- Each job has a deadline. The results of the job must be ready no later than the deadline, which is expressed in terms of:
  - the actual time taken to complete execution of the job
  - the arrival time of the job
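The cost and deadline definitions above can be illustrated with a short sketch. It assumes the simplest reading of the slide, namely that the cost is the sum over resource types of amount needed times unit price, and that a job meets its deadline when arrival time plus actual execution time does not exceed the deadline. The function names and dictionary keys are hypothetical.

```python
def provisioning_cost(demand, unit_price):
    """Cost of one VM: sum over resource types of amount * unit price.

    demand and unit_price are dicts keyed by resource type
    (assumed keys: "cpu", "storage", "network").
    """
    missing = set(demand) - set(unit_price)
    if missing:
        raise KeyError(f"no unit price for: {sorted(missing)}")
    return sum(amount * unit_price[rtype] for rtype, amount in demand.items())


def meets_deadline(arrival_time, execution_time, deadline):
    """True if the results are ready no later than the deadline.

    Assumes execution_time is measured from the job's arrival.
    """
    return arrival_time + execution_time <= deadline


demand = {"cpu": 2, "storage": 50, "network": 10}
price = {"cpu": 0.5, "storage": 0.25, "network": 0.125}
print(provisioning_cost(demand, price))
print(meets_deadline(arrival_time=100.0, execution_time=250.0, deadline=400.0))
```

The prices in the usage lines are made-up values chosen only to make the arithmetic easy to follow.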

Where to execute the application (either public or private cloud)?

Size-based Brokering Strategies
- Based on the fact that the number of VMs requested by a job follows a two-stage uniform distribution with parameters (l, m, h, q)
- Schedules wider requests to the public cloud and narrower requests to the private cloud
- Uses the mean number of VMs per request to distinguish between wide and narrow requests
- Ideal for spatial correlation, where multiple failures occur on different nodes within a short time interval

Size-based Brokering Strategies

  Algorithm: Size-based
  BEGIN
    Compute the mean value of the two-stage uniform distribution
    (l, m, h, q are the parameters of the two-stage uniform distribution)
    FOR each job DO
      Compute the mean number of VMs required
      IF the job requests more VMs than the mean THEN
        Send it to public cloud
      ELSE
        Send it to private cloud
    ENDFOR
  END Algorithm
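A minimal sketch of the size-based strategy follows. It assumes that the two-stage uniform distribution draws from [l, m] with probability q and from [m, h] otherwise, so its mean is q(l + m)/2 + (1 - q)(m + h)/2, and that the distribution applies directly to the number of VMs (workload models sometimes apply it to the logarithm of the size instead). The function names are hypothetical.

```python
def two_stage_uniform_mean(l, m, h, q):
    """Mean of a two-stage uniform distribution.

    Assumption: uniform on [l, m] with probability q,
    uniform on [m, h] with probability 1 - q.
    """
    if not l <= m <= h:
        raise ValueError("expected l <= m <= h")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    return q * (l + m) / 2 + (1 - q) * (m + h) / 2


def size_based_broker(vm_requests, l, m, h, q):
    """Route each request: wide ones to the public cloud, narrow ones to private."""
    mean_vms = two_stage_uniform_mean(l, m, h, q)
    return ["public" if n > mean_vms else "private" for n in vm_requests]


print(two_stage_uniform_mean(1, 4, 16, 0.5))
print(size_based_broker([2, 8, 6, 16], 1, 4, 16, 0.5))
```

The threshold depends only on the distribution parameters, so it is computed once rather than per job.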

Time-based Brokering Strategies
- Based on the observation that request durations (job runtimes) in real distributed systems are long-tailed
- This means that a very small fraction of all requests is responsible for the main part of the load:
  - the shortest 80% of requests contribute only 20% of the total load
  - the longest 20% of requests contribute 80% of the total load
- Ideal for temporal correlation, where the failure rate is time-dependent and some periodic failure patterns can be observed at different timescales

Time-based Brokering Strategies

  Algorithm: Time-based
  BEGIN
    Request duration follows a lognormal distribution with parameters μ and σ
    FOR each job DO
      // Compute the mean duration of a job
      IF the job's duration exceeds the mean THEN
        Send it to public cloud
      ELSE
        Send it to private cloud
    ENDFOR
  END Algorithm

Area-based Brokering Strategies
- Uses the area of a request, which is the area of the rectangle with the request's duration as length and its number of VMs as width, as the decision point
- A compromise between the size-based and time-based strategies
- This strategy sends long and wide requests to the public Cloud
- It would be more conservative than a size-based strategy and less conservative than a time-based strategy
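The time-based strategy can be sketched in the same way. The mean of a lognormal distribution with parameters μ and σ is exp(μ + σ²/2), which is a standard result. Using that mean as the cut-off and sending longer requests to the public cloud is an assumption here, since the condition in the slide's pseudocode is not preserved. The function names are hypothetical.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of a lognormal distribution: exp(mu + sigma^2 / 2)."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return math.exp(mu + sigma ** 2 / 2)


def time_based_broker(durations, mu, sigma):
    """Route each request by duration (assumed rule: long -> public)."""
    mean_duration = lognormal_mean(mu, sigma)
    return ["public" if d > mean_duration else "private" for d in durations]


# With sigma = 0 the distribution collapses to exp(mu), here 1.0.
print(time_based_broker([0.5, 2.0], mu=0.0, sigma=0.0))
```

Because the distribution is long-tailed, the mean sits well above the median, so under this rule only a minority of requests would be redirected.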

Area-based Brokering Strategies
- The mean area of the requests is used as the decision point

Area-based Brokering Strategies

  Algorithm: Area-based
  BEGIN
    Calculate the mean request area of a job
    FOR each job DO
      IF the job's area exceeds the mean area THEN
        Send it to public cloud
      ELSE
        Send it to private cloud
    ENDFOR
  END Algorithm

How should jobs be scheduled locally (mapping VMs to PM)?
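The area-based strategy described above can be sketched as follows, assuming that the area of a request is its number of VMs multiplied by its duration and that requests whose area exceeds the mean area go to the public cloud. The mean area is passed in as a parameter, because the slide does not preserve how it is calculated. One option, valid only if size and duration are independent, is the product of the two individual means. The function name is hypothetical.

```python
def area_based_broker(requests, mean_area):
    """Route (num_vms, duration) requests by area = num_vms * duration.

    Assumed rule: area above the mean -> public cloud, otherwise private.
    """
    if mean_area <= 0:
        raise ValueError("mean area must be positive")
    return [
        "public" if num_vms * duration > mean_area else "private"
        for num_vms, duration in requests
    ]


# Illustrative threshold: mean size 6.25 VMs times mean duration 100 s.
mean_area = 6.25 * 100.0
requests = [(2, 100.0), (8, 100.0), (16, 30.0), (4, 200.0)]
print(area_based_broker(requests, mean_area))
```

In the usage lines, the 16-VM request stays private because it is short, and the 4-VM request goes public because it is long, which is the compromise between the two simpler strategies.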

Server Level Scheduling Algorithms

  Algorithm: Slowdown-based
  BEGIN
    0 // threshold
    0 // waiting time of job
    0 // run time of job
    // slowdown of job
    FOR each job DO
      IF ...
        grant a reservation to the job
      ENDIF
    ENDFOR
  END Algorithm

Server Level Scheduling Algorithms

  Algorithm: Advance-based (AB) Cautiously
  BEGIN
    // job at the head of the queue
    // slowdown of job k
    FOR each job DO
      IF ...
        Move the job ahead of ...
      ENDIF
    ENDFOR
  END Algorithm

Performance Evaluation

Experimental setup
- Used real failure traces and a workload model.
- Performance metrics:
  - deadline violation rate
  - slowdown
  - Cloud cost on EC2
- Failures from the Failure Trace Archive (Grid 5000 traces): 18 months, 800 events/node
- Average availability: hours
- Average unavailability: hours

Slowdown Metrics
- Bounded slowdown is the response time normalized by the running time. It is defined in terms of Wi, the waiting time, and Ti, the run time, of request i.

Usage Cost Metrics
- The cost of using EC2 for policy pl is calculated from:
  - Hpl: the public Cloud usage per hour
  - Mpl: the fraction of requests redirected to the public Cloud
  - Hu: startup time for initialization of the OS on a virtual machine (80 s)
  - Cn: the cost of one specific instance on EC2 (0.085 USD per virtual machine per hour for a small instance)
  - Bn: amount of data transferred to Amazon's EC2 for each request (0.1 USD per GB)
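To make the bounded slowdown metric above concrete, here is a minimal sketch using the form common in the job-scheduling literature: the response time Wi + Ti is divided by max(Ti, bound), and the result is floored at 1. The exact formula on the slide is not preserved, so this form and the 10-second default bound are assumptions. The function names are hypothetical.

```python
def bounded_slowdown(wait, run, bound=10.0):
    """Response time normalized by run time, with a lower bound on run time.

    The bound (assumed 10 s) keeps very short jobs from dominating the metric.
    """
    if wait < 0 or run < 0 or bound <= 0:
        raise ValueError("wait and run must be >= 0 and bound > 0")
    return max((wait + run) / max(run, bound), 1.0)


def mean_bounded_slowdown(jobs, bound=10.0):
    """Average bounded slowdown over (wait, run) pairs."""
    if not jobs:
        raise ValueError("need at least one job")
    return sum(bounded_slowdown(w, t, bound) for w, t in jobs) / len(jobs)


print(bounded_slowdown(30.0, 60.0))
print(mean_bounded_slowdown([(30.0, 60.0), (5.0, 1.0)]))
```

A value of 1 means the request did not wait at all. Larger values mean the waiting time was significant relative to the run time.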

Deadline Metrics
- The deadline for an application is set from:
  - the job submission time
  - the job completion time
  - the job turnaround time
  - f: the stringency factor (f > 1 is a normal deadline, e.g., f = 1.3)

Result and Discussion

Deadline Violation Rate Analysis
[Figure: Violation rate as a function of the job arrival rate]
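The deadline metric can be sketched under one plausible reading of the definitions above: the deadline equals the submission time plus f times a reference turnaround time, and a job violates its deadline when its actual completion time is later. The slide's own formula is not preserved, so both this rule and the function names are assumptions.

```python
def deadline(submit_time, turnaround, f=1.3):
    """Assumed rule: deadline = submission time + f * reference turnaround."""
    if f <= 0:
        raise ValueError("stringency factor must be positive")
    return submit_time + f * turnaround


def violation_rate(jobs, f=1.3):
    """Fraction of jobs completing after their deadline.

    jobs: list of (submit_time, reference_turnaround, actual_completion).
    """
    if not jobs:
        raise ValueError("need at least one job")
    violated = sum(1 for s, ta, c in jobs if c > deadline(s, ta, f))
    return violated / len(jobs)


jobs = [
    (0.0, 100.0, 120.0),
    (0.0, 100.0, 160.0),
    (10.0, 100.0, 150.0),
    (10.0, 100.0, 170.0),
]
print(violation_rate(jobs, f=1.5))
```

A larger f gives each job more slack, so the violation rate can only stay the same or fall as f grows.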

Deadline Violation Rate Analysis
[Figure: Violation rate as a function of the request size]

Deadline Violation Rate Analysis
[Figure: Violation rate as a function of the job duration]

Slowdown Analysis
[Figure: Slowdown for all provisioning policies as a function of job arrival rate]

Slowdown Analysis
[Figure: Slowdown for all provisioning policies as a function of job size]

Slowdown Analysis
[Figure: Slowdown for all provisioning policies as a function of job service demand]

Public Cloud Usage Cost Analysis
[Figure: Cloud cost on EC2 for all provisioning policies as a function of job arrival time]

Public Cloud Usage Cost Analysis
[Figure: Cloud cost on EC2 for all provisioning policies as a function of job size]

Public Cloud Usage Cost Analysis
[Figure: Cloud cost on EC2 for all provisioning policies as a function of job service demand]

Cloud Resource Management
- Scheduling is a problem that has many variants.
- Optimizing a single objective has been widely studied for many combinatorial problems, including scheduling.
- The most popular objective is the makespan, informally defined as the completion time of the last finishing task of an application represented by a precedence task graph.

Understanding Cloud Computing
- A utility-oriented distributed computing system consisting of a collection of inter-connected and virtualised computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements (SLA) established through negotiation between the service provider and consumers.

[Figure: Users access Cloud Services in three layers.
  Software as a Service: Salesforce.com; Web 2 applications, etc.
  Platform as a Service: Microsoft Azure; development & test, integration, etc.
  Infrastructure as a Service: Amazon S3, EC2; storage]

Advantages
- Infinite compute resources on demand (virtualization)
- Accessibility anytime and anywhere
- Elimination of the upfront commitment of users
- Reduced costs due to dynamic hardware provisioning
- Pay-per-use basis (and also other models)
- No need to plan for peak load in advance
- Easy management: software versioning and upgrading
- Rent on demand

Risks
- Performance: how to guarantee performance?
- Security: how much do you trust your provider? What about recovery, tracing, and data integrity? Who accesses your data?
- How to make cloud computing energy efficient?

Source: Raj

Hybrid Cloud Infrastructure
- Resource manager for the private Cloud
- Able to start, pause, resume, and stop VMs on the physical resources
- Able to migrate VMs for consolidation purposes

- Greedy Particle Swarm Algorithm Optimization (GPSO)
- Area-based, time-based, and size-based brokering strategies

Think Yourself?

Thank You...

Collaborative Work
Jemal Abawajy