Dependency-aware and Resource-efficient Scheduling for Heterogeneous Jobs in Clouds


Dependency-aware and Resource-efficient Scheduling for Heterogeneous Jobs in Clouds. Jinwei Liu* and Haiying Shen. *Dept. of Electrical and Computer Engineering, Clemson University, SC, USA; Dept. of Computer Science, University of Virginia, Charlottesville, VA, USA.

Outline: Introduction; Problem addressed by Dependency-aware and Resource-efficient Scheduling (DRS); Design of DRS; Performance Evaluation; Conclusions.

Introduction: Scheduling. [Figure: jobs, each composed of tasks, are submitted to job schedulers, which dispatch the tasks to workers.]

Motivation. Large-scale data analytics frameworks are shifting toward shorter task durations and larger degrees of parallelism. Tasks have diverse dependencies: a task cannot start running until its precedent tasks complete. [Figure: a task-dependency DAG over tasks T1-T12.]
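The precedence constraint described here can be made concrete with a small sketch. The code below (task names and edges are made up for illustration, not taken from the paper) groups a dependency DAG into "waves": tasks within a wave have no dependencies on each other, so a scheduler may run each wave fully in parallel, and no task is released before its precedents finish.

```python
# Illustrative DAG: each task maps to the list of its precedent tasks.
deps = {"T1": [], "T2": [], "T5": ["T1"], "T6": ["T2"],
        "T9": ["T5"], "T10": ["T5", "T6"]}

def schedulable_waves(deps):
    """Group tasks into waves of mutually independent tasks.

    Wave k contains every task whose precedents all finished in
    earlier waves, so each wave can run fully in parallel.
    """
    remaining = {t: set(p) for t, p in deps.items()}
    waves = []
    while remaining:
        # Tasks with no unfinished precedents are ready now.
        ready = sorted(t for t, p in remaining.items() if not p)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        # Mark the just-scheduled tasks as finished precedents.
        for p in remaining.values():
            p.difference_update(ready)
    return waves

print(schedulable_waves(deps))
```

With the toy DAG above, T1 and T2 form the first wave, T5 and T6 the second, and T9 and T10 the last, matching the slide's rule that a task waits for all of its precedents.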

Motivation (cont.). Response-time requirements keep tightening: MapReduce batch jobs (2004), ~10 min; Hive queries (2009), ~1 min; Dremel queries (2010), ~10 sec; in-memory Spark (2012), ~2 sec.

Motivation (cont.). Queue length is a poor predictor of waiting time. [Figure: Worker 1 queues three tasks of 100 ms, 100 ms, and 200 ms (400 ms total), while Worker 2 queues two tasks of 400 ms each (800 ms total): the longer queue has the shorter wait.]
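The slide's point can be reproduced numerically. Using the per-task service times reconstructed from the figure (the exact numbers are taken from the slide's diagram as transcribed), the worker with the longer queue has the shorter total wait:

```python
# Per-task service times (ms) queued at each worker, from the slide's figure.
queues = {
    "Worker 1": [100, 100, 200],  # 3 queued tasks
    "Worker 2": [400, 400],       # 2 queued tasks
}

# A newly arriving task waits for the sum of service times ahead of it.
waits = {w: sum(q) for w, q in queues.items()}
lengths = {w: len(q) for w, q in queues.items()}

print(lengths)
print(waits)
assert lengths["Worker 1"] > lengths["Worker 2"]  # longer queue...
assert waits["Worker 1"] < waits["Worker 2"]      # ...yet shorter wait
```

This is why DRS estimates waiting time directly instead of using queue length as a proxy.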

Motivation (cont.). Resource inefficiency: random reservation (power-of-two probing) by the scheduler, as in Sparrow [1] and Hawk [2]. [1] K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. Sparrow: Distributed, low latency scheduling. In Proc. of SOSP, 2013. [2] P. Delgado, F. Dinu, A.-M. Kermarrec, and W. Zwaenepoel. Hawk: Hybrid datacenter scheduling. In Proc. of ATC, 2015.

Proposed Solution. DRS: Dependency-aware and Resource-efficient Scheduling. Features of DRS: dependency awareness, resource efficiency, and high throughput. [Figure: the DRS framework combines dependency-aware scheduling and resource-efficient scheduling to achieve high throughput.]

Outline: Introduction; Problem addressed by Dependency-aware and Resource-efficient Scheduling (DRS); Design of DRS; Performance Evaluation; Conclusions.

System model of DRS.
n: total number of workers. J: a set of jobs. a: number of jobs in J. J_i: the ith job in J. t_ij: the jth task of J_i. S_ij: the size of task t_ij. m: number of tasks per job. C_cos: resource cost.
Waiting time: the time from when a task is submitted to the worker machine until the task starts executing. Service time: the time from when a task starts executing until it finishes executing on a worker machine. Job response time: the time from when a job is submitted to the scheduler until the tail task of the job finishes executing. Throughput: the total number of jobs completing their executions within their deadlines per unit of time.

Problem statement of DRS. Given a set of jobs consisting of tasks, the constraints on those tasks (i.e., dependency constraints, resource constraints, and response-time constraints of jobs), and a set of heterogeneous worker machines, how can the jobs be scheduled so that resource cost and response time are reduced as much as possible? Goal: design an efficient scheduling method for heterogeneous jobs with task dependency constraints that reduces resource cost and response time.

DRS: Integer linear programming (ILP) model. [Equations (4)-(11): the ILP formulation; the key constraints are explained on the next slide.]

ILP model.
Constraint (4) ensures that the cumulative usage of resource R_r on a worker machine does not exceed the capacity of that resource type during the period of task execution.
Constraint (5) ensures the execution order of tasks on a worker.
Constraint (6) ensures that jobs complete within their specified deadlines.
Constraint (7) ensures the dependency relations between tasks.
Constraint (11) ensures that a task is assigned to exactly one worker machine.
The model is solved with the CPLEX linear programming solver.
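Since the equations themselves are not transcribed, the following sketch illustrates the shape of the optimization on a toy instance. All task sizes, worker capacities, and costs are invented for illustration, and a brute-force search stands in for the CPLEX solver; the feasibility checks mirror the roles of constraints (4), (6), (7), and (11) described above.

```python
from itertools import product

# Toy instance (illustrative, not from the paper): CPU demand per task,
# task dependencies, and two heterogeneous workers.
tasks = {"t1": 2, "t2": 3, "t3": 1, "t4": 2}
deps = {"t3": ["t1"], "t4": ["t2", "t3"]}
workers = {"w1": {"cap": 4, "cost": 1.0},   # cheap, small worker
           "w2": {"cap": 8, "cost": 1.5}}   # larger, pricier worker
deadline = 10
order = ["t1", "t2", "t3", "t4"]  # a topological order of the tasks

def evaluate(assign):
    """Return (makespan, cost) if the assignment is feasible, else None."""
    finish = {}
    clock = {w: 0 for w in workers}            # each worker runs serially
    for t in order:
        w = assign[t]
        if tasks[t] > workers[w]["cap"]:       # capacity, cf. constraint (4)
            return None
        ready = max((finish[d] for d in deps.get(t, [])), default=0)
        start = max(ready, clock[w])           # dependency, cf. constraint (7)
        finish[t] = start + tasks[t]
        clock[w] = finish[t]
    if max(finish.values()) > deadline:        # deadline, cf. constraint (6)
        return None
    cost = sum(tasks[t] * workers[assign[t]]["cost"] for t in tasks)
    return max(finish.values()), cost

# Each task goes to exactly one worker, cf. constraint (11).
best = None
for choice in product(workers, repeat=len(tasks)):
    result = evaluate(dict(zip(tasks, choice)))
    if result and (best is None or result[1] < best[1]):
        best = result
print(best)
```

On this instance the cheapest feasible schedule packs all four tasks onto the cheap worker w1; a real deployment would hand the same objective and constraints to CPLEX rather than enumerate assignments.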

Challenges of DRS design. How to schedule tasks with constraints (e.g., dependencies) to achieve low response time and high resource utilization. How to accurately estimate tasks' waiting time in the queues of workers. How to reduce the communication overhead of scheduling.

Outline: Introduction; Problem addressed by Dependency-aware and Resource-efficient Scheduling (DRS); Design of DRS; Performance Evaluation; Conclusions.

Design of DRS.
Reduce cost: utilize the ILP model to reduce resource cost.
Reduce response time: leverage task dependency information and schedule mutually independent tasks to different workers, or to different processors of a worker, so that independent tasks run in parallel; use a reinforcement learning-based approach to estimate tasks' waiting time for scheduling.
Reduce communication overhead: utilize scheduler domains and a gossip protocol.

Reduce Communication Overhead.
Schedulers are scattered across the distributed system. When users submit their jobs, the jobs are delivered to the schedulers near them. If a scheduler is heavily loaded, the job is delivered to a lightly loaded neighbor of that scheduler to achieve load balance.
The schedulers are split into sets; each set is a scheduler domain, and each domain has a scheduler manager. Scheduler managers communicate with each other instead of all schedulers doing so, reducing communication overhead.
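The domain mechanism can be sketched in a few lines. Everything below (domain names, scheduler names, load numbers, the overload threshold) is an illustrative assumption, not the paper's implementation: each domain's manager aggregates its schedulers' loads into one summary to gossip to other managers, and an overloaded scheduler hands a job to the lightest-loaded neighbor in its domain.

```python
# Toy scheduler domains: scheduler -> number of queued jobs.
domains = {
    "d1": {"s1": 5, "s2": 2, "s3": 9},
    "d2": {"s4": 1, "s5": 7},
}

def domain_summary(domain):
    """The per-domain aggregate a manager gossips to other managers,
    so schedulers never need all-pairs communication."""
    loads = domains[domain]
    return {"domain": domain, "total_load": sum(loads.values())}

def submit(scheduler, domain, threshold=6):
    """Deliver a job to the nearby scheduler; if it is heavily loaded,
    hand the job to the lightest-loaded neighbor in the same domain."""
    loads = domains[domain]
    if loads[scheduler] > threshold:
        scheduler = min(loads, key=loads.get)
    loads[scheduler] += 1
    return scheduler

summaries = [domain_summary(d) for d in domains]  # manager-to-manager gossip
target = submit("s3", "d1")                       # s3 overloaded -> rerouted
print(target, summaries)
```

Here the job submitted near the overloaded s3 lands on s2 instead, while only two summaries (one per domain) cross domain boundaries.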

Reduce Response Time. Run independent tasks in parallel: either on multiple single-processor workers or on multiprocessor workers. Job classification: CPU-intensive, memory-intensive, GPU-intensive, etc.

Reduce Tasks' Waiting Time. Predict tasks' waiting time. [Figure: predicted waiting times for the queues of Worker 1 and Worker 2.]

Predicting Tasks' Duration. Reinforcement learning-based approach: DRS uses a mutual reinforcement learning-based approach to accurately estimate tasks' duration in workers' queues, and utilizes the estimates for task scheduling.
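The slide does not detail the estimator, so as a rough illustration of the reinforcement-learning flavor, a temporal-difference-style running estimate can track a worker's per-task service time from observed completions. The class name, learning rate, and numbers below are all assumptions for the sketch, not DRS's actual estimator:

```python
class WaitTimeEstimator:
    """TD(0)-style running estimate of a worker's mean task service time.

    Illustrative sketch only; DRS's mutual reinforcement learning
    approach is not specified on this slide.
    """

    def __init__(self, alpha=0.2, initial=1.0):
        self.alpha = alpha        # learning rate
        self.estimate = initial   # estimated service time per task (seconds)

    def observe(self, measured):
        # Move the estimate toward each newly observed service time.
        self.estimate += self.alpha * (measured - self.estimate)

    def predicted_wait(self, queue_length):
        # Expected wait for a new task behind `queue_length` queued tasks.
        return queue_length * self.estimate

est = WaitTimeEstimator()
for s in [0.5, 0.6, 0.55, 0.58]:  # observed service times (seconds)
    est.observe(s)
print(est.predicted_wait(4))
```

The scheduler would then place a task on the worker with the smallest predicted wait rather than the shortest queue, consistent with the earlier motivation slide.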

Outline: Introduction; Dependency-aware and Resource-efficient Scheduling (DRS); Design of DRS; Performance Evaluation; Conclusions.

Performance Evaluation. Methods for comparison:
CLR [1]: implements opportunistic allocation of spare resources to jobs, and chooses the tasks consuming the most resources to be preempted. [1] G. Ananthanarayanan, C. Douglas, R. Ramakrishnan, S. Rao, and I. Stoica. True elasticity in multi-tenant data-intensive compute clusters. In Proc. of SoCC, 2012.
Natjam [2]: assigns higher priority to production jobs and lower priority to research jobs; production jobs preempt research jobs. Natjam first preempts the tasks of the job with the maximum deadline (which have the lowest priority). For tasks with the same job priority, Natjam uses two task-preemption policies: Shortest Remaining Time (SRT) and Longest Remaining Time (LRT). [2] B. Cho, M. Rahman, T. Chajed, I. Gupta, C. Abad, N. Roberts, and P. Lin. Natjam: Design and evaluation of eviction policies for supporting priorities and deadlines in mapreduce clusters. In Proc. of SoCC, 2013.
SNB [3]: gives preference to requests for small files or requests with short remaining file size. Specifically, it uses a linear combination of the waiting time and the remaining time of a job (i.e., the estimated time to complete the remaining part of the job) to determine the job's priority. [3] A. Balasubramanian, A. Sussman, and N. Sadeh. Decentralized preemptive scheduling across heterogeneous multi-core grid resources. In Proc. of JSSPP, 2015.
SRPT [4]: gives preference to requests that are short or have small remaining processing requirements, in accordance with the SRPT (Shortest Remaining Processing Time) scheduling policy. [4] M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based scheduling to improve web performance. ACM Trans. on Computer Systems, 21(2):207-233, 2003.

Experiment Setup. Parameter settings:
Parameter  Meaning                  Setting
N          # of servers             50/30
a          # of jobs                50-1000
m          # of tasks per job       10-20
l          # of resource types      2
α          weight for CPU size      0.5
β          weight for memory size   0.5
γ          weight for waiting time  0.5
θ          weight for queue length  0.5
Testbeds: the Palmetto cluster at Clemson and Amazon EC2.

Evaluation (cont.). Resource cost on the cluster, with (a) 15 tasks/job and (b) 20 tasks/job. Result: resource cost increases as the number of jobs increases; resource cost follows DRS < Natjam < CLR.

Evaluation (cont.). Throughput on the cluster, with (a) 5 jobs/second and (b) 50 jobs/second. Result: throughput follows SNB < SRPT < CLR ≈ Natjam < DRS.

Evaluation (cont.). Resource cost on Amazon EC2, with (a) 15 tasks/job and (b) 20 tasks/job. Result: resource cost increases as the number of jobs increases; resource cost follows DRS < Natjam < CLR.

Evaluation (cont.). Throughput on Amazon EC2, with (a) 5 jobs/second and (b) 50 jobs/second. Result: throughput follows SNB < SRPT < Natjam < CLR < DRS.

Outline: Introduction; Problem addressed by Dependency-aware and Resource-efficient Scheduling (DRS); Design of DRS; Performance Evaluation; Conclusions.

Conclusions. Our contributions:
Built an integer linear programming model to minimize resource cost and increase resource utilization.
Considered task dependency in task assignment, and utilized scheduler domains and a gossip protocol to reduce communication overhead.
Presented a reinforcement learning-based approach to estimate tasks' waiting time in workers' queues, then assigned tasks to workers to reduce response time.
Conducted extensive trace-driven experiments demonstrating the performance of DRS.
Future work: consider preemption; consider tolerance of failures.

Thank you! Questions & Comments? Dr. Haiying Shen, hs6ms@virginia.edu. Associate Professor, Pervasive Communication Laboratory, University of Virginia.