Hawk: Hybrid Datacenter Scheduling
|
|
- Ross Carpenter
- 6 years ago
- Views:
Transcription
1 Hawk: Hybrid Datacenter Scheduling Pamela Delgado, Florin Dinu, Anne-Marie Kermarrec, Willy Zwaenepoel July 10th, 2015 USENIX ATC
2 Introduction: datacenter scheduling Job 1 task task scheduler cluster Job N task task 2
3 Introduction: centralized scheduling Job 1 task task centralized scheduler cluster Job N task task 3
4 Introduction: centralized scheduling cluster Job N Job 2 Job 1 centralized scheduler 4
5 Introduction: centralized scheduling cluster Job N Job 2 Job 1 Good: placement centralized scheduler Not so good: scheduling latency 5
6 Introduction: distributed scheduling Job 1 Job 2 distributed scheduler 1 distributed scheduler 2 cluster Job N distributed scheduler N 6
7 Introduction: distributed scheduling Job 1 distributed scheduler 1 cluster Job 2 distributed scheduler 2 Good: scheduling latency Not so good: placement Job N distributed scheduler N 7
8 Outline 1) Introduction 2) HAWK hybrid scheduling Rationale Design 3) Evaluation Simulation Real cluster implementation 4) Conclusion 8
9 Hybrid scheduling cluster Job 1 Job M Job 2 centralized scheduler distributed scheduler 1 Job N distributed scheduler N 9
10 Hawk: hybrid scheduling Long jobs centralized Short jobs distributed 10
11 Hawk: hybrid scheduling Long job 1 Long job M centralized scheduler Long/short: estimated execution time vs cut-off Short job 1 Short job 2 Short job N distributed scheduler 1 distributed scheduler 2 distributed scheduler N 11
12 Rationale for Hawk Typical production workloads few Long job 1 Long job M most resources many Short job 1 Short job 2 little resources Short job N 12
13 Rationale for Hawk (continued) Percentage of long jobs Percentage of task-seconds for long jobs Source: Design Insights for MapReduce from Diverse Production Workloads, Chen et al 2012
14 Rationale for Hawk (continued) Percentage of long jobs Percentage of task-seconds for long jobs Long jobs: minority 60 but take up most of 40the resources Source: Design Insights for MapReduce from Diverse Production Workloads, Chen et al 2012
15 Hawk: hybrid scheduling Bulk of resources good placement Long job 1 centralized Few jobs reasonable scheduling latency Short job 1 distributed 1 Latency sensitive Fast scheduling Short job N distributed N Few resources can trade not-so-good placement 15
16 Hawk: hybrid scheduling Bulk of resources good placement Latency sensitive Fast scheduling Long job 1 centralized BEST OF BOTH WORLDS Short job 1 Short job N distributed 1 distributed N Few jobs reasonable scheduling latency Good: scheduling latency for short jobs Good: placement for long jobs Few resources can trade not-so-good placement 16
17 Hawk: distributed scheduling Sparrow Work-stealing 17
18 Hawk: distributed scheduling Sparrow Work-stealing 18
19 Sparrow distributed scheduler task random reservation (power of two) 19
20 Hawk: distributed scheduling Sparrow Work-stealing 20
21 Sparrow and high load distributed scheduler task Random placement: Low likelihood on finding a free node 21
22 Sparrow and high load distributed scheduler task High load + job heterogeneity head-of-line blocking Random placement: Low likelihood on finding a free node 22
23 Hawk work-stealing Free node!! 23
24 Hawk work-stealing 2. Random node: send short tasks reservation in queue 1. Free node: contact random node for probes! 24
25 Hawk work-stealing 2. Random node: send short tasks reservation in queue High load high probability of contacting node with backlog 1. Free node: contact random node for probes! 25
26 Hawk cluster partitioning centralized scheduler No coordination, challenge: no free nodes for mice! Reserved nodes: small cluster partition distributed scheduler 26
27 Hawk cluster partitioning centralized scheduler No coordination, challenge: no free nodes for mice! Short jobs schedule anywhere. Long jobs only in non-reserved nodes. Reserved nodes: small cluster partition distributed scheduler 27
28 Hawk design summary Hybrid scheduler: long centralized, short distributed Work-stealing Cluster partitioning 28
29 Evaluation: 1. Simulation Sparrow simulator Google trace Vary number of nodes to vary cluster utilization Measure: Job running time Report 50 th and 90 th percentiles for short and long jobs Normalized to Sparrow 29
30 Hawk/Sparrow Simulated results: short jobs lower better th 90th Better across the board Number of nodes in the cluster (thousands) 30
31 Hawk/Sparrow Simulated results: long jobs lower better th 90th Better except under high load Number of nodes in the cluster (thousands) 31
32 Hawk/Sparrow Simulated results: long jobs lower better th 90th Very high utilization: partitioning Number of nodes in the cluster (thousands) 32
33 Decomposing Hawk 1. Hawk minus centralized 2. Hawk minus stealing 3. Hawk minus partitioning (normalized to Hawk) 33
34 Decomposing Hawk: no centralized 1. Hawk minus centralized 2 2. Hawk minus stealing Hawk minus partitioning (normalized to Hawk) Hawk no centralized 50th short jobs 90th short jobs 50th long jobs 90th long jobs 34
35 Decomposing Hawk: no stealing 1. Hawk minus centralized Hawk minus stealing Hawk minus partitioning (normalized to Hawk) Hawk no stealing 50th short jobs 90th short jobs 50th long jobs 90th long jobs 35
36 Decomposing Hawk: no partitioning 1. Hawk minus centralized Hawk minus stealing Hawk minus partitioning (normalized to Hawk) Hawk no partition 50th short jobs 90th short jobs 50th long jobs 90th long jobs 36
37 Decomposing Hawk summary Absence of any component reduces Hawk s performance! Hawk no centralized Hawk no stealing Hawk no partition 50th short jobs 90th short jobs 50th long jobs 90th long jobs 37
38 Sensitivity analysis 1. Incorrect estimates of runtime 2. Cut off long/short 3. Details of stealing 38
39 Sensitivity analysis 1. Incorrect estimates of runtime 2. Cut off long/short 3. Details of stealing Bottom line: benefits of Hawk remain despite variation See paper for details 39
40 Evaluation: 2. Implementation Hawk scheduler Hawk daemon Hawk daemon 40
41 Experiment 100-node cluster Subset of Google trace Vary inter-arrival time to vary cluster utilization Measure: Job running time Report 50 th and 90 th percentile for short and long jobs Normalized to Sparrow 41
42 Hawk/Sparrow Hawk/Sparrow Short jobs lower better Inter-arrival time Inter-arrival time real 50th simulated 50th real 90th simulated 90th 42
43 Hawk/Sparrow Hawk/Sparrow Long jobs lower better Inter-arrival time Inter-arrival time real 50th simulated 50th real 90th simulated 90th 43
44 Implementation 1. Hawk works well in real cluster 2. Good correspondence implementation/simulation 44
45 Related work Centralized: Hadoop Fair Scheduler, Quincy Eurosys 10, SOSP 09 Two level: Yarn, Mesos SoCC 12, NSDI 11 Distributed schedulers: Omega, Sparrow Eurosys 12,SOSP 13 Hybrid schedulers: Mercury 45
46 Conclusion Hawk: hybrid scheduler long: centralized, short: distributed work-stealing cluster partitioning Hawk provides good results for short and long jobs Even under high cluster utilization 46
Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization
Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization Wei Chen, Jia Rao*, and Xiaobo Zhou University of Colorado, Colorado Springs * University of Texas at Arlington Data Center
More informationEagle job-aware scheduling: divide and... reorder
Eagle job-aware scheduling: divide and... reorder Pamela Delgado, Diego Didona, Florin Dinu, Willy Zwaenepoel EPFL Submission Type: Research Abstract We present Eagle, a new hybrid cluster scheduler for
More informationKairos: Preemptive Data Center Scheduling Without Runtime Estimates.
Kairos: Preemptive Data Center Scheduling Without Runtime Estimates Pamela Delgado, Diego Didona, Florin Dinu,2 and Willy Zwaenepoel,2 EPFL, Switzerland 2 University of Sydney, Australia ABSTRACT The vast
More informationDependency-aware and Resourceefficient Scheduling for Heterogeneous Jobs in Clouds
Dependency-aware and Resourceefficient Scheduling for Heterogeneous Jobs in Clouds Jinwei Liu* and Haiying Shen *Dept. of Electrical and Computer Engineering, Clemson University, SC, USA Dept. of Computer
More informationCluster management at Google
Cluster management at Google LISA 2013 john wilkes (johnwilkes@google.com) Software Engineer, Google, Inc. We own and operate data centers around the world http://www.google.com/about/datacenters/inside/locations/
More informationResource Scheduling Architectural Evolution at Scale and Distributed Scheduler Load Simulator
Resource Scheduling Architectural Evolution at Scale and Distributed Scheduler Load Simulator Renyu Yang Supported by Collaborated 863 and 973 Program Resource Scheduling Problems 2 Challenges at Scale
More informationOn Cloud Computational Models and the Heterogeneity Challenge
On Cloud Computational Models and the Heterogeneity Challenge Raouf Boutaba D. Cheriton School of Computer Science University of Waterloo WCU IT Convergence Engineering Division POSTECH FOME, December
More informationReducing Execution Waste in Priority Scheduling: a Hybrid Approach
Reducing Execution Waste in Priority Scheduling: a Hybrid Approach Derya Çavdar Bogazici University Lydia Y. Chen IBM Research Zurich Lab Fatih Alagöz Bogazici University Abstract Guaranteeing quality
More informationApache Mesos. Delivering mixed batch & real-time data infrastructure , Galway, Ireland
Apache Mesos Delivering mixed batch & real-time data infrastructure 2015-11-17, Galway, Ireland Naoise Dunne, Insight Centre for Data Analytics Michael Hausenblas, Mesosphere Inc. http://www.meetup.com/galway-data-meetup/events/226672887/
More informationAdaptive Scheduling of Parallel Jobs in Spark Streaming
Adaptive Scheduling of Parallel Jobs in Spark Streaming Dazhao Cheng *, Yuan Chen, Xiaobo Zhou, Daniel Gmach and Dejan Milojicic * Department of Computer Science, University of North Carolina at Charlotte,
More informationOmega: flexible, scalable schedulers for large compute clusters. Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes
Omega: flexible, scalable schedulers for large compute clusters Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes Cluster Scheduling Shared hardware resources in a cluster Run a mix
More informationMulti-Resource Fair Sharing for Datacenter Jobs with Placement Constraints
Multi-Resource Fair Sharing for Datacenter Jobs with Placement Constraints Wei Wang, Baochun Li, Ben Liang, Jun Li Hong Kong University of Science and Technology, University of Toronto weiwa@cse.ust.hk,
More informationMorpheus: Towards Automated SLOs for Enterprise Clusters
Morpheus: Towards Automated SLOs for Enterprise Clusters Sangeetha Abdu Jyothi* Carlo Curino Ishai Menache Shravan Matthur Narayanamurthy Alexey Tumanov^ Jonathan Yaniv** Ruslan Mavlyutov^^ Íñigo Goiri
More informationAn Adaptive Efficiency-Fairness Meta-scheduler for Data-Intensive Computing
1 An Adaptive Efficiency-Fairness Meta-scheduler for Data-Intensive Computing Zhaojie Niu, Shanjiang Tang, Bingsheng He Abstract In data-intensive cluster computing platforms such as Hadoop YARN, efficiency
More informationMulti-Agent Model for Job Scheduling in Cloud Computing
Multi-Agent Model for Job Scheduling in Cloud Computing Khaled M. Khalil, M. Abdel-Aziz, Taymour T. Nazmy, Abdel-Badeeh M. Salem Abstract Many applications are turning to Cloud Computing to meet their
More informationCluster Fair Queueing: Speeding up Data-Parallel Jobs with Delay Guarantees
Cluster Fair Queueing: Speeding up Data-Parallel Jobs with Delay Guarantees Chen Chen, Wei Wang, Shengkai Zhang, Bo Li Hong Kong University of Science and Technology {cchenam, weiwa}@cse.ust.hk, szhangaj@connect.ust.hk,
More informationData Center Operating System (DCOS) IBM Platform Solutions
April 2015 Data Center Operating System (DCOS) IBM Platform Solutions Agenda Market Context DCOS Definitions IBM Platform Overview DCOS Adoption in IBM Spark on EGO EGO-Mesos Integration 2 Market Context
More informationScheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds
Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds 1 B.Thirumala Rao, 2 L.S.S.Reddy Department of Computer Science and Engineering, Lakireddy Bali Reddy College of Engineering,
More informationOPERATING SYSTEMS. Systems and Models. CS 3502 Spring Chapter 03
OPERATING SYSTEMS CS 3502 Spring 2018 Systems and Models Chapter 03 Systems and Models A system is the part of the real world under study. It is composed of a set of entities interacting among themselves
More informationJob Scheduling without Prior Information in Big Data Processing Systems
Job Scheduling without Prior Information in Big Data Processing Systems Zhiming Hu, Baochun Li Department of Electrical and Computer Engineering University of Toronto Toronto, Canada Email: zhiming@ece.utoronto.ca,
More informationA Novel Multilevel Queue based Performance Analysis of Hadoop Job Schedulers
Indian Journal of Science and Technology, Vol 9(44), DOI: 10.17485/ijst/2016/v9i44/96414, November 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 A Novel Multilevel Queue based Analysis of Hadoop
More informationTimely Long Tail Identification Through Agent Based Monitoring and Analytics
Timely Long Tail Identification Through Agent Based Monitoring and Analytics Peter Garraghan, Xue Ouyang, Paul Townend, Jie Xu School of Computing University of Leeds Leeds, UK {p.m.garraghan, scxo, p.m.townend,
More informationThis document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.
This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Title : An Adaptive -Fairness Scheduler for Data-Intensive Cluster Computing Author(s) Niu, Zhaojie; Tang,
More informationalsched: Algebraic Scheduling of Mixed Workloads in Heterogeneous Clouds
alsched: Algebraic Scheduling of Mixed Workloads in Heterogeneous Clouds Alexey Tumanov Carnegie Mellon University atumanov@cmu.edu James Cipar Carnegie Mellon University jcipar@cmu.edu Gregory R. Ganger
More informationA Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems
A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems Aysan Rasooli Department of Computing and Software McMaster University Hamilton, Canada Email: rasooa@mcmaster.ca Douglas G. Down
More informationThis is a repository copy of Timely Long Tail Identification through Agent Based Monitoring and Analytics.
This is a repository copy of Timely Long Tail Identification through Agent Based Monitoring and Analytics. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/88442/ Version:
More informationExploring Non-Homogeneity and Dynamicity of High Scale Cloud through Hive and Pig
Exploring Non-Homogeneity and Dynamicity of High Scale Cloud through Hive and Pig Kashish Ara Shakil, Mansaf Alam(Member, IAENG) and Shuchi Sethi Abstract Cloud computing deals with heterogeneity and dynamicity
More informationFlowTime: Dynamic Scheduling of Deadline-Aware Workflows and Ad-hoc Jobs
FlowTime: Dynamic Scheduling of Deadline-Aware Workflows and Ad-hoc Jobs Zhiming Hu, Baochun Li Department of Electrical and Computer Engineering University of Toronto Email: zhiming@ece.utoronto.ca, bli@ece.toronto.edu
More informationLooking for the perfect VM scheduler
Looking for the perfect VM scheduler Fabien Hermenier placing rectangles since 2006 @fhermeni fabien.hermenier@nutanix.com https://fhermeni.github.io 2006-2010 PhD - Postdoc Gestion dynamique des tâches
More informationStratus Cost-aware container scheduling in the public cloud Andrew Chung
Stratus Cost-aware container scheduling in the public cloud Andrew Chung Jun Woo Park, Greg Ganger PARALLEL DATA LABORATORY University Motivation IaaS CSPs provide per-time VM rental of diverse offerings
More informationLearning Based Admission Control. Jaideep Dhok MS by Research (CSE) Search and Information Extraction Lab IIIT Hyderabad
Learning Based Admission Control and Task Assignment for MapReduce Jaideep Dhok MS by Research (CSE) Search and Information Extraction Lab IIIT Hyderabad Outline Brief overview of MapReduce MapReduce as
More informationIBM Research Report. Megos: Enterprise Resource Management in Mesos Clusters
H-0324 (HAI1606-001) 1 June 2016 Computer Sciences IBM Research Report Megos: Enterprise Resource Management in Mesos Clusters Abed Abu-Dbai Khalid Ahmed David Breitgand IBM Platform Computing, Toronto,
More informationPROCESSOR LEVEL RESOURCE-AWARE SCHEDULING FOR MAPREDUCE IN HADOOP
PROCESSOR LEVEL RESOURCE-AWARE SCHEDULING FOR MAPREDUCE IN HADOOP G.Hemalatha #1 and S.Shibia Malar *2 # P.G Student, Department of Computer Science, Thirumalai College of Engineering, Kanchipuram, India
More informationOptimizing MapReduce Framework through Joint Scheduling of Overlapping Phases
Optimizing MapReduce Framework through Joint Scheduling of Overlapping Phases Huanyang Zheng, Ziqi Wan, and Jie Wu Department of Computer and Information Sciences, Temple University, USA Email: {huanyang.zheng,
More informationComparing Application Performance on HPC-based Hadoop Platforms with Local Storage and Dedicated Storage
Comparing Application Performance on HPC-based Hadoop Platforms with Local Storage and Dedicated Storage Zhuozhao Li *, Haiying Shen *, Jeffrey Denton and Walter Ligon * Department of Computer Science,
More informationA Preemptive Fair Scheduler Policy for Disco MapReduce Framework
A Preemptive Fair Scheduler Policy for Disco MapReduce Framework Augusto Souza 1, Islene Garcia 1 1 Instituto de Computação Universidade Estadual de Campinas Campinas, SP Brasil augusto.souza@students.ic.unicamp.br,
More informationMesos and Borg (Lecture 17, cs262a) Ion Stoica, UC Berkeley October 24, 2016
Mesos and Borg (Lecture 17, cs262a) Ion Stoica, UC Berkeley October 24, 2016 Today s Papers Mesos: A Pla+orm for Fine-Grained Resource Sharing in the Data Center, Benjamin Hindman, Andy Konwinski, Matei
More informationSr. Sergio Rodríguez de Guzmán CTO PUE
PRODUCT LATEST NEWS Sr. Sergio Rodríguez de Guzmán CTO PUE www.pue.es Hadoop & Why Cloudera Sergio Rodríguez Systems Engineer sergio@pue.es 3 Industry-Leading Consulting and Training PUE is the first Spanish
More informationBigger, Longer, Fewer: what do cluster jobs look like outside Google?
Bigger, Longer, Fewer: what do cluster jobs look like outside Google? George Amvrosiadis, Jun Woo Park, Gregory R. Ganger, Garth A. Gibson, Elisabeth Baseman, Nathan DeBardeleben Carnegie Mellon University
More information3Sigma: distribution-based cluster scheduling for runtime uncertainty
: distribution-based cluster scheduling for runtime uncertainty Jun Woo Park, Alexey Tumanov, Angela Jiang Michael A. Kozuch, Gregory R. Ganger Carnegie Mellon University, UC Berkeley, Intel Labs CMU-PDL-17-17
More informationGandiva: Introspective Cluster Scheduling for Deep Learning
Gandiva: Introspective Cluster Scheduling for Deep Learning Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu
More informationWorkload Decomposition for Power Efficient Storage Systems
Workload Decomposition for Power Efficient Storage Systems Lanyue Lu and Peter Varman Rice University, Houston, TX {ll2@rice.edu, pjv@rice.edu} Abstract Power consumption and cooling costs of hosted storage
More informationA Hybrid Scheduling Algorithm for Data Intensive Workloads in a MapReduce Environment
2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing A Hybrid Scheduling Algorithm for Data Intensive Workloads in a MapReduce Environment Phuong Nguyen 1 Tyler Simon 1 Milton Halem
More informationTyrex: Size-based Resource Allocation in MapReduce Frameworks
Tyrex: Size-based Resource Allocation in MapReduce Frameworks Bogdan Ghiț and Dick Epema Delft University of Technology, the Netherlands Eindhoven University of Technology, the Netherlands {b.i.ghit,d.h.j.epema}@tudelft.nl
More informationEfficient Queue Management for Cluster Scheduling
Efficient Queue Management for Cluster Scheduling Jeff Rasley Konstantinos Karanasos Srikanth Kandula Rodrigo Fonseca Milan Vojnovic Sriram Rao Microsoft Brown University Abstract Job scheduling in Big
More informationA Reference Architecture for Datacenter Scheduling: Design, Validation, and Experiments
A Reference Architecture for Datacenter Scheduling: Design, Validation, and Experiments [Technical Report on the SC18 homonym article] Georgios Andreadis, Laurens Versluis, Fabian Mastenbroek, and Alexandru
More informationOn the Usability of Shortest Remaining Time First Policy in Shared Hadoop Clusters
On the Usability of Shortest Remaining Time First Policy in Shared Hadoop Clusters Nathanaël Cheriere, Pierre Donat-Bouillud, Shadi Ibrahim, Matthieu Simonin To cite this version: Nathanaël Cheriere, Pierre
More informationHeterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis
Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis Charles Reiss *, Alexey Tumanov, Gregory R. Ganger, Randy H. Katz *, Michael A. Kozuch * UC Berkeley CMU Intel Labs http://www.istc-cc.cmu.edu/
More informationA Contention-Aware Hybrid Evaluator for Schedulers of Big Data Applications in Computer Clusters
A Contention-Aware Hybrid Evaluator for Schedulers of Big Data Applications in Computer Clusters Shouvik Bardhan Department of Computer Science George Mason University Fairfax, VA 22030 Email: sbardhan@masonlive.gmu.edu
More informationCPU scheduling. CPU Scheduling
EECS 3221 Operating System Fundamentals No.4 CPU scheduling Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University CPU Scheduling CPU scheduling is the basis of multiprogramming
More information5th Annual. Cloudera, Inc. All rights reserved.
5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software
More informationDynamicCloudSim: Simulating Heterogeneity in Computational Clouds
DynamicCloudSim: Simulating Heterogeneity in Computational Clouds Marc Bux, Ulf Leser Department of Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany {buxmarcn,leser}@informatik.hu-berlin.de
More informationPICS: A Public IaaS Cloud Simulator
Preliminary version. Final version to appear in 8th IEEE International Conference on Cloud Computing (IEEE Cloud 2015), June 27 - July 2, 2015, New York, USA PICS: A Public IaaS Cloud Simulator In Kee
More informationMulti-tenancy in Datacenters: to each according to his. Lecture 15, cs262a Ion Stoica & Ali Ghodsi UC Berkeley, March 10, 2018
Multi-tenancy in Datacenters: to each according to his Lecture 15, cs262a Ion Stoica & Ali Ghodsi UC Berkeley, March 10, 2018 1 Cloud Computing IT revolution happening in-front of our eyes 2 Basic tenet
More informationPerformance-Aware Fair Scheduling: Exploiting Demand Elasticity of Data Analytics Jobs
Performance-Aware Fair Scheduling: Exploiting Demand Elasticity of Data Analytics Jobs Chen Chen, Wei Wang, Bo Li Hong Kong University of Science and Technology {cchenam, weiwa, bli}@cse.ust.hk Abstract
More informationHow jobs are scheduled to run on Graham and Cedar
SHARCNET General Interest Webinar Series How jobs are scheduled to run on Graham and Cedar James Desjardins High Performance Computing Consultant SHARCNET, Brock University July 19th, 2017 Overview Documentation
More information10/1/2013 BOINC. Volunteer Computing - Scheduling in BOINC 5 BOINC. Challenges of Volunteer Computing. BOINC Challenge: Resource availability
Volunteer Computing - Scheduling in BOINC BOINC The Berkley Open Infrastructure for Network Computing Ryan Stern stern@cs.colostate.edu Department of Computer Science Colorado State University A middleware
More informationEnabling Resource Sharing between Transactional and Batch Workloads Using Dynamic Application Placement
Enabling Resource Sharing between Transactional and Batch Workloads Using Dynamic Application Placement David Carrera 1, Malgorzata Steinder 2, Ian Whalley 2, Jordi Torres 1, and Eduard Ayguadé 1 1 Technical
More informationHadoop on HPC: Integrating Hadoop and Pilot-based Dynamic Resource Management
Hadoop on HPC: Integrating Hadoop and Pilot-based Andre Luckow, Ioannis Paraskevakos, George Chantzialexiou and Shantenu Jha Overview Introduction and Motivation Background Integrating Hadoop/Spark with
More informationTrace Driven Analytic Modeling for Evaluating Schedulers for Clusters of Computers
Department of Computer Science George Mason University Technical Reports 44 University Drive MS#4A5 Fairfax, VA 223-4444 USA http://cs.gmu.edu/ 73-993-53 Trace Driven Analytic Modeling for Evaluating Schedulers
More informationATLAS: An Adaptive Failure-aware Scheduler for Hadoop
ATLAS: An Adaptive Failure-aware Scheduler for Hadoop Mbarka Soualhia 1, Foutse Khomh 2 and Sofiène Tahar 1 arxiv:1511.01446v2 [cs.dc] 5 Nov 2015 1 Department of Electrical and Computer Engineering, Concordia
More informationJustice: A Deadline-aware, Fair-share Resource Allocator for Implementing Multi-analytics
Justice: A Deadline-aware, Fair-share Resource Allocator for Implementing Multi-analytics Stratos Dimopoulos, Chandra Krintz, Rich Wolski Department of Computer Science University of California, Santa
More informationJob Scheduling based on Size to Hadoop
I J C T A, 9(25), 2016, pp. 845-852 International Science Press Job Scheduling based on Size to Hadoop Shahbaz Hussain and R. Vidhya ABSTRACT Job scheduling based on size with aging has been recognized
More informationRE-EVALUATING RESERVATION POLICIES FOR BACKFILL SCHEDULING ON PARALLEL SYSTEMS
The th IASTED Int l Conf. on Parallel and Distributed Computing and Systems (PDCS), Cambridge, MA, Nov. RE-EVALUATING RESERVATION POLICIES FOR BACKFILL SCHEDULING ON PARALLEL SYSTEMS Su-Hui Chiang Computer
More informationIn Cloud, Can Scientific Communities Benefit from the Economies of Scale?
PRELIMINARY VERSION IS PUBLISHED ON SC-MTAGS 09 WITH THE TITLE OF IN CLOUD, DO MTC OR HTC SERVICE PROVIDERS BENEFIT FROM THE ECONOMIES OF SCALE? 1 In Cloud, Can Scientific Communities Benefit from the
More informationData Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB
Data Analytics and CERN IT Hadoop Service CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB 1 Data Analytics at Scale The Challenge When you cannot fit your workload in a desktop Data
More informationAdaptive Scheduling for Systems with Asymmetric Memory Hierarchies. Po-An Tsai, Changping Chen, and Daniel Sanchez
Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies Po-An Tsai, Changping Chen, and Daniel Sanchez Die-stacking has enabled near-data processing Die-stacking has enabled near-data processing
More informationParallel Data Laboratory Carnegie Mellon University Pittsburgh, PA Abstract
JamaisVu: Robust Scheduling with Auto-Estimated Job Runtimes Alexey Tumanov, Angela Jiang, Jun Woo Park Michael A. Kozuch, Gregory R. Ganger Carnegie Mellon University, Intel Labs CMU-PDL-16-4 September
More informationCost-Efficient Orchestration of Containers in Clouds: A Vision, Architectural Elements, and Future Directions
Cost-Efficient Orchestration of Containers in Clouds: A Vision, Architectural Elements, and Future Directions Rajkumar Buyya 1, Maria A. Rodriguez 1, Adel Nadjaran Toosi 2, Jaeman Park 3 1 Cloud Computing
More informationPriority-enabled Scheduling for Resizable Parallel Applications
Priority-enabled Scheduling for Resizable Parallel Applications Rajesh Sudarsan, Student Member, IEEE, Calvin J. Ribbens,and Srinidhi Varadarajan, Member, IEEE Abstract In this paper, we illustrate the
More informationFailure Predic,on of Jobs in Compute Clouds: A Google Cluster Case Study
Failure Predic,on of Jobs in Compute Clouds: A Google Cluster Case Study Xin Chen, and Karthik Pa0abiraman University of Bri:sh Columbia (UBC) Charng- da Lu, Unaffiliated Compute Clouds } Infrastructure
More informationHopper: Decentralized Speculation-aware Cluster Scheduling at Scale
Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale Xiaoqi Ren 1, Ganesh Ananthanarayanan 2, Adam Wierman 1, Minlan Yu 3 1 California Institute of Technology, 2 Microsoft Research, 3 University
More informationRESOURCE MANAGEMENT IN CLUSTER COMPUTING PLATFORMS FOR LARGE SCALE DATA PROCESSING
RESOURCE MANAGEMENT IN CLUSTER COMPUTING PLATFORMS FOR LARGE SCALE DATA PROCESSING A Dissertation Presented By Yi Yao to The Department of Electrical and Computer Engineering in partial fulfillment of
More informationCASH: Context Aware Scheduler for Hadoop
CASH: Context Aware Scheduler for Hadoop Arun Kumar K, Vamshi Krishna Konishetty, Kaladhar Voruganti, G V Prabhakara Rao Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, India {arunk786,vamshi2105}@gmail.com,
More informationGuagua: An Iterative Computing Framework on Hadoop. Zhang Pengshan(David), PayPal
Guagua: An Iterative Computing Framework on Hadoop Zhang Pengshan(David), PayPal AGENDA Introduction Distributed Neural Network Algorithm What is Guagua? Guagua Advanced Features Shifu on Guagua Future
More informationClairvoyant Site Allocation of Jobs with Highly Variable Service Demands in a Computational Grid
Clairvoyant Site Allocation of Jobs with Highly Variable Service Demands in a Computational Grid Stylianos Zikos and Helen Karatza Department of Informatics Aristotle University of Thessaloniki 54124 Thessaloniki,
More informationTributary: spot-dancing for elastic services with latency SLOs
Tributary: spot-dancing for elastic services with latency SLOs Aaron Harlap and Andrew Chung, Carnegie Mellon University; Alexey Tumanov, UC Berkeley; Gregory R. Ganger and Phillip B. Gibbons, Carnegie
More informationA2L2: an Application Aware Flexible HPC Scheduling Model for Low-Latency Allocation
A2L2: an Application Aware Flexible HPC Scheduling Model for Low-Latency Allocation Gonzalo P. Rodrigo - gonzalo@cs.umu.se P-O Östberg p-o@cs.umu.se Lavanya Ramakrishnan lramakrishnan@lbl.gov Erik Elmroth
More informationA survey on Static and Dynamic Hadoop Schedulers
Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 8 (2017) pp. 2317-2325 Research India Publications http://www.ripublication.com A survey on Static and Dynamic Hadoop
More informationImproving Throughput and Utilization in Parallel Machines Through Concurrent Gang
Improving Throughput and Utilization in Parallel Machines Through Concurrent Fabricio Alves Barbosa da Silva Laboratoire ASIM, LIP6 Universite Pierre et Marie Curie Paris, France fabricio.silva@lip6.fr
More informationJob Scheduling with Lookahead Group Matchmaking for Time/Space Sharing on Multi-core Parallel Machines
Job Scheduling with Lookahead Group Matchmaking for Time/Space Sharing on Multi-core Parallel Machines Xijie Zeng and Angela Sodan University of Windsor, Windsor ON N9B 3P4, Canada zengx@uwindsor.ca,acsodan@uwindsor.ca
More informationCS 425 / ECE 428 Distributed Systems Fall 2018
CS 425 / ECE 428 Distributed Systems Fall 2018 Indranil Gupta (Indy) Lecture 24: Scheduling All slides IG Why Scheduling? Multiple tasks to schedule The processes on a single-core OS The tasks of a Hadoop
More informationAn Analysis of Traces from a Production MapReduce Cluster
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing An Analysis of Traces from a Production MapReduce Cluster Soila Kavulya, Jiaqi Tan, Rajeev Gandhi and Priya Narasimhan Carnegie
More informationCS 425 / ECE 428 Distributed Systems Fall 2017
CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy) Nov 16, 2017 Lecture 24: Scheduling All slides IG Why Scheduling? Multiple tasks to schedule The processes on a single-core OS The tasks
More informationSynthetic Data Generation for Realistic Analytics Examples and Testing
Synthetic Data Generation for Realistic Analytics Examples and Testing Ronald J. Nowling Red Hat, Inc. rnowling@redhat.com http://rnowling.github.io/ Who Am I? Software Engineer at Red Hat Data Science
More informationDOWNTIME IS NOT AN OPTION
DOWNTIME IS NOT AN OPTION HOW APACHE MESOS AND DC/OS KEEPS APPS RUNNING DESPITE FAILURES AND UPDATES 2017 Mesosphere, Inc. All Rights Reserved. 1 WAIT, WHO ARE YOU? Engineer at Mesosphere DC/OS Contributor
More informationLong-Term Multi-Resource Fairness for Pay-as-you Use Computing Systems
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 1 Long-Term Multi-Resource Fairness for Pay-as-you Use Computing Systems Shanjiang Tang, Zhaojie Niu, Bingsheng He, Bu-Sung Lee, Ce Yu Abstract Many
More informationTop 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11
Top 5 Challenges for Hadoop MapReduce in the Enterprise Whitepaper - May 2011 http://platform.com/mapreduce 2 5/9/11 Table of Contents Introduction... 2 Current Market Conditions and Drivers. Customer
More informationStateful Services on DC/OS. Santa Clara, California April 23th 25th, 2018
Stateful Services on DC/OS Santa Clara, California April 23th 25th, 2018 Who Am I? Shafique Hassan Solutions Architect @ Mesosphere Operator 2 Agenda DC/OS Introduction and Recap Why Stateful Services
More informationEnabling a Large Analytic Userbase on Hadoop & Co
Enabling a Large Analytic Userbase on Hadoop & Co Justin Coffey, j.coffey@criteo.com, @jqcoffey GOTO Amsterdam 2015 What we ll be taking (lots) about I ll be presenting an overview of our analytics stack
More informationFaria Kalim, Le Xu, Sharanya Bathey, Richa Meherwal, Indranil Gupta
Henge: Intent-Driven Multi-Tenant Stream Processing Faria Kalim, Le Xu, Sharanya Bathey, Richa Meherwal, Indranil Gupta Distributed Protocols Research Group Department of Computer Science University of
More informationLecture 11: CPU Scheduling
CS 422/522 Design & Implementation of Operating Systems Lecture 11: CPU Scheduling Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions of
More informationUNFAIRNESS IN PARALLEL JOB SCHEDULING
UNFAIRNESS IN PARALLEL JOB SCHEDULING DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University By Gerald
More informationDynamic Task Scheduling and Data Collocation in Distributed Computing Systems
Dynamic Task Scheduling and Data Collocation in Distributed Computing Systems Daria Glushkova Home University: Universitat Politècnica de Catalunya Host University: Technische Universität Dresden Supervisors:
More informationAn Analysis of Traces from a Production MapReduce Cluster (CMU-PDL )
Carnegie Mellon University Research Showcase @ CMU Parallel Data Laboratory Research Centers and Institutes 12-2009 An Analysis of Traces from a Production MapReduce Cluster (CMU-PDL-09-107) Soila Kavulya
More informationCloud object stores offer a costeffective. Provider versus Tenant Pricing Games for Hybrid Object Stores in the Cloud.
Cloud Storage Provider versus Tenant Pricing Games for Hybrid Object Stores in the Cloud The tiered object store detailed here is designed for the cloud, taking into account both fast and slow storage
More informationMinimising the Execution of Unknown Bag-of-Task Jobs with Deadlines on the Cloud
Minimising the Execution of Un Bag-of-Task Jobs with Deadlines on the Cloud Long Thai School of Computer Science University of St Andrews Fife, Scotland, UK ltt2@st-andrews.ac.uk Blesson Varghese School
More informationApache Kafka. A distributed publish-subscribe messaging system. Neha Narkhede, 11/11/11
Apache Kafka A distributed publish-subscribe messaging system Neha Narkhede, LinkedIn @nehanarkhede, 11/11/11 Outline Introduction to pub-sub Kafka at LinkedIn Hadoop and Kafka Design Performance What
More informationClass-Partitioning Job Scheduling for Large-Scale Parallel Systems
Class-Partitioning Job Scheduling for Large-Scale Parallel Systems Su-Hui Chiang Mary K. Vernon Computer Science Department Computer Science Department Portland State University University of Wisconsin-Madison
More informationPreemptive Resource Management for Dynamically Arriving Tasks in an Oversubscribed Heterogeneous Computing System
2017 IEEE International Parallel and Distributed Processing Symposium Workshops Preemptive Resource Management for Dynamically Arriving Tasks in an Oversubscribed Heterogeneous Computing System Dylan Machovec
More information