Aspects of Fair Share on a heterogeneous cluster with SLURM
1 Aspects of Fair Share on a heterogeneous cluster with SLURM
Karsten Balzer, Computing Center, CAU Kiel, balzer@rz.uni-kiel.de
ZKI Tagung Heidelberg 2016 (13th Oct. 2016, 16:30-17:00)
2 Background I
CAU Kiel operates:
a) NEC HPC-System, consisting of
- NEC SX-ACE vector system: 256 nodes with 4 vector cores each; 64 GB memory and 256 GB/s memory bandwidth per node; theo. peak performance: 65.6 TFlops
- NEC HPC-Linux-Cluster: 22 cores (SandyBridge/Haswell); theo. peak performance: 47.2 TFlops
- sharing a global ScaTeFS file system: 1.5 PB
- batch system: NQSII
b) separate Linux cluster "rzcluster":
- 15 nodes (AMD, Intel), 32 GB ... 1 TB memory
- rather general purpose, but with exclusive islands
- batch system: PBSPro
- cluster is currently up for renewal
3 Background II
Nucleus for the cluster renewal:
c) caucluster, an additional small Linux cluster:
- 7 nodes (Haswell): 4 cores each; 256 GB memory
- separate login and service nodes (8 cores, 64 GB)
- separate software installation and modules
- home and work directories shared
- batch system: SLURM
- basic fair share scheduling
- currently 5 user groups (i.e., 5 SLURM accounts) with fair share ratios
4 Agenda
A. Introduction: SLURM and fair share with SLURM; our intentions to deploy fair share
B. Towards a heterogeneous system: challenges? From some test cases to useful accounting metrics
C. Conclusions: a brief summary; outlook
5 SLURM... a brief overview
- SLURM: acronym for Simple Linux Utility for Resource Management
- Originally developed at Lawrence Livermore National Lab as a simple resource manager (started in 2002)
- Now maintained and supported by SchedMD (https://www.schedmd.com)
- Has evolved into a capable job scheduler (roughly 500,000 lines of C code)
- Portable, scalable and fault-tolerant
- Increasingly being used at academic research computing centers
+ Supports fair share, with a rather easy-to-use plugin
+ Fair share with generic groups (not based on Linux groups)
+ Very good documentation (for installation, administration and usage)
+ Nice tools and a thorough MySQL database for monitoring/accounting
+ Open source
6 ... and fair share with SLURM
Priority plugins:
- Default is FIFO: scheduling jobs on a first-in, first-out basis
- Fair share: job priorities are adjusted according to short-term historical usage; helps to steer a system toward defined usage targets
Basic fair share with SLURM:
- priority/multifactor: fair share plugin, but used without other priority factors (such as age, job size, partition, QOS, ...)
- sched/backfill: backfill plugin enabled
Shares:
- allocated resources (core-hours), decaying with time
- share decay half-life: 12 h
Fair share tree:
- consider a simple tree (root -> accounts -> users) with 3 accounts
- fair share targets S_i: 70%, 20% and 10% of resources, i.e. S_1 = 0.7, S_2 = 0.2, S_3 = 0.1
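The basic setup described above could be expressed as a slurm.conf fragment roughly like the following; the 12 h half-life matches the talk, while the weight values are illustrative assumptions (any nonzero fair-share weight with all other weights zeroed gives the same single-factor behavior):

```
# slurm.conf sketch (illustrative; only fair-share-related lines shown)
PriorityType=priority/multifactor
PriorityDecayHalfLife=12:00:00    # 12 h share decay half-life (from the talk)
PriorityWeightFairshare=10000     # fair share as the only active priority factor
PriorityWeightAge=0               # all other factors disabled
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=0
SchedulerType=sched/backfill      # backfill plugin enabled
```

The account tree and the target shares themselves live in the accounting database (set via sacctmgr), not in slurm.conf.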
7 Examples I
A_1) Test cluster: just 2 compute nodes with 8 cores each and 64 GB
Job mix: submit 5 jobs per account, with random job properties (walltime: 1-1 min; number of nodes: 1-2; cores per node: ...; memory/core < 8 GB)
Time evolution of shares:
  s_i(t_n) = γ · s_i(t_{n-1}) + ω_i(t_n)
where ω_i(t_n) is the consumption of core-hours in the time interval Δt = t_n - t_{n-1}, and γ is the decay factor for the interval Δt.
Theoretical peak share: s_peak = N_cluster-cores · Δt / (1 - γ)
Effective peak share: s_eff = s_peak · ρ
Cluster allocation: here ρ = 0.872
(figure: shares s_i [core-h] over time, approaching s_eff)
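The decay update above can be sketched numerically. With a half-life T_half, the per-interval decay factor is γ = 0.5^(Δt/T_half); the 12 h half-life is from the slides, while the 5-minute accounting interval below is an assumption for illustration:

```python
# Sketch of the fair-share decay s_i(t_n) = gamma * s_i(t_{n-1}) + omega_i(t_n).
HALF_LIFE_H = 12.0                     # share decay half-life from the talk
DT_H = 5.0 / 60.0                      # accounting interval in hours (assumption)
GAMMA = 0.5 ** (DT_H / HALF_LIFE_H)    # per-interval decay factor

def update_share(s_prev, omega):
    """Advance one account's share by one interval: decay old usage, add new."""
    return GAMMA * s_prev + omega

# With no new usage, a share halves after exactly one half-life:
s = 100.0
steps = int(HALF_LIFE_H / DT_H)        # number of intervals in 12 h
for _ in range(steps):
    s = update_share(s, 0.0)
print(round(s, 6))                     # -> 50.0
```

The same geometric series explains the peak-share formula: under constant full allocation ω = N_cluster-cores · Δt per interval, the share converges to ω / (1 - γ) = s_peak.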
8 Examples II
A_2) Job allocation map: available from SLURM's accounting database
Normalized shares:
  s_i^norm(t) = s_i(t) / s(t),  with s(t) = Σ_i s_i(t)
Accounting over a period T:
  a_i = 1/(ρ · N_cluster-cores · T) · Σ_{j=1}^{N_jobs,i(T)} ω_j
(sum over the jobs of account i)
Fair share works if s_i^norm ≈ S_i (S_i: target share)
(figure: job allocation map, allocated cores; normalized shares s^norm over time)
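The normalized shares defined above are easy to sketch; the account names and core-hour values below are made-up example numbers, only the targets S_i = 0.7/0.2/0.1 are from the slides:

```python
# Normalized shares s_i^norm(t) = s_i(t) / sum_i s_i(t), compared to targets S_i.
def normalized_shares(shares):
    """Map raw (decayed) core-hour shares to fractions that sum to 1."""
    total = sum(shares.values())
    return {acc: s / total for acc, s in shares.items()}

targets = {"acc1": 0.7, "acc2": 0.2, "acc3": 0.1}          # targets from the talk
shares  = {"acc1": 1400.0, "acc2": 410.0, "acc3": 190.0}   # core-hours (made up)

norm = normalized_shares(shares)
deviation = {acc: norm[acc] - targets[acc] for acc in targets}
print({acc: round(v, 3) for acc, v in norm.items()})
# -> {'acc1': 0.7, 'acc2': 0.205, 'acc3': 0.095}
```

Fair share is on target when every deviation is close to zero; the scheduler boosts the priority of accounts whose deviation is negative.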
9 Examples III
A_3) Job allocation map: identify jobs of different kind:
- serial single-core jobs (fine pattern)
- parallel single-node jobs (coarse pattern)
- parallel multi-node jobs (full/solid)
Backfill strategy: numbering the jobs according to submission order reveals the backfill assistance, e.g. jobs 137, 172, 72, 14, 33
Reservations: resources are being reserved essentially for the multi-node jobs; these reservations mainly determine the cluster allocation ρ
(figure: job allocation map, allocated cores)
11 Examples IV
B) Test cluster: as in A) above
Job mix: submit 5 jobs/account
- account 1: walltime 3 min, full node (8 cores)
- accounts 2, 3: walltime 1-1 min, number of nodes 1-2, cores per node 1-8
Job allocation map: excellent time-local performance of the fair share algorithm
Cluster allocation: ρ = ...
(figure: job allocation map, allocated cores)
12 Our intentions to deploy fair share
- Make all compute resources available to all users
- Ensure fair wait times when the cluster is flooded by jobs
- Reach pre-defined usage targets, ultimately on a monthly basis
13 Fair share challenges?
It is not just about monitoring the overall cluster utilization ρ. Instead, there are additionally (many) usage targets, distributed over a well-branched tree!
Cluster heterogeneity:
- Might it impede the smooth operation of the fair share algorithm?
- Can we guarantee an adequate cluster allocation?
- Can we guarantee usage targets?
14 Test scenarios I
A) Test cluster: 2 nodes (64 GB each) with 8 and 4 cores, respectively
Job mix: memory/core < 8 GB
- accounts 1, 3: walltime 1-1 min, 1 node, 1-4 cores
- account 2: walltime 1 min, full node (8 cores)
Job allocation map: targets are reached (S_1 = 0.7, S_2 = 0.2, S_3 = 0.1)
(figure: job allocation map, allocated cores)
15 Test scenarios II
B_1) Test cluster: 3 nodes with the following resources:
- nodes 1, 2: 8 cores (64 GB)
- node 3: 4 cores (256 GB)
Job mix: random single-node jobs (number of cores <= 8)
Job allocation map: for single-node jobs: ρ = ...
(figure: job allocation map, allocated cores)
16 Test scenarios II
B_2) Test cluster: 3 nodes with the following resources:
- nodes 1, 2: 8 cores (64 GB)
- node 3: 4 cores (256 GB)
Job mix:
- account 1: 1 min walltime, 1 cores, > 128 GB memory
- accounts 2, 3: walltime 1-1 min, 1 node, 1-8 cores per node, memory/core < 1 GB
Job allocation map:
- for single-node jobs: ρ = 0.927
- memory is secondary in scheduling the red jobs
Shares miss the target shares (s_i vs. S_i):
- s_1 = 1/...
- s_2 = (46/3 · 2)/56 ≈ 0.55 > 0.2
- s_3 = (46/3 · 1)/... > 0.1
- s_3/s_2 = 0.5
Cluster allocation: ρ = 0.84
Note: the same would occur on a homogeneous cluster: (3 · 1)/(3 · 4) = ..., but the throughput of account 1 would be larger by a factor of 3 (smaller problematic time window)
20 Test scenarios III
C_1) Test cluster: 2 nodes with the following resources:
- node 1: 8 cores (32 GB)
- node 2: 8 cores (64 GB)
Job mix:
- account 1: 5/1 min, 1/2 cores, 3/4 GB
- accounts 2, 3: single-node jobs, not fully at random; memory/core: irrelevant
Results:
- cluster allocation: ρ = 0.31
- target shares are potentially reached by a decrease of backfill
- this behavior is reproducible
(figure: job allocation map, allocated cores; normalized shares s_i^norm)
21 Test scenarios III
C_2) Test cluster: 2 nodes with the following resources:
- node 1: 8 cores (64 GB)
- node 2: 8 cores (64 GB)
Job mix: same as in C_1)
Results:
- targets are reached
- alternating scheduling for account 1
- cluster allocation slightly better (though still low)
(figure: job allocation map, allocated cores)
22 Intermediate summary
- On a heterogeneous cluster (even at large ρ) it can be (much) more difficult to reach target shares
- Whether targets are achieved depends strongly on the job mix and on job properties
- The targets actually obtained may reflect the heterogeneity of the cluster
- Delaying backfill can assist in reaching targets, but leads to a small ρ
23 Towards suitable metrics
What about user satisfaction?
(figure: achieved-to-target ratios a_i/S_i per account)
Can we provide a simple metric which helps to explain those situations?
25 Suitable metrics I
Consider a situation where ρ ≈ 1.
- On which resources can a job j run?
- How many nodes, cores per node, and how much memory per node are requested? N_nodes,j, N_cores-per-node,j and N_gigabyte-per-node,j
- How many cluster cores are allocatable to process the job?
  N_allocatable-cores,j = f(N_nodes,j, N_cores-per-node,j, N_gigabyte-per-node,j)
- How many cores are available in total on the cluster? N_cluster-cores
Job-based cluster allocation: of interest is, for each job j, the ratio
  γ_j = N_allocatable-cores,j / N_cluster-cores,  γ_j ≤ 1
26 Suitable metrics II
Definition A, job-based average:
  M_A = (1/N_jobs) · Σ_{j=1}^{N_jobs} γ_j
Definition B, runtime-weighted average:
  M_B = (1/T) · Σ_{j=1}^{N_jobs} γ_j · T_j,  with T = Σ_{j=1}^{N_jobs} T_j
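The two metric definitions above can be sketched directly; the cluster size, allocatable-core counts and runtimes below are made-up example numbers:

```python
# Job-based cluster-allocation metrics:
#   M_A = (1/N_jobs) * sum_j gamma_j                     (job-based average)
#   M_B = (1/T) * sum_j gamma_j * T_j,  T = sum_j T_j    (runtime-weighted average)
# with gamma_j = N_allocatable_cores_j / N_cluster_cores for each job j.

def metric_a(gammas):
    """Job-based average of the per-job allocation ratios."""
    return sum(gammas) / len(gammas)

def metric_b(gammas, runtimes):
    """Runtime-weighted average: long jobs dominate the metric."""
    total_t = sum(runtimes)
    return sum(g * t for g, t in zip(gammas, runtimes)) / total_t

# Made-up example: three jobs on a 16-core cluster.
cluster_cores = 16
allocatable = [16, 8, 4]                             # cores each job could run on
gammas = [a / cluster_cores for a in allocatable]    # [1.0, 0.5, 0.25]
runtimes = [10.0, 10.0, 100.0]                       # minutes

print(round(metric_a(gammas), 3))                    # -> 0.583
print(round(metric_b(gammas, runtimes), 3))          # -> 0.333
```

Here M_B < M_A because the long-running job is also the most constrained one, which is exactly the kind of situation the runtime weighting is meant to expose.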
27 Suitable metrics III
Example test cluster: 2 nodes with the following resources:
- node 1: 8 cores (64 GB)
- node 2: 8 cores (32 GB)
Job mix:
- account 1: 1/3 min, 1/4 cores, 48/48 GB
- accounts 2, 3: single-node jobs, number of cores at random; memory/core: irrelevant
(figures: normalized shares s_i^norm per account against the targets S_1 = 0.7, S_2 = 0.2, S_3 = 0.1, together with the metrics M_A and M_B)
28 Conclusions
- Fair share is (even in its simple form) much more complex than other scheduling mechanisms, in particular on a heterogeneous system, where different resources are not equally available (intrinsic unfairness)
- One needs to keep an eye on many different key quantities: ρ, the S_i, and the metrics M
- One may quickly disappoint users
- Therefore, it is indispensable to be able to explain how the applied fair share algorithm works and why, in specific cases, lower targets are reached
29 Outlook I
Realistic tests on the caucluster
30 Outlook II
2nd-level backfill. Basic idea: use situations where ρ < 1 to process low-priority jobs
- Server+client architecture, with the server responsible for maintaining the pool and handing out tasks to the client(s)
- Automatic scheduling of clients by a preemption rule
SLURM configuration via standard plugin:
- 2 partitions: fairq + lowq, with fairq given the higher partition Priority
- PreemptType=preempt/partition_prio
- PreemptMode=CANCEL
Software @ Kiel University:
- Optimization algorithms for chemical and materials science (RG Prof. Hartke) [1]
- Quantum Monte Carlo (RG Prof. Bonitz) [2]
Fair share aspect: a grace time may influence the fair share in the standard partition
[1] J. M. Dieterich and B. Hartke, "An Error-safe, Portable, and Efficient Evolutionary Algorithms Implementation with High Scalability", submitted to J. Chem. Theory Comput. (2016)
[2] T. Dornheim, S. Groth, T. Schoof et al., "Ab initio quantum Monte Carlo simulations of the uniform electron gas without fixed nodes", Phys. Rev. B 93 (2016)
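The two-partition preemption setup described above corresponds to slurm.conf lines roughly like the following; the node names, priority values and grace time are illustrative assumptions, only the plugin and preemption mode are from the slides:

```
# slurm.conf sketch for the 2nd-level backfill setup (values illustrative)
PreemptType=preempt/partition_prio   # preempt by partition priority
PreemptMode=CANCEL                   # preempted low-priority jobs are cancelled

PartitionName=fairq Nodes=node[01-07] Priority=10 Default=YES
PartitionName=lowq  Nodes=node[01-07] Priority=1  GraceTime=300
```

A nonzero GraceTime lets a low-priority job run on after a preemption request, which is the mechanism behind the fair-share caveat mentioned above: during the grace time the job still consumes core-hours that count against its account in the standard partition.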
31 Outlook III
2nd-level backfill example: 3 accounts
- Blue and green: single-node jobs with max. 8 cores (fairq, ratio 2:1)
- Red: serial backfill jobs in the lowq (fixed walltime: 5 min)
(figure: allocated cores over time, panels a, b, c)
Enabling Resource Sharing between Transactional and Batch Workloads Using Dynamic Application Placement David Carrera 1, Malgorzata Steinder 2, Ian Whalley 2, Jordi Torres 1, and Eduard Ayguadé 1 1 Technical
More informationMulti-Stage Resource-Aware Scheduling for Data Centers with Heterogenous Servers
MISTA 2015 Multi-Stage Resource-Aware Scheduling for Data Centers with Heterogenous Servers Tony T. Tran + Meghana Padmanabhan + Peter Yun Zhang Heyse Li + Douglas G. Down J. Christopher Beck + Abstract
More informationMaximizing your HPC cluster investment. Cray User Group. John Corne, Pre-Sales Engineer 23-May-2018
Maimizing your HPC cluster investment Cray User Group John Corne, Pre-Sales Engineer 23-May-2018 Agenda Bright Computing Bright Cluster Manager What s new in 8.1 Workload Accounting and Reporting Bright
More informationIBM General Parallel File System (GPFS TM )
November 2013 IBM General Parallel File System (GPFS TM ) Status, what s new and what s coming Agenda GPFS Updates Status of new features Roadmap discussion Research Activities 2 High Performance Common
More informationIBM WebSphere Extended Deployment, Version 5.1
Offering a dynamic, goals-directed, high-performance environment to support WebSphere applications IBM, Version 5.1 Highlights Dynamically accommodates variable and unpredictable business demands Helps
More informationPreston Smith Director of Research Services. September 12, 2015 RESEARCH COMPUTING GIS DAY 2015 FOR THE GEOSCIENCES
Preston Smith Director of Research Services RESEARCH COMPUTING September 12, 2015 GIS DAY 2015 FOR THE GEOSCIENCES OVERVIEW WHO ARE WE? IT Research Computing (RCAC) A unit of ITaP (Information Technology
More informationHPC Workload Management Tools: Tech Brief Update
89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 HPC Workload Management Tools: Tech Brief Update IBM Platform LSF Meets Evolving High Performance Computing
More informationDesign, Install and Manage System Center 2012 Operations Manager
Design, Install and Manage System Center 2012 Operations Manager Course 55004-55006 5 Days Instructor-led, Hands-on Introduction This five-day instructor-led course combines the content found in two System
More informationBringing AI Into Your Existing HPC Environment,
Bringing AI Into Your Existing HPC Environment, and Scaling It Up Introduction 2 Today s advancements in high performance computing (HPC) present new opportunities to tackle exceedingly complex workflows.
More informationEnhanced Algorithms for Multi-Site Scheduling
Enhanced Algorithms for Multi-Site Scheduling Carsten Ernemann 1, Volker Hamscher 1, Ramin Yahyapour 1, and Achim Streit 2 1 Computer Engineering Institute, University of Dortmund, 44221 Dortmund, Germany,
More informationCONTINUOUS INTEGRATION IN A CRAY MULTIUSER ENVIRONMENT
erhtjhtyhy CONTINUOUS INTEGRATION IN A CRAY MULTIUSER ENVIRONMENT BEN LENARD HPC Systems & Database Administrator May 22 nd, 2018 Stockholm, Sweden ARGONNE Director: Paul Kerns Managed by: UChicago Argonne,
More informationGrid 2.0 : Entering the new age of Grid in Financial Services
Grid 2.0 : Entering the new age of Grid in Financial Services Charles Jarvis, VP EMEA Financial Services June 5, 2008 Time is Money! The Computation Homegrown Applications ISV Applications Portfolio valuation
More informationService Solution. Brochure. Detailed status monitoring and intelligent alerting improves uptime and device availability.
Brochure Service Solution Detailed status monitoring and intelligent alerting improves uptime and device availability. Custom reports per device or function are generated automatically to keep an eye on
More informationRODOD Performance Test on Exalogic and Exadata Engineered Systems
An Oracle White Paper March 2014 RODOD Performance Test on Exalogic and Exadata Engineered Systems Introduction Oracle Communications Rapid Offer Design and Order Delivery (RODOD) is an innovative, fully
More informationGrid Computing Scheduling Jobs Based on Priority Using Backfilling
Grid Computing Scheduling Jobs Based on Priority Using Backfilling Sandip Fakira Lokhande 1, Sachin D. Chavhan 2, Prof. S. R. Jadhao 3 1-2 ME Scholar, Department of CSE, Babasaheb Naik College of Engineering,
More informationAn Oracle White Paper June Leveraging the Power of Oracle Engineered Systems for Enterprise Policy Automation
An Oracle White Paper June 2012 Leveraging the Power of Oracle Engineered Systems for Enterprise Policy Automation Executive Overview Oracle Engineered Systems deliver compelling return on investment,
More informationAdvantage Risk Management. Evolution to a Global Grid
Advantage Risk Management Evolution to a Global Grid Michael Oltman Risk Management Technology Global Corporate Investment Banking Agenda Warm Up Project Overview Motivation & Strategy Success Criteria
More informationExercise: Fractals, Task Farms and Load Imbalance
Exercise: Fractals, Task Farms and Load Imbalance May 24, 2015 1 Introduction and Aims This exercise looks at the use of task farms and how they can be applied to parallelise a problem. We use the calculation
More informationEfficient Access to a Cloud-based HPC Visualization Cluster
Efficient Access to a Cloud-based HPC Visualization Cluster Brian Fromme @ZapYourBrain 2017 Penguin Computing. All rights reserved. Agenda 1. Data Science and HPC challenges 2. Technologies that improve
More informationThe Dynamic PBS Scheduler
The Dynamic PBS Scheduler Jim Glidewell High Performance Computing Enterprise Storage and Servers May 8, 2008 BOEING is a trademark of Boeing Management Company. Computing Environment Cray X1 2 Chassis,
More informationTask Resource Allocation in Grid using Swift Scheduler
Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844 Vol. IV (2009), No. 2, pp. 158-166 Task Resource Allocation in Grid using Swift Scheduler K. Somasundaram, S. Radhakrishnan
More informationGemaLogic Energy Management Platform
GemaLogic Energy Management Platform 1 GEMALOGIC ENERGY MANAGEMENT (EM) SOFTWARE PLATFORM GENERAL DESCRIPTION OF GEMALOGIC GemaLogic enables a systematic approach to continuous improvements of energy efficiency
More informationPreemptive Resource Management for Dynamically Arriving Tasks in an Oversubscribed Heterogeneous Computing System
2017 IEEE International Parallel and Distributed Processing Symposium Workshops Preemptive Resource Management for Dynamically Arriving Tasks in an Oversubscribed Heterogeneous Computing System Dylan Machovec
More information