Aspects of Fair Share on a heterogeneous cluster with SLURM

1 Aspects of Fair Share on a heterogeneous cluster with SLURM Karsten Balzer, Computing Center, CAU Kiel, balzer@rz.uni-kiel.de ZKI Tagung Heidelberg 2016 (13th Oct. 2016, 16:30-17:00)

2 Background I
CAU Kiel - we operate:
a) NEC HPC-System, consisting of
- NEC SX-ACE vector system: 256 nodes with 4 vector cores and 64 GB memory each; memory bandwidth: 256 GB/s; theo. peak performance: 65.6 TFlops
- NEC HPC-Linux-Cluster: 22 cores (SandyBridge/Haswell); GB; theo. peak performance: 47.2 TFlops
- sharing a global ScaTeFS file system: 1.5 PB; batch system: NQSII
b) separate Linux cluster: rzcluster
- 15 nodes (AMD, Intel), 32 GB ... 1 TB memory
- rather general purpose, but with exclusive islands; batch system: PBSPro
- this cluster is currently up for renewal

3 Background II
Nucleus for the cluster renewal:
c) caucluster: an additional small Linux cluster
- 7 nodes (Haswell): 4 cores each; 256 GB
- separate login and service nodes (8 cores, 64 GB)
- separate software installation and modules
- home and work directories shared
- batch system: SLURM, with basic fair share scheduling
- currently 5 user groups (i.e., 5 SLURM accounts) with defined fair share ratios

4 Agenda
A. Introduction: SLURM and fair share with SLURM; our intentions to deploy fair share
B. Towards a heterogeneous system: Challenges? From some test cases to useful accounting metrics
C. Conclusions: A brief summary; outlook

5 SLURM... A brief overview
SLURM: acronym for Simple Linux Utility for Resource Management
- Originally developed at Lawrence Livermore National Lab as a simple resource manager (started in 2002)
- Now maintained and supported by SchedMD (https://www.schedmd.com)
- Has evolved into a capable job scheduler (roughly 500,000 lines of C code)
- Portable, scalable and fault-tolerant
- Increasingly being used at academic research computing centers
+ Supports fair share, with a rather easy-to-use plugin
+ Fair share with generic groups (not based on Linux groups)
+ Very good documentation (for installation, administration and usage)
+ Nice tools and a thorough MySQL database for monitoring/accounting
+ Open-source
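As a quick illustration of the tooling: the current fair-share state of the whole account tree can be inspected at any time with the bundled sshare command,

    sshare --all --long

which lists, per account and user, the raw shares, the raw and effective usage, and the resulting fair-share priority factor.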

6 ... and fair share with SLURM
Priority plugins:
- Default is FIFO: scheduling jobs on a first in, first out basis
- Fair share: job priorities are adjusted according to short-term historical usage; helps to steer a system toward defined usage targets
Basic fair share with SLURM:
- priority/multifactor: the fair share plugin, but used without other priority factors (such as age, job size, partition, QOS, ...)
- sched/backfill: backfill plugin enabled
- shares: allocated resources (core-hours), decaying with time; share decay half life: 12 h
- fair share tree: consider a simple tree with 3 accounts below root, each with users; fair share targets S_i: 70%, 20% and 10% of resources, i.e. S_1 = 0.7, S_2 = 0.2, S_3 = 0.1
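To make this concrete, here is a minimal configuration sketch along these lines; the account names (acc1-acc3), the user alice and the weight value are hypothetical, and only the plugin choices, the 12 h half life and the 70/20/10 shares are taken from the slide:

    # slurm.conf (excerpt): basic fair share with the multifactor plugin
    SchedulerType=sched/backfill
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=12:00:00     # share decay half life: 12 h
    PriorityWeightFairshare=10000      # fair share as the only active factor
    PriorityWeightAge=0
    PriorityWeightJobSize=0
    PriorityWeightPartition=0
    PriorityWeightQOS=0

    # account tree via sacctmgr: three accounts with 70/20/10 shares
    sacctmgr add account acc1 fairshare=70
    sacctmgr add account acc2 fairshare=20
    sacctmgr add account acc3 fairshare=10
    sacctmgr add user alice account=acc1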

7 Examples I
A_1) Test cluster: just 2 compute nodes with 8 cores each and 64 GB
Job mix:
- submit 5 jobs per account
- random job properties: walltime: 1-1 min; number of nodes: 1-2; cores per node: 1-8; memory/core < 8 GB
Time evolution of shares: s_i(t_n) = γ·s_i(t_{n-1}) + ω_i(t_n), where ω_i(t_n) is the consumption of core-hours in the time interval Δt = t_n − t_{n-1} and γ is the decay factor for the interval Δt
Theoretical peak share: s_peak = N_cluster-cores·Δt/(1−γ)
Effective peak share: s_eff = s_peak·ρ
Cluster allocation: here ρ = 0.872
(figure: shares s_i(t) [core-h] over time, together with Σ_i s_i(t), s_peak and s_eff)
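A small numerical sketch of this recursion (my own illustration, not code from the talk; the 1 h update interval and the full-load consumption are assumptions):

    # Sketch: time evolution of fair-share usage with exponential decay.
    half_life_h = 12.0                       # share decay half life (slide 6)
    dt_h = 1.0                               # assumed update interval
    gamma = 0.5 ** (dt_h / half_life_h)      # decay factor per interval
    n_cores = 16                             # 2 nodes x 8 cores

    def update_share(s_prev, omega):
        """One step of s_i(t_n) = gamma * s_i(t_{n-1}) + omega_i(t_n)."""
        return gamma * s_prev + omega

    # theoretical peak share: all cores busy in every interval
    s_peak = n_cores * dt_h / (1.0 - gamma)

    s = 0.0
    for _ in range(200):                     # run toward the steady state
        s = update_share(s, n_cores * dt_h)  # full consumption each interval
    print(round(s, 1), round(s_peak, 1))     # both approach ~285.1 core-h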

8 Examples II
A_2) Job allocation map: available from SLURM's accounting database
Normalized shares: s_i^norm(t) = s_i(t)/s(t), with s(t) = Σ_i s_i(t)
Accounting: a_i = (1/(ρ·N_cluster-cores·T)) Σ_{j=1..N_jobs^(i)(T)} ω_j, and over the accounting period T the normalized shares approach the targets: s_i^norm → S_i (S_i: target share)
(figure: job allocation map of allocated cores and normalized shares s_i^norm over time)
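Both formulas are easy to evaluate; a sketch with made-up numbers (the shares and per-job consumptions are invented for illustration):

    # Sketch: normalized shares and the accounting ratio a_i.
    shares = {"acc1": 180.0, "acc2": 60.0, "acc3": 30.0}    # s_i(t) in core-h
    total = sum(shares.values())
    s_norm = {acc: s / total for acc, s in shares.items()}  # s_i(t) / s(t)

    rho, n_cores, T = 0.87, 16.0, 24.0       # allocation, cores, period [h]
    omega = {"acc1": [50.0, 90.0, 45.0],     # core-h of each account's jobs
             "acc2": [30.0, 25.0],
             "acc3": [28.0]}

    # a_i = (1 / (rho * N_cluster-cores * T)) * sum over omega_j
    a = {acc: sum(w) / (rho * n_cores * T) for acc, w in omega.items()}
    print(s_norm, a)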

9 Examples III
A_3) Job allocation map: identify jobs of different kinds: serial single-core jobs, parallel single-node jobs and parallel multi-node jobs (marked in the map by a fine pattern, a coarse pattern and full solid, respectively)
Backfill strategy: numbering the jobs according to their submission order reveals the backfill assistance, e.g.: 137, 172, 72, 14, 33
Reservations: resources are being reserved essentially for multi-node jobs; these reservations mainly determine the cluster allocation ρ
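For reference, the per-job records behind such a map can be pulled from the accounting database with sacct, e.g. (an illustrative invocation; the start date is made up):

    sacct --allusers --starttime=2016-10-01 \
          --format=JobID,Account,AllocCPUS,NNodes,Submit,Start,End --parsable2

Comparing the Submit and Start columns also exposes the reordering done by backfill.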

11 Examples IV
B) Test cluster: as in A) above
Job mix: submit 5 jobs/account
- account 1: walltime: 3 min; full node: 8 cores
- accounts 2, 3: walltime: 1-1 min; number of nodes: 1-2; cores per node: 1-8
Job allocation map: excellent time-local performance of the fair share algorithm
(figure: job allocation map of allocated cores over time; cluster allocation ρ given in the plot)
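A sketch of how such a randomized job mix might be generated (my own illustration, not the script used for the tests; the account names, the walltime range and the dummy payload are assumptions):

    # Sketch: submit a randomized job mix for each account via sbatch.
    import random
    import subprocess

    for account in ["acc1", "acc2", "acc3"]:     # hypothetical accounts
        for _ in range(5):                       # jobs per account
            nodes = random.randint(1, 2)
            cores = random.randint(1, 8)         # cores per node
            minutes = random.randint(1, 10)      # assumed walltime range
            subprocess.run(
                ["sbatch",
                 f"--account={account}",
                 f"--nodes={nodes}",
                 f"--ntasks-per-node={cores}",
                 f"--time={minutes}",            # walltime in minutes
                 "--mem-per-cpu=1G",             # keep memory/core small
                 "--wrap", f"sleep {60 * minutes}"],  # fills the walltime
                check=True)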

12 Our intentions to deploy fair share
- Make all compute resources available to all users
- Ensure fair wait times when the cluster is flooded by jobs
- Reach pre-defined usage targets, ultimately on a monthly basis

13 Fair share challenges?
It is not just about monitoring the overall cluster utilization ρ. Instead, there are additionally (many) usage targets, distributed over a well-branched tree!
Cluster heterogeneity:
- Might it impede the smooth operation of the fair share algorithm?
- Can we guarantee an adequate cluster allocation?
- Can we guarantee usage targets?

14 Test scenarios I
A) Test cluster: 2 nodes (64 GB) with 8 and 4 cores, resp.
Job mix: memory/core < 8 GB
- accounts 1, 3: walltime: 1-1 min; 1 node: 1-4 cores
- account 2: walltime: 1 min; full node: 8 cores
Job allocation map: targets are reached (S_1 = 0.7, S_2 = 0.2, S_3 = 0.1)
(figure: job allocation map, allocated cores over time)

15 Test scenarios II
B_1) Test cluster: 3 nodes with the following resources:
- nodes 1, 2: 8 cores (64 GB) each
- node 3: 40 cores (256 GB)
Job mix: random single-node jobs (number of cores ≤ 8)
Job allocation map: for single-node jobs: ρ = 0.927
(figure: job allocation map, allocated cores over time)

16 Test scenarios II
B_2) Test cluster: 3 nodes with the following resources:
- nodes 1, 2: 8 cores (64 GB) each
- node 3: 40 cores (256 GB)
Job mix:
- account 1: 1 min, 10 cores, > 128 GB
- accounts 2, 3: walltime: 1-1 min; 1 node, 1-8 cores per node; memory/core < 1 GB
Job allocation map (allocated cores over time)
Cluster allocation: ρ = 0.84; for single-node jobs: ρ = 0.927

17 Test scenarios II
B_2) Test cluster and job mix: as on the previous slide
Job allocation map: for single-node jobs: ρ = 0.927; memory is secondary in scheduling the red jobs
Cluster allocation: ρ = 0.84

18 Test scenarios II
B_2) Test cluster and job mix: as above
Job allocation map: for single-node jobs: ρ = 0.927
Shares miss the target shares (s_i ≠ S_i). With 56 cluster cores in total (8 + 8 + 40), account 1's jobs (> 128 GB) fit only on node 3 and, by memory, only one at a time, so account 1 holds just 10 cores; the remaining 46 cores are split 2:1 between accounts 2 and 3:
- s_1 = 10/56 ≈ 0.18 (far below 0.7)
- s_2 = (46·2/3)/56 ≈ 0.55 > 0.2
- s_3 = (46·1/3)/56 ≈ 0.27 > 0.1
- s_3/s_2 = 0.5, matching the target ratio S_3/S_2
Cluster allocation: ρ = 0.84
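The quoted bounds are plain core counting; a quick check (using the 56-core total and the 2:1 split of the remaining 46 cores reconstructed above):

    # Quick check of the steady-state shares in scenario B_2).
    total = 8 + 8 + 40          # 56 cluster cores
    acc1 = 10                   # one >128 GB job pinned to node 3
    rest = total - acc1         # 46 cores, split 2:1 between accounts 2, 3

    s1 = acc1 / total           # ~0.18, far below S_1 = 0.7
    s2 = rest * 2 / 3 / total   # ~0.55, above S_2 = 0.2
    s3 = rest * 1 / 3 / total   # ~0.27, above S_3 = 0.1
    print(round(s1, 2), round(s2, 2), round(s3, 2), s3 / s2)  # 0.18 0.55 0.27 0.5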

19 Test scenarios II
B_2) Test cluster, job mix and shares: as on the previous slide (for single-node jobs: ρ = 0.927; s_i ≠ S_i; cluster allocation: ρ = 0.84)
- The same would occur on a homogeneous cluster: (3·10)/(3·40) = 0.25, but the throughput of account 1 would be larger by a factor of 3 (smaller problematic time window)

20 Test scenarios III
C_1) Test cluster: 2 nodes with the following resources:
- node 1: 8 cores (32 GB)
- node 2: 8 cores (64 GB)
Job mix:
- account 1: 5/1 min, 1/2 cores, 3/4 GB
- accounts 2, 3: single-node jobs, not fully at random; memory/core: irrelevant
Results:
- cluster allocation: ρ = 0.31
- target shares are potentially reached by a decrease of backfill
- this behavior is reproducible
(figure: job allocation map and normalized shares s_i^norm over time)

21 Test scenarios III
C_2) Test cluster: 2 nodes with the following resources:
- node 1: 8 cores (64 GB)
- node 2: 8 cores (64 GB)
Job mix: same as in C_1)
Results:
- targets are reached
- alternating scheduling for account 1
- cluster allocation slightly better (though still low)
(figure: job allocation map, allocated cores over time)

22 Intermediate summary
- On a heterogeneous cluster (even at large ρ) it can be (much) more difficult to reach target shares
- Whether targets are achieved depends strongly on the job mix and on the job properties
- The targets actually obtained may reflect the heterogeneity of the cluster
- Delaying backfill can assist in reaching targets, but leads to a small ρ

23 Towards suitable metrics
What about user satisfaction?
(figure: accounting ratios a_i compared to the target shares S_i)

24 Towards suitable metrics
What about user satisfaction? (figure as on the previous slide)
Can we provide a simple metric which helps to explain those situations?

25 Suitable metrics I
Consider a situation where ρ ≪ 1:
- On which resources can a job j run?
- How many nodes, how many cores per node and how much memory per node are requested? → N_nodes,j, N_cores-per-node,j and N_gigabyte-per-node,j
- How many cluster cores are allocatable to process the job? → N_allocatable-cores,j = f(N_nodes,j, N_cores-per-node,j, N_gigabyte-per-node,j)
- How many cores are available in total on the cluster? → N_cluster-cores
Job-based cluster allocation: of interest, for job j, is the ratio γ_j = N_allocatable-cores,j / N_cluster-cores, with γ_j ≤ 1
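The slide leaves f unspecified; one plausible model (an assumption of mine, counting the cores of all nodes that can host the job's per-node piece) is:

    # Sketch: allocatable cores for a job on a small cluster model.
    nodes = [(8, 64), (8, 64), (40, 256)]      # (cores, memory GB) per node

    def allocatable_cores(n_nodes, cores_per_node, gb_per_node):
        """One choice of f(N_nodes, N_cores-per-node, N_gigabyte-per-node)."""
        fitting = [c for c, m in nodes
                   if c >= cores_per_node and m >= gb_per_node]
        return sum(fitting) if len(fitting) >= n_nodes else 0

    n_cluster_cores = sum(c for c, _ in nodes)              # 56
    # gamma_j for a 1-node, 10-core, >128 GB job (account 1 in B_2):
    print(allocatable_cores(1, 10, 129) / n_cluster_cores)  # 40/56 ~ 0.71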

26 Suitable metrics II
Definition A - job-based average: M_A = (1/N_jobs) Σ_{j=1..N_jobs} γ_j
Definition B - runtime-weighted average: M_B = (1/T) Σ_{j=1..N_jobs} γ_j·T_j, with T = Σ_{j=1..N_jobs} T_j
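Evaluated over a made-up job list (the γ_j values and runtimes are invented for illustration), the two definitions give:

    # Sketch: job-based vs. runtime-weighted average of gamma_j.
    jobs = [(1.0, 2.0), (1.0, 0.5), (40/56, 6.0), (16/56, 1.0)]  # (gamma_j, T_j)

    M_A = sum(g for g, _ in jobs) / len(jobs)    # Definition A
    T = sum(t for _, t in jobs)
    M_B = sum(g * t for g, t in jobs) / T        # Definition B
    print(round(M_A, 3), round(M_B, 3))          # 0.75 0.744

Long-running jobs with a small γ_j pull M_B below M_A.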

27 Suitable metrics III
Example: Test cluster: 2 nodes with the following resources:
- node 1: 8 cores (64 GB)
- node 2: 8 cores (32 GB)
Job mix:
- account 1: 1/3 min, 1/4 cores, 48/48 GB
- accounts 2, 3: single-node jobs, number of cores at random; memory/core: irrelevant
(figure: three panels of normalized shares s_i^norm against the targets S_1 = 0.7, S_2 = 0.2 and S_3 = 0.1, each compared with the metrics M_A and M_B)

28 Conclusions
- Fair share is (even in its simple form) much more complex than other scheduling mechanisms, in particular on a heterogeneous system, where different resources are not equally available (intrinsic unfairness)
- One needs to keep an eye on many different key quantities: ρ, s_i, S_i and the metrics M
- One may quickly disappoint users
- It is therefore indispensable to be able to explain how the applied fair share algorithm works and why lower targets are reached in specific cases

29 Outlook I
Realistic tests on the caucluster

30 Outlook II
2nd-level backfill
Basic idea: use situations where ρ < 1 to process low-priority jobs
- Server+client architecture, with the server responsible for maintaining the pool and handing out tasks to the client(s)
- Automatic scheduling of clients by a preemption rule
SLURM configuration via standard plugin:
- 2 partitions: fairq (higher Priority) + lowq (Priority=1)
- PreemptType=preempt/partition_prio
- PreemptMode=CANCEL
Software @ Kiel University:
- Optimization algorithms for chemical and materials science (RG Prof. Hartke) [1]
- Quantum Monte Carlo (RG Prof. Bonitz) [2]
Fair share aspect: a grace time may influence the fair share in the standard partition
[1] J.M. Dieterich and B. Hartke, An Error-safe, Portable, and Efficient Evolutionary Algorithms Implementation with High Scalability, submitted to J. Chem. Theory Comput. (2016)
[2] T. Dornheim, S. Groth, T. Schoof et al., Ab initio quantum Monte Carlo simulations of the uniform electron gas without fixed nodes, Phys. Rev. B 93 (2016)
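A hedged slurm.conf sketch of this two-partition setup (the node range and the concrete Priority/GraceTime values are assumptions; only the preemption plugin, the preempt mode and the partition names come from the slide):

    # slurm.conf (excerpt): 2nd-level backfill via partition preemption
    PreemptType=preempt/partition_prio
    PreemptMode=CANCEL

    # fairq outranks lowq; lowq jobs are canceled when fairq needs the nodes
    PartitionName=fairq Nodes=node[01-07] Priority=10 Default=YES
    PartitionName=lowq  Nodes=node[01-07] Priority=1  GraceTime=60

The partition GraceTime gives a preempted lowq job a short window before it is canceled; as noted above, such a grace time can feed back into the fair share of the standard partition.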

31 Outlook III
2nd-level backfill example - 3 accounts:
- Blue and green: single-node jobs with max. 8 cores (fairq, ratio 2:1)
- Red: serial backfill jobs in the lowq (fixed walltime: 5 min)
(figure: job allocation maps a), b), c) of allocated cores over time)
