
1 HPC USAGE ANALYTICS Supercomputer Education & Research Centre Akhila Prabhakaran

2 OVERVIEW: BATCH COMPUTE SERVERS Dell Cluster: batch cluster consisting of 3 nodes, each with two Intel quad-core Xeon X5570 CPUs. Delta Cluster: 128-core cluster of 8 Intel Xeon nodes, each with 2 processors (16 cores per node), using an Intel Omni-Path interconnect for MPI communication.

3 OVERVIEW: BATCH COMPUTE SERVERS Fermi Cluster: 5-node heterogeneous cluster dedicated to GPU jobs. Tyrone Cluster: 800-core heterogeneous cluster of 17 nodes (1 head node and 16 execution nodes), composed of two node types: 9 nodes with 32 cores each and 8 nodes with 64 cores each.

4 OVERVIEW: SAHASRAT CPU-only cluster: Intel Haswell 2.5 GHz based CPU cluster with 1376 nodes; each node has 2 CPU sockets with 12 cores each and 128GB RAM, connected using the Cray Aries interconnect. Total CPU compute capacity of 33,024 cores and 176,128GB RAM. Accelerator-based clusters: two accelerator clusters: Nvidia GPU cards (44 nodes), each a Tesla K40 card with 2880 cores and 12GB device memory, and Intel Xeon Phi cards (24 nodes); the Xeon Phi 7210 processor has 64 cores with 16GB accelerator memory and 96GB DDR4 memory per node.

5 OVERVIEW: SAHASRAT High Speed Storage: 2 PB usable space provided by a high-speed DDN storage unit supporting Cray's parallel Lustre filesystem. Software environment: Cray's customized Linux OS, the Cray Linux Environment; support for Intel and open-source GNU compilers; parallel libraries such as OpenMP, MPI, CUDA and the Intel cluster software; an extensive range of parallel scientific and mathematical libraries such as BLAS, LAPACK, ScaLAPACK, FFTW, HDF5 etc.; and the DDT parallel debugger and profiler.

6 OVERVIEW: WORKLOAD MANAGER Portable Batch System (PBS) is job-scheduling software. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources. OpenPBS: the original open-source version, released in 1998 (not actively developed). TORQUE: a fork of OpenPBS. PBS Professional (PBS Pro): the version of PBS offered by Altair Engineering, dual-licensed under an open-source and a commercial license.

7 NEED FOR ANALYTICS Monitoring the availability of resources. Monitoring real-time usage of resources. Understanding the nature of HPC workloads: how are the allocated resources being used? Research domains being serviced by in-house HPC resources. Insight into user domains, application domains and usage patterns. Understanding queue waiting times and job efficiency. The data must be presented in an intuitive and visually understandable way so that quick inferences can be made.

8 THIRD PARTY ANALYTICS TOOLS Ganglia: a real-time monitoring and execution environment used by hundreds of universities, private and government laboratories and commercial cluster implementers. Ganglia in SERC: installed on the basic HPCs, i.e., on the tesla, fermi, dell and tyrone clusters. Ganglia ties these systems together as a single pool and shows the availability of nodes and cores and their average use. Who can access Ganglia?

9 SERC ANALYTICS PLATFORM FOR BATCH COMPUTE SERVERS URL: Access: limited to SERC administrators. Behind the scenes: PBS logs, NIS user information, faculty and department information; automated monthly scripts; scripts in R to consolidate the data and generate HTML; static HTML pages (Linux, Apache).

10 SERC ANALYTICS PLATFORM FOR BATCH COMPUTE SERVERS Basic HPC clusters: get information from the monthly PBS logs; identify and associate jobs with research groups and departments; produce monthly utilization charts for each cluster. Insights: utilization of these clusters; department-wise and research-group-wise utilization; potential inputs for capacity planning and new procurements.
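As a minimal sketch of this consolidation step, assuming PBS Pro accounting logs in the usual server_priv/accounting location (one file per day, named YYYYMMDD) and the standard end-of-job (E) record layout, per-user core-hours for a month could be aggregated like this; the output file name is illustrative and would feed the R scripts mentioned above:

#!/bin/sh
# Sketch: per-user core-hours for one month from PBS Pro accounting logs.
ACCT_DIR=/var/spool/pbs/server_priv/accounting   # adjust to local install
MONTH=201902                                     # YYYYMM to report on
cat "$ACCT_DIR"/"$MONTH"?? 2>/dev/null |
awk -F';' '$2 == "E" {                # end-of-job records only
    user = ""; ncpus = 0; wall = 0
    n = split($4, kv, " ")            # space-separated key=value pairs
    for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^user=/) { sub(/^user=/, "", kv[i]); user = kv[i] }
        else if (kv[i] ~ /^Resource_List.ncpus=/) { sub(/.*=/, "", kv[i]); ncpus = kv[i] + 0 }
        else if (kv[i] ~ /^resources_used.walltime=/) {
            sub(/.*=/, "", kv[i]); split(kv[i], t, ":")   # hh:mm:ss -> hours
            wall = t[1] + t[2] / 60 + t[3] / 3600
        }
    }
    if (user != "") hours[user] += ncpus * wall
}
END { for (u in hours) printf "%s,%.1f\n", u, hours[u] }' |
sort -t, -k2 -nr > corehours_"$MONTH".csv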

11 SAHASRAT - QUEUES

Queue        Min & Max Cores                           Wall Time Limit   Priority
idqueue      ...                                       ... Hours         100
small        ...                                       ... Hours         300
small        ...                                       ... Hours         300
medium       ...                                       ... Hours         350
large        ...                                       ... Hours         400
gpu queues   core counts from 1-12, one GPU on a node  24 Hours          100

#!/bin/sh
# This job should be redirected to the small queue
#PBS -N jobname
#PBS -l select=100:ncpus=24
#PBS -l walltime=24:00:00
#PBS -l place=scatter
#PBS -l accelerator_type="none"
#PBS -S /bin/sh@sdb
#PBS -V
. /opt/modules/default/init/sh
cd <exepath>
aprun -j 1 -n 2400 -N 24 ./a.out    # 2400 MPI ranks: 100 nodes x 24 ranks per node
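In day-to-day use, a script along these lines is handed to PBS with the standard client commands (the script name here is illustrative):

qsub job.sh        # submit; the scheduler routes the job to a queue
qstat -u $USER     # check its state (Q = queued, R = running)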

12 SAHASRAT ANALYTICS PLATFORM DATA SOURCES (diagram): users reach the Lustre file system over the InfiniBand fabric; the analytics data sources are PBS logs, ALPS event logs, RUR (Resource Utilization Reporting) logs and I/O access patterns (jobstat).

13 SERC ANALYTICS PLATFORM: DATA COLLECTION PBS Pro accounting logs: job-specific allocations and queue wait times. ALPS event logs: resource allocation in terms of node lists, apstart/apend times, and the cores used by each aprun statement. RUR (Resource Utilization Reporting) logs: RUR is a tool for gathering statistics on how system resources are being used by applications. Resource utilization statistics are gathered from compute nodes. RUR runs primarily before the job has started and after it ends, ensuring minimal impact on performance. Details for each JOBID and APID (aprun) are recorded in these logs: job start time, job end time, and for each aprun its apstart and apend times, utime, stime, nodes allocated, GPU utilization, all aprun statements for the job, etc.
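As a minimal sketch of mining these logs, assuming RUR messages land in a daily syslog-style file (the path is an assumption, not a Cray default) and follow the taskstats layout shown in the sample record on a later slide, per-aprun CPU times can be pulled out with standard tools:

#!/bin/sh
# Sketch: extract jobid, apid, utime and stime from RUR taskstats records.
LOG=/var/opt/cray/log/rur.$(date +%Y%m%d)    # assumed location of the day's RUR log
grep "plugin: taskstats" "$LOG" |
sed -n "s/.*apid: \([0-9]*\), jobid: \([0-9]*\).*'utime', \([0-9]*\), 'stime', \([0-9]*\).*/\2 \1 \3 \4/p"
# prints: jobid apid utime stime, one line per aprun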

14 SAMPLE LOG (screenshot of a day's log: records are grouped by user and Job ID, with several aprun entries per job; daily log sizes run from ~15MB to 55MB)

15 HPC USAGE ANALYTICS PLATFORM: BUILDING BLOCKS (diagram): user, faculty and department information together with PBS, RUR and JOBSTAT logs feed shell scripts that produce data files (reports, logs, JSON-like, .gz); data processing & consolidation then turns these into summaries e-mailed to supervisors.

16 SERC ANALYTICS PLATFORM URL: Access: by user role (admins, supervisors, users). Authentication: computational ID. Realtime insights: realtime SahasraT availability & usage; realtime user jobs on the queue with estimated queue wait time; realtime running-job status & distribution of utilization; realtime view of IOSTATS (aggregated across all OSTs).

17 ESTIMATED JOB START TIME Estimates the start time for all jobs on the queue using a PBS server attribute. PBS allows setting est_start_time_freq to control how estimated start times are computed for all jobs; by setting est_start_time_freq = 0, we have been able to retrieve the estimated start time for each job on the queue. These estimates are retrieved using the qstat -T command. Using a cron job that runs every 10 minutes, we collect this information for all queued jobs. The estimated start time for each job is computed by PBS on every scheduler cycle and is likely to change due to backfilling.
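A minimal version of this collection step, assuming a writable log directory (the script and output paths are illustrative):

*/10 * * * * /usr/local/sbin/collect_est_start.sh    # crontab entry: every 10 minutes

#!/bin/sh
# collect_est_start.sh: qstat -T lists queued jobs with the scheduler's
# estimated start time in place of the elapsed-time column.
qstat -T > /var/log/hpc-analytics/est_start.$(date +%Y%m%d%H%M)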

18 JOB CPU EFFICIENCY CPU Efficiency = (utime + stime) / (cores x wall time). Computed for each aprun statement and averaged over the job. The number of cores is taken as the hyperthreaded core count for jobs that have implicitly or explicitly used -j2 on the aprun command. Sample RUR taskstats record:
...T11:09:... c0-0c1s1n2 RUR 2417 p... [RUR@34] uid: 12345, apid: 86989, jobid: 23460, cmdname: /lus/tmp/rur..././cpu plugin: taskstats ['utime', ..., 'stime', 0, 'max_rss', 940, 'rchar', ..., 'wchar', 90, 'exitcode:signal', ['0:0'], 'core', 0]
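Plugging illustrative numbers into this formula (utime and stime are CPU seconds from the RUR record; wall time in seconds):

# Sketch: CPU efficiency of one aprun; all values are illustrative.
awk 'BEGIN {
    utime = 83040; stime = 1200    # CPU seconds reported by RUR
    cores = 24;    wall  = 3600    # cores allocated, elapsed seconds
    printf "CPU efficiency: %.1f%%\n", 100 * (utime + stime) / (cores * wall)
}'
# prints: CPU efficiency: 97.5%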

19 JOB I/O STATISTICS: LUSTRE FILE SYSTEM

20 JOB I/O STATISTICS A script runs every 8 minutes: it SSHes to the 16 OSS nodes of the Cray and gathers the job_stats data for the 96 OSTs; each OSS provides stats for the 6 OSTs it is responsible for. The script takes 5-6 seconds to complete. At the end of each job run, we get a single output file. Each log runs to some KB, and a day's logs amount to a few GB.
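A simplified version of that collector, assuming password-less SSH to OSS nodes named oss01..oss16 (the host names and output path are illustrative; job_stats is the standard Lustre jobstats interface):

#!/bin/sh
# Sketch: gather per-job I/O counters from every OSS (run from cron
# every 8 minutes). Each OSS exports job_stats for the OSTs it serves.
OUT=/var/log/hpc-analytics/jobstats.$(date +%Y%m%d%H%M)
for i in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16; do
    # lctl get_param dumps read/write ops and bytes keyed by job ID
    ssh "oss$i" "lctl get_param obdfilter.*.job_stats" >> "$OUT"
done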

21 SERC ANALYTICS PLATFORM URL: Access: restricted. Insights: overall system utilization; workloads by queue and core counts; timeline of availability and utilization of resources; queue wait times and insight into jobs on special/reserved queues; job completion statistics.

22 SERC ANALYTICS PLATFORM Workload insights: overall availability & utilization of resources; monthly utilization by queue, department, research group & user; job size characterization based on historical usage patterns; job success & exit-code patterns based on historical usage; resources allocated versus resources used; job-wise efficiency, queue wait time and I/O stats; monthly summary of usage e-mailed to research groups; system outage calendar.

23 SERC USERS' FORUM URL: Accessible using computational ID & password. The forum is broadly organized into 4 categories: FAQ, Systems Support, Software Support, and HPC Discussion Forum.

24 THANK YOU