Performance Modeling and Contracts

Similar documents
Sizing SAP Hybris Billing, pricing simulation Consultant Information for Release 1.1 (et seq.) Document Version

NSF {Program (NSF ) first announced on August 20, 2004} Program Officers: Frederica Darema Helen Gill Brett Fleisch

NEMO performance assessment report

Kaseya Traverse Unified Cloud, Network, Server & Application Monitoring

Towards Autonomic Virtual Applications in the In-VIGO System

This document describes the overall software development process of microcontroller software during all phases of the Company Name product life cycle.

INTEGRATED TEST PLATFORMS: TAKING ADVANTAGE OF ADVANCES IN COMPUTER HARDWARE AND SOFTWARE

Platform-Based Design of Heterogeneous Embedded Systems

Platform-Based Design of Heterogeneous Embedded Systems

Challenges in Application Scaling In an Exascale Environment

The Development of a Low Cost approach to the Prototyping of Construction Plant

Architecture-Aware Cost Modelling for Parallel Performance Portability

ITIL Glossary Page 1 of 1

Models in Engineering Glossary

Accelerate HPC Development with Allinea Performance Tools. Olly Perks & Florent Lebeau

Introduction. DIRAC Project

Delivering High Performance for Financial Models and Risk Analytics

Industrial IT System 800xA Engineering

MODPROD 2017, Linköping February 8, 2017

Center for Scalable Application Development Software: Center Overview. John Mellor-Crummey (Rice)

Live Video Analytics at Scale with Approximation and Delay-Tolerance

Addressing Predictive Maintenance with SAP Predictive Analytics

COMBINE: A decision support system for real time traffic control

Software Life Cycle. Main Topics. Introduction

CHAPTER 6 DYNAMIC SERVICE LEVEL AGREEMENT FOR GRID RESOURCE ALLOCATION

Design System for Machine Learning Accelerator

LexisNexis Active Insights: Evolving the Insurance Industry with the HPCC Systems Open Source Big Data Platform

Kaseya Traverse Predictive SLA Management and Monitoring

Improving System Dependability with Functional Alternatives

Stay Tuned Computational Science NeSI. Jordi Blasco

Embark on Your Data Management Journey with Confidence

ProRenaTa: Proactive and Reactive Tuning to Scale a Distributed Storage System

Starting Workflow Tasks Before They re Ready

DEPEI QIAN. HPC Development in China: A Brief Review and Prospect

VirtualWisdom Analytics Overview

AXON PREDICT ANALYTICS FOR VXWORKS

Project Management Process Groups. PMP Study Group Based on the PMBOK Guide 4 th Edition

A technical discussion of performance and availability December IBM Tivoli Monitoring solutions for performance and availability

Oil reservoir simulation in HPC

CP2K PARALLELISATION AND OPTIMISATION

Next Generation Performance Monitoring Machine Learning Algorithms for Anomaly Detection

Using Application Response to Monitor Microsoft Outlook

High Performance Computing(HPC) & Software Stack

Simplifying Hadoop. Sponsored by. July >> Computing View Point

Cray and Allinea. Maximizing Developer Productivity and HPC Resource Utilization. SHPCP HPC Theater SEG 2016 C O M P U T E S T O R E A N A L Y Z E 1

DESIGNING AND DEVELOPING MICROSOFT SHAREPOINT SERVER 2010 APPLICATIONS. Course: 10232A Duration: 5 Days; Instructor-led

The SWRLing Future of OWL. Mark Greaves DARPA / IXO

CHAPTER 6 RESULTS AND DISCUSSION

Sizing SAP Systems. Contents. 1 Introduction Sizing Methods Sizing Tools Quick Sizer Sizing Approaches...

AMR (Adaptive Mesh Refinement) Performance Benchmark and Profiling

Business behavioral monitoring Use of forecast weather prediction to monitor business aberrant behavior

Digital Wind Operations Optimization from GE Renewable Energy. Enhance the performance and efficiency of your people and machines to drive outcomes

Using AT&T IoT Services to Broker M2M Data for Your Customers and Their Devices

Engineering support in IC H&CD system on Development of IC-PSC prototype & Fast controller Phase 2

The Manycore Shift. Microsoft Parallel Computing Initiative Ushers Computing into the Next Era

CIM Forum Charter Dated

Test Workflow. Michael Fourman Cs2 Software Engineering

Jack Weast. Principal Engineer, Chief Systems Engineer. Automated Driving Group, Intel

: 55048B: No-Code SharePoint Workflows With SharePoint Designer 2013

Cactus-G: Experiments with a Grid-Enabled Computational Framework

Maru and Toru: Item-specific logistics solutions based on ROS. Moritz Tenorth, Ulrich Klank and Nikolas Engelhard

Synthesizing Representative I/O Workloads Using Iterative Distillation (Extended Version GIT-CERCS-03-29) Last revised: 18 December 2003

Profit Suite R300. Release Highlights

Chapter 2: Project Methodologies and Processes

Performance monitors for multiprocessing systems

Francine Berman University of California, San Diego

Automated Service Intelligence (ASI)

DOWNTIME IS NOT AN OPTION

Introduction to Simulink & Stateflow

Development of an artificial vision system to automatically inspect blood typing cards

Engineering support in IC H&CD system on developement of IC-PSC prototype & fast controller

RESOLVING APPLICATION DEVELOPMENT ISSUES USING SOA Y. KIRAN KUMAR 1, G.SUJATHA 2, G. JAGADEESH KUMAR 3

Superlink-online and BOINC

Using Machine Learning and Analytics to Understand How MQ Impacts Your Business

FULTECH CONSULTING RISK TECHNOLOGIES

AC235. SAP Convergent Charging 4.1 COURSE OUTLINE. Course Version: 15 Course Duration: 5 Day(s)

10/1/2013 BOINC. Volunteer Computing - Scheduling in BOINC 5 BOINC. Challenges of Volunteer Computing. BOINC Challenge: Resource availability

The WW Technology Group

IBM ICE (Innovation Centre for Education) Welcome to: Unit 1 Overview of delivery models in Cloud Computing. Copyright IBM Corporation

Combining OpenCV and High Level Synthesis to Accelerate your FPGA / SoC EV Application

Bed Management Solution (BMS)

Leistungsanalyse von Rechnersystemen

GEARING FACTORS. The A FLEXIBLE SIZING APPROACH

2016 Smart Grid R&D Program Peer Review Meeting

HPC Workload Management Tools: Tech Brief Update

Evaluating Enterprise Architectures through Executable Models

Obtaining Extremely Detailed Information at Scale. Jesus Labarta Barcelona Supercomputing Center

Real-Time and Embedded Systems

BluePlant SCADA/HMI Software

Agenda. z/vm and Linux Performance Management. Customer Presentation. New Product Overview. Opportunity. Current products. Future product.

A Profile Guided Approach to Scheduling in Cluster and Multi-cluster Systems

Modern Payment Fraud Prevention at Big Data Scale

How Jukin Media Leverages Metricly for Performance and Capacity Monitoring

Model-Driven Design-Space Exploration for Software-Intensive Embedded Systems

QUALITY MANAGEMENT FOR MOBILE COMMUNICATION SOFTWARE

The Importance of Complete Data Sets for Job Scheduling Simulations

Advanced Machine Monitoring. Whitepaper

Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform

5 Things to Know About Network Monitoring in a Cloud-Centric World

Transcription:

Performance Modeling and Contracts Ruth Aydt Dan Reed Celso Mendes Fredrik Vraalsen Pablo Research Group Department of Computer Science University of Illinois {aydt,reed,cmendes,vraalsen}@cs.uiuc.edu http://www-pablo.cs.uiuc.edu

Making the Real-time Performance Monitor a Program Preparation System Reality Execution Environment Performance Feedback Software Components Performance Problem Real-time Performance Monitor Whole- Program Compiler Source Application Configurable Object Program Service Negotiator Scheduler Negotiation Grid Runtime System Libraries Dynamic Optimizer

Requirements! Performance models that predict application performance on a given set of Grid resources! Tools and interfaces that allow us to Monitor application execution in realtime Detect when observed performance does not match predicted performance Identify cause of problem Provide feedback to guide dynamic reconfiguration

Sources of Performance Models! Developer knowledge of application or library behavior ScaLAPACK Jack s team Considerable detail possible! Compile time analysis of code Rice team - Keith Use compiler understanding of code behavior to build performance predictions! Historical data from previous runs or observed behavior of current execution so far Pablo group Learn from past experience

Contract Specification! Boilerplate for specifying performance model inputs and outputs; enumerates what all parties are committing to provide! Given a set of resources (compute, network, I/O, ) Fine Grid with certain capabilities (flop rate, latency, ) NWS for particular problem parameters (matrix size, image resolution, ) part of job submission the application will achieve a specified, measurable

Real-time Performance Monitor! Decide if the contract has been violated! Strictly speaking, the contract is violated if any of the resource, capability, problem parameter or performance specifications are not met during the execution! In practice, tolerate a level of contract violation specifications will have inaccuracies! The contract violation policy should consider the severity persistence cumulative effect of the breach of contract in determining when

Approach: Tunable Tolerance! Use Autopilot decision procedures that are based on fuzzy logic to deal with uncertainty! These support intuitive reasoning about degrees of violation if FLOP rate has been low for a long time the contract is violated what constitutes low, long, and violated can be adjusted to express different levels of uncertainty and tolerance can also set threshold on how bad it must be violated before it is actually reported many knobs to turn!! Violation transitions are smooth rather than discrete as they are with decision tables

ScaLAPACK Developer Model! Model predicts duration for each iteration! An Autopilot Sensor inserted in application reports iteration number and actual duration! Contract Monitor computes ratio of actual to predicted time; ratio passed to decision procedure 1! Fuzzy rules specify contract output based on ratio var timeratio ( 0, 10 ) { set trapez LOW ( 0, 1, 0, 1 ); set trapez HIGH ( 2, 10, 1, 0 ); }; var contract ( -1, 2 ) { set triangle OK ( 1, 1, 1 ); set triangle VIOLATED ( 0, 1, 1 ); }; 0 LOW HIGH 0 2 4 6 8 10

Contract Monitoring application process 0 register sensor Architecture sensor data Resources, Capabilities, Program Parameters, Performance Model, Violation Policy PPS & Schedule r NWS Autopilot Manager MDS register sensor application process 1 locate sensors sensor data Contract Monitor Archive Feedback

Application Signature Model! Application Intrinsic Metrics description of application demands on resources sample metrics» FLOPS/statement, I/O bytes/statement, bytes/message values are independent of execution platform values may depend on problem parameters! Application Signature trajectory of values through N-dimensional metric space one trajectory per task» correlated for data parallel codes M 1 M 3 M 2

System Space Signature! System Space Metrics description of resource response to application demands sample metrics» FLOPS/second, I/O bytes/second, message bytes/second values are dependent on execution platform quantify actual performance! System Space Signature trajectory of values through N-dimensional metric space will vary across application executions, even on the same resources M 1 M 3 M 2

Performance Prediction! Given Strategy application intrinsic behavior resource capability information project application signature into system space signature, in effect predicting performance! Many possible projection strategies single figure of merit (scaling in each dimension)»peak MFLOPS, bandwidth, I/O rate»benchmark suite measurements»previous application executions (learning)

Single Figure of Merit A Instructions FLOPS Projection FLOPS X = Second Instructions Second 1 Application Intrinsic Projection Factor System Specific Messages Bytes Messages B X = 2 Byte Second Second Metric B Metric 2 Projection Metric A Metric 1

ScaLAPACK Application Signature! System independent modulo instructions versus statements! Trajectory reflects change in application demands over course

Projected Behavior (3 resource sets) opus major & torc major

Contract Implementation! Cluster the (projected) system space points centroids define nominal predicted behavior radii define tolerable range of values! Compare actual performance to cluster predictions! If far from cluster, report violation var distancefromcentroid ( 0, 100 ) { set trapez SHORT ( 0,.3, 0,.3 ); set trapez LONG (.6, 100,.3, 0 ); }; var contract ( -1, 2 ) { set triangle OK ( 1, 1, 1 ); set triangle VIOLATED ( 0, 1, 1 ); }; if ( distancefromcentroid == SHORT ) { contract = OK; } if ( distancefromcentroid == LONG) { contract = VIOLATED; }

Experimental Verification of Approach ScaLAPACK Periodic Application Signature Model Application-Intrinsic metrics captured every 60 seconds using PAPI, MPI wrappers, Autopilot Sensors Projections based on historical data Run on 3 clusters at UIUC; used 4 machines from each cluster; machine speeds vary across clusters Introduced CPU load on one machine Contract Monitor detects violation

Projected & Actual Comparison: Promising!! Radii based on standard deviation of each projection factor Green 2X std deviation; Red: 4X std deviation

Load on P3: Predicted & Measured! Shift down and to the left (metrics not independent for ScaLAPACK)

P3 Contract Monitor Output! Violations in System Space when load introduced (~3min)! Also looking at compute and communicate

Load on P3: Impact on All Processes! All shift to the left and down is detecting culprit hopeless?

Contract Results P0, P11! Perhaps there is hope!! Contract output for other processes not consistently violated! Side Note: individual components combine for overall violation

Master/Worker Experiments! Synthetic program Master (p0) hands out fixed-size jobs via message to workers Workers compute independently; report solution back to master Master may be bottleneck Two classes of behavior among processes! Results shown Application-Intrinsic metrics captured every 30 seconds Projections based on historical data from baseline execution Execute in WAN (UIUC, UCSD, UTK) Load introduced and then removed Contract monitor detects violation! MicroGrid will allow us to more GrADS easily Project conduct

Projected Performance maste r slow mediu m fast! Different Master and Worker behavior reflected! Different Worker processor speeds reflected

Measured FLOP Rate over Time! Contract detects severe dip in P3 FLOP Rate! Ignores small drop in P5 FLOP Rate and earlier drop in P3

Contract Output; load on P3 P0 Master P5 Worker P3 Worker

Where we are; Where we re going! Performance models application signature models look promising» Paper submitted to HPDC» Explore periodic reclustering to capture temporal behavior evolution» Determine if common set of metrics capture important characteristics of wide range of application types use compiler insights to develop better contracts! Contract monitoring infrastructure in place lessons learned reflected in work of Global Grid Forum Performance WG supports multi-level contract monitors as first step toward identifying per-process and overall violations experiment with different violation boundaries identify cause of violations; preliminary results reasonably good! Research Topics Development of methods to automatically select & tune fuzzy logic rulebases for different sets of resources; (Mario Medina) Automatic detection of phase changes to trigger reclustering

Year 1: Monitoring Progress: Are we meeting our Commitments? creation of initial performance models completed gain insight into performance of existing algorithms and libraries many insights from ScaLAPACK that are guiding what is possible to predict/detect overall specify interfaces for defining and validating performance contracts initial interfaces defined; continue to refine as we learn more about what is required to support wider range of contracts specify form and semantics of performance reporting mechanisms complete from application to contract monitor; feedback from monitor to PSE and Scheduler not done. Year 2: real-time, wide-area performance measurement tools completed sophisticated application performances GrADS models Project that relate