The Art of Building a Good Benchmark. Karl Huppler, TPC Chair. August 24, 2009


Seeming explosion of benchmarks over the last two decades: TPC-A, TPC-B, TPC-C, TPC-D, TPC-R, TPC-H, TPC-App, TPC-E, SPECint2000, SPECint2006, SPECint_rate2000, SPECfp_rate2006, SPECjbb2005, SPECjvm2008, SPECjms2007, SPECsfs2008, SPECmail2008, SYSmark2007, SPC-1C, SPC-2C (and these are just a few!)

Just because something is new doesn't mean it is good.

Successful Benchmark Requirements: Relevant, Repeatable, Fair, Economical, Verifiable.

Compromise Required: Relevant, Repeatable, Fair, Verifiable, and Economical must be balanced against one another.

Relevant Metric
1,223,457 SPECjbb bops: Java business operations per second.
4,801,497 tpmC: transactions per minute in benchmark C.
3,105 QphH@3000GB: * actually a composite involving [(S*22*3600)/Ts * SF], but it does look like queries per hour, and it actually relates to that.
21.3?
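The asterisk on the QphH number points at the composite nature of the metric. For reference, a sketch of how TPC-H combines its power and throughput tests (paraphrasing the TPC-H specification; S = number of query streams, T_s = elapsed time of the throughput test in seconds, SF = scale factor, QI and RI = the timed query and refresh intervals from the power run):

```latex
\mathrm{Power@Size} = \frac{3600 \cdot SF}{\left( \prod_{i=1}^{22} QI(i,0) \cdot \prod_{j=1}^{2} RI(j,0) \right)^{1/24}},
\quad
\mathrm{Throughput@Size} = \frac{S \cdot 22 \cdot 3600}{T_s} \cdot SF,
\quad
\mathrm{QphH@Size} = \sqrt{\mathrm{Power@Size} \cdot \mathrm{Throughput@Size}}
```

The geometric-mean structure is why the number "looks like queries per hour and actually relates to that" without being a literal query count.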

Use of Relevant Software
Use of appropriate software paths may be the most critical requirement of any benchmark. It encourages optimization of real consumer paths and represents performance capabilities in a relevant software path. A good benchmark path follows the typical software path; a bad benchmark path optimizes areas that are not important to the consumer. No benchmark will exercise all important paths: good benchmarks run paths that are used by many applications in the business model of the benchmark, while bad benchmarks use fringe paths whose optimization does not help real applications. TPC-C, TPC-H, SPECint, SPECfp, and SPECjbb are all examples where benchmarks helped optimize consumer software paths.

Use of Relevant Hardware
Similar to software: it is important that the benchmark does not focus on hardware that is not important to typical applications in the business model of the benchmark. You wouldn't want a benchmark that focused only on a single, potentially low-use component, like a floating-point accelerator. Good benchmarks can help drive hardware design to eliminate consumer problems before they happen. At the macro level, TPC-C, after 17 years, continues to be used to provide system-level engineering design guidance. Within the processor and related firmware, SPECcpu provides a broad range of stress points that can be used in early design and as final proof points.

Does What It Says: No False Claims
Benchmarks must be designed to fit a particular business model and a specific scope within that business model, and they must clearly state that they cannot be generalized outside of the designed-for scope. Sometimes, consumers of benchmark information make the wrong assumptions anyway:
(Engineering) "We've measured the fuel use of the truck when it is standing idle."
(Sales) "So, that tells us how fast it will travel when fully loaded, right?"

Relevance: Long Life, Broadly Applicable
Long-lived: target important, current functions. The challenge is that not all product offerings will be at the same level; aim for leading edge, not bleeding edge, and not too long-lived (the slide shows a timeline from 2009 through 2014).
Broad applicability: the business model may be tightly defined, but it must not restrict applicability. TPC-C covers general OLTP; SPECcpu covers a broad suite of compute-intensive functions.

Strong Target Audience
After software relevance, this is the most critical factor in building a strong, long-lived, relevant benchmark. The audience is not always known; new functions may not have a current audience, and benchmarks perfect in every other way may be retired because they cannot meet this criterion (such a benchmark can still be good if it is strong in many other areas). The audience is also not always consumers. SPECcpu audience: engineers, academics, designers, programmers (and sometimes marketing). TPC-C audience: engineers, designers, end customers, analysts, marketing, executives, etc.

Trading Relevance for Benchmarkability: Repeatable, Fair and Portable, Verifiable, Economical.

Repeatability
Confidence in getting the same result on each measurement. This is a challenge when information changes: queries run longer and return different results, and disks respond differently depending on prior use. Compromises may be required:
Pre-condition the system by running the application multiple times (not real, but perhaps consistent).
Ensure data changes do not affect future results: a "sanitary" database where updates are in columns that are not queried, or inserts use key values that are not queried.
Refresh data on each run, or as appropriate; for TPC-C this lasts up to 12 hours.
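To make the pre-conditioning compromise concrete, here is a minimal harness sketch in Python (the workload function and run counts are hypothetical illustrations, not from the talk): unmeasured warm-up passes let caches, JITs, and storage reach a steady state before the timed passes that produce the metric.

```python
import statistics
import time

def run_workload():
    # Placeholder for the benchmark's real unit of work
    # (e.g., one pass of the transaction mix or query set).
    sum(i * i for i in range(1_000_000))

def benchmark(warmup_runs=3, measured_runs=5):
    # Pre-condition: exercise the same code and data paths without
    # recording results, so later runs see a steady-state system.
    for _ in range(warmup_runs):
        run_workload()

    # Measured passes: only these contribute to the reported metric.
    times = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_workload()
        times.append(time.perf_counter() - start)

    print(f"median {statistics.median(times):.4f}s, "
          f"spread {max(times) - min(times):.4f}s")

if __name__ == "__main__":
    benchmark()
```

Reporting the spread alongside the median is one way to make the repeatability claim itself visible in the result.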

Fair and Portable
We want the benchmark to stress important, leading-edge features, but we don't want to penalize strong solutions that have not optimized all of the new features, and we do want to avoid reducing to the functional lowest common denominator. Focus on a broad range of environments, or declare that the benchmark is for a limited range. Use of standard C, C++, Java, and SQL makes portability easier. A key requirement is testing on multiple platforms with multiple software environments: this ensures portability and exposes inadvertent prejudice toward the development environment. Fair and portable benchmarks trade custom and leading-edge features for broader applicability across environments.

Verifiable
Confidence in the benchmark result is required. A benchmark can be self-verifying, with automatic routines built in to test results against verification criteria; many SPEC benchmarks do this, at least in part. Results can also be reviewed and/or attested by a third party: TPC uses certified auditors who have demonstrated expertise in the benchmark, while SPEC uses volunteer oversight of results by members of the development committee. Each method has advantages. The easier the verification, the greater the confidence, though this may require trade-offs to simplify the benchmark.
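As an illustration of the self-verifying approach, a sketch in Python (the tolerance and reference values are hypothetical, not drawn from any actual SPEC or TPC harness): a run is reportable only if every computed answer matches a pre-computed reference within the benchmark's criteria.

```python
def verify_numeric(results, reference, rel_tol=1e-6):
    """Accept a run only if every computed value matches the
    reference answer within the benchmark's stated tolerance."""
    if len(results) != len(reference):
        return False
    return all(
        abs(r - ref) <= rel_tol * max(abs(ref), 1.0)
        for r, ref in zip(results, reference)
    )

# Example: a run whose outputs drift beyond tolerance is rejected.
reference = [1.0, 2.5, 100.0]
good_run  = [1.0000001, 2.5, 100.00001]
bad_run   = [1.1, 2.5, 100.0]
print(verify_numeric(good_run, reference))  # True
print(verify_numeric(bad_run, reference))   # False
```

Baking the check into the harness means every published number has already passed the same gate, which is what makes this style of verification cheap and credible.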

Economical
An expensive benchmark requires great incentive to publish; it can still be a strong benchmark, but with a limited result set. An inexpensive benchmark may become popular by the sheer number of publishes, and if coupled with strength in the other criteria in this discussion, it can become very popular. SPECint2006, SPECfp2006, SPECint_rate2006, and SPECfp_rate2006 are clear examples.

Can't Do It All
A benchmark can be too perfect. ("How many TPC-Cs does it take to run that geothermal analysis application?") Satisfying almost every criterion leads to too much popularity, too wide a target audience, too much difficulty in making changes, and too many general conclusions that are not based on the benchmark's business model; TPC-C is the example called out on the slide.

Summary: Relating This to the TPC and Future Benchmark Development
The criteria in this discussion need to be kept in view throughout benchmark development. Consumers need to know the strengths and limitations of benchmarks to properly use benchmark data. New benchmarks will always be required, but it is not necessary to boil the ocean. Benchmark development organizations should share and learn from each other.

Trademarks and Disclaimers
TPC and TPC Benchmark are trademarks of the Transaction Processing Performance Council. The SPEC logo, SPEC, SPECjbb, SPECsfs, SPECmail, SPECint, SPECfp, SPECweb, SPECjAppServer, SPECjms and SPECjvm are registered trademarks of the Standard Performance Evaluation Corporation. BAPCo and SYSmark are registered trademarks of the Business Applications Performance Corporation. SPC Benchmark is a trademark of the Storage Performance Council. The opinions expressed in this paper are those of the author and do not represent the official views of either the Transaction Processing Performance Council or the IBM Corporation.