EV8: The Post-Ultimate Alpha. Dr. Joel Emer Intel Fellow Intel Architecture Group Intel Corporation

Size: px
Start display at page:

Download "EV8: The Post-Ultimate Alpha. Dr. Joel Emer Intel Fellow Intel Architecture Group Intel Corporation"

Transcription

1 EV8: The Post-Ultimate Alpha Dr. Joel Emer Intel Fellow Intel Architecture Group Intel Corporation

2 Alpha Microprocessor Overview Higher Performance Lower Cost 0.35µm EV EV µm 0.18µm EV µm EV µm EV µm EV First System Ship

3 Goals Leadership single stream performance Extra multistream performance with multithreading Without major architectural changes Without significant additional cost

4 EV8 Architecture Overview Aggressive instruction fetch unit 8-wide super-scalar scalar execution unit 4-way simultaneous multithreading (SMT) Large on-chip L2 cache Direct RAMBUS interface On-chip router for system interconnect for glueless,, directory-based, ccnuma with up to 512-way multiprocessing

5 System Block Diagram EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO

6 Instruction Issue Time Reduced function unit utilization due to dependencies

7 Superscalar Issue Time Superscalar leads to more performance, but lower utilization

8 Predicated Issue Time Adds to function unit utilization, but results are thrown away

9 Chip Multiprocessor Time Limited utilization when only running one thread

10 Fine Grained Multithreading Time Intra-thread dependencies still limit performance

11 Simultaneous Multithreading Time Maximum utilization of function units by independent operations

12 Basic Out-of of-order order Pipeline Fetch Decode/ Map Queue Reg Read Execute Dcache/ Store Buffer Reg Write Retire PC Register Map Regs Dcache Regs Icache Thread-blind

13 SMT Pipeline Fetch Decode/ Map Queue Reg Read Execute Dcache/ Store Buffer Reg Write Retire PC Register Map Icache Regs Dcache Regs

14 Architectural Abstraction 1 CPU with 4 Thread Processing Units (TPUs( TPUs) Shared hardware resources TPU 0 TPU1 TPU2 TPU3 Icache TLB Dcache Scache

15 Key Design Principles High throughput single stream design Enhancements for SMT

16 Little s Law Average Number of Tasks in Region (N) Throughput (T) = Average Latency in Region (L)

17 Little s Law for Instruction Fetch L = fixed pipe length + average memory latency N = number of instructions fetched N T = L

18 Instruction Fetch Unit Wider fetch Fetch more statically consecutive instructions Limited by trace length Trace Cache Build sequences of dynamically consecutive instructions Significantly greater complexity Double fetch Fetch two non-consecutive blocks of instructions

19 Instruction Fetch Unit Address Latches Icache Rate Matching Buffer Collapsing Logic Line Predictor Misprediction Calculation Branch Predictor Jump Address Predictor Return Predictor

20 Instruction Fetch Characteristics Two 8-instruction 8 fetches per cycle 16 branch predictions per cycle Jump target prediction Return address prediction Rate matching buffer of fetched instructions Collapse fetched instructions into groups of 8

21 Execution Unit Issue Queue Regs Dcache Regs

22 Execution Unit Characteristics Single issue queue 8-wide 112+ entries Register file 512 registers 16 read/8 write ports Function units 8 integer ALUs 4 floating ALUs 4 memory operations (2 read/2 write)

23 Little s Law for Execution Unit L (min) = Number of cycles in pipe T (desired) = Number of desired instructions per cycle (8) N 8 =

24 Little s Law for Execution Unit Little's Law for the IQ Max IPC Average Q chunk Lifetime

25 Key Design Principles High throughput single stream design Enhancements for SMT

26 Additions for SMT Replication required resources Program counters Register File (architectural space) Register maps Sharable resources Register file (rename space) Instruction queue Branch predictor First and second level caches Translation buffers.

27 Approaches Replicated resources used for all per TPU state (except register file) some sharable resources where design is easier (*) E.g,, return stack predictor Shared resources used for register file (*) all other sharable resources (*) * Policy may be needed to make priority decisions

28 Choosing Policy

29 Choosing policies FIFO trivial Round robin easy Proportional special case Icount-style fair

30 Icount Choosing Policy

31 Why Does Icount Make Sense? N T = L N/4 T/4 = L

32 Choosers Address Latches Line Predictor Retire

33 Choosers - Fetch Address Latches Line Predictor Retire

34 Choosers Fetch Address Latches Line Predictor Retire

35 Choosers - Map Address Latches Line Predictor Retire

36 Choosers - Retire Address Latches Line Predictor Retire

37 Choosers LD/ST numbers cache Address Latches Line Predictor Retire

38 Choosers LD/ST numbers cache Address Latches Line Predictor Retire

39 Choosers Miss/Store cache Address Latches Line Predictor Retire

40 Choosers Fetch Chooser - Icount Map Chooser - Icount LD/ST Number Chooser - Proportional Retire Chooser Round Robin Load miss Chooser Round Robin Store Buffer Chooser - FIFO

41 Area Cost of SMT Support Total overhead: 6%±

42 Multiprogrammed workload 250% 200% 150% 100% 1T 2T 3T 4T 50% 0% SpecInt SpecFP Mixed Int/FP

43 Decomposed SPEC95 Applications 250% 200% 150% 100% 1T 2T 3T 4T 50% 0% Turb3d Swm256 Tomcatv

44 Multithreaded Applications 300% 250% 200% 150% 100% 1T 2T 4T 50% 0% Barnes Chess Sort TP

45 Acknowledgements Tryggve Fossum Chuan-Hua Chang George Chrysos Steve Felix Chris Gianos Partha Kundu Jud Leonard Matt Mattina Matt Reilly

46

Physically Constrained Architecture for Chip Multiprocessors

Physically Constrained Architecture for Chip Multiprocessors Physically Constrained Architecture for Chip Multiprocessors A Dissertation Presented to the faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulfillment of the

More information

Chapter 6: CPU Scheduling. Basic Concepts. Histogram of CPU-burst Times. CPU Scheduler. Dispatcher. Alternating Sequence of CPU And I/O Bursts

Chapter 6: CPU Scheduling. Basic Concepts. Histogram of CPU-burst Times. CPU Scheduler. Dispatcher. Alternating Sequence of CPU And I/O Bursts Chapter 6: CPU Scheduling Basic Concepts Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation Maximum CPU utilization obtained

More information

CPU scheduling. CPU Scheduling

CPU scheduling. CPU Scheduling EECS 3221 Operating System Fundamentals No.4 CPU scheduling Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University CPU Scheduling CPU scheduling is the basis of multiprogramming

More information

Hot Chips-18. Design of a Reusable 1GHz, Superscalar ARM Processor

Hot Chips-18. Design of a Reusable 1GHz, Superscalar ARM Processor Hot Chips-18 Design of a Reusable 1GHz, Superscalar ARM Processor Stephen Hill Consulting Engineer ARM - Austin Design Centre 22 August 2006 1 Outline Overview of Cortex -A8 (Tiger) processor What is reusability

More information

Saving Energy with Just In Time Instruction Delivery

Saving Energy with Just In Time Instruction Delivery Saving Energy with Just In Time Instruction Delivery Tejas Karkhanis Univ. of Wisconsin - Madison 1415 Engineering Drive Madison, WI 53706 1+608-265-3826 karkhani@ece.wisc.edu James E Smith Univ. of Wisconsin

More information

Microprocessor Pipeline Energy Analysis: Speculation and Over-Provisioning

Microprocessor Pipeline Energy Analysis: Speculation and Over-Provisioning Microprocessor Pipeline Energy Analysis: Speculation and Over-Provisioning Karthik Natarajan Heather Hanson Stephen W. Keckler Charles R. Moore Doug Burger Department of Electrical and Computer Engineering

More information

Hill-Climbing SMT Processor Resource Distribution

Hill-Climbing SMT Processor Resource Distribution 1 Hill-Climbing SMT Processor Resource Distribution SEUNGRYUL CHOI Google and DONALD YEUNG University of Maryland The key to high performance in Simultaneous MultiThreaded (SMT) processors lies in optimizing

More information

Core Architecture Optimization for Heterogeneous Chip Multiprocessors

Core Architecture Optimization for Heterogeneous Chip Multiprocessors Core Architecture Optimization for Heterogeneous Chip Multiprocessors Rakesh Kumar, Dean M. Tullsen, Norman P. Jouppi Department of Computer Science and Engineering HP Labs University of California, San

More information

CPU SCHEDULING. Scheduling Objectives. Outline. Basic Concepts. Enforcement of fairness in allocating resources to processes

CPU SCHEDULING. Scheduling Objectives. Outline. Basic Concepts. Enforcement of fairness in allocating resources to processes Scheduling Objectives CPU SCHEDULING Enforcement of fairness in allocating resources to processes Enforcement of priorities Make best use of available system resources Give preference to processes holding

More information

Performance monitors for multiprocessing systems

Performance monitors for multiprocessing systems Performance monitors for multiprocessing systems School of Information Technology University of Ottawa ELG7187, Fall 2010 Outline Introduction 1 Introduction 2 3 4 Performance monitors Performance monitors

More information

BTB Access Filtering: A Low Energy and High Performance Design

BTB Access Filtering: A Low Energy and High Performance Design BTB Access Filtering: A Low Energy and High Performance Design Shuai Wang, Jie Hu, and Sotirios G. Ziavras Depment of Electrical and Computer Engineering New Jersey Institute of Technology Newark, NJ 72

More information

Microprocessor Pipeline Energy Analysis

Microprocessor Pipeline Energy Analysis Microprocessor Pipeline Energy Analysis Karthik Natarajan, Heather Hanson, Stephen W. Keckler, Charles R. Moore, Doug Burger Computer Architecture and Technology Laboratory Department of Computer Sciences

More information

Designing Energy-Efficient Fetch Engines

Designing Energy-Efficient Fetch Engines Designing Energy-Efficient Fetch Engines A Dissertation Presented to the faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulfillment of the requirements for the

More information

TABLE OF CONTENTS CHAPTER NO. TITLE PAGE NO. ABSTRACT LIST OF TABLES LIST OF FIGURES LIST OF SYMBOLS AND ABBREVIATIONS

TABLE OF CONTENTS CHAPTER NO. TITLE PAGE NO. ABSTRACT LIST OF TABLES LIST OF FIGURES LIST OF SYMBOLS AND ABBREVIATIONS viii TABLE OF CONTENTS ABSTRACT LIST OF TABLES LIST OF FIGURES LIST OF SYMBOLS AND ABBREVIATIONS v xviii xix xxii 1. INTRODUCTION 1 1.1 MOTIVATION OF THE RESEARCH 1 1.2 OVERVIEW OF PROPOSED WORK 3 1.3

More information

Using Hardware Performance Counters on the Cray XT

Using Hardware Performance Counters on the Cray XT Using Hardware Performance Counters on the Cray XT Luiz DeRose Programming Environment Director Cray Inc. ldr@cray.com University of Bergen Bergen, Norway March 11-14, 2008 Luiz DeRose (ldr@cray.com) Cray

More information

Principles of Operating Systems

Principles of Operating Systems Principles of Operating Systems Lecture 9-10 - CPU Scheduling Ardalan Amiri Sani (ardalan@uci.edu) [lecture slides contains some content adapted from previous slides by Prof. Nalini Venkatasubramanian,

More information

PTSMT: A Tool for Cross-Level Power, Performance, and Thermal Exploration of SMT Processors

PTSMT: A Tool for Cross-Level Power, Performance, and Thermal Exploration of SMT Processors 21st International Conference on VLSI Design PTSMT: A Tool for Cross-Level Power, Performance, and Thermal Exploration of SMT Processors Deepa Kannan, Aseem Gupta, Aviral Shrivastava, Nikil D. Dutt, Fadi

More information

Dynamic Thermal Management in Modern Processors

Dynamic Thermal Management in Modern Processors Dynamic Thermal Management in Modern Processors Shervin Sharifi PhD Candidate CSE Department, UC San Diego Power Outlook Vdd (Volts) 1.2 0.8 0.4 0.0 Ideal Realistic 2001 2005 2009 2013 2017 V dd scaling

More information

CS 143A - Principles of Operating Systems

CS 143A - Principles of Operating Systems CS 143A - Principles of Operating Systems Lecture 4 - CPU Scheduling Prof. Nalini Venkatasubramanian nalini@ics.uci.edu CPU Scheduling 1 Outline Basic Concepts Scheduling Objectives Levels of Scheduling

More information

Making the Right Hand Turn to Power Efficient Computing

Making the Right Hand Turn to Power Efficient Computing Making the Right Hand Turn to Power Efficient Computing Justin Rattner Intel Fellow and Director Labs Intel Corporation 1 Outline Technology scaling Types of efficiency Making the right hand turn 2 Historic

More information

Performance, Energy, and Thermal Considerations for SMT and CMP Architectures

Performance, Energy, and Thermal Considerations for SMT and CMP Architectures Performance, Energy, and Thermal Considerations for SMT and CMP Architectures Yingmin Li, David Brooks, Zhigang Hu, Kevin Skadron Dept. of Computer Science, University of Virginia IBM T.J. Watson Research

More information

Performance, Energy, and Thermal Considerations for SMT and CMP Architectures

Performance, Energy, and Thermal Considerations for SMT and CMP Architectures Performance, Energy, and Thermal Considerations for SMT and CMP Architectures Yingmin Li, David Brooks, Zhigang Hu, Kevin Skadron Dept. of Computer Science, University of Virginia IBM T.J. Watson Research

More information

Emerging Workload Performance Evaluation on Future Generation OpenPOWER Processors

Emerging Workload Performance Evaluation on Future Generation OpenPOWER Processors Emerging Workload Performance Evaluation on Future Generation OpenPOWER Processors Saritha Vinod Power Systems Performance Analyst IBM Systems sarithavn@in.bm.com Agenda Emerging Workloads Characteristics

More information

Profile-Based Energy Reduction for High-Performance Processors

Profile-Based Energy Reduction for High-Performance Processors Profile-Based Energy Reduction for High-Performance Processors Michael Huang, Jose Renau, and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu

More information

Balancing Power Consumption in Multiprocessor Systems

Balancing Power Consumption in Multiprocessor Systems Universität Karlsruhe Fakultät für Informatik Institut für Betriebs und Dialogsysteme (IBDS) Lehrstuhl Systemarchitektur Balancing Power Consumption in Multiprocessor Systems DIPLOMA THESIS Andreas Merkel

More information

CAMP: A Technique to Estimate Per-Structure Power at Run-time using a Few Simple Parameters

CAMP: A Technique to Estimate Per-Structure Power at Run-time using a Few Simple Parameters CAMP: A Technique to Estimate Per-Structure Power at Run-time using a Few Simple Parameters Michael D. Powell, Arijit Biswas, Joel S. Emer, Shubhendu S. Mukherjee, Basit R. Sheikh 1, Shrirang Yardi 2 Intel

More information

A Conditional Probability Model for Vertical Multithreaded Architectures

A Conditional Probability Model for Vertical Multithreaded Architectures A Conditional Probability Model for Vertical Multithreaded Architectures Robert M. Lane rmlkcl@mindspring.com Abstract Vertical multithreaded architectures can be estimated using a model based on conditional

More information

STAND: New Tool for Performance Estimation of the Block Data Processing Algorithms in High-load Systems

STAND: New Tool for Performance Estimation of the Block Data Processing Algorithms in High-load Systems STAND: New Tool for Performance Estimation of the Block Data Processing Algorithms in High-load Systems Victor Minchenkov, Vladimir Bashun St-Petersburg State University of Aerospace Instrumentation {victor,

More information

Balancing Soft Error Coverage with. Lifetime Reliability in Redundantly. Multithreaded Processors

Balancing Soft Error Coverage with. Lifetime Reliability in Redundantly. Multithreaded Processors Balancing Soft Error Coverage with Lifetime Reliability in Redundantly Multithreaded Processors A Thesis Presented to the faculty of the School of Engineering and Applied Science University of Virginia

More information

Managing SMT Resource Usage through Speculative Instruction Window Weighting

Managing SMT Resource Usage through Speculative Instruction Window Weighting INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Managing SMT Resource Usage through Speculative Instruction Window Weighting Hans Vandierendonck André Seznec N 7103 Novembre 2009 Domaine

More information

CPU Scheduling CPU. Basic Concepts. Basic Concepts. CPU Scheduler. Histogram of CPU-burst Times. Alternating Sequence of CPU and I/O Bursts

CPU Scheduling CPU. Basic Concepts. Basic Concepts. CPU Scheduler. Histogram of CPU-burst Times. Alternating Sequence of CPU and I/O Bursts Basic Concepts CPU Scheduling CSCI 315 Operating Systems Design Department of Computer Science Notice: The slides for this lecture have been largely based on those from an earlier What does it mean to

More information

FAME: FAirly MEasuring Multithreaded Architectures

FAME: FAirly MEasuring Multithreaded Architectures FAME: FAirly MEasuring Multithreaded Architectures Javier Vera Francisco J. Cazorla Alex Pajuelo Oliverio J. Santana Enrique Fernández Mateo Valero, Barcelona Supercomputing Center, Spain. {javier.vera,

More information

Vampir MPI-Performance Analysis

Vampir MPI-Performance Analysis Vampir MPI-Performance Analysis 05.08.2013 FB. Computer Science Scientific Computing Christian Iwainsky 1 How does one gain insight about an HPC applciation? Data has to be presented to the developer in

More information

SE350: Operating Systems. Lecture 6: Scheduling

SE350: Operating Systems. Lecture 6: Scheduling SE350: Operating Systems Lecture 6: Scheduling Main Points Definitions Response time, throughput, scheduling policy, Uniprocessor policies FIFO, SJF, Round Robin, Multiprocessor policies Scheduling sequential

More information

Types of Workloads. Raj Jain. Washington University in St. Louis

Types of Workloads. Raj Jain. Washington University in St. Louis Types of Workloads Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-06/ 4-1 Overview!

More information

Vampir. (MPI-) Performance Analysis. Dipl. Inf. Christian Iwainsky Fachgebiet Scientific Computing

Vampir. (MPI-) Performance Analysis. Dipl. Inf. Christian Iwainsky Fachgebiet Scientific Computing Vampir (MPI-) Performance Analysis Dipl. Inf. Christian Iwainsky Fachgebiet Scientific Computing 11.11.2013 FB. Computer Science Scientific Computing Christian Iwainsky 1 How does one gain insight about

More information

How fast is your EC12 / z13?

How fast is your EC12 / z13? How fast is your EC12 / z13? Barton Robinson, barton@velocitysoftware.com Velocity Software Inc. 196-D Castro Street Mountain View CA 94041 650-964-8867 Velocity Software GmbH Max-Joseph-Str. 5 D-68167

More information

Roadmap. Tevfik Ko!ar. CSC Operating Systems Spring Lecture - V CPU Scheduling - I. Louisiana State University.

Roadmap. Tevfik Ko!ar. CSC Operating Systems Spring Lecture - V CPU Scheduling - I. Louisiana State University. CSC 4103 - Operating Systems Spring 2008 Lecture - V CPU Scheduling - I Tevfik Ko!ar Louisiana State University January 29 th, 2008 1 Roadmap CPU Scheduling Basic Concepts Scheduling Criteria Different

More information

Increasing the Performance of Superscalar Processors through Value Prediction. Arthur Perais

Increasing the Performance of Superscalar Processors through Value Prediction. Arthur Perais Increasing the Performance of Superscalar Processors through Value Prediction Arthur Perais Increasing Performance through Value Prediction 10/12/2015 Past and Recent Trends in General Purpose CPUs Power

More information

REDUCED OCCUPANCY FOR PROCESSOR EFFICIENCY THROUGH DOWNSIZED OUT-OF-ORDER ENGINE AND REDUCED TRAFFIC. Ziad Youssfi

REDUCED OCCUPANCY FOR PROCESSOR EFFICIENCY THROUGH DOWNSIZED OUT-OF-ORDER ENGINE AND REDUCED TRAFFIC. Ziad Youssfi REDUCED OCCUPANCY FOR PROCESSOR EFFICIENCY THROUGH DOWNSIZED OUT-OF-ORDER ENGINE AND REDUCED TRAFFIC By Ziad Youssfi A DISSERTATION Submitted to Michigan State University in partial fulfillment of the

More information

CPU Scheduling. Chapter 9

CPU Scheduling. Chapter 9 CPU Scheduling 1 Chapter 9 2 CPU Scheduling We concentrate on the problem of scheduling the usage of a single processor among all the existing processes in the system The goal is to achieve High processor

More information

CPU Scheduling (Chapters 7-11)

CPU Scheduling (Chapters 7-11) CPU Scheduling (Chapters 7-11) CS 4410 Operating Systems [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse] The Problem You re the cook at State Street Diner customers continuously

More information

Analysing value substitution and confidence estimation for value prediction

Analysing value substitution and confidence estimation for value prediction Analysing value substitution and confidence estimation for value prediction Luis Piñuel, Rafael A. Moreno and Francisco Tirado Departamento de Arquitectura de Computadores y Automática Universidad Complutense

More information

Hardware and Software Overview

Hardware and Software Overview Tutorial: Optimizing Applications for Performance on POWER Hardware and Software Overview SCICOMP13 July 2007 2007 IBM Agenda Hardware Software Documentation 2 2007 IBM Corporation Core CPU processor Hardware

More information

CPU Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Unix Scheduler

CPU Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Unix Scheduler CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms FCFS SJF RR Priority Multilevel Queue Multilevel Queue with Feedback Unix Scheduler 1 Scheduling Processes can be in one of several

More information

NBTI-Aware Dynamic Instruction Scheduling

NBTI-Aware Dynamic Instruction Scheduling NBTI-Aware Dynamic Instruction Scheduling Taniya Siddiqua and Sudhanva Gurumurthi Department of Computer Science University of Virginia Charlottesville, VA 2294 {taniya, gurumurthi}@cs.virginia.edu Abstract

More information

Accelerate HPC Development with Allinea Performance Tools. Olly Perks & Florent Lebeau

Accelerate HPC Development with Allinea Performance Tools. Olly Perks & Florent Lebeau Accelerate HPC Development with Allinea Performance Tools Olly Perks & Florent Lebeau Olly.Perks@arm.com Florent.Lebeau@arm.com Agenda 09:00 09:15 09:15 09:30 09:30 09:45 09:45 10:15 10:15 10:30 Introduction

More information

Evaluating the Impact of Job Scheduling and Power Management on Processor Lifetime for Chip Multiprocessors

Evaluating the Impact of Job Scheduling and Power Management on Processor Lifetime for Chip Multiprocessors Evaluating the Impact of Job Scheduling and Power Management on Processor Lifetime for Chip Multiprocessors Ayse K. Coskun, Richard Strong, Dean M. Tullsen, and Tajana Simunic Rosing Computer Science and

More information

Tasks or processes attempt to control or react to external events occuring in real time RT processes must be able to keep up events

Tasks or processes attempt to control or react to external events occuring in real time RT processes must be able to keep up events RTOS Characteristics Time constraints Shdli Scheduling C. Brandolese, W. Fornaciari Real-Time Systems Definition of real-time system A real-time system is one in which h the correctness of the computation

More information

z/tpf and z13 Mark Gambino, TPF Development Lab March 23, 2015 TPFUG Dallas, TX

z/tpf and z13 Mark Gambino, TPF Development Lab March 23, 2015 TPFUG Dallas, TX z/tpf and z13 Mark Gambino, TPF Development Lab March 23, 2015 TPFUG Dallas, TX Disclaimer Any reference to future plans are for planning purposes only. IBM reserves the right to change those plans at

More information

Roadmap. Tevfik Koşar. CSE 421/521 - Operating Systems Fall Lecture - V CPU Scheduling - I. University at Buffalo.

Roadmap. Tevfik Koşar. CSE 421/521 - Operating Systems Fall Lecture - V CPU Scheduling - I. University at Buffalo. CSE 421/521 - Operating Systems Fall 2011 Lecture - V CPU Scheduling - I Tevfik Koşar University at Buffalo September 13 th, 2011 1 Roadmap CPU Scheduling Basic Concepts Scheduling Criteria & Metrics Different

More information

Exploring Last n Value Prediction

Exploring Last n Value Prediction Exploring Last n Value Prediction Martin Burtscher Benjamin G. Zorn Department of Computer Science Microsoft Corporation University of Colorado 1 Microsoft Way Boulder, Colorado 39-3 Redmond, WA 95 burtsche@cs.colorado.edu

More information

Managing SMT Resource Usage through Speculative Instruction Window Weighting

Managing SMT Resource Usage through Speculative Instruction Window Weighting Managing SMT Resource Usage through Speculative Instruction Window Weighting Hans Vandierendonck, André Seznec To cite this version: Hans Vandierendonck, André Seznec. Managing SMT Resource Usage through

More information

DYNAMIC DETECTION OF WORKLOAD EXECUTION PHASES MATTHEW W. ALSLEBEN, B.S. A thesis submitted to the Graduate School

DYNAMIC DETECTION OF WORKLOAD EXECUTION PHASES MATTHEW W. ALSLEBEN, B.S. A thesis submitted to the Graduate School DYNAMIC DETECTION OF WORKLOAD EXECUTION PHASES BY MATTHEW W. ALSLEBEN, B.S. A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree Master of Science Major Subject:

More information

Context-Base Computational Value Prediction with Value Compression

Context-Base Computational Value Prediction with Value Compression Context-Base Computational Value Prediction with Value Compression Yasuo Ishii June 3 2018 2018 Arm Limited Submitted Track: 8KB 2018 Arm Limited Speedup over no VP Motivation and Achievement Known modern

More information

Composite Confidence Estimators for Enhanced Speculation Control

Composite Confidence Estimators for Enhanced Speculation Control Composite Estimators for Enhanced Speculation Control Daniel A. Jiménez Department of Computer Science The University of Texas at San Antonio San Antonio, Texas, U.S.A djimenez@acm.org Abstract This paper

More information

Bias Scheduling in Heterogeneous Multicore Architectures. David Koufaty Dheeraj Reddy Scott Hahn

Bias Scheduling in Heterogeneous Multicore Architectures. David Koufaty Dheeraj Reddy Scott Hahn Bias Scheduling in Heterogeneous Multicore Architectures David Koufaty Dheeraj Reddy Scott Hahn Motivation Mainstream multicore processors consist of identical cores Complexity dictated by product goals,

More information

Simulation-driven verification Design Automation Summer School, 2009

Simulation-driven verification Design Automation Summer School, 2009 Simulation-driven verification Design Automation Summer School, 2009 Bob Bentley Director, Pre-silicon Validation Enterprise Microprocessor Group Intel Corporation Hillsboro, Oregon, U.S.A. 1 Agenda Microprocessor

More information

CONTINUING advances in semiconductor technology lead

CONTINUING advances in semiconductor technology lead IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 3, MARCH 2006 281 Control Speculation for Energy-Efficient Next-Generation Superscalar Processors Juan L. Aragón, Member, IEEE Computer Society, JoséGonzález,

More information

Reliability-Aware Scheduling on Heterogeneous Multicore Processors

Reliability-Aware Scheduling on Heterogeneous Multicore Processors Reliability-Aware Scheduling on Heterogeneous Multicore Processors Ajeya Naithani Ghent University, Belgium Stijn Eyerman Intel, Belgium Lieven Eeckhout Ghent University, Belgium ABSTRACT Reliability to

More information

Metrics for Lifetime Reliability

Metrics for Lifetime Reliability Metrics for Lifetime Reliability Pradeep Ramachandran, Sarita Adve, Pradip Bose, Jude Rivers and Jayanth Srinivasan 1 Department of Computer Science IBM T.J. Watson Research Center University of Illionis

More information

Homework 2: Comparing Scheduling Algorithms

Homework 2: Comparing Scheduling Algorithms Homework 2: Comparing Scheduling Algorithms The purpose of this homework is to build a scheduler to compare performance of a First In First Out (FIFO) versus Round Robin algorithms. The algorithms will

More information

High Performance Computing(HPC) & Software Stack

High Performance Computing(HPC) & Software Stack IBM HPC Developer Education @ TIFR, Mumbai High Performance Computing(HPC) & Software Stack January 30-31, 2012 Pidad D'Souza (pidsouza@in.ibm.com) IBM, System & Technology Group 2002 IBM Corporation Agenda

More information

Reducing Power Density through Activity Migration

Reducing Power Density through Activity Migration ISLPED 2003 8/26/2003 Reducing Power Density through Activity Migration Seongmoo Heo, Kenneth Barr, and Krste Asanovi Computer Architecture Group, MIT CSAIL Hot Spots Rapid rise of processor power density

More information

Lecture #10. ********************************* Outline - 1 min ********************************* Tomasulo Loop Example Exceptions In-order Commit

Lecture #10. ********************************* Outline - 1 min ********************************* Tomasulo Loop Example Exceptions In-order Commit Lecture #10 Review -- 1 min Registers not bottleneck Avoid WAR, WAW Not limited to basic blocks Lasting Contributions -- dynamic scheduling -- register renaming -- load/store disambiguation Outline - 1

More information

CPU Scheduling: Part I. Operating Systems. Spring CS5212

CPU Scheduling: Part I. Operating Systems. Spring CS5212 Operating Systems Spring 2009-2010 Outline CPU Scheduling: Part I 1 CPU Scheduling: Part I Outline CPU Scheduling: Part I 1 CPU Scheduling: Part I Basic Concepts CPU Scheduling: Part I Maximum CPU utilization

More information

א א א א א א א א

א א א א א א א א א א א W א א א א א א א א א 2008 2007 1 Chapter 6: CPU Scheduling Basic Concept CPU-I/O Burst Cycle CPU Scheduler Preemptive Scheduling Dispatcher Scheduling Criteria Scheduling Algorithms First-Come, First-Served

More information

Pipeline Gating: Speculation Control For Energy Reduction

Pipeline Gating: Speculation Control For Energy Reduction Pipeline Gating: Speculation Control For Energy Reduction Srilatha Manne University of Colorado Dept. of Electrical and Computer Engineering Boulder, CO 80309 srilatha.manne@colorado.edu Artur Klauser,

More information

Techniques for Improving Efficiency and Accuracy of Contemporary Dynamic Branch Predictors

Techniques for Improving Efficiency and Accuracy of Contemporary Dynamic Branch Predictors University of Nevada, Reno Techniques for Improving Efficiency and Accuracy of Contemporary Dynamic Branch Predictors A dissertation submitted in partial fulfillment of the requirements for the degree

More information

Designing Energy-Efficient Fetch Engines

Designing Energy-Efficient Fetch Engines Designing Energy-Efficient Fetch Engines Michele Co Department of Computer Science University of Virginia Advisor: Co-Advisor: Committee: Kevin Skadron Dee A.B. Weikle Jack Davidson (Chair) James Cohoon,

More information

Scheduling Processes 11/6/16. Processes (refresher) Scheduling Processes The OS has to decide: Scheduler. Scheduling Policies

Scheduling Processes 11/6/16. Processes (refresher) Scheduling Processes The OS has to decide: Scheduler. Scheduling Policies Scheduling Processes Don Porter Portions courtesy Emmett Witchel Processes (refresher) Each process has state, that includes its text and data, procedure call stack, etc. This state resides in memory.

More information

Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors

Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors Matthew DeVuyst, Rakesh Kumar, and Dean M. Tullsen University of California, San Diego Department of Computer

More information

SEPAS: A Highly Accurate Energy-Efficient Branch Predictor

SEPAS: A Highly Accurate Energy-Efficient Branch Predictor 2.3 SEPAS: A Highly Accurate Energy-Efficient Branch Predictor ABSTRACT Designers have invested much effort in developing accurate branch predictors with short learning periods. Such techniques rely on

More information

Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors

Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, April 2006 Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors Matthew

More information

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017 CS 471 Operating Systems Yue Cheng George Mason University Fall 2017 Page Replacement Policies 2 Review: Page-Fault Handler (OS) (cheap) (cheap) (depends) (expensive) (cheap) (cheap) (cheap) PFN = FindFreePage()

More information

Dennis Bradford, Sundaram Chinthamani, Jesus Corbal, Adhiraj Hassan, Ken Janik, Nawab Ali

Dennis Bradford, Sundaram Chinthamani, Jesus Corbal, Adhiraj Hassan, Ken Janik, Nawab Ali Dennis Bradford, Sundaram Chinthamani, Jesus Corbal, Adhiraj Hassan, Ken Janik, Nawab Ali 1 Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

CPU Scheduling. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CPU Scheduling. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University CPU Scheduling Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu CPU Scheduling policy deciding which process to run next, given a set of runnable

More information

Understanding the Behavior of In-Memory Computing Workloads. Rui Hou Institute of Computing Technology,CAS July 10, 2014

Understanding the Behavior of In-Memory Computing Workloads. Rui Hou Institute of Computing Technology,CAS July 10, 2014 Understanding the Behavior of In-Memory Computing Workloads Rui Hou Institute of Computing Technology,CAS July 10, 2014 Outline Background Methodology Results and Analysis Summary 2 Background The era

More information

An Oracle White Paper January Upgrade to Oracle Netra T4 Systems to Improve Service Delivery and Reduce Costs

An Oracle White Paper January Upgrade to Oracle Netra T4 Systems to Improve Service Delivery and Reduce Costs An Oracle White Paper January 2013 Upgrade to Oracle Netra T4 Systems to Improve Service Delivery and Reduce Costs Executive Summary... 2 Deploy Services Faster and More Efficiently... 3 Greater Compute

More information

Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors

Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors Daniel A. Jiménez Rutgers University Department of Computer Science djimenez@cs.rutgers.edu

More information

CPU Scheduling. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

CPU Scheduling. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University CPU Scheduling Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3044: Operating Systems, Fall 2017, Jinkyu Jeong (jinkyu@skku.edu) CPU Scheduling

More information

Prophet/Critic Hybrid Branch Prediction

Prophet/Critic Hybrid Branch Prediction Prophet/Critic Hybrid Branch Prediction Ayose Falcón Jared Stark Alex Ramirez Konrad Lai Mateo Valero Computer Architecture Department Universitat Politècnica de Catalunya {afalcon, aramirez, mateo}@ac.upc.es

More information

Project 2 solution code

Project 2 solution code Project 2 solution code Project 2 solution code in files for project 3: Mutex solution in Synch.c But this code has several flaws! If you copied this, we will know! Producer/Consumer and Dining Philosophers

More information

CPU Scheduling. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

CPU Scheduling. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University CPU Scheduling Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3052: Introduction to Operating Systems, Fall 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Delivering High Performance for Financial Models and Risk Analytics

Delivering High Performance for Financial Models and Risk Analytics QuantCatalyst Delivering High Performance for Financial Models and Risk Analytics September 2008 Risk Breakfast London Dr D. Egloff daniel.egloff@quantcatalyst.com QuantCatalyst Inc. Technology and software

More information

BioBench: A Benchmark Suite of Bioinformatics Applications

BioBench: A Benchmark Suite of Bioinformatics Applications BioBench: A Benchmark Suite of Bioinformatics Applications Kursad Albayraktaroglu, Aamer Jaleel, Xue Wu, Manoj Franklin, Bruce Jacob, Chau-Wen Tseng, Donald Yeung Dept. of Electrical & Computer Engineering

More information

Lecture 11: CPU Scheduling

Lecture 11: CPU Scheduling CS 422/522 Design & Implementation of Operating Systems Lecture 11: CPU Scheduling Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions of

More information

Power, Energy and Performance. Tajana Simunic Rosing Department of Computer Science and Engineering UCSD

Power, Energy and Performance. Tajana Simunic Rosing Department of Computer Science and Engineering UCSD Power, Energy and Performance Tajana Simunic Rosing Department of Computer Science and Engineering UCSD 1 Motivation for Power Management Power consumption is a critical issue in system design today Mobile

More information

Optimize code for Intel Xeon Phi. Discovering bottlenecks without pain

Optimize code for Intel Xeon Phi. Discovering bottlenecks without pain Optimize code for Intel Xeon Phi Discovering bottlenecks without pain Agenda Introduction Allinea MAP and optimizing for Xeon Phi Conclusion Allinea Unified environment A modern integrated environment

More information

Course "Empirical Evaluation in Informatics" Benchmarking

Course Empirical Evaluation in Informatics Benchmarking Course "Empirical Evaluation in Informatics" Benchmarking Prof. Dr. Lutz Prechelt Freie Universität Berlin, Institut für Informatik http://www.inf.fu-berlin.de/inst/ag-se/ Example 1: SPEC CPU2000 Benchmark

More information

Dynamic Power Gating with Quality Guarantees

Dynamic Power Gating with Quality Guarantees Appears in the International Symposium on Low Power Electronics and Design 29 San Francisco, California, August, 29 Dynamic Power Gating with Quality Guarantees Anita Lungu 1, Pradip Bose 2, Alper Buyuktosunoglu

More information

STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches

STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches Dongyuan Zhan, Hong Jiang, Sharad C. Seth Dept. of Computer Science & Engineering University of Nebraska - Lincoln Email: {dzhan,jiang,seth}@cse.unl.edu

More information

Resources and Special Effects

Resources and Special Effects Resources and Special Effects Charles Grassl IBM August, 2004 2004 IBM Part 2: Special Effects Large Pages Memory Affinity Process Binding Simultaneous Multi-Threading 17 2004 IBM Corporation 1 Large Page

More information

Thermoelectric-based Sustainable Self- Cooling for Fine-Grained Processor Hot Spots

Thermoelectric-based Sustainable Self- Cooling for Fine-Grained Processor Hot Spots IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems 16 Thermoelectric-based Sustainable Self- Cooling for Fine-Grained Processor Hot Spots Soochan Lee (LG Electronics),

More information

CSC 553 Operating Systems

CSC 553 Operating Systems CSC 553 Operating Systems Lecture 9 - Uniprocessor Scheduling Types of Scheduling Long-term scheduling The decision to add to the pool of processes to be executed Medium-term scheduling The decision to

More information

STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches

STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches 21 43rd Annual IEEE/ACM International Symposium on Microarchitecture STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches Dongyuan Zhan, Hong Jiang, Sharad C. Seth Dept. of Computer

More information

Introduction to Operating Systems. Process Scheduling. John Franco. Dept. of Electrical Engineering and Computing Systems University of Cincinnati

Introduction to Operating Systems. Process Scheduling. John Franco. Dept. of Electrical Engineering and Computing Systems University of Cincinnati Introduction to Operating Systems Process Scheduling John Franco Dept. of Electrical Engineering and Computing Systems University of Cincinnati Lifespan of a Process What does a CPU scheduler do? Determines

More information

Thermal Management with Asymmetric Dual Core Designs ; CU-CS

Thermal Management with Asymmetric Dual Core Designs ; CU-CS University of Colorado, Boulder CU Scholar Computer Science Technical Reports Computer Science Spring 5-1-2003 Thermal Management with Asymmetric Dual Core Designs ; CU-CS-965-03 Soraya Ghiasi University

More information