EV8: The Post-Ultimate Alpha. Dr. Joel Emer Intel Fellow Intel Architecture Group Intel Corporation
|
|
- Brittney Reed
- 6 years ago
- Views:
Transcription
1 EV8: The Post-Ultimate Alpha Dr. Joel Emer Intel Fellow Intel Architecture Group Intel Corporation
2 Alpha Microprocessor Overview Higher Performance Lower Cost 0.35µm EV EV µm 0.18µm EV µm EV µm EV µm EV First System Ship
3 Goals Leadership single stream performance Extra multistream performance with multithreading Without major architectural changes Without significant additional cost
4 EV8 Architecture Overview Aggressive instruction fetch unit 8-wide super-scalar scalar execution unit 4-way simultaneous multithreading (SMT) Large on-chip L2 cache Direct RAMBUS interface On-chip router for system interconnect for glueless,, directory-based, ccnuma with up to 512-way multiprocessing
5 System Block Diagram EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO EV8 M IO
6 Instruction Issue Time Reduced function unit utilization due to dependencies
7 Superscalar Issue Time Superscalar leads to more performance, but lower utilization
8 Predicated Issue Time Adds to function unit utilization, but results are thrown away
9 Chip Multiprocessor Time Limited utilization when only running one thread
10 Fine Grained Multithreading Time Intra-thread dependencies still limit performance
11 Simultaneous Multithreading Time Maximum utilization of function units by independent operations
12 Basic Out-of of-order order Pipeline Fetch Decode/ Map Queue Reg Read Execute Dcache/ Store Buffer Reg Write Retire PC Register Map Regs Dcache Regs Icache Thread-blind
13 SMT Pipeline Fetch Decode/ Map Queue Reg Read Execute Dcache/ Store Buffer Reg Write Retire PC Register Map Icache Regs Dcache Regs
14 Architectural Abstraction 1 CPU with 4 Thread Processing Units (TPUs( TPUs) Shared hardware resources TPU 0 TPU1 TPU2 TPU3 Icache TLB Dcache Scache
15 Key Design Principles High throughput single stream design Enhancements for SMT
16 Little s Law Average Number of Tasks in Region (N) Throughput (T) = Average Latency in Region (L)
17 Little s Law for Instruction Fetch L = fixed pipe length + average memory latency N = number of instructions fetched N T = L
18 Instruction Fetch Unit Wider fetch Fetch more statically consecutive instructions Limited by trace length Trace Cache Build sequences of dynamically consecutive instructions Significantly greater complexity Double fetch Fetch two non-consecutive blocks of instructions
19 Instruction Fetch Unit Address Latches Icache Rate Matching Buffer Collapsing Logic Line Predictor Misprediction Calculation Branch Predictor Jump Address Predictor Return Predictor
20 Instruction Fetch Characteristics Two 8-instruction 8 fetches per cycle 16 branch predictions per cycle Jump target prediction Return address prediction Rate matching buffer of fetched instructions Collapse fetched instructions into groups of 8
21 Execution Unit Issue Queue Regs Dcache Regs
22 Execution Unit Characteristics Single issue queue 8-wide 112+ entries Register file 512 registers 16 read/8 write ports Function units 8 integer ALUs 4 floating ALUs 4 memory operations (2 read/2 write)
23 Little s Law for Execution Unit L (min) = Number of cycles in pipe T (desired) = Number of desired instructions per cycle (8) N 8 =
24 Little s Law for Execution Unit Little's Law for the IQ Max IPC Average Q chunk Lifetime
25 Key Design Principles High throughput single stream design Enhancements for SMT
26 Additions for SMT Replication required resources Program counters Register File (architectural space) Register maps Sharable resources Register file (rename space) Instruction queue Branch predictor First and second level caches Translation buffers.
27 Approaches Replicated resources used for all per TPU state (except register file) some sharable resources where design is easier (*) E.g,, return stack predictor Shared resources used for register file (*) all other sharable resources (*) * Policy may be needed to make priority decisions
28 Choosing Policy
29 Choosing policies FIFO trivial Round robin easy Proportional special case Icount-style fair
30 Icount Choosing Policy
31 Why Does Icount Make Sense? N T = L N/4 T/4 = L
32 Choosers Address Latches Line Predictor Retire
33 Choosers - Fetch Address Latches Line Predictor Retire
34 Choosers Fetch Address Latches Line Predictor Retire
35 Choosers - Map Address Latches Line Predictor Retire
36 Choosers - Retire Address Latches Line Predictor Retire
37 Choosers LD/ST numbers cache Address Latches Line Predictor Retire
38 Choosers LD/ST numbers cache Address Latches Line Predictor Retire
39 Choosers Miss/Store cache Address Latches Line Predictor Retire
40 Choosers Fetch Chooser - Icount Map Chooser - Icount LD/ST Number Chooser - Proportional Retire Chooser Round Robin Load miss Chooser Round Robin Store Buffer Chooser - FIFO
41 Area Cost of SMT Support Total overhead: 6%±
42 Multiprogrammed workload 250% 200% 150% 100% 1T 2T 3T 4T 50% 0% SpecInt SpecFP Mixed Int/FP
43 Decomposed SPEC95 Applications 250% 200% 150% 100% 1T 2T 3T 4T 50% 0% Turb3d Swm256 Tomcatv
44 Multithreaded Applications 300% 250% 200% 150% 100% 1T 2T 4T 50% 0% Barnes Chess Sort TP
45 Acknowledgements Tryggve Fossum Chuan-Hua Chang George Chrysos Steve Felix Chris Gianos Partha Kundu Jud Leonard Matt Mattina Matt Reilly
46
Physically Constrained Architecture for Chip Multiprocessors
Physically Constrained Architecture for Chip Multiprocessors A Dissertation Presented to the faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulfillment of the
More informationChapter 6: CPU Scheduling. Basic Concepts. Histogram of CPU-burst Times. CPU Scheduler. Dispatcher. Alternating Sequence of CPU And I/O Bursts
Chapter 6: CPU Scheduling Basic Concepts Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation Maximum CPU utilization obtained
More informationCPU scheduling. CPU Scheduling
EECS 3221 Operating System Fundamentals No.4 CPU scheduling Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University CPU Scheduling CPU scheduling is the basis of multiprogramming
More informationHot Chips-18. Design of a Reusable 1GHz, Superscalar ARM Processor
Hot Chips-18 Design of a Reusable 1GHz, Superscalar ARM Processor Stephen Hill Consulting Engineer ARM - Austin Design Centre 22 August 2006 1 Outline Overview of Cortex -A8 (Tiger) processor What is reusability
More informationSaving Energy with Just In Time Instruction Delivery
Saving Energy with Just In Time Instruction Delivery Tejas Karkhanis Univ. of Wisconsin - Madison 1415 Engineering Drive Madison, WI 53706 1+608-265-3826 karkhani@ece.wisc.edu James E Smith Univ. of Wisconsin
More informationMicroprocessor Pipeline Energy Analysis: Speculation and Over-Provisioning
Microprocessor Pipeline Energy Analysis: Speculation and Over-Provisioning Karthik Natarajan Heather Hanson Stephen W. Keckler Charles R. Moore Doug Burger Department of Electrical and Computer Engineering
More informationHill-Climbing SMT Processor Resource Distribution
1 Hill-Climbing SMT Processor Resource Distribution SEUNGRYUL CHOI Google and DONALD YEUNG University of Maryland The key to high performance in Simultaneous MultiThreaded (SMT) processors lies in optimizing
More informationCore Architecture Optimization for Heterogeneous Chip Multiprocessors
Core Architecture Optimization for Heterogeneous Chip Multiprocessors Rakesh Kumar, Dean M. Tullsen, Norman P. Jouppi Department of Computer Science and Engineering HP Labs University of California, San
More informationCPU SCHEDULING. Scheduling Objectives. Outline. Basic Concepts. Enforcement of fairness in allocating resources to processes
Scheduling Objectives CPU SCHEDULING Enforcement of fairness in allocating resources to processes Enforcement of priorities Make best use of available system resources Give preference to processes holding
More informationPerformance monitors for multiprocessing systems
Performance monitors for multiprocessing systems School of Information Technology University of Ottawa ELG7187, Fall 2010 Outline Introduction 1 Introduction 2 3 4 Performance monitors Performance monitors
More informationBTB Access Filtering: A Low Energy and High Performance Design
BTB Access Filtering: A Low Energy and High Performance Design Shuai Wang, Jie Hu, and Sotirios G. Ziavras Depment of Electrical and Computer Engineering New Jersey Institute of Technology Newark, NJ 72
More informationMicroprocessor Pipeline Energy Analysis
Microprocessor Pipeline Energy Analysis Karthik Natarajan, Heather Hanson, Stephen W. Keckler, Charles R. Moore, Doug Burger Computer Architecture and Technology Laboratory Department of Computer Sciences
More informationDesigning Energy-Efficient Fetch Engines
Designing Energy-Efficient Fetch Engines A Dissertation Presented to the faculty of the School of Engineering and Applied Science University of Virginia In Partial Fulfillment of the requirements for the
More informationTABLE OF CONTENTS CHAPTER NO. TITLE PAGE NO. ABSTRACT LIST OF TABLES LIST OF FIGURES LIST OF SYMBOLS AND ABBREVIATIONS
viii TABLE OF CONTENTS ABSTRACT LIST OF TABLES LIST OF FIGURES LIST OF SYMBOLS AND ABBREVIATIONS v xviii xix xxii 1. INTRODUCTION 1 1.1 MOTIVATION OF THE RESEARCH 1 1.2 OVERVIEW OF PROPOSED WORK 3 1.3
More informationUsing Hardware Performance Counters on the Cray XT
Using Hardware Performance Counters on the Cray XT Luiz DeRose Programming Environment Director Cray Inc. ldr@cray.com University of Bergen Bergen, Norway March 11-14, 2008 Luiz DeRose (ldr@cray.com) Cray
More informationPrinciples of Operating Systems
Principles of Operating Systems Lecture 9-10 - CPU Scheduling Ardalan Amiri Sani (ardalan@uci.edu) [lecture slides contains some content adapted from previous slides by Prof. Nalini Venkatasubramanian,
More informationPTSMT: A Tool for Cross-Level Power, Performance, and Thermal Exploration of SMT Processors
21st International Conference on VLSI Design PTSMT: A Tool for Cross-Level Power, Performance, and Thermal Exploration of SMT Processors Deepa Kannan, Aseem Gupta, Aviral Shrivastava, Nikil D. Dutt, Fadi
More informationDynamic Thermal Management in Modern Processors
Dynamic Thermal Management in Modern Processors Shervin Sharifi PhD Candidate CSE Department, UC San Diego Power Outlook Vdd (Volts) 1.2 0.8 0.4 0.0 Ideal Realistic 2001 2005 2009 2013 2017 V dd scaling
More informationCS 143A - Principles of Operating Systems
CS 143A - Principles of Operating Systems Lecture 4 - CPU Scheduling Prof. Nalini Venkatasubramanian nalini@ics.uci.edu CPU Scheduling 1 Outline Basic Concepts Scheduling Objectives Levels of Scheduling
More informationMaking the Right Hand Turn to Power Efficient Computing
Making the Right Hand Turn to Power Efficient Computing Justin Rattner Intel Fellow and Director Labs Intel Corporation 1 Outline Technology scaling Types of efficiency Making the right hand turn 2 Historic
More informationPerformance, Energy, and Thermal Considerations for SMT and CMP Architectures
Performance, Energy, and Thermal Considerations for SMT and CMP Architectures Yingmin Li, David Brooks, Zhigang Hu, Kevin Skadron Dept. of Computer Science, University of Virginia IBM T.J. Watson Research
More informationPerformance, Energy, and Thermal Considerations for SMT and CMP Architectures
Performance, Energy, and Thermal Considerations for SMT and CMP Architectures Yingmin Li, David Brooks, Zhigang Hu, Kevin Skadron Dept. of Computer Science, University of Virginia IBM T.J. Watson Research
More informationEmerging Workload Performance Evaluation on Future Generation OpenPOWER Processors
Emerging Workload Performance Evaluation on Future Generation OpenPOWER Processors Saritha Vinod Power Systems Performance Analyst IBM Systems sarithavn@in.bm.com Agenda Emerging Workloads Characteristics
More informationProfile-Based Energy Reduction for High-Performance Processors
Profile-Based Energy Reduction for High-Performance Processors Michael Huang, Jose Renau, and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu
More informationBalancing Power Consumption in Multiprocessor Systems
Universität Karlsruhe Fakultät für Informatik Institut für Betriebs und Dialogsysteme (IBDS) Lehrstuhl Systemarchitektur Balancing Power Consumption in Multiprocessor Systems DIPLOMA THESIS Andreas Merkel
More informationCAMP: A Technique to Estimate Per-Structure Power at Run-time using a Few Simple Parameters
CAMP: A Technique to Estimate Per-Structure Power at Run-time using a Few Simple Parameters Michael D. Powell, Arijit Biswas, Joel S. Emer, Shubhendu S. Mukherjee, Basit R. Sheikh 1, Shrirang Yardi 2 Intel
More informationA Conditional Probability Model for Vertical Multithreaded Architectures
A Conditional Probability Model for Vertical Multithreaded Architectures Robert M. Lane rmlkcl@mindspring.com Abstract Vertical multithreaded architectures can be estimated using a model based on conditional
More informationSTAND: New Tool for Performance Estimation of the Block Data Processing Algorithms in High-load Systems
STAND: New Tool for Performance Estimation of the Block Data Processing Algorithms in High-load Systems Victor Minchenkov, Vladimir Bashun St-Petersburg State University of Aerospace Instrumentation {victor,
More informationBalancing Soft Error Coverage with. Lifetime Reliability in Redundantly. Multithreaded Processors
Balancing Soft Error Coverage with Lifetime Reliability in Redundantly Multithreaded Processors A Thesis Presented to the faculty of the School of Engineering and Applied Science University of Virginia
More informationManaging SMT Resource Usage through Speculative Instruction Window Weighting
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE Managing SMT Resource Usage through Speculative Instruction Window Weighting Hans Vandierendonck André Seznec N 7103 Novembre 2009 Domaine
More informationCPU Scheduling CPU. Basic Concepts. Basic Concepts. CPU Scheduler. Histogram of CPU-burst Times. Alternating Sequence of CPU and I/O Bursts
Basic Concepts CPU Scheduling CSCI 315 Operating Systems Design Department of Computer Science Notice: The slides for this lecture have been largely based on those from an earlier What does it mean to
More informationFAME: FAirly MEasuring Multithreaded Architectures
FAME: FAirly MEasuring Multithreaded Architectures Javier Vera Francisco J. Cazorla Alex Pajuelo Oliverio J. Santana Enrique Fernández Mateo Valero, Barcelona Supercomputing Center, Spain. {javier.vera,
More informationVampir MPI-Performance Analysis
Vampir MPI-Performance Analysis 05.08.2013 FB. Computer Science Scientific Computing Christian Iwainsky 1 How does one gain insight about an HPC applciation? Data has to be presented to the developer in
More informationSE350: Operating Systems. Lecture 6: Scheduling
SE350: Operating Systems Lecture 6: Scheduling Main Points Definitions Response time, throughput, scheduling policy, Uniprocessor policies FIFO, SJF, Round Robin, Multiprocessor policies Scheduling sequential
More informationTypes of Workloads. Raj Jain. Washington University in St. Louis
Types of Workloads Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-06/ 4-1 Overview!
More informationVampir. (MPI-) Performance Analysis. Dipl. Inf. Christian Iwainsky Fachgebiet Scientific Computing
Vampir (MPI-) Performance Analysis Dipl. Inf. Christian Iwainsky Fachgebiet Scientific Computing 11.11.2013 FB. Computer Science Scientific Computing Christian Iwainsky 1 How does one gain insight about
More informationHow fast is your EC12 / z13?
How fast is your EC12 / z13? Barton Robinson, barton@velocitysoftware.com Velocity Software Inc. 196-D Castro Street Mountain View CA 94041 650-964-8867 Velocity Software GmbH Max-Joseph-Str. 5 D-68167
More informationRoadmap. Tevfik Ko!ar. CSC Operating Systems Spring Lecture - V CPU Scheduling - I. Louisiana State University.
CSC 4103 - Operating Systems Spring 2008 Lecture - V CPU Scheduling - I Tevfik Ko!ar Louisiana State University January 29 th, 2008 1 Roadmap CPU Scheduling Basic Concepts Scheduling Criteria Different
More informationIncreasing the Performance of Superscalar Processors through Value Prediction. Arthur Perais
Increasing the Performance of Superscalar Processors through Value Prediction Arthur Perais Increasing Performance through Value Prediction 10/12/2015 Past and Recent Trends in General Purpose CPUs Power
More informationREDUCED OCCUPANCY FOR PROCESSOR EFFICIENCY THROUGH DOWNSIZED OUT-OF-ORDER ENGINE AND REDUCED TRAFFIC. Ziad Youssfi
REDUCED OCCUPANCY FOR PROCESSOR EFFICIENCY THROUGH DOWNSIZED OUT-OF-ORDER ENGINE AND REDUCED TRAFFIC By Ziad Youssfi A DISSERTATION Submitted to Michigan State University in partial fulfillment of the
More informationCPU Scheduling. Chapter 9
CPU Scheduling 1 Chapter 9 2 CPU Scheduling We concentrate on the problem of scheduling the usage of a single processor among all the existing processes in the system The goal is to achieve High processor
More informationCPU Scheduling (Chapters 7-11)
CPU Scheduling (Chapters 7-11) CS 4410 Operating Systems [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse] The Problem You re the cook at State Street Diner customers continuously
More informationAnalysing value substitution and confidence estimation for value prediction
Analysing value substitution and confidence estimation for value prediction Luis Piñuel, Rafael A. Moreno and Francisco Tirado Departamento de Arquitectura de Computadores y Automática Universidad Complutense
More informationHardware and Software Overview
Tutorial: Optimizing Applications for Performance on POWER Hardware and Software Overview SCICOMP13 July 2007 2007 IBM Agenda Hardware Software Documentation 2 2007 IBM Corporation Core CPU processor Hardware
More informationCPU Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Unix Scheduler
CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms FCFS SJF RR Priority Multilevel Queue Multilevel Queue with Feedback Unix Scheduler 1 Scheduling Processes can be in one of several
More informationNBTI-Aware Dynamic Instruction Scheduling
NBTI-Aware Dynamic Instruction Scheduling Taniya Siddiqua and Sudhanva Gurumurthi Department of Computer Science University of Virginia Charlottesville, VA 2294 {taniya, gurumurthi}@cs.virginia.edu Abstract
More informationAccelerate HPC Development with Allinea Performance Tools. Olly Perks & Florent Lebeau
Accelerate HPC Development with Allinea Performance Tools Olly Perks & Florent Lebeau Olly.Perks@arm.com Florent.Lebeau@arm.com Agenda 09:00 09:15 09:15 09:30 09:30 09:45 09:45 10:15 10:15 10:30 Introduction
More informationEvaluating the Impact of Job Scheduling and Power Management on Processor Lifetime for Chip Multiprocessors
Evaluating the Impact of Job Scheduling and Power Management on Processor Lifetime for Chip Multiprocessors Ayse K. Coskun, Richard Strong, Dean M. Tullsen, and Tajana Simunic Rosing Computer Science and
More informationTasks or processes attempt to control or react to external events occuring in real time RT processes must be able to keep up events
RTOS Characteristics Time constraints Shdli Scheduling C. Brandolese, W. Fornaciari Real-Time Systems Definition of real-time system A real-time system is one in which h the correctness of the computation
More informationz/tpf and z13 Mark Gambino, TPF Development Lab March 23, 2015 TPFUG Dallas, TX
z/tpf and z13 Mark Gambino, TPF Development Lab March 23, 2015 TPFUG Dallas, TX Disclaimer Any reference to future plans are for planning purposes only. IBM reserves the right to change those plans at
More informationRoadmap. Tevfik Koşar. CSE 421/521 - Operating Systems Fall Lecture - V CPU Scheduling - I. University at Buffalo.
CSE 421/521 - Operating Systems Fall 2011 Lecture - V CPU Scheduling - I Tevfik Koşar University at Buffalo September 13 th, 2011 1 Roadmap CPU Scheduling Basic Concepts Scheduling Criteria & Metrics Different
More informationExploring Last n Value Prediction
Exploring Last n Value Prediction Martin Burtscher Benjamin G. Zorn Department of Computer Science Microsoft Corporation University of Colorado 1 Microsoft Way Boulder, Colorado 39-3 Redmond, WA 95 burtsche@cs.colorado.edu
More informationManaging SMT Resource Usage through Speculative Instruction Window Weighting
Managing SMT Resource Usage through Speculative Instruction Window Weighting Hans Vandierendonck, André Seznec To cite this version: Hans Vandierendonck, André Seznec. Managing SMT Resource Usage through
More informationDYNAMIC DETECTION OF WORKLOAD EXECUTION PHASES MATTHEW W. ALSLEBEN, B.S. A thesis submitted to the Graduate School
DYNAMIC DETECTION OF WORKLOAD EXECUTION PHASES BY MATTHEW W. ALSLEBEN, B.S. A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree Master of Science Major Subject:
More informationContext-Base Computational Value Prediction with Value Compression
Context-Base Computational Value Prediction with Value Compression Yasuo Ishii June 3 2018 2018 Arm Limited Submitted Track: 8KB 2018 Arm Limited Speedup over no VP Motivation and Achievement Known modern
More informationComposite Confidence Estimators for Enhanced Speculation Control
Composite Estimators for Enhanced Speculation Control Daniel A. Jiménez Department of Computer Science The University of Texas at San Antonio San Antonio, Texas, U.S.A djimenez@acm.org Abstract This paper
More informationBias Scheduling in Heterogeneous Multicore Architectures. David Koufaty Dheeraj Reddy Scott Hahn
Bias Scheduling in Heterogeneous Multicore Architectures David Koufaty Dheeraj Reddy Scott Hahn Motivation Mainstream multicore processors consist of identical cores Complexity dictated by product goals,
More informationSimulation-driven verification Design Automation Summer School, 2009
Simulation-driven verification Design Automation Summer School, 2009 Bob Bentley Director, Pre-silicon Validation Enterprise Microprocessor Group Intel Corporation Hillsboro, Oregon, U.S.A. 1 Agenda Microprocessor
More informationCONTINUING advances in semiconductor technology lead
IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 3, MARCH 2006 281 Control Speculation for Energy-Efficient Next-Generation Superscalar Processors Juan L. Aragón, Member, IEEE Computer Society, JoséGonzález,
More informationReliability-Aware Scheduling on Heterogeneous Multicore Processors
Reliability-Aware Scheduling on Heterogeneous Multicore Processors Ajeya Naithani Ghent University, Belgium Stijn Eyerman Intel, Belgium Lieven Eeckhout Ghent University, Belgium ABSTRACT Reliability to
More informationMetrics for Lifetime Reliability
Metrics for Lifetime Reliability Pradeep Ramachandran, Sarita Adve, Pradip Bose, Jude Rivers and Jayanth Srinivasan 1 Department of Computer Science IBM T.J. Watson Research Center University of Illionis
More informationHomework 2: Comparing Scheduling Algorithms
Homework 2: Comparing Scheduling Algorithms The purpose of this homework is to build a scheduler to compare performance of a First In First Out (FIFO) versus Round Robin algorithms. The algorithms will
More informationHigh Performance Computing(HPC) & Software Stack
IBM HPC Developer Education @ TIFR, Mumbai High Performance Computing(HPC) & Software Stack January 30-31, 2012 Pidad D'Souza (pidsouza@in.ibm.com) IBM, System & Technology Group 2002 IBM Corporation Agenda
More informationReducing Power Density through Activity Migration
ISLPED 2003 8/26/2003 Reducing Power Density through Activity Migration Seongmoo Heo, Kenneth Barr, and Krste Asanovi Computer Architecture Group, MIT CSAIL Hot Spots Rapid rise of processor power density
More informationLecture #10. ********************************* Outline - 1 min ********************************* Tomasulo Loop Example Exceptions In-order Commit
Lecture #10 Review -- 1 min Registers not bottleneck Avoid WAR, WAW Not limited to basic blocks Lasting Contributions -- dynamic scheduling -- register renaming -- load/store disambiguation Outline - 1
More informationCPU Scheduling: Part I. Operating Systems. Spring CS5212
Operating Systems Spring 2009-2010 Outline CPU Scheduling: Part I 1 CPU Scheduling: Part I Outline CPU Scheduling: Part I 1 CPU Scheduling: Part I Basic Concepts CPU Scheduling: Part I Maximum CPU utilization
More informationא א א א א א א א
א א א W א א א א א א א א א 2008 2007 1 Chapter 6: CPU Scheduling Basic Concept CPU-I/O Burst Cycle CPU Scheduler Preemptive Scheduling Dispatcher Scheduling Criteria Scheduling Algorithms First-Come, First-Served
More informationPipeline Gating: Speculation Control For Energy Reduction
Pipeline Gating: Speculation Control For Energy Reduction Srilatha Manne University of Colorado Dept. of Electrical and Computer Engineering Boulder, CO 80309 srilatha.manne@colorado.edu Artur Klauser,
More informationTechniques for Improving Efficiency and Accuracy of Contemporary Dynamic Branch Predictors
University of Nevada, Reno Techniques for Improving Efficiency and Accuracy of Contemporary Dynamic Branch Predictors A dissertation submitted in partial fulfillment of the requirements for the degree
More informationDesigning Energy-Efficient Fetch Engines
Designing Energy-Efficient Fetch Engines Michele Co Department of Computer Science University of Virginia Advisor: Co-Advisor: Committee: Kevin Skadron Dee A.B. Weikle Jack Davidson (Chair) James Cohoon,
More informationScheduling Processes 11/6/16. Processes (refresher) Scheduling Processes The OS has to decide: Scheduler. Scheduling Policies
Scheduling Processes Don Porter Portions courtesy Emmett Witchel Processes (refresher) Each process has state, that includes its text and data, procedure call stack, etc. This state resides in memory.
More informationExploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors
Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors Matthew DeVuyst, Rakesh Kumar, and Dean M. Tullsen University of California, San Diego Department of Computer
More informationSEPAS: A Highly Accurate Energy-Efficient Branch Predictor
2.3 SEPAS: A Highly Accurate Energy-Efficient Branch Predictor ABSTRACT Designers have invested much effort in developing accurate branch predictors with short learning periods. Such techniques rely on
More informationExploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors
In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, April 2006 Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors Matthew
More informationCS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017
CS 471 Operating Systems Yue Cheng George Mason University Fall 2017 Page Replacement Policies 2 Review: Page-Fault Handler (OS) (cheap) (cheap) (depends) (expensive) (cheap) (cheap) (cheap) PFN = FindFreePage()
More informationDennis Bradford, Sundaram Chinthamani, Jesus Corbal, Adhiraj Hassan, Ken Janik, Nawab Ali
Dennis Bradford, Sundaram Chinthamani, Jesus Corbal, Adhiraj Hassan, Ken Janik, Nawab Ali 1 Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS
More informationCPU Scheduling. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
CPU Scheduling Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu CPU Scheduling policy deciding which process to run next, given a set of runnable
More informationUnderstanding the Behavior of In-Memory Computing Workloads. Rui Hou Institute of Computing Technology,CAS July 10, 2014
Understanding the Behavior of In-Memory Computing Workloads Rui Hou Institute of Computing Technology,CAS July 10, 2014 Outline Background Methodology Results and Analysis Summary 2 Background The era
More informationAn Oracle White Paper January Upgrade to Oracle Netra T4 Systems to Improve Service Delivery and Reduce Costs
An Oracle White Paper January 2013 Upgrade to Oracle Netra T4 Systems to Improve Service Delivery and Reduce Costs Executive Summary... 2 Deploy Services Faster and More Efficiently... 3 Greater Compute
More informationControlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors
Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors Daniel A. Jiménez Rutgers University Department of Computer Science djimenez@cs.rutgers.edu
More informationCPU Scheduling. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
CPU Scheduling Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3044: Operating Systems, Fall 2017, Jinkyu Jeong (jinkyu@skku.edu) CPU Scheduling
More informationProphet/Critic Hybrid Branch Prediction
Prophet/Critic Hybrid Branch Prediction Ayose Falcón Jared Stark Alex Ramirez Konrad Lai Mateo Valero Computer Architecture Department Universitat Politècnica de Catalunya {afalcon, aramirez, mateo}@ac.upc.es
More informationProject 2 solution code
Project 2 solution code Project 2 solution code in files for project 3: Mutex solution in Synch.c But this code has several flaws! If you copied this, we will know! Producer/Consumer and Dining Philosophers
More informationCPU Scheduling. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
CPU Scheduling Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3052: Introduction to Operating Systems, Fall 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationDelivering High Performance for Financial Models and Risk Analytics
QuantCatalyst Delivering High Performance for Financial Models and Risk Analytics September 2008 Risk Breakfast London Dr D. Egloff daniel.egloff@quantcatalyst.com QuantCatalyst Inc. Technology and software
More informationBioBench: A Benchmark Suite of Bioinformatics Applications
BioBench: A Benchmark Suite of Bioinformatics Applications Kursad Albayraktaroglu, Aamer Jaleel, Xue Wu, Manoj Franklin, Bruce Jacob, Chau-Wen Tseng, Donald Yeung Dept. of Electrical & Computer Engineering
More informationLecture 11: CPU Scheduling
CS 422/522 Design & Implementation of Operating Systems Lecture 11: CPU Scheduling Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions of
More informationPower, Energy and Performance. Tajana Simunic Rosing Department of Computer Science and Engineering UCSD
Power, Energy and Performance Tajana Simunic Rosing Department of Computer Science and Engineering UCSD 1 Motivation for Power Management Power consumption is a critical issue in system design today Mobile
More informationOptimize code for Intel Xeon Phi. Discovering bottlenecks without pain
Optimize code for Intel Xeon Phi Discovering bottlenecks without pain Agenda Introduction Allinea MAP and optimizing for Xeon Phi Conclusion Allinea Unified environment A modern integrated environment
More informationCourse "Empirical Evaluation in Informatics" Benchmarking
Course "Empirical Evaluation in Informatics" Benchmarking Prof. Dr. Lutz Prechelt Freie Universität Berlin, Institut für Informatik http://www.inf.fu-berlin.de/inst/ag-se/ Example 1: SPEC CPU2000 Benchmark
More informationDynamic Power Gating with Quality Guarantees
Appears in the International Symposium on Low Power Electronics and Design 29 San Francisco, California, August, 29 Dynamic Power Gating with Quality Guarantees Anita Lungu 1, Pradip Bose 2, Alper Buyuktosunoglu
More informationSTEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches
STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches Dongyuan Zhan, Hong Jiang, Sharad C. Seth Dept. of Computer Science & Engineering University of Nebraska - Lincoln Email: {dzhan,jiang,seth}@cse.unl.edu
More informationResources and Special Effects
Resources and Special Effects Charles Grassl IBM August, 2004 2004 IBM Part 2: Special Effects Large Pages Memory Affinity Process Binding Simultaneous Multi-Threading 17 2004 IBM Corporation 1 Large Page
More informationThermoelectric-based Sustainable Self- Cooling for Fine-Grained Processor Hot Spots
IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems 16 Thermoelectric-based Sustainable Self- Cooling for Fine-Grained Processor Hot Spots Soochan Lee (LG Electronics),
More informationCSC 553 Operating Systems
CSC 553 Operating Systems Lecture 9 - Uniprocessor Scheduling Types of Scheduling Long-term scheduling The decision to add to the pool of processes to be executed Medium-term scheduling The decision to
More informationSTEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches
21 43rd Annual IEEE/ACM International Symposium on Microarchitecture STEM: Spatiotemporal Management of Capacity for Intra-Core Last Level Caches Dongyuan Zhan, Hong Jiang, Sharad C. Seth Dept. of Computer
More informationIntroduction to Operating Systems. Process Scheduling. John Franco. Dept. of Electrical Engineering and Computing Systems University of Cincinnati
Introduction to Operating Systems Process Scheduling John Franco Dept. of Electrical Engineering and Computing Systems University of Cincinnati Lifespan of a Process What does a CPU scheduler do? Determines
More informationThermal Management with Asymmetric Dual Core Designs ; CU-CS
University of Colorado, Boulder CU Scholar Computer Science Technical Reports Computer Science Spring 5-1-2003 Thermal Management with Asymmetric Dual Core Designs ; CU-CS-965-03 Soraya Ghiasi University
More information