HETEROGENEOUS SYSTEM ARCHITECTURE: FROM THE HPC USAGE PERSPECTIVE


HETEROGENEOUS SYSTEM ARCHITECTURE: FROM THE HPC USAGE PERSPECTIVE Haibo Xie, Ph.D. Chief HSA Evangelist AMD China

AGENDA:
- GPGPU in HPC: what are the challenges
- Introducing Heterogeneous System Architecture (HSA)
- How HSA benefits GPGPU in HPC usage
- Taking HSA to the industry

GPU IN HPC: WHAT ARE THE CHALLENGES?
- Massively parallel processing?
- Finding parallelism?
- SIMDs / vector arrays?
- Bringing data to computation?
- Refining the algorithm?

HPC China 2012, HSA: from the HPC usage perspective, Oct. 30, 2012


THE PROBLEM: WHY IS IT DIFFICULT?
Algorithms and programming; hardware and tool-chain:
- Not every HPC domain-science programmer can use GPUs
- Effort is spent tailoring the algorithm, and even the size of the problem
- Code reuse remains an issue
- Data transfer cost: distributed memory spaces between CPU and GPU
- Awkward (legacy) programming models
- High software runtime overhead
- Special-purpose devices that lack the necessary tools
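The data-transfer cost called out above is the classic discrete-GPU offload pattern. A minimal sketch in plain C++, where a second host allocation stands in for discrete GPU memory (the function name and "kernel" are illustrative, not a real GPU API):

```cpp
#include <cstring>
#include <vector>

// Legacy CPU+dGPU offload, sketched on the host: the "device" buffer is a
// separate allocation standing in for discrete GPU memory, and every launch
// pays two explicit copies across the (simulated) PCIe boundary.
std::vector<float> offload_scale(const std::vector<float>& host, float k) {
    std::vector<float> device(host.size());              // device-side allocation
    std::memcpy(device.data(), host.data(),
                host.size() * sizeof(float));            // host -> device copy
    for (float& x : device) x *= k;                      // the "kernel"
    std::vector<float> result(device.size());
    std::memcpy(result.data(), device.data(),
                device.size() * sizeof(float));          // device -> host copy
    return result;  // for small kernels, the two copies dominate the runtime
}
```

For a kernel this trivial, nearly all of the work is the two copies; that overhead is what forces the algorithm tailoring and problem-size constraints listed above.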

BUT
- Several efforts are still targeted at utilizing GPUs in HPC
- Hybrid computing has become a common term; heterogeneity is now becoming the norm
- Getting performance is still a problem in general-purpose HPC
- Meeting the US Department of Energy's 20 MW expectation for an exascale system is probably going to end up being an optimization problem

RE-THINKING CPU+dGPU

CHANGING THE THINKING


INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE
Brings all the processors in a system into unified coherent memory:
- Power efficient
- Easy to program
- Future looking
- Industry support
- Open standard
- Established technology foundation

HSA APU FEATURE ROADMAP

Physical Integration:
- Integrate CPU & GPU in silicon
- Unified Memory Controller
- Common Manufacturing Technology

Optimized Platforms:
- GPU Compute C++ support
- User mode scheduling
- Bi-Directional Power Mgmt between CPU and GPU

Architectural Integration:
- Unified Address Space for CPU and GPU
- GPU uses pageable system memory via CPU pointers
- Fully coherent memory between CPU & GPU

System Integration:
- GPU compute context switch
- GPU graphics pre-emption
- Quality of Service
- Extend to Discrete GPU

HSA COMPLIANT FEATURES: OPTIMIZED PLATFORMS

GPU Compute C++ support: supports OpenCL C++ directions and Microsoft's upcoming C++ AMP language. This eases programming of both CPU and GPU working together to process parallel workloads.

User mode scheduling: drastically reduces the time to dispatch work, requiring no OS kernel transitions or services and minimizing software driver overhead.

Bi-directional power management between CPU and GPU: enables "power sloshing," where the CPU and GPU dynamically lower or raise their power and performance depending on the activity and on which one is better suited to the task at hand.
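The single-source C++ idea can be sketched with a generic dispatch helper. This is a conceptual stand-in for C++ AMP / OpenCL C++ style dispatch, not either real API; here the CPU plays the role of the device:

```cpp
#include <cstddef>
#include <vector>

// Single-source dispatch sketch: the same C++ lambda could be compiled for
// the CPU or finalized for the GPU. In this stand-in, each "work-item" index
// is simply run in a loop on the CPU.
template <typename Kernel>
void parallel_for_each(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i)   // on HSA, these run as GPU work-items
        kernel(i);
}

std::vector<int> saxpy_demo() {
    std::vector<int> x{1, 2, 3}, y{10, 20, 30};
    // One lambda expresses the workload for every processor on the SoC.
    parallel_for_each(x.size(), [&](std::size_t i) { y[i] += 2 * x[i]; });
    return y;
}
```

The point of the model is that the lambda body is ordinary C++: no separate kernel language, no separate source file for the GPU path.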

HSA COMPLIANT FEATURES: ARCHITECTURAL INTEGRATION

Unified address space for CPU and GPU: the unified address space makes it easier for developers to create applications. On HSA platforms, a pointer is really a pointer: no separate memory pointers are needed for CPU and GPU, and the GPU can take advantage of the CPU's virtual address space.

GPU uses pageable system memory via CPU pointers: with pageable system memory, the GPU can reference data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces or page-locked prior to use. And there is no GPU memory size limitation!

Fully coherent memory between CPU & GPU: allows data to be cached by both the CPU and the GPU, and referenced by either. In all previous generations, GPU caches had to be flushed at command-buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high-speed coherent bus.
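One way to see "a pointer is really a pointer": a GPU kernel can walk a pointer-containing structure exactly as the CPU built it, with no flatten/copy/page-lock step. In the sketch below the kernel is an ordinary C++ function standing in for finalized GPU code:

```cpp
// Unified-address-space sketch: the "GPU kernel" receives the same CPU
// pointer the host built and chases it directly. On a discrete GPU the list
// would first have to be flattened into a contiguous buffer and copied over.
struct Node {
    int   value;
    Node* next;
};

// Stand-in for a finalized GPU kernel: operates on the host's pointers.
int gpu_sum_kernel(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next)
        sum += n->value;   // coherent memory: no cache flush before access
    return sum;
}

int shared_list_demo() {
    Node c{3, nullptr}, b{2, &c}, a{1, &b};  // list built by the CPU
    return gpu_sum_kernel(&a);               // passed by pointer, not by copy
}
```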

FULL HSA FEATURES: SYSTEM INTEGRATION

GPU compute context switch: GPU tasks can be context switched, making the GPU a multi-tasker. Context switching means faster interoperation between application, graphics, and compute. Users get a snappier, more interactive experience.

GPU graphics preemption: as more applications enjoy the performance and features of the GPU, it is important that system interactivity remains good. This means low-latency access to the GPU from any process.

Quality of service: with context switching and preemption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multiple users or multiple applications is either prioritized or equalized.
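The quality-of-service point amounts to time-slicing GPU contexts instead of running each job to completion (which would starve interactive work). A minimal round-robin sketch; the context type and slice sizes are illustrative, not HSA mechanisms:

```cpp
#include <deque>
#include <vector>

// QoS / context-switch sketch: each GPU context gets a fixed slice of work,
// then is switched out and requeued if unfinished, so no single long compute
// job monopolizes the GPU.
struct GpuContext {
    int id;
    int remaining;  // work units left in this context
};

// Returns the order in which contexts received slices.
std::vector<int> round_robin_schedule(std::deque<GpuContext> ready, int slice) {
    std::vector<int> order;
    while (!ready.empty()) {
        GpuContext ctx = ready.front();
        ready.pop_front();
        order.push_back(ctx.id);
        ctx.remaining -= slice;                 // context switch after a slice
        if (ctx.remaining > 0) ready.push_back(ctx);
    }
    return order;
}
```

A long-running context (id 1) and a short interactive one (id 2) interleave instead of the short one waiting for the long one to finish.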

HSA SOLUTION STACK

System components:
- Compliant heterogeneous computing hardware
- A software compilation stack
- A user-space runtime system
- Kernel-space system components
- Application software and domain-specific libraries (Bolt, OpenCV, many others)

Overall vision:
- Make the GPU easily accessible: support mainstream languages, expandable to domain-specific languages; a complete GPU tool-chain, with programming, debugging, and profiling like the CPU has
- Make compute offload efficient: direct path to the GPU (avoiding graphics overhead), eliminating memory copies, low-latency dispatch
- Make it ubiquitous: drive HSA as a standard through the HSA Foundation; open source key components

Stack diagram (from the slide): differentiated hardware (CPUs, GPUs, other accelerators) under the HSA software layer (drivers, HSA Runtime, HSAIL, HSA Finalizer, GPU ISA), alongside the OpenCL runtime, DirectX runtime, other runtimes, and legacy drivers.

DRIVER STACK vs HSA SOFTWARE STACK

Today's driver stack: apps -> domain libraries -> OpenCL 1.x / DX runtimes and user-mode drivers -> graphics kernel-mode driver -> hardware.

HSA software stack: apps -> HSA domain libraries, HSA JIT, and task-queuing libraries -> HSA Runtime -> HSA kernel-mode driver -> hardware (APUs, CPUs, GPUs).

AMD contributes one user-mode component and one kernel-mode component; all others are contributed by third parties or AMD.

HETEROGENEOUS COMPUTE DISPATCH
- How compute dispatch operates today in the driver model
- How compute dispatch improves tomorrow under HSA

HSA COMMAND AND DISPATCH: CPU <-> GPU
(Diagram: the application/runtime dispatches work directly to CPU1, CPU2, and the GPU.)
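Under HSA, the application writes dispatch packets straight into a user-mode queue that the GPU's hardware scheduler consumes, with no kernel-mode transition per dispatch. A single-threaded sketch of that packet flow; the packet layout and queue type are simplified stand-ins, not the real HSA queue structures:

```cpp
#include <cstddef>
#include <deque>
#include <functional>

// User-mode queue sketch: the application enqueues dispatch packets directly
// (no OS-kernel transition, no driver call per dispatch) and a "packet
// processor" drains them. std::function stands in for a kernel handle + args.
struct DispatchPacket {
    std::size_t grid_size;                    // number of work-items
    std::function<void(std::size_t)> kernel;  // finalized-kernel stand-in
};

class UserModeQueue {
public:
    void enqueue(DispatchPacket p) { ring_.push_back(std::move(p)); }

    // Models the hardware packet processor; returns packets dispatched.
    std::size_t drain() {
        std::size_t dispatched = 0;
        while (!ring_.empty()) {
            DispatchPacket p = std::move(ring_.front());
            ring_.pop_front();
            for (std::size_t i = 0; i < p.grid_size; ++i) p.kernel(i);
            ++dispatched;
        }
        return dispatched;
    }

private:
    std::deque<DispatchPacket> ring_;  // real HSA queues are fixed-size rings
};

int queue_demo() {
    UserModeQueue q;
    int total = 0;
    q.enqueue({4, [&](std::size_t i) { total += static_cast<int>(i); }});
    q.drain();        // work-items 0..3 run
    return total;
}
```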

HSA INTERMEDIATE LAYER: HSAIL
- HSAIL is a virtual ISA for parallel programs
- Finalized to the target ISA by a JIT compiler or "finalizer"
- Low level, for fast JIT compilation
- Explicitly parallel; designed for data-parallel programming
- Support for exceptions, virtual functions, and other high-level language features
- Syscall methods: GPU code can call directly into system services, I/O, printf, etc.
- Debugging support
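The finalize-at-runtime idea can be sketched with a toy virtual ISA: a portable op list that a per-device "finalizer" lowers when the program runs. This is a conceptual stand-in (here lowering is just interpretation), not the HSAIL instruction set itself:

```cpp
#include <vector>

// Toy sketch of the HSAIL -> finalizer -> GPU ISA flow: the front-end
// compiler emits portable ops once; each device's finalizer lowers them at
// runtime. Here "lowering" is a tiny interpreter over a single register.
enum class Op { LoadImm, Add, Mul };

struct Instr {
    Op  op;
    int operand;
};

// Stand-in finalizer/executor: a real finalizer JIT-compiles to native ISA.
int finalize_and_run(const std::vector<Instr>& program) {
    int reg = 0;
    for (const Instr& ins : program) {
        switch (ins.op) {
            case Op::LoadImm: reg = ins.operand;  break;
            case Op::Add:     reg += ins.operand; break;
            case Op::Mul:     reg *= ins.operand; break;
        }
    }
    return reg;
}
```

Because the op list is low-level and explicitly typed, the per-device lowering step can stay fast and simple, which is the property HSAIL is designed around.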

HSA: TAKING THE PLATFORM TO PROGRAMMERS
- Balance between CPU and GPU for performance and power efficiency
- Make GPUs accessible to a wider audience of programmers: programming models close to today's CPU programming models
- Enabling more advanced language features on the GPU: shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.) and hence more applications on the GPU
- A kernel can enqueue work to any other device in the system (e.g. GPU->GPU, GPU->CPU), enabling task-graph-style algorithms, ray tracing, etc.
- Complete tool-chain for programming, debugging, and profiling
- HSA provides a compatible architecture across a wide range of programming models and hardware implementations
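Device-side enqueue (GPU->GPU, GPU->CPU) can be sketched as a task that pushes a continuation onto another agent's queue while it is still running, without a host round-trip. The agent and task types below are illustrative, not HSA runtime API names:

```cpp
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Task-graph sketch of device-side enqueue: a running "GPU" task enqueues
// follow-up work onto another agent's queue (GPU->CPU here) from inside its
// own body, instead of returning to the host to launch the next stage.
struct Agent {
    std::string name;
    std::deque<std::function<void()>> queue;
};

std::vector<std::string> task_graph_demo() {
    std::vector<std::string> trace;
    Agent gpu{"gpu", {}}, cpu{"cpu", {}};

    // Stage 1 runs on the GPU agent and enqueues stage 2 to the CPU agent.
    gpu.queue.push_back([&] {
        trace.push_back("gpu:stage1");
        cpu.queue.push_back([&] { trace.push_back("cpu:stage2"); });
    });

    Agent* agents[] = {&gpu, &cpu};
    for (Agent* a : agents)                 // drain each agent's queue
        while (!a->queue.empty()) {
            auto task = std::move(a->queue.front());
            a->queue.pop_front();
            task();
        }
    return trace;
}
```

This is the shape of the "task-graph style" algorithms mentioned above: each stage decides at runtime where the next stage should run.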

HSA VALUES FOR GPGPU: EASIER TO PROGRAM
- A pointer is a pointer!
- Expressive runtime for rich high-level programming languages: C/C++, Java, Python, C#
- More programming models supported: OpenCL, C++ AMP, OpenMP
- Cacheable and coherent memory: more data structures can be freely shared
- Single source for all processors on the SoC

HSA VALUES FOR GPGPU: PERFORMANCE AND POWER EFFICIENCY
- Pass a pointer rather than moving data, supporting more problems with different dataset sizes
- Reduced kernel launch time and efficient CPU/GPU communication: hardware-managed queues and scheduling allow very low-latency communication between devices
- Good for performance and power efficiency: pre-emption and context switching, support for multiple concurrent GPU processes, and preemptive multitasking of CPU/GPU resources
- Bi-directional power management between CPU and GPU, with Turbo Core technology for more power efficiency

TAKING HSA TO THE INDUSTRY Copyright 2012 HSA Foundation. All Rights Reserved.

HSA FOUNDATION INITIAL FOUNDERS
(The founding companies appear as logos on the slide.) Each is represented by:
- CVP, Heterogeneous Applications and Developer Solutions
- ARM Fellow and VP of Technology, Media Processing
- Vice President, Marketing
- Senior Director, CTO Office
- Director, Linux Development Center

AMD'S OPEN SOURCE COMMITMENT TO HSA
We will open source our Linux execution and compilation stack:
- Jump-start the ecosystem
- Allow a single shared implementation where appropriate
- Enable university research in all areas

Component name              | AMD specific | Rationale
HSA Bolt Library            | No           | Enable understanding and debug
OpenCL HSAIL Code Generator | No           | Enable research
LLVM Contributions          | No           | Industry and academic collaboration
HSA Assembler               | No           | Enable understanding and debug
HSA Runtime                 | No           | Standardize on a single runtime
HSA Finalizer               | Yes          | Enable research and debug
HSA Kernel Driver           | Yes          | For inclusion in Linux distros

THE FUTURE OF HETEROGENEOUS COMPUTING
The architectural path for the future is clear:
- Programming patterns established on Symmetric Multi-Processor (SMP) systems migrate to the heterogeneous world
- An open architecture, with published specifications and an open source execution software stack
- Heterogeneous cores working together seamlessly in coherent memory
- Low-latency dispatch
- No software fault lines
- APU servers will unleash GPGPU power in the HPC domain

WHERE ARE WE TAKING YOU?
Switch the compute, don't move the data!
- Every processor now has serial and parallel cores
- All cores are capable, with performance differences
- Simple and efficient programming model

Platform design goals:
- Easy support of massive data sets
- Support for task-based programming models
- Solutions for all platforms
- Open to all

THANK YOU! Access HSA: http://developer.amd.com http://hc.csdn.net Haibo Xie: haibo.xie@amd.com

DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. 2012 Advanced Micro Devices, Inc.