Peter Ungaro President and CEO


1 Peter Ungaro President and CEO

2 We build the world's fastest supercomputers to help solve Grand Challenges in science and engineering:
- Earth Sciences: climate change & weather prediction
- Life Sciences: personalized medicine & improved biofuels
- Defense & National Security: warfighter support, threat prediction & stockpile stewardship
- Computer-Aided Engineering: aircraft design, crash simulation & fluid dynamics
- Petroleum: seismic imaging & reservoir simulation
- Scientific Research: new energy sources & efficient combustion

3 Supercomputing Leadership at the Petascale

4

5 [Chart] Average Number of Processor Cores per Supercomputer (Top 20 of Top 500), rising from roughly 1,000 cores to tens of thousands

6 Supercomputing will become almost solely focused on scalability. The flattening of per-core performance has renewed interest in novel processing architectures.

7 Henri Calandra, John Etgen & Scott Morton, Oil & Gas Workshop 2011

8 3D Reverse Time Migration (RTM)
- Decomposable: throughput with memory & time constraints
- Large-scale computations: 1,000s of cores, weeks of processing
3D Full Waveform Inversion (FWI)
- Promises much greater image fidelity
- Less decomposable into smaller jobs (capability vs. capacity)
- Requires robust interprocessor communication at large scale
- Enormous computation: 10K-100K cores, months of processing
Ref.: G.A. Newman, Large Scale Computing Requirements for Basic Energy Sciences
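The compute pattern behind both methods is repeated wave propagation through a velocity model. As a minimal sketch (illustrative only, not from the talk, and far simpler than production codes), one finite-difference time step of a 2D constant-density acoustic wave equation might look like the following; the array names, the 5-point Laplacian, and the assumption of equal grid spacing are simplifications for illustration.

```c
#include <stddef.h>

/* One leapfrog time step of the 2D acoustic wave equation:
 *   p(t+dt) = 2*p(t) - p(t-dt) + c^2 * dt^2 * Laplacian(p(t))
 * Grids are nx-by-nz, stored row-major with stride nz; dx is the
 * (assumed equal) grid spacing in both directions.                 */
void acoustic_step(size_t nx, size_t nz, double dx, double dt,
                   const double *vel,     /* velocity model c      */
                   const double *p_prev,  /* wavefield at t - dt   */
                   const double *p_curr,  /* wavefield at t        */
                   double *p_next)        /* wavefield at t + dt   */
{
    for (size_t ix = 1; ix + 1 < nx; ++ix) {
        for (size_t iz = 1; iz + 1 < nz; ++iz) {
            size_t i = ix * nz + iz;
            /* 5-point Laplacian of the current wavefield */
            double lap = (p_curr[i - nz] + p_curr[i + nz] +
                          p_curr[i - 1]  + p_curr[i + 1] -
                          4.0 * p_curr[i]) / (dx * dx);
            double c = vel[i];
            p_next[i] = 2.0 * p_curr[i] - p_prev[i] + c * c * dt * dt * lap;
        }
    }
}
```

In a distributed run, each rank would own a subdomain of the model and exchange halo rows/columns with its neighbors every time step, which is where the large-scale interprocessor communication demands noted above for FWI come from.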

9 Compute & data requirements for seismic processing are huge
- Wide demands on processing, from data acquisition through seismic imaging to reservoir simulation
- Petaflop-scale systems required for state-of-the-art processing
- Petabytes of capacity and terabytes per second of bandwidth from I/O
An accurate seismic image has huge returns
- A single deep-water well can cost >$100M, and wells are getting deeper
- Restating a reserve has serious business implications
- When requirements & returns are this large, the demand for getting it right goes up
This is the class of simulation that drives real petascale capability computing
- You can do capacity work on capability systems, but not vice versa: risk mitigation

10 Homogeneous architectures: Cray XE6 node
Hybrid architectures: Cray XK6 node (different CPU:GPU ratios possible)
System balance at scale requires a much more tightly integrated architecture, especially as node/socket performance grows

11 EMGeo (ElectroMagnetic Geological Mapper), developed at NERSC
- Complements seismic imaging
- 2009 award for innovation
- Investigators: Gregory Newman and Michael Commer (LBNL)
The algorithms can run on tens of thousands of cores and have been designed to scale well beyond these numbers; runs on NERSC's Franklin system routinely use 4,000-8,000 CPU cores

12 Hybrid systems incorporating accelerator processors are here to stay
- Parallelism and power will drive this
- Productivity vs. performance trade-off: the question is how widely accelerators can be used effectively
- The line between accelerators and CPUs will blur over time
Programming accelerators efficiently requires a large investment
- Need a single programming model that is portable across machine types, and also forward scalable in time
- Cray is investing heavily in accelerator software technology, building a tightly coupled, high-level programming environment with compilers, libraries, and tools that will help hide the complexity of the system

13 OpenACC: a common directives-based programming model for accelerators
- Announced in November at the SC11 conference
- Offers portability between operating systems, host CPUs, accelerators and compilers
- Single code base! Works for Fortran, C, C++
"OpenACC is a technically impressive initiative brought together by members of the OpenMP Working Group on Accelerators, as well as many others. We look forward to releasing a version of this proposal in the next release of OpenMP." Michael Wong, CEO, OpenMP Directives Board, February
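To show what the directives-based style looks like in practice, here is a hedged sketch (a made-up routine, not the kernel quantified on the next slide): a plain C loop annotated with a single OpenACC directive. Built with an OpenACC-capable compiler, the loop is offloaded to the accelerator; built without one, the pragma is ignored and the identical source runs on the host CPU, which is the single-code-base point above.

```c
/* Hypothetical routine for illustration: y = a*x + y over n elements. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* One directive expresses both the parallelism and the data movement:
     * copy x to the accelerator, copy y in and out, run the loop there.  */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```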

14 An Example Kernel (5,869 lines)
- CUDA™: 6,362 lines affected, 1.3X speedup
- OpenACC™: 20 lines affected, 1.8X speedup

15 We must deliver petascale, not just petaflops, solutions (P.S. Ditto for Exascale)

16

17
1. Huge Data Storage with High Performance I/O: Sonexion, an integrated Lustre storage solution
2. Data Analytics: urika, a graph appliance for relationship analytics

18 [Interpretation 3D viewer]
"The storage requirement (has) jumped one hundred fold and with that the stress on just about every component of the data chain, most importantly feeding the computation." Philip Neri, Paradigm, Jan 2011

19 Sonexion
- Exclusively designed for Lustre
- Fully modular and scalable: performance scales with capacity
- GUI to manage the environment
- Over 1 PB and 20 GB/sec per rack!

20 The Challenge: Feature Search to Aid Visualization of Seismic Simulations
- Hundreds of terabytes of seismic data and simulation results
- Interpretation of results is done by humans, using visualization techniques, with success varying by analyst
- Goal: enable interactive, real-time search for features of interest across multiple simulation models (seismic, EM, reservoir, well-logging, geological, and other interpretive results)
urika Solution
- urika holds the results of multiple models as spatio-temporal graphs in memory (up to 512 TB)
- Identifies and highlights significant features for analysts, based on the search for geo-spatial patterns and discrepancies across time, geography and model
- Acts as an interactive, real-time guide for multiple analysts looking for optimal drilling opportunities
Business Value
- Consistent evaluation of simulation results at the performance level of the best analysts; the cost of error can exceed $100M
- How to leverage analytics across an IOC enterprise?

21 Cray system & storage cabinets: >300
Compute nodes: >25,000
System memory: >1.5 Petabytes
Usable storage; bandwidth: >25 Petabytes; >1 TB/sec
Peak performance: >11.5 Petaflops
Number of AMD x86 cores: >380,000
Number of NVIDIA GPUs; cores: >3,000; >1.5M

22 Build a world-class supercomputer that enables transformational computing across a broad set of science, engineering and advanced analytics applications

23

24 ~250 cabinets
~12-14 TF per processor
~5 PF per cabinet
>1 EF peak
>100,000 sockets (~ Billion threads)
6.5 PB on-socket memory (64 GB/socket)
50 PB off-socket memory (512 GB/socket)
MW
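These projections are mutually consistent; here is a quick consistency check (a sketch only, using rounded values where the slide gives ranges, and taking 1 EF = 10^6 TF and 1 PB = 10^6 GB):

```c
#include <stdio.h>

int main(void)
{
    double cabinets      = 250.0;     /* ~250 cabinets            */
    double pf_per_cab    = 5.0;       /* ~5 PF per cabinet        */
    double sockets       = 100000.0;  /* >100,000 sockets         */
    double tf_per_socket = 12.5;      /* midpoint of ~12-14 TF    */

    printf("peak via cabinets: %.2f EF\n", cabinets * pf_per_cab / 1e3);
    printf("peak via sockets:  %.2f EF\n", sockets * tf_per_socket / 1e6);
    printf("on-socket memory:  %.1f PB\n", sockets * 64.0 / 1e6);
    printf("off-socket memory: %.1f PB\n", sockets * 512.0 / 1e6);
    return 0;
}
```

Both routes give roughly 1.25 EF peak, and 100,000 sockets at 64 GB and 512 GB per socket give about 6.4 PB and 51 PB, matching the 6.5 PB and 50 PB figures above.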

25 Moving to the next level of seismic processing will require at least an order-of-magnitude increase in compute power, leveraging capability computing in addition to capacity
No matter what the compute processors are, it is the overall scalability, balance and programmability of the entire system that enables the solution
A tightly integrated, holistic approach to the HPC environment is important for petascale systems and an absolute requirement in the move to exascale computing

26 Seymour Cray, June 4, 1995

27