DEPEI QIAN: Qian Depei, Professor at Sun Yat-sen University and Beihang University, Dean of the School of Data and Computer Science of Sun Yat-sen University. Since 1996 he has been a member of the expert group and expert committee of the National High-tech Research & Development Program (the 863 Program) in information technology. He has been the chief scientist of three 863 key projects on high performance computing since 2002. Currently, he is the chief scientist of the 863 key project on high productivity computers and the application service environment. His current research interests include high performance computer architecture and implementation technologies, distributed computing, network management, and network performance measurement. He has published over 300 papers in journals and conference proceedings.
HPC Development in China: A Brief Review and Prospect
Depei Qian
Beihang University / Sun Yat-sen University
ASC 2017, Wuxi, China
April 27, 2017
Outline
- A brief review
- Issues in exa-scale system development
- Prospect of HPC in China's 13th Five-Year Plan
A brief review
Three 863 key projects on HPC
- 2002-2005: High Performance Computer and Core Software
  - Research on resource sharing and collaborative work
  - Grid-enabled applications in multiple areas
  - TFlops computers and the China National Grid (CNGrid) testbed
- 2006-2010: High Productivity Computer and Grid Service Environment
  - High productivity: application performance, efficiency in program development, portability of programs, robustness of the system
  - Emphasizing service features of the HPC environment
  - Developing peta-scale computers
- 2010-2016: High Productivity Computer and Application Service Environment
  - Developing 100 PF computers
  - Developing large-scale HPC applications
  - Upgrading CNGrid
High performance computers (1996-2016)
- 1996
  - China: Dawning 1000, 2.5 GF
  - US: 1 TF
- 2016
  - China: Sunway TaihuLight, 125 PF, a 50-million-times increase in 20 years
- Milestones
  - 2001: Dawning 3000, 432 GF, cluster
  - 2003: DeepComp 6800, 5.32 TF, cluster
  - 2004: Dawning 4000, 11.2 TF, cluster
  - 2008: DeepComp 7000, 150 TF, heterogeneous cluster; Dawning 5000A, 230 TF, cluster
  - 2010: TH-1A, 4.7 PF, heterogeneous accelerated architecture; Dawning 6000, 3 PF, heterogeneous accelerated architecture
  - 2011: Sunway Bluelight, multicore-based, 1 PF, home-grown processors
(photos: Dawning 3000, DeepComp 6800, Dawning 4000, DeepComp 7000, Dawning 5000A, Dawning 6000, TH-1A, Sunway Bluelight)
High performance computers (1996-2016)
- 2013: Tianhe-2, CPU+MIC heterogeneous accelerated architecture
  - 54.9 PF peak, 33.9 PF Linpack; No. 1 in the Top500 six times, from 2013 to 2015
  - Installed at the National Supercomputing Center in Guangzhou
  - Will be upgraded to 100 PF this year
- 2016: Sunway TaihuLight
  - Implemented with home-grown Shenwei many-core processors, 10 million cores in total
  - 125 PF peak, 93 PF Linpack; No. 1 in the Top500 in June and Nov. 2016
  - Installed at the National Supercomputing Center in Wuxi
High performance computing environment (1996-2016)
- 1996
  - One national HPC center in Hefei, equipped with Dawning-I, 640 MIPS
  - US computing infrastructure: PACI supported by NSF and DoE centers; emergence of Grid technology
- 2016
  - China National Grid (CNGrid), composed of 17 national supercomputing centers and HPC centers, with world-leading computing resources
High performance computing applications (1996-2016)
- 1996
  - Limited HPC applications, in weather forecasting and oil exploration
  - 16-32-way parallelism
  - Relying on imported application software
- 2016
  - HPC applications in many domains: aircraft design, high-speed train design, oil & gas exploration, new drug discovery, ensemble weather forecasting, bio-informatics, car development, design optimization of large fluid machinery, electromagnetic computation
  - 10-million-core parallelism reached; Gordon Bell Prize in 2016
  - Developed a number of application software packages, adopted by production systems
Important experiences
- Coordinated effort between the national research programs and regional development plans
  - Matching funding provided for the development of leading-class HPC systems
  - Joint effort by the MOST and local governments in establishing the national supercomputing centers
- Multi-lateral cooperation
  - Supercomputing centers play an important role in determining the metrics of the HPC systems developed and in selecting the teams that develop them
  - Enterprises participate in the national R&D program: Inspur, Sugon, and Lenovo were involved in HPC system development, promoting the development while improving the technical capability and competitiveness of the companies
  - Application organizations lead the development of the application software
- Balanced and coordinated development of HPC machines, the HPC environment, and HPC applications
Problems identified
- Lack of a long-term national program for high performance computing
- Weak in kernel HPC technologies
  - Processors/accelerators
  - Novel devices (new memory, storage, and network)
  - Large-scale parallel algorithm and program implementation
- Application software is the bottleneck
  - Applications rely on imported commercial software: expensive, small-scale parallelism, restricted by export regulations
- Shortage of cross-disciplinary talent
  - Not enough people with both domain and IT knowledge
  - Lack of multi-disciplinary collaboration
Issues in exa-scale system development
Major challenges for exa-scale systems
- Power consumption
- Performance obtained by applications
- Programmability
- Resilience
Key questions:
- How to make trade-offs between performance, power consumption, and programmability?
- How to achieve continuous non-stop operation?
- How to adapt to a wide range of applications with reasonable efficiency?
Architecture
- Novel architectures beyond the current heterogeneous accelerated / many-core-based designs are expected
- Co-processor or partitioned heterogeneous architecture?
  - Low utilization of the co-processor in some applications, which use the CPU only
  - Bottleneck in moving data between CPU and co-processor
- Application-aware architecture
  - On-chip integration of special-purpose units (an idea from Prof. Andrew Chien): using the right tool to do the right things
  - Dynamically reconfigurable, but how to program it?
Memory system
- Pursuing large capacity, low latency, high bandwidth
- Increase capacity and lower power consumption by using DRAM and NVM together
  - Data placement issue
  - Handling the high write cost and limited lifetime of NVM under writes
- Improving bandwidth and latency by using 3D stacking technology
- Reducing data movement by placing data closer to processing
  - HBM/HMC near the processor
  - On-chip DRAM
  - Simple functions in memory
- Reducing data-copy cost by using a unified memory space in heterogeneous architectures
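The DRAM/NVM data-placement issue above can be sketched as a simple policy: hot or write-heavy pages go to DRAM, cold read-mostly pages to NVM. Everything here is illustrative; the write weight, page counters, and capacity unit are assumptions for this sketch, not taken from any real system.

```python
def place_pages(pages, dram_capacity):
    """pages: list of (page_id, read_count, write_count).
    Returns (dram_pages, nvm_pages)."""
    # Rank pages by "DRAM benefit": writes weigh more because NVM writes
    # are slow, energy-hungry, and wear the device out.
    WRITE_WEIGHT = 4  # assumed ratio of NVM write cost to read cost
    ranked = sorted(pages, key=lambda p: p[1] + WRITE_WEIGHT * p[2],
                    reverse=True)
    dram = [p[0] for p in ranked[:dram_capacity]]
    nvm = [p[0] for p in ranked[dram_capacity:]]
    return dram, nvm

# Read-hot page "a" and write-heavy page "b" land in DRAM;
# cold page "c" is demoted to NVM.
dram, nvm = place_pages([("a", 10, 0), ("b", 1, 5), ("c", 2, 0)], 2)
```

Real systems refresh the counters periodically and migrate pages incrementally; this static one-shot ranking only shows the shape of the trade-off.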
Interconnect
- Pursuing low latency, high bandwidth, and low energy consumption
- Adopt new technologies
  - Silicon photonics for communication between components
  - Optical interconnect / communication
  - Miniature optical devices
- High scalability, meeting exa-scale interconnect requirements
  - Connecting 10,000+ nodes
  - Low-hop, low-latency topology
  - Reliable and intelligent routing
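As a rough illustration of the low-hop requirement for 10,000+ nodes, the worst-case hop count (diameter) of a torus can be computed directly. The torus is only one candidate topology; the choice and dimensions below are assumptions for this sketch.

```python
def torus_diameter(dims):
    """Worst-case hop count of a torus with the given per-dimension
    sizes; wrap-around links halve each dimension's longest path."""
    return sum(d // 2 for d in dims)

# A 3D torus large enough for 10,000+ nodes:
dims = (22, 22, 22)          # 22**3 = 10,648 nodes
print(torus_diameter(dims))  # 33 hops worst case
```

Low-diameter alternatives (fat trees, dragonfly) trade hop count against link count and routing complexity, which is why the slide pairs topology with reliable and intelligent routing.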
Programming the heterogeneous systems
- Addressing the issues in programming heterogeneous parallel systems
  - Efficient expression of parallelism, dependence, data sharing, and execution semantics
  - Problem decomposition appropriate for heterogeneous systems
- Improving programming by means of a holistic approach
  - New programming models
  - Programming language extensions and compilers
  - Parallel debugging
  - Runtime support and optimization
  - Architectural support
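One facet of "problem decomposition appropriate for heterogeneous systems" is deciding how much work to give each device. A minimal sketch, assuming a static split proportional to measured throughput (real runtimes use more adaptive schemes, and the throughput numbers here are made up):

```python
def split_work(n_items, cpu_gflops, acc_gflops):
    """Divide n_items between CPU and accelerator in proportion to
    their measured throughput, so both finish at about the same time."""
    acc_share = acc_gflops / (cpu_gflops + acc_gflops)
    n_acc = round(n_items * acc_share)
    return n_items - n_acc, n_acc

# Accelerator assumed 9x faster than the CPU portion of the node:
cpu_items, acc_items = split_work(1000, 500, 4500)
print(cpu_items, acc_items)  # 100 900
```

A split like this is what keeps the co-processor from idling (the low-utilization problem on the previous slide), but it only works when data movement between the two sides stays cheap.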
Computational models and algorithms
- Full-chain innovation: mathematical methods, computer algorithms, algorithm implementation and optimization
- A good mathematical method often contributes more to performance than hardware improvement or code optimization
- Architecture-aware algorithm implementation and optimization is necessary for heterogeneous systems
- Domain-specific libraries for improving software productivity and performance
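To illustrate the point that a better mathematical method can beat low-level tuning, compare naive polynomial evaluation with Horner's rule, which reaches the same result with one multiply-add per coefficient instead of a power per term. This is a generic textbook example, not a method from the project.

```python
def eval_naive(coeffs, x):
    """coeffs[i] is the coefficient of x**i; one pow per term."""
    return sum(c * x**i for i, c in enumerate(coeffs))

def eval_horner(coeffs, x):
    """Same polynomial, restructured: one multiply-add per coefficient."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

# 3*x**2 + 2*x + 1 at x = 2, both ways:
print(eval_naive([1, 2, 3], 2), eval_horner([1, 2, 3], 2))  # 17 17.0
```

No compiler flag or faster clock recovers the operations the naive form wastes; the saving comes from the algebraic restructuring itself, which is the "full-chain" argument in miniature.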
Resilience
- Resilience is one of the key issues for exa-scale systems
  - Large system scale: 50K to 100K nodes, a huge number of components, very short MTBF
  - Long non-stop operation required for solving large-scale problems
- Reliability measures at different levels, including device, node, and system levels
- Software/hardware coordination
  - Fast context saving and recovery for checkpointing under a short MTBF
  - Fault tolerance at the algorithm and application-software level
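The tension between checkpoint cost and a short MTBF has a well-known first-order answer, Young's formula for the optimal checkpoint interval. The checkpoint cost and MTBF below are assumed numbers for illustration only.

```python
from math import sqrt

def optimal_checkpoint_interval(checkpoint_secs, mtbf_secs):
    """Young's first-order approximation:
    interval = sqrt(2 * checkpoint_cost * MTBF)."""
    return sqrt(2 * checkpoint_secs * mtbf_secs)

# Assumed: 60 s to save a checkpoint, 2-hour system MTBF.
print(optimal_checkpoint_interval(60, 7200))  # ~929 s, i.e. every ~15 min
```

The formula makes the slide's point concrete: as MTBF shrinks at exa-scale, the interval shrinks with its square root, so checkpoints must become very cheap (fast context saving) or be replaced by algorithm-level fault tolerance.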
Importance of tools
- Development and optimization of large-scale parallel software require scalable and efficient tools
- Particularly important for systems built with home-grown processors, which current commercial and research tools do not support
- Three kinds of tools required by default:
  - Parallel debugger, for correctness
  - Performance tuner, for performance
  - Energy optimizer, for energy efficiency
Urgent need for an eco-system
- An eco-system for exa-scale systems based on home-grown processors is urgently needed
  - Languages, compilers, OS, runtime, tools
  - Application development support
  - Application software
- Need to attract hardware manufacturers and third-party software developers: a product family instead of a single machine
- Collaboration between industry, academia, and end-users required
Prospect of HPC in China's 13th Five-Year Plan
Reform of the research system in China
- The national research and development system is being reformed
- 100+ different national R&D programs/initiatives are being merged into 5 tracks of national programs:
  - Basic research program (NSFC)
  - Mega science and technology programs
  - Key R&D program (former 863, 973, and enabling programs)
  - Enterprise innovation program
  - Facility/talent program
A new key project on HPC
- High performance computing has been identified as a priority subject under the key R&D program
- Strategic studies and planning have been conducted since 2013
- A proposal on HPC in the 13th Five-Year Plan was submitted in early 2015
- A key R&D project was approved in Oct. 2015 by a multi-agency government committee led by the MOST
Motivations for the new key project
- The key value of exa-scale computers identified:
  - Addressing grand challenge problems: energy shortage, pollution, climate change
  - Enabling industry transformation: supporting the development of important products (high-speed trains, commercial aircraft, automobiles) and promoting economic transformation
  - Social development and people's benefit: new drug discovery, precision medicine, digital media
  - Enabling scientific discovery: high energy physics, computational chemistry, new materials, astrophysics
  - Promoting the computer industry through technology transfer
- Developing HPC systems with self-controllable technologies: a lesson learnt from the recent embargo regulations
Goals
- Strengthening R&D on kernel technologies and pursuing a leading position in high performance computer development
- Promoting HPC applications and establishing the application eco-system
- Building up an HPC infrastructure with service features and exploring a path to an HPC service industry
Major tasks
- Exa-scale computer development
  - R&D on novel architectures and key technologies of the exa-scale computer
  - Developing the exa-scale computer based on home-grown processors
  - Technology transfer to promote the development of high-end servers
- HPC application development
  - Basic research on exa-scale modeling methods and parallel algorithms
  - Developing high performance application software
  - Establishing the HPC application eco-system
- HPC environment development
  - Developing software and platforms for the national HPC environment
  - Upgrading the national HPC environment, CNGrid
  - Developing service systems on the national HPC environment
- Each task covers basic research, key technology development, and application demonstration
Task 1: Exa-scale computer development (basic research)
- Novel high performance interconnect
  - Theoretical work on novel interconnects based on the enabling technologies of 3D chips, silicon photonics, and on-chip networks
- Programming and execution models for exa-scale systems
  - New programming models for heterogeneous systems
  - Improving programming efficiency
Task 1: Exa-scale computer development (key technology)
- Prototype systems for verifying exa-scale system technologies
  - Possible architectures for exa-scale computers
  - Implementation strategies
  - Technologies for energy efficiency
- Prototype system targets:
  - 512 nodes
  - 5-10 TFlops/node
  - 10-20 GFlops/W
  - Point-to-point bandwidth > 200 Gbps
  - MPI latency < 1.5 us
- Emphasis on self-controllable technologies
- System software for the prototypes
- 3 typical applications to verify the design
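The prototype's bandwidth and latency targets can be combined in the standard alpha-beta (latency + size/bandwidth) model to see when a message is latency-bound versus bandwidth-bound. The model is a common approximation, not part of the slide's specification.

```python
def message_time_us(size_bytes, latency_us=1.5, bandwidth_gbps=200):
    """Alpha-beta model using the prototype targets: 1.5 us MPI
    latency, 200 Gbps point-to-point bandwidth."""
    bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6  # bytes per microsecond
    return latency_us + size_bytes / bytes_per_us

# Small messages are latency-bound, large ones bandwidth-bound:
print(message_time_us(8))        # ~1.5 us (almost pure latency)
print(message_time_us(1 << 20))  # ~43.4 us (almost pure bandwidth)
```

The crossover size (where the two terms are equal) is what application developers use to decide between many small messages and fewer aggregated ones.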
Task 1: Exa-scale computer development (key technology)
- Exa-scale system technologies
  - Architecture optimized for multiple objectives
  - Highly efficient compute nodes
  - High performance processor/accelerator design
  - Exa-scale system software
  - Scalable interconnect
  - Parallel I/O
  - Exa-scale infrastructure
  - Energy efficiency
  - Exa-scale system reliability
Task 1: Exa-scale computer development (exa-scale computer system)
- 1 exaflops peak performance
- Linpack efficiency > 60%
- 10 PB memory
- EB-scale storage
- 30 GFlops/W energy efficiency
- Interconnect > 500 Gbps
- Large-scale system management and resource scheduling
- Easy-to-use parallel programming environment
- System monitoring and fault-tolerance support
- Support for large-scale applications
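The peak and energy-efficiency targets above imply a power budget, which is simple arithmetic to check:

```python
def system_power_mw(peak_flops, gflops_per_watt):
    """Power (MW) implied by a peak rating and an efficiency target."""
    return peak_flops / (gflops_per_watt * 1e9) / 1e6

# 1 exaflops at the slide's 30 GFlops/W target:
print(system_power_mw(1e18, 30))  # ~33.3 MW
```

This is why energy efficiency appears as a first-class target: at the prototype-level 10-20 GFlops/W, the same exaflops machine would need 50-100 MW.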
Task 2: HPC application development (basic research)
- Computable modeling and computational methods for exa-scale systems
- Scalable, highly efficient parallel algorithms and parallel libraries for exa-scale systems
Task 2: HPC application development (key technology)
- Programming frameworks for exa-scale software development, including frameworks for:
  - Structured mesh
  - Unstructured mesh
  - Mesh-free methods
  - Combinatorial geometry
  - Finite element
  - Graph computing
- Supporting the development of at least 40 software packages with million-core parallelism
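As a hint of what a structured-mesh programming framework abstracts away, here is the kind of kernel such a framework would generate and parallelize: a 5-point Jacobi sweep on a 2D grid. A minimal serial sketch, not the project's framework.

```python
def jacobi_step(grid):
    """One Jacobi sweep of a 5-point stencil on a structured 2D mesh;
    boundary values are held fixed."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new

# Interior point relaxes toward the average of its four neighbors:
g = jacobi_step([[1.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0]])
print(g[1][1])  # 1.0
```

A framework's job is to let the application author write only the stencil body while it handles domain decomposition, halo exchange, and million-core scheduling.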
Task 2: HPC application development (key technology and demo applications)
- Numerical devices and their applications
  - Numerical nuclear reactor: four components, including reactor-core particle transport, thermal hydraulics, structural mechanics, and material optimization, with non-linear coupling of multi-physics processes
  - Numerical aircraft: multi-disciplinary optimization covering aerodynamics, structural strength, and fluid-solid interaction
  - Numerical earth system: earth system modeling for studying climate change; non-linear coupling of multi-physical and chemical processes covering atmosphere, ocean, land, and sea ice
  - Numerical engine: high-fidelity simulation system for numerical prototyping of commercial aircraft engines, enabling fast and accurate virtual airworthiness experiments
Task 2: HPC application development (key technology and demo applications)
- High performance application software for domain applications: complex engineering projects and critical equipment, numerical simulation of the ocean, design of energy-efficient large fluid machinery, drug discovery, electromagnetic environment simulation, ship design, oil exploration, digital media rendering
- High performance application software for scientific research: materials science, high energy physics, astrophysics, life science
Task 2: HPC application development (eco-system)
- Eco-system for HPC application software development
  - Establishing a national-level R&D center for HPC application software
  - Building up a platform for HPC software development and optimization: tools for performance/energy efficiency and pre-/post-processing
  - Building up a software resource repository
  - Developing typical domain application software
  - A joint effort involving national supercomputing centers, universities, and institutes
Task 3: HPC environment development (basic research)
- Models and architecture of computational services
  - Service representation of distributed computing and storage resources
  - Architecture for computational services: resource discovery and access, unified management, trading and accounting
- Virtual data space
  - Architecture for cross-domain virtual data space
  - Integration of distributed storage
  - Unified access and management of the virtual data space
  - Domain partitioning, separation, and security
Task 3: HPC environment development (key technology)
- Mechanisms and platform for the national HPC environment
- Technical support for service-mode operation
- Upgrading the national HPC environment (CNGrid):
  - > 500 PF computing resources
  - > 500 PB storage
  - > 500 application software packages and tools
  - > 5000 users (team users)
Task 3: HPC environment development (demo applications)
- Service systems based on the national HPC environment
  - Integrated business platform, e.g. complex product design
  - HPC-enabled EDA platform
  - Application villages: innovation and optimization of industrial products, drug discovery
  - SME computing and simulation platform
  - Platform for HPC education: providing computing resources and services to undergraduate and graduate students
Calls for proposals
- The first call for proposals was issued in Feb. 2016; 19 projects passed the evaluation and were launched in July 2016
- The second call (for 2017) was issued in Oct. 2016; the proposal evaluation has ended and the final results are expected to be announced soon
- These two rounds of calls cover most of the subjects of the key project except exa-scale system development
- Exa-scale system development will start after completion of the three prototypes
Thank you!