Cray and Allinea. Maximizing Developer Productivity and HPC Resource Utilization. SHPCP HPC Theater SEG 2016 C O M P U T E S T O R E A N A L Y Z E 1

Size: px
Start display at page:

Download "Cray and Allinea. Maximizing Developer Productivity and HPC Resource Utilization. SHPCP HPC Theater SEG 2016 C O M P U T E S T O R E A N A L Y Z E 1"

Transcription

1 Cray and Allinea Maximizing Developer Productivity and HPC Resource Utilization SHPCP HPC Theater SEG /27/2016 C O M P U T E S T O R E A N A L Y Z E 1

2 History Cluster Computing to Super Computing PGS has a rich history in both traditional supercomputer and cluster technologies. Intel MPP systems SGI Shared memory systems (Power Challenge, Origin ) IBM RS 6000 SP Clusters Ethernet x86 clusters InfiniBand x86 clusters Opportunities to expand workloads faster than traditional cluster replacement models allowed. Advanced acquisition technologies (GeoStreamer, multi-vessel) Higher frequency targets More complex algorithms (FWI) 10/27/2016 C O M P U T E S T O R E A N A L Y Z E 2

3 Challenges: Algorithm optimization required Started with a modest development system, but large enough to run real data cases Joint effort over nearly a year to prove commercial viability Cray Performance team assisted with ideas and libraries Modest initial targets of parity with existing clusters Intense focus on the most heavily used algorithms Complexity Increased levels of parallelism present challenges for developers. Traditional debugging tools become unwieldy at large scales. 10/27/2016 C O M P U T E S T O R E A N A L Y Z E 3

4 Conclusions: Effective development tools & processes are essential to enabling the potential of HPC systems. Current well engineered supercomputers, based on commodity processors, can be an effective alternative to pure commodity clusters. Cray and Allinea provide a number of development and operational tools providing customers with best of breed solutions. Workload managers Software management tools Debuggers and performance tools Libraries 10/27/2016 C O M P U T E S T O R E A N A L Y Z E 4

5 Cray and Allinea Many Cray customers have chosen the Allinea tools to complement their environment. Cray offers these components in an integrated system configuration, and has significant experience with these tools in doing optimization and benchmarking work. The Allinea approach to disciplined application analysis is in line with Cray s approach to system and software design and test. Cray and Allinea work together to ensure compatibility of the Allinea tools with the Cray Linux Environment and Cray Development Environment. 10/27/2016 C O M P U T E S T O R E A N A L Y Z E 5

6 Maximizing Developer Productivity and HPC Resource Utilization Beau Paisley

7 About Allinea Over 15 years of business focused on parallel programming development tools Strong R&D investment to drive innovation in changing landscape Committed to giving great support to the HPC community

8 Where to find Allinea s tools Over 85% of Top 100 HPC systems From small to very large tools provision 9 of the Top 10 HPC systems Up to 700,000 core tools usage Future leadership systems Millions of cores usage

9 Allinea: Industry Standard Tools for HPC (and hundreds more)

10 Reducing cost and time of seismic processing Allinea s tools enable developers to understand and improve software performance Our per-shot imaging was slashed from days to hours we enabled at least a 10x speed up for our RTM

11 Increasing reservoir modelling throughput Allinea s tools help teams develop faster, more capable and more scalable software faster! The model is now running 30% faster than it was before accelerating the throughput of reservoir engineering modelling in this industry is critical

12 Applications Drive HPC Analyze and tune with Allinea Performance Reports Develop, profile and debug applications with Allinea Forge With professional support when you need it most

13

14 The Uncomfortable Truth about Applications caption

15 Analyze and tune application performance A single-page report on application performance for users and administrators Identify configuration problems and resource bottlenecks immediately Track mission-critical performance over time and after system upgrades Ensure key applications run at full speed on a new cluster or architecture

16 Allinea MAP The Profiler Small data files <5% slowdown No instrumentation No recompilation

17 How Allinea MAP is different Adaptive sampling Sample frequency decreases over time Data never grows too much Run for as long as you want Scalable Same scalable infrastructure as Allinea DDT Merges sample data at end of job Handles very high core counts, fast Instruction analysis Categorizes instructions sampled Knows where processor spends time Shows vectorization and memory bandwidth Thread profiling Core-time not thread-time profiling Identifies lost compute time Detects OpenMP issues Integrated Part of Forge tool suite Zoom and drill into profile Profiling within your code

18 Great Things to Try with Allinea MAP Find the peak memory use Remove I/O bottleneck Make sure threads are well utilized Improve memory access Restructure for vectorization Add your own metrics to the MAP time based sampler

19 Print statement debugging The original debugger ALLOWS INSPECTION OF PROGRAM STATE DIAGNOSE THE PROBLEM FROM EVIDENCE AND INTUITION Can be a long slow process PARTICULARLY IF TRYING TO FIND CODE LOCATIONS EDIT/COMPILE/RUN CYCLE Fails at modest scale TOO MUCH OUTPUT MATCHING OUTPUT BETWEEN PROCESSES Compile Insert print statements Run Identify area of interest View output

20 Allinea DDT The Debugger Who had a rogue behavior? Merges stacks from processes and threads Where did it happen? leaps to source How did it happen? Diagnostic messages Some faults evident instantly from source Run with Allinea tools Identify a problem Gather info Who, Where, How, Why Fix Why did it happen? Unique Smart Highlighting Sparklines comparing data across processes

21 Great things to try with Allinea DDT The scalable print alternative Stop on variable change Static analysis warnings on code errors Detect read/write beyond array bounds Detect stale memory allocations

22 Top Tips for HPC Resource Maximization Proper tools amplify developer productivity Proper tools amplify cluster utilization Software needs performance attention Regular profiling pays rewards Test correctness and validate performance on real workloads Integrate your debugger with program correctness regression testing Constant diligence pays off

23 Beau Paisley