ANSYS FLUENT Performance Benchmark and Profiling. October 2009

Similar documents
NWChem Performance Benchmark and Profiling. October 2010

ECLIPSE 2012 Performance Benchmark and Profiling. August 2012

STAR-CCM+ Performance Benchmark. August 2010

OpenMX Performance Benchmark and Profiling. May 2011

AMR (Adaptive Mesh Refinement) Performance Benchmark and Profiling

LS-DYNA Performance With MPI Collectives Acceleration. April 2011

Performance and Scalability of FLOW-3D/MP 5.0 in HPC Cluster Environment. Pak Lui, Application Performance Manager HPC Advisory Council

Virtualizing Enterprise SAP Software Deployments

Dell EMC Ready Solutions for HPC Lustre Storage. Forrest Ling HPC Enterprise Technolgist at Dell EMC Greater China

July, 10 th From exotics to vanillas with GPU Murex 2014

Improve Your Productivity with Simplified Cluster Management. Copyright 2010 Platform Computing Corporation. All Rights Reserved.

IBM xseries 430. Versatile, scalable workload management. Provides unmatched flexibility with an Intel architecture and open systems foundation

Hierarchical Merge for Scalable MapReduce

An Oracle White Paper September, Oracle Exalogic Elastic Cloud: A Brief Introduction

Ed Turkel HPC Strategist

ANSYS HPC High Performance Computing Leadership. Barbara Hutchings

IBM Power Systems. Bringing Choice and Differentiation to Linux Infrastructure

Delivering High Performance for Financial Models and Risk Analytics

Exalogic Elastic Cloud

HPC in the Cloud Built for Atmospheric Modeling. Kevin Van Workum, PhD Sabalcore Computing Inc.

PSM Tag Matching API. Author: Todd Rimmer Date: April 2011

REDUCE RISK INCREASE SPEED. Is your technology up to the challenge? Optimize your IT infrastructure with HPC SOLUTIONS FROM CDW FINANCIAL SERVICES.

STAND: New Tool for Performance Estimation of the Block Data Processing Algorithms in High-load Systems

Addressing the I/O bottleneck of HPC workloads. Professor Mark Parsons NEXTGenIO Project Chairman Director, EPCC

Ask the right question, regardless of scale

Virtualized SAP Solution using Cisco Unified Computing System and EMC CLARiiON Storage

REQUEST FOR PROPOSAL FOR

ORACLE BIG DATA APPLIANCE

Technical Review Accelerating the Artificial Intelligence Journey with Dell EMC Ready Solutions for AI

Oracle Communications Billing and Revenue Management Elastic Charging Engine Performance. Oracle VM Server for SPARC

High-Performance Computing (HPC) Up-close

Accelerating Computing - Enhance Big Data & in Memory Analytics. Michael Viray Product Manager, Power Systems

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Blade Servers for Small Enterprises

DELL EMC POWEREDGE 14G SERVER PORTFOLIO

Increase the Processing Power Behind Your Mission-Critical Applications with Intel Xeon Processors. ServerWatchTM Executive Brief

Realize Your Product Promise. High-Performance Computing

IBM HPC DIRECTIONS. Dr Don Grice. ECMWF Workshop October 31, IBM Corporation

HPC Solutions. Marc Mendez-Bermond

Simplifying Hadoop. Sponsored by. July >> Computing View Point

QuickSpecs. What's New Altair Engineering PBS Professional now offered on a per socket basis

Headline in Arial Bold 30pt Taking the New Road

White paper A Reference Model for High Performance Data Analytics(HPDA) using an HPC infrastructure

Performance Study: STAR-CD v4 on PanFS

An Oracle White Paper November SPARC SuperCluster: The High Performance Engineered System for Data Center Transformation

<Insert Picture Here> Oracle Exalogic Elastic Cloud: Revolutionizing the Datacenter

locuz.com Professional Services Locuz HPC Services

DataStager: Scalable Data Staging Services for Petascale Applications

EVERY NANOSECOND COUNTS

FlashStack For Oracle RAC

Computer Servers Draft 2 Comment Response Summary

Bringing AI Into Your Existing HPC Environment,

MANAGEMENT DISCUSSION SECTION

RODOD Performance Test on Exalogic and Exadata Engineered Systems

Sizing SAP Central Process Scheduling 8.0 by Redwood

Inviting Quotations. Tender date: Quotation Due date:

Inviting Quotations. Tender date: Quotation Due date:

RAPIDS GPU POWERED MACHINE LEARNING

2009/2010 Challenges Application drives HPC Business Ready.., future for performance computing Business Ready - Radioss

Hadoop Solutions. Increase insights and agility with an Intel -based Dell big data Hadoop solution

Industrial Value of HPC for Engineering Simulation

NVIDIA QUADRO VIRTUAL DATA CENTER WORKSTATION APPLICATION SIZING GUIDE FOR SIEMENS NX APPLICATION GUIDE. Ver 1.0

100 Million Subscriber Performance Test Whitepaper:

DELL EMC XTREMIO X2: NEXT-GENERATION ALL-FLASH ARRAY

High Performance Computing(HPC) & Software Stack

WHEN PERFORMANCE MATTERS: EXTENDING THE COMFORT ZONE: DAVIDE. Fabrizio Magugliani Siena, May 16 th, 2017

Cray and Allinea. Maximizing Developer Productivity and HPC Resource Utilization. SHPCP HPC Theater SEG 2016 C O M P U T E S T O R E A N A L Y Z E 1

HPC Market Update: Second Quarter. Copyright 2011 IDC. Reproduction is forbidden unless authorized. All rights reserved.

White Paper. IBM POWER8: Performance and Cost Advantages in Business Intelligence Systems. 89 Fifth Avenue, 7th Floor. New York, NY 10003

Microsoft Azure Açık Kaynak Teknoloji Çözümleri

Approaches to Reduce Your Application and OLTP Costs

Avoid Paying The Virtualization Tax: Deploying Virtualized BI 4.0 The Right Way

with Dell EMC s On-Premises Solutions

An Oracle White Paper January Upgrade to Oracle Netra T4 Systems to Improve Service Delivery and Reduce Costs

Increased Informix Awareness Discover Informix microsite launched

StackIQ Enterprise Data Reference Architecture

Advancing the adoption and implementation of HPC solutions. David Detweiler

IBM zenterprise System

Oracle In Memory Policy Analytics on Oracle Engineered Systems

ANSYS HPC ANSYS, Inc. November 25, 2014

Pre-announcement of Upcoming Procurement, AC2018, at National Supercomputing Centre at Linköping University

NetApp E-Series Performance Benchmarks Using Landmark SeisSpace/ProMAX

Porting and scaling engineering applications in the cloud. Wolfgang Gentzsch, UberCloud

Intel s Machine Learning Strategy. Gary Paek, HPC Marketing Manager, Intel Americas HPC User Forum, Tucson, AZ April 12, 2016

The IBM and Oracle alliance. Power architecture

HETEROGENEOUS SYSTEM ARCHITECTURE: FROM THE HPC USAGE PERSPECTIVE

White Paper. IBM System x. Superiority of the IBM ex5 Systems for Virtualized SAP Environments

ORACLE EXALYTICS IN-MEMORY MACHINE X3-4

No part of this document may be copied or reproduced in any form or by any means without the prior written consent of Curtiss-Wright Inc.

RE: Purchase of high performance computing cluster

The contribution of High Performance Computing and Modelling for Industrial Development. Dr Happy Sithole and Dr Onno Ubbink

Data Science is a Team Sport and an Iterative Process

HPC USAGE ANALYTICS. Supercomputer Education & Research Centre Akhila Prabhakaran

REQUEST FOR PROPOSALS RFP

Oracle Utilities Mobile Workforce Management Benchmark

Jack Weast. Principal Engineer, Chief Systems Engineer. Automated Driving Group, Intel

An Oracle White Paper June Running Oracle E-Business Suite on Oracle SuperCluster T5-8

Cori: A Cray XC Pre-Exascale System for NERSC

Introduction to IBM Technical Computing Clouds IBM Redbooks Solution Guide

Programming Models for Heterogeneous Computing Systems

Transcription:

ANSYS FLUENT Performance Benchmark and Profiling October 2009

Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, ANSYS, Dell, Mellanox Compute resource - HPC Advisory Council Cluster Center The participating members would like to thank ANSYS for their support and guidelines For more info please refer to www.mellanox.com, www.dell.com/hpc, www.intel.com, www.ansys.com 2

ANSYS FLUENT Computational Fluid Dynamics (CFD) is a computational technology Enables the study of the dynamics of things that flow By generating numerical solutions to a system of partial differential equations which describe fluid flow Enables better understanding of qualitative and quantitative physical flow phenomena which is used to improve engineering design CFD brings together a number of different disciplines Fluid dynamics, mathematical theory of partial differential systems, computational geometry, numerical analysis, computer science ANSYS FLUENT is a leading CFD application from ANSYS Widely used in almost every industry sector and manufactured product 3

Objectives The presented research was done to provide best practices ANSYS FLUENT performance benchmarking Interconnect performance comparisons Power-efficient simulations Understanding FLUENT communication patterns 4

Test Cluster Configuration Dell PowerEdge M610 24-node cluster Quad-Core Intel X5570 @ 2.93 GHz CPUs Intel Cluster Ready certified cluster Mellanox ConnectX MCQH29-XCC 4X QDR InfiniBand mezzanine card Mellanox M3601Q 32-Port Quad Data Rate (QDR-40Gb) InfiniBand Switch Memory: 24GB memory per node OS: RHEL5U3, OFED 1.4 InfiniBand SW stack MPI: HP-MPI 2.3 Application: FLUENT 12.0 Benchmark Workload New FLUENT Benchmark Suite 5

Mellanox InfiniBand Solutions Industry Standard Hardware, software, cabling, management Design for clustering and storage interconnect Performance 40Gb/s node-to-node 120Gb/s switch-to-switch 1us application latency Most aggressive roadmap in the industry Reliable with congestion management Efficient RDMA and Transport Offload Kernel bypass CPU focuses on application processing Scalable for Petascale computing & beyond End-to-end quality of service Virtualization acceleration I/O consolidation Including storage The InfiniBand Performance Gap is Increasing 60Gb/s 20Gb/s 120Gb/s 40Gb/s 240Gb/s (12X) 80Gb/s (4X) Ethernet Fibre Channel InfiniBand Delivers the Lowest Latency 6

Delivering Intelligent Performance Next Generation Intel Microarchitecture Bandwidth Intensive Intel QuickPath Technology Integrated Memory Controller Intel 5520 Chipset Intel Node Manager Technology PCI Express* 2.0 Intel Data Center Manager Threaded Applications 45nm quad-core Intel Xeon Processors Intel Hyper-threading Technology Intel X25-E SSDs ICH 9/10 Performance on Demand Intel Turbo Boost Technology Intel Intelligent Power Technology Performance That Adapts to The Software Environment 7

Dell PowerEdge Servers helping Simplify IT System Structure and Sizing Guidelines 24-node cluster build with Dell PowerEdge M610 blades server Servers optimized for High Performance Computing environments Building Block Foundations for best price/performance and performance/watt Dell HPC Solutions Scalable Architectures for High Performance and Productivity Dell's comprehensive HPC services help manage the lifecycle requirements. Integrated, Tested and Validated Architectures Workload Modeling Optimized System Size, Configuration and Workloads Test-bed Benchmarks ISV Applications Characterization Best Practices & Usage Analysis 8

FLUENT 12 Benchmark Results - Interconnect Input Dataset Truck_14M (14 millions elements) External Flow Over a Truck Body Performance testing is reported as normalized scalability The performance multiple increase versus 2 servers as the base line For example, rating scalability of 4 means 4x performance versus the 2 servers case performance InfiniBand QDR delivers above linear scalability At 24 nodes, GigE wastes 50% of the cluster resources Higher is better 9

FLUENT 12 Benchmark Results - Interconnect Input Dataset Truck_Poly_14M (14 millions elements) Performance testing is reported as normalized scalability The performance multiple increase versus 2 servers as the base line For example, rating scalability of 4 means 4x performance versus the 2 servers case performance InfiniBand QDR delivers above linear scalability At 24 nodes, GigE wastes more than 50% of the cluster resources Higher is better 10

FLUENT Performance Comparison Performance comparison between the tested results and best know results (Intel white box) Comparison demonstrated the capability of balanced system to provide better performance and higher scalability 11

FLUENT 12 Profiling MPI Functions Mostly used MPI functions MPI_Iprobe, MPI_Allreduce, MPI_Isend, and MPI_Irecv Number of Calls (Thousands) 20000 16000 12000 8000 4000 0 MPI_Waitall MPI_Send FLUENT 12 Benchmark Profiling Result (Truck_poly_14M) MPI_Reduce MPI_Recv MPI_Bcast MPI_Barrier MPI_Iprobe MPI Function MPI_Allreduce 4 Nodes 8 Nodes 16 Nodes MPI_Isend MPI_Irecv 12

FLUENT 12 Profiling Timing MPI_Recv and MPI_Allreduce show highest communication overhead Total Overhead (s) 20000 16000 12000 8000 4000 0 FLUENT 12 Benchmark Profiling Result (Truck_poly_14M) MPI_Waitall MPI_Send MPI_Reduce MPI_Recv MPI_Bcast MPI_Barrier MPI_Iprobe MPI Function MPI_Allreduce MPI_Isend MPI_Irecv 4 Nodes 8 Nodes 16 Nodes 13

FLUENT 12 Profiling Message Transferred Most data related MPI messages are within 256B-1KB in size Typical MPI synchronization messages are lower than 64B in size Number of messages increases with cluster size Number of Messages (Thousands) 28000 24000 20000 16000 12000 8000 4000 0 FLUENT 12 Benchmark Profiling Result (Truck_poly_14M) [0..64B] [64..256B] [256B..1KB] [1..4KB] [4..16KB] [16..64KB] [64..256KB] [256KB..1M] [1..4M] [4M..2GB] Message Size 4 Nodes 8 Nodes 16 Nodes 14

FLUENT Profiling Summary FLUENT 12 was profiled to identify its communication patterns Frequently used message sizes 256-1KB messages for data related communications <64B for synchronizations Number of messages increases with cluster size MPI Functions FLUENT 12 introduced MPI collective functions MPI_Allreduce help improves the communication efficiency Interconnects effect to FLUENT performance Both interconnect latency (MPI_Allreduce) and throughput (MPI_Recv) highly influence FLUENT performance Further optimization can be made to take bigger advantage of high-speed networks 15

Productive System = Balanced System Balanced system enables highest productivity Interconnect performance to match CPU capabilities CPU capabilities to drive the interconnect capability Memory bandwidth to match CPU performance Applications scalability relies on balanced configuration Bottleneck free Each system component can reach its highest capability Dell M610 system integrates balanced components Intel Nehalem CPUs and Mellanox InfiniBand QDR Latency to memory and interconnect latency at the same order of magnitude Provide the leading productivity and power/performance system for FLUENTsimulations 16

Thank You HPC Advisory Council All trademarks are property of their respective owners. All information is provided As-Is without any kind of warranty. The HPC Advisory Council makes no representation to the accuracy and completeness of the information contained herein. HPC Advisory Council Mellanox undertakes no duty and assumes no obligation to update or correct any information presented herein 17 17