IBM (Test Sponsor: Oak Ridge National Laboratory) Summit: IBM Power System AC922 (IBM Power9, Tesla V100-SXM2-16GB) |
SPEChpc 2021_lrg_base = 41.0 |
SPEChpc 2021_lrg_peak = Not Run |
hpc2021 License: | 056A | Test Date: | Sep-2021 |
---|---|---|---|
Test Sponsor: | Oak Ridge National Laboratory | Hardware Availability: | Nov-2018 |
Tested by: | Oak Ridge National Laboratory | Software Availability: | Jul-2021 |
Benchmark result graphs are available in the PDF report.
Results appear in the order in which they were run. Bold underlined text indicates a median measurement. Peak was not run, so only base results are listed.

Benchmark | Model | Ranks | Thrds/Rnk | Base Run 1 Seconds | Base Run 1 Ratio | Base Run 2 Seconds | Base Run 2 Ratio |
---|---|---|---|---|---|---|---|
805.lbm_l | ACC | 8400 | 1 | 38.6 | 70.5 | 27.0 | 101 |
818.tealeaf_l | ACC | 8400 | 1 | 68.3 | 21.2 | 68.3 | 21.2 |
819.clvleaf_l | ACC | 8400 | 1 | 37.4 | 56.2 | 35.3 | 59.5 |
828.pot3d_l | ACC | 8400 | 1 | 156 | 29.1 | 141 | 32.3 |
834.hpgmgfv_l | ACC | 8400 | 1 | 151 | 22.2 | 140 | 23.9 |
835.weather_l | ACC | 8400 | 1 | 39.3 | 87.2 | 37.6 | 91.0 |
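
The overall base metric can be reproduced from the per-benchmark ratios. The sketch below is a minimal illustration, not SPEC's own tooling. It assumes that the overall score is the geometric mean of one selected ratio per benchmark, and that with two runs the lower ratio (the slower run) is the one selected. The variable names are hypothetical.

```python
import math

# Base ratios for the two runs of each benchmark, copied from the results table.
ratios = {
    "805.lbm_l": (70.5, 101.0),
    "818.tealeaf_l": (21.2, 21.2),
    "819.clvleaf_l": (56.2, 59.5),
    "828.pot3d_l": (29.1, 32.3),
    "834.hpgmgfv_l": (22.2, 23.9),
    "835.weather_l": (87.2, 91.0),
}

# Assumption: with two runs, the lower ratio (slower run) is selected.
selected = [min(pair) for pair in ratios.values()]

# Geometric mean of the selected ratios, computed in log space.
overall = math.exp(sum(math.log(r) for r in selected) / len(selected))

print(f"SPEChpc 2021_lrg_base ~ {overall:.1f}")
```

Under these assumptions the result rounds to 41.0, which agrees with the reported SPEChpc 2021_lrg_base value.
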
Hardware Summary | |
---|---|
Type of System: | Homogeneous Cluster |
Compute Node: | IBM Power System AC922 |
Interconnect: | Mellanox InfiniBand |
Compute Nodes Used: | 1400 |
Total Chips: | 2800 |
Total Cores: | 30800 |
Total Threads: | 123200 |
Total Memory: | 700 TB |
Software Summary | |
---|---|
Compiler: | C/C++/Fortran: Version 21.7 of NVHPC Toolkit |
MPI Library: | Spectrum MPI Version 10.4.0.3 |
Other MPI Info: | None |
Other Software: | None |
Base Parallel Model: | ACC |
Base Ranks Run: | 8400 |
Base Threads Run: | 1 |
Peak Parallel Models: | Not Run |
Hardware | |
---|---|
Number of nodes: | 1400 |
Uses of the node: | compute |
Vendor: | IBM |
Model: | IBM Power System AC922 |
CPU Name: | IBM POWER9 2.1 (pvr 004e 1201) |
CPU(s) orderable: | 2 chips |
Chips enabled: | 2 |
Cores enabled: | 22 |
Cores per chip: | 44 |
Threads per core: | 4 |
CPU Characteristics: | Up to 3.8 GHz |
CPU MHz: | 2300 |
Primary Cache: | 32 KB I + 32 KB D on chip per core |
Secondary Cache: | 512 KB I+D on chip per core |
L3 Cache: | 110 MB I+D on chip per chip |
Other Cache: | None |
Memory: | 512 GB (16 x 32 GB RDIMM-DDR4-2666) |
Disk Subsystem: | 2 x 800 GB (Samsung Electronics Co Ltd NVMe SSD Controller 172Xa/172Xb) |
Other Hardware: | None |
Accel Count: | 6 |
Accel Model: | Tesla V100-SXM2-16GB |
Accel Vendor: | NVIDIA Corporation |
Accel Type: | GPU |
Accel Connection: | NVLink 2.0 |
Accel ECC enabled: | Yes |
Accel Description: | See Notes |
Adapter: | Mellanox ConnectX-5 |
Number of Adapters: | 2 |
Slot Type: | None |
Data Rate: | 100 Gb/s (4X EDR) |
Ports Used: | 2 |
Interconnect Type: | EDR InfiniBand |
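
The system-wide totals in the Hardware Summary follow from the per-node figures in this table. The short check below is a sketch that multiplies them out. All input numbers are copied from this report. The variable names are hypothetical, and the memory total assumes 1 TB is counted as 1024 GB.

```python
# Per-node figures and run configuration, copied from the report.
nodes = 1400
chips_per_node = 2            # "Chips enabled"
cores_enabled_per_node = 22   # "Cores enabled"
threads_per_core = 4
memory_per_node_gb = 16 * 32  # 16 x 32 GB RDIMMs = 512 GB
base_ranks = 8400

# Derived system-wide totals.
total_chips = nodes * chips_per_node
total_cores = nodes * cores_enabled_per_node
total_threads = total_cores * threads_per_core
total_memory_tb = nodes * memory_per_node_gb / 1024  # assumes 1 TB = 1024 GB
ranks_per_node = base_ranks // nodes

print(total_chips, total_cores, total_threads, total_memory_tb, ranks_per_node)
```

The derived values (2800 chips, 30800 cores, 123200 threads, 700 TB) match the Hardware Summary, and the 8400 base ranks work out to 6 MPI ranks per node.
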
Software | |
---|---|
Accelerator Driver: | NVIDIA CUDA 450.80.02 |
Adapter: | Mellanox ConnectX-5 |
Adapter Driver: | 4.9-2.2.4.1 |
Adapter Firmware: | 16.29.1016 |
Operating System: | Red Hat Enterprise Linux 8.2 |
Local File System: | xfs |
Shared File System: | 250 PB IBM Spectrum Scale parallel filesystem over 4X EDR InfiniBand |
System State: | Multi-user, run level 3 |
Other Software: | None |
Hardware | |
---|---|
Vendor: | Mellanox |
Model: | Mellanox Switch IB-2 |
Switch Model: | Mellanox IB EDR Switch IB-2 |
Number of Switches: | 1 |
Number of Ports: | 36 |
Data Rate: | 100 Gb/s |
Topology: | Non-blocking Fat-tree |
Primary Use: | MPI Traffic and GPFS access |
The config file option 'submit' was used.
MPI startup command: jsrun command was used to launch job using 1 GPU/rank.

Detailed information from nvaccelinfo | |
---|---|
CUDA Driver Version: | 11000 |
NVRM version: | NVIDIA UNIX ppc64le Kernel Module 450.80.02 Wed Sep 23 00:55:04 UTC 2020 |
Device Number: | 0 |
Device Name: | Tesla V100-SXM2-16GB |
Device Revision Number: | 7.0 |
Global Memory Size: | 16911433728 |
Number of Multiprocessors: | 80 |
Concurrent Copy and Execution: | Yes |
Total Constant Memory: | 65536 |
Total Shared Memory per Block: | 49152 |
Registers per Block: | 65536 |
Warp Size: | 32 |
Maximum Threads per Block: | 1024 |
Maximum Block Dimensions: | 1024, 1024, 64 |
Maximum Grid Dimensions: | 2147483647 x 65535 x 65535 |
Maximum Memory Pitch: | 2147483647B |
Texture Alignment: | 512B |
Clock Rate: | 1530 MHz |
Execution Timeout: | No |
Integrated Device: | No |
Can Map Host Memory: | Yes |
Compute Mode: | exclusive-process |
Concurrent Kernels: | Yes |
ECC Enabled: | Yes |
Memory Clock Rate: | 877 MHz |
Memory Bus Width: | 4096 bits |
L2 Cache Size: | 6291456 bytes |
Max Threads Per SMP: | 2048 |
Async Engines: | 4 |
Unified Addressing: | Yes |
Managed Memory: | Yes |
Concurrent Managed Memory: | Yes |
Preemption Supported: | Yes |
Cooperative Launch: | Yes |
Multi-Device: | Yes |
Default Target: | cc70 |
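
The raw nvaccelinfo values are easier to read after unit conversion. The sketch below converts a few of them. It is an illustration only: the variable names are hypothetical, and the bandwidth line assumes the memory transfers data on both clock edges (double data rate), which is the usual convention for estimating theoretical bandwidth from a reported memory clock.

```python
# Values copied from the nvaccelinfo output above.
global_memory_bytes = 16911433728
l2_cache_bytes = 6291456
multiprocessors = 80
max_threads_per_sm = 2048
memory_clock_hz = 877e6
memory_bus_width_bits = 4096

# Binary-unit conversions.
global_memory_gib = global_memory_bytes / 2**30
l2_cache_mib = l2_cache_bytes / 2**20

# Maximum number of threads resident on the device at once.
max_resident_threads = multiprocessors * max_threads_per_sm

# Theoretical peak memory bandwidth in GB/s.
# Assumption: double data rate, so two transfers per memory clock cycle.
peak_bandwidth_gbs = memory_clock_hz * 2 * (memory_bus_width_bits / 8) / 1e9

print(f"Global memory: {global_memory_gib:.2f} GiB")
print(f"L2 cache: {l2_cache_mib:.1f} MiB")
print(f"Max resident threads: {max_resident_threads}")
print(f"Theoretical peak memory bandwidth: {peak_bandwidth_gbs:.0f} GB/s")
```

Under the double-data-rate assumption, the estimate comes to roughly 898 GB/s per GPU.
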
CC 805.lbm_l(base) 818.tealeaf_l(base) 834.hpgmgfv_l(base)

/usr/lib64/crt1.o:(.rodata+0x8): undefined reference to `main'
/usr/bin/ld: link errors found, deleting executable `a.out'
pgacclnk: child process exit status 1: /sw/summit/xalt/1.2.1/bin/ld
nvc 21.7-0 linuxpower target on Linuxpower
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

FC 819.clvleaf_l(base) 828.pot3d_l(base) 835.weather_l(base)

nvfortran 21.7-0 linuxpower target on Linuxpower
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.