Lenovo Global Technology ThinkSystem SD650-N V2 (Intel Xeon Platinum 8368Q, Tesla A100-SXM-40GB) |
SPEChpc 2021_tny_base = 48.5 |
SPEChpc 2021_tny_peak = Not Run |
hpc2021 License: | 28 | Test Date: | Aug-2021 |
---|---|---|---|
Test Sponsor: | Lenovo Global Technology | Hardware Availability: | Aug-2021 |
Tested by: | Lenovo Global Technology | Software Availability: | Aug-2021 |
Benchmark result graphs are available in the PDF report.
Results appear in the order in which they were run. Bold underlined text indicates a median measurement. All measurements below are base runs; peak was not run.

Benchmark | Model | Ranks | Thrds/Rnk | Seconds (run 1) | Ratio (run 1) | Seconds (run 2) | Ratio (run 2) | Seconds (run 3) | Ratio (run 3) |
---|---|---|---|---|---|---|---|---|---|
505.lbm_t | ACC | 8 | 1 | 13.7 | 164 | 13.7 | 165 | 13.6 | 165 |
513.soma_t | ACC | 8 | 1 | 36.1 | 103 | 36.1 | 102 | 36.2 | 102 |
518.tealeaf_t | ACC | 8 | 1 | 99.2 | 16.6 | 99.1 | 16.6 | 99.2 | 16.6 |
519.clvleaf_t | ACC | 8 | 1 | 20.2 | 81.9 | 20.0 | 82.5 | 20.2 | 81.9 |
521.miniswp_t | ACC | 8 | 1 | 81.4 | 19.7 | 82.7 | 19.4 | 82.1 | 19.5 |
528.pot3d_t | ACC | 8 | 1 | 44.2 | 48.1 | 44.5 | 47.7 | 44.7 | 47.6 |
532.sph_exa_t | ACC | 8 | 1 | 73.1 | 26.7 | 73.0 | 26.7 | 73.3 | 26.6 |
534.hpgmgfv_t | ACC | 8 | 1 | 69.7 | 16.9 | 69.4 | 16.9 | 68.3 | 17.2 |
535.weather_t | ACC | 8 | 1 | 20.9 | 155 | 21.0 | 153 | 21.0 | 153 |
SPEChpc 2021_tny_base | 48.5 |
SPEChpc 2021_tny_peak | Not Run |
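As a rough cross-check of the overall metric, the sketch below assumes that SPEChpc 2021_tny_base is the geometric mean of the per-benchmark median ratios. That is the usual SPEC convention, but it is not stated in this report. The ratios are copied from the results table; the names `RATIOS` and `overall_score` are illustrative only. Because the table shows rounded ratios, the estimate should land near the reported 48.5 rather than exactly on it.

```python
import math
import statistics

# Base ratios for the three runs of each benchmark, copied from the results table.
RATIOS = {
    "505.lbm_t": (164, 165, 165),
    "513.soma_t": (103, 102, 102),
    "518.tealeaf_t": (16.6, 16.6, 16.6),
    "519.clvleaf_t": (81.9, 82.5, 81.9),
    "521.miniswp_t": (19.7, 19.4, 19.5),
    "528.pot3d_t": (48.1, 47.7, 47.6),
    "532.sph_exa_t": (26.7, 26.7, 26.6),
    "534.hpgmgfv_t": (16.9, 16.9, 17.2),
    "535.weather_t": (155, 153, 153),
}


def overall_score(ratios):
    """Geometric mean of each benchmark's median ratio (assumed scoring rule)."""
    medians = [statistics.median(runs) for runs in ratios.values()]
    log_sum = sum(math.log(m) for m in medians)
    return math.exp(log_sum / len(medians))


if __name__ == "__main__":
    print(f"estimated SPEChpc 2021_tny_base: {overall_score(RATIOS):.1f}")
```

A geometric mean keeps a single very fast benchmark (such as 505.lbm_t) from dominating the score the way it would in an arithmetic mean.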
Hardware Summary | |
---|---|
Type of System: | Homogeneous Cluster |
Compute Node: | ThinkSystem SD650-N V2 |
Interconnect: | Nvidia Mellanox ConnectX-6 HDR |
File Server Node: | ThinkSystem SD650-N V2 |
Compute Nodes Used: | 2 |
Total Chips: | 4 |
Total Cores: | 152 |
Total Threads: | 152 |
Total Memory: | 1 TB |
Software Summary | |
---|---|
Compiler: | Nvidia HPC SDK 21.5 |
MPI Library: | Open MPI 4.0.5 |
Base Parallel Model: | ACC |
Base Ranks Run: | 8 |
Base Threads Run: | 1 |
Peak Parallel Models: | Not Run |
Hardware | |
---|---|
Number of nodes: | 2 |
Uses of the node: | compute |
Vendor: | Lenovo Global Technology |
Model: | ThinkSystem SD650-N V2 |
CPU Name: | Intel Xeon Platinum 8368Q |
CPU(s) orderable: | 2 chips |
Chips enabled: | 2 |
Cores enabled: | 76 |
Cores per chip: | 38 |
Threads per core: | 1 |
CPU Characteristics: | Turbo up to 3.7 GHz |
CPU MHz: | 2600 |
Primary Cache: | 32 KB I + 48 KB D on chip per core |
Secondary Cache: | 1280 KB I+D on chip per core |
L3 Cache: | 57 MB I+D on chip per chip |
Other Cache: | None |
Memory: | 512 GB (16 x 32 GB 2Rx8 PC4-3200A-R) |
Disk Subsystem: | 1 x 480 GB 2.5" SSD |
Other Hardware: | None |
Accel Count: | 4 |
Accel Model: | Tesla A100 SXM4 40GB |
Accel Vendor: | Nvidia Corporation |
Accel Type: | GPU |
Accel Connection: | NVLink |
Accel ECC enabled: | Yes |
Accel Description: | Nvidia Tesla A100 SXM4 40GB |
Adapter: | Mellanox ConnectX-6 HDR |
Number of Adapters: | 1 |
Slot Type: | PCI-Express 4.0 x16 |
Data Rate: | 200 Gb/s |
Ports Used: | 1 |
Interconnect Type: | Nvidia Mellanox ConnectX-6 HDR |
Software | |
---|---|
Accelerator Driver: | 460.32.03 |
Adapter: | Mellanox ConnectX-6 HDR |
Adapter Driver: | 5.1-2.3.7 |
Adapter Firmware: | 20.28.1002 |
Operating System: | Red Hat Enterprise Linux Server release 8.3, Kernel 4.18.0-193.el8.x86_64 |
Local File System: | xfs |
Shared File System: | NFS |
System State: | Multi-user, run level 3 |
Other Software: | None |
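The totals in the Hardware Summary follow from the per-node figures in this compute node description. The sketch below reproduces that arithmetic. The variable names are illustrative, and it assumes that 1 TB is counted as 1024 GB.

```python
# Per-node figures copied from the compute node description.
NODES = 2
CHIPS_PER_NODE = 2
CORES_PER_CHIP = 38
THREADS_PER_CORE = 1
MEMORY_GB_PER_NODE = 512
ACCELERATORS_PER_NODE = 4


def cluster_totals():
    """Multiply the per-node figures out to cluster-wide totals."""
    chips = NODES * CHIPS_PER_NODE
    cores = chips * CORES_PER_CHIP
    return {
        "chips": chips,
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "memory_gb": NODES * MEMORY_GB_PER_NODE,
        "accelerators": NODES * ACCELERATORS_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

The accelerator total of 8 matches the 8 base ranks, which is consistent with one MPI rank per GPU, although the report does not say so explicitly.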
Hardware | |
---|---|
Number of nodes: | 1 |
Uses of the node: | Fileserver |
Vendor: | Lenovo Global Technology |
Model: | ThinkSystem SD650-N V2 |
CPU Name: | Intel Xeon Platinum 8368Q |
CPU(s) orderable: | 2 chips |
Chips enabled: | 2 |
Cores enabled: | 76 |
Cores per chip: | 38 |
Threads per core: | 1 |
CPU Characteristics: | Turbo up to 3.7 GHz |
CPU MHz: | 2600 |
Primary Cache: | 32 KB I + 48 KB D on chip per core |
Secondary Cache: | 1280 KB I+D on chip per core |
L3 Cache: | 57 MB I+D on chip per chip |
Other Cache: | None |
Memory: | 512 GB (16 x 32 GB 2Rx8 PC4-3200A-R) |
Disk Subsystem: | 1 x 960 GB NVME 2.5" SSD |
Other Hardware: | None |
Accel Count: | 4 |
Accel Model: | Tesla A100 SXM4 40GB |
Accel Vendor: | Nvidia |
Accel Type: | GPU |
Accel Connection: | Nvidia Tesla A100 SXM4 40GB |
Accel ECC enabled: | Yes |
Accel Description: | Nvidia Tesla A100 SXM4 40GB |
Adapter: | Mellanox ConnectX-6 HDR |
Number of Adapters: | 1 |
Slot Type: | PCI-Express 4.0 x16 |
Data Rate: | 200 Gb/s |
Ports Used: | 1 |
Interconnect Type: | Nvidia Mellanox ConnectX-6 HDR |
Software | |
---|---|
Accelerator Driver: | N/A |
Adapter: | Mellanox ConnectX-6 HDR |
Adapter Driver: | 5.1-2.3.7 |
Adapter Firmware: | 20.28.1002 |
Operating System: | Red Hat Enterprise Linux Server release 8.3 |
Local File System: | xfs |
Shared File System: | N/A |
System State: | Multi-User, run level 3 |
Other Software: | None |
Hardware | |
---|---|
Vendor: | Nvidia |
Model: | Nvidia Mellanox ConnectX-6 HDR |
Switch Model: | N/A |
Number of Switches: | 0 |
Number of Ports: | 0 |
Data Rate: | N/A |
Firmware: | N/A |
Topology: | Direct Connect |
Primary Use: | MPI Traffic, NFS Access |
Software |
---|
Individual ranks were bound to the CPU cores on the same NUMA node as the GPU using 'numactl' within the following "bind.pl" Perl script:

    ---- Start bind.pl ------
    my %bind;
    $bind{0} = "1-3";
    $bind{1} = "4-7";
    $bind{2} = "8-10";
    $bind{3} = "11-14";
    $bind{4} = "41-43";
    $bind{5} = "44-47";
    $bind{6} = "61-63";
    $bind{7} = "64-67";
    my $rank = $ENV{OMPI_COMM_WORLD_LOCAL_RANK};
    my $cmd = "taskset -c $bind{$rank} ";
    while (my $arg = shift) {
        $cmd .= "$arg ";
    }
    my $rc = system($cmd);
    exit($rc);
    ---- End bind.pl ------

The config file option 'submit' was used.

    submit = mpirun ${MPIRUN_OPTS} --allow-run-as-root --oversubscribe -host 192.168.99.171:4,192.168.99.172:4 -x UCX_MEMTYPE_CACHE=n -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx5_0:1 -mca pml ucx -x UCX_TLS=sm,dc,rc,knem,cuda_copy,cuda_ipc -npernode 4 --map-by core -np $ranks
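To make the binding logic easier to follow, here is a minimal Python sketch of what bind.pl appears to do: it looks up a core list for the node-local MPI rank and prefixes the benchmark command with `taskset -c`. The core lists and the `OMPI_COMM_WORLD_LOCAL_RANK` variable come from the script above. The names `CORE_BINDINGS`, `build_command`, and `run_bound` are hypothetical, and this is an illustration, not the script used for the run.

```python
import os
import subprocess

# Core list per node-local rank, copied from bind.pl.
CORE_BINDINGS = {
    0: "1-3",
    1: "4-7",
    2: "8-10",
    3: "11-14",
    4: "41-43",
    5: "44-47",
    6: "61-63",
    7: "64-67",
}


def build_command(local_rank, argv):
    """Prefix the benchmark command with a taskset core binding."""
    return ["taskset", "-c", CORE_BINDINGS[local_rank], *argv]


def run_bound(argv):
    """Run argv pinned to the cores assigned to this process's local rank."""
    # Open MPI exports the node-local rank in this environment variable.
    local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
    # Passing a list avoids the shell quoting issues of a concatenated string.
    return subprocess.run(build_command(local_rank, argv)).returncode
```

Assuming the submit line places four ranks on each node (`-npernode 4`), only local ranks 0 to 3 would occur, so the entries for ranks 4 to 7 would go unused in this configuration.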
Environment variables set by runhpc before the start of the run:

    UCX_MEMTYPE_CACHE = "n"
    UCX_TLS = "self,shm,cuda_copy"
    ==============================================================================
    CC  505.lbm_t(base) 513.soma_t(base) 518.tealeaf_t(base) 521.miniswp_t(base)
        534.hpgmgfv_t(base)
    ------------------------------------------------------------------------------
    nvc 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
    NVIDIA Compilers and Tools
    Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
    ------------------------------------------------------------------------------

    ==============================================================================
    CXXC 532.sph_exa_t(base)
    ------------------------------------------------------------------------------
    nvc++ 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
    NVIDIA Compilers and Tools
    Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
    ------------------------------------------------------------------------------

    ==============================================================================
    FC  519.clvleaf_t(base) 528.pot3d_t(base) 535.weather_t(base)
    ------------------------------------------------------------------------------
    nvfortran 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
    NVIDIA Compilers and Tools
    Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
    ------------------------------------------------------------------------------
521.miniswp_t: | -DUSE_KBA -DUSE_ACCELDIR |
532.sph_exa_t: | -DSPEC_USE_LT_IN_KERNELS --c++17 |
C benchmarks: | -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu -Minfo=accel -DSPEC_ACCEL_AWARE_MPI |
C++ benchmarks: | -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu -Minfo=accel -DSPEC_ACCEL_AWARE_MPI |
Fortran benchmarks: | -DSPEC_ACCEL_AWARE_MPI -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu -Minfo=accel |