SPEChpc™ 2021 Tiny Result

Copyright 2021-2023 Standard Performance Evaluation Corporation

Lenovo Global Technology

ThinkSystem SD650-N V2 (Intel Xeon Platinum 8368Q, Tesla A100-SXM-40GB)

SPEChpc 2021_tny_base = 48.50

SPEChpc 2021_tny_peak = Not Run

hpc2021 License: 28
Test Sponsor: Lenovo Global Technology
Tested by: Lenovo Global Technology
Test Date: Aug-2021
Hardware Availability: Aug-2021
Software Availability: Aug-2021

Benchmark result graphs are available in the PDF report.

Results Table

Base results (Peak: Not Run). Each Seconds/Ratio pair is one run.
Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

Benchmark      Model  Ranks  Thrds/Rnk  Seconds  Ratio  Seconds  Ratio  Seconds  Ratio
505.lbm_t      ACC    8      1          13.7     164    13.7     165    13.6     165
513.soma_t     ACC    8      1          36.1     103    36.1     102    36.2     102
518.tealeaf_t  ACC    8      1          99.2     16.6   99.1     16.6   99.2     16.6
519.clvleaf_t  ACC    8      1          20.2     81.9   20.0     82.5   20.2     81.9
521.miniswp_t  ACC    8      1          81.4     19.7   82.7     19.4   82.1     19.5
528.pot3d_t    ACC    8      1          44.2     48.1   44.5     47.7   44.7     47.6
532.sph_exa_t  ACC    8      1          73.1     26.7   73.0     26.7   73.3     26.6
534.hpgmgfv_t  ACC    8      1          69.7     16.9   69.4     16.9   68.3     17.2
535.weather_t  ACC    8      1          20.9     155    21.0     153    21.0     153

SPEChpc 2021_tny_base 48.50
SPEChpc 2021_tny_peak Not Run
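
As a cross-check on the table, the overall base metric can be approximated as the geometric mean of the per-benchmark median ratios. The sketch below is illustrative only: the variable and function names are made up for this example, and the ratios for 505.lbm_t, 513.soma_t and 535.weather_t are taken at three significant digits (164/165, 102/103, 153/155), which is an assumption about how the table should be read. Because the published ratios are rounded, the result should land close to the reported 48.50 rather than exactly on it.

```python
import math
import statistics

# Base ratios per benchmark, one value per run, in run order.
# Assumption: ratios of 100 or more are read at three significant digits.
base_ratios = {
    "505.lbm_t": [164, 165, 165],
    "513.soma_t": [103, 102, 102],
    "518.tealeaf_t": [16.6, 16.6, 16.6],
    "519.clvleaf_t": [81.9, 82.5, 81.9],
    "521.miniswp_t": [19.7, 19.4, 19.5],
    "528.pot3d_t": [48.1, 47.7, 47.6],
    "532.sph_exa_t": [26.7, 26.7, 26.6],
    "534.hpgmgfv_t": [16.9, 16.9, 17.2],
    "535.weather_t": [155, 153, 153],
}


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The median of the three runs stands in for each benchmark.
medians = {name: statistics.median(runs) for name, runs in base_ratios.items()}
score = geometric_mean(medians.values())
print(f"geometric mean of median ratios: {score:.2f}")
```

The few hundredths of difference from the published figure is consistent with the report computing the metric from unrounded ratios.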
Hardware Summary
Type of System: Homogeneous Cluster
Compute Node: ThinkSystem SD650-N V2
Interconnect: Nvidia Mellanox ConnectX-6 HDR
File Server Node: ThinkSystem SD650-N V2
Compute Nodes Used: 2
Total Chips: 4
Total Cores: 152
Total Threads: 152
Total Memory: 1 TB
Software Summary
Compiler: Nvidia HPC SDK 21.5
MPI Library: Open MPI 4.0.5
Base Parallel Model: ACC
Base Ranks Run: 8
Base Threads Run: 1
Peak Parallel Models: Not Run

Node Description: ThinkSystem SD650-N V2

Hardware
Number of nodes: 2
Uses of the node: compute
Vendor: Lenovo Global Technology
Model: ThinkSystem SD650-N V2
CPU Name: Intel Xeon Platinum 8368Q
CPU(s) orderable: 2 chips
Chips enabled: 2
Cores enabled: 76
Cores per chip: 38
Threads per core: 1
CPU Characteristics: Turbo up to 3.7 GHz
CPU MHz: 2600
Primary Cache: 32 KB I + 48 KB D on chip per core
Secondary Cache: 1280 KB I+D on chip per core
L3 Cache: 57 MB I+D on chip per chip
Other Cache: None
Memory: 512 GB (16 x 32 GB 2Rx8 PC4-3200A-R)
Disk Subsystem: 1 x 480 GB 2.5" SSD
Other Hardware: None
Accel Count: 4
Accel Model: Tesla A100 SXM4 40GB
Accel Vendor: Nvidia Corporation
Accel Type: GPU
Accel Connection: NVLink
Accel ECC enabled: Yes
Accel Description: Nvidia Tesla A100 SXM4 40GB
Adapter: Mellanox ConnectX-6 HDR
Number of Adapters: 1
Slot Type: PCI-Express 4.0 x16
Data Rate: 200 Gb/s
Ports Used: 1
Interconnect Type: Nvidia Mellanox ConnectX-6 HDR
Software
Accelerator Driver: 460.32.03
Adapter: Mellanox ConnectX-6 HDR
Adapter Driver: 5.1-2.3.7
Adapter Firmware: 20.28.1002
Operating System: Red Hat Enterprise Linux Server release 8.3, Kernel 4.18.0-193.el8.x86_64
Local File System: xfs
Shared File System: NFS
System State: Multi-user, run level 3
Other Software: None
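
The per-node figures above can be checked against the totals in the Hardware Summary with a few lines of arithmetic. This is only a consistency sketch, assuming the summary counts the two compute nodes and not the file server node. The variable names are illustrative.

```python
# Per-node figures from the compute node description.
compute_nodes = 2
chips_per_node = 2
cores_per_chip = 38
threads_per_core = 1
memory_gb_per_node = 512
accels_per_node = 4

# Cluster-wide totals derived from the per-node figures.
total_chips = compute_nodes * chips_per_node
total_cores = total_chips * cores_per_chip
total_threads = total_cores * threads_per_core
total_memory_gb = compute_nodes * memory_gb_per_node
total_accels = compute_nodes * accels_per_node

print(total_chips, total_cores, total_threads, total_memory_gb, total_accels)
```

Under that assumption the totals come out as 4 chips, 152 cores, 152 threads and 1024 GB (1 TB), matching the summary. The 8 accelerators equal the 8 base ranks, which suggests one rank per GPU.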

Node Description: ThinkSystem SD650-N V2

Hardware
Number of nodes: 1
Uses of the node: Fileserver
Vendor: Lenovo Global Technology
Model: ThinkSystem SD650-N V2
CPU Name: Intel Xeon Platinum 8368Q
CPU(s) orderable: 2 chips
Chips enabled: 2
Cores enabled: 76
Cores per chip: 38
Threads per core: 1
CPU Characteristics: Turbo up to 3.7 GHz
CPU MHz: 2600
Primary Cache: 32 KB I + 48 KB D on chip per core
Secondary Cache: 1280 KB I+D on chip per core
L3 Cache: 57 MB I+D on chip per chip
Other Cache: None
Memory: 512 GB (16 x 32 GB 2Rx8 PC4-3200A-R)
Disk Subsystem: 1 x 960 GB NVME 2.5" SSD
Other Hardware: None
Accel Count: 4
Accel Model: Tesla A100 SXM4 40GB
Accel Vendor: Nvidia
Accel Type: GPU
Accel Connection: Nvidia Tesla A100 SXM4 40GB
Accel ECC enabled: Yes
Accel Description: Nvidia Tesla A100 SXM4 40GB
Adapter: Mellanox ConnectX-6 HDR
Number of Adapters: 1
Slot Type: PCI-Express 4.0 x16
Data Rate: 200 Gb/s
Ports Used: 1
Interconnect Type: Nvidia Mellanox ConnectX-6 HDR
Software
Accelerator Driver: N/A
Adapter: Mellanox ConnectX-6 HDR
Adapter Driver: 5.1-2.3.7
Adapter Firmware: 20.28.1002
Operating System: Red Hat Enterprise Linux Server release 8.3
Local File System: xfs
Shared File System: N/A
System State: Multi-user, run level 3
Other Software: None

Interconnect Description: Nvidia Mellanox ConnectX-6 HDR

Hardware
Vendor: Nvidia
Model: Nvidia Mellanox ConnectX-6 HDR
Switch Model: N/A
Number of Switches: 0
Number of Ports: 0
Data Rate: N/A
Firmware: N/A
Topology: Direct Connect
Primary Use: MPI Traffic, NFS Access
Software

Submit Notes

Individual ranks were bound to CPU cores on the same NUMA node as
the GPU using 'taskset' within the following "bind.pl" Perl script:
---- Start bind.pl ------
my %bind;
$bind{0} = "1-3";
$bind{1} = "4-7";
$bind{2} = "8-10";
$bind{3} = "11-14";
$bind{4} = "41-43";
$bind{5} = "44-47";
$bind{6} = "61-63";
$bind{7} = "64-67";
my $rank = $ENV{OMPI_COMM_WORLD_LOCAL_RANK};
my $cmd = "taskset -c $bind{$rank} ";
while (my $arg = shift) {
	$cmd .= "$arg ";
}
my $rc = system($cmd);
exit($rc);
---- End bind.pl ------
 The config file option 'submit' was used.
 submit = mpirun ${MPIRUN_OPTS} --allow-run-as-root --oversubscribe
 -host 192.168.99.171:4,192.168.99.172:4 -x UCX_MEMTYPE_CACHE=n
 -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx5_0:1 -mca pml ucx
 -x UCX_TLS=sm,dc,rc,knem,cuda_copy,cuda_ipc -npernode 4 --map-by core -np $ranks
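
For readers less familiar with Perl, the wrapper logic in bind.pl can be sketched in Python. This is an illustrative re-expression, not the script used for the run. The names `BIND`, `build_command` and `run_bound` are hypothetical, and the core ranges are copied from the listing above.

```python
import os
import subprocess

# Core ranges per local rank, copied from the %bind table in bind.pl.
BIND = {
    0: "1-3", 1: "4-7", 2: "8-10", 3: "11-14",
    4: "41-43", 5: "44-47", 6: "61-63", 7: "64-67",
}


def build_command(local_rank, args):
    """Prefix the benchmark command with taskset for the given local rank."""
    return ["taskset", "-c", BIND[local_rank], *args]


def run_bound(args):
    """Launch the command pinned to this rank's cores; return its exit code."""
    # Open MPI exports the rank's index within its node in this variable.
    local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
    # Passing a list avoids re-parsing the arguments through a shell.
    return subprocess.run(build_command(local_rank, args)).returncode


# Example: the command line local rank 0 would execute for a placeholder binary.
print(build_command(0, ["./benchmark"]))
```

A real wrapper would finish with something like `sys.exit(run_bound(sys.argv[1:]))`. With `-npernode 4` in the submit line, local ranks on each node run from 0 to 3, so only the first four table entries would be looked up.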

General Notes

Environment variables set by runhpc before the start of the run:
UCX_MEMTYPE_CACHE = "n"
UCX_TLS = "self,shm,cuda_copy"

Compiler Version Notes

==============================================================================
 CC  505.lbm_t(base) 513.soma_t(base) 518.tealeaf_t(base) 521.miniswp_t(base)
      534.hpgmgfv_t(base)
------------------------------------------------------------------------------
nvc 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.
------------------------------------------------------------------------------

==============================================================================
 CXXC 532.sph_exa_t(base)
------------------------------------------------------------------------------
nvc++ 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.
------------------------------------------------------------------------------

==============================================================================
 FC  519.clvleaf_t(base) 528.pot3d_t(base) 535.weather_t(base)
------------------------------------------------------------------------------
nvfortran 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.
------------------------------------------------------------------------------

Base Compiler Invocation

C benchmarks:

 mpicc 

C++ benchmarks:

 mpicxx 

Fortran benchmarks:

 mpif90 

Base Portability Flags

521.miniswp_t:  -DUSE_KBA   -DUSE_ACCELDIR 
532.sph_exa_t:  -DSPEC_USE_LT_IN_KERNELS   --c++17 

Base Optimization Flags

C benchmarks:

 -Mfprelaxed   -Mnouniform   -Mstack_arrays   -fast   -acc=gpu   -Minfo=accel   -DSPEC_ACCEL_AWARE_MPI 

C++ benchmarks:

 -Mfprelaxed   -Mnouniform   -Mstack_arrays   -fast   -acc=gpu   -Minfo=accel   -DSPEC_ACCEL_AWARE_MPI 

Fortran benchmarks:

 -DSPEC_ACCEL_AWARE_MPI   -Mfprelaxed   -Mnouniform   -Mstack_arrays   -fast   -acc=gpu   -Minfo=accel 

Base Other Flags

C benchmarks:

 -w 

C++ benchmarks:

 -w 

Fortran benchmarks:

 -w 

The flags file that was used to format this result can be browsed at
http://www.spec.org/hpc2021/flags/nv2021_flags.html.

You can also download the XML flags source by saving the following link:
http://www.spec.org/hpc2021/flags/nv2021_flags.xml.