SPEChpc™ 2021 Tiny Result

Lenovo Global Technology

ThinkSystem SR655 V3 (AMD EPYC 9654P, Nvidia H100-PCIE-80G)

SPEChpc 2021_tny_base = 17.70

SPEChpc 2021_tny_peak = 17.90

hpc2021 License:	28	Test Date:	Jan-2023
Test Sponsor:	Lenovo Global Technology	Hardware Availability:	Feb-2023
Tested by:	Lenovo Global Technology	Software Availability:	Feb-2023

Benchmark result graphs are available in the PDF report.

Results Table

Results appear in the order in which they were run. Bold underlined text indicates a median measurement.
SPEChpc 2021_tny_base					17.70
SPEChpc 2021_tny_peak					17.90
Benchmark	Base									Peak
Benchmark	Model	Ranks	Thrds/Rnk	Seconds	Ratio	Seconds	Ratio	Seconds	Ratio	Model	Ranks	Thrds/Rnk	Seconds	Ratio	Seconds	Ratio	Seconds	Ratio
505.lbm_t	ACC	1	1	66.2	34.00	67.3	33.40	67.3	33.40	ACC	1	1	60.6	37.10	61.0	36.90	60.5	37.20
513.soma_t	ACC	1	1	1020	36.20	1030	36.00	1030	36.10	ACC	1	1	1020	36.20	1030	36.00	1030	36.10
518.tealeaf_t	ACC	1	1	1150	14.30	1150	14.30	1150	14.30	ACC	1	1	1140	14.40	1140	14.40	1140	14.40
519.clvleaf_t	ACC	1	1	97.8	16.90	97.6	16.90	97.7	16.90	ACC	1	1	97.8	16.90	97.6	16.90	97.7	16.90
521.miniswp_t	ACC	1	1	1140	14.10	1140	14.00	1140	14.00	ACC	1	1	1140	14.10	1140	14.00	1140	14.00
528.pot3d_t	ACC	1	1	1370	15.50	1370	15.50	1370	15.50	ACC	1	1	1370	15.50	1370	15.50	1370	15.50
532.sph_exa_t	ACC	1	1	3340	5.84	3190	6.11	3250	6.00	ACC	1	1	3230	6.04	3180	6.13	3240	6.01
534.hpgmgfv_t	ACC	1	1	1080	10.90	1080	10.90	1080	10.90	ACC	1	1	1080	10.90	1080	10.90	1080	10.90
535.weather_t	ACC	1	1	79.8	40.40	79.8	40.40	79.7	40.50	ACC	1	1	79.2	40.70	79.0	40.80	79.2	40.70
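
As a cross-check on the table above, the following sketch recomputes the overall scores from the per-run ratios. It assumes the overall metric is the geometric mean of each benchmark's median ratio. The table shows rounded ratios, so the result can only be expected to agree with the reported 17.70 and 17.90 to about three significant figures. The names `base_ratios`, `peak_ratios`, and `overall_score` are illustrative and not part of the SPEC tools.

```python
from statistics import geometric_mean, median

# Per-run ratios copied from the Results Table (three runs per benchmark).
base_ratios = {
    "505.lbm_t": [34.00, 33.40, 33.40],
    "513.soma_t": [36.20, 36.00, 36.10],
    "518.tealeaf_t": [14.30, 14.30, 14.30],
    "519.clvleaf_t": [16.90, 16.90, 16.90],
    "521.miniswp_t": [14.10, 14.00, 14.00],
    "528.pot3d_t": [15.50, 15.50, 15.50],
    "532.sph_exa_t": [5.84, 6.11, 6.00],
    "534.hpgmgfv_t": [10.90, 10.90, 10.90],
    "535.weather_t": [40.40, 40.40, 40.50],
}
peak_ratios = {
    "505.lbm_t": [37.10, 36.90, 37.20],
    "513.soma_t": [36.20, 36.00, 36.10],
    "518.tealeaf_t": [14.40, 14.40, 14.40],
    "519.clvleaf_t": [16.90, 16.90, 16.90],
    "521.miniswp_t": [14.10, 14.00, 14.00],
    "528.pot3d_t": [15.50, 15.50, 15.50],
    "532.sph_exa_t": [6.04, 6.13, 6.01],
    "534.hpgmgfv_t": [10.90, 10.90, 10.90],
    "535.weather_t": [40.70, 40.80, 40.70],
}


def overall_score(ratios_by_benchmark):
    """Geometric mean of the median ratio of each benchmark (assumed metric)."""
    medians = [median(runs) for runs in ratios_by_benchmark.values()]
    return geometric_mean(medians)


base_score = overall_score(base_ratios)
peak_score = overall_score(peak_ratios)
print(f"base ~ {base_score:.2f} (reported 17.70)")
print(f"peak ~ {peak_score:.2f} (reported 17.90)")
```

Benchmarks marked basepeak = yes in the peak flags (513.soma_t, 519.clvleaf_t, 521.miniswp_t, 528.pot3d_t) carry identical base and peak ratios, so the small gap between the two scores comes from the remaining five benchmarks.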

Hardware Summary
Type of System:	Homogeneous Cluster
Compute Node:	ThinkSystem SR655 V3
Compute Nodes Used:	1
Total Chips:	1
Total Cores:	96
Total Threads:	96
Total Memory:	384 GB
Max. Peak Threads:	1

Software Summary
Compiler:	Nvidia HPC SDK 22.11
MPI Library:	Open MPI 4.0.5
Other MPI Info:	None
Base Parallel Model:	ACC
Base Ranks Run:	1
Base Threads Run:	1
Peak Parallel Models:	ACC
Minimum Peak Ranks:	1
Maximum Peak Ranks:	1
Max. Peak Threads:	1
Min. Peak Threads:	1

Node Description: ThinkSystem SR655 V3

Hardware
Number of nodes:	1
Uses of the node:	compute
Vendor:	Lenovo Global Technology
Model:	ThinkSystem SR655 V3
CPU Name:	AMD EPYC 9654P
CPU(s) orderable:	1 chips
Chips enabled:	1
Cores enabled:	96
Cores per chip:	96
Threads per core:	1
CPU Characteristics:	Max. Boost Clock up to 3.7 GHz
CPU MHz:	2400
Primary Cache:	32 KB I + 32 KB D on chip per core
Secondary Cache:	1 MB I+D on chip per core
L3 Cache:	384 MB I+D on chip per chip
Other Cache:	None
Memory:	384 GB (24 x 16 GB 2Rx4 PC5-4800B-R)
Disk Subsystem:	1x ThinkSystem 2.5" 5300 480GB SSD
Other Hardware:	None
Accel Count:	8
Accel Model:	Tesla H100 PCIe 80GB
Accel Vendor:	Nvidia Corporation
Accel Type:	GPU
Accel Connection:	PCIe Gen5 x16
Accel ECC enabled:	Yes
Accel Description:	Nvidia Tesla H100 PCIe 80GB
Adapter:	Mellanox ConnectX-6 HDR
Number of Adapters:	1
Slot Type:	PCI-Express 5.0 x16
Data Rate:	200 Gb/s
Ports Used:	1
Interconnect Type:	Nvidia Mellanox ConnectX-6 HDR

Software
Accelerator Driver:	525.60.13
Adapter:	Mellanox ConnectX-6 HDR
Adapter Driver:	5.2-1.0.4
Adapter Firmware:	20.28.1002
Operating System:	Red Hat Enterprise Linux Server release 9, Kernel 5.14.0-70.22.1.el9_0.x86_64
Local File System:	xfs
Shared File System:	XFS
System State:	Multi-user, run level 3
Other Software:	None

Submit Notes

Individual ranks were bound to the CPU cores on the same NUMA node as
the GPU using 'taskset' within the following "bind2.pl" Perl script:
---- Start bind2.pl ------
my %bind;
$bind{0} = "1-3";
$bind{1} = "144-146";
$bind{2} = "8-10";
$bind{3} = "11-14";
$bind{4} = "41-43";
$bind{5} = "44-47";
$bind{6} = "61-63";
$bind{7} = "64-67";
my $rank = $ENV{OMPI_COMM_WORLD_LOCAL_RANK};
my $cmd = "taskset -c $bind{$rank} ";
while (my $arg = shift) {
 $cmd .= "$arg ";
}
my $rc = system($cmd);
exit($rc);
---- End bind2.pl ------
The config file option 'submit' was used.
submit = mpirun --allow-run-as-root -x UCX_MEMTYPE_CACHE=n
-host localhost:2 -np $ranks perl $[top]/bind2.pl $command
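
The binding logic in the submit notes can be sketched in Python to show how a local rank is turned into a pinned command line. This is an illustration only, not the script used for the run: the core ranges are copied from bind2.pl, while `BIND`, `build_command`, and the `./benchmark` placeholder are hypothetical names. The sketch builds the argument list without executing it.

```python
import os
import shlex

# Local MPI rank -> CPU core range, copied from bind2.pl in the submit notes.
BIND = {
    0: "1-3",
    1: "144-146",
    2: "8-10",
    3: "11-14",
    4: "41-43",
    5: "44-47",
    6: "61-63",
    7: "64-67",
}


def build_command(local_rank, benchmark_argv):
    """Prefix the benchmark command with 'taskset -c <cores>' for this rank."""
    cores = BIND[local_rank]
    return ["taskset", "-c", cores, *benchmark_argv]


if __name__ == "__main__":
    # Open MPI exports the node-local rank in this variable; default to 0
    # so the sketch also runs outside of mpirun.
    rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    # "./benchmark --flag" stands in for the $command substituted by runhpc.
    print(shlex.join(build_command(rank, ["./benchmark", "--flag"])))
```

Passing the command as a list rather than a concatenated string avoids shell quoting problems if an argument contains spaces. With one rank, as in this result, only the rank 0 entry of the table is used.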

General Notes

Environment variables set by runhpc before the start of the run:
UCX_MEMTYPE_CACHE = "n"
UCX_TLS = "self,shm,cuda_copy"

Compiler Version Notes

==============================================================================
 CC  505.lbm_t(base, peak) 513.soma_t(base, peak) 518.tealeaf_t(base, peak)
      521.miniswp_t(base, peak) 534.hpgmgfv_t(base, peak)
------------------------------------------------------------------------------
nvc 22.11-0 64-bit target on x86-64 Linux -tp zen3 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
------------------------------------------------------------------------------

==============================================================================
 CXXC 532.sph_exa_t(base, peak)
------------------------------------------------------------------------------
nvc++ 22.11-0 64-bit target on x86-64 Linux -tp zen3 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
------------------------------------------------------------------------------

==============================================================================
 FC  519.clvleaf_t(base, peak) 528.pot3d_t(base, peak) 535.weather_t(base,
      peak)
------------------------------------------------------------------------------
nvfortran 22.11-0 64-bit target on x86-64 Linux -tp zen3 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
------------------------------------------------------------------------------

Base Compiler Invocation

C benchmarks:

mpicc

C++ benchmarks:

mpicxx

Fortran benchmarks:

mpif90

Base Portability Flags

505.lbm_t:	-DSPEC_OPENACC_NO_SELF
532.sph_exa_t:	--c++17

Base Other Flags

C benchmarks (except as noted below):

	-Ispecmpitime -w
521.miniswp_t:	-Ispecmpitime/ -w
534.hpgmgfv_t:	-Ispecmpitime -w

C++ benchmarks:

-Ispecmpitime -w

Fortran benchmarks (except as noted below):

	-w
519.clvleaf_t:	-Ispecmpitime -w

Peak Compiler Invocation

C benchmarks:

mpicc

C++ benchmarks:

mpicxx

Fortran benchmarks:

mpif90

Peak Portability Flags

505.lbm_t:	-DSPEC_OPENACC_NO_SELF

Peak Optimization Flags

C benchmarks:

505.lbm_t:	-fast -acc=gpu -O3 -Mfprelaxed -Mnouniform -DSPEC_ACCEL_AWARE_MPI
513.soma_t:	basepeak = yes
518.tealeaf_t:	-fast -acc=gpu -Msafeptr -DSPEC_ACCEL_AWARE_MPI
521.miniswp_t:	basepeak = yes
534.hpgmgfv_t:	-fast -acc=gpu -static-nvidia -DSPEC_ACCEL_AWARE_MPI

C++ benchmarks:

-fast -acc=gpu -O3 -Mfprelaxed -Mnouniform -Mstack_arrays -static-nvidia -DSPEC_ACCEL_AWARE_MPI

Fortran benchmarks:

519.clvleaf_t:	basepeak = yes
528.pot3d_t:	basepeak = yes
535.weather_t:	-DSPEC_ACCEL_AWARE_MPI -fast -acc=gpu -O3 -Mfprelaxed -Mnouniform -Mstack_arrays -static-nvidia

Peak Other Flags

C benchmarks (except as noted below):

	-Ispecmpitime -w
521.miniswp_t:	-Ispecmpitime/ -w
534.hpgmgfv_t:	-Ispecmpitime -w

C++ benchmarks:

-Ispecmpitime -w

Fortran benchmarks (except as noted below):

	-w
519.clvleaf_t:	-Ispecmpitime -w

The flags file that was used to format this result can be browsed at
http://www.spec.org/hpc2021/flags/nv2021_flags_v1.0.3.html.

You can also download the XML flags source by saving the following link:
http://www.spec.org/hpc2021/flags/nv2021_flags_v1.0.3.xml.