SPEChpc™ 2021 Tiny Result

Copyright 2021-2023 Standard Performance Evaluation Corporation

Lenovo Global Technology

ThinkSystem SR655 V3 (AMD EPYC 9654P, Nvidia H100-PCIE-80G)

SPEChpc 2021_tny_base = 17.70

SPEChpc 2021_tny_peak = 17.90

hpc2021 License: 28 Test Date: Jan-2023
Test Sponsor: Lenovo Global Technology Hardware Availability: Feb-2023
Tested by: Lenovo Global Technology Software Availability: Feb-2023

Benchmark result graphs are available in the PDF report.

Results Table

            |------------------------- Base --------------------------|   |------------------------- Peak --------------------------|
Benchmark   Model Ranks Thrds/Rnk Seconds Ratio Seconds Ratio Seconds Ratio   Model Ranks Thrds/Rnk Seconds Ratio Seconds Ratio Seconds Ratio
SPEChpc 2021_tny_base = 17.70
SPEChpc 2021_tny_peak = 17.90
Results appear in the order in which they were run. Bold underlined text indicates a median measurement.
505.lbm_t ACC 1 1 66.2 34.00 67.3 33.40 67.3 33.40 ACC 1 1 60.6 37.10 61.0 36.90 60.5 37.20
513.soma_t ACC 1 1 1020 36.20 1030 36.00 1030 36.10 ACC 1 1 1020 36.20 1030 36.00 1030 36.10
518.tealeaf_t ACC 1 1 1150 14.30 1150 14.30 1150 14.30 ACC 1 1 1140 14.40 1140 14.40 1140 14.40
519.clvleaf_t ACC 1 1 97.8 16.90 97.6 16.90 97.7 16.90 ACC 1 1 97.8 16.90 97.6 16.90 97.7 16.90
521.miniswp_t ACC 1 1 1140 14.10 1140 14.00 1140 14.00 ACC 1 1 1140 14.10 1140 14.00 1140 14.00
528.pot3d_t ACC 1 1 1370 15.50 1370 15.50 1370 15.50 ACC 1 1 1370 15.50 1370 15.50 1370 15.50
532.sph_exa_t ACC 1 1 3340 5.84 3190 6.11 3250 6.00 ACC 1 1 3230 6.04 3180 6.13 3240 6.01
534.hpgmgfv_t ACC 1 1 1080 10.90 1080 10.90 1080 10.90 ACC 1 1 1080 10.90 1080 10.90 1080 10.90
535.weather_t ACC 1 1 79.8 40.40 79.8 40.40 79.7 40.50 ACC 1 1 79.2 40.70 79.0 40.80 79.2 40.70
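The overall score is the geometric mean of the nine median ratios. As a quick plausibility check (a sketch, using the median base ratios read from the table above), the published base figure can be reproduced to within rounding of the displayed values:

  awk 'BEGIN {
    n = split("33.40 36.10 14.30 16.90 14.00 15.50 6.00 10.90 40.40", r);
    for (i = 1; i <= n; i++) s += log(r[i]);
    printf "SPEChpc 2021_tny_base ~= %.2f\n", exp(s / n)   # prints ~17.66
  }'

The small gap to the published 17.70 comes from the ratios being shown rounded to three significant figures.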
Hardware Summary
Type of System: Homogeneous Cluster
Compute Node: ThinkSystem SR655 V3
Compute Nodes Used: 1
Total Chips: 1
Total Cores: 96
Total Threads: 96
Total Memory: 384 GB
Software Summary
Compiler: Nvidia HPC SDK 22.11
MPI Library: Open MPI 4.0.5
Other MPI Info: None
Base Parallel Model: ACC
Base Ranks Run: 1
Base Threads Run: 1
Peak Parallel Models: ACC
Minimum Peak Ranks: 1
Maximum Peak Ranks: 1
Max. Peak Threads: 1
Min. Peak Threads: 1

Node Description: ThinkSystem SR655 V3

Hardware
Number of nodes: 1
Uses of the node: compute
Vendor: Lenovo Global Technology
Model: ThinkSystem SR655 V3
CPU Name: AMD EPYC 9654P
CPU(s) orderable: 1 chip
Chips enabled: 1
Cores enabled: 96
Cores per chip: 96
Threads per core: 1
CPU Characteristics: Max. Boost Clock up to 3.7 GHz
CPU MHz: 2400
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 1 MB I+D on chip per core
L3 Cache: 384 MB I+D on chip per chip
Other Cache: None
Memory: 384 GB (24 x 16 GB 2Rx4 PC5-4800B-R)
Disk Subsystem: 1x ThinkSystem 2.5" 5300 480GB SSD
Other Hardware: None
Accel Count: 8
Accel Model: Tesla H100 PCIe 80GB
Accel Vendor: Nvidia Corporation
Accel Type: GPU
Accel Connection: PCIe Gen5 x16
Accel ECC enabled: Yes
Accel Description: Nvidia Tesla H100 PCIe 80GB
Adapter: Mellanox ConnectX-6 HDR
Number of Adapters: 1
Slot Type: PCI-Express 5.0 x16
Data Rate: 200 Gb/s
Ports Used: 1
Interconnect Type: Nvidia Mellanox ConnectX-6 HDR
Software
Accelerator Driver: 525.60.13
Adapter: Mellanox ConnectX-6 HDR
Adapter Driver: 5.2-1.0.4
Adapter Firmware: 20.28.1002
Operating System: Red Hat Enterprise Linux release 9.0,
Kernel 5.14.0-70.22.1.el9_0.x86_64
Local File System: xfs
Shared File System: XFS
System State: Multi-user, run level 3
Other Software: None

Submit Notes

Individual ranks were bound to the CPU cores on the same NUMA node as
their GPU using 'taskset' within the following "bind2.pl" Perl script:
---- Start bind2.pl ------
use strict;
use warnings;

# Core ranges local to each GPU, keyed by node-local MPI rank.
my %bind;
$bind{0} = "1-3";
$bind{1} = "144-146";
$bind{2} = "8-10";
$bind{3} = "11-14";
$bind{4} = "41-43";
$bind{5} = "44-47";
$bind{6} = "61-63";
$bind{7} = "64-67";

# Open MPI exports the node-local rank number of this process.
my $rank = $ENV{OMPI_COMM_WORLD_LOCAL_RANK};

# Pin the wrapped command to the rank's core set with taskset.
my $cmd = "taskset -c $bind{$rank} ";
while (my $arg = shift) {
    $cmd .= "$arg ";
}
my $rc = system($cmd);
exit($rc >> 8);   # propagate the child's exit status
---- End bind2.pl ------
The config file option 'submit' was used.
submit = mpirun --allow-run-as-root -x UCX_MEMTYPE_CACHE=n
-host localhost:2 -np $ranks perl $[top]/bind2.pl $command
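The core ranges hard-coded in bind2.pl reflect the NUMA placement of the GPUs on the system under test. As an illustration (not part of the submitted configuration), the GPU-to-core affinity needed to build such a table can be inspected with:

  # Topology matrix; the CPU/NUMA affinity columns show which cores
  # are local to each GPU:
  nvidia-smi topo -m

  # CPU ranges belonging to each NUMA node:
  numactl --hardware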

General Notes

Environment variables set by runhpc before the start of the run:
UCX_MEMTYPE_CACHE = "n"
UCX_TLS = "self,shm,cuda_copy"
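For reference, reproducing the same environment by hand for a manual mpirun invocation would look like the following (a sketch; during the run, runhpc sets these automatically):

  export UCX_MEMTYPE_CACHE=n          # disable the UCX memory-type cache
  export UCX_TLS=self,shm,cuda_copy   # restrict UCX to these transports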

Compiler Version Notes

==============================================================================
 CC  505.lbm_t(base, peak) 513.soma_t(base, peak) 518.tealeaf_t(base, peak)
      521.miniswp_t(base, peak) 534.hpgmgfv_t(base, peak)
------------------------------------------------------------------------------
nvc 22.11-0 64-bit target on x86-64 Linux -tp zen3 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
------------------------------------------------------------------------------

==============================================================================
 CXXC 532.sph_exa_t(base, peak)
------------------------------------------------------------------------------
nvc++ 22.11-0 64-bit target on x86-64 Linux -tp zen3 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
------------------------------------------------------------------------------

==============================================================================
 FC  519.clvleaf_t(base, peak) 528.pot3d_t(base, peak) 535.weather_t(base,
      peak)
------------------------------------------------------------------------------
nvfortran 22.11-0 64-bit target on x86-64 Linux -tp zen3 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
------------------------------------------------------------------------------

Base Compiler Invocation

C benchmarks:

 mpicc 

C++ benchmarks:

 mpicxx 

Fortran benchmarks:

 mpif90 

Base Portability Flags

505.lbm_t:  -DSPEC_OPENACC_NO_SELF 
532.sph_exa_t:  --c++17 

Base Optimization Flags

C benchmarks:

 -fast   -acc=gpu   -Mfprelaxed   -Mnouniform   -Mstack_arrays   -DSPEC_ACCEL_AWARE_MPI 

C++ benchmarks:

 -fast   -acc=gpu   -Mfprelaxed   -Mnouniform   -Mstack_arrays   -DSPEC_ACCEL_AWARE_MPI 

Fortran benchmarks:

 -DSPEC_ACCEL_AWARE_MPI   -fast   -acc=gpu   -Mfprelaxed   -Mnouniform   -Mstack_arrays 

Base Other Flags

C benchmarks (except as noted below):

 -Ispecmpitime   -w 
521.miniswp_t:  -Ispecmpitime/   -w 
534.hpgmgfv_t:  -Ispecmpitime    -w 

C++ benchmarks:

 -Ispecmpitime    -w 

Fortran benchmarks (except as noted below):

 -w 
519.clvleaf_t:  -Ispecmpitime   -w 

Peak Compiler Invocation

C benchmarks:

 mpicc 

C++ benchmarks:

 mpicxx 

Fortran benchmarks:

 mpif90 

Peak Portability Flags

505.lbm_t:  -DSPEC_OPENACC_NO_SELF 

Peak Optimization Flags

C benchmarks:

505.lbm_t:  -fast   -acc=gpu   -O3   -Mfprelaxed   -Mnouniform   -DSPEC_ACCEL_AWARE_MPI 
513.soma_t:  basepeak = yes 
518.tealeaf_t:  -fast   -acc=gpu   -Msafeptr   -DSPEC_ACCEL_AWARE_MPI 
521.miniswp_t:  basepeak = yes 
534.hpgmgfv_t:  -fast   -acc=gpu   -static-nvidia   -DSPEC_ACCEL_AWARE_MPI 

C++ benchmarks:

 -fast   -acc=gpu   -O3   -Mfprelaxed   -Mnouniform   -Mstack_arrays   -static-nvidia   -DSPEC_ACCEL_AWARE_MPI 

Fortran benchmarks:

519.clvleaf_t:  basepeak = yes 
528.pot3d_t:  basepeak = yes 
535.weather_t:  -DSPEC_ACCEL_AWARE_MPI   -fast   -acc=gpu   -O3   -Mfprelaxed   -Mnouniform   -Mstack_arrays   -static-nvidia 

Peak Other Flags

C benchmarks (except as noted below):

 -Ispecmpitime   -w 
521.miniswp_t:  -Ispecmpitime/   -w 
534.hpgmgfv_t:  -Ispecmpitime    -w 

C++ benchmarks:

 -Ispecmpitime    -w 

Fortran benchmarks (except as noted below):

 -w 
519.clvleaf_t:  -Ispecmpitime   -w 

The flags file that was used to format this result can be browsed at
http://www.spec.org/hpc2021/flags/nv2021_flags_v1.0.3.html.

You can also download the XML flags source by saving the following link:
http://www.spec.org/hpc2021/flags/nv2021_flags_v1.0.3.xml.