SPEChpc™ 2021 Result File Fields

Last updated: 6 July 2022 mec, original document by cgp

ABSTRACT
This document describes the contents and arrangement of a SPEChpc 2021 result disclosure. It refers to the arrangement of fields in the HTML format of a SPEChpc 2021 result disclosure, since the arrangement of result fields may differ between the text, HTML, and other report formats (CSV, PDF, PS, etc.). While the reports are formatted in a way that is intended to be self-explanatory, a reader may want a formal statement of the meaning of a field, or may have technical questions about the information a field provides. Further, the SPEC website contains links from the fields of the published reports to their descriptions in this document.

The contents of the result reports are either generated by the run of the benchmarks, extracted from the configuration file that controls the building and running of the benchmarks, or provided by the tester in descriptive fields. These follow conventions that are specified in the separate documents on the Run Rules, Config Files, and XML Flags Files. Reports published on the SPEC website have been peer-reviewed by the members of the SPEC/HPG committee and are expected to be correct in every detail.


(To check for possible updates to this document, please see http://www.spec.org/hpc2021/Docs/)

Abbreviated Contents

Selecting one of the following will take you to the detailed table of contents for that section or subsection:

1. SPEChpc 2021 Benchmarks

2. Result and Configuration Summary

2.1 Results Header

2.2 Performance Metrics

2.3 Results Table

2.4 System Hardware Summary

2.5 System Software Summary

2.6 Internal Timer Table

3. Node Description

3.1 Node Hardware Description

3.2 Node Software Description

4. Interconnect Description

5. Compilation Description

6. Tester-provided notes

7. Errors

1 SPEChpc 2021 Benchmarks

SPEChpc has 9 benchmarks (5 C, 1 C++, and 3 Fortran), organized into 4 suites by workload size: Tiny, Small, Medium, and Large.

Application Name                       Tiny           Small          Medium         Large          Language  Approx. LOC  Application Area
LBM D2Q37                              505.lbm_t      605.lbm_s      705.lbm_m      805.lbm_l      C         9,000        Computational Fluid Dynamics
SOMA Offers Monte-Carlo Acceleration   513.soma_t     613.soma_s     Not included   Not included   C         9,500        Physics / Polymeric Systems
TeaLeaf                                518.tealeaf_t  618.tealeaf_s  718.tealeaf_m  818.tealeaf_l  C         5,400        Physics / High Energy Physics
CloverLeaf                             519.clvleaf_t  619.clvleaf_s  719.clvleaf_m  819.clvleaf_l  Fortran   12,500       Physics / High Energy Physics
Minisweep                              521.miniswp_t  621.miniswp_s  Not included   Not included   C         17,500       Nuclear Engineering - Radiation Transport
POT3D                                  528.pot3d_t    628.pot3d_s    728.pot3d_m    828.pot3d_l    Fortran   495,000 (*)  Solar Physics
SPH-EXA                                532.sph_exa_t  632.sph_exa_s  Not included   Not included   C++14     3,400        Astrophysics and Cosmology
HPGMG-FV                               534.hpgmgfv_t  634.hpgmgfv_s  734.hpgmgfv_m  834.hpgmgfv_l  C         16,700       Cosmology, Astrophysics, Combustion
miniWeather                            535.weather_t  635.weather_s  735.weather_m  835.weather_l  Fortran   1,100        Weather

(*) Includes the HDF5 library.

2 Result and Configuration Summary

2.1 Results Header

System Vendor: The vendor of the system under test.
System Name: The name of the system under test.
Hardware Availability: The date when all the hardware necessary to run the result is generally available. For example, if the CPU is available in Aug-2021, but the memory is not available until Oct-2021, then the hardware availability date is Oct-2021 (unless some other component pushes it out farther).
Software Availability: The date when all the software necessary to run the result is generally available. For example, if the operating system is available in Aug-2021, but the compiler or other libraries are not available until Oct-2021, then the software availability date is Oct-2021 (unless some other component pushes it out farther).
Test date: The date when the test is run. This value is recorded by the SPEC tools; the time reported by the system under test is recorded in the raw result file.
Test sponsor: The name of the organization or individual that sponsored the test. Generally, this is the name of the license holder.
Tested by: The name of the organization or individual that ran the test. If there are installations in multiple geographic locations, sometimes that is also listed in this field.

2.2 Performance Metrics

SPEChpc 2021_tny_peak: The geometric mean of 9 normalized ratios (one for each benchmark) when each benchmark in the Tiny suite is compiled with aggressive optimization.
SPEChpc 2021_tny_base: The geometric mean of 9 normalized ratios (one for each benchmark) when each benchmark in the Tiny suite is compiled with conservative optimization.

SPEChpc 2021_sml_peak: The geometric mean of 9 normalized ratios (one for each benchmark) when each benchmark in the Small suite is compiled with aggressive optimization.
SPEChpc 2021_sml_base: The geometric mean of 9 normalized ratios (one for each benchmark) when each benchmark in the Small suite is compiled with conservative optimization.

SPEChpc 2021_med_peak: The geometric mean of 6 normalized ratios (one for each benchmark) when each benchmark in the Medium suite is compiled with aggressive optimization.
SPEChpc 2021_med_base: The geometric mean of 6 normalized ratios (one for each benchmark) when each benchmark in the Medium suite is compiled with conservative optimization.

SPEChpc 2021_lrg_peak: The geometric mean of 6 normalized ratios (one for each benchmark) when each benchmark in the Large suite is compiled with aggressive optimization.
SPEChpc 2021_lrg_base: The geometric mean of 6 normalized ratios (one for each benchmark) when each benchmark in the Large suite is compiled with conservative optimization.

More detailed information about the performance metrics may be found in section 4.3.1 of the SPEChpc 2021 Run and Reporting Rules.
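
As an illustration, each metric is the geometric mean of the per-benchmark ratios for its suite. A hypothetical SPEChpc 2021_tny_base calculation (all nine ratios invented for this example):

    Ratios: 2.0, 2.2, 1.8, 2.4, 2.1, 1.9, 2.3, 2.0, 2.2
    SPEChpc 2021_tny_base = (2.0 × 2.2 × 1.8 × 2.4 × 2.1 × 1.9 × 2.3 × 2.0 × 2.2)^(1/9) ≈ 2.09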


2.3 Results Table

Benchmark: The names of the benchmarks making up the SPEChpc 2021 suites.
Parallel Model: The parallel model used by the benchmark. The possible models are:

  • MPI: MPI only, with no node-level parallel model.
  • ACC: MPI + OpenACC.
  • OMP: MPI + OpenMP, using task/thread-based directives.
  • TGT: MPI + OpenMP, using "target"-based directives.

Ranks: The number of MPI ranks (processes) used during the run of the benchmark.
Threads per rank: The number of host threads (OpenMP or OpenACC) per rank used during the run of the benchmark.
Seconds: The elapsed (wall-clock) time in seconds that the benchmark took to run, from job submission to job completion.
Ratio: The ratio of the benchmark's run time on the reference platform to its run time on the system under test; larger ratios indicate better performance.

Identifying the Median results:

For a reportable SPEChpc 2021 run, at least two iterations of each benchmark are run, and the median of the runs (lower of middle two, if even) is selected to be part of the overall metric. In output formats that support it, the medians in the result table are underlined in bold. The ".txt" report will mark each median score with an asterisk "*".
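
For example, suppose a benchmark's three base iterations took 104, 100, and 102 seconds (hypothetical times):

    Iteration 1: 104 s
    Iteration 2: 100 s
    Iteration 3: 102 s   <-- median (middle value); marked "*" in the .txt report

The 102-second run contributes to the overall metric. With four iterations of 100, 102, 104, and 106 seconds, the lower of the two middle runs (102 s) would be selected.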

Significance of the run order

Each iteration of a SPEChpc 2021 benchmark suite runs each benchmark once, in order. For example, given benchmarks "110.aaa", "120.bbb", and "130.ccc", here is what you might see in a three-iteration run:

SPEChpc 2021

    Running (#1) 110.aaa ref base oct09a default
    Running (#1) 120.bbb ref base oct09a default
    Running (#1) 130.ccc ref base oct09a default
    Running (#2) 110.aaa ref base oct09a default
    Running (#2) 120.bbb ref base oct09a default
    Running (#2) 130.ccc ref base oct09a default
    Running (#3) 110.aaa ref base oct09a default
    Running (#3) 120.bbb ref base oct09a default
    Running (#3) 130.ccc ref base oct09a default

When you read the results table from a run, the results are listed in the order in which they were run, in column-major order. In other words, if you are interested in the base scores as they were produced, start at the top of the left-hand column and read down it, then read the middle column, then the right column.

If the benchmarks were run with both base and peak tuning, all base runs were completed before starting peak.


2.4 System Hardware Summary

Collective hardware details across the whole system. Run rules relating to these items can be found in section 4.2 of the SPEChpc 2021 Run and Reporting Rules.

Type of System: Description of the system being benchmarked: SMP, Homogeneous Cluster, or Heterogeneous Cluster.
Compute Node: The systems used as compute nodes during the run of the benchmark.
Interconnects: The devices used for the interconnects during the benchmark run.
Compute Nodes Used: The number of compute nodes used to execute the benchmark.
Total Chips: The total number of chips in the compute nodes available to execute the benchmark.
Total Cores: The total number of cores in the compute nodes available to execute the benchmark.
Total Threads: The total number of threads in the compute nodes available to execute the benchmark.
Total Memory: The total amount of memory in all of the compute nodes available to execute the benchmark.

2.5 System Software Summary

Information on how the benchmark binaries are constructed. Run rules relating to these items can be found in section 4.2 of the SPEChpc 2021 Run and Reporting Rules.

Compiler: The names and versions of the compilers used to generate the result.
MPI Library: The names and versions of the MPI libraries used to generate the result.
Other MPI Information: Any performance-relevant MPI information used to generate the result.
Other Software: Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc.
Base Parallel Model: The parallel model used in base.
Base Ranks: The number of MPI ranks used to execute the benchmark in the base optimization runs.
Base Threads per Rank: The number of host threads per rank used to execute the benchmark in the base optimization runs.
Peak Parallel Models: The list of parallel models used in peak.
Minimum Peak Ranks: The smallest number of ranks used to execute the benchmark runs using peak optimizations.
Maximum Peak Ranks: The largest number of ranks used to execute the peak version of the benchmark.
Minimum Peak Threads per Rank: The smallest number of host threads per rank used to execute the benchmark runs using peak optimizations.
Maximum Peak Threads per Rank: The largest number of host threads per rank used to execute the peak version of the benchmark.

2.6 Internal Timer Table

As part of SPEC/HPG's follow-on SPEChpc weak-scaling suite (currently under development), internal timers were added to the codes to measure MPI initialization overhead, application initialization overhead, core computation time, and residual time. For weak scaling, the core compute time will be used to determine a throughput "Figure of Merit" (FOM): a measurement of units of work over time.

For the current strong-scaled suites, SPEC/HPG decided to include this measurement as an option, since it may help in understanding scaling.

The internal timing information may only be used for academic and research purposes, or as a derived value per SPEC's Fair Use Rules.

Reporting of the internal timing is disabled by default. To enable it, either add "showtimer=1" to your config file, use the runhpc --showtimer=1 option, or edit the resulting "raw" (.rsf) file, changing the "showtimer" field to 1, and then use the rawformat utility to reformat the reports.
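
For example, any of the following enables the table (a sketch; the result file name is illustrative):

    # In the config file:
    showtimer = 1

    # Or on the runhpc command line:
    runhpc --showtimer=1 ...

    # Or after the run: change the "showtimer" field to 1 in the raw file,
    # then reformat the reports:
    rawformat myresult.rsf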

Reported Time: Time in seconds reported by the SPEC tools, which is used in computing the benchmark ratio. Same as "Seconds" above.
Start-up overhead: Reported Time less Application Time. Captures overhead such as node allocation, scheduler overhead, MPI start-up time, etc.
Initialization Time: Time the application spends initializing data, reading input files, performing domain decomposition, etc.
Core Compute Time: Time the application spends performing its core computation. This time includes MPI communication.
Residual Time: Remaining application time not captured under initialization or core compute. Includes items such as verification of results or saving output data files.
Application Time: Time measured between MPI_Init and MPI_Finalize. Note that this time is measured by the tools and shown in the log files, but it is not included in the Internal Timer Table, since it is the sum of the Initialization, Core Compute, and Residual Times.
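
To make the relationships among these timers concrete, here is a hypothetical breakdown (all numbers invented for illustration):

    Reported Time       = 100.0 s
    Start-up overhead   =   2.5 s   (Reported Time - Application Time)
    Initialization Time =  10.0 s
    Core Compute Time   =  80.0 s
    Residual Time       =   7.5 s
    Application Time    =  97.5 s   (= 10.0 + 80.0 + 7.5; MPI_Init to MPI_Finalize)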

3 Node Description

SPEChpc 2021 is capable of running on large heterogeneous clusters containing different kinds of nodes linked by different kinds of interconnects. The report format contains a separate section for each kind of node and each kind of interconnect. Section 4.2 of the Run Rules document describes what information is to be provided by the tester.

For example, an SMP will consist of one node and no interconnect. Homogeneous cluster systems will typically consist of one kind of compute node and one or two kinds of interconnects. There will also often be a file server node. It is possible that the node and interconnect components are available from their respective vendors but no vendor sells the configured system as a whole; in this case the report is intended to provide enough detail to reconstruct an equivalent system with equivalent performance.

3.1 Node Hardware Description(s)

Description of the hardware configuration of the node.

Number of nodes: The number of nodes of this type in the system.
Uses of the Node: The purpose of this type of node: compute node, file server, head node, etc.
Vendor: The manufacturer of this kind of node.
Model: The model name of this kind of node.
CPU Name: A manufacturer-determined formal name of the processor used in this node type.
CPU(s) orderable: The number of CPUs that can be ordered in this kind of node.
Chip(s)/CPU(s) enabled: The number of chips (CPUs) that were enabled and active in the node during the benchmark run.
Core(s) enabled: The number of cores that were enabled and active in the node during the benchmark run.
Cores per Chip: The number of cores in each chip that were enabled and active in the node during the benchmark run.
Threads per Core: The number of threads in each core that were enabled and active in the node during the benchmark run.
CPU Characteristics: Technical characteristics to help identify the processor type used in the node.
CPU MHz: The clock frequency of the CPU used in the node, expressed in megahertz.
Primary Cache: Description (size and organization) of the CPU's primary cache, also referred to as "L1 cache".
Secondary Cache: Description (size and organization) of the CPU's secondary cache, also referred to as "L2 cache".
L3 Cache: Description (size and organization) of the CPU's tertiary, or "Level 3", cache.
Other Cache: Description (size and organization) of any other levels of cache memory.
Memory: Description of the system main memory configuration. End-user options that affect performance, such as arrangement of memory modules, interleaving, latency, etc., are documented here.
Disk Subsystem: A description of the disk subsystem (size, type, and RAID level if any) of the storage used to hold the benchmark tree during the run.
Other Hardware: Any additional equipment added to improve performance.
Accelerator Model: The model name of the accelerator(s).
Accelerator Count: The number of accelerators of each model.
Accelerator Vendor: The company/vendor of the accelerator.
Accelerator Type: The type of accelerator. Possible values include, but are not limited to: GPU, APU, CPU, FPGA, etc.
Accelerator Connection: How the accelerator is connected to the system. Possible descriptions include, but are not limited to: PCIe, integrated, etc.
Accelerator ECC Enabled: Whether the accelerator uses ECC for its memory.
Accelerator Description: Further description of the accelerator.
Adapter Card(s): There will be one of these groups of entries for each network adapter -- also known as a Host Channel Adapter (HCA) or Network Interface Card (NIC) -- used to connect to an interconnect carrying MPI or file server traffic. This field contains the adapter's vendor and model name.
Number of Adapters: How many of these adapters attach to the node.
Adapter Slot Type: The type of slot used to attach the adapter card to the node.
Data Rate: The per-port, nominal data transfer rate of the adapter.
Ports Used: The number of ports on the adapter used to run the benchmark (especially relevant for adapters with multiple ports available).
Interconnect Type: In general terms, the type of interconnect (Ethernet, InfiniBand, etc.) attached to this adapter.

3.2 Node Software Description

Software configuration of the node.

Adapter Driver: The driver type and level for this adapter.
Adapter Firmware: The adapter firmware type and level for this device.
Operating System: The operating system name and version. If patches that affect performance have been applied, they must be disclosed in the Notes.
Local File System: The type of the file system local to each compute node.
Shared File System: The type of the file system used to contain the run directories.
System State: The state (sometimes called "run level") of the system while the benchmarks were being run. Generally this is "single user", "multi-user", "default", etc.
Other Software: Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc.
Accelerator Driver: The name and version of the software driver used to control the accelerator.

4 Interconnect Description

Description of the configuration of the interconnect.


Vendor: The manufacturer(s) of this interconnect.
Model: The model name(s) of the interconnect as a whole, or of components of it -- not including the switch model, which is the next field.
Switch Model(s): The model and manufacturer of the switching element(s) of this interconnect. More than one kind may be declared.
Number of switches: The number of switches of this type in the interconnect.
Ports per switch: The number of ports per switch available for carrying the type of traffic noted in the "Primary Use" field.
Data Rate: The per-port, nominal data transfer rate.
Firmware: The firmware type and level for the switch(es).
Topology: Description of the arrangement of switches and links in the interconnect.
Primary Use: The kind of data traffic carried by the interconnect: MPI, file server, etc.

5 Compilation Description

This section describes how the benchmarks are compiled. The HTML and PDF reports contain links from the settings that are listed, to the descriptions of those settings in the XML flags file report.

Much of this information is derived from compilation rules written in the config file and interpreted according to rules specified in the XML flags file. Free-form notes can be added to it. A subsection appears only if the corresponding flags are used (for example, peak optimization flags); otherwise it is not printed. Section 2 of the SPEChpc 2021 Run and Reporting Rules document gives rules on how these items can be used in reportable runs.

Base & Peak Unknown Flags

This section lists flags, used in the base or peak compiles, that were not recognized by the report generation. Results with unknown flags are marked "invalid" and may not be published.

Typically this means that the flagsurl parameter was not set correctly, or that descriptions need to be added to the XML flags file. The "invalid" marking may be removed by reformatting the result using a flags file that describes all of the unknown flags, as sketched below.
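
For example (a sketch; the URL and file names are illustrative):

    # In the config file, point the tools at the flags file:
    flagsurl = http://www.spec.org/hpc2021/flags/Example-Company-Platform.xml

    # Or reformat an existing result against a corrected flags file:
    rawformat --flagsurl=Example-Company-Platform.xml myresult.rsf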

Base & Peak Forbidden Flags

This section lists flags, used in the base or peak compiles, that are designated as Forbidden in the XML flags file for the benchmark or the platform. Results with forbidden flags are marked "invalid" and may not be published.

Base & Peak Compiler Invocation: How the compilers are invoked: whether any special paths had to be used, which flags were passed, etc.
Base & Peak Portability Flags: The portability settings used to build the benchmarks. Optimization settings are not listed here.
C Benchmarks: Portability settings specific to the benchmarks listed.
C++ Benchmarks: Portability settings specific to the benchmarks listed.
Fortran Benchmarks: Portability settings specific to the benchmarks listed.
Benchmarks using both Fortran and C: Portability settings specific to the benchmarks listed.
Base & Peak Optimization Flags: The optimization settings used to build the benchmark binaries for the base and peak runs.
C Benchmarks: Optimization settings specific to the C benchmarks.
C++ Benchmarks: Optimization settings specific to the C++ benchmarks.
Fortran Benchmarks: Optimization settings specific to the Fortran benchmarks.
Benchmarks using both Fortran and C: Optimization settings specific to the mixed-language benchmarks.
Base & Peak Other Flags

This section describes the other settings that are used to build or run the benchmark binaries for the base and peak runs. These are classified as being neither portability nor optimization settings.

C Benchmarks: Settings specific to the C benchmarks.
C++ Benchmarks: Settings specific to the C++ benchmarks.
Fortran Benchmarks: Settings specific to the Fortran benchmarks.
Benchmarks using both Fortran and C: Settings specific to the mixed-language benchmarks.

6 Tester-provided notes

Notes/Tuning Information: Tester's free-form notes.
Compiler Notes: Tester's notes about any compiler-specific information (for example: special paths, setup scripts, and so forth).
Submit Notes: Tester's notes about how the config file "submit" option was used to assign processes to processors.
Portability Notes: Tester's notes about portability options and flags used to build the benchmarks.
Base Tuning Notes: Tester's notes about base optimization options and flags used to build the benchmarks.
Peak Tuning Notes: Tester's notes about peak optimization options and flags used to build the benchmarks.
Operating System Notes: Tester's notes about changes to the default operating system state and other OS tuning.
Platform Notes: Tester's notes about changes to the default hardware state and other non-OS tuning.
Component Notes: Tester's notes about components needed to build a particular system (for User-Built systems).
General Notes: Tester's notes about anything not covered in the other notes sections.
Compiler Version Notes: This section is automatically generated. It contains output from CC_VERSION_OPTION (and FC_VERSION_OPTION and CXX_VERSION_OPTION); a config-file sketch follows this list.
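
As an illustration, a config file might define the following so that the tools can capture the compiler versions automatically (a sketch; the compiler names and version flags depend on your toolchain):

    CC  = mpicc
    CXX = mpicxx
    FC  = mpif90
    CC_VERSION_OPTION  = --version
    CXX_VERSION_OPTION = --version
    FC_VERSION_OPTION  = --version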

7 Errors

This section is automatically inserted by the benchmark tools when there are errors present that prevent the result from being a valid reportable result.


Copyright © 2022 Standard Performance Evaluation Corporation
All Rights Reserved
