CPU2017 Flag Description
Lenovo Global Technology ThinkSystem SR635 2.90 GHz, AMD EPYC 7272

Compilers: AMD Optimizing C/C++ Compiler Suite


Base Compiler Invocation

C benchmarks

C++ benchmarks

Fortran benchmarks

Benchmarks using both Fortran and C

Benchmarks using both C and C++

Benchmarks using Fortran, C, and C++


Peak Compiler Invocation

C benchmarks

C++ benchmarks

Fortran benchmarks

Benchmarks using both Fortran and C

Benchmarks using both C and C++

Benchmarks using Fortran, C, and C++


Base Portability Flags

503.bwaves_r

507.cactuBSSN_r

508.namd_r

510.parest_r

511.povray_r

519.lbm_r

521.wrf_r

526.blender_r

527.cam4_r

538.imagick_r

544.nab_r

549.fotonik3d_r

554.roms_r


Peak Portability Flags

503.bwaves_r

507.cactuBSSN_r

508.namd_r

510.parest_r

511.povray_r

519.lbm_r

521.wrf_r

526.blender_r

527.cam4_r

538.imagick_r

544.nab_r

549.fotonik3d_r

554.roms_r


Base Optimization Flags

C benchmarks

C++ benchmarks

Fortran benchmarks

Benchmarks using both Fortran and C

Benchmarks using both C and C++

Benchmarks using Fortran, C, and C++


Peak Optimization Flags

C benchmarks

519.lbm_r

538.imagick_r

544.nab_r

C++ benchmarks

508.namd_r

510.parest_r

Fortran benchmarks

503.bwaves_r

549.fotonik3d_r

554.roms_r

Benchmarks using both Fortran and C

Benchmarks using both C and C++

511.povray_r

526.blender_r

Benchmarks using Fortran, C, and C++

507.cactuBSSN_r


Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.


Commands and Options Used to Submit Benchmark Runs

Using numactl to bind processes and memory to cores

For multi-copy runs, or single-copy runs on systems with multiple sockets, it is advantageous to bind each process to a particular core. Otherwise, the OS may arbitrarily move a process from one core to another, which can affect performance. To help, SPEC allows the use of a "submit" command where users can specify a utility to use to bind processes. We have found the utility 'numactl' to be the best choice.

numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a command and inherited by all of its children. The numactl flag "--physcpubind" specifies which core(s) to bind the process to. "-l" instructs numactl to keep a process's memory on the local node, while "-m" specifies which node(s) to place a process's memory on. For full details on using numactl, please refer to your Linux documentation, 'man numactl'.

Note that some older versions of numactl incorrectly interpret application arguments as its own. For example, with the command "numactl --physcpubind=0 -l a.out -m a", numactl will interpret a.out's "-m" option as its own "-m" option. To work around this problem, we put the command to be run in a shell script and then run the shell script using numactl. For example: "echo 'a.out -m a' > run.sh ; numactl --physcpubind=0 bash run.sh"
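The wrapper-script workaround can be sketched in Python as follows. This is a minimal illustration, not SPEC's submit mechanism: the helper names `wrapper_script` and `numactl_command` are hypothetical, and the code only builds the script text and the numactl argument list rather than running anything.

```python
import shlex


def wrapper_script(app_argv):
    """Return the text of a run.sh that executes app_argv.

    Keeping the application's own arguments inside the script stops older
    numactl versions from parsing them (e.g. a.out's "-m a") as numactl options.
    """
    return "#!/bin/bash\n" + shlex.join(app_argv) + "\n"


def numactl_command(core, script_path, local_alloc=False):
    """Build a numactl argv that pins `bash script_path` to one core."""
    cmd = ["numactl", f"--physcpubind={core}"]
    if local_alloc:
        cmd.append("-l")  # keep memory on the local NUMA node
    cmd += ["bash", script_path]
    return cmd


if __name__ == "__main__":
    print(wrapper_script(["a.out", "-m", "a"]), end="")
    print(shlex.join(numactl_command(0, "run.sh")))
```

The returned list could be handed to `subprocess.run` once run.sh has been written to disk; with the example arguments it reproduces the "numactl --physcpubind=0 bash run.sh" command shown above.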


Shell, Environment, and Other Software Settings

Transparent Huge Pages (THP)

THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. THP is designed to hide much of the complexity in using huge pages from system administrators and developers, as normal huge pages must be assigned at boot time, can be difficult to manage manually, and often require significant changes to code in order to be used effectively. Most recent Linux OS releases have THP enabled by default.
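On most Linux systems the active THP policy can be read from /sys/kernel/mm/transparent_hugepage/enabled, where the kernel brackets the current choice (for example "always [madvise] never"). A minimal, read-only Python sketch for checking it, assuming that sysfs path exists on the system under test; the helper names are illustrative:

```python
import re
from pathlib import Path

# Usual sysfs location of the THP policy; assumed present on the host.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active THP mode, which the kernel marks with brackets,
    e.g. "always [madvise] never" -> "madvise"."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active THP mode in {text!r}")
    return match.group(1)


def current_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read and parse the THP policy file on the running system."""
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", current_thp_mode())
    else:
        print("THP sysfs interface not found on this system")
```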

Linux Huge Page settings

If you need finer control, you can manually set huge pages using the following steps:

Note that further information about huge pages may be found in the Linux kernel documentation file hugetlbpage.txt.
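Whatever configuration steps are used, the kernel reports the state of the huge page pool in /proc/meminfo (the HugePages_Total, HugePages_Free, HugePages_Rsvd and HugePages_Surp counts, plus Hugepagesize in kB), as described in hugetlbpage.txt. The sketch below only reads those fields and changes nothing; `hugepage_info` is a hypothetical helper name.

```python
from pathlib import Path

MEMINFO = Path("/proc/meminfo")


def hugepage_info(meminfo_text: str) -> dict:
    """Extract the HugePages_* counts (in pages) and Hugepagesize (in kB)
    from /proc/meminfo text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("HugePages_") or key == "Hugepagesize":
            info[key] = int(rest.split()[0])
    return info


if __name__ == "__main__":
    if MEMINFO.exists():
        print(hugepage_info(MEMINFO.read_text()))
```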

ulimit -s <n>

Sets the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.

ulimit -l <n>

Sets the maximum size of memory, in kbytes, that may be locked into physical memory.
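From Python, the same two limits are visible through the standard `resource` module (RLIMIT_STACK corresponds to `ulimit -s`, RLIMIT_MEMLOCK to `ulimit -l`). A minimal sketch, assuming a Linux host; `as_kbytes`, `show_limits` and `raise_stack_to_hard_limit` are illustrative names, and raising a limit this way only affects the calling process and the children it starts afterwards.

```python
import resource


def as_kbytes(limit: int) -> str:
    """Render an rlimit (in bytes) the way ulimit reports it: kbytes or 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return str(limit // 1024)


def show_limits() -> dict:
    """Current soft limits for the two ulimit settings described above."""
    stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    memlock_soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return {
        "stack (ulimit -s)": as_kbytes(stack_soft),
        "locked memory (ulimit -l)": as_kbytes(memlock_soft),
    }


def raise_stack_to_hard_limit() -> None:
    """Raise this process's soft stack limit up to its hard limit, which an
    unprivileged process is allowed to do; children inherit the new value."""
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))


if __name__ == "__main__":
    print(show_limits())
```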

powersave -f (on SuSE)

Makes the powersave daemon set the CPUs to the highest supported frequency.

/etc/init.d/cpuspeed stop (on Red Hat)

Disables the CPU frequency scaling program in order to set the CPUs to the highest supported frequency.

LD_LIBRARY_PATH

An environment variable that indicates the location in the filesystem of bundled libraries to use when running the benchmark binaries.
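A minimal sketch of prepending a bundled library directory to LD_LIBRARY_PATH for a child process only. The directory /opt/bench/lib is a hypothetical placeholder, not a path taken from this result.

```python
import os


def with_library_dir(env, libdir: str) -> dict:
    """Return a copy of env in which the dynamic loader searches libdir first."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH")
    new_env["LD_LIBRARY_PATH"] = f"{libdir}:{current}" if current else libdir
    return new_env


if __name__ == "__main__":
    # Hypothetical bundled-library directory, for illustration only.
    env = with_library_dir(os.environ, "/opt/bench/lib")
    print(env["LD_LIBRARY_PATH"])
```

The returned mapping can be passed as `env=` to `subprocess.run`, so only the benchmark process sees the modified search path and the calling shell is left unchanged.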

kernel/randomize_va_space

This option can be used to select the type of process address space randomization that is used in the system, for architectures that support this feature.
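The setting is exposed as /proc/sys/kernel/randomize_va_space (sysctl kernel.randomize_va_space). The kernel's sysctl documentation defines three values, which this small read-only sketch maps to short descriptions; changing the value requires root, for example `sysctl -w kernel.randomize_va_space=0`. The names `ASLR_MODES` and `describe_aslr` are illustrative.

```python
from pathlib import Path

RANDOMIZE_VA_SPACE = Path("/proc/sys/kernel/randomize_va_space")

# Value meanings as listed in the kernel's sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "conservative: randomize stack, mmap base and VDSO",
    2: "full: conservative plus heap (brk) randomization",
}


def describe_aslr(value: int) -> str:
    """Describe a randomize_va_space value."""
    return ASLR_MODES.get(value, f"unknown value {value}")


if __name__ == "__main__":
    if RANDOMIZE_VA_SPACE.exists():
        value = int(RANDOMIZE_VA_SPACE.read_text())
        print(value, "->", describe_aslr(value))
```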

MALLOC_CONF

An environment variable set to tune the jemalloc allocation strategy during the execution of the binaries. This environment variable setting is not needed when building the binaries on the system under test.
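jemalloc reads MALLOC_CONF as a comma-separated list of "option:value" pairs. The sketch below formats such a string for a child process's environment. The options shown (`retain`, `narenas`) are real jemalloc options chosen purely as an example; they are not the values used for this result, which this page does not list.

```python
import os


def malloc_conf(options: dict) -> str:
    """Join jemalloc options into the "key:value,key:value" form of MALLOC_CONF."""
    def fmt(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    return ",".join(f"{key}:{fmt(value)}" for key, value in options.items())


if __name__ == "__main__":
    # Hypothetical option values, for illustration only.
    env = dict(os.environ, MALLOC_CONF=malloc_conf({"retain": True, "narenas": 4}))
    print(env["MALLOC_CONF"])
```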


Operating System Tuning Parameters

sched_cfs_bandwidth_slice_us
This OS setting controls the amount of run time (bandwidth) transferred to a run queue from the task's control group bandwidth pool. Small values allow the global bandwidth to be shared in a fine-grained manner among tasks; larger values reduce transfer overhead. The default value is 5000 (µs, i.e. 5 ms).
sched_latency_ns
This OS setting configures targeted preemption latency for CPU bound tasks. The default value is 24000000 (ns).
sched_migration_cost_ns
Amount of time after the last execution that a task is considered to be "cache hot" in migration decisions. A "hot" task is less likely to be migrated to another CPU, so increasing this variable reduces task migrations. The default value is 500000 (ns).
sched_min_granularity_ns
This OS setting controls the minimal preemption granularity for CPU-bound tasks. As the number of runnable tasks increases, CFS (Completely Fair Scheduler), the scheduler of the Linux kernel, decreases the timeslices of tasks. If the number of runnable tasks exceeds sched_latency_ns/sched_min_granularity_ns, the scheduling period becomes number_of_running_tasks * sched_min_granularity_ns, so timeslices stop shrinking at sched_min_granularity_ns. The default value is 8000000 (ns).
sched_wakeup_granularity_ns
This OS setting controls the wake-up preemption granularity. Increasing this variable reduces wake-up preemption, reducing disturbance of compute bound tasks. Lowering it improves wake-up latency and throughput for latency critical tasks, particularly when a short duty cycle load component must compete with CPU bound components. The default value is 10000000 (ns).
numa_balancing
This OS setting controls automatic NUMA balancing on memory mapping and process placement. Setting 0 disables this feature. It is enabled by default (1).
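The period rule described for sched_min_granularity_ns can be worked through numerically. This sketch uses the default values quoted above (sched_latency_ns = 24000000, sched_min_granularity_ns = 8000000) and assumes all runnable tasks have equal weight; it models the documented rule, not the kernel's full weighted calculation, and the function names are illustrative.

```python
def sched_period_ns(nr_running: int,
                    latency_ns: int = 24_000_000,
                    min_granularity_ns: int = 8_000_000) -> int:
    """Scheduling period: sched_latency_ns until the number of runnable tasks
    exceeds latency/min_granularity, then nr_running * min_granularity."""
    if nr_running > latency_ns / min_granularity_ns:
        return nr_running * min_granularity_ns
    return latency_ns


def timeslice_ns(nr_running: int, **kwargs) -> int:
    """Per-task timeslice, assuming every task has the same weight."""
    return sched_period_ns(nr_running, **kwargs) // nr_running


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 6):
        print(f"{n} tasks: period {sched_period_ns(n)} ns, slice {timeslice_ns(n)} ns")
```

With these defaults the threshold is 24000000 / 8000000 = 3 tasks: up to three tasks share a fixed 24 ms period, while six tasks stretch the period to 48 ms and each still receives 8 ms.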

Firmware / BIOS / Microcode Settings

Set Operating Mode: (Default="Maximum Efficiency")
Select the operating mode based on your preference. Note that power savings and performance are also highly dependent on the hardware and software running on the system.
Determinism Slider:
Auto = Use default performance determinism settings. Other settings: Power, Performance.
Global C-state Control:
Controls IO based C-state generation and DF C-states.
cTDP Control:
Auto = Use the fused cTDP.
Manual = User can set customized cTDP.
cTDP:
cTDP is the acronym for Configurable TDP. Some Rome CPU SKUs support a default TDP and a configurable cTDP range, expressed in watts (W):
Model        Normal TDP  Minimum cTDP  Maximum cTDP
EPYC 7H12    280         225           280
EPYC 7742    225         225           240
EPYC 7702    200         165           200
EPYC 7702P   200         165           200
EPYC 7662    225         225           240
EPYC 7642    225         225           240
EPYC 7502    180         165           200
EPYC 7502P   180         165           200
EPYC 7542    225         225           240
EPYC 7402    180         165           200
EPYC 7402P   180         165           200
EPYC 7302    155         155           180
EPYC 7302P   155         155           180
EPYC 7252    120         120           150
Memory Speed:
Select the desired memory speed. Faster speeds offer better performance but consume more power.
NUMA nodes per socket:
Specifies the number of desired NUMA nodes per socket. Zero will attempt to interleave the two sockets together.
Package Power Limit Control:
Auto = Use the fused PPT.
Manual = User can set customized PPT.
***PPT will be used as the ASIC power limit***
SMT Mode:
Can be used to disable simultaneous multithreading. To re-enable SMT, a POWER CYCLE is needed after selecting the 'Auto' option. WARNING - S3 is NOT SUPPORTED on systems where SMT is disabled.
CCD Control:
Sets the number of CCDs to be used. Once this option has been used to remove any CCDs, a POWER CYCLE is required in order for future selections to take effect.
EfficiencyModeEn:
0 = use performance-optimized CCLK DPM settings
1 = use power-efficiency-optimized CCLK DPM settings
LLC as NUMA Node:
Exposes the processor's last level caches (LLCs) as NUMA nodes. When enabled, this can improve performance for highly NUMA-optimized workloads if the workloads, or components of them, can be pinned into the caches.
Zero Output:
When Zero Output is set to 'Advanced mode' and multiple power supplies are installed in the server, some of the PSUs are automatically placed into a low-power state under light load conditions. This helps to save power.
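When cTDP Control is set to Manual, the value entered is expected to fall inside the SKU's range from the cTDP list above. A small lookup sketch using those figures; `CTDP_WATTS` and `ctdp_in_range` are hypothetical names. The EPYC 7272 used in this result does not appear in the list, so looking it up raises KeyError rather than guessing a range.

```python
# (normal TDP, minimum cTDP, maximum cTDP) in watts, from the cTDP list above.
CTDP_WATTS = {
    "EPYC 7H12": (280, 225, 280),
    "EPYC 7742": (225, 225, 240),
    "EPYC 7702": (200, 165, 200),
    "EPYC 7702P": (200, 165, 200),
    "EPYC 7662": (225, 225, 240),
    "EPYC 7642": (225, 225, 240),
    "EPYC 7502": (180, 165, 200),
    "EPYC 7502P": (180, 165, 200),
    "EPYC 7542": (225, 225, 240),
    "EPYC 7402": (180, 165, 200),
    "EPYC 7402P": (180, 165, 200),
    "EPYC 7302": (155, 155, 180),
    "EPYC 7302P": (155, 155, 180),
    "EPYC 7252": (120, 120, 150),
}


def ctdp_in_range(model: str, watts: int) -> bool:
    """True if a manual cTDP value lies within the model's listed range.

    Raises KeyError for models that are not in the list."""
    _, low, high = CTDP_WATTS[model]
    return low <= watts <= high


if __name__ == "__main__":
    print(ctdp_in_range("EPYC 7742", 240))  # upper end of the 7742 range
```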

Flag description origin markings:

[user] Indicates that the flag description came from the user flags file.
[suite] Indicates that the flag description came from the suite-wide flags file.
[benchmark] Indicates that the flag description came from a per-benchmark flags file.

The flags files that were used to format this result can be browsed at
http://www.spec.org/cpu2017/flags/aocc200-flags-B1-1.html,
http://www.spec.org/cpu2017/flags/Lenovo-Platform-SPECcpu2017-Flags-V1.2-Rome-E.html.

You can also download the XML flags sources by saving the following links:
http://www.spec.org/cpu2017/flags/aocc200-flags-B1-1.xml,
http://www.spec.org/cpu2017/flags/Lenovo-Platform-SPECcpu2017-Flags-V1.2-Rome-E.xml.


For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2020 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.1.0.
Report generated on 2020-03-31 14:58:04 by SPEC CPU2017 flags formatter v5178.