CPU2017 Flag Description
Nettrix R620 G40 (Intel Xeon Gold 6348, 2.60 GHz)

Copyright © 2016 Intel Corporation. All Rights Reserved.


Base Compiler Invocation

C benchmarks

C++ benchmarks

Fortran benchmarks


Peak Compiler Invocation

C benchmarks (except as noted below)

600.perlbench_s

C++ benchmarks

Fortran benchmarks


Base Portability Flags

600.perlbench_s

602.gcc_s

605.mcf_s

620.omnetpp_s

623.xalancbmk_s

625.x264_s

631.deepsjeng_s

641.leela_s

648.exchange2_s

657.xz_s


Peak Portability Flags

600.perlbench_s

602.gcc_s

605.mcf_s

620.omnetpp_s

623.xalancbmk_s

625.x264_s

631.deepsjeng_s

641.leela_s

648.exchange2_s

657.xz_s


Base Optimization Flags

C benchmarks

C++ benchmarks

Fortran benchmarks


Peak Optimization Flags

C benchmarks

600.perlbench_s

602.gcc_s

605.mcf_s

625.x264_s

657.xz_s

C++ benchmarks

620.omnetpp_s

623.xalancbmk_s

631.deepsjeng_s

641.leela_s

Fortran benchmarks

648.exchange2_s


Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.


Commands and Options Used to Submit Benchmark Runs

submit= MYMASK=`printf '0x%x' $((1<<$SPECCOPYNUM))`; /usr/bin/taskset $MYMASK $command
When running multiple copies of benchmarks, the SPEC config file feature submit is used to cause individual jobs to be bound to specific processors. This specific submit command, using taskset, is used for Linux64 systems without numactl.
Here is a brief guide to understanding the specific command which will be found in the config file:
submit= numactl --localalloc --physcpubind=$SPECCOPYNUM $command
When running multiple copies of benchmarks, the SPEC config file feature submit is used to cause individual jobs to be bound to specific processors. This specific submit command is used for Linux64 systems with support for numactl.
Here is a brief guide to understanding the specific command which will be found in the config file:

Shell, Environment, and Other Software Settings

numactl --interleave=all "runspec command"
Launching a process with numactl --interleave=all sets the memory interleave policy so that memory will be allocated using round robin on nodes. When memory cannot be allocated on the current interleave target fall back to other nodes.
KMP_STACKSIZE
Specify stack size to be allocated for each thread.
KMP_AFFINITY
Syntax: KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]
The value for the environment variable KMP_AFFINITY affects how the threads from an auto-parallelized program are scheduled across processors.
It applies to binaries built with -qopenmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows).
modifier:
    granularity=fine Causes each OpenMP thread to be bound to a single thread context.
type:
    compact Specifying compact assigns the OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where the <n> OpenMP thread was placed.
    scatter Specifying scatter distributes the threads as evenly as possible across the entire system.
permute: The permute specifier is an integer value controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance.
offset: The offset specifier indicates the starting position for thread assignment.

Please see the Thread Affinity Interface article in the Intel Composer XE Documentation for more details.

Example: KMP_AFFINITY=granularity=fine,scatter
Specifying granularity=fine selects the finest granularity level and causes each OpenMP or auto-par thread to be bound to a single thread context.
This ensures that there is only one thread per core on cores supporting HyperThreading Technology
Specifying scatter distributes the threads as evenly as possible across the entire system.
Hence a combination of these two options, will spread the threads evenly across sockets, with one thread per physical core.

Example: KMP_AFFINITY=compact,1,0
Specifying compact will assign the n+1 thread to a free thread context as close as possible to thread n.
A default granularity=core is implied if no granularity is explicitly specified.
Specifying 1,0 sets permute and offset values of the thread assignment.
With a permute value of 1, thread n+1 is assigned to a consecutive core. With an offset of 0, the process's first thread 0 will be assigned to thread 0.
The same behavior is exhibited in a multisocket system.
OMP_NUM_THREADS
Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -qopenmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows). Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8
Set stack size to unlimited
The command "ulimit -s unlimited" is used to set the stack size limit to unlimited.
Free the file system page cache
The command "echo 1> /proc/sys/vm/drop_caches" is used to free up the filesystem page cache.

Red Hat Specific features

Transparent Huge Pages
On RedHat EL 6 and later, Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead.
Hugepages are used by default unless the /sys/kernel/mm/redhat_transparent_hugepage/enabled field is changed from its RedHat EL6 default of 'always'.

Operating System Tuning Parameters

ulimit
Used to set user limits of system-wide resources. Provides control over resources available to the shell and processes started by it. Some common ulimit commands may include:
Disabling Linux services
Certain Linux services may be disabled to minimize tasks that may consume CPU cycles.
irqbalance
Disabled through "service irqbalance stop". Depending on the workload involved, the irqbalance service reassigns various IRQ's to system CPUs. Though this service might help in some situations, disabling it can also help environments which need to minimize or eliminate latency to more quickly respond to events.
cpupower tool
Use the cpupower tool to read your supported CPU frequencies, and to set them. For rhel linux, to install the tool run the command "yum install cpupowerutils". You can set the cpu frequency in three modes with the command "cpupower frequency-set -g [options]", where "[options]" could be: The default governor mode is "performance".
Tuning Kernel parameters
The following Linux Kernel parameters were tuned to better optimize performance of some areas of the system:
Transparent Huge Pages (THP)
THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. THP is designed to hide much of the complexity in using huge pages from system administrators and developers, as normal huge pages must be assigned at boot time, can be difficult to manage manually, and often require significant changes to code in order to be used effectively. Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead. Most recent Linux OS releases have THP enabled by default.
Linux Huge Page settings
If you need finer control and manually set the Huge Pages you can follow the below steps: For further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt

Firmware / BIOS / Microcode Settings

Application Performance Profile:
Application Performance Profile is designed for customers who need an easy way to optimize BIOS settings according to their application scenarios. The option could be set as "Disabled", "Computing Throughput Mode", "Computing Latency Mode", "Memory Bandwidth Mode", "Energy Efficient Mode", "Java Application Mode", and "High Reliability Mode". Default = "High Reliability Mode".
C-States:
C-states reduce CPU idle power. There are three options in this mode: "Legacy","Autonomous", and "Disable". Default is "Disable".
C1 Enhanced Mode:
Enabling C1E (C1 enhanced) state can save power by halting CPU cores that are idle. Default is "Enable".
Turbo Mode:
Enabling turbo mode can boost the overall CPU performance when all CPU cores are not being fully utilized. A CPU core can run above its rated frequency for a short perios of time when it is in turbo mode. Default is "Enable".
Hyper-Threading:
Enabling Hyper-Threading let operating system addresses two virtual or logical cores for a physical presented core. Workloads can be shared between virtual or logical cores when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline for using the processor resources more efficiently. Default is "Enabled".
Hardware P-states:
This setting allows the user to select between OS and hardware-controlled P-states. Selecting Native Mode allows the OS to choose a P-state. Selecting Out of Band Mode allows the hardware to autonomously choose a P-state without OS guidance. Selecting Native Mode with No Legacy Support functions as Native Mode with no support for older hardware. Default is "Disable".
Per Core P-state:
When per-core P-states are enabled, each physical CPU core can operate at separate frequencies. If disabled, all cores in a package will operate at the highest resolved frequency of all active threads. Default is Enable.
Sub-NUMA Cluster (SNC):
SNC breaks up the last level cache (LLC) into disjoint clusters based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves average latency to the LLC and memory. SNC is a replacement for the cluster on die (COD) feature found in previous processor families. For a multi-socketed system, all SNC clusters are mapped to unique NUMA domains. (See also IMC interleaving.) Values for this BIOS option can be: Default is "Disabled".
XPT Prefetch
XPT prefetch is a mechanism that enables a read request that is being sent to the last level cache to speculatively issue a copy of that read to the memory controller prefetching. This can be one of the following:
KTI Prefetch
KTI prefetch is a mechanism to get the memory read started early on a DDR bus. This can be one of the following: The default setting is "Disabled".
UPI Prefetcher
UPI prefetch is a mechanism to get the memroy read started early on DDR bus. The UPI receive path will spawn a memory read to the memory controller prefetcher. Default is Enabled.
Patrol Scrub:
Patrol Scrub is a memory RAS feature which runs a background memory scrub against all DIMMs. Can negatively impact performance. This option allows for correction of soft memory errors. Over the length of system runtime, the risk of producing multi-bit and uncorrected errors is reduced with this option. Values for this BIOS setting can be: Default is Enabled.
DCU Streamer Prefetcher:
DCU (Level 1 Data Cache) streamer prefetcher is an L1 data cache prefetcher. Lightly threaded applications and some benchmarks can benefit from having the DCU streamer prefetcher enabled. Default setting is Enable.
Hardware Prefethcer:
When this option is enable, a dedicated hardware mechanism in the processor is supported to watch the stream of instructions or data being requested by the executing program, recognize the next few elements that the program might need based on this stream and prefetch into the processor's cache. The program with good instruction and data locality will benefit from this feature when this option is enable. Default is enable.
Trusted Execution Technology:
Enable Intel Trusted Execution Technology (Intel TXT). Default is disable.
Page Policy:
Adaptive Open Page Policy can improve performance for applications with a highly localized memory access pattern; Closed Page Policy can benifit applications that access memory more randomly. The default is "Auto".
LLC Dead Line Allocation:
If the Dead Line LLC Alloc feature is disabled, dead lines will always be dropped and will never fill into the LLC. This can help save space in the LLC and prevent the LLC from evicting useful data. However, if the Dead Line LLC Alloc feature is enabled, the LLC can opportunistically fill dead lines into the LLC if there is free space available. The default is enable.
Stale AtoS
Stale AtoS is the transition of a directory line state. The inmemory directory has three states: I, A, and S. I (invalid) state means the data is clean and does not exist in any other socket's cache. The A (snoopAll) state means the data may exist in another socket in exclusive or modified state. S (Shared) state means the data is clean and may be shared across one or more socket's caches. When doing a read to memory, if the directory line is in the A state, we must snoop all the other sockets because another socket may have the line in modified state. If this is the case,the snoop will return the modified data. However, it may be the case that a line is read in A state and all the snoops come back a miss. This can happen if another socket read the line earlier and then silently dropped it from its cache without modifying it. Values for this BIOS option can be: Default is Auto.
Cooling Policy
The feature to configure "Cooling Policy" option is provided on BMC webpage. This option provides 4 choices: "Balance Mode", "Performance Mode", "Silent Mode" and "Manual Mode" and default is "Balance Mode".

Flag description origin markings:

[user] Indicates that the flag description came from the user flags file.
[suite] Indicates that the flag description came from the suite-wide flags file.
[benchmark] Indicates that the flag description came from a per-benchmark flags file.

The flags files that were used to format this result can be browsed at
http://www.spec.org/cpu2017/flags/Intel-ic2021-official-linux64_revA.html,
http://www.spec.org/cpu2017/flags/Nettrix-Platform-Settings-V4.0-ICX-revC.2022-07-19.html.

You can also download the XML flags sources by saving the following links:
http://www.spec.org/cpu2017/flags/Intel-ic2021-official-linux64_revA.xml,
http://www.spec.org/cpu2017/flags/Nettrix-Platform-Settings-V4.0-ICX-revC.2022-07-19.xml.


For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2022 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.1.8.
Report generated on 2022-07-19 15:24:51 by SPEC CPU2017 flags formatter v5178.