CPU2017 Flag Description
Supermicro SuperServer SYS-621H-TN12R (X13DEM, Intel Xeon Gold 5520+)
Copyright © 2016 Intel Corporation. All Rights Reserved.
Base Portability Flags
-
- -DSPEC_LP64
- PORTABILITY
-
This option is used to indicate that the host system's integers are 32 bits
wide, and longs and pointers are 64 bits wide. Not all benchmarks
recognize this macro, but the preferred practice for data model selection
applies the flags to all benchmarks; this flag description is a placeholder
for those benchmarks that do not recognize this macro.
Base Optimization Flags
- -xCORE-AVX512
- COPTIMIZE
-
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option to compile the main program
(in Fortran) or the function main() (in C/C++), the program will display a
fatal run-time error if it is executed on an unsupported processor.
-
- -O3
- COPTIMIZE
-
Enable O2 optimizations plus more aggressive optimizations,
such as prefetching, scalar replacement, and loop and memory
access transformations. Enable optimizations for maximum speed,
such as:
- Loop unrolling, including instruction scheduling
- Code replication to eliminate branches
- Padding the size of certain power-of-two arrays to allow
more efficient cache use.
On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.
- Includes:
- -xCORE-AVX512
- CXXOPTIMIZE
-
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option to compile the main program
(in Fortran) or the function main() (in C/C++), the program will display a
fatal run-time error if it is executed on an unsupported processor.
-
- -O3
- CXXOPTIMIZE
-
Enable O2 optimizations plus more aggressive optimizations,
such as prefetching, scalar replacement, and loop and memory
access transformations. Enable optimizations for maximum speed,
such as:
- Loop unrolling, including instruction scheduling
- Code replication to eliminate branches
- Padding the size of certain power-of-two arrays to allow
more efficient cache use.
On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.
- Includes:
- -xCORE-AVX512
- FOPTIMIZE
-
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions.
The resulting code may contain unconditional use of features that are not supported
on other processors. This option also enables new optimizations in addition to
Intel processor-specific optimizations including advanced data layout and code
restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that
is not an Intel processor. If you use this option to compile the main program
(in Fortran) or the function main() (in C/C++), the program will display a
fatal run-time error if it is executed on an unsupported processor.
-
- -O3
- FOPTIMIZE
-
Enable O2 optimizations plus more aggressive optimizations,
such as prefetching, scalar replacement, and loop and memory
access transformations. Enable optimizations for maximum speed,
such as:
- Loop unrolling, including instruction scheduling
- Code replication to eliminate branches
- Padding the size of certain power-of-two arrays to allow
more efficient cache use.
On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler
performs more aggressive data dependency analysis than for O2, which
may result in longer compilation times.
The O3 optimizations may not cause higher performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.
- Includes:
- -nostandard-realloc-lhs
- EXTRA_FOPTIMIZE
-
Option standard-realloc-lhs (the default) tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the
right-hand side of the assignment before the assignment occurs. This is the current Fortran Standard definition. This feature may cause extra overhead at run time. This option has
the same effect as option assume realloc_lhs.
If you specify nostandard-realloc-lhs, the compiler uses the old Fortran 2003 rules when interpreting assignment statements. The left-hand side is assumed to be allocated with the
correct shape to hold the right-hand side. If it is not, incorrect behavior will occur. This option has the same effect as option assume norealloc_lhs.
Implicitly Included Flags
This section contains descriptions of flags that were included implicitly
by other flags, but which do not have a permanent home at SPEC.
-
- -O2
-
Enable optimizations for speed. This is the generally recommended
optimization level. This option also enables:
- Inlining of intrinsics
- Intra-file interprocedural optimizations, which include:
- inlining
- constant propagation
- forward substitution
- routine attribute propagation
- variable address-taken analysis
- dead static function elimination
- removal of unreferenced variables
- The following capabilities for performance gain:
- constant propagation
- copy propagation
- dead-code elimination
- global register allocation
- global instruction scheduling and control speculation
- loop unrolling
- optimized code selection
- partial redundancy elimination
- strength reduction/induction variable simplification
- variable renaming
- exception handling optimizations
- tail recursions
- peephole optimizations
- structure assignment lowering and optimizations
- dead store elimination
- Includes:
-
- -O1
-
Enables optimizations for speed and disables some optimizations that increase code size and affect speed.
To limit code size, this option:
- Enables global optimization; this includes data-flow analysis,
code motion, strength reduction and test replacement, split-lifetime
analysis, and instruction scheduling.
- Disables intrinsic recognition and intrinsics inlining.
The O1 option may improve performance for applications with very large
code size, many branches, and execution time not dominated by code within loops.
-O1 sets the following options:
-funroll-loops0, -fno-builtin, -mno-ieee-fp, -fomit-framepointer, -ffunction-sections, -ftz
- Includes:
- submit= MYMASK=`printf '0x%x' $((1<<$SPECCOPYNUM))`; /usr/bin/taskset $MYMASK $command
- When running multiple copies of benchmarks, the SPEC config file feature submit is used to cause individual jobs to be bound to
specific processors. This specific submit command, using taskset, is used for Linux64 systems without numactl.
Here is a brief guide to understanding the specific command which will be found in the config file:
- /usr/bin/taskset [options] [mask] [pid | command [arg] ... ]:
taskset is used to set or retrieve the CPU affinity of a running
process given its PID or to launch a new COMMAND with a given CPU
affinity. The CPU affinity is represented as a bitmask, with the
lowest order bit corresponding to the first logical CPU and highest
order bit corresponding to the last logical CPU. When the taskset
returns, it is guaranteed that the given program has been scheduled
to a specific, legal CPU, as defined by the mask setting.
- [mask]: The bitmask (in hexadecimal) corresponding to a specific
SPECCOPYNUM. The example above computes this mask value in the variable $MYMASK.
The value of this mask for the first copy of a
rate run will be 0x00000001, for the second copy it will
be 0x00000002, and so on. Thus, the first copy of the rate run will have a
CPU affinity of CPU0, the second copy will have an affinity of CPU1,
and so on.
- $command: Program to be started, in this case, the benchmark instance to be started.
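The mask arithmetic in the submit line above can be tried on its own. The following is a minimal sketch; the helper name copy_mask is hypothetical and is not part of the SPEC tools. Note that printf emits the mask without leading zeros (0x1 rather than 0x00000001); both denote the same value.

```shell
# Hypothetical helper: hexadecimal affinity mask for a given copy number,
# using the same expression as the submit line (1 shifted left by the copy number).
copy_mask() {
    printf '0x%x' "$((1 << $1))"
}

# Show the mask that each of the first four copies would receive.
for copy in 0 1 2 3; do
    echo "copy $copy -> mask $(copy_mask "$copy")"
done
```

Each copy number sets exactly one bit, so each copy is bound to a single logical CPU.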
- submit= numactl --localalloc --physcpubind=$SPECCOPYNUM $command
- When running multiple copies of benchmarks, the SPEC config file feature submit is used to cause individual jobs to be bound to
specific processors. This specific submit command is used for Linux64 systems with support for numactl.
Here is a brief guide to understanding the specific command which will be found in the config file:
- syntax: numactl [--interleave=nodes] [--preferred=node] [--physcpubind=cpus] [--cpunodebind=nodes] [--membind=nodes] [--localalloc] command args ...
- numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a command and inherited by all of its children.
- "--localalloc" instructs numactl to keep a process's memory on the local node, while "-m" specifies the node(s) on which to place a process's memory.
- "--physcpubind" specifies the core(s) to which the process is bound. In this case, copy 0 is bound to processor 0, and so on.
- For full details on using numactl, please refer to your Linux documentation, 'man numactl'
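As a rough illustration of how the submit line expands for each copy, the sketch below only prints the command line that would be run (a dry run); it does not invoke numactl. The helper name submit_line and the program name ./benchmark_exe are hypothetical.

```shell
# Hypothetical helper: print the numactl command line for one copy (dry run).
# $1 is the copy number (SPECCOPYNUM); the remaining arguments are the command.
submit_line() {
    copy=$1
    shift
    echo "numactl --localalloc --physcpubind=$copy $*"
}

# ./benchmark_exe is a placeholder for the benchmark instance to be started.
submit_line 0 ./benchmark_exe
submit_line 1 ./benchmark_exe
```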
This result has been formatted using multiple flags files.
The "sw environment" from each of them appears next.
Sw environment from Intel-ic2023p2-official-linux64
SPEC CPU 2017 Flag Description for the Intel(R) C++ and Fortran Compiler
for IA32 and Intel 64 applications
- numactl --interleave=all "runspec command"
- Launching a process with numactl --interleave=all sets the memory interleave policy so that memory is allocated round-robin across nodes.
When memory cannot be allocated on the current interleave target, allocation falls back to other nodes.
- KMP_STACKSIZE
- Specify stack size to be allocated for each thread.
- KMP_AFFINITY
- Syntax: KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]
The value for the environment variable KMP_AFFINITY affects how the threads from an auto-parallelized program are scheduled across processors.
It applies to binaries built with -qopenmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows).
modifier:
granularity=fine Causes each OpenMP thread to be bound to a single thread context.
type:
compact Specifying compact assigns the OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where the <n> OpenMP thread was placed.
scatter Specifying scatter distributes the threads as evenly as possible across the entire system.
permute: The permute specifier is an integer value that controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance.
offset: The offset specifier indicates the starting position for thread assignment.
Please see the Thread Affinity Interface article in the Intel Composer XE Documentation for more details.
- Example: KMP_AFFINITY=granularity=fine,scatter
Specifying granularity=fine selects the finest granularity level and causes each OpenMP or auto-par thread to be bound to a single thread context.
This ensures that there is only one thread per core on cores supporting HyperThreading Technology.
Specifying scatter distributes the threads as evenly as possible across the entire system.
Hence a combination of these two options will spread the threads evenly across sockets, with one thread per physical core.
- Example: KMP_AFFINITY=compact,1,0
Specifying compact will assign the n+1 thread to a free thread context as close as possible to thread n.
A default granularity=core is implied if no granularity is explicitly specified.
Specifying 1,0 sets permute and offset values of the thread assignment.
With a permute value of 1, thread n+1 is assigned to a consecutive core. With an offset of 0, the process's first thread 0 will be assigned to thread 0.
The same behavior is exhibited in a multisocket system.
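A minimal sketch of setting the first example in a shell before launching a program. The verbose modifier is an addition not used in the examples above; it is assumed here only so that the OpenMP runtime reports the bindings it chose. The binary name is hypothetical and the launch line is left commented out.

```shell
# Assumption: "verbose" is added only so the runtime reports its thread bindings.
export KMP_AFFINITY=verbose,granularity=fine,scatter
echo "KMP_AFFINITY=$KMP_AFFINITY"

# ./my_openmp_app   # hypothetical binary built with -qopenmp
```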
- OMP_NUM_THREADS
- Sets the maximum number of threads to use for OpenMP* parallel regions if no
other value is specified in the application. This environment variable
applies to both -qopenmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows).
Example syntax on a Linux system with 8 cores:
export OMP_NUM_THREADS=8
- OMP_STACKSIZE
- The OMP_STACKSIZE environment variable controls the size of the stack for threads created by the OpenMP implementation
- Set stack size to unlimited
- The command "ulimit -s unlimited" is used to set the stack size limit to unlimited.
- Free the file system page cache
- The command "echo 3 > /proc/sys/vm/drop_caches" is used to free up the filesystem page cache as well as reclaimable slab objects like dentries and inodes.
- MALLOC_CONF
- Used for Jemalloc tuning at runtime. MALLOC_CONF=retain:true will retain unused virtual memory for later reuse rather than discarding it.
Red Hat Specific features
- Transparent Huge Pages
- On RedHat EL 6 and later, Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads.
If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead.
Hugepages are used by default unless the /sys/kernel/mm/redhat_transparent_hugepage/enabled field is changed from its RedHat EL6 default of 'always'.
Sw environment from Supermicro-Platform-Settings-V1.2-SPR-revG
SPEC CPU2017 Platform Settings for Supermicro Systems
Operating System Tuning settings
One or more of the following settings may have been applied to the testbed. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.
LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time and run-time linkers. Usually, it can be defaulted; but testers may sometimes choose to explicitly set it (as documented in the notes in the submission), in order to ensure that the correct versions of libraries are picked up.
STACKSIZE=<n>
Set the size of the stack (temporary storage area) for each slave thread of a multithreaded program.
ulimit -s <n>
Sets the stack size to n kbytes, or "unlimited" to allow the stack size to grow without limit.
- Transparent Hugepages (THP)
-
THP is an abstraction layer that automates most aspects of creating, managing,
and using huge pages. It is designed to hide much of the complexity in using
huge pages from system administrators and developers. Huge pages
increase the memory page size from 4 kilobytes to 2 megabytes. This provides
significant performance advantages on systems with highly contended resources
and large memory workloads. If memory utilization is too high or memory is badly
fragmented which prevents hugepages being allocated, the kernel will assign
smaller 4k pages instead. Most recent Linux OS releases have THP enabled by default.
THP usage is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/enabled.
Possible values:
- never: entirely disable THP usage.
- madvise: enable THP usage only inside regions marked MADV_HUGEPAGE using madvise(2).
- always: enable THP usage system-wide. This is the default.
THP creation is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/defrag.
Possible values:
- never: if no THP are available to satisfy a request, do not attempt to make any.
- defer: an allocation requesting THP when none are available gets normal pages while requesting THP creation in the background.
- defer+madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2); for all other regions it's like "defer".
- madvise: acts like "always", but only for allocations in regions marked MADV_HUGEPAGE using madvise(2). This is the default.
- always: an allocation requesting THP when none are available will stall until some are made.
An application that "always" requests THP often can benefit from waiting for an allocation until those huge pages can be assembled.
For more information see the Linux transparent hugepage documentation.
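The current THP settings can be inspected by reading the sysfs files named above. The sketch below assumes the usual kernel convention that the file lists all possible values and marks the active one in square brackets (for example "always [madvise] never"); the helper name thp_active is hypothetical.

```shell
# Hypothetical helper: extract the bracketed (active) value from a sysfs
# line such as "always [madvise] never".
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Report the active value of each setting if the file exists on this system.
for name in enabled defrag; do
    f=/sys/kernel/mm/transparent_hugepage/$name
    if [ -r "$f" ]; then
        echo "THP $name: $(thp_active "$(cat "$f")")"
    fi
done
```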
- CPUFreq scaling governor:
-
Governors are pre-configured, in-kernel power schemes for the CPU that allow the clock speed of the CPUs to be changed on the fly. On Linux systems, the governor can be set for all CPUs through the cpupower utility with the following command:
- "cpupower -c all frequency-set -g governor"
Below are governors in the Linux kernel:
- performance: Run the CPU at the maximum frequency.
- powersave: Run the CPU at the minimum frequency.
- userspace: Run the CPU at user specified frequencies.
- ondemand: Scales the frequency dynamically according to current load. Jumps to the highest frequency and then possibly backs off as the idle time increases.
- conservative: Scales the frequency dynamically according to current load. Scales the frequency more gradually than ondemand.
- schedutil: Scheduler-driven CPU frequency selection.
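To check which governor is currently active, the per-CPU cpufreq files in sysfs can be read. This is a sketch under the assumption that the system exposes cpufreq under /sys/devices/system/cpu; the helper name show_governors is hypothetical, and the cpupower line is left commented out because it changes system state and requires root.

```shell
# Hypothetical helper: print the current governor of every CPU found under
# the given sysfs root (defaults to the standard location).
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # skip if cpufreq is not exposed
        cpu=${f#"$root"/}                # e.g. cpu0/cpufreq/scaling_governor
        echo "${cpu%%/*}: $(cat "$f")"   # e.g. cpu0: performance
    done
}

show_governors

# To switch all CPUs to the performance governor (requires root):
# cpupower -c all frequency-set -g performance
```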
- tuned-adm:
-
A commandline interface for switching between different tuning profiles available in supported Linux distributions. The distribution provided profiles are located in /usr/lib/tuned and the user defined profiles in /etc/tuned. To set a profile, one can issue the command "tuned-adm profile (profile_name)".
Below are details about some relevant profiles:
- throughput-performance: For typical throughput performance tuning. Disables power saving mechanisms and enables sysctl settings that improve the throughput performance of disk and network I/O. CPU governor is set to performance and CPU energy performance bias is set to performance. Disk readahead values are increased.
- latency-performance: For low latency performance tuning. Disables power saving mechanisms. CPU governor is set to performance and locked to the low C states. CPU energy performance bias to performance.
- balanced: Default profile that provides balanced power saving and performance. It enables the CPU and disk plugins of tuned, makes the conservative governor active, and sets the CPU energy performance bias to normal. It also enables power saving on audio and graphics cards.
- powersave: Maximal power saving for the whole system. It sets the CPU governor to ondemand and the energy performance bias to powersave. It also enables power saving on USB, SATA, audio and graphics cards.
- drop_caches
-
Writing to this will cause the kernel to drop clean caches, as well as reclaimable slab objects like dentries and inodes. Once dropped, their memory becomes free.
Set through "sysctl -w vm.drop_caches=3" to free slab objects and pagecache.
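Because only clean caches are dropped, it is common to run sync first so that dirty pages are written back and become droppable. The following is a minimal sketch; the helper name drop_caches and its optional target argument are hypothetical, and writing to the real file requires root.

```shell
# Hypothetical helper: flush dirty pages, then request that pagecache and
# slab objects be dropped. The target path is a parameter for illustration.
drop_caches() {
    target=${1:-/proc/sys/vm/drop_caches}
    if [ -w "$target" ]; then
        sync                 # write back dirty pages so they can be dropped
        echo 3 > "$target"   # 3 = pagecache plus dentries and inodes
        echo "dropped"
    else
        echo "skipped: $target is not writable (root required)"
    fi
}

# Usage (as root):
# drop_caches
```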
- Hyper-Threading [ALL]: (Default="Enable")
-
Enabled for Windows and Linux (OS optimized for Hyper-Threading Technology) and Disabled for other OS (OS not optimized for Hyper-Threading Technology). When Disabled only one thread per enabled core is enabled.
- Intel Virtualization Technology: (Default = "Enable")
-
When enabled, a VMM can utilize the additional hardware capabilities provided by Vanderpool Technology.
- LLC Prefetch: (Default = "Disable")
-
The LLC prefetcher is an additional prefetch mechanism on top of the existing prefetchers that prefetch data into the core Data Cache Unit (DCU) and Mid-Level Cache (MLC). Enabling LLC prefetch gives the core prefetcher the ability to prefetch data directly into the LLC without necessarily filling into the MLC.
- DCU IP Prefetcher: (Default = "Enable")
-
This L1-cache prefetcher looks for sequential load history (based on the Instruction Pointer of previous loads) and attempts on this basis to determine the next data to be expected and, if necessary, to prefetch this data from the L2 cache or the main memory into the L1 cache.
- DCU Streamer Prefetcher: (Default = "Enable")
-
This prefetcher is a L1 data cache prefetcher, which detects multiple loads from the same cache line done within a time limit, in order to then prefetch the next line from the L2 cache or the main memory into the L1 cache based on the assumption that the next cache line will also be needed.
- Power Technology: (Default = "Custom")
-
The options are Disable, Energy Efficient, and Custom.
Switch processor power management features. If value "Custom" is set, Customer can define the values of all power management setup items.
Select Energy Efficient to support power-saving mode. Select Custom to customize system power settings. Select Disabled to disable power-saving settings.
- Power Performance Tuning: (Default = "OS Controls EPB")
-
Allows the OS or BIOS to control the Energy Performance Bias.
Available options are:
- OS Controls EPB: The Energy Performance Bias setting is controlled by the OS.
- BIOS Controls EPB: The Energy Performance Bias setting is controlled by the ENERGY_PERF_BIAS_CFG mode item in BIOS.
- ENERGY_PERF_BIAS_CFG mode (Energy Performance Bias Setting): (Default = "Balanced Performance")
-
This BIOS option allows for processor performance and power optimization.
Available options are:
- Extreme Performance: This mode will raise system performance to its highest potential. With Extreme Performance enabled, power consumption will increase as the processor frequency is maximized. In other words, system performance is gained at the cost of system power efficiency, depending on the workload.
- Maximum Performance: Get more performance with more power consumption than performance mode.
- Performance: High performance with less need for power saving.
- Balanced Performance (Default Setting): Provides optimal performance efficiency.
- Balanced Power: Provides optimal power efficiency.
- Power: High power saving with less need for performance.
- CPU C6 Report: (Default = "Auto")
-
Controls the BIOS to report the CPU C6 State (ACPI C3) to the operating system. During the CPU C6 State, the power to all cache is turned off.
Available options are:
- Enable: Enable BIOS to report the CPU C6 State (ACPI C3) to the operating system.
- Disable: Disable BIOS to report the CPU C6 State (ACPI C3) to the operating system.
- Auto: BIOS automatically decides whether to report the CPU C6 State (ACPI C3) to the operating system, depending on the Power Technology setting.
- Enhanced Halt State (C1E): (Default = "Enable")
-
Power saving feature where, when enabled, idle processor cores will halt.
- Hardware P-states: (Default = "Disable")
-
The Hardware P-State setting allows the user to select between OS and hardware-controlled P-states. Selecting Native Mode allows the OS to choose a P-state. Selecting Out of Band Mode allows the hardware to autonomously choose a P-state without OS guidance. Selecting Native Mode with No Legacy Support functions as Native Mode with no support for older hardware.
- SNC: (Default = "Auto")
-
Sub-NUMA Clusters (SNC) is a feature that provides similar localization benefits as Cluster-On-Die (COD), without some of COD's downsides. SNC breaks up the LLC into disjoint clusters based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves average latency to the LLC.
- Auto: Auto decides based on silicon prefer value.
- Disable: Disable SNC feature and there is one NUMA domain per socket.
- Enable SNC2 (2-clusters): SNC2 divides CPU into two clusters, there are two NUMA domains per socket.
- Enable SNC4 (4-clusters): SNC4 divides CPU into four clusters, there are four NUMA domains per socket.
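The resulting number of NUMA domains follows directly from the options above: sockets multiplied by clusters per socket. A small sketch of that arithmetic, using a hypothetical helper name and a hypothetical two-socket system:

```shell
# Hypothetical helper: total NUMA domains = sockets * clusters per socket.
# Use 1 cluster per socket when SNC is disabled, 2 for SNC2, 4 for SNC4.
numa_domains() {
    echo "$(( $1 * $2 ))"
}

echo "2 sockets, SNC disabled: $(numa_domains 2 1) NUMA domains"
echo "2 sockets, SNC2:         $(numa_domains 2 2) NUMA domains"
echo "2 sockets, SNC4:         $(numa_domains 2 4) NUMA domains"
```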
- KTI Prefetch: (Default = "Auto")
-
When KTI Prefetch is set to Enable, the Ultra-Path Interconnect (UPI) Prefetcher will allow the memory read from a remote socket to start earlier, in an effort to reduce latency. Available options are "Auto", "Disable" and "Enable".
- Stale AtoS: (Default = "Auto")
-
The in-memory directory has three states: I, A, and S. I (invalid) state means the data is clean and does not exist in any other socket's cache. The A (snoopAll) state means the data may exist in another socket in exclusive or modified state. S (Shared) state means the data is clean and may be shared across one or more socket's caches.
When doing a read to memory, if the directory line is in the A state we must snoop all the other sockets because another socket may have the line in modified state. If this is the case, the snoop will return the modified data. However, it may be the case that a line is read in A state and all the snoops come back a miss. This can happen if another socket read the line earlier and then silently dropped it from its cache without modifying it.
Available options are:
- Enable: In the situation where a line in A state returns only snoop misses, the line will transition to S state. That way, subsequent reads to the line will encounter it in S state and not have to snoop, saving latency and snoop bandwidth. Stale AtoS may be beneficial in a workload where there are many cross-socket reads.
- Disable: Disabling this option allows the feature to process memory directories as described above.
- Auto: This will enable Stale AtoS when an AEP DIMM is installed in the system and disable Stale AtoS when no AEP DIMM is installed.
- LLC Dead Line Alloc: (Default = "Enable")
-
In the Skylake-SP non-inclusive cache scheme, MLC evictions are filled into the LLC. When lines are evicted from the MLC, the core can flag them as "dead" (i.e., not likely to be read again). The LLC has the option to drop dead lines and not fill them in the LLC. If the LLC Dead Line Alloc feature is disabled, dead lines will always be dropped and will never fill into the LLC. This can help save space in the LLC and prevent the LLC from evicting useful data. However, if the LLC Dead Line Alloc feature is enabled, the LLC can opportunistically fill dead lines into the LLC if there is free space available. Available options are "Auto", "Enable" and "Disable".
- Enforce DDR Memory Frequency POR: (Default = "POR")
-
Set to POR to enforce Plan Of Record restrictions for DDR5 frequency and voltage programming. Memory speeds will be capped at Intel guidelines. Disabling allows user selection of additional supported memory speeds. Available options are "POR" and "Disable".
- Memory Frequency: (Default = "Auto")
-
Set the maximum memory frequency for onboard memory modules. Available options are "Auto", "3200", "3600", "4000", "4400", "4800", "5200", "5600".
- ADDDC Sparing: (Default = "Enabled")
-
Adaptive Double Device Data Correction (ADDDC) Sparing detects the predetermined threshold for correctable errors, copying the contents of the failing DIMM to spare memory. The failing DIMM or memory rank will then be disabled.
Available options are:
- Enabled: Enable the ADDDC Sparing feature.
- Disabled: Disable the ADDDC Sparing feature.
- Patrol Scrub: (Default = "Enable at End of POST")
-
Enable or disable the ability to proactively search the system memory, repairing correctable errors.
- Turbo Mode: (Default = "Enable")
-
Select Enable to allow the CPU to operate at the manufacturer-defined turbo speed by increasing CPU clock frequency. This feature is available when it is supported by the processors used in the system. The options are Disable and Enable.
- NUMA: (Default = "Enabled")
-
Use this feature to enable Non-Uniform Memory Access (NUMA) to enhance system performance. The options are Disabled and Enabled.
- UMA-Based Clustering: (Default = "Quadrant (4-clusters)")
-
When this feature is set to Hemisphere, Uniform Memory Access (UMA)-based clustering will support 2-cluster configuration for system performance enhancement. The options are Disabled (All2All), Hemisphere (2-clusters), and Quadrant (4-clusters).
- Enable Smart Power: (Default = "ON")
-
"Enable Smart Power" is a BMC setting for power capping. When it is set to "OFF", there is no power cap and the system's power consumption remains high. When it is set to "ON", users can set a power cap, and node power consumption does not exceed the power capping limit.
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2024 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.1.9.
Report generated on 2024-03-14 10:56:48 by SPEC CPU2017 flags formatter v5178.