CPU2017 Result Flag Description

Base Portability Flags

600.perlbench_s

- -DSPEC_LP64
- PORTABILITY
- This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
- -DSPEC_LINUX_X64
- CPORTABILITY
- This macro indicates that the benchmark is being compiled on an AMD64-compatible system running the Linux operating system.

602.gcc_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

605.mcf_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

620.omnetpp_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

623.xalancbmk_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
- -DSPEC_LINUX
- CXXPORTABILITY
- This flag can be set for SPEC compilation on Linux using the default compiler.

625.x264_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

631.deepsjeng_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

641.leela_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

648.exchange2_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

657.xz_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

Peak Portability Flags

600.perlbench_s

- -DSPEC_LP64
- PORTABILITY
- This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
- -DSPEC_LINUX_X64
- CPORTABILITY
- This macro indicates that the benchmark is being compiled on an AMD64-compatible system running the Linux operating system.

602.gcc_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

605.mcf_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

620.omnetpp_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

623.xalancbmk_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
- -DSPEC_LINUX
- CXXPORTABILITY
- This flag can be set for SPEC compilation on Linux using the default compiler.

625.x264_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

631.deepsjeng_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

641.leela_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

648.exchange2_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

657.xz_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

Base Optimization Flags

C benchmarks

- -DSPEC_OPENMP
- CC, COPTIMIZE, LD
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -std=c11
- intel_icc,intel_icx,intel_icpx
- CC, LD
- Sets the language dialect to conform to the indicated C standard.
- -m64
- intel_icc,intel_icpc,intel_ifort,intel_icx,intel_icpx
- CC, LD
- Compiles for a 64-bit (LP64) data model.
- -fiopenmp
- Yes
- CC, COPTIMIZE, LD
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions, if present.
- -xCORE-AVX512
- COPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
    - -funroll-loops
    - -fno-builtin
    - -mno-ieee-fp
    - -fomit-framepointer
    - -ffunction-sections
    - -ftz
- -ffast-math
- COPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -flto
- COPTIMIZE
- Performs link-time optimization, also known as interprocedural optimization.
- -mfpmath=sse
- COPTIMIZE
- Generates floating-point arithmetic for the selected unit. Here, scalar floating-point instructions from the SSE instruction set are used.
- -funroll-loops
- COPTIMIZE
- Tells the compiler the maximum number of times to unroll loops. For example, -funroll-loops0 would disable unrolling of loops.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -mbranches-within-32B-boundaries
- EXTRA_COPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.

C++ benchmarks

- -DSPEC_OPENMP
- CXX, LD
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -m64
- intel_icc,intel_icpc,intel_ifort,intel_icx,intel_icpx
- CXX, LD
- Compiles for a 64-bit (LP64) data model.
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions, if present.
- -xCORE-AVX512
- CXXOPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -O3
- CXXOPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
    - -funroll-loops
    - -fno-builtin
    - -mno-ieee-fp
    - -fomit-framepointer
    - -ffunction-sections
    - -ftz
- -ffast-math
- CXXOPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -flto
- CXXOPTIMIZE
- Performs link-time optimization, also known as interprocedural optimization.
- -mfpmath=sse
- CXXOPTIMIZE
- Generates floating-point arithmetic for the selected unit. Here, scalar floating-point instructions from the SSE instruction set are used.
- -funroll-loops
- CXXOPTIMIZE
- Tells the compiler the maximum number of times to unroll loops. For example, -funroll-loops0 would disable unrolling of loops.
- -qopt-mem-layout-trans=4
- CXXOPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -mbranches-within-32B-boundaries
- EXTRA_CXXOPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -L/opt/intel/oneapi/compiler/2021.1.1/linux/compiler/lib/intel64_lin/
- EXTRA_LIBS
- Build time link path for libraries supplied with the compiler (for example, the qkmalloc library).
- -lqkmalloc
- EXTRA_LIBS
- Linker toggle to specify qkmalloc linker library. See https://software.intel.com/en-us/articles/intel-c-compiler-190-for-linux-release-notes-for-intel-parallel-studio-xe-2019#custalloc for more information.

Fortran benchmarks

- -m64
- intel_icc,intel_icpc,intel_ifort,intel_icx,intel_icpx
- FC, LD
- Compiles for a 64-bit (LP64) data model.
- -xCORE-AVX512
- FOPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -O3
- FOPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
    - -funroll-loops
    - -fno-builtin
    - -mno-ieee-fp
    - -fomit-framepointer
    - -ffunction-sections
    - -ftz
- -ipo
- FOPTIMIZE
- Multi-file interprocedural (IP) optimizations that include:
  - inline function expansion
  - interprocedural constant propagation
  - dead code elimination
  - propagation of function characteristics
  - passing arguments in registers
  - loop-invariant code motion
- -no-prec-div
- FOPTIMIZE
- (disable/enable[default] -prec-div)
  -no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
- -qopt-mem-layout-trans=4
- FOPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -nostandard-realloc-lhs
- EXTRA_FOPTIMIZE
- Option standard-realloc-lhs (the default) tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the right-hand side of the assignment before the assignment occurs. This is the current Fortran Standard definition. This feature may cause extra overhead at run time. This option has the same effect as option assume realloc_lhs.
  
  If you specify nostandard-realloc-lhs, the compiler uses the old Fortran 2003 rules when interpreting assignment statements. The left-hand side is assumed to be allocated with the correct shape to hold the right-hand side. If it is not, incorrect behavior will occur. This option has the same effect as option assume norealloc_lhs.
- -align array32byte
- EXTRA_FOPTIMIZE
- The align toggle changes how data elements are aligned. Variables and arrays are analyzed and memory layout can be altered. Specifying array32byte will look for opportunities to transform and realign arrays to 32-byte boundaries.
- -auto
- EXTRA_FOPTIMIZE
- Makes all local variables AUTOMATIC. Same as -automatic.
- -mbranches-within-32B-boundaries
- EXTRA_FOPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.

Peak Optimization Flags

C benchmarks

600.perlbench_s

- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions, if present.
- -prof-gen
- PASS1_CFLAGS, PASS1_LDFLAGS
- Instruments the program for profiling for the first phase of two-phase profile-guided optimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.
  
  -profgen:threadsafe option collects profile guided optimization data with guards for threaded applications.
- -prof-use
- PASS2_CFLAGS, PASS2_LDFLAGS
- Instructs the compiler to produce a profile-optimized executable and merges available dynamic information (.dyn) files into a pgopti.dpi file. If you perform multiple executions of the instrumented program, -prof-use merges the dynamic information files again and overwrites the previous pgopti.dpi file.
  Without any other options, the current directory is searched for .dyn files.
- -xCORE-AVX512
- COPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -ipo
- COPTIMIZE
- Multi-file interprocedural (IP) optimizations that include:
  - inline function expansion
  - interprocedural constant propagation
  - dead code elimination
  - propagation of function characteristics
  - passing arguments in registers
  - loop-invariant code motion
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
    - -funroll-loops
    - -fno-builtin
    - -mno-ieee-fp
    - -fomit-framepointer
    - -ffunction-sections
    - -ftz
- -no-prec-div
- COPTIMIZE
- (disable/enable[default] -prec-div)
  -no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -fno-strict-overflow
- EXTRA_OPTIMIZE
- Tells the compiler to remove the assumption that source code follows C99 signed overflow rules.
- -mbranches-within-32B-boundaries
- EXTRA_COPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.

602.gcc_s

- -m64
- intel_icc,intel_icpc,intel_ifort,intel_icx,intel_icpx
- CC, LD
- Compiles for a 64-bit (LP64) data model.
- -std=c11
- intel_icc,intel_icx,intel_icpx
- CC, LD
- Sets the language dialect to conform to the indicated C standard.
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions, if present.
- -fprofile-generate
- PASS1_CFLAGS, PASS1_LDFLAGS
- Instruments the program for profiling for the first phase of two-phase profile-guided optimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.
- -fprofile-use=default.profdata
- PASS2_CFLAGS, PASS2_LDFLAGS
- Instructs the compiler to produce a profile-optimized executable and merges available dynamic information.
- -xCORE-AVX512
- COPTIMIZE, PASS1_CFLAGS, PASS1_LDFLAGS
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -flto
- COPTIMIZE, PASS1_CFLAGS, PASS1_LDFLAGS
- Performs link-time optimization, also known as interprocedural optimization.
- -Ofast
- PASS1_CFLAGS, PASS1_LDFLAGS
- Enable O3 optimizations plus more aggressive optimizations, such as -ffinite-math-only and -no-prec-div.
- Includes:
  - -O3
    - -O2
    - -O1
    - -funroll-loops
    - -fno-builtin
    - -mno-ieee-fp
    - -fomit-framepointer
    - -ffunction-sections
    - -ftz
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
    - -funroll-loops
    - -fno-builtin
    - -mno-ieee-fp
    - -fomit-framepointer
    - -ffunction-sections
    - -ftz
- -ffast-math
- COPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -mbranches-within-32B-boundaries
- EXTRA_COPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker option that links against the jemalloc library. See jemalloc.net for more information.
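
The -ffast-math entry above notes that the option suits programs that do not need exact IEEE or ISO behavior. One reason is that floating-point addition is not associative, so a compiler that is free to reorder a sum can change its result. The sketch below shows that effect in Python rather than in compiled C; the function names are illustrative only, and the snippet does not invoke any compiler or model any specific fast-math transformation.

```python
# Illustration only: reordering floating-point additions, a freedom that
# fast-math modes typically grant, can change the computed result.

def sum_in_given_order(values):
    """Add the values strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_reverse_order(values):
    """Add the same values right to left, as a reordering might."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


# 1.0 is smaller than the spacing between doubles near 1e16, so it is
# lost when it is added to 1e16 first.
values = [1.0, 1e16, -1e16]
strict = sum_in_given_order(values)
reordered = sum_in_reverse_order(values)
print(strict, reordered)
```

Both orderings are mathematically equal to 1, yet only one of them produces it. Whether such differences matter depends on the benchmark's validation tolerances.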

625.x264_s

- -DSPEC_OPENMP
- CC, LD
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -fiopenmp
- Yes
- CC, LD
- -std=c11
- intel_icc,intel_icx,intel_icpx
- CC, LD
- Sets the language dialect to conform to the indicated C standard.
- -m64
- intel_icc,intel_icpc,intel_ifort,intel_icx,intel_icpx
- CC, LD
- Compiles for a 64-bit (LP64) data model.
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
- -xCORE-AVX512
- COPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -flto
- COPTIMIZE
- Performs link-time optimization, also known as interprocedural optimization.
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
      - -funroll-loops
      - -fno-builtin
      - -mno-ieee-fp
      - -fomit-framepointer
      - -ffunction-sections
      - -ftz
- -ffast-math
- COPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -fno-alias
- EXTRA_OPTIMIZE
- This option tells the compiler to assume no aliasing in the program.
- -mbranches-within-32B-boundaries
- EXTRA_COPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker option that links against the jemalloc library. See jemalloc.net for more information.
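
The -mbranches-within-32B-boundaries entry above concerns where branch instructions fall relative to 32-byte boundaries. As a rough illustration of the address arithmetic involved, the Python sketch below checks whether an instruction that starts at a given byte offset and has a given length stays inside one aligned 32-byte window, and how much padding would move it to the next boundary. The function names and example offsets are hypothetical; the sketch does not reproduce what the compiler or assembler actually emits.

```python
BOUNDARY = 32  # window size in bytes, taken from the flag name


def stays_within_window(start: int, length: int, boundary: int = BOUNDARY) -> bool:
    """Return True if bytes [start, start + length) lie in one aligned window."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_window = start // boundary
    last_window = (start + length - 1) // boundary
    return first_window == last_window


def padding_to_next_boundary(start: int, boundary: int = BOUNDARY) -> int:
    """Return the padding bytes needed to move `start` onto an aligned boundary."""
    return (-start) % boundary


# A hypothetical 3-byte branch at offset 30 would straddle the boundary at 32.
print(stays_within_window(30, 3), padding_to_next_boundary(30))
```

Padding of this kind increases code size, which is one reason such alignment is a trade-off rather than a guaranteed gain.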

C++ benchmarks

Fortran benchmarks

648.exchange2_s

- basepeak = yes

Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.

Commands and Options Used to Submit Benchmark Runs

Shell, Environment, and Other Software Settings

Red Hat Specific features

Firmware / BIOS / Microcode Settings

One or more of the following settings may have been set. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.

This feature allows enabling or disabling of logical processor cores on processors supporting Intel Hyper-Threading (HT). When enabled, each physical processor core operates as two logical processor cores. When disabled, each physical core operates as only one logical processor core. Enabling this option can improve overall performance for applications that benefit from a higher processor core count.

When enabled, a hypervisor or operating system supporting this option can use hardware capabilities provided by Intel VT. Some hypervisors require that you enable Intel VT. You can leave this set to enabled even if you are not using a hypervisor or an operating system that uses this option. With default BIOS settings as shipped with most systems, the default state for this setting is Enabled. However, the default can change depending on the Workload Profile that is selected, or on which Workload Profile is the default for a given system.

If enabled, a hypervisor or operating system supporting this option can use hardware capabilities provided by Intel VT for Directed I/O. You can leave this set to enabled even if you are not using a hypervisor or an operating system that uses this option. With default BIOS settings as shipped with most systems, the default state for this setting is Enabled. However, the default can change depending on the Workload Profile that is selected, or on which Workload Profile is the default for a given system.

If enabled, x2APIC support enables the operating system to run more efficiently on high core count configurations. It also optimizes interrupt distribution in virtualized environments. Setting this option to enabled is recommended for most cases. When enabled, the operating system can optionally enable x2APIC support when it loads. Older hypervisors and operating systems might have issues with optional x2APIC support, so disabling x2APIC could be necessary to address these issues. Setting this option to enabled also forces Intel VT-d to be enabled.

If enabled, SR-IOV support enables a hypervisor to create virtual instances of a PCI-express device, potentially increasing performance. If enabled, the BIOS allocates additional resources to PCI-express devices. You can leave this option set to enabled even if you are not using a hypervisor. With default BIOS settings as shipped with most systems, the default state for this setting is Enabled. However, the default can change depending on the Workload Profile that is selected, or on which Workload Profile is the default for a given system.

This feature allows the user to select the fan cooling solution for the system. Values for this BIOS option can be:

In the Xeon Scalable processor cache scheme, mid-level cache (MLC) evictions are filled into the last level cache (LLC). If a line is evicted from the MLC to the LLC, the core can flag the evicted MLC lines as "dead". This means that the lines are not likely to be read again. This option allows dead lines to be dropped and never fill the LLC if the option is disabled. Values for this BIOS option can be:

Use this option to enable the Enhanced Processor Performance setting. When enabled, this option will adjust the processor settings to a more aggressive setting that can result in improved performance, but may result in higher power consumption. Values for this BIOS option can be either Disabled or Enabled.

Use this option to enable the Enhanced Processor Performance Profile setting. In order to set this option, the Enhanced Processor Performance option must be set to Enabled. This allows a user to choose between 3 profiles: conservative, moderate, and aggressive.

The in-memory directory has three states: invalid (I), snoopAll (A), and shared (S). Invalid (I) state means the data is clean and does not exist in any other socket's cache. The snoopAll (A) state means the data may exist in another socket in exclusive or modified state. Shared (S) state means the data is clean and may be shared across the caches of one or more sockets. When doing a read to memory, if the directory line is in the A state, all the other sockets must be snooped because another socket may have the line in modified state. If this is the case, the snoop will return the modified data. However, a line may be read in A state and all the snoops come back as misses. This can happen if another socket read the line earlier and then silently dropped it from its cache without modifying it. Values for this BIOS option can be:

Stale A to S may be beneficial in a workload where there are many cross-socket reads.
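
The read path described above can be summarized as a small state machine. The Python sketch below is a toy model: the class, function, and parameter names are invented for illustration, and the move from A to S after an all-miss snoop is an assumption about what the Stale A to S setting does, since the list of option values is not reproduced here.

```python
from enum import Enum


class DirState(Enum):
    INVALID = "I"    # clean, not present in any other socket's cache
    SNOOP_ALL = "A"  # may be exclusive or modified in another socket
    SHARED = "S"     # clean, may be shared by other sockets


def read_line(state, remote_has_modified, stale_a_to_s):
    """Model one memory read; return (snooped, data_source, next_state).

    remote_has_modified: hypothetical input, True if another socket holds
        the line in modified state.
    stale_a_to_s: hypothetical flag modelling the BIOS option (assumed behavior).
    Only the transition discussed in the text is modelled; other cases
    leave the directory state unchanged.
    """
    if state is not DirState.SNOOP_ALL:
        # I and S lines are clean in memory, so no snoop is needed.
        return False, "memory", state
    if remote_has_modified:
        # The snoop returns the modified data from the other socket.
        return True, "remote cache", state
    # Every snoop missed: the A state was stale.
    next_state = DirState.SHARED if stale_a_to_s else state
    return True, "memory", next_state
```

Under this assumption, a second read of the same line would find it in S and skip the snoop, which is consistent with the note that the setting may help workloads with many cross-socket reads.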

This option configures the processor Last Level Cache (LLC) prefetch feature as a result of the non-inclusive cache architecture. The LLC prefetcher exists on top of other prefetchers that can prefetch data in the core data cache unit (DCU) and mid-level cache (MLC). In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance. Values for this BIOS option can be:

This feature allows the user to configure how the BIOS reports the size of a NUMA node (number of logical processors), which assists the Operating System in grouping processors for application use (referred to as Kgroups). Values for this BIOS option can be:

Sub-NUMA Clustering (SNC) breaks up the last level cache (LLC) into disjoint clusters based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves average latency to the LLC and memory. SNC is a replacement for the cluster on die (COD) feature found in previous processor families. For a multi-socket system, all SNC clusters are mapped to unique NUMA domains. (See also IMC interleaving.) Values for this BIOS option can be:

This option configures the processor Xtended Prediction Table (XPT) prefetch feature. The XPT prefetcher exists on top of other prefetchers that can prefetch data in the core DCU, MLC, and LLC. The XPT prefetcher will issue a speculative DRAM read request in parallel to an LLC lookup. This prefetch bypasses the LLC, saving latency. In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance. This option must be enabled when Sub-NUMA Clustering is enabled. Values for this BIOS option can be:
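
The latency saving from issuing the DRAM read in parallel with the LLC lookup can be seen with simple arithmetic. The Python sketch below covers only the case where the LLC lookup misses, and the latencies are made-up values chosen to make the calculation visible; they are not measurements of any processor.

```python
def miss_latency_serial(llc_lookup_ns: float, dram_ns: float) -> float:
    """The DRAM read starts only after the LLC lookup reports a miss."""
    return llc_lookup_ns + dram_ns


def miss_latency_parallel(llc_lookup_ns: float, dram_ns: float) -> float:
    """The speculative DRAM read starts together with the LLC lookup."""
    return max(llc_lookup_ns, dram_ns)


# Hypothetical latencies in nanoseconds.
llc_lookup_ns = 20.0
dram_ns = 80.0
saved_ns = (miss_latency_serial(llc_lookup_ns, dram_ns)
            - miss_latency_parallel(llc_lookup_ns, dram_ns))
print(saved_ns)
```

In this simple model the saving equals the shorter of the two latencies. When the lookup hits, the speculative read was not needed, which may be one reason disabling the feature helps some workloads.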

This option enables or disables the Data Cache Unit (DCU) Stream prefetcher. If this option is set to enabled, when the DCU Stream prefetcher detects multiple loads from the same line within a time limit, it prefetches the next line into the L1 data cache. In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance. Only disable this option after performing application benchmarking to verify improved performance in your environment.
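
The trigger condition described above (several loads to one line within a time limit, followed by a prefetch of the next line) can be sketched as a toy model. In the Python sketch below, the line size, the load count that qualifies as "multiple", and the time window are all assumed values, and the function name is hypothetical; real hardware thresholds are not given in this document.

```python
LINE_BYTES = 64   # assumed cache line size
MIN_LOADS = 2     # assumed number of loads that counts as "multiple"
TIME_LIMIT = 10   # assumed window, in arbitrary time units


def next_line_prefetches(loads):
    """Given (time, address) loads in time order, return prefetched line addresses."""
    recent = {}      # line base address -> timestamps of recent loads to it
    prefetched = []
    for time, address in loads:
        line = address - (address % LINE_BYTES)
        # Keep only earlier loads to this line that are still inside the window.
        stamps = [t for t in recent.get(line, []) if time - t <= TIME_LIMIT]
        stamps.append(time)
        recent[line] = stamps
        next_line = line + LINE_BYTES
        if len(stamps) >= MIN_LOADS and next_line not in prefetched:
            prefetched.append(next_line)
    return prefetched


# Two loads to line 0 close together trigger a prefetch of line 64.
print(next_line_prefetches([(0, 0), (3, 8)]))
```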

This option controls the frequency scaling of the processor's internal buses (the uncore). Values for this BIOS option can be:

This option allows a user to choose one workload profile that best fits the user's needs. The workload profiles control many power and performance settings that are relevant to general workload areas. Values for this BIOS option can be:

This option can be manually configured if the Power Profile is set to Custom. The default value is associated with the default value of the Workload Profile - General Power Efficient Compute. If the Workload Profile changes, the default value of this setting may change. Values for this BIOS setting can be:

This option can only be configured if the Workload Profile is set to Custom, or this option is not a dependent value for the Workload Profile. This feature selects the processor's lowest idle power state (C-state) that the operating system uses. The higher the C-state, the lower the power usage of that idle state (C6 is the lowest power idle state supported by the processor). Values for this setting can be:

This option can only be configured if the Workload Profile is set to Custom, or this option is not a dependent value for the Workload Profile. This feature selects the processor's lowest idle package power state (C-state) that is enabled. The processor will automatically transition into the package C-states based on the Core C-states into which the cores on the processor have transitioned. The higher the package C-state, the lower the power usage of that idle package state. (Package C6 (retention) is the lowest power idle package state supported by the processor.) Values for this setting can be:

This BIOS option allows the enabling/disabling of the Processor Clocking Control (PCC) Interface. This option can be manually configured if the Power Profile is set to Custom. The default value is associated with the default value of the Workload Profile - General Power Efficient Compute. If the Workload Profile changes, the default value of this setting may change.

For operating systems which support this feature, enabling this option allows the Operating System to request processor frequency changes even when the server has the Power Regulator option configured for Dynamic Power Savings Mode.

For Operating Systems that do not support the PCC Interface or when the Power Regulator Mode is not configured for Dynamic Power Savings Mode, this option has no impact on system operation.

This option can only be configured if the Workload Profile is set to Custom, or this option is not a dependent value for the Workload Profile. This option configures several processor subsystems to optimize the processor's performance and power usage. Values for this BIOS setting can be:

This option controls whether the processor uses an energy efficiency based policy when engaging turbo range frequencies. This option is only applicable when Turbo Mode is enabled. Values for this BIOS setting can be: Enabled or Disabled.

This option allows the AHS PCI Logging size to be changed. This is a boot time option that should have no effect on run time performance. Values for this BIOS setting can be:

This option allows for correction of soft memory errors. Over the length of system runtime, the risk of producing multi-bit and uncorrected errors is reduced with this option. Values for this BIOS setting can be:

Use this option to disable the processor HW Prefetch feature. In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance. Only disable this option after performing application benchmarking to verify improved performance in the environment. The HW Prefetcher fetches streams of data and instructions from memory into the second-level (L2) cache if it determines that this data is likely to be required in the near future. The prefetcher is capable of handling multiple streams in either the forward or backward direction. The HW Prefetcher is triggered when successive cache misses occur in the last-level cache and a stride in the access pattern is detected, such as in the case of loop iterations that access array elements. The prefetching occurs up to a page boundary. This option can reduce the latency associated with memory reads. Values for this BIOS setting can be enabled or disabled.
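
Stride detection with a page-boundary limit, as described above, can be illustrated with a toy model. The Python sketch below treats a stride as detected when the last three miss addresses are equally spaced, handles both forward and backward strides, and stops at the edge of the page holding the latest miss. The page size, the three-miss rule, the prefetch depth, and the function name are assumptions made for illustration, not documented hardware behavior.

```python
PAGE_BYTES = 4096  # assumed page size


def stride_prefetch_candidates(miss_addresses, depth=4):
    """Return addresses a simple stride prefetcher might fetch next."""
    if len(miss_addresses) < 3:
        return []
    a, b, c = miss_addresses[-3:]
    stride = c - b
    if stride == 0 or b - a != stride:
        return []  # no constant, non-zero stride in the last three misses
    page_start = c - (c % PAGE_BYTES)
    page_end = page_start + PAGE_BYTES
    candidates = []
    address = c + stride
    # Follow the stride, forward or backward, without leaving the page.
    while len(candidates) < depth and page_start <= address < page_end:
        candidates.append(address)
        address += stride
    return candidates


# A loop walking an array in 64-byte steps.
print(stride_prefetch_candidates([0, 64, 128]))
```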

Use this option to disable the processor Adjacent Sector Prefetch feature. In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance. Only disable this option after performing application benchmarking to verify improved performance in the environment. The Adjacent Sector Prefetcher retrieves both sectors of a cache line when it requires data that isn't currently in the cache. When disabled, the processor will only fetch the sector of the cache line that includes the requested data. Values for this BIOS setting can be enabled or disabled.
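
The difference between the enabled and disabled settings above can be expressed as address arithmetic. The Python sketch below assumes 64-byte sectors grouped into 128-byte aligned pairs, which this document does not state, and uses a hypothetical function name; it lists which sectors would be fetched for a miss at a given address.

```python
SECTOR_BYTES = 64  # assumed sector size; two sectors form one 128-byte aligned pair


def sectors_fetched(address: int, adjacent_prefetch_enabled: bool):
    """Return the sector base addresses fetched for a miss at `address`."""
    sector = address - (address % SECTOR_BYTES)
    if not adjacent_prefetch_enabled:
        return [sector]
    # Flip the address bit that selects which half of the pair is meant.
    buddy = sector ^ SECTOR_BYTES
    return sorted([sector, buddy])


# A miss at byte 70 falls in the sector at 64; its partner is the sector at 0.
print(sectors_fetched(70, True), sectors_fetched(70, False))
```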

This option configures the processor Xtended Prediction Table (XPT) prefetch feature. The XPT prefetcher exists on top of other prefetchers that can prefetch data in the core DCU, MLC, and LLC. The XPT prefetcher will issue a speculative DRAM read request in parallel to an LLC lookup. This prefetch bypasses the LLC, saving latency. In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance. This option must be enabled when Sub-NUMA Clustering is enabled. Values for this BIOS option can be:

Use this option to configure the UPI topology to use fewer links between processors, when available. Changing from the default can reduce UPI bandwidth performance in exchange for less power consumption. Values for this BIOS setting can be: Auto and Single Link Operation.

Use this option to place the Ultra Path Interconnect (UPI) links into a low power state when the links are not being used. This lowers power usage with minimal effect on performance. You can only configure this option if two or more CPUs are present and the Workload Profile is set to Custom. Values for this BIOS setting can be: enabled and disabled.

Use this option to set the UPI Link frequency to a lower speed. Running at a lower frequency can reduce power consumption, but can also affect system performance. You can only configure this option if two or more CPUs are present and the Workload Profile is set to Custom. Values for this BIOS setting can be: Auto and Min UPI Speed.

Allows for enabling/disabling of this feature. This option can have an effect on reducing LLC miss latency. Values for this BIOS setting can be:

Use this option to configure additional memory protection with ECC (Error Checking and Correcting). Options and support vary per system. When the memory configuration supports the Fault Tolerant Memory (ADDDC) mode and the Workload Profile setting is other than Low Latency and Custom, Advanced Memory Protection is automatically changed to Fault Tolerant Memory (ADDDC) mode.

Intel Speed Select Technology - Base Frequency support is available only on select processor models. Processors with Prioritized Base Frequency support a higher base frequency for a select number of cores (high priority cores) while the remaining cores will have a lower base frequency (low priority cores). Enabling this setting will result in increasing the CPU base frequency for the high priority cores and decreasing the CPU base frequency for the low priority cores. Consult processor documentation for more information on priority core counts and frequency adjustments. Values for this BIOS setting can be:

For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2023 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.1.8.
Report generated on 2023-03-02 11:18:57 by SPEC CPU2017 flags formatter v5178.

	Indicates that the flag description came from the user flags file.
	Indicates that the flag description came from the suite-wide flags file.
	Indicates that the flag description came from a per-benchmark flags file.

CPU2017 Flag Description
NEC Corporation Express5800/R120i-2M (Intel Xeon Platinum 8358)

Base Compiler Invocation

Peak Compiler Invocation

Base Portability Flags

Peak Portability Flags

Base Optimization Flags

Peak Optimization Flags

Implicitly Included Flags

Red Hat Specific features
