CPU2017 Result Flag Description

Base Portability Flags

600.perlbench_s

- -DSPEC_LP64
- PORTABILITY
- This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
- -DSPEC_LINUX_X64
- CPORTABILITY
- This macro indicates that the benchmark is being compiled on an AMD64-compatible system running the Linux operating system.

602.gcc_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
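
The widths named in the -DSPEC_LP64 description can be checked with a short Python sketch. The function names `classify_data_model` and `host_data_model` are hypothetical, introduced only for this illustration; the model names other than LP64 are standard terminology and do not come from this flag file.

```python
import ctypes


def classify_data_model(int_bits, long_bits, pointer_bits):
    """Name the C data model for the given type widths (in bits)."""
    models = {
        (32, 32, 32): "ILP32",
        (32, 32, 64): "LLP64",
        (32, 64, 64): "LP64",   # the model that -DSPEC_LP64 declares
        (64, 64, 64): "ILP64",
    }
    return models.get((int_bits, long_bits, pointer_bits), "unknown")


def host_data_model():
    """Classify the platform this Python interpreter was built for."""
    widths = [8 * ctypes.sizeof(t)
              for t in (ctypes.c_int, ctypes.c_long, ctypes.c_void_p)]
    return classify_data_model(*widths)


if __name__ == "__main__":
    # On a 64-bit Linux system this is expected to report LP64.
    print(host_data_model())
```

The check reflects the interpreter's own build, so it is only a rough indication of what a C compiler on the same host would use by default.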

605.mcf_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

620.omnetpp_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

623.xalancbmk_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
- -DSPEC_LINUX
- CXXPORTABILITY
- This flag can be set for SPEC compilation on Linux using the default compiler.

625.x264_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

631.deepsjeng_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

641.leela_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

648.exchange2_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

657.xz_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

Peak Portability Flags

600.perlbench_s

- -DSPEC_LP64
- PORTABILITY
- This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
- -DSPEC_LINUX_X64
- CPORTABILITY
- This macro indicates that the benchmark is being compiled on an AMD64-compatible system running the Linux operating system.

602.gcc_s

- -DSPEC_LP64
- CC, LD
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

605.mcf_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

620.omnetpp_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

623.xalancbmk_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
- -DSPEC_LINUX
- CXXPORTABILITY
- This flag can be set for SPEC compilation on Linux using the default compiler.

625.x264_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

631.deepsjeng_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

641.leela_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

648.exchange2_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

657.xz_s

- -DSPEC_LP64
- PORTABILITY
- This option indicates that the host system's integers are 32 bits wide and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.

Some portability flags were found in optimization variables.

Base Optimization Flags

C benchmarks

- -m64
- intel_icc,intel_icpc,intel_ifort
- CC, LD
- Compiles for a 64-bit (LP64) data model.
- -qnextgen
- CC, LD
- Invokes the next-generation code generator of the Intel C++ compiler.
- -std=c11
- intel_icc
- CC, LD
- Sets the language dialect to conform to the indicated C standard.
- -Wl,-plugin-opt=-x86-branches-within-32B-boundaries
- LDFLAGS
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present.
- -xCORE-AVX512
- COPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
      
      -funroll-loops
      
      -fno-builtin
      
      -mno-ieee-fp
      
      -fomit-framepointer
      
      -ffunction-sections
      
      -ftz
- -ffast-math
- COPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -flto
- COPTIMIZE
- Performs link-time optimization, also known as interprocedural optimization.
- -mfpmath=sse
- COPTIMIZE
- Generates floating-point arithmetic for the selected unit; here, the scalar floating-point instructions present in the SSE instruction set are used.
- -funroll-loops
- COPTIMIZE
- Tells the compiler the maximum number of times to unroll loops. For example, -funroll-loops0 would disable unrolling of loops.
- -fuse-ld=gold
- COPTIMIZE
- Instructs the compiler to use the GNU gold linker (gold) instead of the system linker when linking object files.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -fopenmp
- Yes
- COPTIMIZE
- -DSPEC_OPENMP
- COPTIMIZE
- Definition of this macro indicates that compilation for parallel operation is enabled, and that any OpenMP directives or pragmas will be visible to the compiler. The behavior of this macro is overridden if -DSPEC_SUPPRESS_OPENMP also appears in the list of compilation flags.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker option that specifies the jemalloc library. See jemalloc.net for more information.
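
The precedence rule stated in the -DSPEC_OPENMP description above (the macro is overridden when -DSPEC_SUPPRESS_OPENMP also appears) can be modeled with a small Python sketch. The helper name `openmp_enabled` is hypothetical, and the sketch models only the rule as worded here; how the benchmark sources actually test these macros is not shown in this document.

```python
def openmp_enabled(flags):
    """Return True if the flag list leaves OpenMP directives visible.

    Implements only the rule from the flag description: SPEC_OPENMP
    enables parallel compilation unless SPEC_SUPPRESS_OPENMP is also
    defined somewhere in the same list of compilation flags.
    """
    # Collect macro names from -DNAME and -DNAME=value style flags.
    defined = {f[2:].split("=", 1)[0] for f in flags if f.startswith("-D")}
    return "SPEC_OPENMP" in defined and "SPEC_SUPPRESS_OPENMP" not in defined


if __name__ == "__main__":
    print(openmp_enabled(["-fopenmp", "-DSPEC_OPENMP"]))
    print(openmp_enabled(["-DSPEC_OPENMP", "-DSPEC_SUPPRESS_OPENMP"]))
```

The order of the two macros in the list does not matter in this model, which matches the wording "also appears in the list of compilation flags".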

C++ benchmarks

- -m64
- intel_icc,intel_icpc,intel_ifort
- CXX, LD
- Compiles for a 64-bit (LP64) data model.
- -qnextgen
- CXX, LD
- Invokes the next-generation code generator of the Intel C++ compiler.
- -Wl,-plugin-opt=-x86-branches-within-32B-boundaries
- LDFLAGS
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present.
- -xCORE-AVX512
- CXXOPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -O3
- CXXOPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
      
      -funroll-loops
      
      -fno-builtin
      
      -mno-ieee-fp
      
      -fomit-framepointer
      
      -ffunction-sections
      
      -ftz
- -ffast-math
- CXXOPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -flto
- CXXOPTIMIZE
- Performs link-time optimization, also known as interprocedural optimization.
- -mfpmath=sse
- CXXOPTIMIZE
- Generates floating-point arithmetic for the selected unit; here, the scalar floating-point instructions present in the SSE instruction set are used.
- -funroll-loops
- CXXOPTIMIZE
- Tells the compiler the maximum number of times to unroll loops. For example, -funroll-loops0 would disable unrolling of loops.
- -fuse-ld=gold
- CXXOPTIMIZE
- Instructs the compiler to use the GNU gold linker (gold) instead of the system linker when linking object files.
- -qopt-mem-layout-trans=4
- CXXOPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -L/usr/local/IntelCompiler19/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin
- EXTRA_LIBS
- Build time link path for libraries supplied with the compiler (for example, the qkmalloc library).
- -lqkmalloc
- EXTRA_LIBS
- Linker option that specifies the qkmalloc library. See https://software.intel.com/en-us/articles/intel-c-compiler-190-for-linux-release-notes-for-intel-parallel-studio-xe-2019#custalloc for more information.
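
The -ffast-math entry above refers to giving up exact IEEE/ISO guarantees. One commonly cited consequence is that a compiler may reorder floating-point additions, which are not associative. The Python sketch below reorders a sum by hand to show the effect; it is an illustration under that assumption, not a description of what this compiler does for any given program, and the function names are hypothetical.

```python
def sum_left_to_right(values):
    """Add the values in source order: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_right_to_left(values):
    """Add the same values in reverse order, as a reordering might."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


values = [0.1, 0.2, 0.3]
strict = sum_left_to_right(values)
reordered = sum_right_to_left(values)

if __name__ == "__main__":
    # The two results differ in the last bit of the double.
    print(strict, reordered, strict == reordered)
```

Both results are valid roundings of the same mathematical sum; programs that compare floating-point output bit for bit are the ones affected.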

Fortran benchmarks

- -m64
- intel_icc,intel_icpc,intel_ifort
- FC, LD
- Compiles for a 64-bit (LP64) data model.
- -Wl,-plugin-opt=-x86-branches-within-32B-boundaries
- LDFLAGS
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -xCORE-AVX512
- FOPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -O3
- FOPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
      
      -funroll-loops
      
      -fno-builtin
      
      -mno-ieee-fp
      
      -fomit-framepointer
      
      -ffunction-sections
      
      -ftz
- -ipo
- FOPTIMIZE
- Multi-file interprocedural (IP) optimizations that include:
  - inline function expansion
  - interprocedural constant propagation
  - dead code elimination
  - propagation of function characteristics
  - passing arguments in registers
  - loop-invariant code motion
- -no-prec-div
- FOPTIMIZE
- (disable/enable[default] -prec-div)
  -no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
- -qopt-mem-layout-trans=4
- FOPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -nostandard-realloc-lhs
- EXTRA_FOPTIMIZE
- The standard-realloc-lhs option (the default) tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the right-hand side of the assignment before the assignment occurs. This is the current Fortran Standard definition. This feature may cause extra overhead at run time. This option has the same effect as option assume realloc_lhs.
  
  If you specify nostandard-realloc-lhs, the compiler uses the old Fortran 2003 rules when interpreting assignment statements. The left-hand side is assumed to be allocated with the correct shape to hold the right-hand side. If it is not, incorrect behavior will occur. This option has the same effect as option assume norealloc_lhs.
- -align array32byte
- EXTRA_FOPTIMIZE
- The align option changes how data elements are aligned. Variables and arrays are analyzed and memory layout can be altered. Specifying array32byte looks for opportunities to transform and realign arrays to 32-byte boundaries.
- -mbranches-within-32B-boundaries
- EXTRA_FOPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
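
Both -align array32byte and -mbranches-within-32B-boundaries above are stated in terms of 32-byte boundaries. The Python sketch below shows the underlying address arithmetic only: rounding an address up to a 32-byte boundary, and testing whether a run of bytes straddles one. The function names and sample addresses are hypothetical, and the exact placement rules the compiler applies are not described in this document.

```python
BOUNDARY = 32  # bytes; must be a power of two for the mask trick below


def align_up(address, boundary=BOUNDARY):
    """Round address up to the next multiple of boundary."""
    return (address + boundary - 1) & ~(boundary - 1)


def crosses_boundary(address, length, boundary=BOUNDARY):
    """True if bytes [address, address + length) span a boundary multiple."""
    first_block = address // boundary
    last_block = (address + length - 1) // boundary
    return first_block != last_block


if __name__ == "__main__":
    # An array starting at 0x1001 would be moved to 0x1020.
    print(hex(align_up(0x1001)))
    # A 4-byte instruction at 0x101E occupies 0x101E..0x1021 and straddles 0x1020.
    print(crosses_boundary(0x101E, 4))
```

Padding inserted to satisfy either constraint increases code or data size slightly, which is the usual cost of alignment options.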

Peak Optimization Flags

C benchmarks

600.perlbench_s

- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present.
- -prof-gen
- PASS1_CFLAGS, PASS1_LDFLAGS
- Instruments the program for profiling during the first phase of two-phase profile-guided optimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations that are unique to profile-feedback optimization.
  
  -profgen:threadsafe option collects profile guided optimization data with guards for threaded applications.
- -prof-use
- PASS2_CFLAGS, PASS2_LDFLAGS
- Instructs the compiler to produce a profile-optimized executable and merges available dynamic information (.dyn) files into a pgopti.dpi file. If you perform multiple executions of the instrumented program, -prof-use merges the dynamic information files again and overwrites the previous pgopti.dpi file.
  Without any other options, the current directory is searched for .dyn files.
- -xCORE-AVX512
- COPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -ipo
- COPTIMIZE
- Multi-file interprocedural (IP) optimizations that include:
  - inline function expansion
  - interprocedural constant propagation
  - dead code elimination
  - propagation of function characteristics
  - passing arguments in registers
  - loop-invariant code motion
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
      
      -funroll-loops
      
      -fno-builtin
      
      -mno-ieee-fp
      
      -fomit-framepointer
      
      -ffunction-sections
      
      -ftz
- -no-prec-div
- COPTIMIZE
- (disable/enable[default] -prec-div)
  -no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -fno-strict-overflow
- EXTRA_OPTIMIZE
- Tells the compiler to remove the assumption that source code follows C99 signed overflow rules.
- -mbranches-within-32B-boundaries
- EXTRA_COPTIMIZE
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker option that specifies the jemalloc library. See jemalloc.net for more information.
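
The -no-prec-div entry for this benchmark describes computing A/B as A * (1/B). The Python sketch below performs that rewrite by hand on IEEE double-precision values to show that the two forms can disagree. It is an illustration of the arithmetic only; the function names are hypothetical, and whether the compiler applies the transformation to any particular division is not something this sketch can show.

```python
def divide(a, b):
    """Full-precision division, as with the default -prec-div."""
    return a / b


def divide_by_reciprocal(a, b):
    """The rewrite described for -no-prec-div: A * (1/B)."""
    return a * (1.0 / b)


# Divide each small integer by itself; exact division always gives 1.0.
mismatches = [
    n for n in range(1, 101)
    if divide(float(n), float(n)) != divide_by_reciprocal(float(n), float(n))
]

if __name__ == "__main__":
    # Lists the values of n for which n * (1/n) is not exactly 1.0.
    print(mismatches)
```

The reciprocal 1/B is rounded once and the product is rounded again, so the result can be off by one unit in the last place, which matches the "slightly less precise" wording in the flag description.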

602.gcc_s

- -m64
- intel_icc,intel_icpc,intel_ifort
- CC, LD
- Compiles for a 64-bit (LP64) data model.
- -qnextgen
- CC, LD
- Invokes the next-generation code generator of the Intel C++ compiler.
- -std=c11
- intel_icc
- CC, LD
- Sets the language dialect to conform to the indicated C standard.
- -fuse-ld=gold
- CC, LD
- Instructs the compiler to use the GNU gold linker (gold) instead of the system linker when linking object files.
- -Wl,-plugin-opt=-x86-branches-within-32B-boundaries
- LDFLAGS
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present.
- -fprofile-generate
- PASS1_CFLAGS, PASS1_LDFLAGS
- Instruments the program for profiling during the first phase of two-phase profile-guided optimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations that are unique to profile-feedback optimization.
- -fprofile-use=default.profdata
- PASS2_CFLAGS, PASS2_LDFLAGS
- Instructs the compiler to produce a profile-optimized executable and merges available dynamic information.
- -xCORE-AVX512
- COPTIMIZE, PASS1_CFLAGS, PASS1_LDFLAGS
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -flto
- COPTIMIZE, PASS1_CFLAGS, PASS1_LDFLAGS
- Performs link-time optimization, also known as interprocedural optimization.
- -Ofast
- PASS1_CFLAGS, PASS1_LDFLAGS
- Enable O3 optimizations plus more aggressive optimizations, such as -ffinite-math-only and -no-prec-div.
- Includes:
  - -O3
    - -O2
      
      -O1
      
      -funroll-loops
      
      -fno-builtin
      
      -mno-ieee-fp
      
      -fomit-framepointer
      
      -ffunction-sections
      
      -ftz
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
      - -funroll-loops
      - -fno-builtin
      - -mno-ieee-fp
      - -fomit-framepointer
      - -ffunction-sections
      - -ftz
- -ffast-math
- COPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library that was built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker option that links against the jemalloc library. See jemalloc.net for more information.
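
The nested "Includes" lists above mean that one flag pulls in others. A minimal Python sketch of that expansion follows. The `IMPLIES` table and the `expand` helper are hypothetical names introduced only for illustration, and the parent/child nesting is one reading of the -Ofast list as printed, not a statement of what the compiler does internally.

```python
# Hypothetical table transcribed from the "Includes" list under -Ofast.
# The exact nesting (which flag is the parent of which) is an assumption.
IMPLIES = {
    "-Ofast": ["-O3"],
    "-O3": ["-O2"],
    "-O2": [
        "-O1",
        "-funroll-loops",
        "-fno-builtin",
        "-mno-ieee-fp",
        "-fomit-framepointer",
        "-ffunction-sections",
        "-ftz",
    ],
}


def expand(flag, table=IMPLIES):
    """Return the flag followed by everything it includes, depth first, no repeats."""
    seen = []
    stack = [flag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push children reversed so they are visited in listed order.
        stack.extend(reversed(table.get(current, [])))
    return seen


if __name__ == "__main__":
    print(" ".join(expand("-Ofast")))
```

A flag with no entry in the table, such as -ftz, expands to itself only.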

625.x264_s

- -m64
- intel_icc,intel_icpc,intel_ifort
- CC, LD
- Compiles for a 64-bit (LP64) data model.
- -qnextgen
- CC, LD
- Invokes the Intel C++ compiler's next-generation code generator.
- -std=c11
- intel_icc
- CC, LD
- Sets the language dialect to conform to the indicated C standard.
- -Wl,-plugin-opt=-x86-branches-within-32B-boundaries
- LDFLAGS
- This option instructs the compiler to align branches and fused branches on 32-byte boundaries.
- -Wl,-z,muldefs
- EXTRA_LDFLAGS
- Enables the use of SmartHeap and/or other libraries by forcing the linker to ignore multiple definitions, if present.
- -xCORE-AVX512
- COPTIMIZE
- Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -flto
- COPTIMIZE
- Performs link-time optimization, also known as interprocedural optimization.
- -O3
- COPTIMIZE
- Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
- Includes:
  - -O2
    - -O1
      - -funroll-loops
      - -fno-builtin
      - -mno-ieee-fp
      - -fomit-framepointer
      - -ffunction-sections
      - -ftz
- -ffast-math
- COPTIMIZE
- Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
- -fuse-ld=gold
- COPTIMIZE
- Instructs the compiler to use the GNU gold linker (gold) instead of the system linker when linking object files.
- -qopt-mem-layout-trans=4
- COPTIMIZE
- Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
  - 0: Disables memory layout transformations. This is the same as specifying -qno-opt-mem-layout-trans
  - 1: Enable basic memory layout transformations like structure splitting, structure peeling, field inlining, field reordering, array field transpose, increase field alignment etc.
  - 2: Enable more memory layout transformations like advanced structure splitting. This is the same as specifying -qopt-mem-layout-trans
  - 3: Enable more memory layout transformations like copy-in/copy-out of structures for a region of code. You should only use this setting if your system has more than 4GB of physical memory per core.
  - 4: Compiler is more aggressive in using memory layout transformations. You should only use this setting if your system has more than 4GB of physical memory per core.
- -fno-alias
- EXTRA_OPTIMIZE
- This option tells the compiler to assume no aliasing in the program.
- -L/usr/local/jemalloc64-5.0.1/lib
- EXTRA_LIBS
- Specifies the build-time link path for the 64-bit jemalloc library that was built to support the CPU 2017 build. See jemalloc.net for more information.
- -ljemalloc
- EXTRA_LIBS
- Linker option that links against the jemalloc library. See jemalloc.net for more information.
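
The -ffast-math description above notes that the option suits programs that do not need exact IEEE behavior. One reason is that floating-point addition is not associative, so regrouping an expression can change its result. The sketch below is only an illustration in Python, which always evaluates in the written order. The function names are hypothetical, and the second function merely imitates a regrouping that a fast-math mode is permitted to make; it is not a claim about what any particular compiler emits.

```python
def strict_order(big, small):
    """Evaluate (big + small) - big exactly as written."""
    return (big + small) - big


def regrouped(big, small):
    """Evaluate the algebraically equal form (big - big) + small."""
    return (big - big) + small


if __name__ == "__main__":
    big, small = 1e17, 1.0
    # Doubles near 1e17 are spaced 16 apart, so adding 1.0 is lost to rounding.
    print(strict_order(big, small))  # 0.0
    print(regrouped(big, small))     # 1.0
```

Both forms are equal in real arithmetic, but only the second preserves the small term in double precision.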

C++ benchmarks

Fortran benchmarks

648.exchange2_s

- basepeak = yes

Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.

Commands and Options Used to Submit Benchmark Runs

Shell, Environment, and Other Software Settings

Red Hat Specific features

Operating System Tuning Parameters

ulimit: Used to set user limits of system-wide resources. Provides control over resources available to the shell and processes started by it. Some common ulimit commands may include:
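
The limits that ulimit controls can also be inspected from a program. The following is a minimal sketch, assuming a Linux or other Unix host where Python's standard `resource` module is available. The helper names `describe` and `stack_limits` are hypothetical, and the stack limit is chosen only as an example resource.

```python
import resource


def describe(limit):
    """Render a limit value as text, using 'unlimited' when there is no cap."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def stack_limits():
    """Return the (soft, hard) stack size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print("stack soft:", describe(soft), "hard:", describe(hard))
```

Note that `getrlimit` reports the stack limit in bytes, whereas the shell's `ulimit -s` reports it in kilobytes.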

Transparent Hugepages (THP) is an abstraction layer that automates most aspects of creating, managing, and using huge pages. THP is designed to hide much of the complexity in using huge pages from system administrators and developers, as normal huge pages must be assigned at boot time, can be difficult to manage manually, and often require significant changes to code in order to be used effectively. Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high, or memory is so badly fragmented that huge pages cannot be allocated, the kernel will assign smaller 4 KB pages instead. Most recent Linux OS releases have THP enabled by default.
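
To make the page-size change concrete, the sketch below counts how many pages are needed to map the same amount of memory at each size. The 4 KB and 2 MB sizes come from the paragraph above. The 1 GiB working set and the helper name `pages_needed` are assumptions chosen only for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE_PAGE = 4 * KIB  # default page size named in the text
HUGE_PAGE = 2 * MIB  # Transparent Hugepage size named in the text


def pages_needed(nbytes, page_size):
    """Number of pages required to map nbytes, rounding up to whole pages."""
    return -(-nbytes // page_size)


if __name__ == "__main__":
    working_set = 1 * GIB  # hypothetical working set
    print(pages_needed(working_set, BASE_PAGE))  # 262144
    print(pages_needed(working_set, HUGE_PAGE))  # 512
```

Each 2 MB page covers the same range as 512 pages of 4 KB, so far fewer page mappings are needed for the same memory.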

If you need finer control and want to set huge pages manually, you can follow the steps below:

Note that further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt
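
When huge pages are reserved manually, the current state is commonly checked through the huge page fields of /proc/meminfo. The sketch below parses text in that format. The `SAMPLE` values are hypothetical and do not describe the tested system, and the helper names are introduced only for illustration.

```python
# Hypothetical excerpt in /proc/meminfo format; values are illustrative only.
SAMPLE = """\
MemTotal:       394805248 kB
HugePages_Total:     128
HugePages_Free:      100
Hugepagesize:       2048 kB
"""


def parse_hugepage_fields(meminfo_text):
    """Extract fields whose names start with 'HugePage' (case-insensitive)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens and name.lower().startswith("hugepage"):
            # The first token is the number; a trailing "kB" unit is ignored.
            fields[name] = int(tokens[0])
    return fields


def reserved_bytes(fields):
    """Total memory set aside for huge pages, in bytes."""
    return fields["HugePages_Total"] * fields["Hugepagesize"] * 1024


if __name__ == "__main__":
    info = parse_hugepage_fields(SAMPLE)
    print(info)
    print(reserved_bytes(info))
```

On a live Linux system the same parser could be fed the contents of /proc/meminfo instead of the sample string.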

Firmware / BIOS / Microcode Settings

This feature allows enabling or disabling of logical processor cores on processors supporting Intel Hyper-Threading (HT). When enabled, each physical processor core operates as two logical processor cores. When disabled, each physical core operates as only one logical processor core. Enabling this option can improve overall performance for applications that benefit from a higher processor core count.
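
Whether each physical core currently presents one or two logical cores can be checked on Linux from the sibling list of a CPU, for example /sys/devices/system/cpu/cpu0/topology/thread_siblings_list. The sketch below parses the list format used by that file. The helper names and the example strings are assumptions for illustration and do not describe the tested system.

```python
def parse_cpulist(text):
    """Expand a Linux CPU list string such as '0,28' or '0-1' into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def threads_per_core(siblings_text):
    """Count the logical CPUs that share one physical core."""
    return len(parse_cpulist(siblings_text))


if __name__ == "__main__":
    print(threads_per_core("0,28\n"))  # 2: two logical cores share the core
    print(threads_per_core("5\n"))     # 1: one logical core per physical core
```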

When enabled, a hypervisor or operating system supporting this option can use hardware capabilities provided by Intel VT. Some hypervisors require that you enable Intel VT. You can leave this set to enabled even if you are not using a hypervisor or an operating system that uses this option. With default BIOS settings as shipped with most systems, the default state for this setting is Enabled. However, the default can change depending on the Workload Profile that is selected, or on which Workload Profile is the default for a given system.

If enabled, a hypervisor or operating system supporting this option can use hardware capabilities provided by Intel VT for Directed I/O. You can leave this set to enabled even if you are not using a hypervisor or an operating system that uses this option. With default BIOS settings as shipped with most systems, the default state for this setting is Enabled. However, the default can change depending on the Workload Profile that is selected, or on which Workload Profile is the default for a given system.

In the Skylake cache scheme, mid-level cache (MLC) evictions are filled into the last level cache (LLC). If a line is evicted from the MLC to the LLC, the Skylake core can flag the evicted MLC lines as "dead". This means that the lines are not likely to be read again. This option allows dead lines to be dropped and never fill the LLC if the option is disabled. Values for this BIOS option can be:

This option configures the processor Last Level Cache (LLC) prefetch feature as a result of the non-inclusive cache architecture. The LLC prefetcher exists on top of other prefetchers that can prefetch data in the core data cache unit (DCU) and mid-level cache (MLC). In some cases, setting this option to disabled can improve performance. Typically, setting this option to enabled provides better performance. Values for this BIOS option can be:

SNC breaks up the last level cache (LLC) into disjoint clusters based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves average latency to the LLC and memory. SNC is a replacement for the cluster on die (COD) feature found in previous processor families. For a multi-socketed system, all SNC clusters are mapped to unique NUMA domains. (See also IMC interleaving.) Values for this BIOS option can be:

Configure your own power and performance settings under Custom or adopt quick setting profiles.

For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2020 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.1.0.
Report generated on 2020-09-16 10:27:13 by SPEC CPU2017 flags formatter v5178.


CPU2017 Flag Description
Hitachi Vantara Advanced Server DS220 (Intel Xeon Gold 6258R, 2.70 GHz)
