Note: The GNU Compiler Collection provides a wide array of compiler options, described in detail and readily available at https://gcc.gnu.org/onlinedocs/gcc/Option-Index.html#Option-Index and https://gcc.gnu.org/onlinedocs/gfortran/. This SPEC CPU flags file contains excerpts from and brief summaries of portions of that documentation.
SPEC's modifications are:
Copyright (C) 2006-2020 Standard Performance Evaluation Corporation
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being "Funding Free Software", the Front-Cover Texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in your SPEC CPU kit at $SPEC/Docs/licenses/FDL.v1.3.txt and on the web at https://www.spec.org/cpu2017/Docs/licenses/FDL.v1.3.txt. A copy of "Funding Free Software" is on your SPEC CPU kit at $SPEC/Docs/licenses/FundingFreeSW.txt and on the web at https://www.spec.org/cpu2017/Docs/licenses/FundingFreeSW.txt.
(a) The FSF's Front-Cover Text is:
A GNU Manual
(b) The FSF's Back-Cover Text is:
You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development.
Invokes the GNU C compiler.
Invokes the GNU C++ compiler.
Invokes the GNU Fortran compiler.
Invokes the GNU C compiler.
Invokes the GNU C++ compiler.
Invokes the GNU Fortran compiler.
This macro indicates that the benchmark is being compiled on an ARM system running the Linux operating system in the AArch64 execution environment.
This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This flag can be set for SPEC compilation on Linux using the default compiler.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This macro indicates that the benchmark is being compiled on an ARM system running the Linux operating system in the AArch64 execution environment.
This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This flag can be set for SPEC compilation on Linux using the default compiler.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
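The difference between the two data models can be checked from C with sizeof. The following is a minimal sketch, assuming 8-bit bytes; the helper names is_lp64 and is_ilp32 are hypothetical and are not part of GCC or the SPEC CPU kit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the compiler's data model matches LP64,
   i.e. 32-bit int with 64-bit long and 64-bit pointers.
   Assumes 8-bit bytes, so sizeof counts of 4 and 8 mean 32 and 64 bits. */
bool is_lp64(void)
{
    return sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8;
}

/* Hypothetical helper: true for ILP32, where int, long, and pointers
   are all 32 bits wide. */
bool is_ilp32(void)
{
    return sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4;
}
```

On a target built for lp64, is_lp64() is expected to return true and is_ilp32() false; the two can never both be true.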
Sets the language dialect to include syntax from the C99 standard, such as bool and other features used in CPU 2017 benchmarks.
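As an illustration of the kind of syntax the C99 dialect accepts, the sketch below uses bool from <stdbool.h>, a declaration inside the for statement, and // comments. The function count_true is a hypothetical example and is not taken from any CPU 2017 benchmark.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical example: counts the true entries in flags[0..n-1].
// The bool type and the // comment style both come from C99.
size_t count_true(const bool *flags, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {  // C99 allows declaring i here
        if (flags[i]) {
            count++;
        }
    }
    return count;
}
```

A compiler restricted to the older C90 dialect would typically reject the loop-scoped declaration of i.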
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Produce debugging information.
Increases optimization levels: the higher the number, the more optimization is done. Higher levels of optimization may
require additional compilation time, in the hopes of reducing execution time. At -O, basic optimizations are performed,
such as constant merging and elimination of dead code. At -O2, additional optimizations are added, such as common
subexpression elimination and strict aliasing. At -O3, even more optimizations are performed, such as function inlining and
vectorization.
Many more details are available.
On aarch64 systems, mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, mcpu is a deprecated synonym for mtune.
On SPARC systems, mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
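To make the transformation concrete, the sketch below shows by hand roughly what unrolling by a factor of four does to a loop whose trip count is known on entry. It is only an illustration: the function names are hypothetical, and the unroll factor and code shape that GCC actually chooses are not specified here.

```c
#include <stddef.h>

/* Straightforward loop: one addition and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Hand-unrolled by 4 (an assumed factor, chosen for illustration):
   the main loop handles four elements per loop test, and a short
   remainder loop handles the last n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Both functions return the same sum; the unrolled form trades larger code for fewer loop-control instructions per element.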
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
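The arithmetic behind a percentage growth parameter can be sketched as follows. The function max_unit_size is a hypothetical helper written for this illustration; it models only the percentage calculation and not how GCC measures unit size or excludes cold functions.

```c
/* Hypothetical helper: the largest unit size permitted when growth is
   capped at growth_percent percent of the original size.
   Sizes are in arbitrary units; integer division truncates. */
unsigned long max_unit_size(unsigned long original_size,
                            unsigned growth_percent)
{
    return original_size + original_size * growth_percent / 100;
}
```

For example, max_unit_size(1000, 20) yields 1200, which is 1.2 times the original size.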
Tells GCC to use the GNU semantics for "inline" functions, that is, the behavior prior to the C99 standard. This switch may resolve duplicate symbol errors, as noted in the 502.gcc_r benchmark description.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the 1998 ISO C++ standard plus the 2003 technical corrigendum.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Produce debugging information.
Increases optimization levels: the higher the number, the more optimization is done. Higher levels of optimization may
require additional compilation time, in the hopes of reducing execution time. At -O, basic optimizations are performed,
such as constant merging and elimination of dead code. At -O2, additional optimizations are added, such as common
subexpression elimination and strict aliasing. At -O3, even more optimizations are performed, such as function inlining and
vectorization.
Many more details are available.
On aarch64 systems, mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, mcpu is a deprecated synonym for mtune.
On SPARC systems, mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
Assume that a loop with an exit will eventually take the exit and not loop indefinitely. This allows the compiler to remove loops that otherwise have no side-effects, not considering eventual endless looping as such.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Produce debugging information.
Increases optimization levels: the higher the number, the more optimization is done. Higher levels of optimization may
require additional compilation time, in the hopes of reducing execution time. At -O, basic optimizations are performed,
such as constant merging and elimination of dead code. At -O2, additional optimizations are added, such as common
subexpression elimination and strict aliasing. At -O3, even more optimizations are performed, such as function inlining and
vectorization.
Many more details are available.
On aarch64 systems, mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, mcpu is a deprecated synonym for mtune.
On SPARC systems, mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
IPA-CP calculates its own score of cloning profitability heuristics and performs those cloning opportunities with scores that exceed ipa-cp-eval-threshold.
Specifies maximal overall growth of the compilation unit caused by interprocedural constant propagation. For example, parameter value 10 limits unit growth to 1.1 times the original size.
Maximum depth of recursive cloning for a self-recursive function.
-finline-functions-called-once, which is implied by -O1, considers all "static" functions called once for inlining into their caller even if they are not marked "inline". If a call to a given function is integrated, then the function is not output as assembler code in its own right.
-fno-inline-functions-called-once inhibits this optimization.
Enabled: Put all local arrays, even those of unknown size, onto stack memory.
The -fno- form disables the behavior.
Specify the partitioning algorithm used by the link-time optimizer. The value is either 1to1 to specify a partitioning mirroring the original source files or balanced to specify partitioning into equally sized chunks (whenever possible) or max to create a new partition for every symbol where possible. Specifying none as an algorithm disables partitioning and streaming completely. The default value is balanced. While 1to1 can be used as a workaround for various code ordering issues, the max partitioning is intended for internal testing only. The value one specifies that exactly one partition should be used while the value none bypasses partitioning and executes the link-time optimization step directly from the WPA phase.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the C99 standard, such as bool and other features used in CPU 2017 benchmarks.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Instruments code to collect information for profile-driven feedback. Information is collected regarding both code paths and data values.
Applies information from a profile run in order to improve optimization. Several optimizations are improved when profile data is available, including branch probabilities, loop peeling, and loop unrolling.
Produce debugging information.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
On aarch64 systems, mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, mcpu is a deprecated synonym for mtune.
On SPARC systems, mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
The language standards set aliasing requirements: programmers are expected to follow conventions so that the compiler can keep track of memory. If a program violates the requirements (for example, using pointer arithmetic), programs may crash, or (worse) wrong answers may be silently produced.
Unfortunately, the aliasing requirements from the standards are not always well understood.
Sometimes, the aliasing requirements are understood and nevertheless intentionally violated by smart programmers who know what they are doing, such as the programmer responsible for the inner workings of Perl storage allocation and variable handling.
The -fno-strict-aliasing switch instructs the optimizer that it must not assume that the aliasing requirements from the standard are met by the current program. You will probably need it for 500.perlbench_r and 600.perlbench_s. Note that this is an optimization switch, not a portability switch. When running SPECint2017_rate_base or SPECint2017_speed_base, you must use the same optimization switches for all the C modules in base; see https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags and https://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
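A common aliasing violation is reading an object through a pointer to an unrelated type. The sketch below shows the violating form only in a comment and gives a well-defined alternative using memcpy. The function name float_bits_memcpy is hypothetical, and the example assumes that float is a 32-bit IEEE 754 value; it is not taken from perlbench.

```c
#include <stdint.h>
#include <string.h>

/* Violating form (shown for illustration, not compiled):
 *
 *     uint32_t bits = *(uint32_t *)&f;   // reads a float as a uint32_t
 *
 * Under strict aliasing the optimizer may assume that a uint32_t access
 * never touches a float object, so code like this can misbehave unless
 * it is built with -fno-strict-aliasing.
 */

/* Well-defined alternative: copy the object representation byte by byte.
   Assumes sizeof(float) == sizeof(uint32_t), i.e. a 32-bit float. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under the IEEE 754 assumption, float_bits_memcpy(1.0f) returns 0x3F800000 regardless of whether strict aliasing is enabled.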
The switch -funsafe-math-optimizations allows the compiler to make certain(*) aggressive assumptions, such as disregarding the programmer's intended order of operations. The run rules allow such re-ordering https://www.spec.org/cpu2017/Docs/runrules.html#reordering. The rules also point out that you must get answers that pass SPEC's validation requirements. In some cases, that will mean that some optimizations must be turned off.
-fno-unsafe-math-optimizations turns off these(*) optimizations. You may need to use this flag in order to get certain benchmarks to validate. Note that this is an optimization switch, not a portability switch. If it is needed, then in base you will need to use it consistently. See: https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags and https://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
(*) Much more detail about which optimizations are affected is available in the GCC documentation.
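Why the order of operations matters can be seen with three doubles. Floating-point addition is not associative, so regrouping a sum can change the result. The sketch below is an illustration with hypothetical function names; it assumes IEEE 754 double arithmetic and a build without value-changing math optimizations.

```c
/* Evaluates (a + b) + c, the grouping written in the source. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c), a regrouping that an optimizer permitted to
   reassociate floating-point operations might choose instead. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, and c = 1.0, sum_left returns 1.0 while sum_right returns 0.0, because 1.0 is lost when it is added to -1e20 first. A benchmark whose validation depends on such low-order digits may need the reordering turned off.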
-ffinite-math-only, which is implied by -ffast-math and -Ofast, allows optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs. Setting -fno-finite-math-only does the opposite: the compiler must prepare for the possible presence of NaNs and infinities.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the C99 standard, such as bool and other features used in CPU 2017 benchmarks.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Instruments code to collect information for profile-driven feedback. Information is collected regarding both code paths and data values.
Applies information from a profile run in order to improve optimization. Several optimizations are improved when profile data is available, including branch probabilities, loop peeling, and loop unrolling.
Produce debugging information.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
On aarch64 systems, mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, mcpu is a deprecated synonym for mtune.
On SPARC systems, mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
The language standards set aliasing requirements: programmers are expected to follow conventions so that the compiler can keep track of memory. If a program violates the requirements (for example, using pointer arithmetic), programs may crash, or (worse) wrong answers may be silently produced.
Unfortunately, the aliasing requirements from the standards are not always well understood.
Sometimes, the aliasing requirements are understood and nevertheless intentionally violated by smart programmers who know what they are doing, such as the programmer responsible for the inner workings of Perl storage allocation and variable handling.
The -fno-strict-aliasing switch instructs the optimizer that it must not assume that the aliasing requirements from the standard are met by the current program. You will probably need it for 500.perlbench_r and 600.perlbench_s. Note that this is an optimization switch, not a portability switch. When running SPECint2017_rate_base or SPECint2017_speed_base, you must use the same optimization switches for all the C modules in base; see https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags and https://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
Tells GCC to use the GNU semantics for "inline" functions, that is, the behavior prior to the C99 standard. This switch may resolve duplicate symbol errors, as noted in the 502.gcc_r benchmark description.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the C99 standard, such as bool and other features used in CPU 2017 benchmarks.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Instruments code to collect information for profile-driven feedback. Information is collected regarding both code paths and data values.
Applies information from a profile run in order to improve optimization. Several optimizations are improved when profile data is available, including branch probabilities, loop peeling, and loop unrolling.
Produce debugging information.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
On aarch64 systems, mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, mcpu is a deprecated synonym for mtune.
On SPARC systems, mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
The language standards set aliasing requirements: programmers are expected to follow conventions so that the compiler can keep track of memory. If a program violates the requirements (for example, using pointer arithmetic), programs may crash, or (worse) wrong answers may be silently produced.
Unfortunately, the aliasing requirements from the standards are not always well understood.
Sometimes, the aliasing requirements are understood and nevertheless intentionally violated by smart programmers who know what they are doing, such as the programmer responsible for the inner workings of Perl storage allocation and variable handling.
The -fno-strict-aliasing switch instructs the optimizer that it must not assume that the aliasing requirements from the standard are met by the current program. You will probably need it for 500.perlbench_r and 600.perlbench_s. Note that this is an optimization switch, not a portability switch. When running SPECint2017_rate_base or SPECint2017_speed_base, you must use the same optimization switches for all the C modules in base; see https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags and https://www.spec.org/cpu2017/Docs/runrules.html#MustValidate.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the C99 standard, such as bool and other features used in CPU 2017 benchmarks.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Produce debugging information.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
On AArch64 systems, -mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, -mcpu is a deprecated synonym for -mtune.
On SPARC systems, -mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
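The percentage arithmetic in that example can be checked with a short sketch. The helper name max_unit_size is hypothetical, and the size units are arbitrary; GCC's internal size estimates and its handling of cold functions are not modeled here.

```shell
#!/bin/sh
# Hypothetical helper: largest unit size allowed after inlining, given the
# original size (arbitrary units) and the growth parameter (a percentage).
max_unit_size() {
    original=$1
    growth_percent=$2
    echo $(( original * (100 + growth_percent) / 100 ))
}

# A parameter value of 20 allows growth to 1.2 times the original size.
max_unit_size 1000 20
```

With an original size of 1000 units and a parameter value of 20, the sketch prints 1200.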
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the C99 standard, such as bool and other features used in CPU 2017 benchmarks.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Instruments code to collect information for profile-driven feedback. Information is collected regarding both code paths and data values.
Applies information from a profile run in order to improve optimization. Several optimizations are improved when profile data is available, including branch probabilities, loop peeling, and loop unrolling.
Produce debugging information.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
On AArch64 systems, -mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, -mcpu is a deprecated synonym for -mtune.
On SPARC systems, -mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the 1998 ISO C++ standard plus the 2003 technical corrigendum.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Instruments code to collect information for profile-driven feedback. Information is collected regarding both code paths and data values.
Applies information from a profile run in order to improve optimization. Several optimizations are improved when profile data is available, including branch probabilities, loop peeling, and loop unrolling.
Produce debugging information.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
On AArch64 systems, -mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, -mcpu is a deprecated synonym for -mtune.
On SPARC systems, -mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
Assume that a loop with an exit will eventually take the exit and not loop indefinitely. This allows the compiler to remove loops that otherwise have no side-effects, not considering eventual endless looping as such.
Link with libjemalloc, a fast, arena-based memory allocator.
Generate code for lp64. With ilp32, int, long int and pointer are 32-bit; with lp64, int is 32-bit, but long int and pointer are 64-bit.
Sets the language dialect to include syntax from the 1998 ISO C++ standard plus the 2003 technical corrigendum.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Add the specified path to the list of paths that the linker will search for archive libraries and control scripts.
Produce debugging information.
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races (as of GCC 10), and the Fortran-specific -fstack-arrays unless -fmax-stack-var-size is specified, and -fno-protect-parens.
On AArch64 systems, -mcpu sets what kind of instructions can be used (as if by -march) and how to
tune for performance (as if by -mtune).
On x86 systems, -mcpu is a deprecated synonym for -mtune.
On SPARC systems, -mcpu sets the available instruction set.
Tells the optimizer to unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
Enable Link Time Optimization. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.
Specify growth that the early inliner can make. In effect it increases the amount of inlining for code having a large abstraction penalty.
When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler are investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied.
Specifies maximal overall growth of the compilation unit caused by inlining. For example, parameter value 20 limits unit growth to 1.2 times the original size. Cold functions (either marked cold via an attribute or by profile feedback) are not accounted into the unit size.
Assume that a loop with an exit will eventually take the exit and not loop indefinitely. This allows the compiler to remove loops that otherwise have no side-effects, not considering eventual endless looping as such.
Link with libjemalloc, a fast, arena-based memory allocator.
Place uninitialized global variables in a common block. This allows the linker to resolve all tentative definitions of the same variable in different compilation units to the same object. See also https://gcc.gnu.org/gcc-10/porting_to.html.
Write a linker map to the named file.
Write a linker map to the named file.
Write a linker map to the named file.
Write a linker map to the named file.
Place uninitialized global variables in a common block. This allows the linker to resolve all tentative definitions of the same variable in different compilation units to the same object. See also https://gcc.gnu.org/gcc-10/porting_to.html.
Write a linker map to the named file.
Inhibit all warning messages.
Write a linker map to the named file.
Write a linker map to the named file.
Write a linker map to the named file.
SPECrate runs might use one of these methods to bind processes to specific processors, depending on the config file.
Linux systems: the numactl command is commonly used. Here is a brief guide to understanding the specific command which will be found in the config file:
Solaris systems: The pbind command is commonly used, via
submit=echo 'pbind -b...' > dobmk; sh dobmk
The specific command may be found in the config file; here is a brief guide to understanding that command:
pbind -b causes this copy's processes to be bound to the CPU specified by the expression that follows it. See the config file used in the run for the exact syntax, which tends to be cumbersome because of the need to carefully quote parts of the expression. When all expressions are evaluated, the jobs are typically distributed evenly across the system, with each chip running the same number of jobs as all other chips, and each core running the same number of jobs as all other cores.
The pbind expression may include various elements from the SPEC toolset and from standard Unix commands, such as:
No special commands are needed for feedback-directed optimization, other than the compiler profile flags.
One or more of the following may have been used in the run. If so, it will be listed in the notes sections. Here is a brief guide to understanding them:
LD_LIBRARY_PATH=<directories> (set via config file preENV)
LD_LIBRARY_PATH controls the search order for libraries. Often, it can be defaulted. Sometimes, it is
explicitly set (as documented in the notes in the submission), in order to ensure that the correct versions of
libraries are picked up.
OMP_STACKSIZE=N (set via config file preENV)
Set the stack size for subordinate threads.
ulimit -s N
ulimit -s unlimited
'ulimit' is a Unix command, entered prior to the run. It sets the stack size for the main process, either
to N kbytes or to no limit.
Note: This page provides definitions for a variety of possible settings. Please see the SPEC CPU result page to find out what settings were actually used.
Many of the settings below are defined in more detail at
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
cpupower frequency-set: Adjust the MHz for the CPUs on the system,
set limits for them, or select a "scaling governor". For example,
cpupower frequency-set -g performance selects higher frequency
at the cost of additional power usage;
cpupower frequency-set -g powersave does the opposite.
dirty_ratio: Sets the threshold at which processes will begin writing dirty (modified)
pages to disk. The dirty_ratio is expressed as a percentage of total available memory.
For example, this command sets the threshold to 8%
echo 8 > /proc/sys/vm/dirty_ratio
drop_caches: Reduces the size of the page cache and kernel slab objects.
Example to free both:
echo 3 > /proc/sys/vm/drop_caches
numa_balancing: Automatically moves memory to nodes that are accessing it.
This is done by un-mapping and re-mapping pages, which may incur unwanted overhead if processes
are already bound to the desired memory nodes.
For example, to disable numa balancing, one could use:
echo 0 > /proc/sys/kernel/numa_balancing
numactl: Controls NUMA policy for individual processes. There are many options, as defined at https://man7.org/linux/man-pages/man8/numactl.8.html. Options useful for workloads similar to SPEC CPU may include:
Note that the SPEC CPU config file may use config file preprocessing and/or
shell mathematics to compute the desired memory location or desired CPU number.
For example, these commands pick a memory unit by dividing the copy number by the number of
CPUs per node:
%define numasize 20
numactl --membind=`expr $SPECCOPYNUM / %{numasize}` --physcpubind=$SPECCOPYNUM
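The integer division in that example can be tried outside the SPEC toolset. The following is a minimal sketch, not taken from any config file: numa_node_for_copy and print_numactl_prefix are hypothetical helper names, and the value of 20 CPUs per node simply mirrors the example. The sketch prints the command prefix rather than running numactl.

```shell
#!/bin/sh
# Hypothetical helper: map a copy number to a NUMA node by integer division,
# mirroring `expr $SPECCOPYNUM / %{numasize}` from the example.
numa_node_for_copy() {
    copynum=$1
    numasize=$2   # assumed number of CPUs per NUMA node
    echo $(( copynum / numasize ))
}

# Hypothetical helper: print (not run) the numactl prefix for one copy.
print_numactl_prefix() {
    node=$(numa_node_for_copy "$1" "$2")
    echo "numactl --membind=$node --physcpubind=$1"
}

# Copy 45 with 20 CPUs per node lands on node 2.
print_numactl_prefix 45 20
```

Under these assumptions, copies 0 through 19 bind to memory node 0, copies 20 through 39 to node 1, and so on, with each copy pinned to the CPU whose number matches its copy number.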
swappiness: Controls how aggressively the kernel swaps memory pages.
The values can range from 0 to 100. Low values decrease the amount of swapping.
For example, this command indicates that swapping should occur only when essential:
echo 1 > /proc/sys/vm/swappiness
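Before changing a setting such as dirty_ratio or swappiness, it can help to look at its current value. The following is a minimal sketch; show_tunable is a hypothetical helper name, and the sketch only reads the files, so it needs no special privileges.

```shell
#!/bin/sh
# Hypothetical helper: print a tunable's current value, or a note if the
# file is missing or unreadable (for example, on a non-Linux system).
show_tunable() {
    if [ -r "$1" ]; then
        echo "$1 = $(cat "$1")"
    else
        echo "$1 is not readable on this system"
    fi
}

show_tunable /proc/sys/vm/dirty_ratio
show_tunable /proc/sys/vm/swappiness
```

The values printed depend on the system; the SPEC CPU result page remains the reference for what was actually used in a run.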
transparent_hugepage: Transparent huge pages may provide a performance benefit by reducing kernel time spent looking up page locations. The potential benefit is application dependent, and some applications may do better with smaller pages.
tuned-adm: Controls tuned, the dynamic adaptive system tuning daemon. Commonly, one may load a tuning profile, for example:
It is also possible to disable all profiles, using:
tuned-adm off
ulimit -s [n | unlimited]: Allow the stack size to grow to n kbytes, or unlimited to impose no limit.
zone_reclaim_mode: Provides control over memory allocation when multiple NUMA nodes are active. If zone reclaim is off, data files may be cached on any node. There are three settings that can be ORed together:
For example, this command enables reclaiming:
echo 1 > /proc/sys/vm/zone_reclaim_mode
Dividing the chip into separate nodes (hemisphere or quadrant) may improve latency to the last level cache and main memory, which may benefit overall performance for NUMA-aware operating systems and workloads.
The jemalloc memory allocation library can speed up memory allocation, in part by keeping lists of commonly used sizes. The library includes various configuration options, which are documented at http://jemalloc.net/jemalloc.3.html and in its file INSTALL.md as found in the distribution tar file, and as posted at https://github.com/jemalloc/jemalloc/blob/master/INSTALL.md
Some of the useful options include:
Example configuration:
$ wget https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2
$ bzip2 -dc jemalloc-5.2.1.tar.bz2 | tar -xf -
$ cd jemalloc-5.2.1/
$ ./configure --prefix=/usr/local/jemalloc-521
$ make -j30
$ sudo make install
Flag description origin markings:
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2021 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.1.5.
Report generated on 2021-09-17 12:25:16 by SPEC CPU2017 flags formatter v5178.