Copyright © Intel Corporation. All Rights Reserved.
Invoke the Intel oneAPI DPC++/C++ compiler and runtime environment.
The Intel® oneAPI DPC++/C++ Compiler can be found in the Intel® oneAPI Base Toolkit, Intel® oneAPI HPC Toolkit, Intel® oneAPI IoT Toolkit, or as a standalone compiler. More information and specifications can be found on the Intel® oneAPI DPC++/C++ Compiler main page.
Invoke the Intel Fortran Compiler (Beta), a new compiler based on the Intel Fortran Compiler Classic (ifort) frontend and runtime libraries that uses LLVM backend technology.
ifx does not support 32-bit targets.
The Intel® Fortran Compiler (Beta) (ifx) can be found in the Intel® oneAPI Base Toolkit, Intel® oneAPI HPC Toolkit, Intel® oneAPI IoT Toolkit, or as a standalone compiler. For more information, see Introducing the Intel® Fortran Compiler Classic and Intel® Fortran Compiler (Beta).
specify that source files are in free format. Same as -FR. -nofree indicates fixed format.
-mcmodel=<size>
use a specific memory model to generate code and store data
small - Restricts code and data to the first 2GB of address space (DEFAULT)
medium - Restricts code to the first 2GB; it places no memory restriction on data
large - Places no memory restriction on code or data
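As a rough illustration of when the small model stops being sufficient, the sketch below checks whether one statically allocated array alone would be larger than 2GB. It assumes that 2GB means 2^31 bytes and it ignores code and all other data; `exceeds_small_model` and `kSmallModelLimit` are hypothetical names, not part of any compiler.

```cpp
#include <cstdint>

// Assumption: "2GB" in the option description means 2^31 bytes.
constexpr std::uint64_t kSmallModelLimit = std::uint64_t{1} << 31;

// Hypothetical helper: true if an array of `count` elements of
// `elem_size` bytes is, on its own, larger than the small-model limit.
constexpr bool exceeds_small_model(std::uint64_t count, std::uint64_t elem_size) {
    return count * elem_size > kSmallModelLimit;
}
```

Under these assumptions, a static array of 300,000,000 doubles (2.4e9 bytes) is past the limit, which is the kind of program for which the medium or large model is intended.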
enable language support for
c99 enable C99 support for C programs
c++11 enable C++11 experimental support for C++ programs
c++0x same as c++11
Sets certain aggressive options to improve the speed of your application.
Enables recognition of OpenMP* features and tells the parallelizer to generate multi-threaded code based on OpenMP* directives.
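A minimal sketch of the kind of directive this option acts on is shown below. The function name `parallel_sum` is chosen for illustration only. When OpenMP recognition is off, the pragma is ignored and the loop runs serially with the same result for this input.

```cpp
#include <cstddef>
#include <vector>

// Sums a vector with an OpenMP work-sharing loop. The reduction clause
// gives each thread a private partial sum that is combined at the end.
double parallel_sum(const std::vector<double>& values) {
    double total = 0.0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}
```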
amberlake
broadwell
cannonlake
cascadelake
coffeelake
goldmont
goldmont-plus
haswell
icelake-client (or icelake)
icelake-server
ivybridge
kabylake
knl
knm
sandybridge
silvermont
skylake
skylake-avx512
tremont
whiskeylake
core-avx2 - Generates code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
core-avx-i - Generates code for processors that support Float-16 conversion instructions and the RDRND instruction, Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7-avx - Generates code for processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7 - Generates code for processors that support Intel® SSE4 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSE3, SSE2, SSE, and SSSE3 instructions.
atom - Generates code for processors that support MOVBE instructions. May also generate code for SSSE3 instructions and Intel® SSE3, SSE2, and SSE instructions.
core2 - Generates code for the Intel® Core™2 processor family.
pentium4m - Generates code for Intel® Pentium® 4 processors with MMX technology.
pentium-m - Generates code for Intel® Pentium® processors. Value pentium3 is only available on Linux* systems.
pentium4
pentium3
pentium
Determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. When the [Q]fma option is specified, the compiler may generate FMA instructions for combining multiply and add operations. When the negative form of the [Q]fma option is specified, the compiler must generate separate multiply and add instructions with intermediate rounding. This option has no effect unless setting CORE-AVX2 or higher is specified for option [Q]x, -march (Linux and macOS*), or /arch (Windows).
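The difference between the two forms is the intermediate rounding. The sketch below contrasts them in portable C++; the function names are illustrative, and `std::fma` from the standard library stands in for a hardware FMA instruction.

```cpp
#include <cmath>

// Separate multiply and add: the product is rounded to double before
// the addition. The volatile store keeps the compiler from fusing the
// two operations on its own.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c is computed with a single rounding.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27, and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 in double precision. The separate form therefore returns 0, while the fused form returns -2^-54.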
-ipo[n]
Multi-file IP optimizations that include:
- inline function expansion
- interprocedural constant propagation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
Enable/disable (DEFAULT) use of ANSI aliasing rules in optimizations; the user asserts that the program adheres to these rules.
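To make the assertion concrete, the sketch below shows one access pattern that breaks the aliasing rules (left as a comment) and a conforming alternative. The function name `float_bits` is illustrative, and the code assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Violates the aliasing rules: reads a float object through a pointer
// to an unrelated type. Shown for contrast only.
//   std::uint32_t bits = *reinterpret_cast<std::uint32_t*>(&value);

// Conforms to the aliasing rules: copies the object representation.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```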
enable
[no-]except - enable/disable floating point semantics
fast[=1|2] - enables more aggressive floating point optimizations
precise - allows value-safe optimizations
source - enables intermediates in source precision
strict - enables -fp-model precise -fp-model except, disables contractions and enables pragma stdc fenv_access
double - rounds intermediates in 53-bit (double) precision
extended - rounds intermediates in 64-bit (extended) precision
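Reassociation is a common example of a transformation that is not value-safe. The sketch below assumes this is the kind of change the more aggressive settings may permit and the precise setting forbids; the function names are illustrative.

```cpp
// Evaluates (a + b) + c exactly as written.
double sum_left_first(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluates a + (b + c), a reassociation that a value-unsafe
// optimization could substitute for the version above.
double sum_right_first(double a, double b, double c) {
    return a + (b + c);
}
```

For a = 1.0, b = 1e16, and c = -1e16, the first form returns 0 because 1.0 is lost when it is added to 1e16, while the second form returns 1.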
Enables or disables the optimization for multiple adjacent gather/scatter type vector memory references. This content is specific to C++; it does not apply to DPC++. The optimization is a hint that is useful for performance tuning: it tries to generate more optimal software sequences using shuffles. If you specify this option, the compiler applies the optimization heuristics. If you specify -qno-opt-multiple-gather-scatter-by-shuffles or /Qopt-multiple-gather-scatter-by-shuffles-, the compiler does not apply the optimization.
-qopt-zmm-usage=keyword Specifies the level of zmm register usage. You can specify one of the following:
low - Tells the compiler that the compiled program is unlikely to benefit from zmm register usage. It specifies that the compiler should avoid using zmm registers unless it can prove the gain from their usage.
high - Tells the compiler to generate zmm code without restrictions.
Allow aggressive, lossy floating-point optimizations.
Enable optimizations based on the strict definition of an enum's value range.
Enable optimizations based on the strict rules for overwriting polymorphic C++ objects.
Enables dead virtual function elimination optimization. Requires -flto=full.
specify how data items are aligned
keywords: all (same as -align), none (same as -noalign),
[no]commons, [no]dcommons,
[no]qcommons, [no]zcommons,
rec1byte, rec2byte, rec4byte,
rec8byte, rec16byte, rec32byte,
array8byte, array16byte, array32byte,
array64byte, array128byte, array256byte,
[no]records, [no]sequence
Allow optimizations for floating point arithmetic that assume arguments and results are not NaNs or Infinities.
Determines whether EBP is used as a general-purpose register in optimizations.
Tells the compiler to generate code for Intel® 64 architecture.
Determines whether the compiler optimizes tail recursive calls. This feature is only available for ifort. When enabled, it converts tail recursion into loops.
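The option itself applies to Fortran code compiled with ifort. The sketch below uses C++ only to illustrate the transformation, with function names chosen for the example: a tail-recursive routine and the loop that such an optimization could produce from it.

```cpp
// Tail-recursive form: the recursive call is the last action taken.
unsigned gcd_recursive(unsigned a, unsigned b) {
    if (b == 0) return a;
    return gcd_recursive(b, a % b);
}

// Equivalent loop form: the arguments of the tail call become
// updates to the loop variables, so no new stack frame is needed.
unsigned gcd_loop(unsigned a, unsigned b) {
    while (b != 0) {
        const unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```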
Enables or disables vectorization. To disable vectorization, specify -no-vec (Linux* and macOS) or /Qvec- (Windows*). To disable interpretation of SIMD directives, specify -no-simd (Linux* and macOS) or /Qsimd- (Windows*). To disable all compiler vectorization, use the "-no-vec -no-simd" (Linux* and macOS) or "/Qvec- /Qsimd-" (Windows*) compiler options. The option -no-vec (and /Qvec-) disables all auto-vectorization, including vectorization of array notation statements. The option -no-simd (and /Qsimd-) disables vectorization of loops that have SIMD directives.
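The two kinds of loops that these options distinguish can be sketched as follows. The function names are illustrative, and whether a given loop is actually vectorized depends on the compiler and target. The first loop is a candidate for auto-vectorization; the second carries an OpenMP SIMD directive, which is ignored when such directives are not interpreted.

```cpp
#include <cstddef>
#include <vector>

// Plain loop: a candidate for auto-vectorization.
void scale_add(std::vector<float>& y, const std::vector<float>& x, float a) {
    const std::size_t n = y.size() < x.size() ? y.size() : x.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Same loop with an explicit SIMD directive.
void scale_add_simd(std::vector<float>& y, const std::vector<float>& x, float a) {
    const std::size_t n = y.size() < x.size() ? y.size() : x.size();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```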
Sets certain aggressive options to improve the speed of your application.
Enables recognition of OpenMP* features and tells the parallelizer to generate multi-threaded code based on OpenMP* directives.
amberlake
broadwell
cannonlake
cascadelake
coffeelake
goldmont
goldmont-plus
haswell
icelake-client (or icelake)
icelake-server
ivybridge
kabylake
knl
knm
sandybridge
silvermont
skylake
skylake-avx512
tremont
whiskeylake
core-avx2 - Generates code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
core-avx-i - Generates code for processors that support Float-16 conversion instructions and the RDRND instruction, Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7-avx - Generates code for processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7 - Generates code for processors that support Intel® SSE4 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSE3, SSE2, SSE, and SSSE3 instructions.
atom - Generates code for processors that support MOVBE instructions. May also generate code for SSSE3 instructions and Intel® SSE3, SSE2, and SSE instructions.
core2 - Generates code for the Intel® Core™2 processor family.
pentium4m - Generates for Intel® Pentium® 4 processors with MMX technology.
pentium-m - Generates code for Intel® Pentium® processors. Value pentium3 is only available on Linux* systems.
pentium4
pentium3
pentium
Determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. This option determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. When the [Q]fma option is specified, the compiler may generate FMA instructions for combining multiply and add operations. When the negative form of the [Q]fma option is specified, the compiler must generate separate multiply and add instructions with intermediate rounding. This option has no effect unless setting CORE-AVX2 or higher is specified for option [Q]x,-march (Linux and macOS*), or /arch (Windows).
-ipo[n]
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
Enable/disable(DEFAULT) use of ANSI aliasing rules in optimizations; user asserts that the program adheres to these rules.
enable
[no-]except - enable/disable floating point semantics
fast[=1|2] - enables more aggressive floating point optimizations
precise - allows value-safe optimizations
source - enables intermediates in source precision
strict - enables -fp-model precise -fp-model except, disables
contractions and enables pragma stdc fenv_access
double - rounds intermediates in 53-bit (double) precision
extended - rounds intermediates in 64-bit (extended) precision
Enables or disables the optimization for multiple adjacent gather/scatter type vector memory references. This content is specific to C++; it does not apply to DPC++. This option controls the optimization for multiple adjacent gather/scatter type vector memory references. This optimization hint is useful for performance tuning. It tries to generate more optimal software sequences using shuffles. If you specify this option, the compiler will apply the optimization heuristics. If you specify -qno-opt-multiple-gather-scatter-by-shuffles or /Qopt-multiple-gather-scatter-by-shuffles-, the compiler will not apply the optimization.
-qopt-zmm-usage=keywoard Specifies the level of zmm register usage. You can specify one of the following:
low - Tells the compiler that the compiled program is unlikely to benefit from zmm register usage. It specifies that the compiler should avoid using zmm register unless it can prove the gain from their usage.
high - Tells the compiler to generate zmm code without restrictions
Allow aggressive, lossy floating-point optimizations.
Enable optimizations based on the strict definition of an enum's value range.
Enable optimizations based on the strict rules for overwriting polymorphic C++ objects.
Enables dead virtual function elimination optimization. Requires -flto=full.
Sets certain aggressive options to improve the speed of your application.
Enables recognition of OpenMP* features and tells the parallelizer to generate multi-threaded code based on OpenMP* directives.
amberlake
broadwell
cannonlake
cascadelake
coffeelake
goldmont
goldmont-plus
haswell
icelake-client (or icelake)
icelake-server
ivybridge
kabylake
knl
knm
sandybridge
silvermont
skylake
skylake-avx512
tremont
whiskeylake
core-avx2 - Generates code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
core-avx-i - Generates code for processors that support Float-16 conversion instructions and the RDRND instruction, Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7-avx - Generates code for processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7 - Generates code for processors that support Intel® SSE4 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSE3, SSE2, SSE, and SSSE3 instructions.
atom - Generates code for processors that support MOVBE instructions. May also generate code for SSSE3 instructions and Intel® SSE3, SSE2, and SSE instructions.
core2 - Generates code for the Intel® Core™2 processor family.
pentium4m - Generates for Intel® Pentium® 4 processors with MMX technology.
pentium-m - Generates code for Intel® Pentium® processors. Value pentium3 is only available on Linux* systems.
pentium4
pentium3
pentium
Determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. This option determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. When the [Q]fma option is specified, the compiler may generate FMA instructions for combining multiply and add operations. When the negative form of the [Q]fma option is specified, the compiler must generate separate multiply and add instructions with intermediate rounding. This option has no effect unless setting CORE-AVX2 or higher is specified for option [Q]x,-march (Linux and macOS*), or /arch (Windows).
-ipo[n]
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
Enable/disable(DEFAULT) use of ANSI aliasing rules in optimizations; user asserts that the program adheres to these rules.
enable
[no-]except - enable/disable floating point semantics
fast[=1|2] - enables more aggressive floating point optimizations
precise - allows value-safe optimizations
source - enables intermediates in source precision
strict - enables -fp-model precise -fp-model except, disables
contractions and enables pragma stdc fenv_access
double - rounds intermediates in 53-bit (double) precision
extended - rounds intermediates in 64-bit (extended) precision
Enables or disables the optimization for multiple adjacent gather/scatter type vector memory references. This content is specific to C++; it does not apply to DPC++. This option controls the optimization for multiple adjacent gather/scatter type vector memory references. This optimization hint is useful for performance tuning. It tries to generate more optimal software sequences using shuffles. If you specify this option, the compiler will apply the optimization heuristics. If you specify -qno-opt-multiple-gather-scatter-by-shuffles or /Qopt-multiple-gather-scatter-by-shuffles-, the compiler will not apply the optimization.
-qopt-zmm-usage=keywoard Specifies the level of zmm register usage. You can specify one of the following:
low - Tells the compiler that the compiled program is unlikely to benefit from zmm register usage. It specifies that the compiler should avoid using zmm register unless it can prove the gain from their usage.
high - Tells the compiler to generate zmm code without restrictions
Allow aggressive, lossy floating-point optimizations.
Enable optimizations based on the strict definition of an enum's value range.
Enable optimizations based on the strict rules for overwriting polymorphic C++ objects.
Sets certain aggressive options to improve the speed of your application.
Enables recognition of OpenMP* features and tells the parallelizer to generate multi-threaded code based on OpenMP* directives.
amberlake
broadwell
cannonlake
cascadelake
coffeelake
goldmont
goldmont-plus
haswell
icelake-client (or icelake)
icelake-server
ivybridge
kabylake
knl
knm
sandybridge
silvermont
skylake
skylake-avx512
tremont
whiskeylake
core-avx2 - Generates code for processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
core-avx-i - Generates code for processors that support Float-16 conversion instructions and the RDRND instruction, Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7-avx - Generates code for processors that support Intel® Advanced Vector Extensions (Intel® AVX), Intel® SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions.
corei7 - Generates code for processors that support Intel® SSE4 Efficient Accelerated String and Text Processing instructions. May also generate code for Intel® SSE4 Vectorizing Compiler and Media Accelerator, Intel® SSE3, SSE2, SSE, and SSSE3 instructions.
atom - Generates code for processors that support MOVBE instructions. May also generate code for SSSE3 instructions and Intel® SSE3, SSE2, and SSE instructions.
core2 - Generates code for the Intel® Core™2 processor family.
pentium4m - Generates for Intel® Pentium® 4 processors with MMX technology.
pentium-m - Generates code for Intel® Pentium® processors. Value pentium3 is only available on Linux* systems.
pentium4
pentium3
pentium
Determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. This option determines whether the compiler generates fused multiply-add (FMA) instructions if such instructions exist on the target processor. When the [Q]fma option is specified, the compiler may generate FMA instructions for combining multiply and add operations. When the negative form of the [Q]fma option is specified, the compiler must generate separate multiply and add instructions with intermediate rounding. This option has no effect unless setting CORE-AVX2 or higher is specified for option [Q]x,-march (Linux and macOS*), or /arch (Windows).
-ipo[n]
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
Enable/disable(DEFAULT) use of ANSI aliasing rules in optimizations; user asserts that the program adheres to these rules.
enable
[no-]except - enable/disable floating point semantics
fast[=1|2] - enables more aggressive floating point optimizations
precise - allows value-safe optimizations
source - enables intermediates in source precision
strict - enables -fp-model precise -fp-model except, disables
contractions and enables pragma stdc fenv_access
double - rounds intermediates in 53-bit (double) precision
extended - rounds intermediates in 64-bit (extended) precision
Enables or disables the optimization for multiple adjacent gather/scatter type vector memory references. This content is specific to C++; it does not apply to DPC++. This option controls the optimization for multiple adjacent gather/scatter type vector memory references. This optimization hint is useful for performance tuning. It tries to generate more optimal software sequences using shuffles. If you specify this option, the compiler will apply the optimization heuristics. If you specify -qno-opt-multiple-gather-scatter-by-shuffles or /Qopt-multiple-gather-scatter-by-shuffles-, the compiler will not apply the optimization.
-qopt-zmm-usage=keyword Specifies the level of zmm register usage. You can specify one of the following:
low - Tells the compiler that the compiled program is unlikely to benefit from zmm register usage. It specifies that the compiler should avoid using zmm registers unless it can prove the gain from their usage.
high - Tells the compiler to generate zmm code without restrictions.
specify how data items are aligned
keywords: all (same as -align), none (same as -noalign),
[no]commons, [no]dcommons,
[no]qcommons, [no]zcommons,
rec1byte, rec2byte, rec4byte,
rec8byte, rec16byte, rec32byte,
array8byte, array16byte, array32byte,
array64byte, array128byte, array256byte,
[no]records, [no]sequence
Allow optimizations for floating point arithmetic that assume arguments and results are not NaNs or Infinities.
Determines whether EBP is used as a general-purpose register in optimizations.
Tells the compiler to generate code for Intel® 64 architecture.
-ipo[n]
Multi-file interprocedural (IP) optimizations that include:
- inline function expansion
- interprocedural constant propagation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
Determines whether the compiler optimizes tail recursive calls. This feature is only available for ifort. When enabled, it converts tail recursion into loops.
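The transformation can be pictured with a small Python sketch. Python itself does not perform this optimization; the two hypothetical functions below only show the before and after shapes of the code.

```python
def gcd_recursive(a: int, b: int) -> int:
    """Tail-recursive form: the recursive call is the last action taken."""
    if b == 0:
        return a
    return gcd_recursive(b, a % b)


def gcd_loop(a: int, b: int) -> int:
    """Same computation after the tail call has been turned into a loop."""
    while b != 0:
        a, b = b, a % b   # rebinding the arguments replaces the call
    return a


print(gcd_recursive(48, 18))  # 6
print(gcd_loop(48, 18))       # 6
```

The loop form needs no new stack frame per step, which is the benefit the conversion aims for.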
Enables or disables vectorization. To disable vectorization, specify -no-vec (Linux* and macOS) or /Qvec- (Windows*). To disable interpretation of SIMD directives, specify -no-simd (Linux* and macOS) or /Qsimd- (Windows*). To disable all compiler vectorization, use the "-no-vec -no-simd" (Linux* and macOS) or "/Qvec- /Qsimd-" (Windows*) compiler options. The option -no-vec (and /Qvec-) disables all auto-vectorization, including vectorization of array notation statements. The option -no-simd (and /Qsimd-) disables vectorization of loops that have SIMD directives.
KMP_AFFINITY
The KMP_AFFINITY environment variable uses the following general syntax:
Syntax |
---|
KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>] |
For example, to list a machine topology map, specify KMP_AFFINITY=verbose,none to use a modifier of verbose and a type of none.
The following table describes the supported specific arguments.
Argument | Default | Description |
---|---|---|
<modifier> | noverbose respect granularity=core | Optional. String consisting of keyword and specifier. |
<type> | none | Required string. Indicates the thread affinity to use. The logical and physical types are deprecated but supported for backward compatibility. |
<permute> | 0 | Optional. Positive integer value. Not valid with type values of explicit, none, or disabled. |
<offset> | 0 | Optional. Positive integer value. Not valid with type values of explicit, none, or disabled. |
Type is the only required argument.
Specifying none (the default) does not bind OpenMP threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP thread affinity interface to determine machine topology. Specify KMP_AFFINITY=verbose,none to list a machine topology map.
Specifying compact assigns the OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where the <n>-th OpenMP thread was placed. For example, in a topology map, the nearer a node is to the root, the more significance the node has when sorting the threads.
Specifying disabled completely disables the thread affinity interfaces. This forces the OpenMP run-time library to behave as if the affinity interface was not supported by the operating system. This includes the low-level API interfaces such as kmp_set_affinity and kmp_get_affinity, which have no effect and will return a nonzero error code.
Specifying explicit assigns OpenMP threads to a list of OS proc IDs that have been explicitly specified by using the proclist= modifier, which is required for this affinity type.
Specifying scatter distributes the threads as evenly as possible across the entire system. scatter is the opposite of compact; so the leaves of the node are most significant when sorting through the machine topology map.
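The contrast between compact and scatter can be sketched as two sort orders over a topology map. The example below is a simplified model under stated assumptions: a hypothetical machine with 2 packages, 2 cores per package, and 2 thread contexts per core, with no permute or offset. It is not the run-time library's actual algorithm.

```python
from itertools import product

# Hypothetical machine: each context is (package, core, thread).
contexts = list(product(range(2), range(2), range(2)))


def compact_order(ctxs):
    """Levels nearest the root are most significant, so neighbours stay close."""
    return sorted(ctxs, key=lambda c: (c[0], c[1], c[2]))


def scatter_order(ctxs):
    """Leaf levels are most significant, so neighbours land on other packages."""
    return sorted(ctxs, key=lambda c: (c[2], c[1], c[0]))


# OpenMP thread i is placed on the i-th context of the chosen order.
print(compact_order(contexts)[:4])  # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
print(scatter_order(contexts)[:4])  # [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
```

In this model the first four threads fill one package under compact, while under scatter they alternate between the two packages.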
Types logical and physical are deprecated and may become unsupported in a future release. Both are supported for backward compatibility.
For logical and physical affinity types, a single trailing integer is interpreted as an offset specifier instead of a permute specifier. In contrast, with compact and scatter types, a single trailing integer is interpreted as a permute specifier.
Specifying logical assigns OpenMP threads to consecutive logical processors, which are also called hardware thread contexts. The type is equivalent to compact, except that the permute specifier is not allowed. Thus, KMP_AFFINITY=logical,n is equivalent to KMP_AFFINITY=compact,0,n (this equivalence is true regardless of whether a granularity=fine modifier is present).
For both compact and scatter, permute and offset are allowed; however, if you specify only one integer, the compiler interprets the value as a permute specifier. Both permute and offset default to 0.
The permute specifier controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance. The root node of the tree is not considered a separate level for the sort operations.
The offset specifier indicates the starting position for thread assignment.
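The rules for how trailing integers are interpreted can be summarized in a small parser. This is a minimal sketch with a hypothetical function name, parse_kmp_affinity; it covers only the types named in this section and does not handle a proclist= modifier, whose value may itself contain commas.

```python
TYPES = {"none", "compact", "disabled", "explicit", "scatter", "logical", "physical"}
NO_INTEGERS = {"none", "disabled", "explicit"}   # permute/offset not valid here
OFFSET_ONLY = {"logical", "physical"}            # single integer means offset


def parse_kmp_affinity(value: str) -> dict:
    """Split a KMP_AFFINITY string into modifiers, type, permute, and offset."""
    fields = [f.strip() for f in value.split(",")]
    idx = next((i for i, f in enumerate(fields) if f in TYPES), None)
    if idx is None:
        raise ValueError("type is the only required argument")
    kind = fields[idx]
    numbers = [int(f) for f in fields[idx + 1:]]
    if numbers and kind in NO_INTEGERS:
        raise ValueError(f"permute/offset not valid with type {kind}")
    permute = offset = 0                         # both default to 0
    if kind in OFFSET_ONLY:
        if len(numbers) > 1:
            raise ValueError(f"permute is not allowed with type {kind}")
        if numbers:
            offset = numbers[0]
    elif len(numbers) == 1:
        permute = numbers[0]                     # compact/scatter: permute first
    elif len(numbers) == 2:
        permute, offset = numbers
    elif len(numbers) > 2:
        raise ValueError("at most two integers may follow the type")
    return {"modifiers": fields[:idx], "type": kind,
            "permute": permute, "offset": offset}


print(parse_kmp_affinity("logical,3"))
print(parse_kmp_affinity("compact,0,3"))
```

Both calls report a permute of 0 and an offset of 3, matching the stated equivalence between logical,n and compact,0,n.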
Modifiers are optional arguments that precede type. If you do not specify a modifier, the noverbose, respect, and granularity=core modifiers are used automatically.
Modifiers are interpreted in order from left to right, and can negate each other. For example, specifying KMP_AFFINITY=verbose,noverbose,scatter is equivalent to setting KMP_AFFINITY=noverbose,scatter, or just KMP_AFFINITY=scatter.
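The left-to-right rule amounts to letting later modifiers override earlier ones. The sketch below uses a hypothetical helper, resolve_modifiers, and a simplifying assumption that a leading "no" negates a flag and that key=value sets a value; it is an illustration, not the run-time library's parser.

```python
def resolve_modifiers(modifiers):
    """Apply modifiers in order; later tokens override earlier ones."""
    # Defaults named in the text: noverbose, respect, granularity=core.
    state = {"verbose": False, "respect": True, "granularity": "core"}
    for token in modifiers:
        if "=" in token:
            key, value = token.split("=", 1)
            state[key] = value
        elif token.startswith("no"):
            state[token[2:]] = False   # assumption: "no" prefix negates a flag
        else:
            state[token] = True
    return state


print(resolve_modifiers(["verbose", "noverbose"]) == resolve_modifiers([]))  # True
```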
Specifying noverbose does not print verbose messages.
Specifying verbose prints messages concerning the supported affinity. The messages include information about the number of packages, number of cores in each package, number of thread contexts for each core, and OpenMP thread bindings to physical thread contexts.
Information about binding OpenMP threads to physical thread contexts is indirectly shown in the form of the mappings between hardware thread contexts and the operating system (OS) processor (proc) IDs. The affinity mask for each OpenMP thread is printed as a set of OS processor IDs.
KMP_LIBRARY
KMP_LIBRARY = { throughput | turnaround | serial }. Selects the OpenMP run-time library execution mode. The options for the variable value are throughput, turnaround, and serial.
The compiler with OpenMP enables you to run an application under different execution modes that can be specified at run time. The libraries support the serial, turnaround, and throughput modes.
The serial mode forces parallel applications to run on a single processor.
In a dedicated (batch or single user) parallel environment where all processors are exclusively allocated to the program for its entire run, it is most important to effectively utilize all of the processors all of the time. The turnaround mode is designed to keep active all of the processors involved in the parallel computation in order to minimize the execution time of a single job. In this mode, the worker threads actively wait for more parallel work, without yielding to other threads.
Avoid over-allocating system resources in turnaround mode. Over-allocation occurs if either too many threads have been specified or too few processors are available at run time. If system resources are over-allocated, turnaround mode will cause poor performance; use the throughput mode instead.
In a multi-user environment where the load on the parallel machine is not constant or where the job stream is not predictable, it may be better to design and tune for throughput. This minimizes the total time to run multiple jobs simultaneously. In this mode, the worker threads will yield to other threads while waiting for more parallel work.
The throughput mode is designed to make the program aware of its environment (that is, the system load) and to adjust its resource usage to produce efficient execution in a dynamic environment. This mode is the default.
KMP_BLOCKTIME
KMP_BLOCKTIME = value. Sets the time, in milliseconds, that a thread should wait, after completing the execution of a parallel region, before sleeping. Use the optional character suffixes: s (seconds), m (minutes), h (hours), or d (days) to specify the units. Specify infinite for an unlimited wait time.
KMP_STACKSIZE
KMP_STACKSIZE = value. Sets the number of bytes to allocate for each OpenMP* thread to use as the private stack for the thread. Recommended size is 16m. Use the optional suffixes: b (bytes), k (kilobytes), m (megabytes), g (gigabytes), or t (terabytes) to specify the units. This variable does not affect the native operating system threads created by the user program nor the thread executing the sequential part of an OpenMP* program or parallel programs created using -parallel.
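To see what a suffixed value such as the recommended 16m amounts to, the following sketch converts a setting to bytes. The helper stacksize_bytes is hypothetical, and two points are assumptions rather than statements from this description: that the suffixes denote binary multiples (1k = 1024 bytes) and that a value without a suffix is taken as bytes.

```python
# Assumption: binary multiples for each suffix.
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def stacksize_bytes(value: str) -> int:
    """Convert a KMP_STACKSIZE-style string such as '16m' to a byte count."""
    text = value.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)   # assumption: no suffix means bytes


print(stacksize_bytes("16m"))  # 16777216
```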
OMP_NUM_THREADS
Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel. Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8
OMP_DYNAMIC
OMP_DYNAMIC={ 1 | 0 } Enables (1, true) or disables (0, false) the dynamic adjustment of the number of threads.
OMP_SCHEDULE
OMP_SCHEDULE={ type[,chunk size] } Controls the scheduling of the for-loop work-sharing construct. The type can be one of static, dynamic, guided, or runtime. The chunk size should be a positive integer.
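As an illustration of what a type and chunk size mean in practice, the sketch below parses a setting and models how a static schedule hands out fixed-size chunks to threads in round-robin order. The function names are hypothetical, and the model covers only the static case; dynamic and guided assignments depend on run-time behavior and are not modeled.

```python
def parse_omp_schedule(value: str):
    """Split 'type[,chunk]' into (type, chunk); chunk is None if omitted."""
    kind, _, chunk = value.partition(",")
    kind = kind.strip().lower()
    if kind not in ("static", "dynamic", "guided", "runtime"):
        raise ValueError(f"unknown schedule type: {kind}")
    chunk = chunk.strip()
    size = int(chunk) if chunk else None
    if size is not None and size <= 0:
        raise ValueError("chunk size should be a positive integer")
    return kind, size


def static_chunks(n_iterations: int, n_threads: int, chunk: int) -> dict:
    """Model a static schedule: chunks are dealt to threads round-robin."""
    owner = {t: [] for t in range(n_threads)}
    for start in range(0, n_iterations, chunk):
        thread = (start // chunk) % n_threads
        owner[thread].extend(range(start, min(start + chunk, n_iterations)))
    return owner


kind, size = parse_omp_schedule("static,2")
print(static_chunks(8, 2, size))  # {0: [0, 1, 4, 5], 1: [2, 3, 6, 7]}
```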
OMP_NESTED
OMP_NESTED={ 1 | 0 } Enables creation of new teams in case of nested parallel regions (1,true) or serializes (0,false) all nested parallel regions. Default is 0.
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2012-2023 Standard Performance Evaluation Corporation
Tested with SPEC OMP2012 v1.1.
Report generated on Wed Aug 16 14:58:20 2023 by SPEC OMP2012 flags formatter v538.