SPEC Accel OpenMP 4.0 Flag Description for the Intel(R) C/C++ Compiler for IA32 and Intel 64 applications and Intel(R) Fortran Compiler for IA32 and Intel 64 applications

Optimization Flags

-Istd
-Istdi
-Lstd
-qopenmp
-qopenmp-offload
-O3
-xCORE-AVX2
-xCORE-AVX512
-xCOMMON-AVX512
-marchcore-avx2
-fimf-precision
-qopt-streaming-stores
-qopt-prefetch
-no-prec-sqrt
-no-prec-div
-ansi-alias
-ipo
-fp-model

- -Istd
- -I.?\s*[^ ]*include[^ ]*
- Adds the directory for include files to the search path at compile time.
- -Istdi
- -I.?\s
- Adds the directory for include files to the search path at compile time.
- -Lstd
- -L\s*[^ ]*[^ ]*
- Adds the directory to the library search path at link time.
- -qopenmp
- -qopenmp(?=\s|$)
- Enable the compiler to generate multi-threaded code based on the OpenMP* directives (same as -fopenmp)
- -qopenmp-offload
- -qopenmp-offload=(host|mic|gfx)(?=\s|$)
- Enables OpenMP* offloading compilation for target pragmas. This option only applies to Intel(R) MIC Architecture and Intel(R) Graphics Technology. Enabled by default with -qopenmp. Use -qno-openmp-offload to disable.
  Use kind to set the default device for target pragmas:
  host - allow target code to run on host system while still doing the outlining for offload
  mic - specify Intel(R) MIC Architecture
  gfx - specify Intel(R) Graphics Technology
- -O3
- -O3(?=\s|$)
- Optimizes for maximum speed and enables more aggressive optimizations that may not improve performance on some programs.
- -xCORE-AVX2
- -xCORE-AVX2(?=\s|$)
- Code is optimized for Intel(R) processors with support for AVX2 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -xCORE-AVX512
- -xCORE-AVX512(?=\s|$)
- Code is optimized for Intel(R) processors with support for Intel(R) AVX-512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
  
  Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if it is executed on an unsupported processor.
- -xCOMMON-AVX512
- -xCOMMON-AVX512(?=\s|$)
- May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, as well as the instructions enabled with CORE-AVX2. Optimizes for Intel(R) processors that support Intel(R) AVX-512 instructions.
- -marchcore-avx2
- -march=core-avx2(?=\s|$)
- May generate Intel(R) AVX2, AVX, Intel(R) SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions. /arch:core-avx2 is supported on Windows*, but -mcore-avx2 is not supported on Linux* or macOS* (use -march=core-avx2 instead).
- -fimf-precision
- -fimf-precision=(high|medium|low):([a-z\,/]+)(?=\s|$)
- -fimf-precision=value[:funclist]
  defines the accuracy (precision) for math library functions
  value - defined as one of the following values
  high - equivalent to max-error = 0.6
  medium - equivalent to max-error = 4 (DEFAULT)
  low - equivalent to accuracy-bits = 11 (single precision); accuracy-bits = 26 (double precision)
  funclist - optional comma-separated list of one or more math library functions to which the attribute should be applied
- -qopt-streaming-stores
- -qopt-streaming-stores.always(?=\s|$)
- Specifies whether streaming stores are generated:
  
  always - enables generation of streaming stores under the assumption that the application is memory bound
  
  auto - compiler decides when streaming stores are used (DEFAULT)
  
  never - disables generation of streaming stores
- -qopt-prefetch
- -qopt-prefetch=([0-5])(?=\s|$)
- Enables prefetch insertion at level n, where n may be 0 through 5 inclusive and 0 disables prefetching. The default is 2.
- -no-prec-sqrt
- -no-prec-sqrt(?=\s|$)
- -prec-sqrt improves precision of floating-point square root. It has a slight impact on speed. -no-prec-sqrt disables this option and enables optimizations that give slightly less precise results than full IEEE square root.
- -no-prec-div
- -no-prec-div(?=\s|$)
- -prec-div improves precision of floating-point divides. It has a slight impact on speed. -no-prec-div disables this option and enables optimizations that give slightly less precise results than full IEEE division.
- -ansi-alias
- -ansi-alias(?=\s|$)
- Enables (-ansi-alias) or disables (-no-ansi-alias, DEFAULT) the use of ANSI aliasing rules in optimizations; the user asserts that the program adheres to these rules.
- -ipo
- -ipo(?=\s|$)
- -ipo[n]
  
  Multi-file interprocedural (IP) optimizations that include:
  
  - inline function expansion
  
  - interprocedural constant propagation
  
  - dead code elimination
  
  - propagation of function characteristics
  
  - passing arguments in registers
  
  - loop-invariant code motion
  
  (n - number of multi-file objects)
- -fp-model
- -fp-model\s(except|no\-except|fast\=(1|2)|precise|source|strict|double|extended)(?=\s|$)
- Enables a floating-point model variation:
  
  [no-]except - enable/disable floating-point exception semantics
  
  fast[=1|2] - enables more aggressive floating point optimizations
  
  precise - allows value-safe optimizations
  
  source - enables intermediates in source precision
  
  strict - enables -fp-model precise -fp-model except, disables contractions and enables pragma stdc fenv_access
  
  double - rounds intermediates in 53-bit (double) precision
  
  extended - rounds intermediates in 64-bit (extended) precision
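
The second line of each entry above is the regular expression that decides whether a flag on a command line belongs to that entry. The sketch below checks a few of those patterns against a made-up command line. It is only an illustration: SPEC evaluates the patterns as Perl regexes, Python's `re` module is used here as a stand-in, and the helper name `matched_flags` and the sample command line are hypothetical.

```python
import re

# Patterns copied verbatim from the entries above. Python's `re` is a
# stand-in for the Perl regex engine; these patterns use only syntax that
# both engines share.
PATTERNS = {
    "-qopenmp": r"-qopenmp(?=\s|$)",
    "-qopenmp-offload": r"-qopenmp-offload=(host|mic|gfx)(?=\s|$)",
    "-qopt-prefetch": r"-qopt-prefetch=([0-5])(?=\s|$)",
    "-fp-model": r"-fp-model\s(except|no\-except|fast\=(1|2)|precise|source|strict|double|extended)(?=\s|$)",
}

def matched_flags(command_line):
    """Return the names of the entries whose pattern occurs in command_line."""
    return [name for name, pattern in PATTERNS.items()
            if re.search(pattern, command_line)]

# Hypothetical command line, made up for illustration.
example = "icc -O3 -qopenmp-offload=mic -qopt-prefetch=3 -fp-model fast=2 foo.c"
print(matched_flags(example))
# prints ['-qopenmp-offload', '-qopt-prefetch', '-fp-model']
```

The trailing `(?=\s|$)` lookahead is what keeps `-qopenmp` from also claiming `-qopenmp-offload=mic`: the shorter pattern matches only when the flag is followed by whitespace or the end of the line.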

- -port_80
- -80
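- Fortran portability (FPORTABILITY) flag. Treats fixed-form source lines as ending at column 80 (same as -extend-source 80).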
- -port_noformain
- -nofor-main
- Specifies that the main program is not written in Fortran; the C main() is used instead.
- -declare_use_inner_simd
- -DSPEC_USE_INNER_SIMD
- Enables the use of nested SIMD statements for OpenMP.

Compiler Flags

-intel_cc
-intel_CC
-intel_f90

- -intel_cc
- (?:/\S+/)?icc\b
- Invoke the Intel C compiler.
- -intel_CC
- (?:/\S+/)?icpc(?=\s|$)
- Invoke the Intel C++ compiler.
- -intel_f90
- (?:/\S+/)?ifort\b
- Invoke the Intel Fortran compiler.
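
The compiler patterns above accept an optional directory prefix and then the compiler name. The following sketch, under the same assumptions as any Python stand-in for Perl regexes, shows how a command line might be classified; the function name `identify_compiler` and the install path in the usage line are hypothetical.

```python
import re

# Patterns copied verbatim from the three compiler entries above.
COMPILERS = {
    "-intel_cc": r"(?:/\S+/)?icc\b",
    "-intel_CC": r"(?:/\S+/)?icpc(?=\s|$)",
    "-intel_f90": r"(?:/\S+/)?ifort\b",
}

def identify_compiler(command_line):
    """Return the first entry name whose pattern occurs in command_line, or None."""
    for name, pattern in COMPILERS.items():
        if re.search(pattern, command_line):
            return name
    return None

# Hypothetical install path, made up for illustration.
print(identify_compiler("/opt/intel/bin/icc -O3 foo.c"))   # prints -intel_cc
print(identify_compiler("icpc -O3 foo.cpp"))               # prints -intel_CC
```

Note that `icc` is not a substring of `icpc`, so the C pattern does not misclassify a C++ compile line.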

Other Flags

-lfftw3

- -lfftw3
- -lfftw3(?=\s|$)
- Link using the FFTW 3.3.6 library for Linux. The FFTW library was compiled with -O3 -xCORE-AVX2.
  
  Description from FFTW:
  
  FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
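
For readers unfamiliar with the transform that FFTW computes, here is a minimal sketch of a one-dimensional forward DFT evaluated directly from its definition. It is not FFTW's API or algorithm (FFTW uses much faster methods); the function name `naive_dft` is hypothetical, and the unnormalized form with a negative exponent is assumed to correspond to FFTW's forward transform convention.

```python
import cmath

def naive_dft(samples):
    """Direct O(n^2) evaluation of X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * j * k / n)
                for j, x in enumerate(samples))
            for k in range(n)]

# A constant signal has all of its energy in the k = 0 term
# (up to floating-point rounding in the other terms).
spectrum = naive_dft([1.0, 1.0, 1.0, 1.0])
print([round(abs(value), 6) for value in spectrum])
```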

Shell, Environment, and Other Software Settings

This mode maximizes the absolute performance of the system without regard for power; power consumption is not a consideration. Fan speed and heat output of the system may increase in addition to power consumption. Efficiency of the system may go down in this mode, but the absolute performance may increase depending on the workload that is running.

Allows the user to individually modify any of the low-level settings that are preset and unchangeable in any of the other 4 preset modes.

Enabling Hyper-Threading lets the operating system address two virtual or logical cores for each physical core. Workloads can be shared between virtual or logical cores when possible. The main function of Hyper-Threading is to increase the number of independent instructions in the pipeline so that processor resources are used more efficiently.

Legacy: When "Legacy" is selected, the operating system initiates the C-state transitions. For E5/E7 CPUs, ACPI C1/C2/C3 map to Intel C1/C3/C6. For 6500/7500 CPUs, ACPI C1/C3 map to Intel C1/C3 (ACPI C2 is not available). Some OS software may defeat the ACPI mapping (e.g. the intel_idle driver).

Autonomous: When "Autonomous" is selected, HALT and C1 requests are converted to C6 requests in hardware.

Disable: When "Disable" is selected, only C0 and C1 are used by the OS. C1 gets enabled automatically when an OS autohalts.