Compilers: Intel C/C++/fortran Compiler
Operating systems: Linux
Selecting one of the following will take you directly to that section:
Adds the directory for include files to the search path at compile time.
Adds the directory for include files to the search path at compile time.
Adds the library directory search path at link time
Enable the compiler to generate multi-threaded code based on the OpenMP* directives (same as -fopenmp)
Enables OpenMP* offloading compilation for target pragmas. This
option only applies to Intel(R) MIC Architecture and Intel(R)
Graphics Technology. Enabled by default with -qopenmp.
Use -qno-openmp-offload to disable.
Specify kind to specify the default device for target pragmas
host - allow target code to run on host system while still doing
the outlining for offload
mic - specify Intel(R) MIC Architecture
gfx - specify Intel(R) Graphics Technology
optimize for maximum speed and enable more aggressive optimizations that may not improve performance on some programs
Code is optimized for Intel(R) processors with support for AVX2 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
May generate Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) Foundation instructions, Intel(R) AVX-512 Conflict Detection instructions, as well as the instructions enabled with CORE-AVX2. Optimizes for Intel(R) processors that support Intel(R) AVX-512 instructions.
May generate Intel® AVX2, AVX, Intel® SSE4.2, SSE4.1, SSSE3, SSE3, SSE2 and SSE instructions /arch:core-avx2 is supported on Windows* but -mcore-avx2 is not supported for Linux* or macOS* (use -march=core-avx2 instead)
-fimf-precision=value[:funclist]
defines the accuracy (precision) for math library functions
value - defined as one of the following values
high - equivalent to max-error = 0.6
medium - equivalent to max-error = 4 (DEFAULT)
low - equivalent to accuracy-bits = 11 (single
precision); accuracy-bits = 26 (double
precision)
funclist - optional comma separated list of one or more math
library functions to which the attribute should be
applied
Specifies whether streaming stores are generated:
always - enables generation of streaming stores under the assumption that the application is memory bound
auto - compiler decides when streaming stores are used (DEFAULT)
never - disables generation of streaming stores
Enable levels of prefetch insertion, where 0 disables. n may be 0 through 5 inclusive. Default is 2.
-prec-sqrt improves precision of floating-point square root. It has a slight impact on speed. -no-prec-sqrt disables this option and enables optimizations that give slightly less precise results than full IEEE division.
-prec-div improves precision of floating-point divides. It has a slight impact on speed. -no-prec-div disables this option and enables optimizations that give slightly less precise results than full IEEE division.
Enable/disable(DEFAULT) use of ANSI aliasing rules in optimizations; user asserts that the program adheres to these rules.
-ipo[n]
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
enable floating point model variation
[no-]except - enable/disable floating point semantics
fast[=1|2] - enables more aggressive floating point optimizations
precise - allows value-safe optimizations
source - enables intermediates in source precision
strict - enables -fp-model precise -fp-model except, disables
contractions and enables pragma stdc fenv_access
double - rounds intermediates in 53-bit (double) precision
extended - rounds intermediates in 64-bit (extended) precision
FPORTABILITY flag
No Fortran main method exists, use C equivalent instead.
Enables the use of nested SIMD statements for OpenMP.
Invoke the Intel C compiler.
Invoke the Intel C++ compiler.
Invoke the Intel Fortran compiler.
Invoke the Intel C compiler.
Invoke the Intel C++ compiler.
Invoke the Intel Fortran compiler.
Link using FFTW 3.3.6 library for Linux. Description from FFTW:
FFTW lib compiled with -O3 -xCORE-AVX2
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
AMD BIOS Setting
Maximum Performance
Maximizes performance and minimizes latency with little regard to power consumption.
SMT Mode
Can be used to disable symmetric multithreading. To re-enable SMT, a POWER CYCLE is needed after selecting the 'Auto' option. WARNING - S3 is NOT SUPPORTED on systems where SMT is disabled.
NUMA node per socket
Specifies the number of desired NUMA nodes per socket. Zero will attempt to interleave the two sockets together.
BIOS settings
Maximum Performance
Mode will maximize the absolute performance of the system without regard for power. In this mode, power consumption is a don't care. Things like fan speed and heat output of the system may increase in addition to power consumption. Efficiency of the system may go down in this mode, but the absolute performance may increase depending on the workload that is running.
Custom
Allows the user to individually modify any of the low-level settings that are preset and unchangeable in any of the other 4 preset modes.
Hyper-Threading
Enabling Hyper-Threading let operating system addresses two virtual or logical cores for a physical presented core. Workloads can be shared between virtual or logical cores when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline for using the processor resources more efficiently.
C-States
Legacy: When "Legacy" is selected, the operating system initiates the C-state transitions. For E5/E7 CPUs, ACPI C1/C2/C3 map to Intel C1/C3/C6. For 6500/7500 CPUs, ACPI C1/C3 map to Intel C1/C3 (ACPI C2 is not available). Some OS SW may defeat the ACPI mapping (e.g. intel_idle driver).
Autonomous: When "Autonomous" is selected, HALT and C1 request get converted to C6 requests in hardware.
Disable: When "Disable" is selected, only C0 and C1 are used by the OS. C1 gets enabled automatically when an OS autohalts.
Platform settings
One or more of the following settings may have been applied to the testbed. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.