Copyright © 2016 Intel Corporation. All Rights Reserved.
Invoke the Intel oneAPI DPC++ C compiler.
Invoke the Intel oneAPI DPC++ C++ compiler.
Invoke the Intel oneAPI Fortran compiler.
Specifies that source files are in free format. Same as -FR. -nofree indicates fixed format.
-mcmodel=<size>
use a specific memory model to generate code and store data
small - Restricts code and data to the first 2GB of address space (DEFAULT)
medium - Restricts code to the first 2GB; it places no memory restriction on data
large - Places no memory restriction on code or data
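As a rough illustration of the three models above, the following Python sketch maps estimated code and data sizes onto a model name. The helper `pick_mcmodel` and the idea of choosing from byte counts are assumptions made for illustration only; the compiler does not make this choice for you, and the 2GB limit is simply the figure quoted in the descriptions above.

```python
TWO_GB = 2 * 1024**3  # the 2GB limit quoted in the option description


def pick_mcmodel(code_bytes, data_bytes):
    """Return the least permissive model that still fits (hypothetical helper)."""
    if code_bytes + data_bytes < TWO_GB:
        return "small"    # code and data both fit in the first 2GB
    if code_bytes < TWO_GB:
        return "medium"   # only code has to stay within the first 2GB
    return "large"        # no restriction on code or data


# 50 MB of code with 6 GB of static data does not fit the small model.
print(pick_mcmodel(50 * 1024**2, 6 * 1024**3))  # medium
```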
Suppress compiler warnings.
Compiles for a 64-bit (LP64) data model.
Sets the language dialect to conform to the indicated C standard.
Enables an error or warning whenever a function is used before being declared. The -Wno form of the option turns these warnings off.
Enable O3 optimizations plus more aggressive optimizations, such as -ffinite-math-only and -no-prec-div.
Specifies preferred vector width for auto-vectorization. Defaults to 'none' which allows target specific decisions.
May generate machine code optimized for processors supporting the AVX-512 instruction set.
-ipo[n]
Multi-file interprocedural (IP) optimizations that include:
- inline function expansion
- interprocedural constant propagation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
This option defines the level of zmm register usage. The low setting causes the compiler to generate code with zmm registers very carefully, only when the gain from their usage is proven. The high setting causes the compiler to use much less restrictive heuristics for zmm code generation. Possible values are:
Default:
Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
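The following Python sketch is not compiler output; it only illustrates why relaxing IEEE rules can change results. It assumes that a fast math mode may reassociate floating-point additions, and shows that two mathematically equivalent orderings give different double-precision answers.

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # evaluation order as written in the source
reassociated = a + (b + c)    # an order that a reassociating optimizer might pick

print(left_to_right == reassociated)  # False
print(left_to_right, reassociated)    # 0.6000000000000001 0.6
```

Programs that compare floating-point results bit for bit, or that rely on a particular summation order, are the ones most likely to notice such differences.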
Enable optimizations based on the strict definition of an enum's value range.
Enable optimizations based on the strict rules for overwriting polymorphic C++ objects.
Enables dead virtual function elimination optimization. Requires -flto=full.
Supress compiler warnings.
Compiles for a 64-bit (LP64) data model.
Sets the language dialect to conform to the indicated C standard.
Enables error or warning handler whenever a function is used before being declared and -Wno is to turn off f warnings..
Enable O3 optimizations plus more aggressive optimizations, such as -ffinite-math-only –no-prec-div
Specifies preferred vector width for auto-vectorization. Defaults to 'none' which allows target specific decisions.
May generate machine code optimized for processors supporting the AVX-512 instruction set.
-ipo[n]
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
(n - number of multi-file objects)
This option defines a level of zmm registers usage. The low setting causes the compiler to generate code with zmm registers very carefully, only when the gain from their usage is proven. The high setting causes the compiler to use much less restrictive heuristics for zmm code generation. Specifies the level of zmm registers usage. Possible values are:
Default:
Enable fast math mode. This option may yield faster code for programs that do not require the guarantees of exact implementation of IEEE or ISO rules/specifications for math functions.
Enable optimizations based on the strict definition of an enum's value range.
Enable optimizations based on the strict rules for overwriting polymorphic C++ objects.
Enables dead virtual function elimination optimization. Requires -flto=full.
Use specified C language standard.
Sets the language dialect to conform to the indicated C++ standard.
Option standard-realloc-lhs (the default) tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the right-hand side of the assignment before the assignment occurs. This is the current Fortran Standard definition. This feature may cause extra overhead at run time. This option has the same effect as option assume realloc_lhs.
If you specify nostandard-realloc-lhs, the compiler uses the old Fortran 2003 rules when interpreting assignment statements. The left-hand side is assumed to be allocated with the correct shape to hold the right-hand side. If it is not, incorrect behavior will occur. This option has the same effect as option assume norealloc_lhs.
The align toggle changes how data elements are aligned. Variables and arrays are analyzed and memory layout can be altered. Specifying array32byte will look for opportunities to transform and realign arrays to 32-byte boundaries.
Make all local variables AUTOMATIC. Same as -automatic.
Define the relative error, measured by the number of correct bits, for math library function results.
-fimf-precision=value[:funclist]
defines the accuracy (precision) for math library functions
value - defined as one of the following values
  high - equivalent to max-error = 0.6
  medium - equivalent to max-error = 4 (DEFAULT)
  low - equivalent to accuracy-bits = 11 (single precision); accuracy-bits = 26 (double precision)
funclist - optional comma-separated list of one or more math library functions to which the attribute should be applied
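To make "relative error, measured by the number of correct bits" concrete, here is a minimal Python sketch. The helper `correct_bits` is hypothetical and uses the common convention that a relative error of 2^-k corresponds to about k correct bits; it is not taken from the compiler's math library.

```python
import math


def correct_bits(exact, approx):
    """Correct bits implied by the relative error of approx (hypothetical helper)."""
    if exact == approx:
        return math.inf
    rel_err = abs((approx - exact) / exact)
    return -math.log2(rel_err)


# A result that is off by one part in 2**11 has about 11 correct bits,
# the single-precision figure quoted for the "low" setting above.
print(correct_bits(1.0, 1.0 + 2.0**-11))
```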
Some optimization flags were found in portability variables.
This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.
Enable O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enable optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
Enable optimizations for speed. This is the generally recommended optimization level. This option also enables:
- Inlining of intrinsics
- Intra-file interprocedural optimizations, which include:
  - inlining
  - constant propagation
  - forward substitution
  - routine attribute propagation
  - variable address-taken analysis
  - dead static function elimination
  - removal of unreferenced variables
- The following capabilities for performance gain:
  - constant propagation
  - copy propagation
  - dead-code elimination
  - global register allocation
  - global instruction scheduling and control speculation
  - loop unrolling
  - optimized code selection
  - partial redundancy elimination
  - strength reduction/induction variable simplification
  - variable renaming
  - exception handling optimizations
  - tail recursions
  - peephole optimizations
  - structure assignment lowering and optimizations
  - dead store elimination
Enables optimizations for speed and disables some optimizations that increase code size and affect speed.
To limit code size, this option:
The O1 option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.
-O1 sets the following options:
Tells the compiler the maximum number of times to unroll loops. For example, -funroll-loops0 would disable unrolling of loops.
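Loop unrolling, which the unroll limit above controls, can be pictured with a hand-written example. This Python sketch is only an analogy for what a compiler does at the machine-code level; the function names and the unroll factor of 4 are chosen for illustration.

```python
def sum_rolled(values):
    """Reference loop: one addition and one loop test per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same sum with the loop body replicated four times."""
    total = 0
    n = len(values)
    i = 0
    # Main loop: four elements per iteration, so a quarter of the loop tests.
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Remainder loop handles the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))  # 45 45
```

Unrolling trades larger code for fewer branches, which is why an option aimed at limiting code size would restrict it.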
-fno-builtin disables inline expansion for all intrinsic functions.
This option trades off floating-point precision for speed by removing the restriction to conform to the IEEE standard.
EBP is used as a general-purpose register in optimizations.
Places each function in its own COMDAT section.
Flushes denormal results to zero.
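The effect of flushing denormals can be modelled in a few lines of Python. This is a sketch of the arithmetic behaviour only, assuming IEEE double precision; the helper `flush_to_zero` is hypothetical and does not change any processor mode.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest positive normal double


def flush_to_zero(x):
    """Model of flush-to-zero: a denormal value is replaced by a signed zero."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


tiny = SMALLEST_NORMAL / 2         # below the normal range, so a denormal
print(tiny > 0.0)                  # True: gradual underflow keeps it nonzero
print(flush_to_zero(tiny))         # 0.0
print(flush_to_zero(1.5))          # 1.5: normal values are unchanged
```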
Sets the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.
Launching a process with numactl --interleave=all sets the memory interleave policy so that memory is allocated round-robin across nodes. When memory cannot be allocated on the current interleave target, allocation falls back to other nodes.
The command "echo 1 > /proc/sys/vm/drop_caches" is used to free up the filesystem page cache.
For multi-copy runs or single copy runs on systems with multiple sockets, it is advantageous to bind a process to a particular core. Otherwise, the OS may arbitrarily move your process from one core to another. This can affect performance. To help, SPEC allows the use of a "submit" command where users can specify a utility to use to bind processes. We have found the utility 'numactl' to be the best choice.
numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a command and inherited by all of its children. The numactl flag "--physcpubind" specifies which core(s) to bind the process to. "-l" instructs numactl to keep a process's memory on the local node, while "-m" specifies on which node(s) to place a process's memory. For full details on using numactl, please refer to your Linux documentation, 'man numactl'.
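A per-copy binding command of the kind described above could be assembled as in the following Python sketch. The helper `build_submit_command`, the one-core-per-copy policy, and the placeholder program `./benchmark` are assumptions for illustration; only the "--physcpubind" and "-l" flags come from the text. The sketch builds the command line and does not execute it.

```python
def build_submit_command(core, command):
    """Return an argv list that binds `command` to one core (hypothetical helper)."""
    return ["numactl", f"--physcpubind={core}", "-l"] + list(command)


# One copy per core, each keeping its memory on the local node.
for copy in range(4):
    print(" ".join(build_submit_command(copy, ["./benchmark", "input.dat"])))
```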
This is the percentage of the total amount of free and reclaimable memory. When the amount of dirty pagecache exceeds this percentage, writeback threads start writing back dirty memory. This setting can help Linux disk caching and performance by setting the percentage of system memory that can be filled with dirty pages. This can be set through a command like "echo 40 > /proc/sys/vm/dirty_background_ratio".
This control is used to define how aggressively the kernel swaps out anonymous memory relative to pagecache and other caches. Increasing the value increases the amount of swapping. The default value is 60. A value of 1 tells the kernel to only swap processes to disk if absolutely necessary. This can be set through a command like "echo 1 > /proc/sys/vm/swappiness".
This parameter controls whether memory reclaim is performed on a local NUMA node even if there is plenty of memory free on other nodes. This parameter is automatically turned on on machines with more pronounced NUMA characteristics. To tell the kernel to free local node memory rather than grabbing free memory from remote nodes, use a command like "echo 1 > /proc/sys/vm/zone_reclaim_mode".
A percentage value. When this percentage of total system memory is modified, the system begins writing the modifications to disk with the pdflush operation. The default value is 20 percent. This can be set through a command like "echo 8 > /proc/sys/vm/dirty_ratio".
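Before changing any of the virtual-memory settings above, it can help to record their current values. The following Python sketch only reads the files under /proc/sys/vm and never writes to them; the helper name `read_vm_settings` is hypothetical, and entries that are missing or unreadable on a given system are reported as None.

```python
from pathlib import Path

VM_SETTINGS = ("dirty_background_ratio", "swappiness", "zone_reclaim_mode", "dirty_ratio")


def read_vm_settings(base="/proc/sys/vm"):
    """Return the current value of each setting, or None if it cannot be read."""
    values = {}
    for name in VM_SETTINGS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


print(read_vm_settings())
```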
In order to take advantage of large pages, your system must be configured to use large pages. To configure your system for huge pages, perform the following steps:
- Create a mount point for the huge pages: "mkdir /mnt/hugepages"
- The huge page file system needs to be mounted when the system reboots. Add the following to a system boot configuration file before any services are started: "mount -t hugetlbfs nodev /mnt/hugepages"
- Set vm/nr_hugepages=N in your /etc/sysctl.conf file, where N is the maximum number of pages the system may allocate.
- Reboot to have the changes take effect. (Not necessary on some operating systems like RedHat Enterprise Linux 5.5).
Note that further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt
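Choosing N for vm/nr_hugepages comes down to a ceiling division. The Python sketch below assumes a 2 MB huge page size, which is typical on x86-64 but should be checked on the target system; the helper `huge_pages_needed` is hypothetical.

```python
def huge_pages_needed(bytes_needed, page_bytes=2 * 1024**2):
    """Number of huge pages covering bytes_needed (assumes 2 MB pages by default)."""
    return -(-bytes_needed // page_bytes)  # ceiling division on integers


print(huge_pages_needed(1024**3))        # 512 pages cover 1 GB
print(huge_pages_needed(1024**3 + 1))    # 513: a partial page still needs a whole page
```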
Transparent Huge Pages
On RedHat EL 6 and later, Transparent Hugepages increases the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provides significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead. Hugepages are used by default if /sys/kernel/mm/redhat_transparent_hugepage/enabled is set to always.
Set this environment variable to "yes" to enable applications to use large pages.
Specify stack size to be allocated for each thread.
KMP_AFFINITY = < physical | logical >, starting-core-id specifies the static mapping of user threads to physical cores. For example, if you have a system configured with 8 cores, OMP_NUM_THREADS=8 and KMP_AFFINITY=physical,0, then thread 0 will be mapped to core 0, thread 1 will be mapped to core 1, and so on in a round-robin fashion.
KMP_AFFINITY = granularity=fine,scatter The value for the environment variable KMP_AFFINITY affects how the threads from an auto-parallelized program are scheduled across processors. Specifying granularity=fine selects the finest granularity level and causes each OpenMP thread to be bound to a single thread context. This ensures that there is only one thread per core on cores supporting HyperThreading Technology. Specifying scatter distributes the threads as evenly as possible across the entire system. Hence a combination of these two options will spread the threads evenly across sockets, with one thread per physical core.
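The two placements described above can be modelled with a short Python sketch. This is a simplified model, not the OpenMP runtime's actual algorithm: it assumes identical sockets, ignores hardware threads, and uses hypothetical function names.

```python
def physical_mapping(num_threads, num_cores, start_core=0):
    """Model of KMP_AFFINITY=physical,<start>: round-robin over the cores."""
    return [(start_core + t) % num_cores for t in range(num_threads)]


def scatter_mapping(num_threads, sockets, cores_per_socket):
    """Simplified model of scatter: alternate sockets first, then advance the core."""
    placement = []
    for t in range(num_threads):
        socket = t % sockets
        core = (t // sockets) % cores_per_socket
        placement.append((socket, core))
    return placement


print(physical_mapping(8, 8))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(scatter_mapping(4, 2, 4))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

In the scatter model, consecutive threads land on different sockets, which is the "as evenly as possible" behaviour the description refers to.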
Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows). Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8
This option allows the processor to use a given performance level as the max cap, or to let the processor operate as close to the thermal design point (TDP) as possible. Values for this BIOS option can be: Power: Processor operates as close to the TDP as possible. Performance: Processor operates at a capped performance level as the max operating state.
The NUMA nodes per socket (NPS) field allows you to configure the memory NUMA domains per socket. The configuration can consist of one whole domain (NPS1), two domains (NPS2), or four domains (NPS4). In the case of a two-socket platform, an additional NPS profile is available that maps the whole system memory as a single NUMA domain (NPS0).
Enabling this option allows the chipset to defer memory transactions and process them out of order for optimal performance.
When running multiple copies of benchmarks, the SPEC config file feature submit is sometimes used to cause individual jobs to be bound to specific processors. This specific submit command is used for Linux. The elements of the command are:
/usr/bin/taskset [options] [mask] [pid | command [arg] ... ]
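The mask argument of taskset is a hexadecimal bitmask in which bit n stands for CPU n. The Python sketch below shows how such a mask can be computed from a list of CPU numbers; the helper `cpu_mask` is hypothetical and is not part of taskset or the SPEC tools.

```python
def cpu_mask(cpus):
    """Return the hexadecimal affinity mask with one bit set per listed CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return hex(mask)


print(cpu_mask([0]))        # 0x1
print(cpu_mask([0, 2]))     # 0x5
print(cpu_mask(range(8)))   # 0xff
```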
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2012-2024 Standard Performance Evaluation Corporation
Tested with SPEC OMP2012 v1.1.
Report generated on Wed May 29 12:16:45 2024 by SPEC OMP2012 flags formatter v538.