macosx-iccifort-v91-flags-file-20070721.xml
SPEC CPU2006 Flags Disclosure for the Intel Compilers (v9.1)
on Mac OSX
The system under test is kept reasonably quiet by turning off the following
in the System Preferences panel:
- Automatic Software Updates (turned ON by default)
- Screen Savers (turned ON by default)
- Unused wireless and Bluetooth connectivity (turned ON by default)
- Network time synchronization (turned ON by default)
]]>
icc
icc invokes the Intel C++ compiler. It is invoked as:
icc [ options ] file1 [ file2 ... ]
where,
- options: represent zero or more compile options
- fileN: is a C/C++ source (.C .c .cc .cp .cpp .cxx .c++ .i),
assembly (.s), object (.o), static library (.a), or other linkable
file.
Invoking the compiler using icc compiles .c and .i files as C.
Using icc only links in C++ libraries if C++ source is provided
on the command line.
]]>
icpc
The icpc command uses the same compiler options as
the icc command. Invoking the compiler using icpc compiles .c and
.i files as C++. Using icpc always links in C++ libraries.
]]>
ifort
ifort invokes the Intel Fortran compiler. It is invoked as:
ifort [ options ] file1 [ file2 ... ]
where,
- options: represent zero or more compile options
- fileN: is a Fortran source file, assembly file, object
file, object library, or other linkable file.
]]>
For mixed-language benchmarks, tell the compiler that the main
program is not written in Fortran.
]]>
Enables optimizations for speed and disables some optimizations that
increase code size and affect speed. To limit code size, this option:
- Enables global optimization; this includes data-flow analysis,
code motion, strength reduction and test replacement, split-lifetime
analysis, and instruction scheduling.
- Disables intrinsic recognition and intrinsics inlining.
- Disables loop unrolling.
The O1 option may improve performance for applications with very large
code size, many branches, and execution time not dominated by code within loops.
On IA-32 Mac OSX platforms, -O1 sets the following:
- -unroll0,
- -fno-builtin,
- -mno-ieee-fp,
- -fomit-frame-pointer (same as -fp),
- -ffunction-sections
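
Strength reduction, one of the global optimizations listed above, can be pictured with a small hand-written sketch. The function names below are hypothetical and the second function only approximates by hand what a compiler might produce; it is not the compiler's actual output.

```c
#include <stddef.h>

/* Straightforward form: one multiplication per iteration. */
void fill_scaled_naive(int *out, size_t n, int stride)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (int)i * stride;
    }
}

/* Strength-reduced form: the multiplication becomes a running addition. */
void fill_scaled_reduced(int *out, size_t n, int stride)
{
    int value = 0;  /* equals i * stride at the top of each iteration */
    for (size_t i = 0; i < n; i++) {
        out[i] = value;
        value += stride;
    }
}
```

Both functions fill the array with the same values as long as the products stay within the range of int.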
]]>
Enables optimizations for speed. This is the generally recommended
optimization level. This option also enables:
- Inlining of intrinsics
- Intra-file interprocedural optimizations, which include:
- inlining
- constant propagation
- forward substitution
- routine attribute propagation
- variable address-taken analysis
- dead static function elimination
- removal of unreferenced variables
- The following capabilities for performance gain:
- constant propagation
- copy propagation
- dead-code elimination
- global register allocation
- global instruction scheduling and control speculation
- loop unrolling
- optimized code selection
- partial redundancy elimination
- strength reduction/induction variable simplification
- variable renaming
- exception handling optimizations
- tail recursions
- peephole optimizations
- structure assignment lowering and optimizations
- dead store elimination
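
As a rough illustration of two items from the list above, constant propagation and dead-code elimination, the sketch below shows a function as written and a hand-simplified equivalent. The names are hypothetical and the simplified form is an assumption about what these optimizations typically achieve, not a record of what the Intel compiler emits.

```c
/* As written: 'scale' is a known constant and the debug branch never runs. */
int area_as_written(int width, int height)
{
    const int scale = 4;
    const int debug = 0;
    int result = width * height * scale;
    if (debug) {
        result = -1;  /* unreachable because debug is always 0 */
    }
    return result;
}

/* After constant propagation and dead-code elimination, done by hand. */
int area_simplified(int width, int height)
{
    return width * height * 4;
}
```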
]]>
Enables O2 optimizations plus more aggressive optimizations,
such as prefetching, scalar replacement, and loop and memory
access transformations.
Enables optimizations for maximum speed, such as:
- Loop unrolling, including instruction scheduling
- Code replication to eliminate branches
- Padding the size of certain power-of-two arrays to allow
more efficient cache use
On IA-32 and Intel EM64T processors, when O3 is used with options
-ax or -x (Linux/Mac OSX), the compiler performs more aggressive
data dependency analysis than for O2, which may result in
longer compilation times.
The O3 optimizations may not improve performance unless loop and
memory access transformations take place. The optimizations may slow
down code in some cases compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily
use floating-point calculations and process large data sets.
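
One common example of a loop and memory access transformation is loop interchange. The sketch below is a hand-written illustration with hypothetical function names, using a matrix stored row by row in a flat array; whether the compiler applies this particular transformation to a given loop is not stated in this document.

```c
/* Column-first traversal: consecutive accesses are 'cols' elements apart. */
long sum_column_first(const int *m, int rows, int cols)
{
    long total = 0;
    for (int c = 0; c < cols; c++)
        for (int r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}

/* Interchanged loops: consecutive accesses touch adjacent memory. */
long sum_row_first(const int *m, int rows, int cols)
{
    long total = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}
```

Both functions return the same sum; only the order in which memory is visited differs, which is what matters for cache behavior on large data sets.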
]]>
This option enables additional interprocedural optimizations for single
file compilation. These optimizations are a subset of full intra-file
interprocedural optimizations. One of these optimizations enables the
compiler to perform inline function expansion for calls to functions
defined within the current source file.
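
Inline function expansion within a single source file can be sketched as follows. The helper and caller names are hypothetical, and the second caller is a hand-written approximation of the effect of inlining rather than actual compiler output.

```c
/* Small helper defined in the same source file as its caller. */
static int square(int x)
{
    return x * x;
}

/* As written: two calls to square(). */
int sum_of_squares_calls(int a, int b)
{
    return square(a) + square(b);
}

/* What inline expansion effectively produces: the call overhead is gone. */
int sum_of_squares_inlined(int a, int b)
{
    return a * a + b * b;
}
```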
]]>
This option enables multi-file interprocedural optimizations that include:
- inline function expansion
- interprocedural constant propagation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
When you specify this option, the compiler performs
inline function expansion for calls to functions
defined in separate files.
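
Loop-invariant code motion across a function boundary might look like the sketch below. It assumes, for illustration only, that scale_factor() lives in a different source file from its callers; all names are hypothetical and the hoisted version is written by hand.

```c
#include <stddef.h>

/* Imagine this function defined in a separate source file. */
int scale_factor(int level)
{
    return level * level + 1;
}

/* As written: scale_factor(level) is re-evaluated on every iteration. */
void apply_scale_naive(int *data, size_t n, int level)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= scale_factor(level);
}

/* After hoisting the loop-invariant call out of the loop, done by hand. */
void apply_scale_hoisted(int *data, size_t n, int level)
{
    const int factor = scale_factor(level);
    for (size_t i = 0; i < n; i++)
        data[i] *= factor;
}
```

Hoisting is only safe when the compiler can see that the called function has no side effects and depends only on its argument, which is the kind of information multi-file analysis can provide.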
]]>
The -fast option enhances execution speed across the entire program
by including the following options that can improve run-time performance:
- -O3 (maximum speed and high-level optimizations)
- -ipo (enables interprocedural optimizations across files)
- -no-prec-div (disable -prec-div), where -prec-div
improves precision of FP divides (some speed impact)
- -mdynamic-no-pic, where -mdynamic-no-pic indicates that code is not relocatable
To override one of the options set by -fast, specify that option after the
-fast option on the command line. The options set by -fast may change from
release to release.
]]>
Code is not relocatable, but external references are relocatable.
]]>
This option improves precision of floating-point divides. It has a slight
impact on speed.
With some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows),
the compiler may change floating-point division computations into multiplication
by the reciprocal of the denominator. For example, A/B is computed as
A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is
not as accurate as full IEEE division. When it is important to have fully
precise IEEE division, use this option to disable the floating-point
division-to-multiplication optimization. The result is more accurate, with some
loss of performance.
If you specify -no-prec-div (Linux and Mac OSX), it enables
optimizations that give slightly less precise results than full IEEE
division. The default is -prec-div.
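
The difference between A/B and A * (1/B) can be observed with a small experiment. The sketch below uses hypothetical function names, and the volatile qualifiers are an assumption made so that each intermediate result is rounded to double regardless of how the code is compiled.

```c
/* Full division, as preserved by -prec-div. */
double divide_exact(double a, double b)
{
    volatile double q = a / b;
    return q;
}

/* Reciprocal form A * (1/B); each step is rounded to double. */
double divide_by_reciprocal(double a, double b)
{
    volatile double r = 1.0 / b;
    volatile double q = a * r;
    return q;
}

/* Count pairs (a, b) in 1..limit where the two forms differ. */
int count_mismatches(int limit)
{
    int count = 0;
    for (int a = 1; a <= limit; a++)
        for (int b = 1; b <= limit; b++)
            if (divide_exact(a, b) != divide_by_reciprocal(a, b))
                count++;
    return count;
}
```

When B is a power of two the reciprocal is exact and both forms agree; for other divisors the reciprocal form can differ in the last bit of the result.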
]]>
Instrument program for profiling for the first phase of
two-phase profile guided optimization. This instrumentation gathers information
about a program's execution paths and data values but does not gather
information from hardware performance counters. The profile instrumentation
also gathers data for optimizations which are unique to profile-feedback
optimization.
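
Conceptually, path instrumentation amounts to counting how often each branch direction executes. The sketch below is hand-written with hypothetical counter and function names; it only illustrates the idea and does not reflect how the compiler's instrumentation is actually implemented or stored.

```c
/* Hypothetical counters standing in for compiler-inserted instrumentation. */
static unsigned long taken_count;
static unsigned long not_taken_count;

/* A branch with hand-written counters recording which path executes. */
int clamp_to_zero(int x)
{
    if (x < 0) {
        taken_count++;
        return 0;
    }
    not_taken_count++;
    return x;
}

/* Accessors for the collected counts. */
unsigned long branch_taken(void)     { return taken_count; }
unsigned long branch_not_taken(void) { return not_taken_count; }
```

Counts like these let a later compilation favor the path that runs most often.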
]]>
Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof_use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files.
]]>
Generates static binaries. Libraries are statically linked into the executable. The default behavior on
Mac OS X is to produce dynamically linked binaries.
]]>
Tells the compiler the maximum number of times (n) to unroll loops.
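
To picture what unrolling means, the sketch below unrolls a summation loop four times by hand. The function names and the factor of four are illustrative choices; the compiler picks its own factor up to the limit n.

```c
#include <stddef.h>

/* Rolled loop: one element per iteration. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled four times by hand, with a cleanup loop for the remainder. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++)  /* handles the 0 to 3 leftover elements */
        total += v[i];
    return total;
}
```

A limit of 0, as set by -unroll0, corresponds to keeping the rolled form.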
]]>
Disables inline expansion of all intrinsic functions.
]]>
Disables conformance to the ANSI C and IEEE 754 standards for
floating-point arithmetic.
]]>
Allows use of EBP as a general-purpose register in optimizations.
]]>
Places each function in its own COMDAT section.
]]>
Pass options o1, o2, etc. to the linker for processing.
]]>
Specifies the initial address of the stack pointer value, where
value is a hexadecimal number rounded to the segment alignment.
The default segment alignment is the target pagesize (currently,
1000 hexadecimal for the PowerPC and for i386). If -stack_size
is specified and -stack_addr is not, a default stack address
specific for the architecture being linked will be used and its
value printed as a warning message. This creates a segment
named __UNIXSTACK. Note that the initial stack address will be
either at the high address of the segment or the low address of
the segment depending on which direction the stack grows for the
architecture being linked.
]]>
Specifies the size of the stack segment value, where value is a
hexadecimal number rounded to the segment alignment. The
default segment alignment is the target pagesize (currently,
1000 hexadecimal for the PowerPC and for i386). If -stack_addr
is specified and -stack_size is not, a default stack size specific
for the architecture being linked will be used and its
value printed as a warning message. This creates a segment
named __UNIXSTACK.
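
Rounding a value to the segment alignment can be sketched as below. The text does not say in which direction the linker rounds, so rounding up is an assumption here, and the macro and function names are hypothetical; the alignment of 1000 hexadecimal is the page size quoted above.

```c
#include <stdint.h>

/* Page size quoted in the text: 1000 hexadecimal (4096 bytes). */
#define SEGMENT_ALIGNMENT 0x1000u

/* Round up to the next multiple of the alignment (assumed direction). */
uint32_t round_up_to_segment(uint32_t value)
{
    return (value + (SEGMENT_ALIGNMENT - 1u)) & ~(SEGMENT_ALIGNMENT - 1u);
}
```

For example, a requested size of 12345 hexadecimal would become 13000 hexadecimal under this assumption, while a value that is already a multiple of 1000 hexadecimal is left unchanged.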
]]>