Siemens CDS++ compiler flags (as of May 1999)
=============================================
The following is a list of short explanations of compiler/linker flags
used for SPEC CINT95 result submissions for Siemens RM systems,
using the CDS++ 2.0 A compiler.
Most flags are semantically similar to those that have been used with the
predecessor compiler pyrC6. The syntax has changed in many cases; in several
cases, the old syntax is still accepted together with the new one.
It is likely that future result submissions, if they use new compilers
or new compiler versions, will have different flags; then this flag
description will be superseded by a new one.
---------------------------------------------------------------------
1. Compiler Flags
[Syntax note: For most flags that have a numeric parameter
(e.g., inlining control), this parameter can be separated from
the flag by either a comma "," or a colon ":".
After a "-F" or "-K", the blank space is optional.]
-qfeedback
Standard (1-pass) feedback optimization:
Produce code that collects call graph and flow graph
information suitable for feedback directed optimization.
-F profdir,<dir>
Specifies that profiling information should be written to
and read from the directory <dir>. Default is ./PROF.
-qfeedback2
Additional (2-pass) feedback optimization:
Produce code that collects information from an executable
optimized in a first pass of feedback optimization
(i.e. one compiled with -qfeedback / -F O4 or -qfeedback / -F O5).
-F profdir2,<dir>
Specifies that profiling information from 2-pass feedback
compilation should be written to
and read from the directory <dir>. Default is ./PROF2.
-F use_fb2
Specifies that profiling information from 2-pass feedback
compilation should be used in the generation of the (final)
executable. Must be used together with -F O4 or -F O5.
-F X4
Performs all safe and generally applicable optimizations
including interprocedural optimizations, register
allocation across function calls and feedback directed
optimizations (function inlining, procedure positioning,
branch elimination, procedure splitting, register
allocation and cross basic block scheduling). This flag
also directs the compiler to produce nonposition-
independent code, to generate code using the instruction
set of the MIPS4 ISA, to inline alloca, printf, memcpy,
memset, memcmp, and memmove and to use U-code system
libraries. These libraries represent the same system
services as their regular counterparts, but in a form
more suitable for interprocedural optimization.
The flag also includes -F fast_int_mul (see below).
-F cost_benefit,<n>
Tells the "umerge" phase to consider only those functions for
inlining/cloning whose estimated ratio of cost over savings
is less than <n>.
-F G
Specifies that data items smaller than the given number of bytes in
size should be placed in the global data area and accessed
using a faster addressing mode. Default is 0.
-F inline_limit,<n>
Sets a size threshold for inlining/cloning. A call will not be
inlined/cloned if the resulting function (after inlining/cloning)
exceeds <n> basic blocks. Default is 500.
-F loopunroll,<n>
Tells the optimizer to unroll loops <n> times.
Default is 4; -F X4 sets it to 8.
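
As an illustration of what an unroll factor of 4 (the default) means, the
hand-written C sketch below sums an array once with a plain loop and once
with the loop body replicated four times plus a remainder loop. The function
names are hypothetical, and the optimizer performs the transformation on its
own intermediate code, so the generated code may look different.

```c
#include <stddef.h>

/* Plain loop: one element per iteration. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop unrolled by hand with a factor of 4 (illustration only). */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    /* Main body: four elements per iteration, one loop test per four adds. */
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Both functions return the same sum; the unrolled form trades code size for
fewer loop-control instructions.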
-F unrolllimit,<n>
<n> is the limit on the number of instructions within a
loop unrolled by the optimizer.
Default is 320; -F X4 sets it to 2000.
-F fast_int_mul
Directs the optimizer to use the floating-point unit
to perform 32-bit integer multiplications wherever
doing so would result in correct, faster code. Because this
flag changes the behavior of multiplications that overflow,
programs that depend on the truncation to 32 bits of
two's-complement multiplication (the default behavior) should not
use this flag.
Because the difference from the default behavior appears in
overflow cases only (not in legal C programs), and because rule
2.2.5 of the CPU95 Run Rules exempts numerical accuracy
flags from baseline restrictions anyway, this flag is not
an assertion flag in the sense of the CPU95 Run Rules.
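
To make the overflow case concrete, the C sketch below contrasts a product
truncated to 32 bits with the same product computed in double precision. It
only shows that the two can differ once the result no longer fits in 32 bits;
it does not model what the compiler actually emits under -F fast_int_mul.
The function names are hypothetical, and unsigned arithmetic is used so that
the wraparound is well defined in C.

```c
#include <stdint.h>

/* Product truncated to its low 32 bits, as an integer multiply yields it. */
uint32_t mul_wrap32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* Product computed in floating point: no truncation to 32 bits occurs. */
double mul_via_double(uint32_t a, uint32_t b)
{
    return (double)a * (double)b;
}
```

For small operands both functions agree. For 65536 * 65536 the truncated
product is 0, while the floating-point product is 4294967296.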
-F no_positioning
Disables procedure positioning feedback optimization.
-F afep,<num>
Subroutine entries are allocated on 2 ** num byte boundaries.
Default is num=2.
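
The effect of a 2 ** num byte boundary can be sketched as rounding an address
up to the next multiple of that power of two. The helper below is a
hypothetical illustration, not part of the compiler.

```c
#include <stdint.h>

/* Round addr up to the next multiple of 2**num bytes (num = 2 gives 4). */
uintptr_t align_entry(uintptr_t addr, unsigned num)
{
    uintptr_t mask = ((uintptr_t)1 << num) - 1;
    return (addr + mask) & ~mask;
}
```

With num=2, an entry that would otherwise start at 0x1001 is placed at
0x1004; with num=4 it is placed at 0x1010.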
-F hot_switch_opt,<num1>,<num2>
-Wc,-xjp_mh_opt,<num1>,<num2>
Controls the hot switch optimization which uses conditional
branches instead of indirect jumps at C switch statements.
For a switch label to be considered for this optimization,
the label's relative frequency of execution must be greater
than num1 percent. The parameter num2 limits the maximum number
of conditional branches. -F X4 sets the values to 6 and 5,
respectively.
(-Wc,-xxx: Syntax for flags that direct the compiler's code
generator)
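
At source level, the hot switch optimization can be pictured as follows. The
sketch assumes that feedback has shown case 2 to be executed far more often
than the other labels; the function names and the choice of case 2 are
hypothetical, and the compiler applies the transformation internally rather
than on C source.

```c
/* Plain switch: typically compiled to an indirect jump through a table. */
int dispatch_switch(int op, int x)
{
    switch (op) {
    case 0:  return x + 1;
    case 1:  return x - 1;
    case 2:  return x * 2;
    case 3:  return -x;
    default: return x;
    }
}

/* Hot label tested first with a conditional branch; the remaining
   labels still go through the ordinary switch. */
int dispatch_hot(int op, int x)
{
    if (op == 2)            /* assumed hot case */
        return x * 2;
    switch (op) {
    case 0:  return x + 1;
    case 1:  return x - 1;
    case 3:  return -x;
    default: return x;
    }
}
```

Both functions compute the same result for every value of op; only the
branch structure differs.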
-KOlimit,<num>
-F Olimit,<num>
Changes the threshold size for optimizing very large
programs. The argument specifies the maximum size in
basic blocks of a function that will be optimized by the
global optimizer. The default value of the argument is 1000.
The optimization phase of the compiler warns the user if
this flag is needed to optimize a particular program.
-F X4 sets num to 4000.
-Kr4000
Causes pipeline optimization for the R4000 and R4400 CPUs.
-Wb,-xxx
Syntax for flags that direct the compiler's back end
-Wb,-br_likely_cntl,<num1>,<num2>
Controls the branch likely optimization which sets the
likely bit in a conditional branch. If feedback indicates
that a conditional branch is probably taken and the
branch cannot be reversed, the branch's likely bit is set
if both of the following criteria are met:
1) the branch is taken at least <num1> percent of the time and
2) <num2> equals 0 or the branch is taken <num2>
times more often than the branch's function is
called.
Both <num1> and <num2> are expressed as percentages.
-Wb,-prefetch,<num1>,<num2>
-F prefetch,<num1>,<num2>
Inserts prefetch instructions in loops that appear to
access memory in a serial fashion. Only
loops which have at least <num1> iterations are
considered. <num2> is the expected latency for
fetches from memory in units of machine instruction
cycle times.
Off by default; -F X4 turns it on and sets the values to
400 and 400, respectively.
-WG,-xxx / -Wg,-xxx / -Wn,-xxx
Flags that have one of these forms control either the
"inliner" pass of the compiler (-Wg,-xxx), or the "cloner"
pass of the compiler (-Wn,-xxx), or both (-WG,-xxx).
A setting with a more specific value (lower case letter g or n)
overrides the more general setting (uppercase letter G).
Although the following description uses the "-WG,-xxx" form,
it holds for the other forms also.
Some flags exist for the "cloner" only (the pass that
optimizes for specific call locations of subroutines); they
provide finer control over the cloning process. They can be
written in the form -WG,-xxx or -Wn,-xxx; the following
description uses the form -Wn,-xxx.
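
As a rough picture of what cloning for a specific call location can mean,
the C sketch below specializes a routine for a call site that always passes
the same constant argument. Specialization on a constant argument is assumed
here as one common form of cloning; this description does not state which
properties of a call site the cloner actually exploits. All names are
hypothetical.

```c
/* General routine: the exponent is only known at run time. */
long power(long base, unsigned exp)
{
    long r = 1;
    while (exp--)
        r *= base;
    return r;
}

/* Clone specialized for call sites that always pass exp == 2:
   the loop disappears entirely. */
long power_exp2(long base)
{
    return base * base;
}

/* Call site after cloning; originally written as power(side, 2). */
long area(long side)
{
    return power_exp2(side);
}
```

The original routine stays in the program for all other call sites, which is
why cloning increases code size.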
-WG,-boc:<n>
Tells the "umerge" phase to consider only those functions for
inlining/cloning whose estimated ratio of runtime cycle savings
to I-cache cost of doing inlining/cloning is greater than or
equal to <n>.
-WG,-clone_expansion:<n>
-Wn,-clone_expansion:<n>
Directs the cloner and/or inliner to limit the maximum relative
growth of the program to <n>. The default for <n> is 1.3.
-Wn,-recursion_depth:<n>
Sets the maximum number of function calls through which the
cloner will search to identify recursive functions. For
example, -Wn,-recursion_depth:1 means that functions that call
themselves will be considered recursive functions.
-WG,-only_clone_recursion
-Wn,-only_clone_recursion
Directs the cloner and/or inliner to only clone recursive
functions.
-WG,-recursion_limit:<n>
-Wn,-recursion_limit:<n>
Directs the cloner and/or inliner to limit the maximum number
of basic blocks in a recursive function to <n>.
If -Wn,-recursion_limit isn't given, then this is set by the
-WG,-inline_limit flag.
If neither of these flags is given, the default is 500.
-Wo,-xxx
Syntax for flags that direct the compiler's optimizer pass
-Wo,-no_const_in_reg
Tells the optimizer not to put constants in registers.
-Wo,-recursive_calls
Directs uopt to use different heuristics that result in better
performance if there are recursive function calls in the
source code. Only effective in -F X4 mode.
-Wo,-splitedges,<num>
Controls the edge splitting algorithm in "uopt" which inserts
an empty basic block on infrequently executed control flow
edges to increase optimization opportunities. This
optimization uses feedback information to limit the number
of split edges and avoid excessive compilation time.
"uopt" will split an edge if its execution frequency
multiplied by num is less than the smaller of the execution
frequencies of the edge's head and tail basic blocks.
Setting num to zero disables edge splitting.
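
The splitting criterion stated above can be written as a small predicate.
This is a sketch of the rule as described here, not code taken from "uopt";
the function and parameter names are hypothetical.

```c
/* Returns 1 if an edge qualifies for splitting under the stated rule,
   0 otherwise. Frequencies are execution counts taken from feedback. */
int should_split_edge(unsigned long edge_freq,
                      unsigned long head_freq,
                      unsigned long tail_freq,
                      unsigned long num)
{
    unsigned long smaller = head_freq < tail_freq ? head_freq : tail_freq;

    /* num == 0 disables edge splitting; it needs an explicit check because
       edge_freq * 0 would otherwise pass the comparison below. */
    if (num == 0)
        return 0;

    return edge_freq * num < smaller;
}
```

For example, with num set to 20, an edge executed 10 times between blocks
executed 1000 and 500 times qualifies (200 < 500), while an edge executed
100 times does not (2000 >= 500).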
2. Linker Flags
-dn This option is passed to ld. It specifies static linking
in the link editor.
3. Portability Flags
-DI_TIME
-DI_SYS_TIME
Enables certain (SPEC-approved) source code parts via conditional
compilation.
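
The mechanism is ordinary C conditional compilation. The sketch below
assumes, based only on the macro names, that I_TIME guards the inclusion of
<time.h> and I_SYS_TIME guards <sys/time.h>; which source lines the
benchmark actually guards with these macros is not stated here. The function
name is hypothetical.

```c
/* Build with -DI_TIME and/or -DI_SYS_TIME to select the headers. */
#ifdef I_TIME
#include <time.h>        /* assumed meaning of I_TIME */
#endif
#ifdef I_SYS_TIME
#include <sys/time.h>    /* assumed meaning of I_SYS_TIME */
#endif

/* Reports how many of the two macros were defined at compile time. */
int headers_selected(void)
{
    int n = 0;
#ifdef I_TIME
    n += 1;
#endif
#ifdef I_SYS_TIME
    n += 1;
#endif
    return n;
}
```

Compiled without either flag, both #ifdef blocks are skipped and the
function returns 0.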
Questions?
More details can be found in the compiler documentation. SPEC-specific
questions should be sent to the SPEC OSG representative
Reinhold Weicker, reinhold.weicker@pdb.siemens.de