Flag description file for Sun compiled SPECint2000 and SPECfp2000 binaries
using the following compilers:

    PGI Fortran 5.2
    PathScale EKO Compiler Suite, Release 1.1
    Red Hat gcc 3.5 ssa (from Red Hat Enterprise Linux WS 3 (AMD64))
    SuSE optional gcc 3.3 (from SLES8 SP3)

----------------------------------------------------------------------------
PGI (Portland Group International) compiler 5.2 flags
----------------------------------------------------------------------------

+ACML
    Linking with AMD Core Math Library. Supplied with the PGI compiler 5.2.

RM_SOURCES=lapak.f90
    Remove the source file 'lapak.f90' in 178.galgel.

The optimization levels and their meanings are as follows:

-O0
    A basic block is generated for each Fortran statement. No scheduling is done between statements. No global optimizations are performed.

-O1
    Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimizations are performed.

-O2
    All level 1 optimizations are performed. In addition, scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer.

-O3
    This level performs all level-one and level-two optimizations and enables more aggressive hoisting and scalar replacement optimizations.

-fast
    Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre"

-fastsse
    Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz"

-Mcache_align
    Align unconstrained objects of length greater than or equal to 16 bytes on cache-line boundaries. An unconstrained object is a data object that is not a member of an aggregate structure or common block. This option does not affect the alignment of allocatable or automatic arrays.
    Note: To effect cache-line alignment of stack-based local variables, the main program or function must be compiled with -Mcache_align.

-Mfixed
    Process source using Fortran fixed-form specifications.

-Mflushz
    Set SSE MXCSR register to flush-to-zero mode.
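
As an illustration of what flush-to-zero mode changes, the following C sketch produces a denormal result under default IEEE 754 arithmetic. It is not taken from the PGI documentation, and the function names are hypothetical. With flush-to-zero mode set in MXCSR, an SSE operation that would produce such a result returns zero instead.

```c
#include <float.h>

/* Hypothetical helper: halve the smallest normal double.
 * Under default IEEE 754 arithmetic the result is a nonzero denormal;
 * with flush-to-zero mode active, SSE arithmetic would return 0.0. */
double half_of_smallest_normal(void)
{
    volatile double x = DBL_MIN;   /* volatile keeps the division at run time */
    return x / 2.0;
}

/* Returns 1 if v is a nonzero value smaller in magnitude than DBL_MIN. */
int is_denormal(double v)
{
    double mag = (v < 0.0) ? -v : v;
    return v != 0.0 && mag < DBL_MIN;
}
```

Compiled without any flush-to-zero option, is_denormal(half_of_smallest_normal()) is expected to be true. The result under -Mflushz depends on the compiler using SSE arithmetic for the division.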

-Mipa[=option[,option,...]]
    Enable and specify options for InterProcedural Analysis (IPA). This also sets the optimization level to a minimum of 2; see -O. If no option list is specified, then it is equivalent to -Mipa=const. The options are:

    const (default)
    noconst
        Enable (disable) propagation of constants across procedure calls.

    inline:n
        Determine additional functions to inline, allowing up to n levels of inlining.

    fast
        Chooses generally optimal -Mipa flags for the target platform; use pgf90 -Mipa -help to see the equivalent options. fast is equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,localarg,ptr

    align
    noalign
        Enable (disable) recognition when pointer targets are all cache-line aligned, allowing better SSE code generation.

    arg
    noarg
        Remove (don't remove) arguments replaced by -Mipa=ptr,const. -Mipa=noarg implies -Mipa=nolocalarg.

    const (default)
    noconst
        Enable (disable) propagation of constants across procedure calls.

    f90ptr
    nof90ptr
        Enable (disable) Fortran 90 pointer disambiguation across procedure calls.

    globals
    noglobals
        Analyze (don't analyze) which globals are modified by procedure calls.

    localarg
    nolocalarg
        Enable (disable) feature to externalize local variables to allow arguments to be replaced by -Mipa=ptr. -Mipa=localarg implies -Mipa=arg.

    ptr
    noptr
        Enable (disable) pointer disambiguation across procedure calls.

    shape
    noshape
        Perform (don't perform) Fortran 90 shape propagation.

-mp
    Enable OpenMP.

-Mnoframe
    Eliminate operations that set up a true stack frame pointer for functions.

-Mnosmart
    Don't run the Smart assembly re-write tool to enable post-compilation linear assembly scheduling and optimization.

-Mscalarsse
    Utilize the SSE (Streaming SIMD (Single Instruction Multiple Data) Extensions) and SSE2 instructions to perform the operations coded. This assumes the user has an assembler capable of interpreting SSE/SSE2 instructions, as in later versions of Linux. This implies -Mflushz.

-Munroll
    Invokes the loop unroller.
    This also sets the optimization level to 2 if the level is set to less than 2.

    c:m
        Instructs the compiler to completely unroll loops with a constant loop count less than or equal to m, a supplied constant. If this value is not supplied, the m count is set to 4.

    n:u
        Instructs the compiler to unroll, u times, a loop that is not completely unrolled or that has a non-constant loop count. If u is not supplied, the unroller computes the number of times a candidate loop is unrolled.

-Mvect=sse
    Instructs the vectorizer to search for loops and, where possible, use the SSE or SSE2 and prefetch instructions (depending on which processor is targeted).

-Mlre[=assoc|noassoc]
-Mnolre
    Enable (disable) loop-carried redundancy elimination. The assoc option allows expression reassociation, and the noassoc option disallows expression reassociation.

----------------------------------------------------------------------------
PathScale EKO Compiler Suite, Release 1.1
----------------------------------------------------------------------------

PathScale compiler flag disclosure in this file is an excerpt from "PathScale EKO Compiler Suite (Fortran, C and C++ compilers) flag descriptions, for SPEC CPU2000 submissions." Copyright 2003, 2004 PathScale, Inc. All Rights Reserved.

PathScale EKO Compiler Suite (Fortran, C and C++ compilers) flag descriptions, for SPEC CPU2000 submissions.

Portability Flags:

-DSPEC_CPU2000_LP64
    Compile using LP64 programming model.

-DLINUX_i386
    Linux Intel system, use "long long" as 64bit variable.

-DHAS_ERRLIST
    Prog env provides specification for "sys_errlist[]".

-DSPEC_CPU2000_NEED_BOOL
    Use SPEC provided definition of the boolean type.

-DSPEC_CPU2000_LINUX_I386
    Compile for an I386 system running Linux.

-DSPEC_CPU2000_GLIBC22
    Compatibility with 2.2 & later versions of glibc.

-DSYS_IS_USG
    Specifies that the operating system is USG compliant.

-DSYS_HAS_TIME_PROTO
    Do not explicitly declare time().

-DSYS_HAS_SIGNAL_PROTO
    Do not explicitly #include

-DSYS_HAS_IOCTL_PROTO
    Do not explicitly declare ioctl().

-DSYS_HAS_ANSI
    System is ANSI compliant.

-DSYS_HAS_CALLOC_PROTO
    Do not explicitly declare calloc().

-fixedform
    Tells the f90 compiler to use fixed format (F77 72 column format) instead of F90 free format.

-DWANT_STDC_PROTO
    Use function prototypes as in standard C.

Optimization Flags:

Some suboptions either enable or disable the feature. To enable a feature, either specify only the suboption name or specify =1, =ON, or =TRUE. To disable a feature, add =0, =OFF, or =FALSE. These values are insensitive to case: 'on' & 'ON' mean the same thing. Below, ON & OFF indicate the enabling or disabling of a feature.

-CG[:...]
    Code Generation option group: control the optimizations and transformations of the instruction-level code generator.

-CG:cflow=(ON|OFF)
    A value of OFF disables control flow optimization in the code generation. Default is ON.

-CG:gcm=(ON|OFF)
    Specifying OFF disables the instruction-level global code motion optimization phase. The default is ON.

-CG:local_fwd_sched=(ON|OFF)
    Changes the instruction scheduling algorithm to work forward instead of backward for the instructions in each basic block. The default is OFF.

-CG:p2align_freq=n
    Aligns branch targets based on execution frequency. This option is meaningful only under feedback-directed compilation. The default value n=0 turns off the alignment optimization. Any other value specifies the frequency threshold at or above which this alignment will be performed by the compiler.

-CG:prefetch=(ON|OFF)
    Specifying OFF suppresses any generation of prefetch instructions in the code generator. This has the same effect as -LNO:prefetch=0. The default is ON.

-fb_create
    Used to specify that an instrumented executable program is to be generated. Such an executable is suitable for producing feedback data files with the specified prefix for use in feedback-directed compilation (FDO).
    The commonly used prefix is "fbdata". This is OFF by default.

-fb_opt
    Used to specify feedback-directed compilation (FDO) by extracting feedback data from files with the specified prefix, which were previously generated using -fb_create. The commonly used prefix is "fbdata". This optimization is off by default.

-fno-math-errno
    Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility. This is implied by -Ofast. The default is -fmath-errno.

-INLINE:aggressive=(ON|OFF)
    Tells the compiler to be more aggressive about inlining. The default is -INLINE:aggressive=OFF.

-IPA[:...]
    IPA option group: control the inter-procedural analyses and transformations performed. Note that giving just the group name without any options, i.e., -IPA, will invoke the interprocedural analyzer. -IPA is off by default unless -Ofast is specified.

-ipa
    Same as -IPA alone.

-IPA:callee_limit=(n)
    Functions whose size exceeds this limit will never be automatically inlined by the compiler. The default is n=2000.

-IPA:ctype=(ON|OFF)
    Turns on optimizations that speed up interfaces to the constructs defined in ctype.h by assuming that the program will not be run in a multi-threaded environment. The default is OFF.

-IPA:linear=(ON|OFF)
    Sets linearization of array references. The setting can be ON or OFF. When inlining Fortran subroutines, IPA tries to map formal array parameters to the shape of the actual parameter. It may not always be able to map it. In the case that it cannot map the parameter, it linearizes the array reference. By default, it will not inline such callsites because they may cause performance problems. The default is OFF.
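
To show what a "linearized" array reference looks like in general, the following C sketch computes the one-dimensional offset of a two-dimensional, Fortran-style (column-major, 1-based) element. It is only an illustration of the term. The function names are hypothetical, and nothing here is claimed to match what the PathScale inliner generates.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a column-major array whose leading
 * dimension (number of rows) is ld. Indices are 1-based, as in Fortran. */
size_t linear_index(size_t i, size_t j, size_t ld)
{
    assert(i >= 1 && j >= 1 && i <= ld);
    return (i - 1) + (j - 1) * ld;
}

/* A 2-D reference A(i, j) rewritten as a 1-D reference into the
 * underlying storage. */
double get_linearized(const double *a, size_t ld, size_t i, size_t j)
{
    return a[linear_index(i, j, ld)];
}
```

For a 3-by-2 array stored column by column, element (2, 2) sits at offset 4. A linearized reference carries no shape information, which may be why later loop optimizations can do less with it.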

-IPA:min_hotness=(n)
    When feedback information is available, a call site to a procedure must be invoked with a frequency that exceeds the threshold specified by n before the procedure will be inlined at that call site. The default is n=10.

-IPA:plimit=(n)
    Inline calls to a procedure until the procedure has grown to a size of n. The default is 2500.

-IPA:space=(n)
    Inline until a program expansion of n% is reached. This defaults to 100.

-LNO:
    Option group that specifies options and transformations performed on loop nests. The -LNO: option group is enabled only if the -O3 option is also specified on the compiler command line.

-LNO:blocking[=(ON|OFF)]
    Enable/disable the cache blocking transformation. The default is on at -O3 or higher.

-LNO:fusion=n
    Perform loop fusion, n:
    0 - off
    1 - conservative
    2 - aggressive
    The default is 1.

-LNO:interchange[=(ON|OFF)]
    Specifying OFF disables the loop interchange transformation in the loop nest optimizer. Default is ON.

-LNO:opt=n
    This option controls the LNO optimization level. n can be one of the following:
    0 = Disables nearly all loop nest optimizations
    1 = Performs full loop nest transformations (default)

-LNO:ou_prod_max=n
    Indicates that the product of unrolling of the various outer loops in a given loop nest is not to exceed n, where n is a positive integer. The default is 16.

-LNO:prefetch[=(0|1|2|3)]
    Specify level of prefetching.
    0 = Prefetch disabled.
    1 = Prefetch is done only for arrays that are always referenced in each iteration of a loop, the default.
    2 = Prefetch is done without the above restrictions.
    3 = Most aggressive.

-LNO:prefetch_ahead=n
    Prefetch n cache line(s) ahead. The default is 2.

-LNO:sclrze=(ON|OFF)
    Turns on/off the optimization that replaces an array by a scalar variable. The default is ON.

-m32
    Generates code according to the 32-bit ABI, also known as x86 or IA32.

-O or -O2
    Turn on extensive optimization.
    The optimizations at this level are generally conservative, in the sense that they (1) are virtually always beneficial, (2) provide improvements commensurate to the compile time spent to achieve them, and (3) avoid changes which affect such things as floating point accuracy.

-O3
    Turn on aggressive optimization. The optimizations at this level are distinguished from -O2 by their aggressiveness, generally seeking highest-quality generated code even if it requires extensive compile time. They may include optimizations which are generally beneficial but occasionally hurt performance. This includes but is not limited to turning on the Loop Nest Optimizer, -LNO:opt=1, and setting -OPT:ro=1:IEEE_arith=2:Olimit=9000:reorg_common=ON.

-Ofast
    Equivalent to "-O3 -ipa -OPT:Ofast -fno-math-errno". -OPT:Ofast is described below.

-OPT:alias=
    Specifies the pointer aliasing model to be used. By specifying one or more of the following models, the compiler is able to make assumptions throughout the compilation:

    typed
        Assume that the code adheres to the ANSI/ISO C standard which states that two pointers of different types cannot point to the same location in memory. This is on by default when -Ofast is specified.

    restrict
        Specifies that distinct pointers are assumed to point to distinct, non-overlapping objects. This is off by default.

    disjoint
        Specifies that any two pointer expressions are assumed to point to distinct, non-overlapping objects. This is off by default.

-OPT:div_split=(ON|OFF)
    Enable/disable changing x/y into x*(recip(y)). This is OFF by default.

-OPT:goto=(OFF|ON)
    Disable/enable the conversion of GOTOs into higher level structures like FOR loops. The default is ON for -O2 or higher.

-OPT:IEEE_arithmetic,IEEE_arith=(n)
    Specifies the level of conformance to IEEE 754 floating-point roundoff/overflow behavior. n can be one of the following:

    1.
        Adheres to IEEE accuracy. This is the default when optimization levels -O0, -O1 and -O2 are in effect.

    2.
        May produce inexact results not conforming to IEEE 754. This is the default when -O3 is in effect.

    3.
        All mathematically valid transformations are allowed.

-OPT:Ofast
    Use optimizations selected to maximize performance. Although the optimizations are generally safe, they may affect floating point accuracy due to rearrangement of computations. This effectively turns on the following optimizations:
    -OPT:ro=2:Olimit=0:div_split=ON:alias=typed -TARG:msse2=on

-OPT:Olimit=(n)
    Disable optimization when size of program unit is > n. When n is 0, program unit size is ignored and optimization process will not be disabled due to compile time limit. The default is 0 when -Ofast is specified, otherwise the default is 6000 under -O2 and 9000 under -O3.

-OPT:roundoff,ro=(n)
    Specifies the level of acceptable departure from source language floating-point, round-off, and overflow semantics. n can be one of the following:

    0
        Inhibits optimizations that might affect the floating-point behavior. This is the default when optimization levels -O0, -O1, and -O2 are in effect.

    1
        Allows simple transformations that might cause limited round-off or overflow differences. Compounding such transformations could have more extensive effects. This is the default level when -O3 is in effect.

    2
        Allows more extensive transformations, such as the reordering of reduction loops. This is the default level when -Ofast is specified.

    3
        Enables any mathematically valid transformation.

-OPT:unroll_analysis=(ON|OFF)
    The default value of ON lets the compiler analyze the content of the loop to determine the best unrolling parameters, instead of strictly adhering to the -OPT:unroll_times_max and -OPT:unroll_size parameters.

-OPT:unroll_times_max,unroll_times=(n)
    Unroll inner loops by a maximum of n. The default is 4.

-OPT:unroll_size=(n)
    Sets the ceiling of maximum number of instructions for an unrolled inner loop. If n = 0, the ceiling is disregarded.
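
The interaction between inner-loop unrolling and the "reordering of reduction loops" allowed at -OPT:roundoff=2 can be sketched by hand in C. The code below is an illustration written for this description, not compiler output, and the function names are hypothetical. It unrolls a sum by 4 with separate partial sums, which changes the order of the floating-point additions.

```c
#include <assert.h>
#include <stddef.h>

/* Reference reduction: one accumulator, strict left-to-right order. */
double sum_in_order(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same reduction unrolled by 4 with four partial sums. The additions
 * are performed in a different order, so the rounded result may differ. */
double sum_unrolled4(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* remainder iterations */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

For small exactly representable values both functions agree. For an input such as {1e20, 1.0, -1e20, 1.0} the in-order sum is 1.0 while the unrolled version yields 0.0, because each 1.0 is absorbed when added to a value of magnitude 1e20. Differences of this kind are what the roundoff levels permit or prohibit.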

-static
    Suppresses dynamic linking at run-time for shared libraries; uses static linking instead.

-TENV:X=(0|1|2|3|4)
    Specify the level of enabled exceptions that will be assumed for purposes of performing speculative code motion (default is 1 at all optimization levels). In general, an instruction will not be speculated (i.e. moved above a branch by the optimizer) unless any exceptions it might cause are disabled by this option.
    At level 0, no speculative code motion may be performed.
    At level 1, safe speculative code motion may be performed, with IEEE-754 underflow and inexact exceptions disabled.
    At level 2, all IEEE-754 exceptions are disabled except divide by zero.
    At level 3, all IEEE-754 exceptions are disabled including divide by zero.
    At level 4, memory exceptions may be disabled or ignored.

-TENV:frame_pointer=(ON|OFF)
    Default is ON for C++ and OFF otherwise. Local variables in the function stack frame are addressed via the frame pointer register. Ordinarily, the compiler will replace this use of frame pointer by addressing local variables via the stack pointer when it determines that the stack pointer is fixed throughout the function invocation. This frees up the frame pointer for other purposes. Turning this flag on forces the compiler to use the frame pointer to address local variables. This flag defaults to on for C++ because the exception handling mechanism relies on the frame pointer register being used to address local variables. This flag can be turned off for C++ for programs that do not throw exceptions.

-Wl,-x
    Passes the -x option to the linker. With this flag set, the linker will not preserve local (non-global) symbols in the output symbol table. The linker enters external and static symbols only. This option conserves space in the output file. This is OFF by default.

-WOPT:mem_opnds=(ON|OFF)
    ON makes the scalar optimizer preserve any memory operands of arithmetic operations so as to promote subsumption of memory loads into the operands of arithmetic operations. The default is OFF.

-WOPT:retype_expr=(ON|OFF)
    ON enables the optimization in the compiler that converts 64-bit address computation to use 32-bit arithmetic as much as possible. The default is OFF.

-WOPT:val=(0|1|2)
    Controls the number of times the value-numbering optimization is performed in the global optimizer, with the default being 1. This optimization tries to recognize expressions that will compute identical run-time values and changes the program to avoid re-computing them.

----------------------------------------------------------------------------
Flags for
    gcc 3.5 ssa (from Red Hat Enterprise Linux WS 3 (AMD64))
    SuSE optional gcc 3.3 (from SLES8 SP3)
----------------------------------------------------------------------------

-O0
    Do not optimize. This is the default.

-O
-O1
    Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With `-O', the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time. `-O' turns on the following optimization flags:
        -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize
        -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch
        -fguess-branch-probability -fcprop-registers
    `-O' also turns on `-fomit-frame-pointer' on machines where doing so does not interfere with debugging.

-O2
    Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. The compiler does not perform loop unrolling or function inlining when you specify `-O2'. As compared to `-O', this option increases both compilation time and the performance of the generated code. `-O2' turns on all optimization flags specified by `-O'.
    It also turns on the following optimization flags:
        -fforce-mem -foptimize-sibling-calls -fstrength-reduce
        -fcse-follow-jumps -fcse-skip-blocks -frerun-cse-after-loop
        -frerun-loop-opt -fgcse -fgcse-lm -fgcse-sm
        -fdelete-null-pointer-checks -fexpensive-optimizations -fregmove
        -fschedule-insns -fschedule-insns2 -fsched-interblock -fsched-spec
        -fcaller-saves -fpeephole2 -freorder-blocks -freorder-functions
        -fstrict-aliasing -falign-functions -falign-jumps -falign-loops
        -falign-labels
    Please note the warning under `-fgcse' about invoking `-O2' on programs that use computed gotos.

-O3
    Optimize yet more. `-O3' turns on all optimizations specified by `-O2' and also turns on the `-finline-functions', `-fweb', `-funit-at-time', `-ftracer', `-funswitch-loops' and `-frename-registers' options.

-funroll-all-loops
    Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. `-funroll-all-loops' implies the same options as `-funroll-loops'.

-fprofile-arcs/-fbranch-probabilities
-fprofile-arcs
    Instrument "arcs" during compilation to generate coverage data or for profile-directed block ordering. During execution the program records how many times each branch is executed and how many times it is taken. When the compiled program exits it saves this data to a file called `AUXNAME.da' for each source file. AUXNAME is generated from the name of the output file, if explicitly specified and it is not the final executable, otherwise it is the basename of the source file. In both cases any suffix is removed (e.g. `foo.da' for input file `dir/foo.c', or `dir/foo.da' for output file specified as `-o dir/foo.o').
    For profile-directed block ordering, compile the program with `-fprofile-arcs' plus optimization and code generation options, generate the arc profile information by running the program on a selected workload, and then compile the program again with the same optimization and code generation options plus `-fbranch-probabilities'.
    With `-fprofile-arcs', for each function of your program GCC creates a program flow graph, then finds a spanning tree for the graph. Only arcs that are not on the spanning tree have to be instrumented: the compiler adds code to count the number of times that these arcs are executed. When an arc is the only exit or only entrance to a block, the instrumentation code can be added to the block; otherwise, a new basic block must be created to hold the instrumentation code.

-ffast-math
    Sets `-fno-math-errno', `-funsafe-math-optimizations', `-fno-trapping-math', `-ffinite-math-only' and `-fno-signaling-nans'. This option causes the preprocessor macro `__FAST_MATH__' to be defined. This option should never be turned on by any `-O' option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

-fno-math-errno
    Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility. This option should never be turned on by any `-O' option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. The default is `-fmath-errno'.

-funsafe-math-optimizations
    Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards.
    When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. This option should never be turned on by any `-O' option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. The default is `-fno-unsafe-math-optimizations'.

-ffinite-math-only
    Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs. This option should never be turned on by any `-O' option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications. The default is `-fno-finite-math-only'.

-fno-trapping-math
    Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option implies `-fno-signaling-nans'. Setting this option may allow faster code if one relies on "non-stop" IEEE arithmetic, for example. This option should never be turned on by any `-O' option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. The default is `-ftrapping-math'.

-fsignaling-nans
    Compile code assuming that IEEE signaling NaNs may generate user-visible traps during floating-point operations. Setting this option disables optimizations that may change the number of exceptions visible with signaling NaNs. This option implies `-ftrapping-math'. This option causes the preprocessor macro `__SUPPORT_SNAN__' to be defined. The default is `-fno-signaling-nans'. This option is experimental and does not currently guarantee to disable all GCC optimizations that affect signaling NaN behavior.

-m32
-m64
    These `-m' switches are supported in addition to the above on AMD x86-64 processors in 64-bit environments.
    Generate code for a 32-bit or 64-bit environment. The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture.

-msse2
-mno-sse
-mno-sse2
    These switches enable or disable the use of built-in functions that allow direct access to the MMX, SSE and 3Dnow extensions of the instruction set.

-fno-defer-pop
    Always pop the arguments to each function call as soon as that function returns. For machines which must pop arguments after a function call, the compiler normally lets arguments accumulate on the stack for several function calls and pops them all at once. Disabled at levels `-O', `-O2', `-O3', `-Os'.

-fmerge-constants
    Attempt to merge identical constants (string constants and floating point constants) across compilation units. This option is the default for optimized compilation if the assembler and linker support it. Use `-fno-merge-constants' to inhibit this behavior. Enabled at levels `-O', `-O2', `-O3', `-Os'.

-fthread-jumps
    Perform optimizations where we check to see if a jump branches to a location where another comparison subsumed by the first is found. If so, the first branch is redirected to either the destination of the second branch or a point immediately following it, depending on whether the condition is known to be true or false. Enabled at levels `-O', `-O2', `-O3', `-Os'.

-floop-optimize
    Perform loop optimizations: move constant expressions out of loops, simplify exit test conditions and optionally do strength-reduction and loop unrolling as well. Enabled at levels `-O', `-O2', `-O3', `-Os'.

-fcrossjumping
    Perform cross-jumping transformation. This transformation unifies equivalent code and saves code size. The resulting code may or may not perform better than without cross-jumping. Enabled at levels `-O', `-O2', `-O3', `-Os'.
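
A source-level analogy may help picture what "unifies equivalent code" means for cross-jumping. The C sketch below is an illustration only: GCC performs the transformation on its internal representation rather than on source code, and the function names are hypothetical. The first function repeats the same tail computation in both branches; the second shares a single copy of it.

```c
/* Both branches end with the same computation (the duplicated tail). */
int tail_duplicated(int flag, int x)
{
    int r;
    if (flag) {
        x += 1;
        r = x * 3 + 7;
    } else {
        x -= 1;
        r = x * 3 + 7;
    }
    return r;
}

/* Equivalent function with the common tail merged into one copy,
 * which is the effect cross-jumping aims for in the generated code. */
int tail_merged(int flag, int x)
{
    if (flag)
        x += 1;
    else
        x -= 1;
    return x * 3 + 7;
}
```

Both functions return the same value for every input; only the amount of code differs, which matches the statement that the transformation saves code size without necessarily improving speed.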

-fif-conversion
    Attempt to transform conditional jumps into branch-less equivalents. This includes the use of conditional moves, min, max, set flags and abs instructions, and some tricks doable by standard arithmetic. The use of conditional execution on chips where it is available is controlled by `if-conversion2'. Enabled at levels `-O', `-O2', `-O3', `-Os'.

-fif-conversion2
    Use conditional execution (where available) to transform conditional jumps into branch-less equivalents. Enabled at levels `-O', `-O2', `-O3', `-Os'.

-fdelayed-branch
    If supported for the target machine, attempt to reorder instructions to exploit instruction slots available after delayed branch instructions. Enabled at levels `-O', `-O2', `-O3', `-Os'.

-fguess-branch-probability
-fno-guess-branch-probability
    `-fno-guess-branch-probability' tells the compiler not to guess branch probabilities using a randomized model. Sometimes gcc will opt to use a randomized model to guess branch probabilities, when none are available from either profiling feedback (`-fprofile-arcs') or `__builtin_expect'. This means that different runs of the compiler on the same program may produce different object code. In a hard real-time system, people don't want different runs of the compiler to produce code that has different behavior; minimizing non-determinism is of paramount import. `-fno-guess-branch-probability' allows users to reduce non-determinism, possibly at the expense of inferior optimization. The default is `-fguess-branch-probability' at levels `-O', `-O2', `-O3', `-Os'.

-fcprop-registers
    After register allocation and post-register allocation instruction splitting, we perform a copy-propagation pass to try to reduce scheduling dependencies and occasionally eliminate the copy. Enabled at levels `-O', `-O2', `-O3', `-Os'.

-fforce-mem
    Force memory operands to be copied into registers before doing arithmetic on them. This produces better code by making all memory references potential common subexpressions.
    When they are not common subexpressions, instruction combination should eliminate the separate register-load. Enabled at levels `-O2', `-O3', `-Os'.

-foptimize-sibling-calls
    Optimize sibling and tail recursive calls. Enabled at levels `-O2', `-O3', `-Os'.

-fstrength-reduce
    Perform the optimizations of loop strength reduction and elimination of iteration variables. Enabled at levels `-O2', `-O3', `-Os'.

-fcse-follow-jumps
    In common subexpression elimination, scan through jump instructions when the target of the jump is not reached by any other path. For example, when CSE encounters an `if' statement with an `else' clause, CSE will follow the jump when the condition tested is false. Enabled at levels `-O2', `-O3', `-Os'.

-fcse-skip-blocks
    This is similar to `-fcse-follow-jumps', but causes CSE to follow jumps which conditionally skip over blocks. When CSE encounters a simple `if' statement with no else clause, `-fcse-skip-blocks' causes CSE to follow the jump around the body of the `if'. Enabled at levels `-O2', `-O3', `-Os'.

-frerun-cse-after-loop
    Re-run common subexpression elimination after loop optimizations have been performed. Enabled at levels `-O2', `-O3', `-Os'.

-frerun-loop-opt
    Run the loop optimizer twice. Enabled at levels `-O2', `-O3', `-Os'.

-fgcse
    Perform a global common subexpression elimination pass. This pass also performs global constant and copy propagation.
    Note: When compiling a program using computed gotos, a GCC extension, you may get better runtime performance if you disable the global common subexpression elimination pass by adding `-fno-gcse' to the command line.
    Enabled at levels `-O2', `-O3', `-Os'.

-fgcse-lm
    When `-fgcse-lm' is enabled, global common subexpression elimination will attempt to move loads which are only killed by stores into themselves. This allows a loop containing a load/store sequence to be changed to a load outside the loop, and a copy/store within the loop. Enabled by default when gcse is enabled.
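
The load-motion rewrite described for `-fgcse-lm' can be sketched at the source level. The C code below is a hand-written illustration, not GCC output, and the function names are hypothetical. It assumes that p does not point into the array a, since otherwise the two versions would not be equivalent.

```c
#include <assert.h>

/* Before: every iteration loads *p, adds to it, and stores it back. */
void accumulate_before(int *p, const int *a, int n)
{
    assert(p != 0);
    for (int i = 0; i < n; i++)
        *p = *p + a[i];
}

/* After load motion: the load is done once before the loop; inside the
 * loop only a register copy is updated and the store remains.
 * Assumes p does not alias any element of a. */
void accumulate_after(int *p, const int *a, int n)
{
    assert(p != 0);
    int t = *p;                 /* load outside the loop */
    for (int i = 0; i < n; i++) {
        t = t + a[i];           /* work on the copy */
        *p = t;                 /* store stays within the loop */
    }
}
```

A store-motion pass, when one is available, could in principle go one step further and sink the remaining store below the loop, leaving only register arithmetic inside it.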

-fgcse-sm
    When `-fgcse-sm' is enabled, a store motion pass is run after global common subexpression elimination. This pass will attempt to move stores out of loops. When used in conjunction with `-fgcse-lm', loops containing a load/store sequence can be changed to a load before the loop and a store after the loop. Enabled by default when gcse is enabled.

-fdelete-null-pointer-checks
    Use global dataflow analysis to identify and eliminate useless checks for null pointers. The compiler assumes that dereferencing a null pointer would have halted the program. If a pointer is checked after it has already been dereferenced, it cannot be null. In some environments, this assumption is not true, and programs can safely dereference null pointers. Use `-fno-delete-null-pointer-checks' to disable this optimization for programs which depend on that behavior. Enabled at levels `-O2', `-O3', `-Os'.

-fexpensive-optimizations
    Perform a number of minor optimizations that are relatively expensive. Enabled at levels `-O2', `-O3', `-Os'.

-fregmove
    Attempt to reassign register numbers in move instructions and as operands of other simple instructions in order to maximize the amount of register tying. This is especially helpful on machines with two-operand instructions. Note `-fregmove' and `-foptimize-register-move' are the same optimization. Enabled at levels `-O2', `-O3', `-Os'.

-fschedule-insns
    If supported for the target machine, attempt to reorder instructions to eliminate execution stalls due to required data being unavailable. This helps machines that have slow floating point or memory load instructions by allowing other instructions to be issued until the result of the load or floating point instruction is required. Enabled at levels `-O2', `-O3', `-Os'.

-fschedule-insns2
    Similar to `-fschedule-insns', but requests an additional pass of instruction scheduling after register allocation has been done.
This is especially useful on machines with a relatively small number of registers and where memory load instructions take more than one cycle. Enabled at levels `-O2', `-O3', `-Os'.

-fsched-interblock -fno-sched-interblock
`-fno-sched-interblock' tells the compiler not to schedule instructions across basic blocks. Interblock scheduling is normally enabled by default when scheduling before register allocation, i.e. with `-fschedule-insns' or at `-O2' or higher.

-fsched-spec -fno-sched-spec
`-fno-sched-spec' disallows speculative motion of non-load instructions. Such speculative motion is normally enabled by default when scheduling before register allocation, i.e. with `-fschedule-insns' or at `-O2' or higher.

-fcaller-saves
Enable values to be allocated in registers that will be clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it seems to result in better code than would otherwise be produced. This option is always enabled by default on certain machines, usually those which have no call-preserved registers to use instead. Enabled at levels `-O2', `-O3', `-Os'.

-fpeephole2 -fno-peephole -fno-peephole2
The `-fno-' forms disable any machine-specific peephole optimizations. The difference between `-fno-peephole' and `-fno-peephole2' is in how they are implemented in the compiler; some targets use one, some use the other, a few use both. `-fpeephole' is enabled by default. `-fpeephole2' is enabled at levels `-O2', `-O3', `-Os'.

-freorder-blocks
Reorder basic blocks in the compiled function in order to reduce the number of taken branches and improve code locality. Enabled at levels `-O2', `-O3', `-Os'.

-freorder-functions
Reorder functions in the object file in order to improve code locality. This is implemented by using special subsections `text.hot' for most frequently executed functions and `text.unlikely' for unlikely executed functions.
Reordering is done by the linker, so the object file format must support named sections and the linker must place them in a reasonable way. Also, profile feedback must be available in order to make this option effective. See `-fprofile-arcs' for details. Enabled at levels `-O2', `-O3', `-Os'.

-fstrict-aliasing
Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an `unsigned int' can alias an `int', but not a `void*' or a `double'. A character type may alias any other type. Pay special attention to code like this:

    union a_union {
        int i;
        double d;
    };

    int f() {
        union a_union t;
        t.d = 3.0;
        return t.i;
    }

The practice of reading from a different union member than the one most recently written to (called "type-punning") is common. Even with `-fstrict-aliasing', type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. However, this code might not:

    int f() {
        union a_union t;
        int* ip;
        t.d = 3.0;
        ip = &t.i;
        return *ip;
    }

Every language that wishes to perform language-specific alias analysis should define a function that computes, given a `tree' node, an alias set for the node. Nodes in different alias sets are not allowed to alias. For an example, see the C front-end function `c_get_alias_set'. Enabled at levels `-O2', `-O3', `-Os'.

-falign-functions -falign-functions=N
Align the start of functions to the next power-of-two greater than N, skipping up to N bytes. For instance, `-falign-functions=32' aligns functions to the next 32-byte boundary, but `-falign-functions=24' would align to the next 32-byte boundary only if this can be done by skipping 23 bytes or less.
`-fno-align-functions' and `-falign-functions=1' are equivalent and mean that functions will not be aligned. Some assemblers only support this flag when N is a power of two; in that case, it is rounded up. If N is not specified, use a machine-dependent default. Enabled at levels `-O2', `-O3'.

-falign-jumps -falign-jumps=N
Align branch targets to a power-of-two boundary, for branch targets where the targets can only be reached by jumping, skipping up to N bytes like `-falign-functions'. In this case, no dummy operations need be executed. If N is not specified, use a machine-dependent default. Enabled at levels `-O2', `-O3'.

-falign-loops -falign-loops=N
Align loops to a power-of-two boundary, skipping up to N bytes like `-falign-functions'. The hope is that the loop will be executed many times, which will make up for any execution of the dummy operations. If N is not specified, use a machine-dependent default. Enabled at levels `-O2', `-O3'.

-falign-labels -falign-labels=N
Align all branch targets to a power-of-two boundary, skipping up to N bytes like `-falign-functions'. This option can easily make code slower, because it must insert dummy operations for when the branch target is reached in the usual flow of the code. If `-falign-loops' or `-falign-jumps' are applicable and are greater than this value, then their values are used instead. If N is not specified, use a machine-dependent default which is very likely to be `1', meaning no alignment. Enabled at levels `-O2', `-O3'.

-finline-limit=N
By default, gcc limits the size of functions that can be inlined. This flag allows the control of this limit for functions that are explicitly marked as inline (i.e., marked with the inline keyword or defined within the class definition in C++). N is the size of functions that can be inlined in number of pseudo instructions (not counting parameter handling). The default value of N is 600.
Increasing this value can result in more inlined code at the cost of compilation time and memory consumption. Decreasing it usually makes the compilation faster, and less code will be inlined (which presumably means slower programs). This option is particularly useful for programs that use inlining heavily, such as those based on recursive templates in C++. Inlining is actually controlled by a number of parameters, which may be specified individually by using `--param NAME=VALUE'. The `-finline-limit=N' option sets some of these parameters as follows:

    max-inline-insns is set to N.
    max-inline-insns-single is set to N/2.
    max-inline-insns-single-auto is set to N/2.
    min-inline-insns is set to 130 or N/4, whichever is smaller.
    max-inline-insns-rtl is set to N.

Using `-finline-limit=600' thus results in the default settings for these parameters. See below for documentation of the individual parameters controlling inlining. _Note:_ A pseudo instruction represents, in this particular context, an abstract measurement of a function's size. It in no way represents a count of assembly instructions, and as such its exact meaning might change from one release to another.

-freduce-all-givs
Forces all general-induction variables in loops to be strength-reduced. _Note:_ When compiling programs written in Fortran, `-fmove-all-movables' and `-freduce-all-givs' are enabled by default when you use the optimizer. These options may generate better or worse code; results are highly dependent on the structure of loops within the source code.

-fprefetch-loop-arrays
If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays. Disabled at level `-Os'.

rm -f *.da *.life analyz_prbrob.out
Remove any profile feedback information from previous runs.
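The parameter settings listed under `-finline-limit=N' above can be worked through numerically. This is an illustrative sketch only: the struct and function names are hypothetical and are not part of gcc, and integer division is assumed for N/2 and N/4.

```c
#include <assert.h>

/* Hypothetical container for the five parameters named in the text. */
struct inline_params {
    int max_inline_insns;
    int max_inline_insns_single;
    int max_inline_insns_single_auto;
    int min_inline_insns;
    int max_inline_insns_rtl;
};

/* Apply the rules quoted for -finline-limit=N (integer division assumed). */
struct inline_params inline_params_for_limit(int n)
{
    struct inline_params p;

    assert(n >= 0);
    p.max_inline_insns = n;
    p.max_inline_insns_single = n / 2;
    p.max_inline_insns_single_auto = n / 2;
    /* "130 or N/4, whichever is smaller" */
    p.min_inline_insns = (n / 4 < 130) ? n / 4 : 130;
    p.max_inline_insns_rtl = n;
    return p;
}
```

For N = 600 these rules give 600, 300, 300, 130 and 600, which is consistent with the statement that `-finline-limit=600' yields the default settings.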
----------------------------------------------------------------------------
Portability flags used with gcc 3.5 ssa compiler
----------------------------------------------------------------------------

-DSPEC_CPU2000_LP64
Used to make longs and pointers 64 bit.

-DHAS_ERRLIST
Used in 252.eon to indicate that the system provides the "sys_nerr" and "sys_errlist[]" variables.