SGI Flags Disclosure

The following are the compiler switches/options used by SGI for the recent SPEC CPU2000 submissions.

Portability Flags:

-DUSG
    Specifies that the operating system is USG compliant.
-Dalloca=__builtin_alloca
    Replaces occurrences of alloca() with __builtin_alloca.
-DMIPS
    Specifies that this is a MIPS microprocessor.
-DHOST_WORDS_BIG_ENDIAN
    Specifies that this is a big-endian host.
-DSGI
    Compile for an SGI system.
-DSPEC_CPU2000_SGI
    Compile for an SGI system.
-DI_FCNTL
    Tells program to include .
-DSYS_IS_USG
    Specifies that the operating system is USG compliant.
-DSYS_HAS_TIME_PROTO
    Do not explicitly declare time().
-DSYS_HAS_SIGNAL_PROTO
    Do not explicitly #include
-DSYS_HAS_IOCTL_PROTO
    Do not explicitly declare ioctl().
-DSYS_HAS_ANSI
    System is ANSI compliant.
-DSYS_HAS_CALLOC_PROTO
    Do not explicitly declare calloc().
-DHAVE_SIGNED_CHAR
    System supports a "signed char" type.
-fixedform
    Tells the f90 compiler to use fixed format (the F77 72-column format) instead of F90 free format.

Optimization Flags:

-bigp_off
    Disables the use of large pages within your program. This is the default for all optimization levels except -Ofast.
-fb_create
    Specifies that an instrumented executable program is to be generated. Such an executable is suitable for producing one or more .Counts files for feedback compilation.
-fb_opt
    Specifies the path to the instrumented executable program previously generated using -fb_create. Using the path, the compiler finds the .Counts file that should be used to guide feedback compilation. The specified instrumented binary, along with the .Counts file it produced, is used to generate a compiler feedback file, which is then used to direct optimization of the program.
-CG:ld_latency=
    Specifies the assumed latency of load instructions, in processor cycles, used to determine optimal instruction scheduling.
-INLINE:aggressive=on
    Tells the compiler to be more aggressive about inlining.
-IPA[:...]
    IPA option group: controls the interprocedural analyses and transformations performed. Note that giving just the group name without any options, i.e. -IPA, invokes IPA with the default settings.
-IPA:aggr_cprop[=(on/off)]
    Enable/disable aggressive interprocedural constant propagation. Attempts to avoid passing constant parameters, replacing the corresponding formal parameters with the constant values.
-IPA:callee_limit=(n)
    Functions whose size exceeds this limit will never be automatically inlined by the compiler.
-IPA:clone=on
    Allows IPA to clone procedures while inlining.
-IPA:common_pad_size=n
    Specifies the amount by which to pad common block array dimensions. By default, the compiler automatically chooses the amount of padding to improve cache behavior for common block array accesses.
-IPA:inline[=(on/off)]
    Controls whether the compiler performs inter-file subprogram inlining during main IPA processing.
-IPA:linear=on
    Controls linearization of array references. The setting can be ON or OFF. When inlining Fortran subroutines, IPA tries to map formal array parameters to the shape of the actual parameter. It may not always be able to map them. When it cannot map the parameter, it linearizes the array reference. By default, it will not inline such call sites because they may cause performance problems. The default is OFF.
-IPA:maxdepth=n
    Directs IPA not to attempt to inline functions at a depth of more than n in the call graph, where functions that make no calls are at depth 0, those that call only depth 0 functions are at depth 1, and so on. Inlining remains subject to overriding limits on code expansion. See also forcedepth, space, and plimit.
-IPA:min_hot=(n)
    When feedback information is available, a call site to a procedure must be invoked with a frequency that exceeds the threshold specified by n before the procedure will be inlined at that call site.
-IPA:multi_clone=n
    Specifies the maximum number of clones that can be created from a single procedure.
    By default, this value is 0. Cloning may provide additional opportunities for interprocedural optimization, but it may also significantly increase the code size.
-IPA:node_bloat=n
    When used in conjunction with -IPA:multi_clone, specifies the maximum percentage growth of the total number of procedures relative to the original program.
-IPA:pad=off
    Disables the automatic padding of common block arrays.
-IPA:plimit=(n)
    Inline calls to a procedure until the procedure has grown to a size of n.
-IPA:small_pu=(n)
    A procedure with size smaller than n is not subject to the plimit restriction.
-IPA:space=(n)
    Inline until a program expansion of n% is reached.
-IPA:use_intrinsic[=(ON|OFF)]
    Enable/disable loading the intrinsic version of standard library functions.
-LANG:exceptions=(on/off)
    Enables or disables exception handling constructs in the language. Generally, code with and without exception handling cannot be mixed. Specifically, the scopes crossed between throwing and catching an exception must all have been compiled with exceptions=ON. The default is ON.
-lfastm
    Causes the executable to be linked using libfastm.so, which provides faster, lower-precision versions of various routines from libm.so.
-LNO:ap=(0/1/2)
    Controls automatic parallelization: 0 - no parallelization, 1 - normal parallelization, 2 - parallelize loops regardless of trip count.
-LNO:auto_dist=true
    On Origin systems, use a heuristic to distribute local and global arrays that are accessed in parallel. The heuristic is based on access patterns of the named arrays; access patterns of arrays used as dummy arguments are ignored.
-LNO:blocking[=(on/off)]
    Enable/disable the cache blocking transformation.
-LNO:cs2=(n)
    Specifies the size of the second-level cache (e.g. 4m means 4 megabytes).
-LNO:fission=(n/on/off)
    Perform loop fission, n: 0 - off, 1 - conservative, 2 - aggressive.
-LNO:fusion=(n/on/off)
    Perform loop fusion, n: 0 - off, 1 - conservative, 2 - aggressive.
-LNO:interchange=(on/off)
    Perform loop interchange.
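As an illustration of what the -LNO:fission, -LNO:fusion, and -LNO:interchange options refer to, the following Python sketch writes out the loop shapes by hand. It is not how the compiler implements these transformations; the function names and the sample data are made up for this example, and it only shows that each pair of loop shapes computes the same result.

```python
def fused(a, b):
    """One loop performing both updates (the shape loop fusion produces)."""
    n = len(a)
    out1, out2 = [0] * n, [0] * n
    for i in range(n):
        out1[i] = a[i] + 1
        out2[i] = b[i] * 2
    return out1, out2


def fissioned(a, b):
    """Two separate loops, one per update (the shape loop fission produces)."""
    n = len(a)
    out1, out2 = [0] * n, [0] * n
    for i in range(n):
        out1[i] = a[i] + 1
    for i in range(n):
        out2[i] = b[i] * 2
    return out1, out2


def sum_rows_outer(m):
    """Original nest: rows in the outer loop, columns in the inner loop."""
    total = 0
    for i in range(len(m)):
        for j in range(len(m[0])):
            total += m[i][j]
    return total


def sum_cols_outer(m):
    """Interchanged nest: columns in the outer loop, rows in the inner loop."""
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total


# Hypothetical sample data, integers only so the comparison is exact.
a, b = [1, 2, 3], [4, 5, 6]
m = [[1, 2, 3], [4, 5, 6]]
same_fusion = fused(a, b) == fissioned(a, b)
same_interchange = sum_rows_outer(m) == sum_cols_outer(m)
```

The transformed loops differ only in iteration structure, which is what lets a compiler choose the shape with the better memory access pattern without changing the result.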
-LNO:local_pad_size=n
    Specifies the amount by which to pad local array dimensions. By default, the compiler automatically chooses the amount of padding to improve cache behavior for local array accesses.
-LNO:opt=
    Controls the LNO optimization level. n can be one of the following:
        0  Disables nearly all loop nest optimization.
        1  Performs full loop nest transformations. This is the default.
-LNO:ou_prod_max=n
    Indicates that the product of the unrolling factors of the various outer loops in a given loop nest is not to exceed n, where n is a positive integer. The default is 16.
-LNO:outer_unroll_max,ou_max=(n)
    Indicates that the compiler may unroll outer loops in a loop nest by as many as n per loop, but no more.
-LNO:pf2=(on/off)
    Enable/disable prefetch for the second-level cache.
-LNO:prefetch[=(0|1|2)]
    Specifies the level of prefetching:
        0  Prefetch disabled.
        1  Prefetch enabled but conservative.
        2  Prefetch enabled and aggressive.
-LNO:prefetch_ahead=[n]
    Prefetch n cache line(s) ahead.
-LNO:pwr2[=(on/off)]
    If enabled, when the leading dimension of an array is a power of two, the compiler makes an extra effort to make the inner loop stride one.
-mips4
    Generate code using the full MIPS IV instruction set, which is supported on R10000, R5000 and R8000 systems, and search for mips4 libraries/objects at link time.
-n32
    Use the high-performance 32-bit MIPS ABI.
-O or -O2
    Turn on extensive optimization. The optimizations at this level are generally conservative, in the sense that they (1) are virtually always beneficial, (2) provide improvements commensurate with the compile time spent to achieve them, and (3) avoid changes that affect such things as floating-point accuracy.
-O3
    Turn on aggressive optimization. The optimizations at this level are distinguished from -O2 by their aggressiveness, generally seeking the highest-quality generated code even if it requires extensive compile time. They may include optimizations that are generally beneficial but occasionally hurt performance.
-Ofast[=ipxx]
    Use optimizations selected to maximize performance for the given SGI target platform IPxx. The optimizations may differ for the various platforms, and will always enable the full instruction set of the target platform (e.g. -mips4 for an R10000). Although the optimizations are generally safe, they may affect floating-point accuracy due to rearrangement of computations.
-OPT:alias=
    Specifies the pointer aliasing model to be used. When one or more of the following is specified, the compiler is able to make the corresponding assumptions throughout the compilation:
        restrict  Distinct pointers are assumed to point to distinct, non-overlapping objects.
        disjoint  Any two pointer expressions are assumed to point to distinct, non-overlapping objects.
-OPT:div_split=(true/false)
    Enable/disable changing x/y into x*(recip(y)).
-OPT:fast_bit_intrinsics[=(on/off)]
    Disable/enable the check for the bit count being within range for Fortran bit intrinsics (e.g., BTEST, ISHFT).
-OPT:Olimit=(n)
    Disable optimization when the size of a program unit is greater than n. When n is 0, program unit size is ignored and optimization will not be disabled due to the compile-time limit.
-OPT:goto=(off/on)
    Disable/enable the conversion of GOTOs into higher-level structures like FOR loops.
-OPT:IEEE_arith=(n)
    Specifies the level of conformance to IEEE 754 floating-point roundoff/overflow behavior. At level 3, all mathematically valid transformations are allowed.
-OPT:ro=(n)
    Specifies the level of acceptable deviation from source-order floating-point roundoff and overflow behavior. At level 3, any mathematically valid transformation is enabled.
-OPT:unroll_times_max=(n)
    Unroll inner loops by a maximum of n.
-OPT:unroll_size=(n)
    Sets the ceiling on the number of instructions for an unrolled inner loop. If n = 0, the ceiling is disregarded.
-OPT:unroll_analysis=[on/off]
    Enable/disable unrolling of inner loops by analysing resource and processor specifics.
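To see why transformations such as -OPT:div_split and the higher -OPT:ro levels are described as affecting floating-point accuracy, the following Python sketch mimics them by hand in IEEE 754 double precision. It does not invoke the compiler; the function names are invented for illustration, and the operand values are simply convenient cases where the rewritten expression rounds differently.

```python
def divide(x, y):
    """Source-order division: a single correctly rounded operation."""
    return x / y


def divide_split(x, y):
    """The x*(recip(y)) form: two rounded operations instead of one."""
    recip = 1.0 / y
    return x * recip


# Division versus multiplication by the reciprocal.
direct = divide(49.0, 49.0)
split = divide_split(49.0, 49.0)
split_changes_result = direct != split

# Reassociating a sum: mathematically equal, but rounded differently.
source_order = (0.1 + 0.2) + 0.3
reassociated = 0.1 + (0.2 + 0.3)
reassociation_changes_result = source_order != reassociated
```

Both rewrites are valid over the real numbers, which is the sense of "mathematically valid" used above, yet each pair of results differs in its last bits.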
-pfa
    Turns on the MIPSpro Auto Parallelizing Option, which enables the compiler to automatically discover parallelism in the source code.
-TARG:platform[=ipxx]
    Identify the target SGI platform for compilation, choosing various internal parameters (such as cache sizes) appropriately.
-TENV:X=(0..5)
    Specifies the level of enabled exceptions that will be assumed for purposes of performing speculative code motion (default level 1 at -O0..-O2, 2 at -O3). In general, an instruction will not be speculated (i.e. moved above a branch by the optimizer) unless any exceptions it might cause are disabled by this option.
        0  No speculative code motion may be performed.
        1  Safe speculative code motion may be performed, with IEEE-754 underflow and inexact exceptions disabled.
        2  All IEEE-754 exceptions are disabled except divide by zero.
        3  All IEEE-754 exceptions are disabled, including divide by zero.
        4  Memory exceptions may be disabled or ignored.
-Wl,-x
    Passes the -x option to the linker. With this flag set, the linker does not preserve local (non-global) symbols in the output symbol table. The linker enters external and static symbols only. This option conserves space in the output file.

The following are descriptions of the system tunable parameters used to enable large pages.

systune -i
    Systune is a tool that enables a user to examine and configure tunable kernel parameters. -i puts systune in interactive mode.
percent_totalmem_4m_pages=n
    A system tunable parameter that can be set from within systune. It tells IRIX the maximum percentage of memory that can be allocated to 4MB pages.
nlpages_4m=n
    A system tunable parameter that can be set from within systune. It tells IRIX to statically allocate n 4MB pages at system boot time.
PAGESIZE_DATA
    This environment variable tells IRIX what size pages to give your applications. Legal values are 16, 256, 1024, 4096, and 16384, representing the size in kilobytes of the pages to be used. This only works for applications compiled either with -Ofast or with -bigp_on.
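As a small sketch of how the PAGESIZE_DATA values listed above might be checked before launching a benchmark, the following Python helper builds an environment dictionary. The helper name pagesize_env is hypothetical, and the sketch assumes the variable takes the bare number of kilobytes as its value, as the description suggests.

```python
import os

# Page sizes, in kilobytes, that the description above lists as legal.
LEGAL_PAGE_SIZES_KB = (16, 256, 1024, 4096, 16384)


def pagesize_env(size_kb, base_env=None):
    """Return a copy of base_env with PAGESIZE_DATA set (hypothetical helper).

    Assumes the value is the bare page size in kilobytes.
    """
    if size_kb not in LEGAL_PAGE_SIZES_KB:
        raise ValueError(f"unsupported page size: {size_kb} KB")
    env = dict(os.environ if base_env is None else base_env)
    env["PAGESIZE_DATA"] = str(size_kb)
    return env


# 4096 KB is 4 MB, the page size that nlpages_4m reserves at boot time.
env_4m = pagesize_env(4096, base_env={})
```

The resulting dictionary could be passed as the env argument of subprocess.run when starting an executable built with -Ofast or -bigp_on.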