Sun flags file for SPEC benchmark suite SPEC OMP2001

This file is for flags used with the Sun SPARC based systems.

Flags described below are for the compilers:
    Sun Studio 12
And for the OS:
    Solaris 10

Revised 14 July 2008

----------------------------------------------------------------------------
Compiler flags
----------------------------------------------------------------------------

Flag                    Description
----                    -----------
-autopar                Enables automatic compiler parallelization.
-D                      Set definition for preprocessor.
-dalign                 Selects generation of faster double word load/store
                        instructions, and alignment of double and quad data
                        on their natural boundaries in common blocks.
-dbl_align_all=yes      Force alignment of data on 8-byte boundaries. The
                        value is either yes or no. If yes, all variables
                        will be aligned on 8-byte boundaries.
-depend=yes             Selects dependence analysis to better optimize DO
                        loops.
-e                      Accept extended (132 character) input source lines
                        (FORTRAN).
-fast                   This is a convenience option for selecting a set of
                        optimizations for performance. It chooses the
                        following switches, which are defined elsewhere in
                        this page:
                        (C)
                            -fns -fsimple=2 -fsingle -xalias_level=basic
                            -xarch=native -xbuiltin=%all -xcache= -xchip=
                            -xdepend -xlibmil -xlibmopt -xmemalign=8s -xO5
                            -xprefetch=auto,explicit
                        (C++)
                            -xO5 -xarch=native -xcache= -xchip=
                            -xdepend=yes -xmemalign=8s -fsimple=2 -fns=yes
                            -ftrap=%none -xlibmil -xlibmopt -xbuiltin=%all
                        (Fortran)
                            -xO5 -xarch=native -xcache= -xchip=
                            -xdepend=yes -xpad=local -xvector=lib -dalign
                            -fsimple=2 -fns=yes -ftrap=common -xlibmil
                            -xlibmopt
-fixed                  Accept fixed-format input source files (FORTRAN).
-fma=fused              Generate fused multiply add instruction where
                        possible. -xarch=sparcfmaf is required, which is
                        automatically enabled by -fast.
-fns[=yes|no]           Select non-standard floating point mode. -fns is
                        the same as -fns=yes. This flag causes the
                        nonstandard floating point mode to be enabled when
                        a program begins execution.
                        By default, the nonstandard floating point mode
                        will not be enabled automatically.
                        Warning: When nonstandard mode is enabled, floating
                        point arithmetic may produce results that do not
                        conform to the requirements of the IEEE 754
                        standard. See the Numerical Computation Guide for
                        more information (see docs.sun.com).
-fsimple=2              Selects aggressive floating-point optimizations.
                        This option might be unsuited for programs
                        requiring strict IEEE 754 standards compliance.
-fsingle                (-Xt and -Xs modes only) Causes the compiler to
                        evaluate float expressions as single precision,
                        rather than double precision. (This option has no
                        effect if the compiler is used in either -Xa or -Xc
                        modes, as float expressions are already evaluated
                        as single precision.)
-ftrap=t                Sets the IEEE 754 trapping modes that are
                        established at program initialization. t is a
                        comma-separated list that consists of one or more
                        of the following: %all, %none, common,
                        [no%]invalid, [no%]overflow, [no%]underflow,
                        [no%]division, [no%]inexact. Processing is
                        left-to-right. The default is -ftrap=%none.
                            common - invalid, division by zero, and
                                     overflow.
                            %none  - the default, turns off all trapping
                                     modes.
                        Do not use this option for programs that depend on
                        IEEE standard exception handling; you can get
                        different numerical results, premature program
                        termination, or unexpected SIGFPE signals.
-g                      Include symbols in the object code.
-lm                     Link with the math library.
-lmopt                  Link with the math library that is optimized for
                        speed.
-lmtmalloc              Link with the fast concurrent malloc library, which
                        is suitable for multi-threaded applications.
-lmvec                  Link with the vector math library.
-m32                    Compile for 32 bit.
-m64                    Compile for 64 bit.
-openmp                 See -xopenmp.
-pad=local              Local padding to improve use of cache.
-Qoption                Pass option list to the compiler phase (Fortran,
                        C++):
                            f90comp - Fortran first pass
                            iropt   - Global optimizer
                            cg      - Code generator
-qoption                Same as -Qoption; the q is not case sensitive.
-Qoption cg -Qlp-ip=1
                        Turns on prefetching for one-level indirect memory
                        accesses.
-Qoption iropt -Apf:ipa=9
                        Specifies the number of prefetch instructions
                        (distance) to be issued. Helps guide prefetching
                        based on inter-procedural analysis.
-Qoption iropt -Apf:l2subblock=256
                        Specifies the size of a single subblock for the L2
                        cache line. Helps guide prefetching.
-Qoption iropt -Apf:largedim
                        Option applies to prefetching. If array dimensions
                        are unknown, assume large dimensions.
-Qoption iropt -Apf:pdl=n
                        Allow prefetching through up to n levels of
                        indirect memory references.
-Qoption iropt -Athr
                        Turn on tree height reduction.
-Qoption iropt -Atile:skewp
                        Perform loop tiling which is enabled by loop
                        skewing. Loop skewing transforms a non-fully
                        interchangeable loop nest to a fully
                        interchangeable loop nest.
-Qoption iropt -Aujam:inner=g
                        Increase the probability that small-trip-count
                        inner loops will be fully unrolled.
-Qoption iropt -Rloop_dist
                        Do not perform loop distribution transformations.
-Qoption iropt -xprefetch_level[=1|2|3]
                        Controls the aggressiveness of automatic prefetch
                        generation. -xprefetch_level=1 enables automatic
                        generation of prefetch instructions.
                        -xprefetch_level=2 enables additional generation
                        beyond level 1, and -xprefetch_level=3 enables
                        additional generation beyond level 2.
-stackvar               Allocate local variables on the stack whenever
                        possible.
-unroll=n               See -xunroll.
-W2,                    Pass option list to the compiler phase iropt (C).
-Wc,                    Pass option list to the compiler phase cg (C).
-W2,-Apf:l2subblock=256
                        Specifies the size of a single subblock for the L2
                        cache line. Helps guide prefetching.
-W2,-Apf:outer=0        Do not consider outer loops in marking.
-W2,-Apf:pdl=1          Allow prefetching through up to n levels of
                        indirect memory references (here n=1).
-Wc,-Qlp-ip=1           Turns on prefetching for one-level indirect memory
                        accesses.
-xalias_level=          Allows compiler to perform type-based alias
                        analysis at the given alias level (C).
                            basic  - assume ISO C9X aliasing rules for
                                     basic types only.
                            std    - assume ISO C9X aliasing rules.
                            strong - assume all pointers are type safe
                                     (strongly typed).
-xarch=isa              This option limits the code generated by the
                        compiler to the instructions of the specified
                        instruction set architecture.
                            native    - Compiler selects appropriate
                                        setting
                            sparcvis2 - Supports VIS2 instructions
                            sparcfmaf - Supports FP MulAdd extensions
-xbuiltin=%all          Substitute intrinsic functions or inline system
                        functions where profitable for performance.
-Xc                     Assume strict ANSI C conformance.
-xcache=s1/l1/a1[/t1][:sn/ln/an[/tn]]
                        Defines the cache characteristics of the processor.
                            si - size of data cache at level i
                            li - line size of data cache at level i
                            ai - associativity of data cache at level i
                            ti - threads sharing data cache at level i
                                 (default=1)
-xchip=name             Specifies the target processor for use by the
                        optimizer.
                        name: sparc64vi, ultra4, ultra4plus (among others)
-xcode=abs32            Generate 32-bit absolute addresses. Code+data+bss
                        size is limited to 2**32 bytes.
-xcode=abs44            Generate 44-bit absolute addresses. Code+data+bss
                        size is limited to 2**44 bytes.
-xcrossfile[=n]         Enable optimization and inlining across source
                        files, n={0|1}. The default is -xcrossfile=0, which
                        specifies that no cross file optimizations are
                        performed. -xcrossfile is equivalent to
                        -xcrossfile=1.
                        Normally, the scope of the compiler's analysis is
                        limited to each separate file on the command line.
                        With -xcrossfile, the compiler analyzes all the
                        files named on the command line as if they had been
                        concatenated into a single source file.
-xdepend[=yes|no]       Analyzes loops for inter-iteration data
                        dependencies and does loop restructuring.
                        -xdepend=yes and -xdepend are equivalent. Requires
                        -xO3 or higher.
-xipo[=n]               Enable optimization and inlining across source
                        files, n={0|1|2}.
                        At -xipo=2, the compiler performs interprocedural
                        aliasing analysis as well as optimization of memory
                        allocation and layout to improve cache performance.
-xjobs=n                Set how many processes the compiler creates to
                        complete its work. Currently, -xjobs works only
                        with the -xipo option. This allows the
                        interprocedural optimizer to use n as the maximum
                        number of code generator instances it can invoke to
                        compile different files.
-xlibmil                Selects inlining of certain math library routines.
-xlibmopt               Selects linking the optimized math library.
-xlic_lib=sunperf       Link in the Sun supplied performance libraries.
-xlinkopt=n             Perform link-time optimizations on relocatable
                        object files. n specifies how aggressively the
                        optimization is performed; 2 is the most
                        aggressive.
-xmemalign=8s           Specify maximum assumed memory alignment. 8s means
                        at most 8 byte alignment.
-xO1                    Does basic local optimization (peephole).
-xO2                    -xO1 and more local and global optimizations.
-xO3                    Besides what -xO2 does, it optimizes references or
                        definitions for external variables. Loop unrolling
                        and software pipelining are also performed.
-xO4                    -xO3 plus function inlining.
-xO5                    Besides what -xO4 does, it enables speculative code
                        motion.
-xopenmp=               Enable OpenMP language extension. The value is one
                        of {noopt|parallel|none}. If you specify -xopenmp,
                        but do not include a value, the compiler assumes
                        -xopenmp=parallel.
                            parallel - Enables recognition of OpenMP
                                       pragmas. The optimization level
                                       under -xopenmp=parallel is -xO3. The
                                       compiler changes the optimization
                                       level to -xO3 if necessary and
                                       issues a warning.
-xpad=common:n          Suggest to the compiler the amount n to pad common
                        blocks.
-xpad=local             Add padding between adjacent local variables.
-xpagesize=             Set the preferred page size for running the
                        program. The command pagesize -a shows which page
                        sizes are available on the platform.
-xpagesize_heap=        Set the preferred page size for the heap.
-xpagesize_stack=       Set the preferred page size for the stack.
-xprefetch_level[=n]    Controls the aggressiveness of the -xprefetch=auto
                        option (n={1|2|3}). The compiler becomes more
                        aggressive, or in other words, introduces more
                        prefetches, with each higher level of
                        -xprefetch_level.
-xprefetch[=val[,val]]  Enable prefetch instructions on those architectures
                        that support prefetch.
                            [no%]auto     - [Disable] Enable automatic
                                            generation of prefetch
                                            instructions.
                            [no%]explicit - [Disable] Enable explicit
                                            prefetch macros.
                            yes           - -xprefetch=yes is the same as
                                            -xprefetch=auto,explicit
                            no            - -xprefetch=no is the same as
                                            -xprefetch=no%auto,no%explicit
                            latx:n.n      - Adjust the compiler's assumed
                                            prefetch-to-load and
                                            prefetch-to-store latencies by
                                            the specified factor.
                        Defaults: If -xprefetch is not specified,
                        -xprefetch=no%auto,explicit is assumed. If only
                        -xprefetch is specified, -xprefetch=auto,explicit
                        is assumed.
-xprefetch_auto_type=indirect_array_access
                        Generate indirect prefetches for data arrays
                        accessed indirectly.
-xregs=                 Specify the usage of optional registers.
-xtarget=native         Sets the hardware target. If the program is
                        intended to run on a different target than the
                        compilation machine, follow the -fast with the
                        appropriate -xtarget= option. For example:
                            f95 -fast -xtarget=ultra
-xunroll=n              Suggests to the compiler to unroll loops n times.
-xvector=lib            Transform math library calls within loops into
                        single calls to the equivalent vector math routines
                        when such transformations are possible.
-xvector=yes            Same as -xvector=lib.
-xprofile               Use the profile feature; shorthand for the process
                        described below.
-xprofile=

                        Collect data for a profile or use a profile to
                        optimize.

                        The value is one of {{collect,use}[:name],tcov}.
                            collect[:name] - Collects and saves execution
                                             frequency for later use by the
                                             optimizer with -xprofile=use.
                                             The compiler generates code to
                                             measure statement
                                             execution-frequency.
                            use[:name]     - Uses execution frequency data
                                             to optimize strategically.
                        The name is the name of the executable that is
                        being analyzed.
-xprofile=collect       Collect profile data for feedback directed
                        optimizations.
-xprofile=use           Use data collected for profile feedback.

----------------------------------------------------------------------------
Operating System
----------------------------------------------------------------------------

Environment Variables   Description
---------------------   -----------
LD_PRELOAD=mpss.so.1    Allow use of the mpss.so.1 shared object, which
                        provides a means by which preferred stack and/or
                        heap page sizes can be selected. Once preloaded,
                        the mpss.so.1 shared object reads environment
                        variables MPSSHEAP and MPSSSTACK to determine any
                        preferred page sizes.
MPSSHEAP=               Specify the preferred page size for heap. The
                        specified page size is applied to all created
                        processes.
MPSSSTACK=              Specify the preferred page size for stack. The
                        specified page size is applied to all created
                        processes.
OMP_DYNAMIC             Enables (TRUE) or disables (FALSE) dynamic
                        adjustment of the number of threads available for
                        execution of parallel regions.
OMP_NUM_THREADS         Sets the number of threads to use during execution,
                        unless that number is explicitly changed by calling
                        the OMP_SET_NUM_THREADS subroutine.
SUNW_MP_PROCBIND        This environment variable can be used to bind the
                        LWPs (lightweight processes) managed by the
                        microtasking library, libmtsk, to processors.
                        Performance can be enhanced with processor binding,
                        but performance degradation will occur if multiple
                        LWPs are bound to the same processor.
                        The value for SUNW_MP_PROCBIND can be
                        - the string TRUE or FALSE (in any case).
                        - a non-negative integer.
                        - a list of two or more non-negative integers
                          separated by one or more spaces (" ").
                        - two non-negative integers, n1 and n2, separated
                          by a minus ("-"); n1 must be less than or equal
                          to n2.
                        Integers in the above denote the "logical"
                        processor IDs to which the LWPs are to be bound.
                        Logical processor IDs are consecutive integers that
                        start with 0, and may or may not be identical to
                        the actual processor IDs. If n processors are
                        available online, then their logical processor IDs
                        are 0, 1, ..., n-1.
                        By default, LWPs are not bound to processors. It is
                        left up to the operating system, Solaris, to
                        schedule LWPs onto processors.
STACKSIZE               A default stacksize of 4 MB (for 32-bit programs)
                        and 8 MB (for 64-bit programs) is used for
                        additional threads created in an OpenMP program.
                        The environment variable STACKSIZE can be used to
                        set it to a different value. For example,
                            setenv STACKSIZE 2048
                        creates threads with a stacksize of 2 MB (2048 KB)
                        each.
OMP_NESTED              Enables or disables nested parallelism. Value is
                        either TRUE or FALSE.
SUNW_MP_GUIDED_SCHED_WEIGHT=
                        Sets the weighting factor used to determine the
                        size of chunks assigned to threads in loops with
                        GUIDED scheduling. The value should be a positive
                        floating-point number, and will apply to all loops
                        with GUIDED scheduling in the program. If not set,
                        the default value assumed is 2.0.
SUNW_MP_THR_IDLE=SPIN   Controls the end-of-task status of each helper
                        thread executing the parallel part of a program.
                        You can set the value to spin, sleep ns, or
                        sleep nms. The default is SPIN: the thread spins
                        (or busy-waits) after completing a parallel task
                        until a new parallel task arrives.
                        With SLEEP time, time specifies how long a helper
                        thread should spin-wait after completing a parallel
                        task. If, while a thread is spinning, a new task
                        arrives for the thread, the thread executes the new
                        task immediately. Otherwise, the thread goes to
                        sleep and is awakened when a new task arrives. time
                        may be specified in seconds, (ns) or just (n), or
                        in milliseconds, (nms).
                        SLEEP with no argument puts the thread to sleep
                        immediately after completing a parallel task.
                        SLEEP, SLEEP (0), SLEEP (0s), and SLEEP (0ms) are
                        all equivalent.

- - - - - - - - - - - - - - - - - - - - - - - - -

Shell Variables         Description
---------------         -----------
ulimit -s unlimited     Set size of stack segment to unlimited.

- - - - - - - - - - - - - - - - - - - - - - - - -

/etc/system Variables   Description
---------------------   -----------
set autoup=nnn          When the file system flush daemon fsflush runs, it
                        writes to disk all modified file buffers that are
                        more than nnn seconds old.
set tune_t_fsflushr=    Controls the number of seconds between runs of the
                        file system flush daemon, fsflush.

----------------------------------------------------------------------------
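
The four value forms listed for SUNW_MP_PROCBIND can be checked with a small
script before a run. The Python sketch below is an illustration only:
classify_procbind and its "online" parameter are hypothetical names, not part
of libmtsk or Solaris, and the sketch only classifies and validates the
string. It does not model how the runtime maps LWPs to processors.

```python
import re


def classify_procbind(value, online=None):
    """Classify a SUNW_MP_PROCBIND string into one of the four documented forms.

    `online` (optional, hypothetical) is the number of online processors;
    when given, logical processor IDs must lie in 0 .. online-1.
    """
    text = value.strip()
    if text.upper() in ("TRUE", "FALSE"):        # accepted in any letter case
        return ("bool", text.upper() == "TRUE")
    if re.fullmatch(r"\d+", text):               # a single non-negative integer
        ids = [int(text)]
        result = ("single", ids[0])
    elif re.fullmatch(r"\d+-\d+", text):         # n1-n2, with n1 <= n2
        n1, n2 = (int(part) for part in text.split("-"))
        if n1 > n2:
            raise ValueError("n1 must be less than or equal to n2")
        ids = [n1, n2]
        result = ("range", (n1, n2))
    elif re.fullmatch(r"\d+( +\d+)+", text):     # two or more integers, spaces between
        ids = [int(part) for part in text.split()]
        result = ("list", ids)
    else:
        raise ValueError(f"unrecognized SUNW_MP_PROCBIND value: {value!r}")
    if online is not None and any(i >= online for i in ids):
        raise ValueError("logical processor IDs run from 0 to online-1")
    return result


if __name__ == "__main__":
    for sample in ("true", "5", "0 2 4", "0-3"):
        print(sample, "->", classify_procbind(sample))
```

Passing online=n adds the check that every ID falls within the logical range
0, 1, ..., n-1 described in the SUNW_MP_PROCBIND entry.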
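
The -xcache value format in the compiler flags table,
s1/l1/a1[/t1][:sn/ln/an[/tn]], packs one group of fields per cache level. The
Python sketch below splits such a string into per-level records. CacheLevel
and parse_xcache are hypothetical helper names, the sample string is made up
for illustration, and no units are assumed for the size fields.

```python
from typing import NamedTuple


class CacheLevel(NamedTuple):
    size: int            # si: size of data cache at level i
    line_size: int       # li: line size of data cache at level i
    associativity: int   # ai: associativity of data cache at level i
    threads: int = 1     # ti: threads sharing the cache (default=1)


def parse_xcache(spec):
    """Split an -xcache value into a list of CacheLevel records, level 1 first."""
    levels = []
    for group in spec.split(":"):                # one group per cache level
        fields = [int(part) for part in group.split("/")]
        if len(fields) not in (3, 4):            # s/l/a with an optional /t
            raise ValueError(f"expected s/l/a[/t], got {group!r}")
        levels.append(CacheLevel(*fields))
    return levels


if __name__ == "__main__":
    # Hypothetical two-level description; the second level is shared by 2 threads.
    for level in parse_xcache("64/32/4:2048/64/4/2"):
        print(level)
```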