Flag description file for Sun-compiled SPECompM2001 binaries using the PGI
Fortran 5.2-1 compiler.

----------------------------------------------------------------------------
PGI (Portland Group International) compiler 5.2-1 flags
----------------------------------------------------------------------------

The optimization levels and their meanings are as follows:

-O0
    A basic block is generated for each Fortran statement. No scheduling is
    done between statements. No global optimizations are performed.

-O1
    Scheduling within extended basic blocks is performed. Some register
    allocation is performed. No global optimizations are performed.

-O2
    All level 1 optimizations are performed. In addition, scalar
    optimizations such as induction recognition and loop invariant motion
    are performed by the global optimizer.

-O3
    This level performs all level-one and level-two optimizations and
    enables more aggressive hoisting and scalar replacement optimizations.

-O4
    This level performs all level-one and level-two optimizations and
    enables more aggressive hoisting and scalar replacement optimizations.

-fast
    Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre".

-fastsse
    Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz".

-Mcache_align
    Align unconstrained objects of length greater than or equal to 16 bytes
    on cache-line boundaries. An unconstrained object is a data object that
    is not a member of an aggregate structure or common block. This option
    does not affect the alignment of allocatable or automatic arrays.
    Note: To effect cache-line alignment of stack-based local variables,
    the main program or function must be compiled with -Mcache_align.

-Mconcur[=option[,option,...]]
    Instructs the compiler to enable auto-concurrentization of loops. This
    also sets the optimization level to a minimum of 2; see -O. If -Mconcur
    is specified, multiple processors will be used to execute loops which
    the compiler determines to be parallelizable.
    When linking, the -Mconcur switch must be specified or unresolved
    references will occur. The NCPUS environment variable controls how many
    processors will be used to execute parallelized loops. The options can
    be one or more of the following:

    altcode:n
    noaltcode
        Generate (don't generate) alternate scalar code for parallelized
        loops. The parallelizer generates scalar code to be executed
        whenever the loop count is less than or equal to n. If noaltcode is
        specified, the parallelized version of the loop is always executed
        regardless of the loop count.

    altreduction[:n]
        Generate alternate scalar code for parallelized loops containing a
        reduction. If a parallelized loop contains a reduction, the
        parallelizer generates scalar code to be executed whenever the loop
        count is less than or equal to n.

    assoc (default)
    noassoc
        Enable (disable) parallelization of loops with reductions.

    dist:block
        Parallelize with block distribution. Contiguous blocks of
        iterations of a parallelizable loop are assigned to the available
        processors.

    dist:cyclic
        Parallelize with cyclic distribution. The outermost parallelizable
        loop in any loop nest is parallelized. If a parallelized loop is
        innermost, its iterations are allocated to processors cyclically.
        For example, if there are 3 processors executing a loop, processor
        0 performs iterations 0, 3, 6, etc.; processor 1 performs
        iterations 1, 4, 7, etc.; and processor 2 performs iterations 2, 5,
        8, etc.

    cncall
    nocncall (default)
        Assume (don't assume) that loops containing calls are safe to
        parallelize. Also, no minimum loop count threshold must be
        satisfied before parallelization will occur, and last values of
        scalars are assumed to be safe.

    levels:n
        Parallelize loops nested at most n levels deep; the default is 3.

-Mextend
    Allow 132-column source lines.

-Mfixed
    Process source using Fortran fixed-form specifications.

-Mflushz
    Set SSE MXCSR register to flush-to-zero mode.
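The difference between the dist:block and dist:cyclic options of -Mconcur
described above can be sketched in a few lines of Python. This is only an
illustration of the two iteration-to-processor mappings, not the PGI
runtime's implementation. The function names are hypothetical, and the way
leftover iterations are spread over the blocks when the loop count is not a
multiple of the processor count is an assumption.

```python
def block_distribution(n_iterations, n_procs):
    """dist:block sketch: each processor gets one contiguous block.

    Assumption: leftover iterations are given one each to the
    lowest-numbered processors; the source does not specify this.
    """
    base, extra = divmod(n_iterations, n_procs)
    assignment = []
    start = 0
    for proc in range(n_procs):
        size = base + (1 if proc < extra else 0)
        assignment.append(list(range(start, start + size)))
        start += size
    return assignment


def cyclic_distribution(n_iterations, n_procs):
    """dist:cyclic sketch: processor p gets iterations p, p+P, p+2P, ..."""
    return [list(range(proc, n_iterations, n_procs))
            for proc in range(n_procs)]


if __name__ == "__main__":
    # Nine iterations on three processors, as in the example above.
    print("block: ", block_distribution(9, 3))
    print("cyclic:", cyclic_distribution(9, 3))
```

With 3 processors and 9 iterations, the cyclic mapping reproduces the
example in the text: processor 0 performs iterations 0, 3, 6; processor 1
performs 1, 4, 7; and processor 2 performs 2, 5, 8.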

-Mipa[=option[,option,...]]
    Enable and specify options for InterProcedural Analysis (IPA). This
    also sets the optimization level to a minimum of 2; see -O. If no
    option list is specified, then it is equivalent to -Mipa=const. The
    options are:

    const (default)
    noconst
        Enable (disable) propagation of constants across procedure calls.

    inline:n
        Determine additional functions to inline, allowing up to n levels
        of inlining.

    fast
        Chooses generally optimal -Mipa flags for the target platform; use
        pgf90 -Mipa -help to see the equivalent options. fast is equivalent
        to -Mipa=align,arg,const,f90ptr,shape,globals,localarg,ptr

    align
    noalign
        Enable (disable) recognition when pointer targets are all
        cache-line aligned, allowing better SSE code generation.

    arg
    noarg
        Remove (don't remove) arguments replaced by -Mipa=ptr,const.
        -Mipa=noarg implies -Mipa=nolocalarg.

    f90ptr
    nof90ptr
        Enable (disable) Fortran 90 pointer disambiguation across procedure
        calls.

    globals
    noglobals
        Analyze (don't analyze) which globals are modified by procedure
        calls.

    localarg
    nolocalarg
        Enable (disable) feature to externalize local variables to allow
        arguments to be replaced by -Mipa=ptr. -Mipa=localarg implies
        -Mipa=arg.

    ptr
    noptr
        Enable (disable) pointer disambiguation across procedure calls.

    shape
    noshape
        Perform (don't perform) Fortran 90 shape propagation.

-mp
    Enable OpenMP.

-Mframe
-Mnoframe (default)
    Set up (don't set up) a true stack frame pointer for functions;
    -Mnoframe allows slightly more efficient operation when a stack frame
    is not needed, but some options override -Mnoframe.

-Msmart
-Mnosmart
    Enable (disable) optional AMD64-specific post-pass instruction
    scheduling.

-Minline[=option[,option,...]]
    Pass options to the function inliner. The options are:

    lib:filename.ext
        Specify an inline library created by a previous -Mextract option.
        Functions from the specified library are inlined.
        If no library is specified, functions are extracted from a
        temporary library created during an extract prepass.

    except:
        Specifies which functions should not be inlined.

    [name:]function
        A non-numeric option is assumed to be a function name. If name: is
        specified, what follows is always the name of a function.

    [size:]number
        A numeric option is assumed to be a size. Functions containing
        number or fewer statements are inlined. If both number and function
        are specified, then functions matching the given name(s) or meeting
        the size requirements are inlined.

    levels:number
        number levels of inlining are performed. The default is 1.

-Mlongbranch
    Enable long branches.

-Mfptrap (default)
-Mnofptrap
    -Mnofptrap performs the semantics of -Knoieee (use in-line divide, link
    in non-IEEE libraries if available, and disable underflow traps) and
    disables floating point traps.

-Mscalarsse
    Utilize the SSE (Streaming SIMD (Single Instruction Multiple Data)
    Extensions) and SSE2 instructions to perform the operations coded. This
    assumes the user has an assembler capable of interpreting SSE/SSE2
    instructions, as in later versions of Linux. This implies -Mflushz.

-Munroll
    Invokes the loop unroller. This also sets the optimization level to 2
    if the level is set to less than 2.

    c:m
        Instructs the compiler to completely unroll loops with a constant
        loop count less than or equal to m, a supplied constant. If this
        value is not supplied, the m count is set to 4.

    n:u
        Instructs the compiler to unroll u times a loop which is not
        completely unrolled, or which has a non-constant loop count. If u
        is not supplied, the unroller computes the number of times a
        candidate loop is unrolled.

-Mvect[=option[,option,...]]
    Pass options to the internal vectorizer. This also sets the
    optimization level to a minimum of 2; see -O. If no option list is
    specified, then the following vector optimizations are used:
    assoc,cachesize:262144,nosse.
    The vect options are:

    assoc (default)
    noassoc
        Enable (disable) certain associativity conversions that can change
        the results of a computation due to floating point roundoff error
        differences. A typical optimization is to change the order of
        additions, which is mathematically correct, but can be
        computationally different, due to roundoff error.

-mp
    Interpret OpenMP pragmas to explicitly parallelize regions of code for
    execution by multiple threads on a multi-processor system. Most OpenMP
    pragmas as well as the SGI parallelization pragmas are supported. See
    Chapters 5 and 6 of the PGI User's Guide for more information on these
    pragmas.

-mcmodel=medium
    Allows building objects larger than 2 GB.

-Mlre[=assoc|noassoc]
-Mnolre
    Enable (disable) loop-carried redundancy elimination. The assoc option
    allows expression reassociation, and the noassoc option disallows
    expression reassociation.

-lacml
    Link with the AMD Core Math Library, supplied with the PGI compiler
    5.2-1.

RM_SOURCES=lapak.f90
    Remove the source file 'lapak.f90' in 178.galgel.

----------------------------------------------------------------------------
Environment Variables   Description

OMP_DYNAMIC             Enables or disables dynamic adjustment of the number
                        of threads available for execution of parallel
                        regions.

OMP_NUM_THREADS         Sets the number of threads to use during execution,
                        unless that number is explicitly changed by calling
                        the OMP_SET_NUM_THREADS subroutine.

NCPUS                   Sets the number of threads to use during execution.
                        Note that OMP_NUM_THREADS takes precedence over
                        NCPUS.
----------------------------------------------------------------------------
Shell Command           Description

ulimit -s unlimited     Set size of stack segment to unlimited.
----------------------------------------------------------------------------
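
The precedence rule stated for the environment variables above
(OMP_NUM_THREADS takes precedence over NCPUS) can be written out as a small
Python sketch. It only illustrates the lookup order; it is not how the PGI
runtime is implemented. The function name resolve_thread_count is
hypothetical, and the fallback of 1 thread when neither variable is set is
an assumption that this file does not state.

```python
import os


def resolve_thread_count(env=None, default=1):
    """Return the thread count implied by the environment.

    OMP_NUM_THREADS is checked first and NCPUS second, following the
    precedence stated above. `default` is an assumed fallback used when
    neither variable is set.
    """
    if env is None:
        env = os.environ
    for name in ("OMP_NUM_THREADS", "NCPUS"):
        value = env.get(name)
        if value:
            return int(value)
    return default


if __name__ == "__main__":
    print(resolve_thread_count({"OMP_NUM_THREADS": "4", "NCPUS": "8"}))
    print(resolve_thread_count({"NCPUS": "8"}))
```

Passing a plain dictionary instead of os.environ keeps the sketch easy to
try without changing the shell environment.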