Flag description file for SPECompM2001 binaries compiled by Sun
using the PGI Fortran 5.2-1 compiler.

----------------------------------------------------------------------------
PGI (The Portland Group, Inc.) compiler 5.2-1 flags
----------------------------------------------------------------------------

The optimization levels and their meanings are as follows:	

-O0	A basic block is generated for each Fortran statement.  No scheduling 
	is done between statements.  No global optimizations are performed.

-O1	Scheduling within extended basic blocks is performed.  Some register 
	allocation is performed.  No global optimizations are performed.

-O2	All level 1 optimizations are performed.  In addition,  scalar
	optimizations such as induction recognition and loop invariant motion 
	are performed by the global optimizer. 
                
-O3	This level performs all level-one and level-two optimizations and 
	enables more aggressive hoisting and scalar replacement optimizations.

-O4	This level performs all level-one and level-two optimizations and 
	enables more aggressive hoisting and scalar replacement optimizations.

-fast	 Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre"

-fastsse Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz" 
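
For illustration only, the two aliases above can be expanded mechanically.
The Python sketch below uses a hypothetical table (ALIASES) and a hypothetical
helper (expand); neither is part of the PGI toolchain. It relies only on the
two equivalences stated in this file.

```python
# Hypothetical helper, not part of any PGI tool: expands the alias flags
# using only the two equivalences listed in this flag description file.
ALIASES = {
    "-fast": ["-O2", "-Munroll=c:1", "-Mnoframe", "-Mlre"],
    "-fastsse": ["-fast", "-Mscalarsse", "-Mvect=sse",
                 "-Mcache_align", "-Mflushz"],
}


def expand(flags):
    """Recursively replace alias flags with the flags they stand for."""
    out = []
    for flag in flags:
        if flag in ALIASES:
            out.extend(expand(ALIASES[flag]))
        else:
            out.append(flag)
    return out


print(" ".join(expand(["-fastsse"])))
```

Flags that are not aliases pass through unchanged, so the helper can be
applied to a whole command line.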

-Mcache_align    
     Align unconstrained objects of length greater than or equal to 16 bytes on
     cache-line boundaries. An unconstrained object is a data object that is not
     a member of an aggregate structure or common block. This option does
     not affect the alignment of allocatable or automatic arrays.

     Note: To effect cache-line alignment of stack-based local variables, the
     main program or function must be compiled with -Mcache_align.

-Mconcur[=option[,option,...]]
      Instructs the compiler to enable auto-concurrentization of loops.
      This also sets the optimization level to a minimum of 2; see -O.  If
      -Mconcur is specified, multiple processors will be used to execute
      loops which the compiler determines to be parallelizable.  When
      linking, the -Mconcur switch must be specified or unresolved
      references will occur. The NCPUS environment variable controls how
      many processors will be used to execute parallelized loops.  The
      options can be one or more of the following:

      altcode:n noaltcode
	Generate (don't generate) alternate scalar code for parallelized
	loops.  The parallelizer generates scalar code to be executed
	whenever the loop count is less than or equal to n.  If noaltcode
	is specified, the parallelized version of the loop is always
	executed regardless of the loop count.

      altreduction[:n]
	Generate alternate scalar code for parallelized loops containing a
	reduction.  If a parallelized loop contains a reduction, the
	parallelizer generates scalar code to be executed whenever the
	loop count is less than or equal to n.

      assoc (default) noassoc
	Enable (disable) parallelization of loops with reductions.

      dist:block
	Parallelize with block distribution.  Contiguous blocks of
	iterations of a parallelizable loop are assigned to the available
	processors.

      dist:cyclic
	Parallelize with cyclic distribution. The outermost parallelizable
	loop in any loop nest is parallelized.  If a parallelized loop is
	innermost, its iterations are allocated to processors cyclically.
	For example, if there are 3 processors executing a loop, processor
	0 performs iterations 0, 3, 6, etc; processor 1 performs
	iterations 1, 4, 7, etc; and processor 2 performs iterations 2, 5,
	8, etc.

      cncall nocncall (default)
	Assume (don't assume) that loops containing calls are safe to
	parallelize.  Also, no minimum loop count threshold must be
	satisfied before parallelization will occur, and last values of
	scalars are assumed to be safe.

      levels:n
	Parallelize loops nested at most n levels deep; the default is 3.
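
The difference between dist:block and dist:cyclic can be sketched in Python.
This is a model of the iteration-to-processor mapping described above, not
the PGI runtime itself. The function names are hypothetical, and the block
size (iterations divided by processors, rounded up) is an assumption; the
real runtime may divide a remainder differently.

```python
def block_distribution(n_iters, n_procs):
    """Assign contiguous blocks of iterations to each processor."""
    # Assumption: each block holds ceil(n_iters / n_procs) iterations.
    size = -(-n_iters // n_procs)
    return [list(range(p * size, min((p + 1) * size, n_iters)))
            for p in range(n_procs)]


def cyclic_distribution(n_iters, n_procs):
    """Assign iteration i to processor i mod n_procs."""
    return [list(range(p, n_iters, n_procs)) for p in range(n_procs)]


# Nine iterations on three processors, as in the dist:cyclic example.
for proc, iters in enumerate(cyclic_distribution(9, 3)):
    print("cyclic: processor", proc, "->", iters)
for proc, iters in enumerate(block_distribution(9, 3)):
    print("block:  processor", proc, "->", iters)
```

With 9 iterations and 3 processors, the cyclic mapping gives processor 0
iterations 0, 3, 6, matching the description of dist:cyclic, while the block
mapping gives it iterations 0, 1, 2.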

-Mextend
     Allow 132-column source lines.

-Mfixed 
     Process source using Fortran 90 fixed-form specifications.

-Mflushz 	 
     Set SSE MXCSR register to flush-to-zero mode.

-Mipa[=option[,option,...]]
     Enable and specify options for InterProcedural Analysis (IPA).  This
     also sets the optimization level to a minimum of 2; see -O.  If no
     option list is specified, then it is equivalent to -Mipa=const.  The
     options are:

      const (default) noconst
		Enable (disable) propagation of constants across procedure
		calls.

      inline:n  
		Determine additional functions to inline, allowing up to n
		levels of inlining.

      fast      
		Chooses generally optimal -Mipa flags for the target
		platform; use pgf90 -Mipa -help to see the equivalent
		options.  fast is equivalent to
		-Mipa=align,arg,const,f90ptr,shape,globals,localarg,ptr


      align noalign
		Enable (disable) recognition when pointer targets are all
		cache-line aligned, allowing better SSE code generation.

      arg noarg 
		Remove (don't remove) arguments replaced by
		-Mipa=ptr,const.  -Mipa=noarg implies -Mipa=nolocalarg.


      f90ptr nof90ptr
		Enable (disable) Fortran 90 pointer disambiguation across
		procedure calls.


      globals noglobals
		Analyze (don't analyze) which globals are modified by
		procedure calls.

      localarg nolocalarg
		Enable (disable) feature to externalize local variables to
		allow arguments to be replaced by -Mipa=ptr.
		-Mipa=localarg implies -Mipa=arg.


      ptr noptr 
		Enable (disable) pointer disambiguation across procedure
		calls.


      shape noshape
		Perform (don't perform) Fortran 90 shape propagation.


-mp  Enable OpenMP
	
-Mframe -Mnoframe (default) 
    Set up (don't set up) a true stack frame pointer for functions;
    -Mnoframe allows slightly more efficient operation when a stack frame
    is not needed, but some options override -Mnoframe.

-M[no]smart   
     Enable (disable) optional AMD64-specific post-pass instruction scheduling.

-Minline[=option[,option,...]]
      Pass options to the function inliner. The options
      are:

      lib:filename.ext
		Specify an inline library created by a previous -Mextract
		option.  Functions from the specified library are
		inlined.  If no library is specified, functions are
		extracted from a temporary library created during an
		extract prepass.

      except:<func>
		Specifies which functions should not be inlined.

      [name:]function
		A non-numeric option is assumed to be a function name.  If
		name: is specified, what follows is always the name of a
		function.

      [size:]number
		A numeric option is assumed to be a size.  Functions
		containing number or fewer statements are inlined.  If both
		number and function are specified, then functions matching
		the given name(s) or meeting the size requirement are
		inlined.

      levels:number
		Perform number levels of inlining.  The default is 1.
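
The rule for unprefixed -Minline options (numeric means size, non-numeric
means function name) can be sketched in Python. This models only the
description above. The function classify_inline_options and its result
layout are hypothetical, and the compiler's own parser may behave
differently on input this file does not describe.

```python
def classify_inline_options(option_string):
    """Sort a comma-separated -Minline option list into categories."""
    result = {"lib": None, "except": [], "functions": [],
              "size": None, "levels": 1}  # levels defaults to 1
    for opt in option_string.split(","):
        key, sep, value = opt.partition(":")
        if sep and key == "lib":
            result["lib"] = value
        elif sep and key == "except":
            result["except"].append(value)
        elif sep and key == "levels":
            result["levels"] = int(value)
        elif sep and key == "name":
            # After name:, the value is always a function name.
            result["functions"].append(value)
        elif sep and key == "size":
            result["size"] = int(value)
        elif opt.isdigit():
            # An unprefixed numeric option is taken as a size.
            result["size"] = int(opt)
        else:
            # An unprefixed non-numeric option is taken as a function name.
            result["functions"].append(opt)
    return result


print(classify_inline_options("saxpy,100,levels:2,name:42"))
```

The name:42 entry shows why the name: prefix exists: without it, 42 would be
read as a size rather than as a function name.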

-Mlongbranch
     Enable long branches.

-Mfptrap (default) -Mnofptrap
      -Mnofptrap performs the semantics of -Knoieee (use in-line divide,
      link in non-IEEE libraries if available, and disable underflow
      traps) and disables floating point traps.

-Mscalarsse   
     Use SSE (Streaming SIMD (Single Instruction, Multiple Data)
     Extensions) and SSE2 instructions to perform the coded operations.
     This assumes the user has an assembler capable of interpreting SSE/SSE2
     instructions, as in later versions of Linux.  This implies -Mflushz.

-Munroll  
     Invokes the loop unroller.  This also sets the optimization level to 2 
     if the level is set to less than 2.
			
      c:m	Instructs the compiler to completely unroll loops with a
	constant loop count less than or equal to m, a supplied constant.
	If this value is not supplied, the m count is set to 4.

      n:u	Instructs the compiler to unroll u times any loop that is
	not completely unrolled or that has a non-constant loop count.
	If u is not supplied, the unroller computes the number of times a
	candidate loop is unrolled.


-Mvect[=option[,option,...]]
     Pass options to the internal vectorizer.  This also sets the
     optimization level to a minimum of 2; see -O.  If no option list is
     specified, then the following vector optimizations are used:
     assoc,cachesize:262144,nosse. The vect options are:

     assoc (default) noassoc
	 Enable (disable) certain associativity conversions that can
	 change the results of a computation due to floating point
	 roundoff error differences.  A typical optimization is to change
	 the order of additions, which is mathematically correct, but can
	 be computationally different, due to roundoff error.
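
The roundoff effect that assoc permits can be seen in a few lines of Python,
which uses the same IEEE double-precision arithmetic. This is a generic
illustration of floating-point non-associativity, not output from the PGI
vectorizer.

```python
# Summing the same three values in two different orders.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left == right)  # False
print(left, right)
```

Both orders are mathematically equal to 0.6, but the results differ in the
last bit, which is the kind of difference reordered additions can introduce.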

-mp   Interpret OpenMP pragmas to explicitly parallelize regions of code
      for execution by multiple threads on a multi-processor system. Most
      OpenMP pragmas as well as the SGI parallelization pragmas are
      supported. See Chapters 5 and 6 of the PGI User's Guide for more
      information on these pragmas.

-mcmodel=medium
      Allows building objects larger than 2GB.

-Mlre[=assoc|noassoc] -Mnolre
     Enable (disable) loop-carried redundancy elimination.  The assoc
     option allows expression reassociation, and the noassoc option
     disallows expression reassociation.

-lacml  
     Link with the AMD Core Math Library (ACML), which is supplied with
     the PGI compiler 5.2-1.

RM_SOURCES=lapak.f90
       Remove the source file 'lapak.f90' in 178.galgel.

----------------------------------------------------------------------------

Environment Variables   Description

OMP_DYNAMIC             Enables or disables dynamic adjustment of
                        the number of threads available for
                        execution of parallel regions.

OMP_NUM_THREADS         Sets the number of threads to use
                        during execution, unless that number is
                        explicitly changed by calling the
                        OMP_SET_NUM_THREADS subroutine.

NCPUS                   Sets the number of threads to use during execution.
                        Note that OMP_NUM_THREADS takes precedence over
                        NCPUS.
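
The precedence between the two thread-count variables can be sketched in
Python. The function thread_count is hypothetical, and the fallback of 1
thread when neither variable is set is an assumption; this file does not
state the runtime default.

```python
def thread_count(env, default=1):
    """Pick a thread count; OMP_NUM_THREADS takes precedence over NCPUS."""
    # Assumption: default=1 when neither variable is set.
    for name in ("OMP_NUM_THREADS", "NCPUS"):
        value = env.get(name)
        if value:
            return int(value)
    return default


print(thread_count({"OMP_NUM_THREADS": "4", "NCPUS": "8"}))  # 4
print(thread_count({"NCPUS": "8"}))                          # 8
```

Passing a plain dictionary keeps the sketch independent of the caller's real
environment; os.environ could be passed instead.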

----------------------------------------------------------------------------

Shell Command           Description

ulimit -s unlimited     Set size of stack segment to unlimited

----------------------------------------------------------------------------