IBM SPEC HPC2002 Flag Descriptions for Opteron

Portland Group Compiler Technology's Fortran compiler pgf90 5.0-1
Last updated: 17-October-2003

Portability Options

-i8
    Treat INTEGER variables as eight bytes. Operations use 64 bits.

-Mfixed
    Indicates that source code is in fixed (72 column) format.

-DSPEC_HPG_MPI_INT4 (ChemS and ChemM)
    Indicates that calls to the MPI library expect 4-byte integers rather than 8-byte integers.

-DBITS64 (ChemS and ChemM)
    Indicates the size of standard integers. BITS64 indicates 64-bit integers.

Flags and Compiler options for the Portland Group's Fortran and C/C++ compilers

Fortran pgf90 and C/C++ pgcc 5.0-1

The optimization levels and their meanings are as follows:

-O0
    A basic block is generated for each Fortran statement. No scheduling is done between statements. No global optimizations are performed.

-O1
    Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimizations are performed.

-O2
    All level 1 optimizations are performed. In addition, scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer.

-O3
    All level 1 and level 2 optimizations are performed. In addition, more aggressive hoisting and scalar replacement optimizations are enabled.

-fast
    Chooses generally optimal flags for the target platform. Equivalent to "-O2 -Munroll -Mnoframe".

-fastsse
    Chooses generally optimal flags for machines that support the SSE type instructions. Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align -Mflushz".

-Mcache_align
    Align unconstrained objects of length greater than or equal to 16 bytes on cache-line boundaries. An unconstrained object is a data object that is not a member of an aggregate structure or common block. This option does not affect the alignment of allocatable or automatic arrays.
    Note: To effect cache-line alignment of stack-based local variables, the main program or function must be compiled with -Mcache_align.
-Mflushz
    Set SSE MXCSR register to flush-to-zero mode.

-Mnoframe
    Eliminate operations that set up a true stack frame pointer for functions.

-Mnosmart
    Don't run the Smart assembly re-write tool to enable post-compilation linear assembly scheduling and optimization.

-Mscalarsse
    Utilize the SSE (Streaming SIMD (Single Instruction Multiple Data) Extensions) and SSE2 instructions to perform the operations coded. This assumes the user has an assembler capable of interpreting SSE/SSE2 instructions, as in later versions of Linux. This implies -Mflushz.

-Munroll
    Invokes the loop unroller. This also sets the optimization level to 2 if the level is set to less than 2.

    c:m
        Instructs the compiler to completely unroll loops with a constant loop count less than or equal to m, a supplied constant. If this value is not supplied, the m count is set to 4.

    n:u
        Instructs the compiler to unroll u times a loop which is not completely unrolled, or which has a non-constant loop count. If u is not supplied, the unroller computes the number of times a candidate loop is unrolled.

Concur: Automatic Parallelization

-Mconcur
    Instructs the compiler to enable auto-concurrentization of loops. If -Mconcur is specified, multiple processors will be used to execute loops that the compiler determines to be parallelizable.

-Mconcur=altcode:n
    Instructs the parallelizer to generate alternate scalar code for parallelized loops. If altcode is specified without arguments, the parallelizer determines an appropriate cutoff length and generates scalar code to be executed whenever the loop count is less than or equal to that length. If altcode:n is specified, the scalar altcode is executed whenever the loop count is less than or equal to n.

-Mconcur=noaltcode
    The parallelized version of the loop is always executed.

-Mconcur=dist:block
    Contiguous blocks of iterations of a parallelizable loop are assigned to the available processors.

-Mconcur=dist:cyclic
    The outermost parallelizable loop in any loop nest is parallelized. If a parallelized loop is innermost, its iterations are allocated to processors cyclically.

-Mconcur=cncall
    Calls in parallel loops are safe to parallelize. Loops containing calls are candidates for parallelization.

-Mconcur=noassoc
    Disables parallelization of loops with reductions.

Vect: Vectorizer

-Mvect
    Without any suboptions, -Mvect is equivalent to "-Mvect=assoc,cachesize:262144".

-Mvect=altcode
    Instructs the vectorizer to generate alternate scalar code for vectorized loops.

-Mvect=assoc
    Instructs the vectorizer to enable certain associativity conversions that can change the results of a computation due to roundoff error.

-Mvect=prefetch
    Instructs the vectorizer to search for vectorizable loops and, where possible, make use of prefetch instructions.

-Mvect=cachesize:n
    Instructs the vectorizer, when performing cache tiling optimizations, to assume a cache size of n. The default is 262144 bytes.

-Mvect=sse
    Instructs the vectorizer to search for vectorizable loops and, where possible, use the SSE or SSE2 and prefetch instructions (depending on which processor is targeted).

IPA: InterProcedural Analyzer

-Mipa=align
    Instructs the IPA to recognize when pointer targets are all cache-line aligned, allowing better SSE code generation.

-Mipa=arg
    Instructs the IPA to remove arguments replaced by -Mipa=ptr,const.

-Mipa=const
    Enable propagation of constants across procedure calls.

-Mipa=fast
    Equivalent to: -Mipa=const,globals,localarg,ptr,vestigial

-Mipa=globals
    Instructs the IPA to optimize references to globals when not used in procedure calls.

-Mipa=localarg
    Externalizes local variables for use with -Mipa=arg.

-Mipa=ptr
    Instructs the IPA to perform pointer disambiguation across procedure calls.

-Mipa=vestigial
    Instructs the IPA to eliminate functions that are not called.

rm -f *.da *.life analyz_prbrob.out
    Remove any profile feedback information from previous runs.
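The roundoff caveat attached to -Mvect=assoc can be made concrete with a small experiment. Floating-point addition is not associative, so regrouping a sum (as happens when a reduction is split into partial sums across vector lanes or processors) can change the rounded result. The Python sketch below is only an illustration, not compiler output: the function names, the two-lane split, and the input values are assumptions chosen to make the effect visible.

```python
def sequential_sum(values):
    """Add the values strictly left to right, as the scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_lane_sum(values):
    """Hypothetical regrouping: keep two partial sums (even and odd
    positions) and combine them at the end, mimicking a 2-wide reduction."""
    lanes = [0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 2] += v
    return lanes[0] + lanes[1]


# Illustrative input: a large value, its negation, and two small values.
values = [1e16, 1.0, -1e16, 1.0]

seq = sequential_sum(values)  # the first 1.0 is absorbed by 1e16 and lost
vec = two_lane_sum(values)    # 1e16 cancels exactly in lane 0; lane 1 holds 2.0

print("sequential:", seq)
print("two-lane:  ", vec)
```

Both orders are legitimate evaluations of the same mathematical sum; they differ only in where rounding occurs. This is why results obtained with associativity conversions enabled may not match the unoptimized results bit for bit.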
BIOS Setting Definitions

DRAM Interleave
    Defines whether data will be interleaved among the four data banks within individual DRAMs.

Node Interleave
    Defines whether or not data addresses will alternate between both processors in 4KB blocks.

ACPI SRAT
    Defines whether the Static Resource Allocation Table is exported by the BIOS to a location where the operating system can see it. The SRAT may only be exported when Node Interleave is disabled.

Special Options for ch_gm device:

-s
    Close stdin. The job can then run in the background without tty input problems.

-r
    Clean up remote shared memory files. These should be removed automatically, but this option forces the cleanup.

-pg
    Specifies the procgroup file.

-wd
    Specifies the working directory.

-ddt
    Specifies DDT debugging session.

--gm-no-shmem
    Disable the shared memory support (enabled by default).

--gm-wait
    Wait seconds between each spawning step.

--gm-kill
    Kill all processes seconds after the first exits.

--gm-eager
    Specifies the Eager/Rendez-vous protocol threshold size.

--gm-recv
    Specifies the receive mode , or , is the default.
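As a rough illustration of the Node Interleave definition above: if addresses alternate between two processors in 4KB blocks, the owning node of an address can be modeled by dividing the address by the block size and taking the result modulo the node count. The sketch below is a simplified model under that assumption only; the names BLOCK_SIZE, NUM_NODES, and node_for_address are hypothetical, and the actual memory controller mapping may differ.

```python
BLOCK_SIZE = 4096  # 4KB interleave granularity, as stated in the definition
NUM_NODES = 2      # "both processors"; assumed two-node system


def node_for_address(addr):
    """Return the node that would own a physical address under a simple
    alternating 4KB interleave (an assumed model, not the BIOS algorithm)."""
    return (addr // BLOCK_SIZE) % NUM_NODES


# Consecutive 4KB blocks alternate between node 0 and node 1.
for addr in (0, 4095, 4096, 8191, 8192):
    print(hex(addr), "-> node", node_for_address(addr))
```

With interleaving modeled this way, a large contiguous array is spread evenly over both nodes regardless of which processor allocated it.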