CPU2006 Result Flag Description

This result has been formatted using multiple flags files. The "default header section" from each of them appears next.

Default header section from x86-open64-4.2.2-flags-revA

x86 Open64 Compiler Suite SPEC CPU2006 Flag Description

Compilers: x86 Open64 Compiler Suite

Default header section from amd-platform

Platform settings file

Default header section from pgi80_linux_flags

PGI Server Complete for Linux, Release 8.0. Optimization, Compiler, and Other flags for use by SPEC CPU2006

Base Optimization Flags

C benchmarks

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order"
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor, 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.
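
The nested "Includes" list under -fastsse can be read as a tree in which each flag implies the flags indented beneath it. The following is a minimal sketch of that reading, not PGI tooling: the dictionary is transcribed by hand from the list above, and the names `FASTSSE_INCLUDES` and `flatten` are hypothetical.

```python
# "Includes" tree for -fastsse, transcribed from the list above.
# Each key is a flag; its value holds the flags that it includes in turn.
FASTSSE_INCLUDES = {
    "-O2": {"-O1": {}},
    "-Munroll=c:1": {"-Munroll": {}},
    "-Mautoinline": {},
    "-Msmart": {},
    "-Mlre": {},
    "-Mnoframe": {},
    "-Mvect=sse": {"-Mvect": {"-Mvect=assoc": {}, "-Mvect=altcode": {}}},
    "-Mcache_align": {},
    "-Mflushz": {},
    "-Mdaz": {},
    "-Mscalarsse": {},
}


def flatten(tree):
    """Return every flag in the tree, each parent before its children."""
    flags = []
    for flag, children in tree.items():
        flags.append(flag)
        flags.extend(flatten(children))
    return flags


implied = flatten(FASTSSE_INCLUDES)
print(len(implied), "flags are implied by -fastsse:")
print(" ".join(implied))
```

Flattening the tree gives the full set of flags that a single -fastsse stands for in this report.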

C++ benchmarks

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order"
- --zc_eh
- pgcpp_l, pgcpp_w
- CXXOPTIMIZE
- Generate zero-overhead C++ exception handlers.
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor, 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.
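
The -Mfprelaxed description trades numerical accuracy for performance, and its default settings include an "order" sub-option. The text above does not define that sub-option, so the sketch below rests on an assumption: that relaxed modes may regroup floating-point operations. It shows only the general effect of regrouping on a sum and does not reproduce what the PGI compiler emits. The function names and input values are hypothetical.

```python
def sum_in_source_order(values):
    """Add the values strictly left to right, as the source code is written."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_two_lanes(values):
    """Add even- and odd-indexed values separately, then combine the lanes.

    This is the kind of regrouping a 2-wide vectorized sum would perform.
    """
    lanes = [0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 2] += v
    return lanes[0] + lanes[1]


values = [1e16, 1.0, -1e16, 1.0]
print("source order:", sum_in_source_order(values))
print("two lanes:   ", sum_in_two_lanes(values))
```

The two functions add the same four numbers but round at different points, so their results differ. This is why flags that permit reordering are described as affecting accuracy.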

Fortran benchmarks

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order"
- -Mvect=short
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Enables generation of packed SSE instructions for short vector operations that arise from scalar code outside of loops or within the body of a loop iteration.
- Includes:
  - -Mvect
    - -Mvect=assoc
    - -Mvect=altcode
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor, 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.

Benchmarks using both Fortran and C

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order"
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor, 64-bit mode.
- -Mvect=short
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Enables generation of packed SSE instructions for short vector operations that arise from scalar code outside of loops or within the body of a loop iteration.
- Includes:
  - -Mvect
    - -Mvect=assoc
    - -Mvect=altcode
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.

Peak Optimization Flags

C benchmarks

470.lbm

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mprefetch=t0
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Use the prefetcht0 instruction.
- Includes:
  - -Mprefetch
- -Mloop32
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Aligns (or, with -Mnoloop32, does not align) innermost loops on 32-byte boundaries with -tp barcelona. Small loops on Barcelona systems may run faster if aligned on 32-byte boundaries; in practice, however, most assemblers do not yet implement efficient padding, which causes some programs to run more slowly with this as the default. Use -Mloop32 on systems with an assembler tuned for Barcelona. The default is -Mnoloop32.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order"
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor, 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.
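
The -Mloop32 entry mentions the padding needed to start a loop on a 32-byte boundary. The arithmetic behind that padding can be sketched as follows. This is an illustration only, the addresses are made up, and the helper name `padding_to_boundary` is hypothetical.

```python
ALIGNMENT = 32  # bytes, the boundary named in the -Mloop32 description


def padding_to_boundary(address, alignment=ALIGNMENT):
    """Return how many filler bytes move `address` up to the next boundary."""
    return -address % alignment


for address in (0x1000, 0x1001, 0x101F, 100):
    pad = padding_to_boundary(address)
    print(f"loop at {address:#06x}: pad {pad:2d} bytes -> {address + pad:#06x}")
```

A loop that already starts on a boundary needs no padding; one that starts a single byte past a boundary needs 31 bytes. An assembler fills this gap with no-op instructions, and the flag description attributes the possible slowdown to inefficient padding.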

482.sphinx3

- -Mpfi=indirect
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS1_CFLAGS, PASS1_LDFLAGS
- Generate profile-feedback instrumentation (PFI); this includes extra code to collect run-time statistics and dump them to a trace file for use in a subsequent compilation. PFI gathers information about a program's execution and data values but does not gather information from hardware performance counters. PFI does gather data for optimizations which are unique to profile-feedback optimization.
  
  The indirect sub-option enables collection of indirect function call targets, which can be used for indirect function call inlining.
- -Mpfo=indirect
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CFLAGS, PASS2_LDFLAGS
- Enable profile-feedback optimizations including indirect function call inlining. This option requires a pgfi.out file generated from a binary built with -Mpfi=indirect.
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CFLAGS, PASS2_LDFLAGS
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CFLAGS, PASS2_LDFLAGS
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order"
- -Msmartalloc
- pgcc_l, pgcpp_l, pgf95_l
- COPTIMIZE
- Adds a call to the routine "mallopt" in the main routine. This option can have a dramatic impact on the performance of programs that dynamically allocate memory, especially for those which have a few large mallocs. To be effective, this switch must be specified when compiling the file containing the Fortran, C, or C++ main routine.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor, 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.
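
The PASS1 and PASS2 entries for 482.sphinx3 describe a two-pass build: compile with instrumentation, run a training workload that writes pgfi.out, then recompile using that profile. The sketch below only assembles the two command lines as lists; it does not invoke a compiler. Several details are assumptions rather than facts from this report: the driver name `pgcc`, the source file `main.c`, the output name `app`, and the order of flags on the line.

```python
# Flags listed under COPTIMIZE above; they are common to both passes.
COMMON_FLAGS = ["-fastsse", "-Mfprelaxed", "-Msmartalloc", "-tp", "shanghai-64"]
# Flags listed under PASS1_CFLAGS / PASS1_LDFLAGS.
PASS1_FLAGS = ["-Mpfi=indirect"]
# Flags listed under PASS2_CFLAGS / PASS2_LDFLAGS.
PASS2_FLAGS = ["-Mpfo=indirect", "-Mipa=fast", "-Mipa=inline"]


def build_command(pass_flags, sources, output, compiler="pgcc"):
    """Assemble one compile-and-link command line as an argument list."""
    return [compiler, *COMMON_FLAGS, *pass_flags, "-o", output, *sources]


sources = ["main.c"]  # hypothetical source file
pass1 = build_command(PASS1_FLAGS, sources, "app")
pass2 = build_command(PASS2_FLAGS, sources, "app")

print("pass 1:   ", " ".join(pass1))
print("training:  ./app   (instrumented run, expected to write pgfi.out)")
print("pass 2:   ", " ".join(pass2))
```

Keeping the common flags in one list makes it harder for the two passes to drift apart; only the instrumentation and feedback flags differ.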

C++ benchmarks

444.namd

- -Mpfi
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS1_CXXFLAGS, PASS1_LDFLAGS
- Generate profile-feedback instrumentation (PFI); this includes extra code to collect run-time statistics and dump them to a trace file for use in a subsequent compilation. PFI gathers information about a program's execution and data values but does not gather information from hardware performance counters. PFI does gather data for optimizations which are unique to profile-feedback optimization.
- -Mpfo
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- Enable profile-feedback optimizations.
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Munroll=n:4
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- "-Munroll=n:n" instructs the compiler to unroll loops 4 times where 4 is a supplied constant value. If no constant value is given, then a default of 4 is used.
- Includes:
  - -Munroll
- -Munroll=m:8
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- "-Munroll=m:n" instructs the compiler to unroll loops with multiple blocks 8 times where 8 is a supplied constant value. If no constant value is given, then a default of 4 is used.
- Includes:
  - -Munroll
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mnodepchk
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Don't check dependence relations for vector or parallel code.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order"
- --zc_eh
- pgcpp_l, pgcpp_w
- CXXOPTIMIZE
- Generate zero-overhead C++ exception handlers.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- CXXOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor, 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.
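
The -Munroll=n:4 entry asks the compiler to unroll loops 4 times. The sketch below shows by hand what unrolling by 4 means; it is not compiler output, and the function names are hypothetical. The body is replicated four times per trip, and a short remainder loop handles leftover iterations.

```python
def dot_rolled(a, b):
    """Reference loop: one multiply-add per iteration."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled_by_4(a, b):
    """Same computation with the loop body replicated 4 times per trip."""
    n = len(a)
    main_end = n - n % 4  # largest multiple of 4 that does not exceed n
    total = 0.0
    i = 0
    while i < main_end:
        total += a[i] * b[i]
        total += a[i + 1] * b[i + 1]
        total += a[i + 2] * b[i + 2]
        total += a[i + 3] * b[i + 3]
        i += 4
    while i < n:  # remainder loop for the last n % 4 elements
        total += a[i] * b[i]
        i += 1
    return total


a = [float(k) for k in range(1, 11)]  # 10 elements: 2 unrolled trips + 2 leftover
b = [2.0] * 10
print(dot_rolled(a, b), dot_unrolled_by_4(a, b))
```

Because the additions happen in the same order, both versions return the same value. The unrolled loop simply executes fewer branch and counter updates per element.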

447.dealII

- -march=barcelona
- CXX, LD
- Compiler will generate instructions and schedule them appropriately for the selected processor type. The default value, auto, means to optimize for the platform on which the compiler is running, as determined by reading /proc/cpuinfo. anyx86 means a generic 32-bit x86 processor without SSE2 support.
- -Ofast
- CXXOPTIMIZE
- Uses a selection of optimizations in order to maximize performance.
  Specifying "-Ofast" is equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.
  These optimization options are generally safe. Floating-point accuracy may be affected due to the transformation of the computational code. Note the interprocedural analysis option, -ipa, specifies limitations on how libraries and object files (.o files) are built.
- Includes:
  - -O3
  - -ipa
  - -OPT:Ofast
  - -fno-math-errno
  - -ffast-math
- -static
- CXXOPTIMIZE
- -static
  On systems that support dynamic linking, this prevents linking with shared libraries. On other systems, this option has no effect.
- -INLINE:aggressive=on
- CXXOPTIMIZE
- -INLINE:aggressive=(on|off|0|1): Instructs the compiler to be very aggressive when performing inlining. The default is "-INLINE:aggressive=OFF".
- -LNO:opt=0
- CXXOPTIMIZE
- This option group commands the compiler loop nest optimizer to perform nested loop analysis and transformations. Note an optimization level of "-O3" or higher must be specified in order to enable the "-LNO:" options. To verify the LNO options that were invoked during compilation use the option "-LIST:all_options=ON".
  
  -LNO:opt=(0|1) : Instructs the compiler at which level to perform loop nest optimizations. The flag can be set to:
  0 The compiler is restricted to suppress nearly all loop nest optimizations.
  1 The compiler performs full loop nest optimizations.
  The default is "-LNO:opt=1"
- -Wf,-fno-exceptions
- CXXOPTIMIZE
- (For C++ only) -fexceptions enables exception handling and thus generates extra code needed to propagate exceptions. -fno-exceptions disables exception handling. Exception handling is enabled by default.
- -m32
- CXXOPTIMIZE
- Generate code for a 32-bit environment. The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The compiler generates x86 or IA32 32-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -OPT:unroll_times_max=8
- CXXOPTIMIZE
- -OPT:unroll_times_max=N
  Instructs the compiler to limit the unrolling of inner loops to the value specified by N. The default is "-OPT:unroll_times_max=4".
- -OPT:unroll_size=256
- CXXOPTIMIZE
- -OPT:unroll_size=N
  Instructs the compiler to limit the number of instructions produced when unrolling inner loops. When N=0 the ceiling is disregarded. Note by specifying "-O3" sets "-OPT:unroll_size=128". The default is "-OPT:unroll_size=40".
- -OPT:unroll_level=2
- CXXOPTIMIZE
- -OPT:unroll_level=(1|2)
  Controls the level at which the compiler will perform unrolling optimizations. When "-OPT:unroll_level=2" the compiler is instructed to aggressively unroll loops in the presence of control flow. The default is "-OPT:unroll_level=1".
- -HP:bdt=2m:heap=2m
- CXXOPTIMIZE
- Instructs the compiler to use 2MB hugepages for bss, data and text segments (i.e. bdt), and/or for heap allocation. Mixed usage of huge and small pages is not supported for bdt, but is supported for heap allocation. The limit option specifies a combined limit on the number of hugepages that may be used by the compiled program. If no limit is set, the number of hugepages that can be used by the program is effectively limited by the system configuration.
- -GRA:unspill=on
- CXXOPTIMIZE
- -GRA:unspill=(on|off|0|1)
  The compiler is instructed to mitigate existing and suboptimal boundary conditions between global register allocation and local register allocation by unspilling register candidates which were really available at those boundary conditions. The default is "-GRA:unspill=OFF".
- -CG:cmp_peep=on
- CXXOPTIMIZE
- -CG:cmp_peep=(on|off|0|1): Instructs the compiler to perform aggressive load execution peeps on compare operations. The default is "-CG:cmp_peep=OFF", except in 32-bit environments, where the default is "-CG:cmp_peep=ON".
- -TENV:frame_pointer=off
- CXXOPTIMIZE
- These -TENV: options control the target environment assumed and/or produced by the compiler.
  
  -TENV:frame_pointer=(on|off)
  Setting this option to ON tells the compiler to use the frame pointer register to address local variables in the function stack frame. Generally, if the compiler determines that the stack pointer is fixed it will use the stack pointer to address local variables throughout the function invocation in place of the frame pointer. This frees up the frame pointer for other purposes. The default is ON for C/C++ and OFF for Fortran. This flag defaults to ON for C/C++ because the exception handling mechanism relies on the frame pointer register being used to address local variables. This flag can be turned OFF for C/C++ programs that do not generate exceptions.
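
The -HP:bdt=2m:heap=2m entry refers to 2MB hugepages and to a limit on how many a program may use. A rough way to estimate the page count for a region is ceiling division by the page size, sketched below. The segment sizes are invented for illustration, and rounding each segment up separately is an assumption, not something this report states.

```python
HUGEPAGE_BYTES = 2 * 1024 * 1024  # 2MB, the page size named by "2m"


def hugepages_needed(region_bytes):
    """Return the number of 2MB pages required to cover `region_bytes`."""
    return -(-region_bytes // HUGEPAGE_BYTES)  # ceiling division


# Hypothetical segment sizes in bytes, for illustration only.
segments = {"bss": 5_000_000, "data": 300_000, "text": 9_000_000}

total_pages = 0
for name, size in segments.items():
    pages = hugepages_needed(size)
    total_pages += pages
    print(f"{name:4s}: {size:>9d} bytes -> {pages} hugepage(s)")
print("total:", total_pages, "hugepages")
```

Comparing such an estimate with the number of hugepages configured on the system indicates whether the program is likely to fit within the available pool.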

450.soplex

- -march=barcelona
- CXX, LD
- Compiler will generate instructions and schedule them appropriately for the selected processor type. The default value, auto, means to optimize for the platform on which the compiler is running, as determined by reading /proc/cpuinfo. anyx86 means a generic 32-bit x86 processor without SSE2 support.
- -fb_create fbdata
- PASS1_CXXFLAGS, PASS1_LDFLAGS
- -fb_create <path>
  Instructs the compiler to generate an instrumented executable program from the source code under development. The instrumented executable produces feedback data files at runtime using an example dataset. <path> specifies the name of the feedback data file generated by the instrumented executable.
  opencc -O2 -ipa -fb_create fbdata -o foo foo.c
  "fbdata" will contain the instrumented feedback data from the instrumented executable "foo". By default, "-fb_create" is disabled.
  
  When feedback directed optimization (FDO) is used, an instrumented executable is used for a training run that produces information regarding execution paths and data values. Hardware performance counters are not used. This information is then provided to the optimizer during a second compilation pass to produce an optimized executable. FDO enables the optimizer to perform some optimizations which are not available without feedback data. The safety level of optimizations is unchanged with FDO, i.e. the safety level is the same as determined by the other (non-FDO) optimization flags specified on the compile and link lines.
- -fb_opt fbdata
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- -fb_opt <prefix for feedback data files>
  Instructs the compiler to perform a feedback directed compilation using the instrumented feedback data produced by the -fb_create option.
  opencc -O2 -ipa -fb_opt fbdata -o foo foo.c
  The new executable, foo, will be optimized to execute faster, and will not include any instrumentation library calls. Note that the same optimization flags specified when creating the instrumented data file with -fb_create must be specified when invoking the compiler with the -fb_opt option. Otherwise, the compiler will emit checksum errors. By default, "-fb_opt" is disabled.
- -O3
- CXXOPTIMIZE
- Specify the basic level of optimization desired.
  The options can be one of the following:
  
  0    Instructs the compiler to not optimize.
  
  1    With -O1, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
  
  2    Optimize even more. This is the default.
  The compiler performs nearly all supported optimizations that do not involve a space-speed tradeoff. The compiler does not perform loop unrolling or function inlining when you specify -O2. As compared to -O1, this option increases both compilation time and the performance of the generated code.
  
  3    Optimize yet more.
  -O3 turns on all optimizations specified by -O2 and proceeds to take a more aggressive approach. The compiler attempts to generate high-quality code even at the expense of compile time. This level of optimization may enable options that are generally beneficial but that can, in some cases, cause performance degradations.
  
  s    Optimize for size.
  -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
  
  If multiple "O" options are used, with or without level numbers, the last such option is the one that is effective. Level 2 is assumed if no value is specified (i.e. "-O"). The default is "-O2".
- -INLINE:aggressive=on
- CXXOPTIMIZE
- -INLINE:aggressive=(on|off|0|1): Instructs the compiler to be very aggressive when performing inlining. The default is "-INLINE:aggressive=OFF".
- -OPT:IEEE_arith=3
- CXXOPTIMIZE
- -OPT:IEEE_arithmetic,IEEE_arith,IEEE_a=(1|2|3)
  This flag regulates the level of conformance to ANSI/IEEE 754-1985 floating point roundoff and overflow. The levels of conformance:
  1 Adhere to IEEE 754 accuracy. Specifying "-O0", "-O1", and "-O2" will set "-OPT:IEEE_arithmetic=1".
  2 Inexact results that do not conform to IEEE 754 may be produced. Specifying "-O3" will set "-OPT:IEEE_arithmetic=2".
  3 All valid mathematical transformations are allowed.
- -OPT:IEEE_NaN_Inf=off
- CXXOPTIMIZE
- -OPT:IEEE_NaN_Inf=(on|off|0|1)
  -OPT:IEEE_NaN_Inf=ON instructs the compiler to conform to ANSI/IEEE 754-1985 for all operations which produce a NaN or infinity result. Note that NaN and infinity are typically handled as special cases in floating-point representations of real numbers and are defined by the IEEE 754 Standard for Binary Floating-point Arithmetic. The default is "-OPT:IEEE_NaN_Inf=ON".
  -OPT:IEEE_NaN_Inf=OFF allows the compiler to evaluate various operations in ways that do not produce IEEE 754 results. For example, x/x is set to the value 1 without performing a divide operation and x=x is set to TRUE without executing a test operation. "-OPT:IEEE_NaN_Inf=OFF" specifies multiple optimizations that increase performance.
- -OPT:fold_unsigned_relops=on
- CXXOPTIMIZE
- -OPT:fold_unsigned_relops=(on|off|0|1)
  Instructs the compiler to fold unsigned relational operators that will transform possible integer overflow. The default is "-OPT:fold_unsigned_relops=ON".
- -OPT:malloc_alg=1
- CXXOPTIMIZE
- -OPT:malloc_algorithm,malloc_alg=(0|1)
  To improve runtime speed the compiler will select an optimal malloc algorithm. To enable the selected algorithm, setup code is included in the C/C++ and Fortran main function.
  - n=0: Default, no changes to the malloc options. No call to mallopt() is made.
  - n=1: M_MMAP_MAX=2 and M_TRIM_THRESHOLD=0x10000000. Call mallopt with the two settings.
  The two parameters, M_MMAP_MAX and M_TRIM_THRESHOLD, are described below.
  
  Function: int mallopt (int param, int value) When calling mallopt, the param argument specifies the parameter to be set, and value the new value to be set. Possible choices for param, as defined in malloc.h, are:
  - M_MMAP_MAX The maximum number of chunks to allocate with mmap. Setting this to zero disables all use of mmap.
  - M_TRIM_THRESHOLD This is the minimum size (in bytes) of the top-most, releasable chunk that will cause sbrk to be called with a negative argument in order to return memory to the system.
- -CG:load_exe=0
- CXXOPTIMIZE
- -CG:load_exe=N : The parameter N must be a non-negative integer which specifies the threshold for subsuming a memory load operation into the operand of an arithmetic instruction. If N=0, this subsumption optimization is not performed (it is turned off). If the number of times the result of the memory load is used exceeds the value of N, the subsumption optimization is not performed. For example, if N=1, the subsumption is performed only when the result of the memory load has only one use. The default value of N varies with target processor and source language.
- -fno-exceptions
- CXXOPTIMIZE
- (For C++ only) -fexceptions enables exception handling and thus generates extra code needed to propagate exceptions. -fno-exceptions disables exception handling. Exception handling is enabled by default.
- -m32
- CXXOPTIMIZE
- Generate code for a 32-bit environment. The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The compiler generates x86 or IA32 32-bit ABI. The default on a 32-bit host is 32-bit ABI. The default on a 64-bit host is 64-bit ABI if the target platform specified is 64-bit, otherwise the default is 32-bit.
- -HP:bdt=2m
- CXXOPTIMIZE
- Instructs the compiler to use 2MB hugepages for bss, data and text segments (i.e. bdt), and/or for heap allocation. Mixed usage of huge and small pages is not supported for bdt, but is supported for heap allocation. The limit option specifies a combined limit on the number of hugepages that may be used by the compiled program. If no limit is set, the number of hugepages that can be used by the program is effectively limited by the system configuration.
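
The mallopt settings listed earlier in this section for n=0 and n=1 can be sketched in Python as follows. This is an illustration only, not the setup code the compiler emits: the helper names settings_for and apply_settings are hypothetical, the numeric parameter codes are the ones found in glibc's malloc.h (an assumption, since the text names the parameters but not their values), and apply_settings assumes a Linux system whose C library exports mallopt.

```python
import ctypes

# Parameter codes as found in glibc's <malloc.h>. This is an assumption:
# the text only names the parameters, not their numeric values.
M_TRIM_THRESHOLD = -1
M_MMAP_MAX = -4


def settings_for(n):
    """Return the (param, value) pairs that would be passed to mallopt for a given n."""
    if n == 0:
        return []  # default: no call to mallopt() is made
    if n == 1:
        return [(M_MMAP_MAX, 2), (M_TRIM_THRESHOLD, 0x10000000)]
    raise ValueError("only n=0 and n=1 are described in the text")


def apply_settings(n):
    """Call mallopt() once per setting (Linux with glibc assumed)."""
    libc = ctypes.CDLL(None)  # handle to the C library already loaded in this process
    libc.mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    libc.mallopt.restype = ctypes.c_int
    # mallopt() returns 1 on success and 0 on error
    return [libc.mallopt(param, value) for param, value in settings_for(n)]
```

Calling apply_settings(1) changes the allocator settings of the running process, so it is best tried in a throwaway interpreter session.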

453.povray

- -march=barcelona
- CXX, LD
- Compiler will generate instructions and schedule them appropriately for the selected processor type. The default value, auto, means to optimize for the platform on which the compiler is running, as determined by reading /proc/cpuinfo. anyx86 means a generic 32-bit x86 processor without SSE2 support.
- -fb_create fbdata
- PASS1_CXXFLAGS, PASS1_LDFLAGS
- -fb_create <path>
  Instructs the compiler to generate an instrumented executable program from the source code under development. The instrumented executable produces feedback data files at runtime using an example dataset. <path> specifies the name of the feedback data file generated by the instrumented executable.
  opencc -O2 -ipa -fb_create fbdata -o foo foo.c
  "fbdata" will contain the instrumented feedback data from the instrumented executable "foo". By default, "-fb_create" is disabled.
  
  When feedback directed optimization (FDO) is used, an instrumented executable is used for a training run that produces information regarding execution paths and data values. Hardware performance counters are not used. This information is then provided to the optimizer during a second compilation pass to produce an optimized executable. FDO enables the optimizer to perform some optimizations which are not available without feedback data. The safety level of optimizations is unchanged with FDO, i.e. the safety level is the same as determined by the other (non-FDO) optimization flags specified on the compile and link lines.
- -fb_opt fbdata
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- -fb_opt <prefix for feedback data files>
  Instructs the compiler to perform a feedback directed compilation using the instrumented feedback data produced by the -fb_create option.
  opencc -O2 -ipa -fb_opt fbdata -o foo foo.c
  The new executable, foo, will be optimized to execute faster, and will not include any instrumentation library calls. Note that the same optimization flags specified when creating the instrumented data file with -fb_create must also be specified when invoking the compiler with -fb_opt. Otherwise, the compiler will emit checksum errors. By default, "-fb_opt" is disabled.
- -Ofast
- CXXOPTIMIZE
- Uses a selection of optimizations in order to maximize performance.
  Specifying "-Ofast" is equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.
  These optimization options are generally safe. Floating-point accuracy may be affected by transformations of the computational code. Note that the interprocedural analysis option, -ipa, imposes limitations on how libraries and object files (.o files) are built.
- Includes:
  - -O3
  - -ipa
  - -OPT:Ofast
  - -fno-math-errno
  - -ffast-math
- -INLINE:aggressive=on
- CXXOPTIMIZE
- -INLINE:aggressive=(on|off|0|1): Instructs the compiler to be very aggressive when performing inlining. The default is "-INLINE:aggressive=OFF".
- -HP:bdt=2m:heap=2m
- CXXOPTIMIZE
- Instructs the compiler to use 2MB hugepages for bss, data and text segments (i.e. bdt), and/or for heap allocation. Mixed usage of huge and small pages is not supported for bdt, but is supported for heap allocation. The limit option specifies a combined limit on the number of hugepages that may be used by the compiled program. If no limit is set, the number of hugepages that can be used by the program is effectively limited by the system configuration.
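
The two-pass FDO flow described above for -fb_create and -fb_opt can be sketched as a small Python helper that builds both command lines from one shared flag list, so the optimization flags cannot drift between passes. The helper name fdo_commands and its parameters are hypothetical; the flags and file names are taken from the examples above.

```python
def fdo_commands(common_flags, source, output, feedback="fbdata", compiler="opencc"):
    """Return (pass 1, pass 2) argument lists for a feedback-directed build."""
    base = [compiler] + list(common_flags)
    # Pass 1: build the instrumented executable that writes feedback data.
    pass1 = base + ["-fb_create", feedback, "-o", output, source]
    # Pass 2: rebuild with the same optimization flags, reading the feedback data.
    pass2 = base + ["-fb_opt", feedback, "-o", output, source]
    return pass1, pass2


pass1, pass2 = fdo_commands(["-O2", "-ipa"], "foo.c", "foo")
print(" ".join(pass1))
print(" ".join(pass2))
```

Between the two commands, the instrumented executable has to be run on a training dataset so that the feedback data file exists before the second pass.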

Fortran benchmarks

410.bwaves

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      
      -Mvect=assoc
      
      -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Msmartalloc
- pgcc_l, pgcpp_l, pgf95_l
- FOPTIMIZE
- Adds a call to the routine "mallopt" in the main routine. This option can have a dramatic impact on the performance of programs that dynamically allocate memory, especially for those which have a few large mallocs. To be effective, this switch must be specified when compiling the file containing the Fortran, C, or C++ main routine.
- -Mprefetch=nta
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Use the prefetchnta instruction.
- Includes:
  - -Mprefetch
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. This can improve performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order".
- Includes:
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- Includes:
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.

416.gamess

- -march=barcelona
- FC, LD
- Compiler will generate instructions and schedule them appropriately for the selected processor type. The default value, auto, means to optimize for the platform on which the compiler is running, as determined by reading /proc/cpuinfo. anyx86 means a generic 32-bit x86 processor without SSE2 support.
- -fb_create fbdata
- PASS1_FFLAGS, PASS1_LDFLAGS
- -fb_create <path>
  Instructs the compiler to generate an instrumented executable program from the source code under development. The instrumented executable produces feedback data files at runtime using an example dataset. <path> specifies the name of the feedback data file generated by the instrumented executable.
  opencc -O2 -ipa -fb_create fbdata -o foo foo.c
  "fbdata" will contain the instrumented feedback data from the instrumented executable "foo". By default, "-fb_create" is disabled.
  
  When feedback directed optimization (FDO) is used, an instrumented executable is used for a training run that produces information regarding execution paths and data values. Hardware performance counters are not used. This information is then provided to the optimizer during a second compilation pass to produce an optimized executable. FDO enables the optimizer to perform some optimizations which are not available without feedback data. The safety level of optimizations is unchanged with FDO, i.e. the safety level is the same as determined by the other (non-FDO) optimization flags specified on the compile and link lines.
- -fb_opt fbdata
- PASS2_FFLAGS, PASS2_LDFLAGS
- -fb_opt <prefix for feedback data files>
  Instructs the compiler to perform a feedback directed compilation using the instrumented feedback data produced by the -fb_create option.
  opencc -O2 -ipa -fb_opt fbdata -o foo foo.c
  The new executable, foo, will be optimized to execute faster, and will not include any instrumentation library calls. Note that the same optimization flags specified when creating the instrumented data file with -fb_create must also be specified when invoking the compiler with -fb_opt. Otherwise, the compiler will emit checksum errors. By default, "-fb_opt" is disabled.
- -O2
- FOPTIMIZE
- Specify the basic level of optimization desired.
  The options can be one of the following:
  
  0    Instructs the compiler to not optimize.
  
  1    With -O1, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
  
  2    Optimize even more. This is the default.
  The compiler performs nearly all supported optimizations that do not involve a space-speed tradeoff. The compiler does not perform loop unrolling or function inlining when you specify -O2. As compared to -O1, this option increases both compilation time and the performance of the generated code.
  
  3    Optimize yet more.
  -O3 turns on all optimizations specified by -O2 and proceeds to take a more aggressive approach. The compiler attempts to generate high-quality code even at the expense of compile time. This level of optimization may enable options that are generally beneficial but can occasionally degrade performance.
  
  s    Optimize for size.
  -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
  
  If multiple "O" options are used, with or without level numbers, the last such option is the one that is effective. Level 2 is assumed if no value is specified (i.e. "-O"). The default is "-O2".
- -OPT:Ofast
- FOPTIMIZE
- -OPT:Ofast
  Maximizes performance for a given platform using the selected optimizations. "-OPT:Ofast" specifies four optimizations: "-OPT:ro=2", "-OPT:Olimit=0", "-OPT:div_split=ON", and "-OPT:alias=typed". Note that the specified optimizations are ordinarily safe, but the transformations may reduce floating-point accuracy.
- Includes:
- -OPT:ro=3
- FOPTIMIZE
- -OPT:roundoff,ro=(0|1|2|3)
  "-OPT:roundoff" specifies acceptable levels of divergence for both accuracy and overflow/underflow behavior of floating-point results relative to the source language rules. The roundoff value is in the range 0-3 with each value described as follows:
  0 Do no transformations which could affect floating-point results. The default for optimization levels "-O0", "-O1", and "-O2".
  1 Allow all transformations which have a limited effect on floating-point results. For roundoff, limited means that only the last bit or two of the mantissa is affected. For overflow or underflow, limited means that intermediate results of the transformed calculation may overflow or underflow within a factor of two of where the original expression may have overflowed or underflowed. Note that effects may be less limited when compounded by multiple transformations. This is the default when "-O3" is specified.
  2 Specifies transformations with extensive effects on floating-point results. For example, allow associative rearrangement (i.e. even across loop iterations) and the distribution of multiplication over addition or subtraction. Do not specify transformations known to cause: a. cumulative roundoff errors, or b. overflow/underflow of operands in a large range of valid floating-point values. This is the default when specifying "-OPT:Ofast".
  3 Specify any mathematically valid transformation of floating-point expressions. For example, floating point induction variables in loops are permitted (even if known to cause cumulative roundoff errors). Also permitted are fast algorithms for complex absolute value and divide (which will overflow/underflow for operands beyond the square root of the representable extremes).
- -OPT:unroll_size=256
- FOPTIMIZE
- -OPT:unroll_size=N
  Instructs the compiler to limit the number of instructions produced when unrolling inner loops. When N=0 the ceiling is disregarded. Note that specifying "-O3" sets "-OPT:unroll_size=128". The default is "-OPT:unroll_size=40".
- -HP:bdt=2m:heap=2m
- FOPTIMIZE
- Instructs the compiler to use 2MB hugepages for bss, data and text segments (i.e. bdt), and/or for heap allocation. Mixed usage of huge and small pages is not supported for bdt, but is supported for heap allocation. The limit option specifies a combined limit on the number of hugepages that may be used by the compiled program. If no limit is set, the number of hugepages that can be used by the program is effectively limited by the system configuration.
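
The -OPT:roundoff levels above can be made concrete with two small Python experiments on IEEE double-precision values. The first applies an associative rearrangement of the kind that level 2 allows; the second compares a multiplied loop index with a floating-point induction variable, which level 3 permits. The function names are hypothetical, and the code only mimics by hand what such a transformation might do; it does not show what the compiler actually generates.

```python
def reassociation_demo():
    """Evaluate one expression in source order and in a rearranged order."""
    big, small = 1e16, 1.0
    source_order = (big + small) - big   # small is lost when added to big
    reassociated = (big - big) + small   # mathematically identical rearrangement
    return source_order, reassociated


def induction_demo(steps=10, stride=0.1):
    """Compare i * stride with a running sum that adds stride each iteration."""
    multiplied = steps * stride          # one rounding
    accumulated = 0.0
    for _ in range(steps):
        accumulated += stride            # one rounding per iteration
    return multiplied, accumulated


print(reassociation_demo())  # prints (0.0, 1.0)
print(induction_demo())
```

Both pairs are equal in exact arithmetic, which is why the text describes such transformations as mathematically valid while warning about their effect on floating-point results.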

434.zeusmp

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      
      -Mvect=assoc
      
      -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. This can improve performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order".
- Includes:
- -Mprefetch=distance:8
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Set the fetch-ahead distance for prefetch instructions to 8 cache lines.
- Includes:
  - -Mprefetch
- -Mprefetch=t0
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Use the prefetcht0 instruction.
- Includes:
  - -Mprefetch
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Msmartalloc=hugebss
- pgcc_l, pgcpp_l, pgf95_l
- FOPTIMIZE
- Link with the huge page runtime library. Use huge pages for an executable's .BSS section.
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- Includes:
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.

437.leslie3d

- -Mpfi=indirect
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS1_FFLAGS, PASS1_LDFLAGS
- Generate profile-feedback instrumentation (PFI); this includes extra code to collect run-time statistics and dump them to a trace file for use in a subsequent compilation. PFI gathers information about a program's execution and data values but does not gather information from hardware performance counters. PFI does gather data for optimizations which are unique to profile-feedback optimization.
  
  The indirect sub-option enables collection of indirect function call targets, which can be used for indirect function call inlining.
- -Mpfo=indirect
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_FFLAGS, PASS2_LDFLAGS
- Enable profile-feedback optimizations including indirect function call inlining. This option requires a pgfi.out file generated from a binary built with -Mpfi=indirect.
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_FFLAGS, PASS2_LDFLAGS
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- Includes:
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_FFLAGS, PASS2_LDFLAGS
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      
      -Mvect=assoc
      
      -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Mvect=fuse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the vectorizer to enable loop fusion.
- Includes:
  - -Mvect
    - -Mvect=assoc
    - -Mvect=altcode
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mprefetch=distance:8
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Set the fetch-ahead distance for prefetch instructions to 8 cache lines.
- Includes:
  - -Mprefetch
- -Mprefetch=t0
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Use the prefetcht0 instruction.
- Includes:
  - -Mprefetch
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. This can improve performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order".
- Includes:
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- FOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.

459.GemsFDTD

- -march=barcelona
- FC, LD
- Compiler will generate instructions and schedule them appropriately for the selected processor type. The default value, auto, means to optimize for the platform on which the compiler is running, as determined by reading /proc/cpuinfo. anyx86 means a generic 32-bit x86 processor without SSE2 support.
- -Ofast
- FOPTIMIZE
- Uses a selection of optimizations in order to maximize performance.
  Specifying "-Ofast" is equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.
  These optimization options are generally safe. Floating-point accuracy may be affected by transformations of the computational code. Note that the interprocedural analysis option, -ipa, imposes limitations on how libraries and object files (.o files) are built.
- Includes:
  - -O3
  - -ipa
  - -OPT:Ofast
  - -fno-math-errno
  - -ffast-math
- -LNO:fission=2
- FOPTIMIZE
- This option group commands the compiler loop nest optimizer to perform nested loop analysis and transformations. Note an optimization level of "-O3" or higher must be specified in order to enable the "-LNO:" options. To verify the LNO options that were invoked during compilation use the option "-LIST:all_options=ON".
  
  -LNO:fission=N : Instructs the compiler to perform loop fission. This option can be set to:
  0 Suppress loop fission.
  1 The compiler performs normal loop fission as necessary.
  2 The compiler performs loop fission prior to loop fusion.
  Note that loop fusion is usually applied before loop fission; therefore, if both "-LNO:fission=ON" and "-LNO:fusion=ON" are in effect when the compiler is invoked, the two transformations may counteract each other. To counter this effect, specify "-LNO:fission=2" to instruct the compiler to perform loop fission prior to loop fusion. The default is "-LNO:fission=0".
- -LNO:simd=2
- FOPTIMIZE
- This option group commands the compiler loop nest optimizer to perform nested loop analysis and transformations. Note an optimization level of "-O3" or higher must be specified in order to enable the "-LNO:" options. To verify the LNO options that were invoked during compilation use the option "-LIST:all_options=ON".
  
  -LNO:simd=(0|1|2) : The compiler is instructed to use single instruction multiple data (SIMD) instructions, supported by the target processor, when vectorizing the inner loop. The flag can be set to:
  0 The compiler is instructed to suppress vectorization.
  1 The compiler is instructed to vectorize only if doing so causes no performance degradation due to sub-optimal alignment and induces no floating-point inaccuracies.
  2 Instructs the compiler to aggressively vectorize with no constraints in place.
  The default is "-LNO:simd=1".
- -LNO:prefetch_ahead=1
- FOPTIMIZE
- This option group commands the compiler loop nest optimizer to perform nested loop analysis and transformations. Note an optimization level of "-O3" or higher must be specified in order to enable the "-LNO:" options. To verify the LNO options that were invoked during compilation use the option "-LIST:all_options=ON".
  
  -LNO:prefetch_ahead=N : The compiler is instructed to prefetch ahead N cache line(s). The default is "-LNO:prefetch_ahead=2".
- -CG:load_exe=0
- FOPTIMIZE
- -CG:load_exe=N : The parameter N must be a non-negative integer that specifies the threshold for subsuming a memory load operation into the operand of an arithmetic instruction. If N=0, this subsumption optimization is not performed (it is turned off). If the number of times the result of the memory load is used exceeds N, the subsumption optimization is not performed. For example, if N=1, the subsumption is performed only when the result of the memory load has exactly one use. The default value of N varies with target processor and source language.
- -HP
- FOPTIMIZE
- Instructs the compiler to use 2MB hugepages for bss, data and text segments (i.e. bdt), and/or for heap allocation. Mixed usage of huge and small pages is not supported for bdt, but is supported for heap allocation. The limit option specifies a combined limit on the number of hugepages that may be used by the compiled program. If no limit is set, the number of hugepages that can be used by the program is effectively limited by the system configuration.
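
Loop fission and loop fusion, as controlled by -LNO:fission above, are inverse transformations. The following Python sketch writes both forms by hand to show that they compute the same result. The function names are hypothetical and the loop bodies are invented for illustration; the sketch says nothing about when the loop nest optimizer chooses one form over the other.

```python
def fused(a, b):
    """One loop that does two independent pieces of work per iteration."""
    n = len(a)
    doubled = [0.0] * n
    shifted = [0.0] * n
    for i in range(n):
        doubled[i] = 2.0 * a[i]
        shifted[i] = b[i] + 1.0
    return doubled, shifted


def fissioned(a, b):
    """The same work after fission: each statement gets its own loop."""
    n = len(a)
    doubled = [0.0] * n
    shifted = [0.0] * n
    for i in range(n):
        doubled[i] = 2.0 * a[i]
    for i in range(n):
        shifted[i] = b[i] + 1.0
    return doubled, shifted


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(fused(a, b) == fissioned(a, b))  # prints True
```

The transformation is valid here because neither statement reads what the other writes; both inputs are assumed to have the same length.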

465.tonto

- -march=barcelona
- FC, LD
- Compiler will generate instructions and schedule them appropriately for the selected processor type. The default value, auto, means to optimize for the platform on which the compiler is running, as determined by reading /proc/cpuinfo. anyx86 means a generic 32-bit x86 processor without SSE2 support.
- -Ofast
- FOPTIMIZE
- Uses a selection of optimizations in order to maximize performance.
  Specifying "-Ofast" is equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.
  These optimization options are generally safe. Floating-point accuracy may be affected by transformations of the computational code. Note that the interprocedural analysis option, -ipa, imposes limitations on how libraries and object files (.o files) are built.
- Includes:
  - -O3
  - -ipa
  - -OPT:Ofast
  - -fno-math-errno
  - -ffast-math
- -OPT:alias=no_f90_pointer_alias
- FOPTIMIZE
- The "-OPT:" option group controls various optimizations. The "-OPT:" options supersede the defaults that are based on the main optimization level.
  
  -OPT:alias=<model>
  Identify which pointer aliasing model to use. The compiler will make assumptions during compilation when one or more of the following <model> is specified:
  
  typed
  Assumes that two pointers of different types will not point to the same location in memory (i.e. the code adheres to the ANSI/ISO C standards). Note that specifying "-OPT:Ofast" turns this option ON.
  
  restrict
  Assumes that distinct pointers are pointing to distinct non-overlapping objects. The default is that this optimization is disabled.
  
  disjoint
  Assumes that any two pointer expressions are pointing to distinct non-overlapping objects. The default is that this optimization is disabled.
  
  no_f90_pointer_alias
  Assumes that any two different Fortran 90 pointers are pointing to distinct non-overlapping objects. The default is that this optimization is disabled.
- -LNO:blocking=off
- FOPTIMIZE
- This option group commands the compiler loop nest optimizer to perform nested loop analysis and transformations. Note an optimization level of "-O3" or higher must be specified in order to enable the "-LNO:" options. To verify the LNO options that were invoked during compilation use the option "-LIST:all_options=ON".
  
  -LNO:blocking=(on|off|0|1): Instructs the compiler to perform cache blocking transformation. The default is "-LNO:blocking=ON".
- -CG:load_exe=1
- FOPTIMIZE
- -CG:load_exe=N : The parameter N must be a non-negative integer that specifies the threshold for subsuming a memory load operation into the operand of an arithmetic instruction. If N=0, this subsumption optimization is not performed (it is turned off). If the number of times the result of the memory load is used exceeds N, the subsumption optimization is not performed. For example, if N=1, the subsumption is performed only when the result of the memory load has exactly one use. The default value of N varies with target processor and source language.
- -IPA:plimit=525
- FOPTIMIZE
- -IPA:plimit=N : The compiler is instructed to halt inlining within a program once the intermediate representation indicates that the code size of the program has surpassed the limit set by N. The default is "-IPA:plimit=2500".
- -HP
- FOPTIMIZE
- Instructs the compiler to use 2MB hugepages for bss, data and text segments (i.e. bdt), and/or for heap allocation. Mixed usage of huge and small pages is not supported for bdt, but is supported for heap allocation. The limit option specifies a combined limit on the number of hugepages that may be used by the compiled program. If no limit is set, the number of hugepages that can be used by the program is effectively limited by the system configuration.
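
The -CG:load_exe threshold rule described above can be expressed as a small predicate. This Python sketch is only a reading of the rule as stated in the text; the function name subsume_load and the idea of a precomputed use count are assumptions, and the compiler's real heuristic may consider more than this.

```python
def subsume_load(use_count, n):
    """Decide whether a memory load may be folded into an arithmetic operand.

    use_count: how many times the loaded value is used (hypothetical input).
    n:         the value given to -CG:load_exe.
    """
    if n < 0:
        raise ValueError("N must be a non-negative integer")
    if n == 0:
        return False          # optimization turned off
    return use_count <= n     # not performed once the use count exceeds N


for uses in (1, 2, 3):
    print(uses, subsume_load(uses, 0), subsume_load(uses, 1), subsume_load(uses, 2))
```

With N=1, as used for this benchmark, only loads whose result has a single use qualify.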

Benchmarks using both Fortran and C

435.gromacs

- -march=barcelona
- CC, FC, LD
- Compiler will generate instructions and schedule them appropriately for the selected processor type. The default value, auto, means to optimize for the platform on which the compiler is running, as determined by reading /proc/cpuinfo. anyx86 means a generic 32-bit x86 processor without SSE2 support.
- -Ofast
- COPTIMIZE, FOPTIMIZE
- Uses a selection of optimizations in order to maximize performance.
  Specifying "-Ofast" is equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.
  These optimization options are generally safe. Floating-point accuracy may be affected by transformations of the computational code. Note that the interprocedural analysis option, -ipa, imposes limitations on how libraries and object files (.o files) are built.
- Includes:
  - -O3
  - -ipa
  - -OPT:Ofast
  - -fno-math-errno
  - -ffast-math
- -OPT:rsqrt=2
- COPTIMIZE, FOPTIMIZE
- -OPT:rsqrt=(0|1|2)
  Instructs the compiler to use the reciprocal square root instruction when calculating the square root. This transformation may vary the accuracy slightly.
  0 Restrain from using the reciprocal square root instruction.
  1 Use the reciprocal square root instruction followed by operations that will improve the accuracy of the results.
  2 Use the reciprocal square root instruction without improving the result accuracy.
  Note that "-OPT:rsqrt=1" is in effect if "-OPT:roundoff=2" or "-OPT:roundoff=3" is specified. The default is "-OPT:rsqrt=0".
- -HP:bdt=2m:heap=2m
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to use 2MB hugepages for bss, data and text segments (i.e. bdt), and/or for heap allocation. Mixed usage of huge and small pages is not supported for bdt, but is supported for heap allocation. The limit option specifies a combined limit on the number of hugepages that may be used by the compiled program. If no limit is set, the number of hugepages that can be used by the program is effectively limited by the system configuration.
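
The three -OPT:rsqrt levels above can be compared numerically with a Python sketch. Several things here are assumptions rather than facts from this document: the text does not say which operations improve the accuracy at level 1, so a single Newton-Raphson step is used as a plausible stand-in; the hardware estimate is simulated by adding a deliberate 0.1% error to the exact value; and all function names are hypothetical.

```python
import math


def rough_rsqrt(x):
    """Stand-in for a hardware estimate: exact 1/sqrt(x) with a deliberate 0.1% error."""
    return (1.0 / math.sqrt(x)) * 1.001


def refine(x, y):
    """One Newton-Raphson step for y approximating 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


def sqrt_via_rsqrt(x, level):
    """Compute sqrt(x) in the manner suggested by -OPT:rsqrt=level (0, 1 or 2)."""
    if level == 0:
        return math.sqrt(x)      # no reciprocal square root estimate
    y = rough_rsqrt(x)
    if level == 1:
        y = refine(x, y)         # estimate followed by a refinement step
    return x * y                 # sqrt(x) = x * (1/sqrt(x))


exact = math.sqrt(2.0)
for level in (0, 1, 2):
    print(level, abs(sqrt_via_rsqrt(2.0, level) - exact) / exact)
```

In this sketch the refinement step cuts the relative error from about one part in a thousand to roughly one part in a million, which matches the qualitative difference between levels 1 and 2 described above.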

436.cactusADM

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      
      -Mvect=assoc
      
      -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Mconcur
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- Yes
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to enable auto-concurrentization of loops. If -Mconcur is specified, multiple processors will be used to execute loops that the compiler determines to be parallelizable.
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. This can improve performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order".
- Includes:
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- Includes:
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Specify the type of the target processor as AMD64 Shanghai Processor 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.

454.calculix

- -Mpfi=indirect
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS1_CFLAGS, PASS1_FFLAGS, PASS1_LDFLAGS
- Generate profile-feedback instrumentation (PFI); this includes extra code to collect run-time statistics and dump them to a trace file for use in a subsequent compilation. PFI gathers information about a program's execution and data values but does not gather information from hardware performance counters. PFI does gather data for optimizations which are unique to profile-feedback optimization.
  
  The indirect sub-option enables collection of indirect function call targets, which can be used for indirect function call inlining.
- -Mpfo=indirect
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CFLAGS, PASS2_FFLAGS, PASS2_LDFLAGS
- Enable profile-feedback optimizations including indirect function call inlining. This option requires a pgfi.out file generated from a binary built with -Mpfi=indirect.
- -Mipa=fast
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CFLAGS, PASS2_FFLAGS, PASS2_LDFLAGS
- Instructs the compiler to perform interprocedural analysis. Equivalent to -Mipa=align,arg,const,f90ptr,shape,globals,libc,localarg,ptr,pure.
- Includes:
- -Mipa=inline
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- PASS2_CFLAGS, PASS2_FFLAGS, PASS2_LDFLAGS
- Interprocedural Analysis option: Automatically determine which functions to inline, limit to 2 levels (default). IPA-based function inlining is performed from leaf routines upward.
- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Mvect=short
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Enables generation of packed SSE instructions for short vector operations that arise from scalar code outside of loops or within the body of a loop iteration.
- Includes:
  - -Mvect
    - -Mvect=assoc
    - -Mvect=altcode
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mprefetch=t0
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Use the prefetcht0 instruction.
- Includes:
  - -Mprefetch
- -Mpre
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Enable partial redundancy elimination.
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order".
- Includes:
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Specify the target processor type as an AMD64 Shanghai processor in 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.
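
The two-pass sequence behind the -Mpfi and -Mpfo entries above can be sketched as a dry run. The source file app.c, the binary name app, and the training input train.in are placeholders, and the run helper is a hypothetical function that only prints each command, so the sketch works without the PGI compilers installed. In a real SPEC run the harness supplies these flags through the PASS1_* and PASS2_* variables listed above.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print each command instead of executing it.
run() {
    echo "+ $*"
}

# Pass 1: build with profile-feedback instrumentation.
run pgcc -fastsse -Mpfi=indirect -o app app.c

# Training run: the instrumented binary collects statistics into pgfi.out.
run ./app train.in

# Pass 2: rebuild using pgfi.out, with IPA and IPA-based inlining enabled.
run pgcc -fastsse -Mpfo=indirect -Mipa=fast -Mipa=inline -o app app.c
```

Replacing the body of run with "$@" would execute the commands instead of printing them.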

481.wrf

- -fastsse
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Chooses generally optimal flags for the target platform. As of the PGI 7.0 release, the flags "-fast" and "-fastsse" are equivalent for 64-bit compilations. For 32-bit compilations "-fast" does not include "-Mscalarsse", "-Mcache_align", or "-Mvect=sse".
- Includes:
  - -O2
    - -O1
  - -Munroll=c:1
    - -Munroll
  - -Mautoinline
  - -Msmart
  - -Mlre
  - -Mnoframe
  - -Mvect=sse
    - -Mvect
      - -Mvect=assoc
      - -Mvect=altcode
  - -Mcache_align
  - -Mflushz
  - -Mdaz
  - -Mscalarsse
- -Mvect=noaltcode
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Disables alternate code generation for vectorized loops.
- -Msmartalloc=huge
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Link with the huge page runtime library. The maximum number of huge pages the application can use is limited by the number of huge pages the operating system has available or the value of the environment variable PGI_HUGE_PAGES.
- -Mprefetch=distance:8
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Set the fetch-ahead distance for prefetch instructions to 8 cache lines.
- Includes:
  - -Mprefetch
- -Mfprelaxed
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy. The default on an AMD system is "-Mfprelaxed=sqrt,rsqrt,order". The default on an Intel system is "-Mfprelaxed=rsqrt,sqrt,div,order".
- Includes:
- -tp shanghai-64
- pgcc_l, pgcpp_l, pgf95_l, pgcc_w, pgcpp_w, pgf95_w
- COPTIMIZE, FOPTIMIZE
- Specify the target processor type as an AMD64 Shanghai processor in 64-bit mode.
- -Bstatic_pgi
- pgcc_l, pgcpp_l, pgf95_l
- EXTRA_LDFLAGS
- Statically link with the PGI runtime libraries. System libraries may still be dynamically linked.

Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.

System and Other Tuning Information

In order to take full advantage of PGI's huge page runtime library, your system must be configured to use huge pages. It is safe to run binaries compiled with "-Msmartalloc=huge" on systems not configured to use huge pages; however, you will not benefit from the performance improvements huge pages offer. To configure your system for huge pages, perform the following steps:

Note that further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt
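
Before running a binary built with "-Msmartalloc=huge", it can help to check how many huge pages the kernel currently reports. The sketch below is a general Linux check, not part of the PGI instructions: it assumes the standard HugePages_Total and HugePages_Free fields of /proc/meminfo, and hugepage_field is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from a meminfo-style file.
#   $1 = field name (without the trailing colon)
#   $2 = file to read (defaults to /proc/meminfo)
hugepage_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}" 2>/dev/null
}

total=$(hugepage_field HugePages_Total)
free=$(hugepage_field HugePages_Free)

# Fall back to "unknown" when the file or field is not available.
echo "huge pages: total=${total:-unknown} free=${free:-unknown}"
```

A total of 0 or "unknown" suggests the system is not yet configured for huge pages.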

For the PGI compiler, the maximum number of huge pages an application is allowed to use can be set at run time via the environment variable PGI_HUGE_PAGES. If not set, then the process may use all available huge pages when compiled with "-Msmartalloc=huge" or a maximum of n pages where the value of n is set via the compile time flag "-Msmartalloc=huge:n".

For the x86 Open64 compiler, the maximum number of huge pages an application is allowed to use can be set at run time via the environment variable HUGETLB_LIMIT. If not set, then the process may use all available huge pages when compiled with "-HP (or -HUGEPAGE)" or a maximum of n pages where the value of n is set via the compile time flag "-HP:limit=n".
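
As a minimal sketch of the two run-time limits described above, the commands below set each environment variable for a single command only. The limit of 64 pages is a placeholder, and sh -c stands in for the benchmark binary so the effect is visible without a PGI or Open64 build.

```shell
#!/bin/sh
# Placeholder limit; choose a value that fits the huge pages your system has.
limit=64

# PGI-built binary (-Msmartalloc=huge): cap huge page use via PGI_HUGE_PAGES.
pgi_seen=$(PGI_HUGE_PAGES=$limit sh -c 'echo "$PGI_HUGE_PAGES"')

# Open64-built binary (-HP): cap huge page use via HUGETLB_LIMIT.
open64_seen=$(HUGETLB_LIMIT=$limit sh -c 'echo "$HUGETLB_LIMIT"')

echo "PGI_HUGE_PAGES=$pgi_seen HUGETLB_LIMIT=$open64_seen"
```

Setting the variable on the command line, rather than exporting it, keeps the limit from leaking into other processes started from the same shell.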

For multi-copy runs, or single-copy runs on systems with multiple sockets, it is advantageous to bind a process to a particular core. Otherwise, the OS may arbitrarily move your process from one core to another, which can affect performance. To help, SPEC allows the use of a "submit" command where users can specify a utility to use to bind processes. We have found the utility 'numactl' to be the best choice.

numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a command and inherited by all of its children. The numactl flag "--physcpubind" specifies which core(s) to bind the process to. "-l" instructs numactl to keep a process's memory on the local node, while "-m" specifies the node(s) on which to place a process's memory. For full details on using numactl, please refer to your Linux documentation, 'man numactl'.
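
To illustrate the binding described above, the sketch below prints one numactl command line per copy, pinning copy N to core N with local memory allocation. The copy count of 4 and the binary ./a.out are placeholders, bind_cmd is a hypothetical helper, and the commands are printed rather than executed so the sketch runs on systems without numactl.

```shell
#!/bin/sh
# Hypothetical helper: build the numactl command line for one copy.
#   $1 = core number, remaining arguments = command to run
bind_cmd() {
    c=$1
    shift
    echo "numactl --physcpubind=$c -l $*"
}

copies=4   # placeholder copy count
n=0
while [ "$n" -lt "$copies" ]; do
    bind_cmd "$n" ./a.out
    n=$((n + 1))
done
```

A one-to-one copy-to-core mapping is the simplest choice; systems with several sockets may prefer a mapping that spreads copies across NUMA nodes.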

Note that some versions of numactl, particularly the version found on SLES 10, incorrectly interpret application arguments as their own. For example, with the command "numactl --physcpubind=0 -l a.out -m a", numactl will interpret a.out's "-m" option as its own "-m" option. To work around this problem, put the command to be run in a shell script and then run the shell script using numactl. For example: "echo 'a.out -m a' > run.sh ; numactl --physcpubind=0 bash run.sh"
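
The wrapper-script workaround can be sketched as a small helper. The name make_wrapper and the wrapper location are assumptions for illustration; the final numactl command is printed rather than run so the sketch works on systems without numactl.

```shell
#!/bin/sh
# Hypothetical helper: write a command line into a wrapper script so that
# numactl never sees the application's own options.
#   $1 = wrapper path, remaining arguments = command to wrap
make_wrapper() {
    wrapper_path=$1
    shift
    # "$*" joins the remaining arguments into a single command line.
    printf '%s\n' "$*" > "$wrapper_path"
}

wrapper="${TMPDIR:-/tmp}/run.sh"
make_wrapper "$wrapper" ./a.out -m a

# numactl now only has to parse its own flags plus "sh <wrapper>".
echo "numactl --physcpubind=0 -l sh $wrapper"
```

Because "$*" joins arguments with spaces, arguments that themselves contain spaces would need extra quoting inside the wrapper.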

Sets the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.

Sets the maximum number of OpenMP parallel threads that auto-parallelized (-Mconcur) applications may use.

For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2006-2014 Standard Performance Evaluation Corporation
Tested with SPEC CPU2006 v1.1.
Report generated on Wed Jul 23 03:04:48 2014 by SPEC CPU2006 flags formatter v6906.


CPU2006 Flag Description
Tyan Thunder n4250QE (S4985-SI), AMD Opteron 8439 SE

Test sponsored by Advanced Micro Devices

Default header section from x86-open64-4.2.2-flags-revA

x86 Open64 Compiler Suite SPEC CPU2006 Flag Description

Compilers: x86 Open64 Compiler Suite

Default header section from amd-platform

Platform settings file

AMD Platform Settings

Default header section from pgi80_linux_flags

PGI Server Complete for Linux, Release 8.0. Optimization, Compiler, and Other flags for use by SPEC CPU2006

Compilers: PGI Server Complete 8.0

Operating systems: Linux
