CPU2006 Result Flag Description

The text for many of the descriptions below was taken from the Sun Studio Compiler Documentation, which is copyright © 2005 Sun Microsystems, Inc. The original documentation can be found at docs.sun.com.

Base Optimization Flags

C benchmarks

- -fast
- sun_cc
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
- -fma=fused
- OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xipo=2
- OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xprefetch_level=2
- OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -xprefetch=latx:2
- OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).
- -xalias_level=std
- sun_cc
- COPTIMIZE
- Allows the compiler to perform type-based alias analysis at the specified alias level:
  - basic: assume that memory references that involve different C basic types do not alias each other.
  - std: assume aliasing rules described in the ISO 1999 C standard.
  - strong: in addition to the restrictions at the std level, assume that pointers of type char * are used only to access an object of type char; and assume that there are no interior pointers.
- -xprefetch_level=3
- COPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -xprefetch_auto_type=indirect_array_access
- COPTIMIZE
- Generate indirect prefetches for data arrays accessed indirectly.
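
The -xcache value used in this file, 128/64/2:5120/256/10, can be read mechanically from the si/li/ai description. The following is a minimal Python sketch for illustration only; it is not part of the compiler or of the SPEC tools. The names CacheLevel and parse_xcache are hypothetical, and the set-count arithmetic assumes that a kilobyte is 1024 bytes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CacheLevel:
    size_kb: int        # si: cache size in kilobytes
    line_bytes: int     # li: line size in bytes
    associativity: int  # ai: number of ways

    @property
    def sets(self) -> int:
        # sets = total bytes / (bytes per line * ways); assumes 1 KB = 1024 bytes
        return self.size_kb * 1024 // (self.line_bytes * self.associativity)


def parse_xcache(spec: str) -> List[CacheLevel]:
    """Parse an explicit -xcache value such as '128/64/2:5120/256/10'."""
    levels = []
    for part in spec.split(":"):
        size_kb, line_bytes, assoc = (int(x) for x in part.split("/"))
        levels.append(CacheLevel(size_kb, line_bytes, assoc))
    if not 1 <= len(levels) <= 3:
        raise ValueError("expected one to three cache levels")
    return levels
```

For the value above, parse_xcache returns two levels: a 128 KB, 2-way cache with 64-byte lines, and a 5120 KB, 10-way cache with 256-byte lines.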

C++ benchmarks

- -xdepend
- CXX, LD
- Analyze loops for inter-iteration data dependencies, and do loop restructuring.
- -library=stlport4
- CXX, LD
- Use STLport's Standard Library implementation instead of the default libCstd.
- -fast
- sun_CC
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
  - -dalign
  - -fns
  - -fsimple=2
  - -ftrap=%none
  - -xbuiltin=%all
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xtarget=native
- -fma=fused
- OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xipo=2
- OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xprefetch_level=2
- OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -xprefetch=latx:2
- OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).
- -xalias_level=compatible
- sun_CC
- CXXOPTIMIZE
- Allows the compiler to perform type-based alias analysis:
  - any: assumes that any type can alias any other.
  - simple: assumes that fundamental types are not aliased.
  - compatible: assumes that layout-incompatible types are not aliased.
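
The numerical effect of -fma=fused can be modeled outside the compiler. A fused multiply-add rounds a*b + c once, whereas separate multiply and add instructions round twice. The Python sketch below emulates the single rounding with exact rational arithmetic; the function names and the operand values are chosen for illustration and are assumptions, not taken from the Sun documentation.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to double, then the sum is.
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    # One rounding: form a*b + c exactly with rationals, then round once.
    # This models the arithmetic only and says nothing about performance.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Operands whose exact product, 1 - 2**-54, is not representable as a double.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

two_roundings = unfused_multiply_add(a, b, c)
one_rounding = fused_multiply_add(a, b, c)
```

With these operands the rounded product is exactly 1.0, so the two-rounding result loses the small residual that the single-rounding result keeps. Programs built with and without fused multiply-add can therefore print slightly different values.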

Fortran benchmarks

- -fast
- sun_f90
- OPTIMIZE
- A convenience option, this switch selects the following switches that are described in this file:
- Includes:
  - -dalign
  - -depend
    - -xdepend
  - -fns
  - -fsimple=2
  - -fsingle
  - -ftrap_common
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xpad=local
  - -xprefetch=auto,explicit
  - -xtarget=native
  - -xvector=yes
- -fma=fused
- OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xipo=2
- OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xprefetch_level=2
- OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -xprefetch=latx:2
- OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).
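
This file does not say why a 4M page size is preferred. One commonly cited motivation for larger pages is that fewer address translations are needed to cover the same memory. The Python arithmetic below illustrates that under stated assumptions: the 1 GiB working set and the 8 KB base page size are hypothetical figures chosen for the example, not values taken from this result.

```python
def pages_needed(working_set_bytes: int, page_bytes: int) -> int:
    """Number of pages (and so distinct translations) covering a working set."""
    return -(-working_set_bytes // page_bytes)  # ceiling division


KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Hypothetical 1 GiB working set.
small_pages = pages_needed(1 * GIB, 8 * KIB)  # assumed 8 KB base page size
large_pages = pages_needed(1 * GIB, 4 * MIB)  # page size from -xpagesize=4M
```

Under these assumptions the same working set spans 131072 small pages but only 256 large ones.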

Benchmarks using both Fortran and C

- -fast
- sun_cc
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
- -fast
- sun_f90
- OPTIMIZE
- A convenience option, this switch selects the following switches that are described in this file:
- Includes:
  - -dalign
  - -depend
    - -xdepend
  - -fns
  - -fsimple=2
  - -fsingle
  - -ftrap_common
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xpad=local
  - -xprefetch=auto,explicit
  - -xtarget=native
  - -xvector=yes
- -fma=fused
- OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xipo=2
- OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xprefetch_level=2
- OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -xprefetch=latx:2
- OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).
- -xalias_level=std
- sun_cc
- COPTIMIZE
- Allows the compiler to perform type-based alias analysis at the specified alias level:
  - basic: assume that memory references that involve different C basic types do not alias each other.
  - std: assume aliasing rules described in the ISO 1999 C standard.
  - strong: in addition to the restrictions at the std level, assume that pointers of type char * are used only to access an object of type char; and assume that there are no interior pointers.
- -xprefetch_level=3
- COPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -xprefetch_auto_type=indirect_array_access
- COPTIMIZE
- Generate indirect prefetches for data arrays accessed indirectly.
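
The -xalias_level settings describe what the compiler may assume about the source, namely that certain kinds of aliasing do not occur. The Python sketch below models the case excluded at the basic level and above: the same bytes accessed through two different basic types. It is a language-neutral illustration and does not show compiler behavior. It assumes that the native C float is a 4-byte IEEE 754 value and that the native unsigned int is 4 bytes.

```python
# One 4-byte buffer viewed through two different basic types.
buf = bytearray(4)
as_float = memoryview(buf).cast("f")  # view as a native C float
as_uint = memoryview(buf).cast("I")   # view as a native C unsigned int

as_float[0] = 1.0
bits_of_one = as_uint[0]              # same bytes, read as an integer

as_uint[0] = 0                        # store through the integer view
float_after_store = as_float[0]       # the float view observes that store
```

If a C program relies on this kind of overlap, a compiler that is told the overlap cannot happen may keep the old float value in a register, so code like this is unsafe at alias levels that assume otherwise.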

Peak Optimization Flags

C benchmarks

433.milc

- -fast
- sun_cc
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xprefetch_level=2
- EXTRA_OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -fsimple=1
- EXTRA_OPTIMIZE
- Controls simplifying assumptions for floating point arithmetic:
  - -fsimple=0 permits no simplifying assumptions. Preserves strict IEEE 754 conformance.
  - -fsimple=1 allows the optimizer to assume:
    - The IEEE 754 default rounding/trapping modes do not change after process initialization.
    - Computations producing no visible result other than potential floating-point exceptions may be deleted.
    - Computations with Infinity or NaNs as operands need not propagate NaNs to their results. For example, x*0 may be replaced by 0.
    - Computations do not depend on sign of zero.
  - -fsimple=2 permits more aggressive floating point optimizations that may cause programs to produce different numeric results due to changes in rounding. Even with -fsimple=2, the optimizer still is not permitted to introduce a floating point exception in a program that otherwise produces none.
- -xprefetch_auto_type=indirect_array_access
- EXTRA_OPTIMIZE
- Generate indirect prefetches for data arrays accessed indirectly.
- -W2,-Ainline:rs=400
- EXTRA_OPTIMIZE
- [optimizer flag]

  The inliner considers only routines smaller than n pseudo-instructions as inline candidates; here n is 400.
- -xalias_level=std
- sun_cc
- EXTRA_OPTIMIZE
- Allows the compiler to perform type-based alias analysis at the specified alias level:
  - basic: assume that memory references that involve different C basic types do not alias each other.
  - std: assume aliasing rules described in the ISO 1999 C standard.
  - strong: in addition to the restrictions at the std level, assume that pointers of type char * are used only to access an object of type char; and assume that there are no interior pointers.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xprefetch=latx:3
- EXTRA_OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).
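
The assumptions listed under -fsimple can be checked against strict IEEE 754 double arithmetic, which Python floats follow on common platforms. The sketch below shows the strict behavior that -fsimple=1 lets the optimizer ignore, and one rewrite of the kind that -fsimple=2 might make. Reassociation of additions is used here as an assumed example; this file does not list the specific transformations.

```python
import math

inf = math.inf
nan = math.nan

# Under strict IEEE 754 rules, x*0 is not always 0.
inf_times_zero = inf * 0.0   # NaN
nan_times_zero = nan * 0.0   # NaN

# Under strict rules the sign of zero is observable.
neg_zero_sign = math.copysign(1.0, -0.0)
angle_pos = math.atan2(0.0, 0.0)
angle_neg = math.atan2(0.0, -0.0)

# Reassociating a sum can change the rounded result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
```

Replacing x*0 by 0 changes the result when x is Infinity or NaN, and atan2 distinguishes +0 from -0, which is why these simplifications are not permitted at -fsimple=0.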

470.lbm

- -xprofile=collect:./feedback
- PASS1_CFLAGS, PASS1_LDFLAGS
- Collect profile data for feedback-directed optimization (FDO). If an optional directory is named, the feedback data is stored there. During FDO, the training run gathers information about execution paths. As of the Sun Studio 11 version of the compiler suite, the training run also gathers information about data values on SPARC systems, but not on x86 systems. Hardware performance counters are not used. FDO improves existing optimizations but does not introduce new classes of optimization.
- -xprofile=use:./feedback
- PASS2_CFLAGS, PASS2_LDFLAGS
- Use the data collected for profile feedback. If an optional directory is named, the compiler looks for the feedback data there.
- -fast
- sun_cc
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xprefetch_level=3
- EXTRA_OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xrestrict
- EXTRA_OPTIMIZE
- Treat pointer-valued function parameters as restricted pointers.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -Wc,-Qlp=1
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; turns the module on (1) or off (0) (default is on for F90; off for C/C++)
- -Wc,-Qlp-av=512
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; sets the prefetch look ahead distance, in bytes. The default is 256.
- -Wc,-Qlp-t=1
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; sets the number of attempts at prefetching. If not specified, t=2 if -xprefetch_level=3 has been set; otherwise, defaults to t=1.
- -Wc,-Qlp-fa=1
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; a setting of "1" means force user settings to override internally computed values.
- -Wc,-Qms_pipe-prefolim=64
- EXTRA_OPTIMIZE
- [code generator flag]

  Sets the number of outstanding prefetches in pipelined loops to n; here n is 64.
- -xprefetch=latx:5
- EXTRA_OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).
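
-xrestrict asks the compiler to treat pointer-valued function parameters as restricted, that is, as not referring to overlapping storage. The Python sketch below models why that promise matters: it compares a loop executed in program order with a reordered version in which all loads are performed before any store. The function names and the list-based memory model are hypothetical and stand in for what an optimizer might do; they are not a description of the Sun compiler's transformations.

```python
def shift_add_in_order(buf, dst, src, n):
    """Reference semantics: each load happens after all earlier stores."""
    out = list(buf)
    for i in range(n):
        out[dst + i] = out[src + i] + 1
    return out


def shift_add_loads_first(buf, dst, src, n):
    """Reordered version: all loads are hoisted ahead of the stores.

    This matches the in-order result only when the source and
    destination regions do not overlap.
    """
    out = list(buf)
    loaded = [out[src + i] for i in range(n)]
    for i in range(n):
        out[dst + i] = loaded[i] + 1
    return out


memory = [0] * 8
disjoint_same = (shift_add_in_order(memory, 4, 0, 4)
                 == shift_add_loads_first(memory, 4, 0, 4))
overlap_in_order = shift_add_in_order(memory, 1, 0, 4)
overlap_reordered = shift_add_loads_first(memory, 1, 0, 4)
```

When the regions are disjoint the two versions agree; when they overlap the results differ, so the flag is safe only for code whose pointer parameters never overlap.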

482.sphinx3

- basepeak = yes

C++ benchmarks

444.namd

- -xdepend
- CXX, LD
- Analyze loops for inter-iteration data dependencies, and do loop restructuring.
- -library=stlport4
- CXX, LD
- Use STLport's Standard Library implementation instead of the default libCstd.
- -fast
- sun_CC
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
  - -dalign
  - -fns
  - -fsimple=2
  - -ftrap=%none
  - -xbuiltin=%all
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xtarget=native
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xalias_level=compatible
- sun_CC
- CXXOPTIMIZE, EXTRA_OPTIMIZE
- Allows the compiler to perform type-based alias analysis:
  - any: assumes that any type can alias any other.
  - simple: assumes that fundamental types are not aliased.
  - compatible: assumes that layout-incompatible types are not aliased.
- -xprefetch_level=1
- EXTRA_OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xprefetch=latx:3
- EXTRA_OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).

447.dealII

- -xdepend
- CXX, LD
- Analyze loops for inter-iteration data dependencies, and do loop restructuring.
- -library=stlport4
- CXX, LD
- Use STLport's Standard Library implementation instead of the default libCstd.
- -xprofile=collect:./feedback
- PASS1_CXXFLAGS, PASS1_LDFLAGS
- Collect profile data for feedback-directed optimization (FDO). If an optional directory is named, the feedback data is stored there. During FDO, the training run gathers information about execution paths. As of the Sun Studio 11 version of the compiler suite, the training run also gathers information about data values on SPARC systems, but not on x86 systems. Hardware performance counters are not used. FDO improves existing optimizations but does not introduce new classes of optimization.
- -xprofile=use:./feedback
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- Use the data collected for profile feedback. If an optional directory is named, the compiler looks for the feedback data there.
- -fast
- sun_CC
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
  - -dalign
  - -fns
  - -fsimple=2
  - -ftrap=%none
  - -xbuiltin=%all
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xtarget=native
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xalias_level=compatible
- sun_CC
- CXXOPTIMIZE, EXTRA_OPTIMIZE
- Allows the compiler to perform type-based alias analysis:
  - any: assumes that any type can alias any other.
  - simple: assumes that fundamental types are not aliased.
  - compatible: assumes that layout-incompatible types are not aliased.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xrestrict
- EXTRA_OPTIMIZE
- Treat pointer-valued function parameters as restricted pointers.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xprefetch=latx:4.5
- EXTRA_OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).

450.soplex

- -xdepend
- CXX, EXTRA_OPTIMIZE, LD
- Analyze loops for inter-iteration data dependencies, and do loop restructuring.
- -library=stlport4
- CXX, LD
- Use STLport's Standard Library implementation instead of the default libCstd.
- -xprofile=collect:./feedback
- PASS1_CXXFLAGS, PASS1_LDFLAGS
- Collect profile data for feedback-directed optimization (FDO). If an optional directory is named, the feedback data is stored there. During FDO, the training run gathers information about execution paths. As of the Sun Studio 11 version of the compiler suite, the training run also gathers information about data values on SPARC systems, but not on x86 systems. Hardware performance counters are not used. FDO improves existing optimizations but does not introduce new classes of optimization.
- -xprofile=use:./feedback
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- Use the data collected for profile feedback. If an optional directory is named, the compiler looks for the feedback data there.
- -fast
- sun_CC
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
  - -dalign
  - -fns
  - -fsimple=2
  - -ftrap=%none
  - -xbuiltin=%all
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xtarget=native
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xalias_level=compatible
- sun_CC
- CXXOPTIMIZE, EXTRA_OPTIMIZE
- Allows the compiler to perform type-based alias analysis:
  - any: assumes that any type can alias any other.
  - simple: assumes that fundamental types are not aliased.
  - compatible: assumes that layout-incompatible types are not aliased.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xprefetch_level=2
- EXTRA_OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -fsimple=0
- EXTRA_OPTIMIZE
- Controls simplifying assumptions for floating point arithmetic:
  - -fsimple=0 permits no simplifying assumptions. Preserves strict IEEE 754 conformance.
  - -fsimple=1 allows the optimizer to assume:
    - The IEEE 754 default rounding/trapping modes do not change after process initialization.
    - Computations producing no visible result other than potential floating-point exceptions may be deleted.
    - Computations with Infinity or NaNs as operands need not propagate NaNs to their results. For example, x*0 may be replaced by 0.
    - Computations do not depend on sign of zero.
  - -fsimple=2 permits more aggressive floating point optimizations that may cause programs to produce different numeric results due to changes in rounding. Even with -fsimple=2, the optimizer still is not permitted to introduce a floating point exception in a program that otherwise produces none.
- -xrestrict
- EXTRA_OPTIMIZE
- Treat pointer-valued function parameters as restricted pointers.
- -xprefetch_auto_type=indirect_array_access
- EXTRA_OPTIMIZE
- Generate indirect prefetches for data arrays accessed indirectly.
- -Qoption cg -Qlp-ol=1
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Turns on prefetching for outer loops
- -Qoption cg -Qlp-it=3
- EXTRA_OPTIMIZE
- [code generator flag]

  Tells the compiler to insert n extra prefetches for each indirect access in outer loops; here n is 3.
- -Qoption cg -Qlp-imb=1
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Insert indirect prefetches when the indirect access chain spans across basic blocks.
- -Qoption iropt -Apf:pdl=3
- EXTRA_OPTIMIZE
- [optimizer flag]

  Allows prefetching through up to n levels of indirect memory references; here n is 3.
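
Several of the 450.soplex flags, including -xprefetch_auto_type=indirect_array_access, concern loops that read one array through indices stored in another. The Python sketch below models such a loop and records which elements a prefetch issued a fixed number of iterations ahead would request. The function name, the distance parameter, and the bookkeeping are hypothetical; how the compiler actually schedules indirect prefetches is not described in this file.

```python
def indirect_sum(data, index, distance):
    """Sum data[index[i]] and record the data elements that a look-ahead
    of `distance` iterations would have requested early."""
    total = 0
    prefetched = []
    n = len(index)
    for i in range(n):
        ahead = i + distance
        if ahead < n:
            # An indirect prefetch must first read index[ahead] to learn
            # which element of data to request.
            prefetched.append(index[ahead])
        total += data[index[i]]
    return total, prefetched


total, prefetched = indirect_sum([10, 20, 30, 40, 50], [4, 0, 3, 1], 2)
```

The extra read of the index array is what separates an indirect prefetch from a prefetch for a simple strided access, where the future address can be computed from the loop counter alone.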

453.povray

- -xdepend
- CXX, LD
- Analyze loops for inter-iteration data dependencies, and do loop restructuring.
- -library=stlport4
- CXX, LD
- Use STLport's Standard Library implementation instead of the default libCstd.
- -xprofile=collect:./feedback
- PASS1_CXXFLAGS, PASS1_LDFLAGS
- Collect profile data for feedback-directed optimization (FDO). If an optional directory is named, the feedback data is stored there. During FDO, the training run gathers information about execution paths. As of the Sun Studio 11 version of the compiler suite, the training run also gathers information about data values on SPARC systems, but not on x86 systems. Hardware performance counters are not used. FDO improves existing optimizations but does not introduce new classes of optimization.
- -xprofile=use:./feedback
- PASS2_CXXFLAGS, PASS2_LDFLAGS
- Use the data collected for profile feedback. If an optional directory is named, the compiler looks for the feedback data there.
- -fast
- sun_CC
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
  - -dalign
  - -fns
  - -fsimple=2
  - -ftrap=%none
  - -xbuiltin=%all
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xtarget=native
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xalias_level=compatible
- sun_CC
- CXXOPTIMIZE
- Allows the compiler to perform type-based alias analysis:
  - any: assumes that any type can alias any other.
  - simple: assumes that fundamental types are not aliased.
  - compatible: assumes that layout-incompatible types are not aliased.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xrestrict
- EXTRA_OPTIMIZE
- Treat pointer-valued function parameters as restricted pointers.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.

Fortran benchmarks

410.bwaves

- -fast
- sun_f90
- OPTIMIZE
- A convenience option, this switch selects the following switches that are described in this file:
- Includes:
  - -dalign
  - -depend
    - -xdepend
  - -fns
  - -fsimple=2
  - -fsingle
  - -ftrap_common
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xpad=local
  - -xprefetch=auto,explicit
  - -xtarget=native
  - -xvector=yes
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xprefetch_level=2
- EXTRA_OPTIMIZE
- Controls how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n can be 1, 2, or 3; higher values mean more searching. The default is 2 for Fortran and 1 for C and C++.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.
- -xprefetch=latx:3
- EXTRA_OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).

416.gamess

- -fast
- sun_f90
- OPTIMIZE
- A convenience option, this switch selects the following switches that are described in this file:
- Includes:
  - -dalign
  - -depend
    - -xdepend
  - -fns
  - -fsimple=2
  - -fsingle
  - -ftrap_common
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xpad=local
  - -xprefetch=auto,explicit
  - -xtarget=native
  - -xvector=yes
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xprefetch_level=2
- EXTRA_OPTIMIZE
- Control how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n is 1, 2, or 3; higher numbers mean more searching. The default for Fortran is 2. The default for C and C++ is 1.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.
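
As a reading aid for the explicit -xcache form used in this list, the following sketch splits the value 128/64/2:5120/256/10 into its si/li/ai fields and derives the number of lines and sets per level. It is a minimal illustration, not part of the compiler: the names CacheLevel and parse_xcache are hypothetical, 1 KB is assumed to be 1024 bytes, and the "generic" and "native" keywords are not handled.

```python
from typing import NamedTuple

class CacheLevel(NamedTuple):
    size_kb: int        # si: cache size in kilobytes
    line_bytes: int     # li: line size in bytes
    associativity: int  # ai: associativity

def parse_xcache(spec: str) -> list[CacheLevel]:
    """Split an explicit -xcache value into one (si, li, ai) tuple per level."""
    levels = []
    for level in spec.split(":"):
        si, li, ai = (int(field) for field in level.split("/"))
        levels.append(CacheLevel(si, li, ai))
    if not 1 <= len(levels) <= 3:
        raise ValueError("expected one to three cache levels")
    return levels

levels = parse_xcache("128/64/2:5120/256/10")
for number, level in enumerate(levels, start=1):
    lines = level.size_kb * 1024 // level.line_bytes
    sets = lines // level.associativity
    print(f"level {number}: {level.size_kb} KB, {lines} lines, {sets} sets")
```

Read this way, the value describes a 128 KB, 2-way first-level cache with 64-byte lines and a 5120 KB, 10-way second-level cache with 256-byte lines.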

434.zeusmp

- basepeak = yes

437.leslie3d

- -fast
- sun_f90
- OPTIMIZE
- A convenience option, this switch selects the following switches that are described in this file:
- Includes:
  - -dalign
  - -depend
    - -xdepend
  - -fns
  - -fsimple=2
  - -fsingle
  - -ftrap_common
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xpad=local
  - -xprefetch=auto,explicit
  - -xtarget=native
  - -xvector=yes
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xprefetch_level=3
- EXTRA_OPTIMIZE
- Control how aggressively the compiler searches for prefetch opportunities. The level n in -xprefetch_level=n is 1, 2, or 3; higher numbers mean more searching. The default for Fortran is 2. The default for C and C++ is 1.
- -qoption cg -Qlp=1
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; turns the module on (1) or off (0). The default is on for F90 and off for C/C++.
- -qoption cg -Qlp-fa=0
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; a setting of "1" means force user settings to override internally computed values.
- -qoption cg -Qlp-fl=1
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; a setting of "1" means force the optimization to be turned on for all languages.
- -qoption cg -Qlp-av=448
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; sets the prefetch look ahead distance, in bytes. The default is 256.
- -qoption cg -Qlp-t=4
- EXTRA_OPTIMIZE
- [code generator flag]
  
  Control irregular loop prefetching; sets the number of attempts at prefetching. If not specified, t=2 if -xprefetch_level=3 has been set; otherwise, defaults to t=1.
- -xprefetch=latx:3.5
- EXTRA_OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).

459.GemsFDTD

- basepeak = yes

465.tonto

- -fast
- sun_f90
- OPTIMIZE
- A convenience option, this switch selects the following switches that are described in this file:
- Includes:
  - -dalign
  - -depend
    - -xdepend
  - -fns
  - -fsimple=2
  - -fsingle
  - -ftrap_common
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xpad=local
  - -xprefetch=auto,explicit
  - -xtarget=native
  - -xvector=yes
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xprefetch=latx:12
- EXTRA_OPTIMIZE
- Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).
- -lfast
- EXTRA_LIBS
- This library provides faster versions of some common functions, such as malloc/free and bcopy.

Benchmarks using both Fortran and C

435.gromacs

- -xprofile=collect:./feedback
- PASS1_CFLAGS, PASS1_FFLAGS, PASS1_LDFLAGS
- Collect profile data for feedback-directed optimization (FDO). If an optional directory is named, the feedback will be stored there. When FDO is used, the training run gathers information regarding execution paths. As of the Sun Studio 11 version of the compiler suite, the training run gathers information about data values on SPARC systems, but not on x86 systems. Hardware performance counters are not used. FDO improves existing optimizations but does not introduce new classes of optimization.
- -xprofile=use:./feedback
- PASS2_CFLAGS, PASS2_FFLAGS, PASS2_LDFLAGS
- Use data collected for profile feedback. If an optional directory is named, look for the feedback data there.
- -fast
- sun_cc
- OPTIMIZE
- A convenience option, this switch selects several other options that are described in this file.
- Includes:
- -fast
- sun_f90
- OPTIMIZE
- A convenience option, this switch selects the following switches that are described in this file:
- Includes:
  - -dalign
  - -depend
    - -xdepend
  - -fns
  - -fsimple=2
  - -fsingle
  - -ftrap_common
  - -xlibmil
  - -xlibmopt
  - -xO5
  - -xpad=local
  - -xprefetch=auto,explicit
  - -xtarget=native
  - -xvector=yes
- -xcache=128/64/2:5120/256/10
- OPTIMIZE
- xcache defines the cache properties for use by the optimizer. It can specify use of default assumptions ("generic"); use of whatever the compiler can assume about the current platform ("native"); or an explicit description of up to three levels of cache, using colon-separated specifiers of the form si/li/ai, where:
  - si is the size of the cache, in kilobytes
  - li is the line size, in bytes
  - ai is the associativity
- -xpagesize=4M
- OPTIMIZE
- Set the preferred page size for running the program.
- -xipo=2
- EXTRA_OPTIMIZE
- Perform optimizations across all object files in the link step:
  - 0 = off
  - 1 = on
  - 2 = performs whole-program detection and analysis.
  At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.
- -xinline=
- EXTRA_OPTIMIZE
- Turn off inlining.
- -xarch=generic
- EXTRA_OPTIMIZE
- Specifies which instructions can be used. Among the choices are:
  - v8plusa: Use instructions that are available on the UltraSPARC processors
  - v8plusb: Use instructions that are available on the UltraSPARC III/IV processors
  - sparcfmaf: Allows use of the v8plusb set plus extensions for floating-point multiply-add
  - native: Use the instructions available on the current processor
  - generic: Use instructions that are compatible with most SPARC processors
- -xchip=generic
- EXTRA_OPTIMIZE
- xchip determines timing properties that are assumed by the compiler. It does not limit which instructions are allowed (see -xarch for that). Among the choices are:
  - ultra3: Optimize for the UltraSPARC III processor
  - ultra3cu: Optimize for the UltraSPARC IIIcu processor
  - sparc64vi: Optimize for the SPARC64 VI processor
  - native: Optimize for the current processor
  - generic: Use timing properties for good performance on most SPARC processors
- -fsimple=0
- EXTRA_OPTIMIZE
- Controls simplifying assumptions for floating point arithmetic:
  - -fsimple=0 permits no simplifying assumptions. Preserves strict IEEE 754 conformance.
  - -fsimple=1 allows the optimizer to assume:
    - The IEEE 754 default rounding/trapping modes do not change after process initialization.
    - Computations producing no visible result other than potential floating-point exceptions may be deleted.
    - Computations with Infinity or NaNs as operands need not propagate NaNs to their results. For example, x*0 may be replaced by 0.
    - Computations do not depend on sign of zero.
  - -fsimple=2 permits more aggressive floating point optimizations that may cause programs to produce different numeric results due to changes in rounding. Even with -fsimple=2, the optimizer still is not permitted to introduce a floating point exception in a program that otherwise produces none.
- -fma=fused
- EXTRA_OPTIMIZE
- Enables the use of the fused multiply-add instruction.
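
The -fsimple=1 assumptions about Infinity, NaN, and the sign of zero are easier to see next to what strict IEEE 754 arithmetic produces. The sketch below assumes Python floats are IEEE 754 doubles, as on common platforms, and compares a strict x*0 with the replacement by 0 that the description above permits. The function names are hypothetical and no Sun compiler is involved.

```python
import math

def strict_times_zero(x: float) -> float:
    # What IEEE 754 arithmetic computes for x * 0.
    return x * 0.0

def simplified_times_zero(x: float) -> float:
    # The rewrite that -fsimple=1 allows: x*0 is replaced by 0.
    return 0.0

for x in (2.5, -2.5, math.inf, math.nan):
    strict = strict_times_zero(x)
    simplified = simplified_times_zero(x)
    same_value = strict == simplified  # False whenever strict is NaN
    same_sign = math.copysign(1.0, strict) == math.copysign(1.0, simplified)
    print(f"x={x}: strict={strict}, simplified={simplified}, "
          f"same value={same_value}, same sign={same_sign}")
```

For finite inputs only the sign of zero can differ; for Infinity and NaN the strict result is NaN while the simplified result is 0. Setting -fsimple=0 rules out such rewrites.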

Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.

System and Other Tuning Information

One or more of the following settings may have been applied to the testbed. If so, the "Platform Notes" section of the report will say so, and the descriptions below explain what these settings mean.

autoup=<n> (Unix /etc/system)
When the file system flush daemon fsflush runs, it writes to disk all modified file buffers that are more than n seconds old.

bufhwm=<n> (Unix /etc/system)
Sets the upper limit of the file system buffer cache. The units for bufhwm are in kilobytes.

cpu_bringup_set=<n> (Unix /etc/system)
Specifies which processors are enabled at boot time. <n> represents a bitmap of the processors to be brought online.
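
A processor bitmap of this kind can be built and decoded with a few bit operations. The sketch below assumes that bit i of the value corresponds to processor ID i; that mapping and the helper names are illustrative assumptions, not taken from the description above, and the exact /etc/system syntax is not shown.

```python
def cpu_bitmap(cpu_ids):
    """Return an integer with bit i set for each processor ID i in cpu_ids."""
    bitmap = 0
    for cpu_id in cpu_ids:
        if cpu_id < 0:
            raise ValueError("processor IDs must be non-negative")
        bitmap |= 1 << cpu_id
    return bitmap

def cpus_in_bitmap(bitmap):
    """Inverse of cpu_bitmap: list the processor IDs whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

mask = cpu_bitmap([0, 1, 2, 3])
print(hex(mask))            # 0xf
print(cpus_in_bitmap(0x5))  # [0, 2]
```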

disablecomponent (System Management Services)
This command can be used prior to booting the system for a 1-cpu test. The tester uses disablecomponent to add all other CPUs to the "blacklist", which is a list of components that cannot be used at boot time.

LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time and run-time linkers. Usually the default is adequate, but testers may sometimes choose to set it explicitly (as documented in the notes in the submission) to ensure that the correct versions of libraries are picked up.

LD_PRELOAD=<shared object> (Unix environment variable)
Adds the named shared object to the runtime environment.

MADV=access_lwp and LD_PRELOAD=madv.so.1 (Unix environment variables)
When the madv.so.1 shared object is present in the LD_PRELOAD list, it is possible to provide advice to the system about how memory is likely to be accessed. The advice present in MADV applies to all processes and their descendants. A commonly used value is access_lwp, which means that when memory is allocated, the next process to touch it will be the primary user. Examples of other possible values include sequential, for memory that is used only once and then no longer needed, and access_many, for when many processes will be sharing data.

MPSSHEAP=<size>, MPSSSTACK=<size>, and LD_PRELOAD=mpss.so.1 (Unix environment variables)
When these variables are set, the mpss.so.1 shared object will set the preferred page size for new processes, and their descendants, to the requested sizes for the heap and stack.

PARALLEL=<n> (Unix environment variable)
If programs have been compiled with -xautopar, this environment variable can be set to the number of processors that programs should use.

segmap_percent=<n> (Unix /etc/system)
This value controls the size of the segmap cache as a percent of total memory. Set this value to help keep the file system cache from consuming memory unnecessarily.

STACKSIZE=<n> (Unix environment variable)
Set the size of the stack (temporary storage area) for each slave thread of a multithreaded program.

submit=echo 'pbind -b...' > dobmk; sh dobmk (SPEC tools, Unix shell)
When running multiple copies of benchmarks, the SPEC config file feature submit is sometimes used to cause individual jobs to be bound to specific processors. If so, the specific command may be found in the config file; here is a brief guide to understanding that command:

svcadm disable webconsole (Unix, superuser commands)
Turns off the Sun Web Console, a browser-based interface that performs systems management. If it is enabled, system administrators can manage systems, devices and services from remote systems.

ts_dispatch_extended=<n> (Unix /etc/system)
Controls which dispatch table is loaded upon boot. A value of 1 loads the large system table, a value of 0 loads the regular system table.

tune_t_fsflushr=<n> (Unix /etc/system)
Controls the number of seconds between runs of the file system flush daemon, fsflush.

ulimit -s <n> (Unix shell)
Sets the stack size to n kbytes, or "unlimited" to allow the stack size to grow without limit.
Note that the "heap" and the "stack" share space; if your application allocates large amounts of memory on the heap, then you may find that the stack limit should not be set to "unlimited". A commonly used setting for SPEC CPU2006 purposes is a stack size of 128MB (131072K).
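
The conversion behind the commonly used setting is simple arithmetic. The helper below is a hypothetical convenience, not part of the SPEC tools: it turns a size in megabytes into the kbyte argument that ulimit -s takes, or returns "unlimited" when no size is given.

```python
def ulimit_stack_argument(megabytes=None):
    """Return the argument for `ulimit -s`: kbytes as a string, or "unlimited"."""
    if megabytes is None:
        return "unlimited"
    if megabytes <= 0:
        raise ValueError("stack size must be positive")
    return str(megabytes * 1024)  # 1 MB = 1024 kbytes

print("ulimit -s", ulimit_stack_argument(128))  # ulimit -s 131072
print("ulimit -s", ulimit_stack_argument())     # ulimit -s unlimited
```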

For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2006-2014 Standard Performance Evaluation Corporation
Tested with SPEC CPU2006 v1.0.1.
Report generated on Tue Jul 22 11:32:25 2014 by SPEC CPU2006 flags formatter v6906.


CPU2006 Flag Description
Sun Microsystems Sun SPARC Enterprise M9000
