Compilers: IBM XL C/C++ Advanced Edition for Linux V9.0 and XL Fortran Advanced Edition for Linux V11.1
Compilers: IBM XL C/C++ for Linux V10.1 and XL Fortran for Linux V12.1
Operating systems: SUSE Linux Enterprise 10, SUSE Linux Enterprise 11, and Red Hat Enterprise Linux Advanced Platform 5
Last updated: 13-Apr-2009
Invoke the IBM XL C compiler. 32-bit binaries are produced by default.
Supports the ISO C99 standard and accepts implementation-specific language extensions.
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default.
Invoke the IBM XL C compiler. 32-bit binaries are produced by default.
Supports the ISO C99 standard and accepts implementation-specific language extensions.
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default.
This macro indicates that the benchmark is being compiled on a PowerPC system running the Linux operating system.
Portability changes for Linux
Causes the compiler to treat "char" variables as signed instead of the default of unsigned.
This flag can be set for SPEC compilation for Linux using default compiler.
This macro indicates that the benchmark is being compiled on a PowerPC system running the Linux operating system.
This option indicates that the host system's integers are 32 bits wide, and that longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
Portability changes for Linux
Causes the compiler to treat "char" variables as signed instead of the default of unsigned.
This flag can be set for SPEC compilation for Linux using default compiler.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
qalias=ansi | noansi
If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
qalias=std | nostd
Indicates whether the compilation units contain any non-standard aliasing (see the Compiler Reference for more information). If so, specify nostd.
Indicates that the compiler understands how to do alloca().
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Cause the C++ compiler to generate Run Time Type Identification code for exception handling and for use by the typeid and dynamic_cast operators.
Link with MicroQuill's SmartHeap (32-bit) library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
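The two passes described above can be sketched as a small build script. This is only an illustration under stated assumptions: the flag names -qpdf1 and -qpdf2 are the XL compiler spellings for the two passes and do not appear in the text above, and demo.c, demo, and the training run are hypothetical placeholders. With DRY_RUN=1 (the default here) the script only prints the commands it would run.

```shell
#!/bin/sh
# Two-pass profile-directed feedback build (sketch).
# DRY_RUN=1 prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pdf_build SOURCE EXECUTABLE
pdf_build() {
    src=$1
    exe=$2
    run xlc -O4 -qpdf1 -o "$exe" "$src"   # pass 1: build with profile collection
    run "./$exe"                          # training run records path and value data
    run xlc -O4 -qpdf2 -o "$exe" "$src"   # pass 2: rebuild using the recorded profile
}

# Hypothetical source file and executable name.
pdf_build demo.c demo
```

In a real build the training run would use a representative workload, since the second pass optimizes for the paths and values seen during that run.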
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
qalias=ansi | noansi
If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
qalias=std | nostd
Indicates whether the compilation units contain any non-standard aliasing (see the Compiler Reference for more information). If so, specify nostd.
Link with MicroQuill's SmartHeap (32-bit) library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Performs additional optimizations that are memory intensive, compile-time intensive, and may change the semantics of the program slightly, unless -qstrict is specified. We recommend these optimizations when the desire for run-time speed improvements outweighs the concern for limiting compile-time resources. The optimizations provided include:
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Indicates that the compiler understands how to do alloca().
Generates 64 bit ABI binaries. The default is to generate 32 bit binaries.
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Pass the -q flag to the linker causing the final executable to have the relocation information.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Disables generation of vector instructions for processors that support them.
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Disables generation of vector instructions for processors that support them.
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Pass the -q flag to the linker causing the final executable to have the relocation information.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Disables generation of vector instructions for processors that support them.
Generates 64 bit ABI binaries. The default is to generate 32 bit binaries.
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Generates 64 bit ABI binaries. The default is to generate 32 bit binaries.
Link with libhugetlbfs.so. This enables heap to be backed by the 16 Megabyte pages.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Cause the C++ compiler to generate Run Time Type Identification code for exception handling and for use by the typeid and dynamic_cast operators.
Link with MicroQuill's SmartHeap (32-bit) library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
Pass the -q flag to the linker causing the final executable to have the relocation information.
The option used in the first pass of a profile-directed feedback (PDF) compile that causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Disables generation of vector instructions for processors that support them.
Link with MicroQuill's SmartHeap (32-bit) library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
Pass the -q flag to the linker causing the final executable to have the relocation information.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, and also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Link with MicroQuill's SmartHeap (32-bit) library for Linux on POWER. This is a library that optimizes calls to new, delete, malloc and free.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
Performs additional optimizations that are memory intensive, compile-time intensive, and may change the semantics of the program slightly, unless -qstrict is specified. We recommend these optimizations when the desire for run-time speed improvements outweighs the concern for limiting compile-time resources. The optimizations provided include:
Performs a set of optimizations that are intended to offer improved performance without an unreasonable increase in time or storage that is required for compilation including:
Performs high-order transformations on loops during optimization.
o arraypad The compiler will pad any arrays where it infers that there may be a benefit.
o level=0 The compiler performs a limited set of high-order loop transformations.
o level=1 The compiler performs its full set of high-order loop transformations.
o simd Replaces certain instruction sequences with vector instructions.
o vector Replaces certain instruction sequences with calls to the MASS library.
Specifying -qhot without suboptions implies -qhot=nosimd, -qhot=noarraypad, -qhot=vector and -qhot=level=1. The -qhot option is also implied by -O4 and -O5.
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
level=0 does only minimal interprocedural analysis and optimization
level=1 turns on inlining, limited alias analysis, and limited call-site tailoring
level=2 turns on full interprocedural data flow and alias analysis
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compile is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting.
The supported values for suboption are
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
level=0 does only minimal interprocedural analysis and optimization
level=1 turns on inlining, limited alias analysis, and limited call-site tailoring
level=2 turns on full interprocedural data flow and alias analysis
echo 200 > /proc/sys/vm/nr_hugepages
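As a rough check of what that setting reserves, the sketch below multiplies the page count by the page size. It assumes the 16 MB huge page size mentioned in the libhugetlbfs description; the function name hugepage_reservation_mb is made up for this illustration.

```shell
#!/bin/sh
# Estimate how much memory a huge-page reservation sets aside.
# Usage: hugepage_reservation_mb PAGES PAGE_SIZE_MB
hugepage_reservation_mb() {
    echo $(( $1 * $2 ))
}

# 200 pages of 16 MB each, the values used in this document.
echo "Reserved: $(hugepage_reservation_mb 200 16) MB"

# On a live Linux system the kernel reports the actual figures here.
if [ -r /proc/meminfo ]; then
    grep -i '^Huge' /proc/meminfo || true
fi
```

Under that assumption, 200 pages correspond to 3200 MB that is no longer available for regular pages.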
Usage: chsyscfg -r lpar | prof | sys | sysprof | frame
                -m <managed system> | -e <managed frame>
                -f <configuration file> | -i "<configuration data>"
                [--help]

Changes partitions, partition profiles, system profiles, or the attributes of a managed system or a managed frame.

    -r                        - the type of resource(s) to be changed:
                                  lpar    - partition
                                  prof    - partition profile
                                  sys     - managed system
                                  sysprof - system profile
                                  frame   - managed frame
    -m <managed system>       - the managed system's name
    -e <managed frame>        - the managed frame's name
    -f <configuration file>   - the name of the file containing the configuration data for this command. The format is:
                                  attr_name1=value,attr_name2=value,...
                                or
                                  "attr_name1=value1,value2,...",...
    -i "<configuration data>" - the configuration data for this command. The format is:
                                  "attr_name1=value,attr_name2=value,..."
                                or
                                  ""attr_name1=value1,value2,...",..."
    --help                    - prints this help

The valid attribute names for this command are:
    -r prof    required: name, lpar_id | lpar_name
               optional: ... lpar_proc_compat_mode (default | POWER6_enhanced)
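To make the usage text concrete, the sketch below assembles one possible command line for changing lpar_proc_compat_mode on a partition profile. The managed system, profile, and partition names are hypothetical placeholders, and the helper only prints the command; it does not contact an HMC.

```shell
#!/bin/sh
# build_compat_cmd MANAGED_SYSTEM PROFILE_NAME LPAR_NAME MODE
# Prints a chsyscfg command line that follows the usage text above.
build_compat_cmd() {
    printf 'chsyscfg -r prof -m %s -i "name=%s,lpar_name=%s,lpar_proc_compat_mode=%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical names; MODE is one of: default, POWER6_enhanced.
build_compat_cmd my_system my_profile my_lpar POWER6_enhanced
```

The attribute list uses the required name and lpar_name attributes for -r prof, plus the one optional attribute shown in the usage text.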
submit = numactl --membind=\$SPECCOPYNUM --physcpubind=\$SPECCOPYNUM $command
--membind=nodes
    Only allocate memory from nodes. Allocation will fail when there is not enough memory available on these nodes.
--physcpubind=cpus
    Only execute process on cpus. This accepts physical cpu numbers as shown in the processor fields of /proc/cpuinfo.
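The effect of the submit line can be pictured with a small imitation of the substitution: copy number N is bound to memory node N and to physical CPU N. The function expand_submit and the benchmark command are hypothetical; in a real run the SPEC tools perform this expansion themselves.

```shell
#!/bin/sh
# Imitate how the submit line expands for one copy of a benchmark.
# Usage: expand_submit COPYNUM COMMAND...
expand_submit() {
    copynum=$1
    shift
    echo "numactl --membind=$copynum --physcpubind=$copynum $*"
}

# Print the command line for copies 0 through 3 of a placeholder command.
for n in 0 1 2 3; do
    expand_submit "$n" ./benchmark_exe input.dat
done
```

Using the same number for both options only makes sense when node and CPU numbering line up that way on the system under test.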
HUGETLB_VERBOSE=0: Turns off any debugging messages from libhugetlbfs.
HUGETLB_MORECORE=yes: Instructs libhugetlbfs to override libc's normal morecore() function with a hugepage version and use it for malloc().
HUGETLB_MORECORE_HEAPBASE=0x50000000: Specifies that the hugepage heap address starts at 0x50000000.
XLFRTEOPTS=intrinthrds=1: Causes the Fortran runtime to use only a single thread.
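One way to apply these settings is to export them in the shell that launches the run, as in the sketch below. Whether they are set in a shell, a wrapper script, or the SPEC config file is not stated here, so treat the placement as an assumption.

```shell
#!/bin/sh
# Environment settings from this section, exported before launching a run.
export HUGETLB_VERBOSE=0
export HUGETLB_MORECORE=yes
export HUGETLB_MORECORE_HEAPBASE=0x50000000
export XLFRTEOPTS=intrinthrds=1

# Show what a child process would inherit.
env | grep -E '^(HUGETLB_|XLFRTEOPTS=)' | sort
```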
- First we copied the original executable (baseexe) to baseexe.orig.
- Then, the executable is instrumented and its initial profile generated, as follows:
    $ fdprpro -a instr baseexe
  The output will be generated (by default) in baseexe.instr and its profile in baseexe.nprof.
- Next, run baseexe.instr using the training data. This will fill the profile file with information that characterizes the training workload.
- Finally, re-run FDPR-Pro with the profile file provided, as follows:
    $ fdprpro -a opt -f baseexe.nprof [optimization options] baseexe

Instrumentation Options Descriptions:

-ei, --embedded-instrumentation
    Perform embedded instrumentation. The profile will be collected into global variables.
-fd Fdesc, --file-descriptor Fdesc
    Set the file descriptor number to be used when opening the profile file. The default of Fdesc is set to the maximum-allowed number of open files.
-imullX, --mullX-instrumentation
    Perform value profiling of RA and RB operands in mullX instructions.
-issu, --instrumentation-safe-stack-usage
    Ensure additional stack space is properly allocated for the instrumented run. Use this option if your application uses stack extensively (e.g., when the program uses alloca()). Note that this option adds extra overhead on instrumentation code.
-iso offset, --instrumentation-stack-offset offset
    Set the offset from the stack, a negative number, where the instrumentation's area for saving registers is kept at runtime. Use with care.
-M addr, --profile-map addr
    Set shared memory segment address for profiling. Alternative shared memory addresses are needed when the instrumented program application creates a conflict with the shared-memory addresses preserved for the profiling. Typical alternative values are 0x40000000, 0x50000000, ... up to 0xC0000000. The default is set to 0x3000000.
-[no]ri, --[no]register-instrumentation
    Instrument the input program file to collect profile information about indirect branches via registers. The default is set to collect the profile information.
-[no]sfp, --[no]save-floating-point-registers
    Save floating point registers in instrumented code. The default is set to save floating point registers.

Optimization Options Descriptions:

-A alignment, --align-code alignment
    Align program so that hot code will be aligned on alignment-byte addresses.
-abb factor, --align-basic-blocks factor
    Align basic blocks that are hotter than the average by a given (float) factor. This is a lower-level machine-specific alignment compared to --align-code. Value of -1 (the default) disables this option.
-bf, --branch-folding
    Eliminate branch to branch instructions.
-bldcg, --build-dcg
    Build a Data Connectivity Graph (DCG) for enhanced data reordering (applicable only with the -RD flag).
-bp, --branch-prediction
    Set branch prediction bit for conditional branches according to the collected profile.
-btcar, --branch-table-csect-anchor-removal
    Eliminate load instructions used when accessing branch tables.
-cbtd, --convert-bss-to-data
    Convert BSS section into a data section. This is useful for more aggressive tocload and RD optimizations.
-cRD, --conservativeRD
    Perform conservative static data reordering by packing together all frequently referenced static variables.
-dce, --dead-code-elimination
    Eliminate instructions related to unused local variables within frequently executed functions. This is useful mainly after applying function inlining optimization.
-dp, --data-prefetch
    Insert data-cache prefetch instructions to improve data-cache performance.
-dpht threshold, --data-placement-hotness-threshold threshold
    Set data placement algorithm hotness threshold between (0,1), where 0 reorders the static variables in large groups based on the control flow, and 1 reorders the variables in very small groups based on their access frequency. (This is applicable only with the -RD flag).
-dpnf factor, --data-placement-normalization-factor factor
    Set data placement algorithm normalization factor between (0,1), where 0 causes static variables to be reordered regardless of their size, and 1 locates only small sized variables first. (Applicable only with the -RD flag).
-ece, --epilog-code-eliminate
    Reduce code size by grouping common instructions in function epilogs, into a single unified code.
-fc, --function-cloning
    Enable function cloning phase only during function inlining optimizations (applicable only with function inlining flags: -i, -si, -ihf, -isf, -shci).
-hr, --hco-reschedule
    Relocate instructions from frequently executed code to rarely executed code areas, when possible.
-hrf factor, --hco-resched-factor factor
    Set the aggressiveness of the -hr optimization option according to a factor value between (0,1), where 0 is the least aggressive factor (applicable only with the -hr option).
-i, --inline
    Same as --selective-inline with --inline-small-funcs 12.
-ihf pct, --inline-hot-functions pct
    Inline all function call sites to functions that have a frequency count greater than the given pct frequency percentage.
-isf size, --inline-small-funcs size
    Inline all functions that are smaller than or equal to the given size in bytes.
-kr, --killed-registers
    Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls.
-lap, --load-address-propagation
    Eliminate load instructions of variable addresses by re-using pre-loaded addresses of adjacent variables.
-las, --load-after-store
    Add NOP instructions to place each load instruction further apart following a store instruction that references the same memory address.
-lro, --link-register-optimization
    Eliminate saves and restores of the link register in frequently-executed functions.
-lu aggressiveness_factor, --loop-unroll aggressiveness_factor
    Unroll short loops containing one to several basic blocks according to an aggressiveness factor between (1,9), where 1 is the least aggressive unrolling option for very hot and short loops.
-lun unrolling_number, --loop-unrolling-number unrolling_number
    Set the number of unrolled iterations in each unrolled loop. The allowed range is between (2,50). Default is set to 2. (Applicable only with the -lu flag).
-nop, --nop-removal
    Remove NOP instructions from reordered code.
-O
    Switch on basic optimizations only. Same as -RC -nop -bp -bf.
-O2
    Switch on less aggressive optimization flags. Same as -O -hr -pto -isf 8 -tlo -kr.
-O3
    Switch on aggressive optimization flags. Same as -O2 -RD -isf 12 -si -dp -lro -las -vro -btcar -lu 9 -rt 0 -so.
-O4
    Switch on aggressive optimization flags together with aggressive function inlining. Same as -O3 -sidf 50 -ihf 20 -sdp 9 -shci 90 and -bldcg (for XCOFF files).
-O5
    Switch on aggressive optimization flags together with HLR optimization. Same as -O4 -sa -gcpyp -gcnstp -dce -vrox.
-omullX, --mullX-optimization
    Optimize mullX instructions by adding a run-time check on RA and RB and performing equivalent operations with lower penalty. The optimization requires the use of -imullX in the instrumentation phase.
-pbsi, --path-based-selective-inline
    Perform selective inlining of dominant hot function calls based on the control flow paths leading to hot functions.
-pc, --preserve-csects
    Preserve CSects' boundaries in reordered code.
-pca, --propagate-constant-area
    Relocate the constant variables area to the top of the code section when possible.
-pfb, --preserve-first-bb
    Preserve original location of the entry point basic block in program.
-pp, --preserve-functions
    Preserve functions' boundaries in reordered code.
-[no]pr, --[no]ptrgl-r11
    Perform removal of R11 load instruction in _ptrgl csect.
-pto, --ptrgl-optimization
    Perform optimization of indirect call instructions via registers by replacing them with conditional direct jumps.
-ptoht heatness_threshold, --ptrgl-optimization-heatness-threshold heatness_threshold
    Set the frequency threshold for indirect calls that are to be optimized by -pto optimization. Allowed range is between 0 and 1. Default is set to 0.8. (Applicable only with the -pto flag).
-ptosl limit_size, --ptrgl-optimization-size-limit limit_size
    Set the limit of the number of conditional statements generated by -pto optimization. Allowed values are between 1 and 100. Default value is set to 3. (Applicable only with the -pto flag).
-RC, --reorder-code
    Perform code reordering.
-rcaf aggressiveness_factor, --reorder-code-aggressivenes-factor aggressiveness_factor
    Set the aggressiveness of code reordering optimization. Allowed values are [0 1 2], where 0 preserves the original code order and 2 is the most aggressive. Default is set to 1. (Applicable only with the -RC flag).
-rccrf reversal_factor, --reorder-code-condition-reversal-factor reversal_factor
    Set the threshold fraction that determines when to enable condition reversal for each conditional branch during code reordering. Allowed input range is between 0.0 and 1.0, where 0.0 tries to preserve original condition direction and 1.0 ignores it. Default is set to 0.8. (Applicable only with the -RC flag).
-rcctf termination_factor, --reorder-code-chain-termination-factor termination_factor
    Set the threshold fraction that determines when to terminate each chain of basic blocks during code reordering. Allowed input range is between 0.0 and 1.0, where 0.0 generates long chains and 1.0 creates single basic block chains. Default is set to 0.05. (Applicable only with the -RC flag).
-RD, --reorder-data
    Perform static data reordering.
-rmte, --remove-multiple-toc-entries
    Remove multiple TOC entries pointing to the same location in the input program file.
-rt removal_factor, --reduce-toc removal_factor
    Perform removal of TOC entries according to a removal factor between (0,1), where 0 removes non-accessed TOC entries only and 1 removes all possible TOC entries.
-rtb, --remove-traceback-tables
    Remove traceback tables in reordered code.
-sdp aggressiveness_factor, --stride-data-prefetch aggressiveness_factor
    Perform data prefetching within frequently executed loops based on stride analysis, according to an aggressiveness factor between (1,9), where 1 is the least aggressive.
-sdpla iterations_number, --stride-data-prefetch-look-ahead iterations_number
    Set the number of iterations for which data is prefetched into the cache ahead of time. Default value is set to 4 iterations. (Applicable only with the -sdp flag).
-sdpms stride_min_size, --stride-data-prefetch-min-size stride_min_size
    Set the minimal stride size in bytes, for which data will be considered a candidate for prefetching. Default value is set to 128 bytes. (Applicable only with the -sdp flag).
-see level
    Use simplified prolog/epilog for functions that perform conditional early-exit. Use basic optimization with level=0 and maximal with level=1.
-shci pct, --selective-hot-code-inline pct
    Perform selective inlining of functions in order to decrease the total number of execution counts, so that only functions with hotness above the given percentage are inlined.
-si, --selective-inline
    Perform selective inlining of dominant hot function calls.
-sidf percentage_factor, --selective-inline-dominant-factor percentage_factor
    Set a dominant factor percentage for selective inline optimization. The allowed range is between 0 and 100. Default is set to 80. (Applicable only with the -si and -pbsi flags).
-siht frequency_factor, --selective-inline-hotness-threshold frequency_factor
    Set a hotness threshold factor percentage for selective inline optimization to inline all dominant function calls that have a frequency count greater than the given frequency percentage. Default is set to 100. (Applicable only with the -si and -pbsi flags).
-slbp, --spinlock-branch-prediction
    Perform branch prediction bit setting for conditional branches in spinlock code containing l*arx and st*cx instructions. (Applicable after the -bp flag).
-sldp, --spinlock-data-prefetch
    Perform data prefetching for memory access instructions preceding spinlock code containing l*arx and st*cx instructions.
-sll Lib1:Prof1,...,LibN:ProfN, --static-link-libraries Lib1:Prof1,...,LibN:ProfN
    Statically link hot code from specified dynamically linked libraries to the input program. The parameter consists of a comma-separated list of libraries and their profiles. IMPORTANT: Licensing rights of specified libraries should be observed when applying this copying optimization.
-sllht hotness_threshold, --static-link-libraries-hotness-threshold hotness_threshold
    Set hotness threshold for the --static-link-libraries optimization. The allowed input range is between 0 (least aggressive) and 1, or -1, which does not require a profile and selects all code that might be called by the input program from the given libraries. Default is set at 0.5.
-so, --stack-optimization
    Reduce the stack frame size of functions that are called with a small number of arguments.
-spc, --shortcut-plt-calls
    Shortcut PLT calls in shared libraries to local functions if they exist. Note: Resolving to external symbols is disabled for such calls.
-stf, --stack-flattening
    Merge the stack frames of inlined functions with the frames of the calling functions.
-tb, --preserve-traceback-tables
    Force the restructuring of traceback tables in reordered code.
    If the -tb option is omitted, traceback tables are automatically included only for C++ applications that use the Try & Catch mechanism.
-tlo, --tocload-optimization
    Replace each load instruction that references the TOC with a corresponding add-immediate instruction via the TOC anchor register, where possible.
-ucde, --unreachable-code-data-elimination
    Remove unreachable code and non-accessed static data.
-vro, --volatile-registers-optimization
    Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers.
-vrox, --volatile-registers-extended-optimization
    Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers; the extended version supports FP registers and transparency.

General Options:

-h, --help
    Print online help.
-m machine-model, --machine machine-model
    Generate code for the specified machine model. Target machine can be one of the following models: power2, power3, ppc405, ppc440, power4, ppc970, power5, power6, ppe, spe, spe_edp, z10, z9. Default is set to no machine.
-q, --quiet
    Set quiet output mode, suppressing informational messages.
-st stat_file, --statistics stat_file
    Output statistics information to stat_file. If stat_file is '-', the output goes to standard output. See --verbose for the default.
-v level, --verbose level
    Set verbose output mode level. When set, various statistics about the target optimized program are printed into the file program.stat. Allowed level range is between 0 and 3. Default is set to 0.
-V, --version
    Print version.
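
The -O through -O5 entries above define each level as the previous level plus extra flags, so it can help to see what a level amounts to once fully expanded. The following is a minimal Python sketch of that expansion, transcribed only from those "Same as" descriptions. The names ALIASES and expand are hypothetical helpers for illustration and are not part of the tool. The sketch leaves out -bldcg, which the -O4 entry applies to XCOFF files only, and it keeps repeated options (for example -isf 8 from -O2 followed by -isf 12 from -O3) in order without assuming how the tool resolves them.

```python
# Alias table transcribed from the "Same as ..." text of the -O ... -O5
# entries. Hypothetical helper data, not an interface of the tool itself.
ALIASES = {
    "-O": ["-RC", "-nop", "-bp", "-bf"],
    "-O2": ["-O", "-hr", "-pto", "-isf", "8", "-tlo", "-kr"],
    "-O3": ["-O2", "-RD", "-isf", "12", "-si", "-dp", "-lro", "-las",
            "-vro", "-btcar", "-lu", "9", "-rt", "0", "-so"],
    # -bldcg is omitted here: the description limits it to XCOFF files.
    "-O4": ["-O3", "-sidf", "50", "-ihf", "20", "-sdp", "9", "-shci", "90"],
    "-O5": ["-O4", "-sa", "-gcpyp", "-gcnstp", "-dce", "-vrox"],
}


def expand(flag):
    """Return the flat list of tokens that an optimization level stands for.

    Tokens that are not optimization levels (plain flags and their numeric
    arguments) are passed through unchanged, in their original order.
    """
    result = []
    for token in ALIASES.get(flag, [flag]):
        if token != flag and token in ALIASES:
            result.extend(expand(token))  # lower level: expand recursively
        else:
            result.append(token)
    return result


if __name__ == "__main__":
    for level in ("-O", "-O2", "-O3", "-O4", "-O5"):
        print(level, "=>", " ".join(expand(level)))
```

Running the script prints one line per level with its full flag list, which makes it easier to compare a level against a hand-written set of individual options.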
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2006-2014 Standard Performance Evaluation Corporation
Tested with SPEC CPU2006 v1.1.
Report generated on Tue Jul 22 23:43:12 2014 by SPEC CPU2006 flags formatter v6906.