Compilers: IBM XL C/C++ Version 13.1 for Linux
Compilers: IBM XL Fortran Version 15.1 for Linux
Libraries: IBM Advance Toolchain 7
Operating systems: Red Hat Enterprise Linux Server release 7
Last updated: 16-June-2014
Invoke the IBM XL C compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Allows almost any C dialect.
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Invoke the IBM XL C compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Allows almost any C dialect.
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Invoke the IBM XL C compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Allows almost any C dialect.
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Invoke the IBM XL C compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Allows almost any C dialect.
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default. Link with the IBM Advance Toolchain libraries.
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
This macro indicates that C functions called from Fortran should not have an underscore added to their names.
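The C side of this naming convention can be sketched as follows. The macro name `NO_FORTRAN_UNDERSCORE` and the routine `scale_array` are hypothetical, chosen only for illustration; the sketch assumes a Fortran compiler that emits external names in lowercase with a trailing underscore, and it takes every argument as a pointer because Fortran passes arguments by reference by default.

```c
/* NO_FORTRAN_UNDERSCORE is a hypothetical macro name used only in this sketch.
 * Defined:   the C symbol keeps its plain name.
 * Undefined: a trailing underscore is appended, matching a Fortran compiler
 *            that adds one to external names. */
#ifdef NO_FORTRAN_UNDERSCORE
#define FORTRAN_NAME(name) name
#else
#define FORTRAN_NAME(name) name##_
#endif

/* Hypothetical routine callable from Fortran as CALL SCALE_ARRAY(N, F, X).
 * Every argument is a pointer because Fortran passes by reference. */
void FORTRAN_NAME(scale_array)(const int *n, const double *factor, double *x)
{
    for (int i = 0; i < *n; i++) {
        x[i] *= *factor;
    }
}
```

With the macro undefined, the Fortran call above would resolve to the C symbol `scale_array_`; defining the macro makes the C symbol plain `scale_array` instead.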
Causes the compiler to treat "char" variables as signed instead of the default of unsigned.
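A minimal C sketch of why the signedness of plain `char` matters: the same stored byte reads back as a different integer depending on the setting. The function names are hypothetical; the -23 result assumes the usual two's-complement wrap-around that common compilers apply to this implementation-defined conversion.

```c
#include <limits.h>

/* Nonzero when plain char is a signed type in this build. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Stores the byte 0xE9 in a plain char and reads it back as an int.
 * Unsigned char (the default described above): 233.
 * Signed char: -23 on common two's-complement compilers, since converting
 * an out-of-range value to a signed type is implementation-defined. */
int plain_char_value(void)
{
    char c = (char)0xE9;
    return c;
}

/* Portable alternative: converting through unsigned char always yields
 * a value in 0..255, whichever signedness plain char has. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```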
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
This option indicates that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection is to apply the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
Indicates that the input Fortran source program is in fixed form.
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
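A minimal C sketch of checking the data model described above, at run time and optionally at compile time. `HYPOTHETICAL_LP64_MACRO` stands in for the benchmark's macro, whose actual name is not given here; the `_Static_assert` guard assumes a C11 compiler.

```c
#include <limits.h>

/* Nonzero when the build follows the LP64 model described above:
 * 32-bit int, 64-bit long, 64-bit pointers. */
int is_lp64(void)
{
    return sizeof(int) * CHAR_BIT == 32 &&
           sizeof(long) * CHAR_BIT == 64 &&
           sizeof(void *) * CHAR_BIT == 64;
}

#ifdef HYPOTHETICAL_LP64_MACRO
/* Compile-time guard (C11): stops the build if the macro is defined but the
 * compiler is not actually targeting LP64. */
_Static_assert(sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8,
               "LP64 macro defined for a non-LP64 build");
#endif
```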
This macro indicates that C functions called from Fortran should not have an underscore added to their names.
Causes the compiler to treat "char" variables as signed instead of the default of unsigned.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
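Auto-vectorization applies most readily to simple counted loops over contiguous arrays whose inputs and outputs do not overlap. A minimal C sketch of such a loop follows; whether a given compiler actually emits vector instructions for it depends on the target processor and flags, so treat it as an illustrative candidate rather than a guaranteed result. The function name `saxpy` is used here only as a conventional label.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. The restrict qualifiers promise that x and y do
 * not overlap, which removes a common obstacle to generating vector code. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```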
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
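Before relying on huge-page backing, it can help to confirm the huge page size the kernel reports. On Linux this appears as the `Hugepagesize:` line of `/proc/meminfo`. The sketch below parses that line from a text buffer (reading the file is left out to keep it self-contained); the function name `hugepagesize_kb` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns the huge page size in kB from /proc/meminfo-style text, or -1 if
 * the "Hugepagesize:" field is absent. A 16 MB page is reported as 16384 kB. */
long hugepagesize_kb(const char *meminfo)
{
    const char *p = strstr(meminfo, "Hugepagesize:");
    long kb;
    if (p == NULL || sscanf(p, "Hugepagesize: %ld kB", &kb) != 1) {
        return -1;
    }
    return kb;
}
```

A caller could read `/proc/meminfo` into a buffer with `fopen` and `fread` and pass that buffer in.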
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Causes the C++ compiler to generate Run-Time Type Identification (RTTI) code.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
qalias=ansi | noansi: If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
qalias=std | nostd: Indicates whether the compilation units contain any non-standard aliasing (see the Compiler Reference for more information). If so, specify nostd.
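The effect of type-based (ansi) aliasing can be illustrated in C. Under that rule the optimizer may assume that an `int *` and a `float *` never refer to the same object, so reinterpreting a `float` through an integer pointer is not safe, while copying its bytes with `memcpy` is. The function names below are hypothetical, and the sketch assumes `float` and `uint32_t` have the same size (true on the platforms discussed here).

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f. Writing *(uint32_t *)&f instead would access
 * a float object through an unrelated type, which type-based aliasing
 * assumes never happens. memcpy is well defined under ansi aliasing. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* assumes sizeof(float) == sizeof(uint32_t) */
    return u;
}

/* Because an int and a float are assumed not to overlap, the optimizer may
 * keep *i in a register across the store to *f and return 1 directly. */
int store_then_read(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}
```

Code that genuinely accesses the same storage through incompatible types is the kind of non-standard aliasing the text says should be compiled with noansi or nostd.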
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
qalias=ansi | noansi: If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
qalias=std | nostd: Indicates whether the compilation units contain any non-standard aliasing (see the Compiler Reference for more information). If so, specify nostd.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Used in the first pass of a profile-directed feedback (PDF) compile, this option causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
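Conceptually, the first pass builds a program that counts how often each path is taken on a training run, and the second pass uses those counts when optimizing (for example, to favor the hot side of a branch). The C sketch below is a toy model of that idea with hand-written counters; it is not how the XL compiler instruments code, and `struct branch_profile`, `count_large`, and `branch_is_hot` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical per-branch counters, standing in for the path data that an
 * instrumented first-pass build would record. */
struct branch_profile {
    unsigned long taken;
    unsigned long not_taken;
};

/* Counts values above a threshold while recording which way the branch went. */
size_t count_large(const int *v, size_t n, int threshold, struct branch_profile *prof)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold) {
            prof->taken++;
            count++;
        } else {
            prof->not_taken++;
        }
    }
    return count;
}

/* A second-pass style decision: nonzero if the branch is usually taken, so
 * the "taken" path would be the one to optimize for. */
int branch_is_hot(const struct branch_profile *prof)
{
    return prof->taken > prof->not_taken;
}
```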
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Used in the first pass of a profile-directed feedback (PDF) compile, this option causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags:
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Used in the first pass of a profile-directed feedback (PDF) compile, this option causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags:
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Causes the C++ compiler to generate Run-Time Type Identification (RTTI) code.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Used in the first pass of a profile-directed feedback (PDF) compile, this option causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compilation is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the system architecture for which the executable program is optimized. This includes instruction scheduling and cache settings.
The supported values for suboption are
Enables the generation of vector instructions for processors that support them.
The noprefetch option causes the compiler to generate no prefetch instructions and not to adjust the Data Streams Control Register (DSCR) when this program executes.
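To make the idea concrete, a software prefetch is a hint asking the hardware to start loading data into cache before it is needed. The C sketch below inserts such hints by hand with `__builtin_prefetch`, a GCC/Clang extension (an assumption about the toolchain; the option above neither uses nor requires it); the noprefetch option suppresses the compiler-generated counterpart of these hints. `PREFETCH_DISTANCE` is an arbitrary illustrative value, not a tuned setting.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* illustrative value, not a tuned setting */

/* Sums an array while hinting that the element PREFETCH_DISTANCE ahead will
 * be read soon. The hint never changes the result; it can only affect timing. */
long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 3 (high temporal locality) */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```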
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Used in the first pass of a profile-directed feedback (PDF) compile, this option causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compilation is being done. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the system architecture for which the executable program is optimized. This includes instruction scheduling and cache settings.
The supported values for suboption are
The prefetch=dscr option causes the Data Streams Control Register to be set to the value specified when executing this program.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Causes the Fortran compiler to allocate dynamic arrays on the heap instead of the stack.
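The distinction matters for large temporary arrays: stack space is usually limited (commonly 8 MB by default on Linux distributions), while heap allocations are bounded by available memory. The C sketch below shows the same idea done by hand, placing a work array on the heap with `malloc` instead of declaring a large local array; it is an analogy for what the Fortran option does automatically, and the function name is hypothetical.

```c
#include <stdlib.h>

/* Sums 1..n through a temporary work array allocated on the heap.
 * Returns -1.0 if the allocation fails. A local declaration such as
 * "double work[n];" would put the array on the stack instead, where a large
 * n can exceed the stack limit and crash the program. */
double sum_via_heap_array(size_t n)
{
    if (n == 0) {
        return 0.0;
    }
    double *work = malloc(n * sizeof *work);
    if (work == NULL) {
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        work[i] = (double)(i + 1);
    }
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += work[i];
    }
    free(work);
    return total;
}
```

With n = 2,000,000 the work array occupies about 16 MB, more than a typical default stack, yet the heap version completes normally.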
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
The prefetch=dscr option causes the Data Streams Control Register to be set to the value specified when executing this program.
The partition suboption specifies the size of the program sections that are analyzed together. Larger partitions may produce better analysis but require more storage. The default is medium.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
qalias=ansi | noansi: If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
qalias=std | nostd: Indicates whether the compilation units contain any non-standard aliasing (see the Compiler Reference for more information). If so, specify nostd.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
-qxlf90=<suboption> Determines whether the compiler provides the Fortran 90 or the Fortran 95 level of support for certain aspects of the language. <suboption> can be one of the following:
signedzero | nosignedzero: Determines how the SIGN(A,B) function handles signed real 0.0. In addition, determines whether negative internal values are prefixed with a minus sign when formatted output would produce a negative signed zero.
autodealloc | noautodealloc: Determines whether the compiler deallocates allocatable arrays that are declared locally without either the SAVE or the STATIC attribute and that have a status of currently allocated when the subprogram terminates.
oldpad | nooldpad: When the PAD= specifier is present in the INQUIRE statement, specifying -qxlf90=nooldpad returns UNDEFINED when there is no connection, or when the connection is for unformatted I/O. This behavior conforms with the Fortran 95 standard and later. Specifying -qxlf90=oldpad preserves the Fortran 90 behavior.
Default:
- signedzero, autodealloc, and nooldpad for the xlf95, xlf95_r, xlf95_r7, and f95 invocation commands.
- nosignedzero, noautodealloc, and oldpad for all other invocation commands.
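The SIGN(A,B) distinction can be modeled in C, where the sign bit of a double is tested with the standard `signbit` macro. The two functions below are a hypothetical model of the two suboption behaviors, not XL Fortran's implementation: with signedzero the sign bit of B decides, so SIGN(3.0, -0.0) yields -3.0; with nosignedzero, -0.0 compares equal to 0.0 and the result is 3.0.

```c
#include <math.h>

/* Model of SIGN(A,B) with signedzero: the sign bit of b decides, so a
 * negative zero in b produces -|a|. */
double sign_signedzero(double a, double b)
{
    double mag = signbit(a) ? -a : a;  /* |a| */
    return signbit(b) ? -mag : mag;
}

/* Model of SIGN(A,B) with nosignedzero: b is compared with 0.0, and since
 * -0.0 >= 0.0 is true, a negative zero in b produces +|a|. */
double sign_nosignedzero(double a, double b)
{
    double mag = signbit(a) ? -a : a;  /* |a| */
    return (b >= 0.0) ? mag : -mag;
}
```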
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Used in the first pass of a profile-directed feedback (PDF) compile, this option causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
Determines substitute path names for XL Fortran executables such as the compiler, assembler, linker, and preprocessor. It can be used in combination with the -t option, which determines which of these components are affected by -B.
Parameter | Description | Executable name |
---|---|---|
a | Assembler | as |
b | Low-level optimizer | xlfcode |
c | Compiler front end | xlfentry |
d | Disassembler | dis |
F | C preprocessor | cpp |
h | Array language optimizer | xlfhot |
I | High-level optimizer, compile step | ipa |
l | Linker | ld |
z | Binder | bolt |
Pass the --hugetlbfs-align flag to the linker so that the HUGETLB_ELFMAP environment variable can control which program segments are placed in huge pages.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Used in the first pass of a profile-directed feedback (PDF) compile, this option causes PDF information to be generated. The profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
The partition suboption specifies the size of the program sections that are analyzed together. Larger partitions may produce better analysis but require more storage. The default is medium.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker, causing the final executable to retain its relocation information.
The inline option specifies the threshold and limit of inlined functions.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
The option used in the first pass of a profile-directed feedback (PDF) compile; it causes PDF information to be generated. PDF optimization gathers data on both execution paths and data values. It does not use hardware counters, and it gathers no data other than the path and value information used for PDF-specific optimizations.
The option used in the second pass of a profile-directed feedback compile; it causes the collected PDF information to be used during optimization.
Indicates that a program designed to execute in a large-page memory environment can take advantage of the 16 MB large pages provided on POWER4 and later systems.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented at the link step.
-O4 is equivalent to the following flags:
The partition suboption specifies the size of the program sections that are analysed together. Larger partitions may produce better analysis but require more storage. The default is medium.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker so that the final executable retains its relocation information.
The inline option specifies the threshold and limit for function inlining.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
The option used in the first pass of a profile-directed feedback (PDF) compile; it causes PDF information to be generated. PDF optimization gathers data on both execution paths and data values. It does not use hardware counters, and it gathers no data other than the path and value information used for PDF-specific optimizations.
The option used in the second pass of a profile-directed feedback compile; it causes the collected PDF information to be used during optimization.
Indicates that a program designed to execute in a large-page memory environment can take advantage of the 16 MB large pages provided on POWER4 and later systems.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented at the link step.
-O4 is equivalent to the following flags:
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compilation is performed. "pwr5x" is the POWER5+ processor.
Supported values for this flag are:
Specifies the system architecture for which the executable program is optimized. Tuning covers instruction scheduling and cache settings.
The supported values for suboption are:
The partition suboption specifies the size of the program sections that are analysed together. Larger partitions may produce better analysis but require more storage. The default is medium.
Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker so that the final executable retains its relocation information.
The inline option specifies the threshold and limit for function inlining.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented at the link step. This level of optimization increases the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, plus the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
Enables the generation of vector instructions for processors that support them.
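The kind of loop that benefits from vector instruction generation has independent iterations and no aliasing between its arrays. The saxpy function below is a hypothetical example; restrict tells the compiler that x and y do not overlap. Whether VMX/VSX instructions are actually emitted depends on the target processor and optimization level, so treat this as a candidate loop rather than a guarantee.

```c
#include <stddef.h>

/* y[i] += a * x[i]. Each iteration is independent and the restrict-qualified
   pointers promise no overlap, so the loop can be split into vector chunks. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

A loop that reads y[i - 1] while writing y[i], or whose pointers may alias, would generally not be vectorized this way.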
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker so that the final executable retains its relocation information.
The inline option specifies the threshold and limit for function inlining.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Indicates that a program designed to execute in a large-page memory environment can take advantage of the 16 MB large pages provided on POWER4 and later systems.
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented at the link step. This level of optimization increases the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, plus the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags:
The partition suboption specifies the size of the program sections that are analysed together. Larger partitions may produce better analysis but require more storage. The default is medium.
The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
Link with libhugetlbfs.so. This enables the heap to be backed by 16 MB pages.
Pass the -q flag to the linker so that the final executable retains its relocation information.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
Suppresses the message with the message number specified.
This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented at the link step.
-O4 is equivalent to the following flags:
Performs high-order transformations on loops during optimization. The supported values for suboption are:
Specifying -qhot without suboptions implies -qhot=nosimd, -qhot=noarraypad, -qhot=vector and -qhot=level=1. The -qhot option is also implied by -O4 and -O5.
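Loop interchange is one example of the high-order loop transformations this option enables. The C sketch below shows the same reduction written in both loop orders; sum_column_order and sum_row_order are hypothetical names and N is an arbitrary size. C stores arrays row by row, so the interchanged nest walks consecutive memory. Interchanging a floating-point reduction reorders the additions, so the test data uses integer values, which double represents exactly, and both orders agree.

```c
#include <stddef.h>

#define N 64

/* Original nest: the inner loop walks down a column, striding N doubles. */
double sum_column_order(double a[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* Interchanged nest: the inner loop walks along a row, touching
   consecutive memory locations (better cache-line and prefetch use). */
double sum_row_order(double a[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}
```

A compiler performs this rewrite itself only when it can show, or is allowed to assume, that the change in evaluation order does not matter.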
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
level=0 performs only minimal interprocedural analysis and optimization.
level=1 turns on inlining, limited alias analysis, and limited call-site tailoring.
level=2 turns on full interprocedural data flow and alias analysis.
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor on which the compilation is performed. "pwr5x" is the POWER5+ processor.
Supported values for this flag are:
Specifies the system architecture for which the executable program is optimized. Tuning covers instruction scheduling and cache settings.
The supported values for suboption are:
Enables the generation of vector instructions for processors that support them.
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
level=0 performs only minimal interprocedural analysis and optimization.
level=1 turns on inlining, limited alias analysis, and limited call-site tailoring.
level=2 turns on full interprocedural data flow and alias analysis.
submit = numactl -l -C $BIND $command
submit = numactl --membind=\$SPECCOPYNUM --physcpubind=\$SPECCOPYNUM $command
submit = numactl -l -C $BIND $command
-l  Allocate memory from the local node of the CPU.
--membind=nodes  Allocate memory only from nodes. Allocation fails when there is not enough memory available on these nodes.
-C, --physcpubind=cpus  Execute the process only on cpus. This accepts physical CPU numbers as shown in the processor fields of /proc/cpuinfo.
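To check that a copy launched through a submit line like the ones above really is bound, a program can query its own CPU affinity. This sketch uses the Linux/glibc calls sched_getaffinity, CPU_COUNT and sched_getcpu; the helper names allowed_cpu_count and current_cpu are hypothetical.

```c
#define _GNU_SOURCE   /* exposes sched_getaffinity, CPU_COUNT and sched_getcpu */
#include <sched.h>

/* Number of CPUs this process may run on, or -1 on error. Under
   numactl --physcpubind=<cpus> this should equal the size of <cpus>. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* CPU the calling thread is executing on at this moment, or -1 on error. */
int current_cpu(void)
{
    return sched_getcpu();
}
```

Under numactl -C 0-3, for example, allowed_cpu_count() would be expected to return 4, and current_cpu() should always report one of CPUs 0 to 3.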
Usage:
- First, we copied the original executable (baseexe) to baseexe.orig.
- Then the executable is instrumented and its initial profile generated, as follows:
  $ fdprpro -a instr baseexe
  The output is generated (by default) in baseexe.instr and its profile in baseexe.nprof.
- Next, run baseexe.instr using the training data. This fills the profile file with information that characterizes the training workload.
- Finally, re-run FDPR-Pro with the profile file provided, as follows:
  $ fdprpro -a opt -f baseexe.nprof [optimization options] baseexe

fdprpro [options] -f profile program
where -f specifies the profile run data, program is the name of the executable, and [options] can be one or more of the following:

Action Options:
-a/--action [action]  Specifies customized actions, where [action] can be one of the following:
  anl  analyze program
  instr  generate instrumented program for profile gathering (same as -1)
  opt  generate optimized program (same as -3)
  check_sign  check fdpr signature in the input program
Action Options:
-anl, --analyze-program  Analyze the program but do not create a modified binary. This option is used to generate profile/code coverage reports in text format. When used with the -d option, it generates the disassembly of the original program.
-cci, --code-coverage-instrumentation  Instrument the program to obtain code coverage information. The program must be compiled with line-number debug info.
-pi, --profile-instrumentation  Instrument the program to obtain an execution count profile.
-ui, --user-instrumentation  Instrument the program by inserting calls to user-supplied functions compiled into a shared library.
Analysis Options:
-aawc/-noaawc, --analyze-assembly-written-csects/--noanalyze-assembly-written-csects  Analyze/Do not analyze objects written in assembly.
-acf <analysis configuration file>, --analysis-configuration-file <analysis configuration file>  Provide a configuration file of analysis information (advanced option).
-asd, --analyze-static-data  Analyze static data objects as distinct data elements for data reordering (unsafe for certain compilers).
-esa, --extra-safe-analysis  Limit the analysis phase to compiler-generated code.
-fca, --funcsect-analysis  Apply special analysis for an input executable that was compiled with the -qfuncsect compiler option.
-ff <string>, --file-format <string>  Input file format: can be LM (load module) or PO (program object).
-ifl <file>, --ignored-function-list <file>  Set the ignored function list. The file contains names of functions that are considered unsafe and thus are not modified.
-iinf, --ignore-info  Ignore .info sections produced with the -qfdpr option at compile time.
Instrumentation Options:
-ccl <level>, --code-coverage-level <level>  Perform code coverage at the basic block level (BB) or at the function level (FUNC). The default is BB.
-ei, --embedded-instrumentation  Perform embedded instrumentation. The profile is collected into the application's global data area. When the application terminates, the collected data is lost.
-fd <Fdesc>, --file-descriptor <Fdesc>  Set the file descriptor number to be used when opening the profile file. The default of <Fdesc> is the maximum allowed number of open files.
-icsp, --instr-call-site-profiling  Instrument each basic block in order to collect each caller context frequency.
-icvp, --instr-call-value-profiling  Instrument the values of parameters passed in function calls.
-imullX, --mullX-instrumentation  Perform value profiling of RA and RB operands in mullX instructions.
-iderat, --derat-instrumentation  Perform value profiling of RA and RB operands in load/store indexed instructions.
-infp, --ignore-not-found-procedures  Ignore not-found procedures defined in the instrumentation directives file and do not exit with an error.
-ipcr/-noipcr, --instrumentation-preserve-condition-register/--noinstrumentation-preserve-condition-register  Preserve/Do not preserve the condition register while calling stubs.
-ipctr/-noipctr, --instrumentation-preserve-count-register/--noinstrumentation-preserve-count-register  Preserve/Do not preserve the count register while calling stubs.
-ipe/-noipe, --instrumentation-preserve-environment/--noinstrumentation-preserve-environment  Do not preserve registers that are not overwritten while calling stubs. -noipe implies -noipvr -noipspr.
-iplr/-noiplr, --instrumentation-preserve-link-register/--noinstrumentation-preserve-link-register  Preserve/Do not preserve the link register while calling stubs.
-ipnvr, --instrumentation-preserve-non-volatile-registers  Preserve the non-volatile registers while calling stubs.
-ipspr/-noipspr, --instrumentation-preserve-special-registers/--noinstrumentation-preserve-special-registers  Preserve/Do not preserve the special-purpose registers while calling stubs.
-ipvr/-noipvr, --instrumentation-preserve-volatile-registers/--noinstrumentation-preserve-volatile-registers  Preserve/Do not preserve the volatile registers while calling stubs. -noipvr implies -noipnvr and -nosfp.
-ipxer/-noipxer, --instrumentation-preserve-fixed-point-exception-register/--noinstrumentation-preserve-fixed-point-exception-register  Preserve/Do not preserve the fixed-point exception register while calling stubs.
-issu, --instrumentation-safe-stack-usage  Ensure that additional stack space is properly allocated for the instrumented run. Use this option if your application uses the stack extensively (e.g., when the program uses alloca()). Note that this option adds extra overhead to the instrumentation code.
-iso <offset>, --instrumentation-stack-offset <offset>  Set the offset from the stack (a negative number) where the instrumentation's area for saving registers is kept at run time. Use with care.
-M <addr>, --profile-map <addr>  Set the shared memory segment address for profiling. Alternative shared memory addresses are needed when the instrumented application conflicts with the shared-memory addresses reserved for profiling. Typical alternative values are 0x40000000, 0x50000000, ... up to 0xC0000000. The default is 0x3000000.
-ptm, --profile-to-memory  Use a shared memory key instead of file mapping to obtain a shared memory area for the profile data.
-ri/-nori, --register-instrumentation/--noregister-instrumentation  Instrument/Do not instrument the input program file to collect profile information about indirect branches via registers. The default is to collect the profile information.
-sfp/-nosfp, --save-floating-point-registers/--nosave-floating-point-registers  Save/Do not save floating-point registers in instrumented code. The default is to save floating-point registers.
-shmkey <key number>, --shared-memory-key <key number>  Specify a shared memory key to use when creating a shared memory area for the profile. The default key is created by hashing the profile file name (with ftok).
-spescr <0-127>, --spe-scratch-register <0-127>  Specify a global SPE scratch register, decreasing instrumentation overhead, in order to minimize the possibility of local store overflow.
Profile Files Options:
-af <prof_file>, --ascii-profile-file <prof_file>  Set the name of a text-format profile file containing profile information.
-aop, --accept-old-profile  Accept an old profile file collected on previous versions of the input program file (requires the -f flag).
-f <prof_file>, --profile-file <prof_file>  Set the profile file name. The profile file is created during the instrumentation phase and read during the optimization phase. The profile file is updated each time you run the instrumented program.
-fdir <prof_file_dir>, --profile-file-directory <prof_file_dir>  Set the run-time location of the profile file. During the profiling phase the profile is searched for at this location. The default location is the path given in the profile file name (-f option). Applicable only in the instrumentation phase.
Optimization Options:
-A <alignment>, --align-code <alignment>  Specify the code alignment strategy. 1: use the grouping rules of the target machine (default); 2: same as 1, but also consider the hotness of branch targets. See -m for the selected machine model.
-abb <factor>, --align-basic-blocks <factor>  Align basic blocks that are hotter than the average by a given (float) <factor>. This is a lower-level, machine-specific alignment compared to --align-code. A value of -1 (the default) disables this option.
-bh <factor>, --branch-hint <factor>  Add branch hints to basic blocks that are hotter than the average by a given (float) <factor>. This is an SPE-specific optimization. A value of -1 (the default) disables this option.
-ccc <threshold>, --cold-code-connector <threshold>  Preserve the original order for code that is executed less frequently than the given threshold.
-bldcg, --build-dcg  Build a Data Connectivity Graph (DCG) for enhanced data reordering (applicable only with the -RD flag).
-cbpth, --cold-branch-prediction-threshold  Set the Cold Branch Prediction Threshold for branch prediction optimization. Branches whose execution count relative to the average is below this value are statically predicted. Allowed values are between (0,1). The default is -1 (the optimization is not applied). (Applicable only with the -bp flag.)
-bpth, --branch-prediction-threshold  Set the threshold for event-based branch prediction optimization.
-pbp, --preserve-branch-predication  Preserve the branch predication pattern (bc+8) and avoid code reordering and branch prediction.
-btcar, --branch-table-csect-anchor-removal  Eliminate load instructions used when accessing branch tables.
-cbsi, --chain-based-selective-inline  Perform selective inlining of functions that produce long hot chains of code.
-cbtd, --convert-bss-to-data  Convert the BSS section into a data section. This is useful for more aggressive tocload and RD optimizations.
-cib-opt, --convert-indirect-branches-optimization  Convert indirect branches to direct branches.
-cRD, --conservativeRD  Perform conservative static data reordering by packing together all frequently referenced static variables.
-dce, --dead-code-elimination  Eliminate instructions related to unused local variables within frequently executed functions. This is useful mainly after applying the function inlining optimization.
-dp, --data-prefetch  Insert data-cache prefetch instructions to improve data-cache performance.
-dpht <threshold>, --data-placement-hotness-threshold <threshold>  Set the data placement algorithm hotness threshold between (0,1), where 0 reorders the static variables in large groups based on the control flow, and 1 reorders the variables in very small groups based on their access frequency. (Applicable only with the -RD flag.)
-dpnf <factor>, --data-placement-normalization-factor <factor>  Set the data placement algorithm normalization factor between (0,1), where 0 causes static variables to be reordered regardless of their size, and 1 places only small variables first. (Applicable only with the -RD flag.)
-ece, --epilog-code-eliminate  Reduce code size by grouping common instructions in function epilogs into a single unified code sequence.
-fatc <num_of_bytes>, --fat-const <num_of_bytes>  Inflate constant areas in the code section by adding <num_of_bytes> (all set to 255) to each constant area.
-fatd <num_of_bytes>, --fat-data <num_of_bytes>  Inflate the data section by adding <num_of_bytes> (all set to 255) to each data basic unit.
-fatn <num_of_nops>, --fat-nop <num_of_nops>  Inflate the code section by adding <num_of_nops> NOPs to each code basic block.
-bined <binary_editor>, --binary-editor <binary_editor>  Edit existing binary code (advanced option).
-fc, --function-cloning  Enable the function cloning phase only during function inlining optimizations (applicable only with the function inlining flags -i, -si, -ihf, -isf, -shci).
-hr, --hco-reschedule  Relocate instructions from frequently executed code to rarely executed code areas, when possible.
-hrf <factor>, --hco-resched-factor <factor>  Set the aggressiveness of the -hr optimization according to a factor between (0,1), where 0 is the least aggressive (applicable only with the -hr option).
-tasr, --toc-anchor-store-reschedule  Relocate TOC store instructions from frequently executed code to rarely executed code areas, when possible.
-i, --inline  Same as --selective-inline with --inline-small-funcs 12.
-ia, --indirect-analysis  Perform indirect branch target analysis.
-icm-opt, --icm-optimization  Replace a sequence of l/ltr or ly/ltr instructions with an icm or icmy instruction, respectively.
-ihf <pct>, --inline-hot-functions <pct>  Inline all call sites to functions that have a frequency count greater than the given <pct> frequency percentage.
-iplte, --inline-plt-entries  Replace a call to a PLT entry with the PLT entry code itself, by inlining the first part of the entry.
-isf <size>, --inline-small-funcs <size>  Inline all functions that are smaller than or equal to the given <size> in bytes.
-kr, --killed-registers  Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls.
-lal-opt, --load-after-load-optimization  Replace two load instructions from the same memory location with one load instruction and one placement instruction.
-lap, --load-address-propagation  Eliminate load instructions of variable addresses by reusing pre-loaded addresses of adjacent variables.
-larl-opt, --larl-optimization  Replace a sequence of bras/const area/llgt instructions with a single larl instruction.
-las, --load-after-store  Add NOP instructions to move each load instruction further away from a preceding store instruction that references the same memory address.
-plas, --pattern-based-load-after-store  Optimize inefficient memory access patterns in order to avoid load-after-store events.
-ebplas, --event-based-pattern-based-load-after-store  Optimize inefficient memory access patterns in order to avoid load-after-store events. The optimization is possible if the PM_MRK_LSU_REJECT_LHS profile is available.
-rcl, --remove-constant-load  Reduce the number of load instructions used to bring constant values into registers. The parameter controls which version of the optimization is applied; versions 0 to 3 are available.
-ldce, --local-dead-code-optimization  Local dead code elimination (basic block scope only); unnecessary when using -dce.
-ldp-opt, --long-displacement-optimization  Replace an instruction that has a long displacement with the matching instruction that has a short displacement, according to the displacement operand (e.g. ay-->a, oy-->o, xy-->x, etc.).
-lgfr-opt, --lgfr-optimization  Replace, when possible, a 32-bit instruction with its matching 64-bit instruction.
-llgh-opt, --llgh-optimization  Replace a sequence of lh/nilh/llgfr instructions with a single llgh instruction.
-fce, --fix-cobol-entries  An optimization for COBOL code that fixes entries of CSECTs. Needed for HLR optimizations.
-pvgc <mode>, --print-visual-graph-csect <mode>  Print a .dot file with CFG information for each csect. Mode 0 produces a graph containing the full instruction list for each node; mode 1 produces a graph with short node descriptions.
-pvgf <mode>, --print-visual-graph-func <mode>  Print a .dot file with CFG information for each function. Mode 0 produces a graph containing the full instruction list for each node; mode 1 produces a graph with short node descriptions.
-lro, --link-register-optimization  Eliminate saves and restores of the link register in frequently executed functions.
-lu <aggressiveness_factor>, --loop-unroll <aggressiveness_factor>  Unroll short loops containing one to several basic blocks according to an aggressiveness factor between (1,9), where 1 is the least aggressive unrolling option for very hot and short loops.
-lun <unrolling_number>, --loop-unrolling-number <unrolling_number>  Set the number of unrolled iterations in each unrolled loop. The allowed range is between (2,50). The default is 2. (Applicable only with the -lu flag.)
-lux <unrolling_factor>, --loop-unroll-extended <unrolling_factor>  Unroll hot loops using the given unrolling factor. The allowed values are integers that are powers of 2. A value of -1 disables the optimization; a value of 1 calculates the unrolling factor automatically, given a machine model.
-mvc-opt, --mvc-optimization  Replace an mvc instruction with lg/stg instructions.
-nillr15-opt, --nillr15-optimization  Remove a nill r15,0xfffe instruction if it is followed by an stmg r14,r12,8(r13) instruction.
-sls, --store-load-on-stack-opt  Optimize the store-load-on-stack pattern.
-fmrx, --fmr-to-xxlor  Replace FMR instructions in reordered code with an XXLOR instruction.
-xscpx, --xscpsgndp-to-xxlor  Replace XSCPSGNDP instructions in reordered code with an XXLOR instruction.
-dir, --dependant-instr-resched  Put a NOP between dependent instructions.
-O  Switch on basic optimizations only. Same as -RC -nop -bp -bf.
-O2  Switch on less aggressive optimization flags. Same as -O -hr -pto -isf 8 -tlo -kr -see 0.
-O3  Switch on aggressive optimization flags. Same as -O2 -RD -isf 12 -si -lro -las -vro -btcar (for XCOFF files) -lu 9 -rt 0 -so -see 1 -oderat.
-O4  Switch on aggressive optimization flags together with aggressive function inlining. Same as -O3 -sidf 50 -ihf 20 -sdp 9 -shci 90 and -bldcg (for XCOFF files).
-ocvp, --opt-call-value-profiling  Specialize function calls according to the values of their passed parameters.
-ocsp, --opt-call-site-profiling  Cluster functions with similar behaviour according to calling context.
-omullX, --mullX-optimization  Optimize mullX instructions by adding a run-time check on RA and RB and performing equivalent operations with a lower penalty. The optimization requires the use of -imullX in the instrumentation phase.
-oderat, --derat-optimization  Optimize load/store indexed instructions by adding a run-time check on RA and RB and performing equivalent operations with a lower penalty. The optimization requires the use of -iderat in the instrumentation phase.
-pbsi, --path-based-selective-inline  Perform selective inlining of dominant hot function calls based on the control flow paths leading to hot functions.
-pc, --preserve-csects  Preserve CSects' boundaries in reordered code.
-pca, --propagate-constant-area  Relocate the constant variables area to the top of the code section when possible.
-pfb, --preserve-first-bb  Preserve the original location of the entry point basic block in the program.
-pp, --preserve-functions  Preserve functions' boundaries in reordered code.
-pr/-nopr, --ptrgl-r11/--noptrgl-r11  Perform/Do not perform removal of the R11 load instruction in the _ptrgl csect (the default is to perform the optimization).
-pto, --ptrgl-optimization  Optimize indirect call instructions via registers by replacing them with conditional direct jumps.
-ptoht <heatness_threshold>, --ptrgl-optimization-heatness-threshold <heatness_threshold>  Set the frequency threshold for indirect calls that are to be optimized by the -pto optimization. The allowed range is between 0 and 1. The default is 0.8. (Applicable only with the -pto flag.)
-ptosl <limit_size>, --ptrgl-optimization-size-limit <limit_size>  Set the limit on the number of conditional statements generated by the -pto optimization. Allowed values are between 1 and 100. The default is 3. (Applicable only with the -pto flag.)
-rcaf <aggressiveness_factor>, --reorder-code-aggressivenes-factor <aggressiveness_factor>  Set the aggressiveness of the code reordering optimization. Allowed values are [0 | 1 | 2], where 0 preserves the original code order and 2 is the most aggressive. The default is 1. (Applicable only with the -RC flag.)
-rccrf <reversal_factor>, --reorder-code-condition-reversal-factor <reversal_factor>  Set the threshold fraction that determines when to enable condition reversal for each conditional branch during code reordering. The allowed input range is between 0.0 and 1.0, where 0.0 tries to preserve the original condition direction and 1.0 ignores it. The default is 0.8. (Applicable only with the -RC flag.)
-rcctf <termination_factor>, --reorder-code-chain-termination-factor <termination_factor>  Set the threshold fraction that determines when to terminate each chain of basic blocks during code reordering. The allowed input range is between 0.0 and 1.0, where 0.0 generates long chains and 1.0 creates single-basic-block chains. The default is 0.05. (Applicable only with the -RC flag.)
-RD, --reorder-data  Perform static data reordering.
-ippcf, --instrument-for-path-profiling  Perform cross-function path profiling instrumentation.
-ppcf, --optimize-with-path-profiling  Perform cross-function path profiling optimization.
-rmte, --remove-multiple-toc-entries  Remove multiple TOC entries pointing to the same location in the input program file.
-rt <removal_factor>, --reduce-toc <removal_factor>  Remove TOC entries according to a removal factor between (0,1), where 0 removes non-accessed TOC entries only and 1 removes all possible TOC entries.
-rtb, --remove-traceback-tables  Remove traceback tables in reordered code.
-rcs, --remove-csect-symbols  Remove csect symbols.
-sal-opt, --store-after-load-optimization  Remove a store after a load when the value has not changed.
-scca <level>, --safe-calling-conventions-analysis <level>  Determine how conservative FDPR must be when analysing a function that may break calling conventions.
-sdp <aggressiveness_factor>, --stride-data-prefetch <aggressiveness_factor>  Perform data prefetching within frequently executed loops based on stride analysis, according to an aggressiveness factor between (1,9), where 1 is the least aggressive.
-sdpila <instructions_number>, --stride-data-prefetch-instruction-look-ahead <instructions_number>  Set the number of instructions for which data is prefetched into the cache ahead of time. The default value is platform dependent.
(Applicable only with the -sdp flag) -sdpms <stride_min_size>, --stride-data-prefetch-min-size <stride_min_size> Set the minimal stride size in bytes, for which data will be considered a candidate for prefetching. Default value is set to 128 bytes. (Applicable only with the -sdp flag) -ebp <evt_based_prefetch>, --event-based-prefetch <evt_based_prefetch> Perform data prefetching based on the events file -ebpla <instructions_number>, --event-based-prefetch-look-ahead <instructions_number> Set the number of instructions for which event based prefetch is performed. Default value is platform dependant. (Applicable only with the -ebp flag) -see <level> Use simplified prolog/epilog for functions that perform conditional early-exit. Use basic optimization with <level>=0 and maximal with <level>=1 -shci <pct>, --selective-hot-code-inline <pct> Perform selective inlining of functions in order to decrease the total number of execution counts, so that only functions with hotness above the given percentage are inlined -si, --selective-inline Perform selective inlining of dominant hot function calls -sidf <percentage_factor>, --selective-inline-dominant-factor <percentage_factor> Set a dominant factor percentage for selective inline optimization. The allowed range is between 0 and 100. Default is set to 80. (Applicable only with the -si and -pbsi flags) -siht <frequency_factor>, --selective-inline-hotness-threshold <frequency_factor> Set a hotness threshold factor percentage for selective inline optimization to inline all dominant function calls that have a frequency count greater than the given frequency percentage. Default is set to 100. (Applicable only with the -si -pbsi flags) -slbp, --spinlock-branch-prediction Perform branch prediction bit setting for conditional branches in spinlock code containing l*arx and st*cx instructions. 
(Applicable after -bp flag) -sldp, --spinlock-data-prefetch Perform data prefetching for memory access instructions preceding spinlock code containing l*arx and st*cx instructions -sll <Lib1:Prof1,...,LibN:ProfN>, --static-link-libraries <Lib1:Prof1,...,LibN:ProfN> Statically link hot code from specified dynamically linked libraries to the input program. The parameter consists of a comma-separated list of libraries and their profiles. IMPORTANT: Licensing rights of specified libraries should be observed when applying this copying optimization -sllht <hotness_threshold>, --static-link-libraries-hotness-threshold <hotness_threshold> Set hotness threshold for the --static-link-libraries optimization. The allowed input range is between 0 (least aggressive) and 1, or -1, which does not require a profile and selects all code that might be called by the input program from the given libraries. Default is set at 0.5 -so, --stack-optimization Reduce the stack frame size of functions that are called with a small number of arguments -spc, --shortcut-plt-calls Shortcut PLT calls in shared libraries to local functions if they exist. Note: Resolving to external symbols is disabled for such calls -stf, --stack-flattening Merge the stack frames of inlined functions with the frames of the calling functions -tb, --preserve-traceback-tables Force the restructuring of traceback tables in reordered code. 
If -tb option is omitted, traceback tables are automatically included only for C++ applications that use the Try & Catch mechanism -tlo, --tocload-optimization Replace each load instruction that references the TOC with a corresponding add-immediate instruction via the TOC anchor register, where possible -ucde, --unreachable-code-data-elimination Remove unreachable code and non-accessed static data -vro, --volatile-registers-optimization Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers -vrox, --volatile-registers-extended-optimization Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers, the extended version supports FP registers and transparency Output Options: -bcdf <file>, --binary-code-dump-file <file> Create a binary dump of the code (opcodes) with annotations of addresses. -ccgi <mode>, --code-coverage-generate-info <mode> Produce coverage information in a file based on profile information. Use <mode>=XML for an XML output and <mode>=FLAT for a formatted text file. The generated file is <output file>.cci[.xml] -cep, --complement-edge-profile Complements partial profile information given for the basic blocks' frequencies by adding missing basic block-to-basic block edge counts -d, --disassemble-text Print the disassembled text section of the output program into <output_file>.dis_text file -dap, --dump-ascii-profile Dump profile information in ASCII format into <program>.aprof (requires the -f flag). 
-db, --disassemble-bss Print the disassembled bss section of the output program into <output_file>.dis_bss file -dd, --disassemble-data Print the disassembled data section of the output program into <output_file>.dis_data file -diap, --dump-initial-ascii-profile Dump the given profile information in ASCII format into <program>.aprof.init (requires the -f flag) -dim, --dump-instruction-mix Dump instruction mix statistics based on gathered profile information -dm, --dump-mapper Print a map of basic blocks and static variables with their respective new -> old addresses into a <program>.mapper file -enc, --encapsulate Encapsulate SPE executables present in the PPE input (see --spe-directory) -o <output_file>, --output-file <output_file> Set the name of the output file. The default instrumented file is <program>.instr. The default optimized file is <program>.fdpr -scl, --show-constant-load Adds annotaions in fdpr disassembly on load instructions used to bring constant values into registers (requires -d flag) -pds, --preserve-debug-symbols Preserve debug symbols -plc, --preserve-linkage-conventions Preserve linkage conventions -ppcf, --print-prof-counts-file Print a text format of the profiling counters into a <program>.counts file (requires the -f flag). -sf, --strip-file Strip the output file -simo, --single-input-multiple-outputs Optimize in parallel into multiple outputs as specified by option sets read from stdin -spedir <directory>, --spe-directory <directory> Set the directory into which SPE executables will be extracted and from which they will be encapsulated General Options: -cell, --cell-supervisor Integrated PPE/SPE processing. 
Perform SPE extraction, processing, and encapsulation automatically prior to PPE processing -h, --help Print the online help -j <jour_file>, --journal <jour_file> Output optimization journal information to <jour_file> -smt, --smt_mode set SMT mode (1:ST, 2: (SMT2-shared, SMT2-split), 4:SMT4, 8:SMT8) -m <machine-model>, --machine <machine-model> Generate code for the specified machine model. Target machine can be one of the following models: power2, power3, ppc405, ppc440, power4, ppc970, power5, power6, power7, power8, ppe, spe, spe_edp, z10, z9. Default is power7 -q, --quiet Set the output mode to quiet, suppressing informational messages -st <stat_file>, --statistics <stat_file> Output statistics information to <stat_file>. If <stat_file> is '-', the output goes to the standard output. See --verbose for the default -v <level>, --verbose <level> Set verbose output mode level. When set, various statistics about the output program are printed into the file <program>.stat. Allowed level range is between 0 and 3. Default is set to 0 -V, --version Print the version number -w <level>, --warning-level <level> Set the warning level so only errors of this level and below will be printed. The levels are: 1: errors, 2: warnings, 3: debug warning, 4: debug information. Default is 2 -armember For archive files - list of archive members to be optimized, if -armember is not specified, all members will be optimized
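The -O levels above are shorthands for fixed sets of FDPR flags, each level building on the previous one. As an illustration only, the hypothetical shell helper fdpr_expand_level (not part of FDPR) transcribes those descriptions so the full expansion of a level can be printed. It assumes the caller states whether the input is an XCOFF file, because -btcar and -bldcg are listed for XCOFF files only; where a flag such as -isf or -see appears twice, the helper simply keeps both occurrences in order, and which one FDPR honors is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FDPR): print the flags that an FDPR
# optimization level stands for, transcribed from the option list.
# Usage: fdpr_expand_level -O|-O2|-O3|-O4 [elf|xcoff]
fdpr_expand_level() {
  local level=$1 format=${2:-elf} flags
  case $level in
    -O)
      flags="-RC -nop -bp -bf" ;;
    -O2)
      flags="$(fdpr_expand_level -O "$format") -hr -pto -isf 8 -tlo -kr -see 0" ;;
    -O3)
      flags="$(fdpr_expand_level -O2 "$format") -RD -isf 12 -si -lro -las -vro"
      # -btcar is listed for XCOFF files only
      if [ "$format" = xcoff ]; then flags="$flags -btcar"; fi
      flags="$flags -lu 9 -rt 0 -so -see 1 -oderat" ;;
    -O4)
      flags="$(fdpr_expand_level -O3 "$format") -sidf 50 -ihf 20 -sdp 9 -shci 90"
      # -bldcg is listed for XCOFF files only
      if [ "$format" = xcoff ]; then flags="$flags -bldcg"; fi ;;
    *)
      printf 'unknown level: %s\n' "$level" >&2
      return 1 ;;
  esac
  # printf instead of echo so flags starting with "-" are never parsed as options
  printf '%s\n' "$flags"
}
```

For example, fdpr_expand_level -O3 xcoff prints the -O2 set followed by -RD -isf 12 -si -lro -las -vro -btcar -lu 9 -rt 0 -so -see 1 -oderat. Building each level on the output of the previous one mirrors the "Same as -O2 ..." wording above, so a change to a lower level propagates upward.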
XLFRTEOPTS=intrinthrds=1 : Causes the Fortran runtime to use only a single thread when executing intrinsic procedures.
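XLFRTEOPTS is read by the XL Fortran runtime when the program starts, so it can be scoped to a single run instead of being exported for the whole session. A minimal sketch, assuming a bash-compatible shell and the colon-separated XLFRTEOPTS syntax used by XL Fortran runtime options; run_xlf_single_thread is a hypothetical wrapper name, and any runtime options already present in XLFRTEOPTS are kept.

```bash
# Hypothetical wrapper: run one command with intrinthrds=1 appended to
# XLFRTEOPTS. Existing runtime options are kept (assumed to be separated
# by colons); the calling shell's XLFRTEOPTS is left unchanged.
run_xlf_single_thread() {
  XLFRTEOPTS="${XLFRTEOPTS:+$XLFRTEOPTS:}intrinthrds=1" "$@"
}

# Example (./myprog is a placeholder for an XL Fortran executable):
#   run_xlf_single_thread ./myprog
```

Using a prefix assignment rather than export means the setting only reaches the launched command, which keeps other tools run from the same shell unaffected.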
echo 200 > /proc/sys/vm/nr_hugepages
or
echo 200 > /proc/sys/vm/nr_overcommit_hugepages
to allocate from the dynamic hugepage pool (the first command reserves a persistent pool of 200 huge pages; the second lets the kernel allocate up to 200 surplus huge pages on demand). This can be done on both Red Hat Enterprise Linux and SUSE Linux Enterprise, since both distributions implement Transparent Huge Pages (THP). THP is an abstraction layer that automates most aspects of creating, managing and using huge pages; currently it can only map anonymous memory regions such as heap and stack space.
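After writing to these files it is worth reading back what the kernel actually reserved, because the persistent pool can end up smaller than requested when memory is fragmented. A minimal sketch with three hypothetical helpers: set_hugepage_pool wraps the two echo commands above (writing under /proc/sys requires root), hugepage_status summarizes the HugePages_Total, HugePages_Free and Hugepagesize fields of /proc/meminfo, and thp_mode prints the active THP mode. The optional file arguments exist only so the parsing can be tried on a saved copy; the THP path is the upstream sysfs location, and some older distribution kernels use a different one.

```bash
# Hypothetical helpers for the huge page settings described above.

# Set the persistent pool (nr_hugepages) or the surplus limit
# (nr_overcommit_hugepages). Writing to /proc/sys requires root.
set_hugepage_pool() {
  local kind=$1 count=$2 knob
  case $kind in
    persistent) knob=/proc/sys/vm/nr_hugepages ;;
    overcommit) knob=/proc/sys/vm/nr_overcommit_hugepages ;;
    *) echo "kind must be 'persistent' or 'overcommit'" >&2; return 2 ;;
  esac
  case $count in
    ''|*[!0-9]*) echo "count must be a non-negative integer" >&2; return 2 ;;
  esac
  echo "$count" > "$knob"
}

# Summarize the huge page counters from /proc/meminfo (or a copy of it).
hugepage_status() {
  awk '/^HugePages_Total:/ { total = $2 }
       /^HugePages_Free:/  { free = $2 }
       /^Hugepagesize:/    { size = $2 " " $3 }
       END { printf "total=%s free=%s size=%s\n", total, free, size }' \
    "${1:-/proc/meminfo}"
}

# Print the active THP mode, i.e. the bracketed word in a line such as
# "always [madvise] never".
thp_mode() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' \
    "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

A typical sequence would be set_hugepage_pool persistent 200 followed by hugepage_status; if the reported total is below 200, the reservation can be retried, ideally early after boot when memory is less fragmented.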
You can also use the environment variables below to control huge page behavior:
HUGETLB_VERBOSE=0 : Turns off all debugging messages from libhugetlbfs.
HUGETLB_MORECORE=yes : Instructs libhugetlbfs to override libc's normal morecore() function with a huge page version and use it for malloc().
HUGETLB_MORECORE_HEAPBASE=0x50000000 : Specifies that the huge page heap starts at address 0x50000000.
HUGETLB_ELFMAP=R : Instructs libhugetlbfs to place the text segment in huge pages.
HUGETLB_ELFMAP=W : Instructs libhugetlbfs to place the data and BSS segments in huge pages.
HUGETLB_ELFMAP=RW : Instructs libhugetlbfs to place all segments in huge pages.
HUGETLB_ELFMAP=no : Instructs libhugetlbfs not to place any segment in huge pages.
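The libhugetlbfs variables above only take effect when the library is loaded into the program. A minimal sketch, assuming libhugetlbfs is installed and is preloaded with LD_PRELOAD, the approach the libhugetlbfs documentation describes for unmodified binaries; run_with_hugepages is a hypothetical wrapper name. Segment remapping through HUGETLB_ELFMAP generally also requires the binary to be linked with the libhugetlbfs linker options, so for an unmodified binary the HUGETLB_MORECORE heap setting is the part most likely to have an effect.

```bash
# Hypothetical wrapper: run a command with libhugetlbfs preloaded so that
# malloc() is served from huge pages (HUGETLB_MORECORE=yes), libhugetlbfs
# messages are silenced, and the given HUGETLB_ELFMAP mode is set.
# Assumes the dynamic loader can find libhugetlbfs.so.
run_with_hugepages() {
  local elfmap=$1
  shift
  case $elfmap in
    R|W|RW|no) ;;
    *) echo "HUGETLB_ELFMAP must be R, W, RW or no" >&2; return 2 ;;
  esac
  HUGETLB_VERBOSE=0 \
  HUGETLB_MORECORE=yes \
  HUGETLB_ELFMAP="$elfmap" \
  LD_PRELOAD="libhugetlbfs.so${LD_PRELOAD:+ $LD_PRELOAD}" \
    "$@"
}

# Example (./myprog is a placeholder):
#   run_with_hugepages no ./myprog
```

An explicit heap start address can be added the same way by prefixing HUGETLB_MORECORE_HEAPBASE=0x50000000 to the call.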
Enables idle-mode power reduction methods when the system is in Nominal, Static Power Saver (SPS) or Dynamic Power Saver (DPS) mode: frequency and voltage are dropped, and core folding is enabled to exploit the POWER7+ deep-sleep mode. The goal is to meet the idle power criteria of the EnergyStar v1 and v2 tests:
1. Enter “idle” 5-15 minutes after the full-load test.
2. Measure system power for 5 minutes.
3. Report system power as the average measured in step 2.
4. If the value reported in step 3 is below a “computed threshold”, the system qualifies for the EnergyStar label.
To enable or disable Idle Power Saver, log in to the Advanced System Management web interface, go to System Configurator -> Power Management and click Idle Power Saver. Select enabled or disabled in the combo box and click Save settings.
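Steps 2 to 4 amount to averaging the power readings taken during the measurement window and comparing that average with the threshold. A minimal sketch, assuming the readings have already been exported as one wattage value per line and that the computed threshold is supplied by hand (neither how the readings are collected nor how the threshold is computed is described here); idle_power_check is a hypothetical helper.

```bash
# Hypothetical helper: average idle power readings (one value in watts per
# line on stdin) and compare the average with a threshold in watts.
# Exit status: 0 if the average is below the threshold, 1 if it is not,
# 2 if no readings were given.
idle_power_check() {
  awk -v thr="$1" '
    BEGIN { thr += 0 }                 # force a numeric threshold
    NF    { sum += $1; n++ }           # skip blank lines
    END {
      if (n == 0) { print "no readings" > "/dev/stderr"; exit 2 }
      avg = sum / n
      below = (avg < thr)
      printf "average=%.1f W threshold=%.1f W: %s\n", avg, thr,
             (below ? "below threshold" : "not below threshold")
      exit (below ? 0 : 1)
    }'
}
```

Returning the result as an exit status as well as a message lets the check be used directly in a script condition.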
Flag description origin markings:
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2006-2017 Standard Performance Evaluation Corporation
Tested with SPEC CPU2006 v1.2.
Report generated on Wed Dec 20 18:27:43 2017 by SPEC CPU2006 flags formatter v6906.