IBM Open XL Compiler Flags, Common Operating System Commands and Environment Settings

IBM_Open_XL_AIX_flags-RevB

Compilers: IBM Open XL C/C++ for AIX 17.1.0

Compilers: IBM Open XL Fortran for AIX 17.1.0

Libraries: tcmalloc version 2.7.1 + IBM AIX enablement patch, available for download at: https://github.com/gilamn5tr/gperftools/archive/refs/heads/aix-enablement-upstream.zip

The tcmalloc library was built with IBM XL 16.1.0 using the compiler flags -O3 -qmaxmem=-1, where -qmaxmem=-1 permits each optimization to take as much memory as it needs without checking for limits.

Operating systems: AIX V7.2 Technology Level 5, Service Pack 3 running on Power10 in Power9 compatibility mode

Pass the optimization flag that is an argument to -bplugin_opt to the linker.

Searches the directory path for library files specified by the "-l" option at link time.

Enable use of scalar MASS library while using link time optimization (LTO).

Place uninitialized global variables in a common block.

The -bbigtoc option instructs the linker to generate extra code if the size of TOC is greater than 64K. For C/C++ programs it is invoked as -Wl,-bplugin_opt:-bbigtoc and for Fortran programs it is invoked as -bbigtoc.

<suboption> must be one of the following suboptions:

true : Enable simplification of compare instructions which rely on 'inbounds' semantics.
false : Disable simplification of compare instructions which rely on 'inbounds' semantics.

Default: -fold-complex-pointer-compare=true

Instructs the linker to search for the specified library file in the path specified by the -L option. For static and dynamic linking, the linker searches for libxxx.a.

The -blpdata option is a linker option that sets the bit in the file's XCOFF header indicating that this executable will request the use of large pages when they are available on the system and when the user has an appropriate privilege. For C/C++ programs it is invoked as -Wl,-blpdata and for Fortran programs it is invoked as -blpdata.

Link the Mathematical Acceleration Subsystem (MASS) libraries, which consist of a set of mathematical functions for C, C++ and Fortran language applications that are tuned for specific Power processor architectures.

Link the standard C library of basic mathematical functions such as sin(x), cos(x), exp(x) etc.

-lc++ Link the standard C++ library that supports C++11 features.

Link the standard POSIX threading library.

Enables use of certain functions from the MASS library for scalars and vectors.

-O3 Instruct the compiler to execute optimizations that might take longer to perform or that may generate larger code (in an attempt to make the program run faster).

Generates 64-bit ABI binaries. Used as a compiler option for C/C++ programs. If -m64 is not specified, the default is to generate 32-bit ABI binaries on AIX.

-m32 Generates 32-bit ABI binaries. Used as a compiler option for C/C++ programs.

-q32 Generates 32-bit ABI binaries. Used as a compiler option for Fortran programs.

-q64 Generates 64-bit ABI binaries. Used as a compiler option for Fortran programs. If -q64 is not specified, the default is to generate 32-bit ABI binaries on AIX.

Supported values for this flag are:

native - Automatically detects the specific architecture of the compiling machine. It assumes that the execution environment will be the same as the compilation environment.
pwr7 - Produces object code containing instructions that run on the Power7 hardware platform and on newer architectures.
pwr8 - Produces object code containing instructions that run on the Power8 hardware platform and on newer architectures.
pwr9 - Produces object code containing instructions that run on the Power9 hardware platform and on newer architectures.

Supported values for this flag are:

auto - Automatically detects the specific architecture of the compiling machine. It assumes that the execution environment will be the same as the compilation environment.
pwr7 - Produces object code containing instructions that run on the Power7 hardware platform and on newer architectures.
pwr8 - Produces object code containing instructions that run on the Power8 hardware platform and on newer architectures.
pwr9 - Produces object code containing instructions that run on the Power9 hardware platform and on newer architectures.

This option is used in the first pass of a profile guided optimization (PGO) compile and causes PGO information to be generated. The profile guided optimization gathers data on both execution path and data values. It does not use hardware counters, nor gather any data other than path and data values for PGO specific optimizations. Instructs the compiler to generate profiling instrumentation in the generated binary. This compiler option is used for C/C++ programs.

This option is used in the second pass of a profile guided optimization (PGO) compile. This flag instructs the compiler to use the profiling information in the file specified by the argument to -fprofile-use. This file summarizes profile information that was emitted into default*profraw files by the -fprofile-generate option. The profiling information is utilized to guide the compiler optimizer in making better optimization decisions. This compiler option is used for C/C++ programs.

This option is used in the first pass of a profile guided optimization (PGO) compile and causes PGO information to be generated. The profile guided optimization gathers data on both execution path and data values. It does not use hardware counters, nor gather any data other than path and data values for PGO specific optimizations. Instructs the compiler to generate profiling instrumentation in the generated binary. This compiler option is used for Fortran programs.

This option is used in the second pass of a profile guided optimization (PGO) compile. This flag instructs the compiler to use the profiling information in the file specified by the argument to -qprofile-use. This file summarizes profile information that was emitted into default*profraw files by the -fprofile-generate option. The profiling information is utilized to guide the compiler optimizer in making better optimization decisions. This compiler option is used for Fortran programs.

Passes the option name provided following -mllvm through the compiler frontend to the optimizer.

This flag enables automatic use of the MASS scalar library to call scalar functions in your program.

This flag enables automatic use of the MASS SIMD library in your program.

Causes the system loader to put the heap in its own segment of the size specified. This is only required for 32-bit applications, as their segments are 256 MB. A 32-bit application can be made to use a large or very large memory model. A large memory model allows up to 2 GB to be used for data and a very large memory model allows up to 3.25 GB to be used for data. Values above 2 GB must use the dsa (dynamic segment allocation) parameter, as in -bmaxdata:0xD0000000/dsa.

Pass on the DSCR (data stream control register) value to the linker. This value controls the aggressiveness of the hardware data prefetch engine.

Tries to convert a __dynamic_cast function call into an address comparison when possible. The default value is off.

This macro indicates that the host system's integers are 32 bits wide, and longs and pointers are 64 bits wide.

This macro indicates that the benchmark is being built on a PowerPC-based AIX system.

This macro is a SPEC provided portability flag.

ibm-clang Invoke the IBM Open XL C compiler.

ibm-clang
ibm-clang_r

ibm-clang_r is the thread-safe version of the ibm-clang compiler.

ibm-clang++_r Invoke the IBM Open XL C++ compiler.

ibm-clang++
ibm-clang++_r

ibm-clang++_r is the thread-safe version of the ibm-clang++ compiler.

xlf95_r

xlf95
xlf95_r

The xlf95_r invocation is the thread-safe version of the xlf95 compiler.

-fgnu89-inline Instructs the compiler to use the GNU semantics for "inline" functions.

-mabi=vec-extabi Enable the extended Altivec ABI on AIX, which uses volatile and nonvolatile vector registers.

-flto Generate output files in LLVM format, suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files. The LLVM bitcode generated is suitable for Link Time Optimization (LTO), when all modules are merged into a single combined module for optimization.

-qvecnvol Instructs the compiler to use both volatile and nonvolatile vector registers. Volatile vector registers are those whose value is not preserved across function calls or across save context, jump or switch context system library functions.

-qlto Generate output files in LLVM format, suitable for link time optimization. When used with -S this generates LLVM intermediate language assembly files, otherwise this generates LLVM bitcode format object files. The LLVM bitcode generated is suitable for Link Time Optimization (LTO), when all modules are merged into a single combined module for optimization.

-fno-strict-aliasing Instructs the compiler that it must not assume that the aliasing requirements from the language standard specification are honored by the program.

-data-layout-opt=3 This option instructs the linker to analyze the whole program to determine if the layout of data can be transformed to improve the cache utilization and memory bandwidth of the program. This option has four levels [0, 1, 2, 3]. This is effective only under -flto and must be specified on both the compilation and linking steps with the same option level.

data-layout-opt=0 disables the data layout transformation. No compiler analysis is performed for data layout optimization.
data-layout-opt=1 enables the data layout transformation with whole program analysis. This is performed with a strict safety analysis.
data-layout-opt=2 enables the data layout transformation with more aggressive whole program analysis. This is performed with a strict safety analysis.
data-layout-opt=3 enables the data layout transformation and data compression with whole program analysis. The compiler may implicitly change the default data type size to a non-default data type size.

-data-layout-opt=3 This option instructs the compiler to analyze the whole program to determine if the layout of data can be transformed to improve the cache utilization and memory bandwidth of the program. This option has four levels [0, 1, 2, 3]. This is effective only under -flto and must be specified on both the compilation and linking steps with the same option level.

data-layout-opt=0 disables the data layout transformation. No compiler analysis is performed for data layout optimization.
data-layout-opt=1 enables the data layout transformation with whole program analysis. This is performed with a strict safety analysis.
data-layout-opt=2 enables the data layout transformation with more aggressive whole program analysis. This is performed with a strict safety analysis.
data-layout-opt=3 enables the data layout transformation and data compression with whole program analysis. The compiler may implicitly change the default data type size to a non-default data type size.
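
As a rough illustration of why changing a data type's size can matter for cache utilization, the sketch below counts how many records fit in one cache line before and after their fields are narrowed. It is not the compiler's analysis: the 128-byte line size, the field widths, and the function name `elements_per_line` are all assumptions chosen for the example.

```python
# Hypothetical illustration only: not how the compiler performs its analysis.
CACHE_LINE_BYTES = 128  # assumed cache line size for this example


def elements_per_line(element_bytes: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Return how many whole elements of the given size fit in one cache line."""
    if element_bytes <= 0:
        raise ValueError("element size must be positive")
    return line_bytes // element_bytes


# A record with two 8-byte fields, versus the same record after both
# fields have been narrowed to 4 bytes (assumed to be value-preserving).
original_record_bytes = 8 + 8
compressed_record_bytes = 4 + 4

before = elements_per_line(original_record_bytes)
after = elements_per_line(compressed_record_bytes)
print(f"records per cache line: {before} -> {after}")
```

Under these assumed sizes, halving the record size doubles the number of records delivered by each cache line fill, which is the kind of effect on cache utilization and memory bandwidth that the option description refers to.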

-inline-hot-callsites-aggressively Enables aggressive heuristics to decide which frequently executed functions should be inlined.

-enable-partial-inlining Instructs the compiler to enable the partial inlining optimization.

-enable-partial-inlining Instructs the linker to enable the partial inlining optimization.

-enable-vec-find Allow the compiler to vectorize simple search loops if they operate on contiguous arrays.

-frestrict-args Allows the compiler to assume that function parameters of pointer type are restrict qualified.

-enable-aggressive-vectorization Instructs the compiler to enable aggressive heuristics for loop vectorization.

-enable-aggressive-vectorization Instructs the linker to enable aggressive heuristics for loop vectorization.

-ppc-enable-redxnintr Allow the compiler to generate PPC hardware reduction instructions for reductions without proving signed overflow safety.

-aggressive-late-full-unroll Instructs the compiler optimizer to use more aggressive heuristics for the loop unroll optimization when considering whether it is beneficial to fully unroll a loop.

-enable-nontrivial-unswitch=false Instructs the compiler optimizer to disable the loop unswitching transformation for complex loops.

-array-compress Instructs the compiler optimizer to compress a local array of integers into a smaller integer type when it is legal to do so (i.e. when all possible values stored by the program into the array elements are representable by a smaller integer type).

-enable-lvi-memoryssa Instructs the compiler optimizer to use an advanced value tracking analysis that leverages LLVM's "memory ssa" representation to track values stored into memory locations.
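
The legality condition given for -array-compress (every value stored into the array must be representable by a smaller integer type) can be pictured as a simple range check. The sketch below shows the idea only and is not the optimizer's implementation; the function name `smallest_signed_width` and the candidate widths are assumptions made for the example.

```python
# Hypothetical sketch of the "fits in a smaller integer type" condition.
SIGNED_WIDTHS = (8, 16, 32, 64)  # candidate integer widths in bits (assumed)


def smallest_signed_width(values):
    """Return the narrowest signed width (in bits) that can hold every value."""
    for bits in SIGNED_WIDTHS:
        lo = -(1 << (bits - 1))
        hi = (1 << (bits - 1)) - 1
        if all(lo <= v <= hi for v in values):
            return bits
    raise OverflowError("values do not fit in 64 bits")


# Values that a program might store into a local array declared as 32-bit int.
stored = [0, 17, -5, 120]
print(f"narrowest sufficient width: {smallest_signed_width(stored)} bits")

# A single out-of-range store is enough to rule out the narrower type.
print(f"with one larger value: {smallest_signed_width(stored + [300])} bits")
```

The check has to cover every value the program could store, not just the values seen in one run, which is why the description restricts the transformation to cases where it is legal to do so.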