IBM Linux Flag Disclosure
SPEC OMP2001

For use with Linux submissions with the IBM XL compilers.

Last Revised 25 June, 2007

Notes
=====
The IBM C/C++ & Fortran compilers produce 32-bit binaries by default.
Flags are described below which cause the compilers to produce 64-bit
binaries.

Source Level Portability Options
================================

Compiler Invocation
===================
xlc
    Invokes the compiler for C source files with a default language
    level of ansi and specifies that it allow type-based aliasing.
xlc_r
    The same as "xlc" except that it generates a threadsafe executable,
    compliant with the POSIX pthreads API.
xlf90
    Invokes the compiler for Fortran source files with a default
    language of Fortran 90.
xlf90_r
    The same as "xlf90" except that it generates a threadsafe
    executable, compliant with the POSIX pthreads API.
cleanpdf
    Erase the information in the PDF directory if any exists to ensure
    no feedback information is reused between compilations.

Compiler Options
================
-O
    Performs optimizations that the compiler developers considered the
    best combination for compilation speed and runtime performance.
-O3
    Perform some memory and compile time intensive optimizations in
    addition to those executed with -O. The -O3 specific optimizations
    have the potential to slightly alter the semantics of a user's
    program. Optimizations may include, but are not limited to:
    aggressive code motion, and scheduling on computations that have
    the potential to raise an exception, but no valid exceptions will
    be suppressed; relaxed conformance to IEEE rules in cases where the
    difference in the results is not important to an application;
    rewriting of floating point expressions.
-O4
    Equivalent to -O3 -qipa -qhot with automatic generation of
    architecture (-qarch=) and tuning (-qtune=) options ideal for that
    platform. The qipa level defaults to level=1.
-O5
    Equivalent to -O3 -qipa=level=2 -qhot with automatic generation of
    architecture (-qarch=) and tuning (-qtune=) options ideal for that
    platform.
-Q, -qinline
    The -Q option without any list inlines all appropriate procedures,
    subject to limits on the number of inlined calls and the amount of
    code size increase as a result. -qinline is an alias for -Q.
-q64
    Selects 64-bit compiler mode.
-qalign=struct=natural, -qalign=natural
    The compiler maps structure members to their natural boundaries.
    The first form is used by the Fortran compiler; the second form is
    used by the C compiler and is a deprecated form for the Fortran
    compiler.
-qarch=pwr6
    Produces object code containing instructions that will run on
    power6 processors.
-qarch=pwr6e
    Produces object code containing instructions that will run on
    power6 processors executing in "Enhanced" mode which includes
    instructions that are part of the optional instructions in the
    PowerPC standard.
-qarch=auto
    Produces object code containing instructions that will run on the
    hardware platform on which the program is compiled.
-qessl
    Specifies that, if either -lessl or -lesslsmp are also specified,
    then Engineering and Scientific Subroutine Library (ESSL) routines
    should be used in place of some Fortran 90 intrinsic procedures
    when there is a safe opportunity to do so.
-qfixed
    Indicates that the input source program is in fixed form. Allows
    fixed format Fortran 77 programs to be compiled using the xlf90
    compiler invocation.
-qfixed=
    States that Fortran code is in fixed source form, with optional
    argument specifying the maximum line length.
-qhot
    Perform high-order transformations on loops during optimization.
-qhot=arraypad
    Pad the sizes of arrays to align better in cache.
-qipa=level=1
    Turns on interprocedural analysis with inlining, limited alias
    analysis, and limited call-site tailoring. This is the default
    level of -qipa.
-qipa=level=2
    Turns on interprocedural analysis with inlining, cloning, full
    alias analysis, constant propagation, call-site tailoring, and dead
    code removal.
-qipa=noobject
    Do not generate object files during the first stage of
    interprocedural analysis.
-qinline
    Alias for -Q. See -Q.
-qipa=partition=large
    Specifies the size of the regions within the program to analyze.
    Larger partitions contain more procedures, which result in better
    interprocedural analysis but require more storage to optimize.
-qmaxmem=-1
    Allows the compiler to use as much memory as it needs to execute.
-qpdf1
    The option used in the first pass of a profile directed feedback
    compile that causes pdf information to be generated.
-qpdf2
    The option used in the second pass of a profile directed feedback
    compile that causes pdf information to be utilized during
    optimization.
-qsmp=omp
    Enable OpenMP parallelization directives.
-qsuffix=f=f90
    Sets the suffix for source files to be .f90. The .f90 suffix is
    required by xlf90 to compile Fortran 90 programs.
-qsuppress=cmpmsg
    Suppress the output of the specified message(s). cmpmsg is the
    message put out at the compilation completion of each Fortran
    routine.
-qtune=pwr6
    Specifies the architecture system for which the executable program
    is optimized. This includes instruction scheduling and cache
    setting.
-qunroll=n
    Unrolls inner loops in the program by a factor of n.
-w
    Suppress warning messages from the C, C++, and Fortran compilers.

Linker Options
==============
-B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
    Enables the usage of the libhugetlbfs "ld" script in place of the
    normal linker. BDT will link the application to store text,
    initialized data, and bss data into hugepages.
-B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=B
    Enables the usage of the libhugetlbfs "ld" script in place of the
    normal linker. B will link the application to store bss data into
    hugepages.
-lessl
    Link the Engineering and Scientific Subroutine Library (ESSL).
-lmass
    Link the mathematical acceleration subsystem libraries (MASS),
    which contain libraries of tuned mathematical intrinsic functions.

Linux Environment Variables:
============================
HUGETLB_MORECORE=yes
    Enables the libhugetlbfs hugepage malloc() feature, instructing
    libhugetlbfs to override libc's normal morecore() function with a
    hugepage version and use it for malloc(). From the SourceForge
    libhugetlbfs Version 1 product.
LD_PRELOAD=libhugetlbfs.so
    This tells the dynamic linker to load the libhugetlbfs shared
    library, even though the program wasn't originally linked against
    it. Enables HUGETLB_MORECORE processing.
OMP_DYNAMIC=FALSE
    Disables dynamic adjustment of the number of available threads.
OMP_NUM_THREADS=...
    The exact number of threads available to be used, or if OMP_DYNAMIC
    is TRUE, the upper limit on the number of available threads.
XLFRTEOPTS=intrinthds={num_threads}
    Specifies the number of threads for parallel execution of the
    MATMUL and RANDOM_NUMBER intrinsic procedures. The default value
    for num_threads when using the MATMUL intrinsic equals the number
    of processors online. The default value for num_threads when using
    the RANDOM_NUMBER intrinsic equals twice the number of processors
    online. Changing the number of threads available to the MATMUL and
    RANDOM_NUMBER intrinsic procedures can influence performance.
XLSMPOPTS
    A list of runtime settings affecting SMP execution. Here are some
    of the possibilities:
    SCHEDULE=STATIC
        Work is scheduled to threads round-robin.
    SPINS=0
        Allows work-requests to spin indefinitely without the thread
        having to yield the time-slice.
    STACK=....
        Specifies the largest allowable size of a thread's stack, in
        bytes.
    YIELDS=0
        Allows the thread to yield an indefinite number of times
        without being driven into a sleep state.
    STARTPROC=n
        When assigning threads to processors, begin with processor n.
    STRIDE=X
        When assigning the next thread to a processor, add X to the
        current processor index instead of using (processor+1).

Stack Size Information:
=======================
Stack size set to unlimited using the command "ulimit -s unlimited".
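
As an illustration of the STARTPROC and STRIDE settings listed under
XLSMPOPTS, the following Python sketch computes which processor index
each thread would be assigned under the rule described there: the first
thread goes to the starting index, and each later thread goes X further
than the previous one. The function name bind_threads and the example
values are hypothetical and are not part of the IBM runtime. The sketch
does not model what the runtime does when an index exceeds the number
of available processors, since that is not described here.

```python
def bind_threads(num_threads, startproc, stride):
    """Return the processor index assigned to each thread.

    Hypothetical helper for illustration only: thread 0 is placed on
    `startproc`, and every later thread is placed `stride` processors
    after the previous one. No range checking or wrap-around is done.
    """
    return [startproc + i * stride for i in range(num_threads)]


if __name__ == "__main__":
    # Assumed example values: 4 threads, STARTPROC=0, STRIDE=2.
    for thread, proc in enumerate(bind_threads(4, 0, 2)):
        print(f"thread {thread} -> processor {proc}")
```

With STRIDE=1 the result matches the default (processor+1) placement;
a larger stride spreads consecutive threads across processor indices.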