IBM XL Compiler Flags and Common Unix Commands and Environment Settings

Compilers: IBM XL C/C++ Enterprise Edition Version 8.0 for AIX

Compilers: IBM XL Fortran Enterprise Edition Version 10.1 for AIX

Compilers: IBM XL C/C++ Enterprise Edition Version 9.0 for AIX

Compilers: IBM XL Fortran Enterprise Edition Version 11.1 for AIX

Compilers: IBM XL C/C++ Version 10.1 for AIX

Compilers: IBM XL Fortran Version 12.1 for AIX

Compilers: IBM XL C/C++ Version 11.1 for AIX

Compilers: IBM XL Fortran Version 13.1 for AIX

Last updated: 11-May-2010

Sections


Optimization Flags
Portability Flags
Compiler Flags
Other Flags
System and Other Tuning Information

       The fdpr command (Feedback Directed Program Restructuring) is a performance-tuning utility that may help
       improve the execution time and the real memory utilization of user-level application programs. The fdpr program
       optimizes the executable image of a program by collecting information on the behavior of the program while the
       program is used for some typical workload, and then creating a new version of the program that is optimized for
       that workload. The new program generated by fdpr typically runs faster and uses less real memory.

Usage:
fdpr [options] -p program [-x invocation]
where -p specifies the input program, in the form of an executable, shared
object, or archive file
-x specifies how to invoke the program

/usr/lib/perf/fdprpro -a opt -p program -f profile [options]
  is equivalent to "fdpr -3 [options] -p program"
[options] can be one or more of the following:

 Action Options:

  -123  Specifies which actions/phases to run, where:
        -1  generates instrumented program for profile gathering
        -2  runs the instrumented program and updates profile data (requires -x <invocation>)
        -3  generates optimized program
        Default is set to run all three phases (-123)

  -a/--action [action]  Specifies customized actions
  where [action] can be one of the following:
        anl          analyze program
        instr        generate instrumented program for profile gathering (same as -1)
        opt          generate optimized program (same as -3)
        check_sign   check fdpr signature in the input program
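
The three phases can also be run one at a time. As a rough sketch, the Python
helper below only assembles the argument lists for each phase, following the
usage line above; it does not run fdpr. The function name
fdpr_phase_commands, the program name myprog, and the workload script are
hypothetical, and the exact form that -x expects for its invocation should be
checked against the fdpr documentation on your system.

```python
def fdpr_phase_commands(program, invocation):
    """Return one argv list per fdpr phase (-1, -2, -3).

    `invocation` is the workload command passed after -x; only phase 2
    receives it, because -2 requires -x <invocation>.
    """
    instrument = ["fdpr", "-1", "-p", program]
    profile = ["fdpr", "-2", "-p", program, "-x", *invocation]
    optimize = ["fdpr", "-3", "-p", program]
    return [instrument, profile, optimize]


# Hypothetical program and workload, for illustration only.
for argv in fdpr_phase_commands("myprog", ["./run_workload.sh"]):
    print(" ".join(argv))
```

Omitting the phase flag altogether corresponds to the default, -123, which
runs all three phases in a single fdpr call.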


 Analysis Options:
  -aawc/-noaawc, --analyze-assembly-written-csects/--noanalyze-assembly-written-csects
                         Analyze/Do not analyze objects written in Assembly. The
                         default is set to analyze modules written in Assembly
  -acf <analysis configuration file>, --analysis-configuration-file <analysis configuration file>
                         Provide a configuration file of analysis information
                         (advanced option)
  -asd, --analyze-static-data
                         Analyze static data objects as distinct data elements
                         for data reordering (unsafe for certain compilers)
  -esa, --extra-safe-analysis
                         Limit analysis phase to compiler generated code
  -fca, --funcsect-analysis
                         Apply special analysis for an input executable that was
                         compiled with the -qfuncsect compiler option
  -ff <string>, --file-format <string>
                         Input file format: can be LM (load module) or PO
                         (program object)
  -ifl <file>, --ignored-function-list <file>
                         Set the ignored function list. The file contains names
                         of functions that should not be instrumented or
                         optimized
  -iinf, --ignore-info   Ignore .info sections produced with the -qfdpr option
                         during compile time


 Instrumentation Options:

  -anl, --analyze-program
                         Analyze the program but do not create any modified
                         binary. This option is used to provide a dump of
                         profile/code coverage information. When used with the
                         -d option it will dump the disassembly of the original
                         program
  -ccf <coverage_file>, --code-coverage-file <coverage_file>
                         Use file mapped to shared memory to collect coverage
                         information at run-time
  -ccgi <mode>, --code-coverage-generate-info <mode>
                         Produce coverage information to given file based on
                         profile information. Use <mode>=XML for XML output and
                         <mode>=FLAT for flat formatted text file. The generated
                         file is <output file>.cci[.xml]
  -cci, --code-coverage-instrumentation
                         Instrument program in order to obtain code coverage
                         information. The program must be compiled with line
                         number debug info
  -ccl <level>, --code-coverage-level <level>
                         Perform code coverage at the basic block level (BB) or
                         at the function level (Func). The default is BB
  -ccm <coverage_map>, --code-coverage-map <coverage_map>
                         Defines the map file name of coverage instrumentation.
                         Default is <output file>.cc
  -ei, --embedded-instrumentation
                         Perform embedded instrumentation. The profile will be
                         collected into global variables
  -fd <Fdesc>, --file-descriptor <Fdesc>
                         Set the file descriptor number to be used when opening
                         the profile file. The default of <Fdesc> is set to the
                         maximum-allowed number of open files
  -imullX, --mullX-instrumentation
                         Perform value profiling of RA and RB operands in mullX
                         instructions
  -infp, --ignore-not-found-procedures
                         Ignore not found procedures
  -ipcr/-noipcr, --instrumentation-preserve-condition-register/--noinstrumentation-preserve-condition-register
                         Preserve/Do not preserve Condition Register while
                         calling stubs
  -ipctr/-noipctr, --instrumentation-preserve-count-register/--noinstrumentation-preserve-count-register
                         Preserve/Do not preserve Count Register while calling
                         stubs
  -ipe/-noipe, --instrumentation-preserve-environment/--noinstrumentation-preserve-environment
                         Do not preserve registers that are not overwritten while
                         calling stubs. -noipe implies -noipvr -noipspr
  -iplr/-noiplr, --instrumentation-preserve-link-register/--noinstrumentation-preserve-link-register
                         Preserve/Do not preserve link register while calling
                         stubs
  -ipnvr, --instrumentation-preserve-non-volatile-registers
                         Preserve non volatile registers while calling stubs
  -ipspr/-noipspr, --instrumentation-preserve-special-registers/--noinstrumentation-preserve-special-registers
                         Preserve/Do not preserve special purpose registers while
                         calling stubs
  -ipvr/-noipvr, --instrumentation-preserve-volatile-registers/--noinstrumentation-preserve-volatile-registers
                         Preserve/Do not preserve volatile registers while
                         calling stubs. -noipvr implies -noipnvr and -nosfp
  -ipxer/-noipxer, --instrumentation-preserve-fixed-point-exception-register/--noinstrumentation-preserve-fixed-point-exception-register
                         Preserve/Do not preserve Fixed-Point Exception Register
                         while calling stubs
  -issu, --instrumentation-safe-stack-usage
                         Ensure additional stack space is properly allocated for
                         the instrumented run. Use this option if your
                         application uses the stack extensively (e.g., when the
                         program uses alloca()). Note that this option adds
                         extra overhead to the instrumentation code
  -iso <offset>, --instrumentation-stack-offset <offset>
                         Set the offset from the stack, a negative number, where
                         the instrumentation's area for saving registers is kept
                         at runtime. Use with care
  -M <addr>, --profile-map <addr>
                         Set shared memory segment address for profiling.
                         Alternative shared memory addresses are needed when the
                         instrumented program application creates a conflict
                         with the shared-memory addresses preserved for the
                         profiling. Typical alternative values are 0x40000000,
                         0x50000000, ... up to 0xC0000000. The default is set to
                         0x30000000
  -pi, --profile-instrumentation
                         Instrument program in order to obtain execution count
                         profile
  -ri/-nori, --register-instrumentation/--noregister-instrumentation
                         Instrument/Do not instrument the input program file to
                         collect profile information about indirect branches via
                         registers. The default is set to collect the profile
                         information
  -sfp/-nosfp, --save-floating-point-registers/--nosave-floating-point-registers
                         Save/Do not save floating point registers in
                         instrumented code. The default is set to save floating
                         point registers
  -spescr <0-127>, --spe-scratch-register <0-127>
                         Specify a global SPE scratch register, decreasing
                         instrumentation overhead, in order to minimize the
                         possibility of local store overflow
  -ui, --user-instrumentation
                         Instrument program by inserting calls to user-supplied
                         functions compiled into a shared library
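
The alternative -M addresses named in the option description above can be
enumerated mechanically. The sketch below assumes that the values elided by
"..." follow the same 0x10000000 spacing as the two that are spelled out
(0x40000000 and 0x50000000); the function name is hypothetical.

```python
def alternative_profile_map_addresses():
    """List candidate -M values from 0x40000000 up to 0xC0000000.

    Assumption: the values elided in the help text are spaced
    0x10000000 apart, like the first two that it names.
    """
    step = 0x10000000
    return [f"0x{addr:08X}" for addr in range(0x40000000, 0xC0000000 + 1, step)]


for addr in alternative_profile_map_addresses():
    print("-M", addr)
```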


 Profile Files Options:

  -af <prof_file>, --ascii-profile-file <prof_file>
                         Set the name of an ASCII profile file containing profile
                         information. There are three different XML entry
                         options: <Simple .. >, <Cond .. > and <Reg .. > for
                         profiling data on regular, conditional or branch via
                         register instructions, respectively
  -aop, --accept-old-profile
                         Accept the old profile file collected on previous
                         versions of the input program file (requires the -f
                         flag)
  -f <prof_file>, --profile-file <prof_file>
                         Set the profile file name. The profile file is created
                         during the instrumentation phase and read during the
                         optimization phase. The profile file is updated each
                         time you run the instrumented program


 Optimization Options:

  -A <alignment>, --align-code <alignment>
                         Align program so that hot code will be aligned on
                         <alignment>-byte addresses
  -abb <factor>, --align-basic-blocks <factor>
                         Align basic blocks that are hotter than the average by a
                         given (float) <factor>. This is a lower-level
                         machine-specific alignment compared to --align-code.
                         Value of -1 (the default) disables this option
  -bh <factor>, --branch-hint <factor>
                         Add branch hints to basic blocks that are hotter than
                         the average by a given (float) <factor>. This is an
                         SPE-specific optimization. Value of -1 (the default)
                         disables this option
  -bldcg, --build-dcg    Build a Data Connectivity Graph (DCG) for enhanced data
                         reordering (applicable only with the -RD flag)
  -btcar, --branch-table-csect-anchor-removal
                         Eliminate load instructions used when accessing branch
                         tables
  -cbtd, --convert-bss-to-data
                         Convert BSS section into a data section. This is useful
                         for more aggressive tocload and RD optimizations
  -cib-opt, --convert-indirect-branches-optimization
                         Convert indirect branch to direct branch
  -cRD, --conservativeRD
                         Perform conservative static data reordering by packing
                         together all frequently referenced static variables
  -dce, --dead-code-elimination
                         Eliminate instructions related to unused local variables
                         within frequently executed functions. This is useful
                         mainly after applying function inlining optimization
  -dp, --data-prefetch   Insert data-cache prefetch instructions to improve
                         data-cache performance
  -dpht <threshold>, --data-placement-hotness-threshold <threshold>
                         Set data placement algorithm hotness threshold between
                         (0,1), where 0 reorders the static variables in large
                         groups based on the control flow, and 1 reorders the
                         variables in very small groups based on their access
                         frequency. (This is applicable only with the -RD flag)
  -dpnf <factor>, --data-placement-normalization-factor <factor>
                         Set data placement algorithm normalization factor
                         between (0,1), where 0 causes static variables to be
                         reordered regardless of their size, and 1 locates only
                         small sized variables first. (applicable only with the
                         -RD flag)
  -ece, --epilog-code-eliminate
                         Reduce code size by grouping common instructions in
                         function epilogs, into a single unified code
  -fc, --function-cloning
                         Enable function cloning phase only during function
                         inlining optimizations (applicable only with function
                         inlining flags: -i, -si, -ihf, -isf, -shci)
  -hr, --hco-reschedule  Relocate instructions from frequently executed code to
                         rarely executed code areas, when possible
  -hrf <factor>, --hco-resched-factor <factor>
                         Set the aggressiveness of the -hr optimization option
                         according to a factor value between (0,1), where 0 is
                         the least aggressive factor (applicable only with the
                         -hr option)
  -i, --inline           Same as --selective-inline with --inline-small-funcs 12
  -icm-opt, --icm-optimization
                         Replace a sequence of l/ltr or ly/ltr instructions with
                         an icm or icmy instruction respectively
  -ihf <pct>, --inline-hot-functions <pct>
                         Inline all function call sites to functions that have a
                         frequency count greater than the given <pct> frequency
                         percentage
  -iplte, --inline-plt-entries
                         Replace the call to a PLT entry with the PLT entry code
                         itself, by inlining the first part of the entry
  -isf <size>, --inline-small-funcs <size>
                         Inline all functions that are smaller than or equal to
                         the given <size> in bytes
  -kr, --killed-registers
                         Eliminate stores and restores of registers that are
                         killed (overwritten) after frequently executed function
                         calls
  -lal-opt, --load-after-load-optimization
                         Replace two load instructions from the same memory
                         location with one load instruction and one placement
                         instruction
  -lap, --load-address-propagation
                         Eliminate load instructions of variable addresses by
                         re-using pre-loaded addresses of adjacent variables
  -larl-opt, --larl-optimization
                         Replace a sequence of bras/const area/llgt instructions
                         with a single larl instruction
  -las, --load-after-store
                         Add NOP instructions to place each load instruction
                         further apart following a store instruction that
                         references the same memory address
  -ldce, --local-dead-code-optimization
                         Local dead code elimination (basic block scope only) -
                         needless when using -dce
  -ldp-opt, --long-displacement-optimization
                         Replace an instruction which has long displacement with
                         the matching instruction which has short displacement,
                         according to the displacement operand (e.g. ay-->a,
                         oy-->o, xy-->x, etc.)
  -lgfr-opt, --lgfr-optimization
                         Replace, where possible, a 32-bit instruction with its
                         matching 64-bit instruction
  -llgh-opt, --llgh-optimization
                         Replace a sequence of lh/nilh/llgfr instructions with a
                         single llgh instruction
  -lro, --link-register-optimization
                         Eliminate saves and restores of the link register in
                         frequently-executed functions
  -lu <aggressiveness_factor>, --loop-unroll <aggressiveness_factor>
                         Unroll short loops containing one to several basic
                         blocks according to an aggressiveness factor between
                         (1,9), where 1 is the least aggressive unrolling option
                         for very hot and short loops
  -lun <unrolling_number>, --loop-unrolling-number <unrolling_number>
                         Set the number of unrolled iterations in each unrolled
                         loop. The allowed range is between (2,50). Default is
                         set to 2. (Applicable only with the -lu flag)
  -mvc-opt, --mvc-optimization
                         Replace an mvc instruction with lg/stg instructions
  -nillr15-opt, --nillr15-optimization
                         Remove a nill r15,0xfffe instruction if followed by an
                         stmg r14,r12,8(r13) instruction
  -O                     Switch on basic optimizations only. Same as -RC -nop -bp
                         -bf
  -O2                    Switch on less aggressive optimization flags. Same as -O
                         -hr -pto -isf 8 -tlo -kr
  -O3                    Switch on aggressive optimization flags. Same as -O2 -RD
                         -isf 12 -si -dp -lro -las -vro -btcar -lu 9 -rt 0 -so
  -O4                    Switch on aggressive optimization flags together with
                         aggressive function inlining. Same as -O3 -sidf 50 -ihf
                         20 -sdp 9 -shci 90 and -bldcg (for XCOFF files)
  -O5                    Switch on aggressive optimization flags together with
                         HLR optimization. Same as -O4 -sa -gcpyp -gcnstp -dce
                         -vrox
  -omullX, --mullX-optimization
                         Optimize mullX instructions by adding a run-time check
                         on RA and RB and performing equivalent operations with
                         lower penalty. The optimization requires the use of
                         -imullX in the instrumentation phase
  -pbsi, --path-based-selective-inline
                         Perform selective inlining of dominant hot function
                         calls based on the control flow paths leading to hot
                         functions
  -pc, --preserve-csects
                         Preserve CSects' boundaries in reordered code
  -pca, --propagate-constant-area
                         Relocate the constant variables area to the top of the
                         code section when possible
  -pfb, --preserve-first-bb
                         Preserve original location of the entry point basic
                         block in program
  -pp, --preserve-functions
                         Preserve functions' boundaries in reordered code
  -pr/-nopr, --ptrgl-r11/--noptrgl-r11
                         Perform/Do not perform removal of R11 load instruction
                         in _ptrgl csect (the default is to perform the
                         optimization)
  -pto, --ptrgl-optimization
                         Perform optimization of indirect call instructions via
                         registers by replacing them with conditional direct
                         jumps
  -ptoht <heatness_threshold>, --ptrgl-optimization-heatness-threshold <heatness_threshold>
                         Set the frequency threshold for indirect calls that are
                         to be optimized by -pto optimization. Allowed range
                         between 0 and 1. Default is set to 0.8. (Applicable
                         only with -pto flag)
  -ptosl <limit_size>, --ptrgl-optimization-size-limit <limit_size>
                         Set the limit of the number of conditional statements
                         generated by -pto optimization. Allowed values are
                         between 1 and 100. Default value is set to 3.
                         (Applicable only with the -pto flag)
  -rcaf <aggressiveness_factor>, --reorder-code-aggressivenes-factor <aggressiveness_factor>
                         Set the aggressiveness of code reordering optimization.
                         Allowed values are [0 | 1 | 2], where 0 preserves the
                         original code order and 2 is the most aggressive.
                         Default is set to 1. (Applicable only with the -RC
                         flag)
  -rccrf <reversal_factor>, --reorder-code-condition-reversal-factor <reversal_factor>
                         Set the threshold fraction that determines when to
                         enable condition reversal for each conditional branch
                         during code reordering. Allowed input range is between
                         0.0 and 1.0 where 0.0 tries to preserve original
                         condition direction and 1.0 ignores it. Default is set
                         to 0.8 (Applicable only with the -RC flag)
  -rcctf <termination_factor>, --reorder-code-chain-termination-factor <termination_factor>
                         Set the threshold fraction that determines when to
                         terminate each chain of basic blocks during code
                         reordering. Allowed input range is between 0.0 and 1.0
                         where 0.0 generates long chains and 1.0 creates single
                         basic block chains. Default is set to 0.05. (Applicable
                         only with the -RC flag)
  -RD, --reorder-data    Perform static data reordering
  -rmte, --remove-multiple-toc-entries
                         Remove multiple TOC entries pointing to the same
                         location in the input program file
  -rt <removal_factor>, --reduce-toc <removal_factor>
                         Perform removal of TOC entries according to a removal
                         factor between (0,1), where 0 removes non-accessed TOC
                         entries only and 1 removes all possible TOC entries
  -rtb, --remove-traceback-tables
                         Remove traceback tables in reordered code
  -sal-opt, --store-after-load-optimization
                         Remove store after load when there is no change
  -sdp <aggressiveness_factor>, --stride-data-prefetch <aggressiveness_factor>
                         Perform data prefetching within frequently executed
                         loops based on stride analysis, according to an
                         aggressiveness factor between (1,9), where 1 is the
                         least aggressive
  -sdpla <iterations_number>, --stride-data-prefetch-look-ahead <iterations_number>
                         Set the number of iterations for which data is
                         prefetched into the cache ahead of time. Default value
                         is set to 4 iterations. (Applicable only with the -sdp
                         flag)
  -sdpms <stride_min_size>, --stride-data-prefetch-min-size <stride_min_size>
                         Set the minimal stride size in bytes, for which data
                         will be considered a candidate for prefetching. Default
                         value is set to 128 bytes. (Applicable only with the
                         -sdp flag)
  -shci <pct>, --selective-hot-code-inline <pct>
                         Perform selective inlining of functions in order to
                         decrease the total number of execution counts, so that
                         only functions with hotness above the given percentage
                         are inlined
  -si, --selective-inline
                         Perform selective inlining of dominant hot function
                         calls
  -sidf <percentage_factor>, --selective-inline-dominant-factor <percentage_factor>
                         Set a dominant factor percentage for selective inline
                         optimization. The allowed range is between 0 and 100.
                         Default is set to 80. (Applicable only with the -si and
                         -pbsi flags)
  -siht <frequency_factor>, --selective-inline-hotness-threshold <frequency_factor>
                         Set a hotness threshold factor percentage for selective
                         inline optimization to inline all dominant function
                         calls that have a frequency count greater than the
                         given frequency percentage. Default is set to 100.
                         (Applicable only with the -si and -pbsi flags)
  -slbp, --spinlock-branch-prediction
                         Perform branch prediction bit setting for conditional
                         branches in spinlock code containing l*arx and st*cx
                         instructions. (Applicable after -bp flag)
  -sldp, --spinlock-data-prefetch
                         Perform data prefetching for memory access instructions
                         preceding spinlock code containing l*arx and st*cx
                         instructions
  -sll <Lib1:Prof1,...,LibN:ProfN>, --static-link-libraries <Lib1:Prof1,...,LibN:ProfN>
                         Statically link hot code from specified dynamically
                         linked libraries to the input program. The parameter
                         consists of a comma-separated list of libraries and
                         their profiles. IMPORTANT: Licensing rights of
                         specified libraries should be observed when applying
                         this copying optimization
  -sllht <hotness_threshold>, --static-link-libraries-hotness-threshold <hotness_threshold>
                         Set hotness threshold for the --static-link-libraries
                         optimization. The allowed input range is between 0
                         (least aggressive) and 1, or -1, which does not require
                         a profile and selects all code that might be called by
                         the input program from the given libraries. Default is
                         set at 0.5
  -so, --stack-optimization
                         Reduce the stack frame size of functions that are called
                         with a small number of arguments
  -spc, --shortcut-plt-calls
                         Shortcut PLT calls in shared libraries to local
                         functions if they exist. Note: Resolving to external
                         symbols is disabled for such calls
  -stf, --stack-flattening
                         Merge the stack frames of inlined functions with the
                         frames of the calling functions
  -tb, --preserve-traceback-tables
                         Force the restructuring of traceback tables in reordered
                         code. If -tb option is omitted, traceback tables are
                         automatically included only for C++ applications that
                         use the Try & Catch mechanism
  -tlo, --tocload-optimization
                         Replace each load instruction that references the TOC
                         with a corresponding add-immediate instruction via the
                         TOC anchor register, where possible
  -ucde, --unreachable-code-data-elimination
                         Remove unreachable code and non-accessed static data
  -vro, --volatile-registers-optimization
                         Eliminate stores and restores of non-volatile registers
                         in frequently executed functions by using available
                         volatile registers
  -vrox, --volatile-registers-extended-optimization
                         Eliminate stores and restores of non-volatile registers
                         in frequently executed functions by using available
                         volatile registers, the extended version supports FP
                         registers and transparency
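
The -O, -O2, -O3, -O4 and -O5 descriptions above each build on the previous
level. As an illustrative sketch, the Python below flattens a level into the
individual flags that the help text says it stands for. The names
LEVEL_ADDS, LEVEL_ORDER and expand_level are hypothetical. Note that -isf
appears twice from -O3 upward (8 from -O2, then 12); the sketch keeps both in
order and makes no claim about how fdpr resolves the repetition.

```python
# Flags each -O level adds on top of the level below it, as listed above.
LEVEL_ADDS = {
    "-O": ["-RC", "-nop", "-bp", "-bf"],
    "-O2": ["-hr", "-pto", "-isf", "8", "-tlo", "-kr"],
    "-O3": ["-RD", "-isf", "12", "-si", "-dp", "-lro", "-las", "-vro",
            "-btcar", "-lu", "9", "-rt", "0", "-so"],
    "-O4": ["-sidf", "50", "-ihf", "20", "-sdp", "9", "-shci", "90"],
    "-O5": ["-sa", "-gcpyp", "-gcnstp", "-dce", "-vrox"],
}
LEVEL_ORDER = ["-O", "-O2", "-O3", "-O4", "-O5"]


def expand_level(level, xcoff=False):
    """Return the flat flag list that `level` stands for.

    `xcoff` adds -bldcg at -O4 and above, which the help text lists
    for XCOFF files only.
    """
    if level not in LEVEL_ADDS:
        raise ValueError(f"unknown optimization level: {level}")
    flags = []
    for name in LEVEL_ORDER[: LEVEL_ORDER.index(level) + 1]:
        flags.extend(LEVEL_ADDS[name])
        if name == "-O4" and xcoff:
            flags.append("-bldcg")
    return flags


print(" ".join(expand_level("-O3")))
```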


 Output Options:
  -bcdf <file>, --binary-code-dump-file <file>
                         Create a binary dump of the code (opcodes) with
                         annotations of addresses
  -cep, --complement-edge-profile
                         Complement partial profile information given for the
                         basic blocks' frequencies by adding missing basic
                         block-to-basic block edge counts
  -d, --disassemble-text
                         Print the disassembled text section of the output
                         program into <output_file>.dis_text file
  -dap, --dump-ascii-profile
                         Dump profile information in ASCII format into
                         <program>.aprof (requires the -f flag)
  -db, --disassemble-bss
                         Print the disassembled bss section of the output program
                         into <output_file>.dis_bss file
  -dd, --disassemble-data
                         Print the disassembled data section of the output
                         program into <output_file>.dis_data file
  -diap, --dump-initial-ascii-profile
                         Dump initial profile information in ASCII format into
                         <program>.aprof.init (requires the -f flag)
  -dim, --dump-instruction-mix
                         Dump instruction mix statistics based on gathered
                         profile information
  -dm, --dump-mapper     Print a map of basic blocks and static variables with
                         their respective new -> old addresses into a
                         <program>.mapper file
  -enc, --encapsulate    Encapsulate SPE executables present in the PPE input
                         (see --spe-directory)
  -o <output_file>, --output-file <output_file>
                         Set the name of the output file. The default
                         instrumented file is <program>.instr. The default
                         optimized file is <program>.fdpr
  -pif, --print-inlined-funcs
                         Print the list of inlined functions along with their
                         corresponding calling functions into a
                         <output_file>.inl_list file (requires the -si or -i or
                         -isf flags)
  -plc, --preserve-linkage-conventions
                         Preserve linkage conventions
  -ppcf, --print-prof-counts-file
                         Print the profiling counters in ASCII format into a
                         <program>.counts file (requires the -f flag)
  -sf, --strip-file      Strip the optimized output file
  -simo, --single-input-multiple-outputs
                         Optimize in parallel into multiple outputs as specified
                         by option sets read from stdin
  -spedir <directory>, --spe-directory <directory>
                         Set the directory into which SPE executables will be
                         extracted and from which they will be encapsulated
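
Several of the output options above write to files whose names are derived
from either <program> or <output_file>. The following Python sketch collects
those naming rules in one place. It assumes the names are formed by plain
string concatenation, exactly as the placeholders suggest; the function name
derived_output_files and its phase argument are hypothetical.

```python
def derived_output_files(program, output_file=None, phase="opt"):
    """Map dump/disassembly flags to the file names the help text gives.

    `phase` is "instr" or "opt" and selects the default output name
    (<program>.instr or <program>.fdpr) when -o is not given.
    """
    if phase not in ("instr", "opt"):
        raise ValueError("phase must be 'instr' or 'opt'")
    if output_file is None:
        output_file = program + (".instr" if phase == "instr" else ".fdpr")
    return {
        "-o": output_file,
        "-d": output_file + ".dis_text",
        "-db": output_file + ".dis_bss",
        "-dd": output_file + ".dis_data",
        "-pif": output_file + ".inl_list",
        "-dap": program + ".aprof",
        "-diap": program + ".aprof.init",
        "-dm": program + ".mapper",
        "-ppcf": program + ".counts",
    }


# Hypothetical program name, for illustration only.
for flag, path in derived_output_files("myprog").items():
    print(flag, path)
```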


 General Options:
  -cell, --cell-supervisor
                         Integrated PPE/SPE processing. Perform SPE extraction,
                         processing, and encapsulation automatically prior to
                         PPE processing
  -h, --help             Print online help
  -m <machine-model>, --machine <machine-model>
                         Generate code for the specified machine model. Target
                         machine can be one of the following models: power2,
                         power3, ppc405, ppc440, power4, ppc970, power5, power6,
                         power7, ppe, spe, spe_edp, zArch6, zArch5. Default is
                         set to no machine
  -q, --quiet            Set quiet output mode, suppressing informational
                         messages
  -st <stat_file>, --statistics <stat_file>
                         Output statistics information to <stat_file>. If
                         <stat_file> is '-', the output goes to standard output.
                         See --verbose for the default
  -v <level>, --verbose <level>
                         Set verbose output mode level. When set, various
                         statistics about the target optimized program are
                         printed into the file <program>.stat. Allowed level
                         range is between 0 and 3. Default is set to 0
  -V, --version          Print version
  -armember              For archive files: the list of archive members to be
                         optimized. If -armember is not specified, all members
                         are optimized
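
The value constraints stated for -m, -v and -st above can be checked before
fdpr is ever called. The Python sketch below is one possible way to do that;
the function name general_options is hypothetical, and it only builds an
argument list without running fdpr.

```python
MACHINE_MODELS = (
    "power2", "power3", "ppc405", "ppc440", "power4", "ppc970", "power5",
    "power6", "power7", "ppe", "spe", "spe_edp", "zArch6", "zArch5",
)


def general_options(machine=None, verbose=0, stat_file=None):
    """Build the -m / -v / -st part of an fdpr command line, with checks."""
    args = []
    if machine is not None:
        if machine not in MACHINE_MODELS:
            raise ValueError(f"unsupported machine model: {machine}")
        args += ["-m", machine]
    if not 0 <= verbose <= 3:
        raise ValueError("verbose level must be between 0 and 3")
    if verbose:
        args += ["-v", str(verbose)]
    if stat_file is not None:
        args += ["-st", stat_file]  # "-" sends statistics to standard output
    return args


print(" ".join(general_options(machine="power7", verbose=2, stat_file="-")))
```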