===========================================================
HP-UX Flag Descriptions for OMP2001 and OMPL2001 - July 2002
===========================================================

-----------------------------------------------
Specific Flags for HP-UX F90 Compiler
-----------------------------------------------

+cat
    Concatenates all source files of the same source form
    together, then compiles the concatenated source all at once.
    This enables inlining at +O3 within the concatenated file.

-----------------------------------------------
Specific Flags for HP-UX aCC Compiler
-----------------------------------------------

-Ae
    Turns on ANSI C (C89) mode. This option allows C89-compatible
    C source programs to be compiled just as the C compiler would
    compile them.

-AOe
    In addition to specifying the extended ANSI C language dialect
    as per -Ae (the default), allows the optimizer to aggressively
    exploit the assumption that the source code conforms to the
    ANSI programming language C standard ISO 9899:1990 plus the
    extensions. At present, the effect is to make
    +Otype_safety=ansi the default (it can of course be
    overridden). As new independently-controllable optimizations
    are developed that depend on the extended ANSI C standard, the
    flags that enable those optimizations may also become the
    default under -AOe.

-----------------------------------------------------------------
Common Flags for HP-UX F90 Compiler, C Compiler and aCC Compiler
-----------------------------------------------------------------

Note on +O
    All +O options are parsed as if they contained no '_'
    characters, although '_' characters are inserted for
    readability and documented as such in man pages and documents.
    For example, +Oinline_budget and +Oinlinebudget are the same
    option.

+Ofaster
    Selects the +Ofast option at optimization level +O4. Must be
    used with +P (+Oprofile=use); otherwise the optimization level
    drops to +O3.

+Ofast
    Select a combination of compilation options for optimum
    execution speed at build times.
    Currently: +O2, +Olibcalls, +Onolimit, +Onofltacc, +FPD,
    +DSnative (on IPF), and +Oshortdata.

+Oall
    Apply maximum optimization to achieve the best runtime
    performance. This option is equivalent to specifying
    +Oaggressive and +Onolimit on the same command line. The +Oall
    option automatically invokes the highest level of optimization
    (+O3 for F90, +O4 for C and aCC).

+Olevel
    Invoke optimizations selected by level. These can be preceded
    by either +O or -O. Defined values for level are:

    0   Perform no optimizations.
    1   Perform optimizations within basic blocks only. This is
        the default.
    2   Perform level 1 and global optimizations. Same as -O and
        +O.
    3   Perform level 2 as well as interprocedural global
        optimizations.
    4   Perform level 3 as well as doing link time optimizations.
        (C and aCC only)

    NOTE: +Oprocelim is the general default at all levels, unless
    the user specifies +ild, +ildrelink, or -b.

    NOTE: +O4 is only supported with +P. Otherwise, attempts to
    activate +O4 will cause the compiler to automatically drop to
    +O3.

+Oaggressive
    Apply aggressive optimizations. These include new
    optimizations as well as optimizations invoked by the
    following option settings:

        +Oentrysched
        +Olibcalls
        +Onofltacc
        +Onoinitcheck
        +FPD

    If +Oaggressive is used it implies +Onofltacc. This can be
    overridden if +Ofltacc follows +Oaggressive on the command
    line.

+Oentrysched
    Perform instruction scheduling on a subprogram's entry and
    exit code sequences. This option can be used at optimization
    level 1 and higher. The default is +Onoentrysched.

+Olibcalls
    Use low-call-overhead versions of select library routines.
    This option can be used at any level. At optimization level 0
    or 1, the default is +Onolibcalls; at optimization level 2 or
    higher, the default is +Olibcalls.

+O[no]fltacc
    Disable [enable] floating-point optimizations that can result
    in numerical differences. +Onofltacc also generates Fused
    Multiply-Add (FMA) instructions, as does compiling your
    program at optimization level 2 or higher.
    FMA instructions can improve performance of floating-point
    applications and are available only on PA-RISC 2.0 systems or
    later. If you do not specify either +Ofltacc or +Onofltacc at
    optimization level 2 or higher, the optimizer will generate
    FMA instructions but will not perform any expression-reordering
    optimizations.

    If +Oall is used it implies +Oaggressive, which in turn
    implies +Onofltacc. This can be overridden if +Ofltacc follows
    +Oall on the command line.

+Onoinitcheck
    Disable initialization of any local, scalar, automatic
    variable that is found to be uninitialized. This option can be
    used at optimization level 2 and higher. The default is to
    enable initialization if the variable is uninitialized with
    respect to every path leading to its use.

+FPD
    Specify how the run time environment for floating-point
    operations should be initialized at program start up. The
    default is that all trapping behaviors are disabled. See ld(1)
    for specific values of flags. To dynamically change these
    settings at run time, refer to fesetenv(3M).

    D (d)   Enable sudden underflow (flush to zero) of
            denormalized values.

+Onolimit
    Do not suppress optimizations that significantly increase
    compile-time or consume enormous amounts of memory.

+O[no]loop_unroll[=n]
    Unroll [do not unroll] program loops by a factor of n. For
    example, specifying +Oloop_unroll=4 requests the optimizer to
    replicate the loop body four times. This option can be used at
    optimization level 2 or higher. The default is
    +Oloop_unroll=4. This option is only valid on PA-RISC systems.

+O[no]loop_block
    Enable [disable] loop blocking for data cache optimizations.
    Available at optimization level 3.

+O[no]inline
    Request [disable] inlining and cloning. This option can be
    used at optimization level 3 and higher. The default is
    +Oinline.

+O[no]inline=function1[,function2...]
    Enable [disable] optimizer inlining for the named functions.
    This optimization can occur at optimization levels 3 and 4.
    The default is +Oinline.

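Example (illustrative sketch only): the +Oloop_unroll entry above
says that a factor of 4 replicates the loop body four times. The
Python below shows that transformation by hand so the idea is
concrete. It is not compiler output, and the function names
sum_rolled and sum_unrolled_by_4 are hypothetical.

```python
def sum_rolled(values):
    """Plain loop: one addition per iteration."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same sum with the loop body replicated four times.

    This mimics by hand what an unroll factor of 4 asks for; a real
    optimizer works on machine code, not on source like this.
    """
    n = len(values)
    total = 0.0
    i = 0
    # Main loop: four copies of the body per trip, so the loop
    # test and increment run a quarter as often.
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop: handles the 0 to 3 elements left over when
    # the length is not a multiple of four.
    while i < n:
        total += values[i]
        i += 1
    return total
```

Both functions add the elements in the same order, so they return
the same value; only the loop overhead differs.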
+Oinline_budget=n
+Oinlinebudget=n
    Perform more aggressive inlining, where n specifies the degree
    of aggressiveness, as follows:

    100      Default level of inlining.
    > 100    More aggressive inlining at the expense of
             compilation time and code size. The maximum for n is
             1000000.
    2 - 99   Less aggressive inlining. The optimizer gives more
             weight to compilation time and code size when
             determining whether to inline.
    1        Inline only if it reduces code size.

    This option can be used at optimization level 3 or higher.

+O[no]ptrs_to_globals[=name1,name2,...,nameN]
    Tell the optimizer whether global variables are modified [are
    not modified] through pointers. This optimization can occur at
    levels 2, 3, and 4. The default is +Optrs_to_globals.

+O[no]dataprefetch
    Insert [do not insert] instructions within innermost loops to
    explicitly prefetch data from memory into the data cache. Data
    prefetch instructions can improve cache performance. This
    option can be used at optimization level 2 or higher. On HP-UX
    version 11i and later, +Odataprefetch is the same as
    +Odataprefetch=indirect and +Onodataprefetch is the same as
    +Odataprefetch=none. At +O2 and higher, the default is
    +Odataprefetch.

+O[no]procelim
    Enable [disable] the elimination of functions that are not
    referenced by the application. Only functions with the hidden
    export class may be eliminated. The default is +Oprocelim.

+Oshortdata[=size]
    All objects of size bytes or smaller will be placed in the
    short data area, and references to such data will assume it
    resides in the short data area. Valid values of size are 0, or
    a decimal number between 8 and 4,194,304 (4MB). If no size is
    specified, all data is placed in the short data area. If size
    is 0, no data will be placed in the short data area, and all
    data references will use long offsets. The default is
    +Oshortdata=8.

+DA2.0 (+DAmodel)
    Generate code for a specific version of the PA-RISC
    architecture. For a PA-RISC 2.0 32-bit executable specify
    +DA2.0.

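Example (illustrative sketch only): +Oinline_budget and
+Oinlinebudget above are listed together because +O options are
parsed as if they contained no '_' characters. The Python below
models that rule for comparing two spellings. The function names
are hypothetical, and limiting the rule to the part before '=' is
an assumption made here so that values such as function names keep
their underscores; the source does not say how values are treated.

```python
def normalize_plus_o(option):
    """Return a +O option with '_' removed from its name.

    Options that do not start with +O are returned unchanged.
    Assumption: only the name before '=' is normalized.
    """
    if not option.startswith("+O"):
        return option
    name, sep, value = option.partition("=")
    return name.replace("_", "") + sep + value


def same_option(a, b):
    """True if two spellings name the same option and value."""
    return normalize_plus_o(a) == normalize_plus_o(b)
```

With this sketch, same_option("+Oinline_budget=200",
"+Oinlinebudget=200") is true, while an option such as -minshared
passes through untouched.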
+DA2.0W (+DAmodel)
    Generate code for a specific version of the PA-RISC
    architecture. For a PA-RISC 2.0 64-bit executable specify
    +DA2.0W.

+DDdata_model
    Generate code using either the ILP32 or LP64 data model.
    Defined values for data_model are:

    32   Use the ILP32 data model. The sizes of the int, long and
         pointer data types are 32 bits.
    64   Use the LP64 data model. The size of the int data type is
         32 bits, and the sizes of the long and pointer data types
         are 64 bits. Defines __LP64__ to the preprocessor.

    The default is +DD32.

+DSmodel
    Perform instruction scheduling appropriate for a specific
    implementation of the architecture. On IPF the defined values
    for model are:

    blended    Tune for best performance on a combination of
               processors (i.e., Itanium or Itanium 2 processor).
    itanium    Tune for best performance on an Itanium processor.
    itanium2   Tune for best performance on an Itanium 2
               processor.
    native     Tune for best performance on the processor on which
               the compiler is running.

    The default model for HP-UX 11i v1.5 is blended.

+Oopenmp
    Enable OpenMP directives.

+Oinfo
    Provide feedback information about the optimization process.
    This option is most useful at optimization levels 3 and 4. The
    default is +Onoinfo.

-minshared
    Indicates that the result of the current compilation is going
    into an executable file that will make minimal use of shared
    libraries. This option is only supported on HP-UX version 11i
    and later.

-Wl,-aarchive (ld option -a search)
    Specifies library search order. The value archive causes only
    archive libraries to be searched, rather than shared
    libraries.

    Specify whether shared or archive libraries are searched with
    the -l option. The value of search should be one of archive,
    shared, archive_shared, shared_archive, or default. This
    option can appear more than once, interspersed among -l
    options, to control the searching for each library. The
    default is to use the shared version of a library if one is
    available, or the archive version if not.
    If either archive or shared is active, only the specified
    library type is accepted. If archive_shared is active, the
    archive form is preferred, but the shared form is allowed. If
    shared_archive is active, the shared form is preferred but the
    archive form is allowed.

-N
    Mark output from the linker unshared, so that up to 2
    gigabytes of memory can be addressed as data in a 32-bit
    process. This allows quadrants I and II to be combined such
    that the data segment starts at the end of the text segment in
    quadrant I and extends to the end of quadrant II. For details
    and system defaults, see ld(1).

[Profile Feedback Related Options]

+Oprofile=collect
+I
    Instrument the application for profile-based optimization. +I
    is equivalent to +Oprofile=collect. See ld(1), +P, and +pgm
    for more details. The +I option is incompatible with the -G,
    +P, and -S options. It is incompatible with the -g option only
    during compile time.

+Oprofile=use
+P
    Optimize the application based on profile data found in the
    database file flow.data, produced by compilation with +I. +P
    is equivalent to +Oprofile=use or +Oprofile=use:filename. See
    ld(1), +I, and +df for more details. The +P option is
    incompatible with the +I and -S options. It is incompatible
    with the -g option only during compile time.

+O[no]static_prediction
    Enable [disable] the use of static branch prediction for
    decisions on conditional branches. More applicable to large
    programs with poor locality. Available at optimization level 3
    and above.

-----------------------------------------------
Descriptions of Portability Flags
-----------------------------------------------

+[no]extend_source
    Allow [do not allow] up to 254 characters on a single source
    line. The default, +noextend_source, is 72 characters for
    fixed format and 132 for free format.

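Example (illustrative sketch only): the line-length limits given
for +[no]extend_source above can be written as a small lookup,
which is handy for checking a source file before choosing the
flag. The function names below are hypothetical, and the sketch
only reports which lines exceed the limit; it does not model what
the compiler does with the excess characters.

```python
def max_source_line_length(source_form, extend_source=False):
    """Return the line-length limit stated for the given source form.

    source_form is "fixed" or "free". With extend_source the limit
    is 254; otherwise it is 72 for fixed and 132 for free format.
    """
    if source_form not in ("fixed", "free"):
        raise ValueError("source_form must be 'fixed' or 'free'")
    if extend_source:
        return 254
    return 72 if source_form == "fixed" else 132


def overlong_lines(lines, source_form, extend_source=False):
    """Return the 1-based numbers of lines longer than the limit."""
    limit = max_source_line_length(source_form, extend_source)
    return [
        number
        for number, line in enumerate(lines, start=1)
        if len(line.rstrip("\n")) > limit
    ]
```

If overlong_lines reports lines for a fixed-format file under the
default limit but none with extend_source=True, that file is a
candidate for +extend_source.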
+source={fixed|free|default}
    Accept source files in fixed format (+source=fixed) or free
    format (+source=free). The default, +source=default, is free
    for .f90 files and fixed for .f and .F source files.

-----------------------------------------------
Descriptions of Kernel Tunables
-----------------------------------------------

(Unless otherwise noted, units are in bytes)

dbc_max_pct     Maximum dynamic buffer cache size as a percent of
                system memory
dbc_min_pct     Minimum dynamic buffer cache size as a percent of
                system memory
maxdsiz         Maximum data size
maxdsiz_64bit   Maximum data size for 64-bit applications
maxssiz         Maximum stack size
maxssiz_64bit   Maximum stack size for 64-bit applications
maxtsiz         Maximum text segment size
maxtsiz_64bit   Maximum text segment size for 64-bit applications
vps_ceiling     Maximum system-selected page size (in Kbytes)
vps_pagesize    Default user page size (in Kbytes)
swapmem_on      Swap to memory flag.

-----------------------------------------------
Descriptions of HP-UX File Systems
-----------------------------------------------

VxFS (VERITAS File System)
    The default HP-UX file system. Use if you require fast file
    system recovery and the ability to perform a variety of
    administrative tasks online. This is HP-UX's implementation of
    a journaled file system (JFS).

Advanced JFS (OnlineJFS)
    A separately orderable product, HP OnlineJFS, is a system
    administration tool used to perform online maintenance tasks
    on a mounted file system. These tasks include:

    . defragmenting a file system to regain performance.
    . resizing a file system.
    . creating a snapshot file system for backup purposes.

-----------------------------------------------
Descriptions of Environment Variables
-----------------------------------------------

CPS_STACK_SIZE
    A default stack size of 8 megabytes is used for additional
    threads created for an OpenMP program. The stack region is
    allocated from the program heap which is part of the data
    segment.
    The default stack size for OpenMP threads can be modified
    prior to program invocation by setting the environment
    variable CPS_STACK_SIZE to a desired number of K bytes.

        export CPS_STACK_SIZE=128

    will establish a 128K byte stack region for each thread.

OMP_NUM_THREADS
    Specifies the number of threads to use during execution. By
    default, an OpenMP application will use an implied value equal
    to the number of processors on the system.
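
Example (illustrative sketch only): because thread stacks come out
of the program heap, CPS_STACK_SIZE and OMP_NUM_THREADS together
determine how much of the data segment the stacks consume. The
Python below estimates that figure from the defaults described
above. The function name is hypothetical, and two points are
assumptions not stated in the source: that one K byte is 1024
bytes, and that "additional threads" means every thread except the
initial one.

```python
DEFAULT_STACK_KB = 8 * 1024  # the documented 8 megabyte default, in K bytes


def thread_stack_budget(env, processor_count):
    """Estimate heap bytes consumed by OpenMP thread stacks.

    env is a mapping of environment variables (for example
    os.environ); processor_count stands in for the number of
    processors on the system, the documented default thread count.
    """
    stack_kb = int(env.get("CPS_STACK_SIZE", DEFAULT_STACK_KB))
    threads = int(env.get("OMP_NUM_THREADS", processor_count))
    # Assumption: the initial thread keeps the normal process stack,
    # so only the remaining threads take a stack from the heap.
    additional_threads = max(threads - 1, 0)
    return stack_kb * 1024 * additional_threads
```

With CPS_STACK_SIZE=128 and OMP_NUM_THREADS=4 the estimate is
3 x 128K bytes; with neither variable set on a 4-processor system
it is 3 x 8 megabytes, which is worth comparing against maxdsiz.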