===========================================================
HP-UX Flag Descriptions for OMP2001 and OMPL2001 - Oct 2003
===========================================================

-----------------------------------------------
Specific Flags for HP-UX F90 Compiler
-----------------------------------------------

+cat
    Concatenates all source files of the same source form, then
    compiles the concatenated source all at once. This enables
    inlining at +O3 within the concatenated file.

-----------------------------------------------
Specific Flags for HP-UX aCC Compiler
-----------------------------------------------

-Ae
    Turns on ANSI C (C89) mode. This option lets aCC compile
    C89-compatible C source programs just as the C compiler does.

-AOe
    In addition to specifying the extended ANSI C language dialect as
    per -Ae (the default), allows the optimizer to aggressively exploit
    the assumption that the source code conforms to the ANSI
    programming language C standard ISO 9899:1990 plus the extensions.
    At present, the effect is to make +Otype_safety=ansi the default
    (it can be overridden). As new independently controllable
    optimizations are developed that depend on the extended ANSI C
    standard, the flags that enable those optimizations may also become
    the default under -AOe.

-----------------------------------------------------------------
Common Flags for HP-UX F90 Compiler, C Compiler and aCC Compiler
-----------------------------------------------------------------

Note on +O
    All +O options are parsed as if they contained no '_' characters.
    Underscores are inserted only for readability and are documented
    that way in the man pages and other documents. For example,
    +Oinline_budget and +Oinlinebudget are the same option.

+Ofaster
    Selects the +Ofast option at optimization level +O4. For the HP
    C/aCC compilers, this must be used with +P (+Oprofile=use);
    otherwise the optimization level drops to +O3. Fortran90 honors
    +O4 independent of +P (+Oprofile=use).
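The spelling rule in the note on +O can be illustrated with a few lines of Python. This is only a sketch, not a description of how the compiler driver is implemented. The helper name normalize_plus_o is hypothetical, and leaving the value after '=' untouched is an assumption made here.

```python
def normalize_plus_o(option: str) -> str:
    """Drop '_' from the name part of a +O option (hypothetical helper)."""
    if not option.startswith("+O"):
        return option  # not a +O option; leave it unchanged
    # Assumption: only the option name is normalized, not the value after '='.
    name, sep, value = option.partition("=")
    return name.replace("_", "") + sep + value


if __name__ == "__main__":
    for spelling in ("+Oinline_budget=200", "+Oinlinebudget=200"):
        print(spelling, "->", normalize_plus_o(spelling))
```

Both spellings map to the same string, which is the sense in which the note calls them the same option.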
+Ofast
    Select a combination of compilation options for optimum execution
    speed at reasonable build times. Currently: +O2, +Olibcalls,
    +Onolimit, +Onofltacc, +FPD, +DSnative (on IPF), and +Oshortdata.

+Oall
    Apply maximum optimization to achieve the best runtime
    performance. This option is equivalent to specifying +Oaggressive
    and +Onolimit on the same command line. The +Oall option
    automatically invokes the highest level of optimization (+O3 for
    F90, +O4 for C and aCC).

+Olevel
    Invoke optimizations selected by level. The level can be preceded
    by either +O or -O. Defined values for level are:

    0   Perform no optimizations.
    1   Perform optimizations within basic blocks only. This is the
        default.
    2   Perform level 1 and global optimizations. Same as -O and +O.
    3   Perform level 2 as well as interprocedural global
        optimizations.
    4   Perform level 3 as well as link-time optimizations (C and aCC
        only).

    NOTE: +Oprocelim is the general default at all levels, unless the
    user specifies +ild, +ildrelink, or -b.

    NOTE: +O4 is only supported with +P. Otherwise, attempts to
    activate +O4 cause the compiler to drop to +O3 automatically.

+Oaggressive
    Apply aggressive optimizations. These include new optimizations as
    well as optimizations invoked by the following option settings:

        +Oentrysched
        +Olibcalls
        +Onofltacc
        +Onoinitcheck
        +FPD

    +Oaggressive implies +Onofltacc. This can be overridden if
    +Ofltacc follows +Oaggressive on the command line.

+Oentrysched
    Perform instruction scheduling on a subprogram's entry and exit
    code sequences. This option can be used at optimization level 1
    and higher. The default is +Onoentrysched.

+Olibcalls
    Use low-call-overhead versions of select library routines. This
    option can be used at any level. At optimization level 0 or 1, the
    default is +Onolibcalls; at optimization level 2 or higher, the
    default is +Olibcalls.

+O[no]fltacc
    Disable [enable] floating-point optimizations that can result in
    numerical differences.
    +Onofltacc also generates Fused Multiply-Add (FMA) instructions,
    as does compiling your program at optimization level 2 or higher.
    FMA instructions can improve the performance of floating-point
    applications and are available only on PA-RISC 2.0 systems or
    later. If you specify neither +Ofltacc nor +Onofltacc at
    optimization level 2 or higher, the optimizer generates FMA
    instructions but does not perform any expression-reordering
    optimizations. +Oall implies +Oaggressive, which in turn implies
    +Onofltacc. This can be overridden if +Ofltacc follows +Oall on
    the command line.

+Onoinitcheck
    Disable initialization of any local, scalar, automatic variable
    that is found to be uninitialized. This option can be used at
    optimization level 2 and higher. The default is to enable
    initialization if the variable is uninitialized with respect to
    every path leading to its use.

+FPD
    Specifies how the run-time environment for floating-point
    operations should be initialized at program startup. The default
    is that all trapping behaviors are disabled. See ld(1) for
    specific values of flags. To change these settings dynamically at
    run time, refer to fesetenv(3M). +FPD is a linker-specific flag
    recognized by the compiler driver. +FPD is equivalent to
    specifying -Wl,+FPD on the command line.

    D (d)   Enable sudden underflow (flush to zero) of denormalized
            values.

+Onolimit
    Do not suppress optimizations that significantly increase compile
    time or consume enormous amounts of memory.

+O[no]loop_unroll[=n]
    Unroll [do not unroll] program loops by a factor of n. For
    example, specifying +Oloop_unroll=4 requests the optimizer to
    replicate the loop body four times. This option can be used at
    optimization level 2 or higher. The default is +Oloop_unroll=4.

+O[no]loop_block
    Enable [disable] loop blocking for data cache optimizations.
    Available at optimization level 3.

+O[no]inline
    Request [disable] inlining and cloning. This option can be used at
    optimization level 3 and higher.
    The default is +Oinline.

+O[no]inline=function1[,function2...]
    Enable [disable] optimizer inlining for the named functions. This
    optimization can occur at optimization levels 3 and 4. The default
    is +Oinline.

+Oinline_budget=n
+Oinlinebudget=n
    Perform more aggressive inlining, where n specifies the degree of
    aggressiveness, as follows:

    100     Default level of inlining.
    > 100   More aggressive inlining at the expense of compilation
            time and code size. The maximum for n is 1000000.
    2 - 99  Less aggressive inlining. The optimizer gives more weight
            to compilation time and code size when determining whether
            to inline.
    1       Inline only if it reduces code size.

    This option can be used at optimization level 3 or higher.

+O[no]ptrs_to_globals[=name1,name2,...,nameN]
    Tell the optimizer whether global variables are modified [are not
    modified] through pointers. This optimization can occur at levels
    2, 3, and 4. The default is +Optrs_to_globals.

+O[no]dataprefetch
    Insert [do not insert] instructions within innermost loops to
    explicitly prefetch data from memory into the data cache. Data
    prefetch instructions can improve cache performance. This option
    can be used at optimization level 2 or higher. On HP-UX version
    11i and later, +Odataprefetch is the same as
    +Odataprefetch=indirect and +Onodataprefetch is the same as
    +Odataprefetch=none. At +O2 and higher, the default is
    +Odataprefetch. +Odataprefetch=direct enables generation of
    prefetch instructions for direct memory references.

+O[no]parmsoverlap
    Optimize with the assumption that subprogram arguments do [do not]
    refer to the same memory. This option can be used at optimization
    levels 2, 3, and 4. The default is +Oparmsoverlap.

+O[no]procelim
    Enable [disable] the elimination of functions that are not
    referenced by the application. Only functions with the hidden
    export class may be eliminated. The default is +Oprocelim.
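The ranges listed for +Oinline_budget=n earlier in this list can be read as a simple lookup. The Python sketch below is only an illustration of that table. The function name describe_inline_budget is hypothetical, and rejecting values below 1 is an assumption, since the text states only the maximum.

```python
def describe_inline_budget(n: int) -> str:
    """Map an inline budget value to the behavior listed for it."""
    # Assumption: values below 1 are treated as invalid; the text only
    # states the upper limit of 1000000.
    if not 1 <= n <= 1000000:
        raise ValueError("n must be between 1 and 1000000")
    if n == 1:
        return "inline only if it reduces code size"
    if n < 100:
        return "less aggressive inlining"
    if n == 100:
        return "default level of inlining"
    return "more aggressive inlining"


if __name__ == "__main__":
    for budget in (1, 50, 100, 500):
        print(budget, "->", describe_inline_budget(budget))
```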
+Oshortdata[=size]
    All objects of size bytes or smaller are placed in the short data
    area, and references to such data assume it resides in the short
    data area. Valid values of size are 0, or a decimal number between
    8 and 4,194,304 (4MB). If no size is specified, all data is placed
    in the short data area. If size is 0, no data is placed in the
    short data area, and all data references use long offsets. The
    default is +Oshortdata=8.

+DA2.0 (+DAmodel)
    Generate code for a specific version of the PA-RISC architecture.
    For a PA-RISC 2.0 32-bit executable specify +DA2.0.

+DA2.0W (+DAmodel)
    Generate code for a specific version of the PA-RISC architecture.
    For a PA-RISC 2.0 64-bit executable specify +DA2.0W.

+DDdata_model
    Generate code using either the ILP32 or LP64 data model. Defined
    values for data_model are:

    32  Use the ILP32 data model. The sizes of the int, long and
        pointer data types are 32 bits.
    64  Use the LP64 data model. The size of the int data type is 32
        bits, and the sizes of the long and pointer data types are 64
        bits. Defines __LP64__ to the preprocessor.

    The default is +DD32.

+DSmodel
    Perform instruction scheduling appropriate for a specific
    implementation of the architecture. On IPF the defined values for
    model are:

    blended   Tune for best performance on a combination of processors
              (i.e., Itanium or Itanium 2 processor).
    itanium   Tune for best performance on an Itanium processor.
    itanium2  Tune for best performance on an Itanium 2 processor.
    native    Tune for best performance on the processor on which the
              compiler is running.

    The default model for HP-UX 11i v1.5 is blended.

+Oopenmp
    Enable OpenMP directives.

+Oinfo
    Provide feedback information for the user about the optimizations
    performed by the compiler. This option is most useful
    (informative) at optimization levels 3 and 4. The default is
    +Onoinfo. This is not a profile feedback optimization option.
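The size rule given for +Oshortdata[=size] earlier in this list (0, or a decimal number between 8 and 4,194,304) can be written as a small check. This Python sketch is illustrative only. The function name is hypothetical, and treating both bounds as inclusive is an assumption, because the text does not say so explicitly.

```python
MAX_SHORTDATA_SIZE = 4194304  # 4MB upper bound quoted in the description


def is_valid_shortdata_size(size: int) -> bool:
    """Return True if size is 0 or lies in the documented 8..4MB range."""
    # Assumption: the range 8 to 4,194,304 includes both end points.
    return size == 0 or 8 <= size <= MAX_SHORTDATA_SIZE


if __name__ == "__main__":
    for candidate in (0, 4, 8, 4194304, 4194305):
        print(candidate, is_valid_shortdata_size(candidate))
```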
-minshared
    Indicates that the result of the current compilation is going into
    an executable file that will make minimal use of shared libraries.
    This option is only supported on HP-UX version 11i and later.

-Wl,-aarchive (ld option -a search)
    Specifies library search order. With archive, only archive
    libraries are searched, not shared libraries. The -a option
    specifies whether shared or archive libraries are searched with
    the -l option. The value of search should be one of archive,
    shared, archive_shared, shared_archive, or default. This option
    can appear more than once, interspersed among -l options, to
    control the searching for each library. The default is to use the
    shared version of a library if one is available, or the archive
    version if not. If either archive or shared is active, only the
    specified library type is accepted. If archive_shared is active,
    the archive form is preferred, but the shared form is allowed. If
    shared_archive is active, the shared form is preferred but the
    archive form is allowed.

-N
    Mark output from the linker unshared, so that up to 2 gigabytes of
    memory can be addressed as data in a 32-bit process. This allows
    quadrants I and II to be combined such that the data segment
    starts at the end of the text segment in quadrant I and extends to
    the end of quadrant II. For details and system defaults, see
    ld(1).

[Profile Feedback Related Options]

+Oprofile=collect
+I
    Instrument the application for profile-based optimization. +I is
    equivalent to +Oprofile=collect. See ld(1), +P, and +pgm for more
    details. The +I option is incompatible with the -G, +P, and -S
    options. It is incompatible with the -g option only during compile
    time.

+Oprofile=use
+P
    Optimize the application based on profile data found in the
    database file flow.data, produced by compilation with +I. +P is
    equivalent to +Oprofile=use or +Oprofile=use:filename.
    See ld(1), +I, and +df for more details. The +P option is
    incompatible with the +I and -S options. It is incompatible with
    the -g option only during compile time.

+O[no]static_prediction
    Enables [disables] the use of static branch prediction for
    decisions on conditional branches. More applicable to large
    programs with poor locality. Available at optimization level 3 and
    above.

-----------------------------------------------
Descriptions of Portability Flags
-----------------------------------------------

+[no]extend_source
    Allow [do not allow] up to 254 characters on a single source line.
    The default, +noextend_source, is 72 characters for fixed format
    and 132 for free format.

+source={fixed|free|default}
    Accept source files in fixed format (+source=fixed) or free format
    (+source=free). The default, +source=default, is free for .f90
    files and fixed for .f and .F source files.

-----------------------------------------------
Descriptions of Kernel Tunables
-----------------------------------------------
(Unless otherwise noted, units are in bytes)

dbc_max_pct     Maximum dynamic buffer cache size as a percent of
                system memory
dbc_min_pct     Minimum dynamic buffer cache size as a percent of
                system memory
maxdsiz         Maximum data size
maxdsiz_64bit   Maximum data size for 64 bit applications
maxssiz         Maximum stack size
maxssiz_64bit   Maximum stack size for 64 bit applications
maxtsiz         Maximum text segment size
maxtsiz_64bit   Maximum text segment size for 64 bit applications
vps_ceiling     Maximum system-selected page size (in Kbytes)
vps_pagesize    Default user page size (in Kbytes)
swapmem_on      Swap to memory flag

-----------------------------------------------
Descriptions of HP-UX File Systems
-----------------------------------------------

VxFS (VERITAS File System)
    The default HP-UX file system. Use it if you require fast file
    system recovery and the ability to perform a variety of
    administrative tasks online. This is HP-UX's implementation of a
    journaled file system (JFS).
Advanced JFS (OnlineJFS)
    A separately orderable product, HP OnlineJFS, is a system
    administration tool used to perform online maintenance tasks on a
    mounted file system. These tasks include:

    . defragmenting a file system to regain performance.
    . resizing a file system.
    . creating a snapshot file system for backup purposes.

-----------------------------------------------
Descriptions of omp/cps Environment Variables
-----------------------------------------------

CPS_STACK_SIZE
    A default stack size of 8 megabytes is used for additional threads
    created for an OpenMP program. The stack region is allocated from
    the program heap, which is part of the data segment. The default
    stack size for OpenMP threads can be modified prior to program
    invocation by setting the environment variable CPS_STACK_SIZE to
    the desired number of K bytes. For example,

        export CPS_STACK_SIZE=128

    establishes a 128K byte stack region for each thread.

OMP_NUM_THREADS
    Specifies the number of threads to use during execution. By
    default, an OpenMP application uses an implied value equal to the
    number of processors on the system.

MP_IDLE_THREADS_WAIT=-1
OMP_FIRST_USE=0

OMP_FIRST_USE
    OpenMP environment variable that controls when threads are
    created. The default value is one, which creates threads on first
    use of a parallel region. A value of zero causes the threads to be
    created at program startup.

MP_IDLE_THREADS_WAIT
    OpenMP environment variable that controls how threads behave when
    they enter the idle schedule loop. Valid values are -1 (idle
    threads spin wait), 0 (idle threads suspend themselves) and N (the
    idle thread spin waits for N milliseconds before suspending
    itself). The default value is 50.
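The units and value ranges of the variables above are easy to misread, so here is a short Python sketch of two of them. It is only an illustration. Both function names are hypothetical, and the conversion assumes 1 megabyte = 1024 K bytes.

```python
def cps_stack_size_kbytes(megabytes: int) -> int:
    """Convert a stack size in megabytes to the K byte value that
    CPS_STACK_SIZE expects (assumes 1 megabyte = 1024 K bytes)."""
    return megabytes * 1024


def describe_idle_wait(value: int) -> str:
    """Describe the behavior selected by an MP_IDLE_THREADS_WAIT value."""
    if value == -1:
        return "spin wait"
    if value == 0:
        return "suspend"
    if value > 0:
        return f"spin wait for {value} ms, then suspend"
    raise ValueError("valid values are -1, 0, or a positive N")


if __name__ == "__main__":
    # The default 8 megabyte stack expressed in CPS_STACK_SIZE units.
    print("CPS_STACK_SIZE =", cps_stack_size_kbytes(8))
    for setting in (-1, 0, 50):
        print("MP_IDLE_THREADS_WAIT =", setting, "->", describe_idle_wait(setting))
```

Under that assumption, keeping the default 8 megabyte stack corresponds to CPS_STACK_SIZE=8192, while the CPS_STACK_SIZE=128 example shrinks each thread's stack region to 128K bytes.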
-------------------------------------------------------------
Descriptions of malloc Environment Variables: malloc(3C)
-------------------------------------------------------------

The performance of the malloc() family can be tuned via two
environment variables, _M_ARENA_OPTS and _M_SBA_OPTS.

For threaded applications, malloc() uses multiple arenas. Memory
requests from different threads are handled by different arenas.
_M_ARENA_OPTS can be used to adjust the number of arenas and the
number of pages by which an arena grows each time it expands (the
expansion factor), assuming a page size of 4096 bytes. In general, the
more threads in an application, the more arenas should be used for
better performance. The number of arenas can be from 1 to 64 for
threaded applications. For non-threaded applications, only one arena
is used. If the environment variable is not set, or the number of
arenas is out of range, the default of 8 is used. The expansion factor
can be from 1 to 4096; the default is 32. Again, if the factor is out
of range, the default value is used.

    _M_ARENA_OPTS=<number of arenas>:<expansion factor>

_M_SBA_OPTS is used to turn on the small block allocator and to set
its parameters, namely maxfast, grain, and numlblks. Applications
usually run faster with the small block allocator turned on than with
it turned off.

    _M_SBA_OPTS=<maxfast>:<numlblks>:<grain>

-----------------------------------------------
Descriptions of chatr options: chatr(1)
-----------------------------------------------

-s
    Perform the operation silently.

+id flag
    Controls the preference of physical memory for the data segment.
    This is only important on ccNUMA (Cache Coherent Non-Uniform
    Memory Architecture) systems. The flag value may be either enable
    or disable. When enabled, the data segment uses interleaved
    memory. When disabled (the default), the data segment uses cell
    local memory. This behavior is inherited across a fork(), but not
    an exec().
+pd size
    Request a particular virtual memory page size that should be used
    for data. Sizes of 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G,
    4G, D, and L are supported. A size of D results in using the
    default page size. A size of L results in using the largest page
    size available. The actual page size may vary if the requested
    size cannot be fulfilled.

+pi size
    Request a particular virtual memory page size that should be used
    for text (instructions). See the +pd option for additional
    information.

+mergeseg flag
    Enable or disable the shared library segment merging features.
    When enabled, all data segments of shared libraries loaded at
    program startup are merged into a single block. Data segments for
    each dynamically loaded library are also merged with the data
    segments of its dependent libraries. Merging these segments
    increases run-time performance by allowing the kernel to use
    larger page table entries.

-----------------------------------------------
Descriptions of mpsched options: mpsched(1)
-----------------------------------------------

-T policy
    Apply the specified policy to the threads of the process. The
    scheduling policies are the same as for the -P option except that
    they apply to newly created threads instead of processes. Also,
    thread policies can only be specified on commands launched from
    the command line of mpsched. The option can be used with the -P,
    -l, and -c options.

    FILL  Fill first launch policy. Under this policy, successive
          threads are launched on the same locality domain as their
          parent until one has been launched on each processor in the
          locality domain. At that point, new threads are created on
          the next locality domain.

HPC2002 Related flags
--------------------------------------------------

mpicc, mpif90 - MPI specific compiler wrapper script

+i8
    Treat all integer or logical constants, intrinsics, and user
    variables (declared as integer or logical) as 8-byte values,
    rather than the current 4-byte default.
+r8
    Treat all real constants, real intrinsics, and user variables
    declared as real as 8-byte values, rather than the current 4-byte
    default.

+[no]U77
    Invoke [do not invoke] support for the BSD 3f library. The default
    is +noU77.

+[no]signedzero
    Enable [disable] signed-zero support. This option forces a
    floating-point value of negative zero that appears as a formatted
    output list item to be represented in the output record with a
    leading '-'. This option also changes the behavior of the SIGN
    intrinsic. The +nosignedzero option provides compatibility with
    HP's f77. The default is +signedzero.

+[no]ppu
    Postpend [do not postpend] underscores at the end of definitions
    of and references to externally visible symbols (+ppu). The
    default is +noppu for PA-RISC 32-bit object files and +ppu for
    PA-RISC 64-bit object files. The default is +ppu for IPF 32-bit
    and 64-bit object files.

MPIRUN

-hmp
    Forces HMP (Hyper Messaging Protocol) to be used. Causes the
    application to abort if HMP is unavailable. The preferred method
    for enabling HMP is the mpirun option -hmp, which enables HMP on
    every host.

-h <host>
    Starts the processes on <host>. The default is the local host.

-np #
    Specifies the number of processes to run. The default is 1.

-e <var>[=<val>]
    Sets the environment variable <var> for the program and gives it
    the value <val>. Environment variable substitutions (for example:
    $FOO) are supported in the <val> argument. The value argument
    cannot contain white space.

-e MPI_HMP=ON
    Instructs mpirun to use Hyper Messaging Protocol.

-e MPI_WORKDIR=$cwd
    Causes mpirun invocations to change directory to $cwd.

---------------------------------
NETCDF Build Instructions
---------------------------------

Src base is netcdf-3.5.0