Description of compiler flags for Intel C++ Compiler 9.0
---------------------------------------------------------
/G1 : Targets optimization for the Itanium® processor.
/G2 : Targets optimization for the Itanium 2 processor. Generated code is also compatible with the Itanium processor (default).
/QIPF_fma[-] : Enables [disables] the combining of floating-point multiply operations and add/subtract operations.
/QIPF_fp_speculationmode : Enables floating-point speculations with one of the following modes:
    fast – Speculate floating-point operations.
    off – Disables speculation of floating-point operations.
    safe – Speculate only when safe.
    strict – This is the same as specifying off.
/Qauto_ilp32 : Specifies that the application cannot exceed a 32-bit address space, which allows the compiler to use 32-bit pointers whenever possible. To use this option, you must also specify -ipo[value]. Using the -auto_ilp32 option on programs that can exceed 32-bit address space (2**32) may cause unpredictable results during program execution. This option has no effect on systems with Intel® EM64T unless the -axP or -xP option is also used.
/Qftz[-] : Flushes denormal results to zero. The option is turned ON with -O3 by default. This only impacts the application when the main program or dll main is compiled.
/Qivdep_parallel : Indicates that there is absolutely no loop-carried memory dependency in the loop where the IVDEP directive is specified.
/QIPF_fltacc[-] : Enables [disables] optimizations that affect floating-point accuracy.
/QIPF_flt_eval_method{0|2} : Evaluates floating-point operands to the precision indicated by the program.
/Qunroll[n] : Sets the maximum number of times to unroll loops. -unroll0 disables loop unrolling. The default is -unroll, which uses default heuristics.
/Qrestrict[-] : Enables [disables] pointer disambiguation with the restrict qualifier.
/Oa : Assumes no aliasing in program.
/Ow : Assumes no aliasing within functions, but assumes aliasing across calls.
/Qalias_args[-] : Implies arguments may be aliased [not aliased].
/Qopt_report : Generates an optimization report directed to stderr.
/Qopt_report_filefilename : Specifies the filename for the optimization report.
/Qopt_report_levellevel : Specifies the verbosity level of the output. Valid arguments are min (default), med, and max.
/Qopt_report_phasename : Specifies the compilation phase name for which reports are generated. The option can be used multiple times in the same compilation to get output from multiple phases. Valid name arguments are as follows:
    ipo – Interprocedural Optimizer
    hlo – High Level Optimizer
    ilo – Intermediate Language Scalar Optimizer
    ecg – Code Generator
    omp – OpenMP*
    all – All phases
/Qopt_report_routine[rtn] : Specifies a routine rtn. Generates reports from all routines with names that include rtn as part of the name.
/Qopt_report_help : Displays all possible settings for -opt_report_phase.
/Qopenmp : Enables the parallelizer to generate multithreaded code based on the OpenMP* directives.
/Qopenmp_report{0|1|2} : Controls the OpenMP parallelizer’s diagnostic levels. The default is /Qopenmp_report1.
/Qparallel : Detects parallel loops capable of being executed safely in parallel and automatically generates multithreaded code for these loops.
/Qpar_report{0|1|2|3} : Controls the auto-parallelizer’s diagnostic levels as follows:
    0 – Displays no diagnostic information.
    1 – Indicates loops successfully parallelized (default).
    2 – Loops successfully and unsuccessfully parallelized.
    3 – Adds information about any proven or assumed dependencies inhibiting parallelization.
/Qpar_threshold[n] : Sets a threshold for the auto-parallelization of loops based on the probability of profitable execution of the loop in parallel, n=0 to 100. Default: n=75. This option is used for loops whose computation work volume cannot be determined at compile time.
    0 – Parallelize loops regardless of computation work volume.
    100 – Parallelize loops only if profitable parallel execution is almost certain.
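
The /Qparallel and /Qivdep_parallel entries above both depend on whether a loop has a loop-carried dependence. As a rough, compiler-independent illustration, the Python sketch below runs two loops in forward and in reverse iteration order. The function names are hypothetical and not part of any Intel tool. A loop whose iterations are independent gives the same result in either order, which is what makes it safe to run in parallel; a loop in which each iteration reads the previous iteration's result does not.

```python
def run_independent(a, order):
    """b[i] depends only on a[i]: no loop-carried dependence."""
    b = [0] * len(a)
    for i in order:
        b[i] = 2 * a[i]
    return b


def run_carried(a, order):
    """x[i] depends on x[i - 1]: a loop-carried dependence."""
    x = list(a)
    for i in order:
        if i > 0:
            x[i] = x[i - 1] + 1
    return x


data = [10, 20, 30, 40]
forward = range(len(data))
backward = range(len(data) - 1, -1, -1)

# Same result in both orders: iterations can be reordered freely.
print(run_independent(data, forward) == run_independent(data, backward))

# Different results: reordering the iterations changes the answer.
print(run_carried(data, forward) == run_carried(data, backward))
```

The second comparison is the case an auto-parallelizer has to rule out, or that the programmer asserts away when using the IVDEP directive.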
/Od : No optimization. Useful during application development and debugging.
/O1 : Omits optimizations that tend to increase object size. Creates the smallest optimized code in most cases. On Linux systems with IA-32 processors only, there is no difference between -O1 and -O2.
/O2 : Default setting. Creates the fastest code in most cases, but may increase code size significantly over /O1. On Linux systems with IA-32 processors, -O1 and -O2 are equivalent.
/Ox : Equivalent to /O2 except that /Ox does not imply /Gy (function packaging) or /Gf (string pooling).
/O3 : Same as /O2, plus loop transformations and data prefetching for improved memory-usage efficiency. For the full benefit of /O3 on 32-bit Intel® processors, also use the /Qx{K, W, N, B, P} or /Qax{K, W, N, B, P} options for Pentium® III and Pentium 4 processors and subsequent IA-32 processors. This option is useful for a broad range of applications, particularly for the loopy, kernel-based code common in high-performance computing.
/fast : The -fast option maximizes speed across the entire program. For Itanium®-based systems, -fast sets -O3, -ipo, and -static. For IA-32 and systems using Intel® EM64T, -fast sets -O3, -ipo, -static, and -xP.
/Zi : Generates debug information for use with any of the common platform debuggers.
/Qip : Single file optimization. Allows selective inlining optimization within a single source file.
/Qipo[value] : Permits inlining and other optimizations among multiple source files. The optional value argument controls the maximum number of link-time compilations (or number of object files) that are spawned. The default for value is 1.
/Qipo_separate : Creates one object file for every source file. This option overrides -ipo[value].
/Qprof_gen : Instruments a program for profiling.
/Qprof_dirdir : Specifies a directory for the profiling output files, *.dyn and *.dpi.
/Qprof_use : Enables the use of profiling information during optimization.
/Qax{K|W|N|B|P} : Automatic Processor Dispatch.
    Generates specialized code for the indicated processors while also generating generic IA-32 code. You can use more than one code to tune for multiple processors in the same executable.
    K – Intel® Pentium® III processors and AMD Athlon* XP processors
    W – Intel Pentium 4 processors and AMD Athlon 64 and Opteron* processors
    B – Intel Pentium M and compatible Intel processors
    P – Intel Pentium 4 processor with Streaming SIMD Extensions 3 and compatible Intel processors
    N – Provides additional Pentium 4 processor tuning beyond W
    Only the -axW and -axP options are available for Intel® EM64T.
/Qvec_report{0|1|2|3|4|5} : Controls amount of vectorizer diagnostic information as follows:
    n = 0: no information
    n = 1: indicates vectorized loops (default)
    n = 2: indicates vectorized and non-vectorized loops
    n = 3: indicates vectorized and non-vectorized loops and the data dependences that prohibit vectorization
    n = 4: indicates non-vectorized loops
    n = 5: indicates non-vectorized loops and the data dependences that prohibit vectorization

Interprocedural Optimization (IPO) and Profile-Guided Optimization (PGO) Options

IPO controls function-inlining to reduce function call overhead and improve data layout across functions. PGO provides runtime feedback to guide optimization decisions about data and code layout to improve instruction-cache, paging, and branch prediction. Additionally, IPO can increase code size. Be sure to measure your execution performance, compile-time, and code-size tradeoffs with these options. IPO is best used in conjunction with PGO to guide which functions to inline.

[Figure: PGO workflow. Step One: Compile with PGO, producing an instrumented executable. Step Two: Run instrumented application to produce dynamic information files. Step Three: Feedback compile with PGO, using the dynamic information summary file, producing the profile-guided application. Example executable name in the figure: foo.exe.]

Automatic Optimization Options

Before you begin performance tuning, ensure that your application runs as intended with a base set of options or in debug mode (-Od and -Zi).
These are general optimization options that should be at the heart of any application tuning for all 32-bit and 64-bit Intel® processors. Try these different options and measure your performance before proceeding to more advanced optimizations.

Windows* Command                Linux* Command  Comment
/Od (No Optimization)           -O0             No optimization. Useful during application development and debugging.
/O1 (Optimize for size)         -O1             Omits optimizations that tend to increase object size. Creates the smallest optimized code in most cases. On Linux systems with IA-32 processors only, there is no difference between -O1 and -O2. This option is useful in many large server/database applications where memory paging due to larger code size is an issue.
/O2 (Maximize speed)            -O1 or -O2      Default setting. Creates the fastest code in most cases, but may increase code size significantly over /O1. On Linux systems with IA-32 processors, -O1 and -O2 are equivalent.
/Ox (Maximize optimization)     n/a             Equivalent to /O2 except that /Ox does not imply /Gy (function packaging) or /Gf (string pooling).
/O3 (High-level optimizations)  -O3             Same as /O2, plus loop transformations and data prefetching for improved memory-usage efficiency. For the full benefit of /O3 on 32-bit Intel® processors, also use the /Qx{K, W, N, B, P} or /Qax{K, W, N, B, P} options for Pentium® III and Pentium 4 processors and subsequent IA-32 processors. This option is useful for a broad range of applications, particularly for the loopy, kernel-based code common in high-performance computing.
/fast                           -fast           The -fast option maximizes speed across the entire program. For Itanium®-based systems, -fast sets -O3, -ipo, and -static. For IA-32 and systems using Intel® EM64T, -fast sets -O3, -ipo, -static, and -xP.
/Zi                             -g              Generates debug information for use with any of the common platform debuggers.
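
The table above pairs each Windows option with its Linux counterpart. A minimal sketch of that mapping in Python might look like the following. The dictionary and the `to_linux` helper are illustrative names, not part of the compiler, and the code only rewrites flag strings; it does not invoke any compiler.

```python
# Windows -> Linux equivalents, copied from the table above.
# /Ox is listed as "n/a" on Linux, so it maps to None.
WINDOWS_TO_LINUX = {
    "/Od": "-O0",
    "/O1": "-O1",
    "/O2": "-O2",  # the table lists "-O1 or -O2"; -O2 is chosen here
    "/Ox": None,
    "/O3": "-O3",
    "/fast": "-fast",
    "/Zi": "-g",
}


def to_linux(flags):
    """Translate a list of Windows-style flags using the table above."""
    translated = []
    for flag in flags:
        if flag not in WINDOWS_TO_LINUX:
            raise KeyError(f"flag not in the table: {flag}")
        linux_flag = WINDOWS_TO_LINUX[flag]
        if linux_flag is None:
            raise ValueError(f"{flag} has no Linux equivalent in the table")
        translated.append(linux_flag)
    return translated


print(to_linux(["/O3", "/Zi"]))
```

Flags outside this one table (for example the /Q options) are deliberately rejected rather than guessed.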
Floating-Point Arithmetic Optimizations

The Intel compilers provide options for floating-point data and precision on IA-32, Intel processors with Intel EM64T, and Itanium architectures as described below. More information and finer-grained options are available in the User Guide under Floating-Point Arithmetic Optimizations.

Windows* Command  Linux* Command  Comment
/Op               -mp             Restricts optimizations to maintain declared precision and to ensure that floating-point arithmetic conforms more closely to the ANSI and IEEE standards.
/Qprec            -mp1            Improves floating-point precision, with less performance impact than /Op or -mp.
/fp:name          -fp-model name  New option in the Intel® compilers 9.0, except Intel® Visual Fortran Compiler 9.0 for Windows*. The possible values of name are:
                                  precise – Enables only value-safe optimizations on floating-point code
                                  double/extended/source – Enables intermediates to be computed in double, extended or source precision
                                  fast – Allows aggressive optimizations at the expense of accuracy (this is the default)
                                  except – Enables floating-point exception semantics
                                  strict – Strictest mode of operation, enables both the precise and except options and disables contractions

Windows* Command  Linux* Command  Comment
/Qip              -ip             Single file optimization. Allows selective inlining optimization within a single source file.
/Qipo[value]      -ipo[value]     Permits inlining and other optimizations among multiple source files. The optional value argument controls the maximum number of link-time compilations (or number of object files) that are spawned. The default for value is 1.
/Qipo_separate    -ipo_separate   Creates one object file for every source file. This option overrides -ipo[value].
/Qprof_gen        -prof_gen       Instruments a program for profiling.
/Qprof_dirdir     -prof_dirdir    Specifies a directory for the profiling output files, *.dyn and *.dpi.
/Qprof_use        -prof_use       Enables the use of profiling information during optimization.

Figure 1.
Profile-Guided Optimization (PGO) Steps

Description of compiler flags for Intel FORTRAN Compiler 9.0
-------------------------------------------------------------
/G1 : Targets optimization for the Itanium® processor.
/G2 : Targets optimization for the Itanium 2 processor. Generated code is also compatible with the Itanium processor (default).
/QIPF_fma[-] : Enables [disables] the combining of floating-point multiply operations and add/subtract operations.
/QIPF_fp_speculationmode : Enables floating-point speculations with one of the following modes:
    fast – Speculate floating-point operations.
    off – Disables speculation of floating-point operations.
    safe – Speculate only when safe.
    strict – This is the same as specifying off.
/Qauto_ilp32 : Specifies that the application cannot exceed a 32-bit address space, which allows the compiler to use 32-bit pointers whenever possible. To use this option, you must also specify -ipo[value]. Using the -auto_ilp32 option on programs that can exceed 32-bit address space (2**32) may cause unpredictable results during program execution. This option has no effect on systems with Intel® EM64T unless the -axP or -xP option is also used.
/Qftz[-] : Flushes denormal results to zero. The option is turned ON with -O3 by default. This only impacts the application when the main program or dll main is compiled.
/Qivdep_parallel : Indicates that there is absolutely no loop-carried memory dependency in the loop where the IVDEP directive is specified.
/QIPF_fltacc[-] : Enables [disables] optimizations that affect floating-point accuracy.
/QIPF_flt_eval_method{0|2} : Evaluates floating-point operands to the precision indicated by the program.
/Qunroll[n] : Sets the maximum number of times to unroll loops. -unroll0 disables loop unrolling. The default is -unroll, which uses default heuristics.
/Qrestrict[-] : Enables [disables] pointer disambiguation with the restrict qualifier.
-falias : Assumes aliasing in the program.
/Oa : Assumes no aliasing in program.
/Ow : Assumes no aliasing within functions, but assumes aliasing across calls.
/Qalias_args[-] : Implies arguments may be aliased [not aliased].
/Qopt_report : Generates an optimization report directed to stderr.
/Qopt_report_filefilename : Specifies the filename for the optimization report.
/Qopt_report_levellevel : Specifies the verbosity level of the output. Valid arguments are min (default), med, and max.
/Qopt_report_phasename : Specifies the compilation phase name for which reports are generated. The option can be used multiple times in the same compilation to get output from multiple phases. Valid name arguments are as follows:
    ipo – Interprocedural Optimizer
    hlo – High Level Optimizer
    ilo – Intermediate Language Scalar Optimizer
    ecg – Code Generator
    omp – OpenMP*
    all – All phases
/Qopt_report_routine[rtn] : Specifies a routine rtn. Generates reports from all routines with names that include rtn as part of the name. By default, generates reports for all routines.
/Qopt_report_help : Displays all possible settings for -opt_report_phase. No compilation is performed.
/Qopenmp : Enables the parallelizer to generate multithreaded code based on the OpenMP* directives.
/Qopenmp_report{0|1|2} : Controls the OpenMP parallelizer’s diagnostic levels. The default is /Qopenmp_report1.
/Qparallel : Detects parallel loops capable of being executed safely in parallel and automatically generates multithreaded code for these loops.
/Qpar_report{0|1|2|3} : Controls the auto-parallelizer’s diagnostic levels as follows:
    0 – Displays no diagnostic information.
    1 – Indicates loops successfully parallelized (default).
    2 – Loops successfully and unsuccessfully parallelized.
    3 – Adds information about any proven or assumed dependencies inhibiting parallelization.
/Qpar_threshold[n] : Sets a threshold for the auto-parallelization of loops based on the probability of profitable execution of the loop in parallel, n=0 to 100. Default: n=75.
    This option is used for loops whose computation work volume cannot be determined at compile time.
    0 – Parallelize loops regardless of computation work volume.
    100 – Parallelize loops only if profitable parallel execution is almost certain.
/Od (No Optimization) : No optimization. Useful during application development and debugging.
/O1 (Optimize for size) : Omits optimizations that tend to increase object size. Creates the smallest optimized code in most cases. On Linux systems with IA-32 processors only, there is no difference between -O1 and -O2. This option is useful in many large server/database applications where memory paging due to larger code size is an issue.
/O2 (Maximize speed) : Default setting. Creates the fastest code in most cases, but may increase code size significantly over /O1. On Linux systems with IA-32 processors, -O1 and -O2 are equivalent.
/Ox (Maximize optimization) : Equivalent to /O2 except that /Ox does not imply /Gy (function packaging) or /Gf (string pooling).
/O3 (High-level optimizations) : Same as /O2, plus loop transformations and data prefetching for improved memory-usage efficiency. For the full benefit of /O3 on 32-bit Intel® processors, also use the /Qx{K, W, N, B, P} or /Qax{K, W, N, B, P} options for Pentium® III and Pentium 4 processors and subsequent IA-32 processors. This option is useful for a broad range of applications, particularly for the loopy, kernel-based code common in high-performance computing.
/fast : The -fast option maximizes speed across the entire program. For Itanium®-based systems, -fast sets -O3, -ipo, and -static. For IA-32 and systems using Intel® EM64T, -fast sets -O3, -ipo, -static, and -xP.
/Zi : Generates debug information for use with any of the common platform debuggers.
/Qprefetch[-] : Enables or disables prefetch insertion (requires -O3).
/Qfp_port : Rounds floating-point results after floating-point operations, so rounding to user-declared precision happens at assignments and type conversions; this has some impact on speed. The default is to keep results of floating-point operations in higher precision. Use this if you are experiencing differences in floating-point precision versus other platforms.
/Qvec_report{0|1|2|3|4|5} : Controls amount of vectorizer diagnostic information as follows:
    n = 0: no information
    n = 1: indicates vectorized loops (default)
    n = 2: indicates vectorized and non-vectorized loops
    n = 3: indicates vectorized and non-vectorized loops and the data dependences that prohibit vectorization
    n = 4: indicates non-vectorized loops
    n = 5: indicates non-vectorized loops and the data dependences that prohibit vectorization
-------------------------------------------------------------
-Qansi_alias : Directs the compiler to assume that the program adheres to the type-based aliasing rules defined in Section 6.5 of the ISO C Standard. If your program adheres to these rules, this option will allow the compiler to optimize more aggressively. If it doesn't adhere to these rules, it can cause the compiler to generate incorrect code.
-Qscalar_rep[-] : Enables [disables] scalar replacement performed during loop transformations (requires -O3).
-Qrcd : Enables fast float-to-int conversion.
-Qprefetch[-] : Enables [disables] the insertion of software prefetching by the compiler. Default is -Qprefetch.
-Qauto : Causes all variables to be allocated on the stack, rather than in local static storage. Does not affect variables that appear in an EQUIVALENCE or SAVE statement, or those that are in COMMON. Makes all local variables AUTOMATIC, same as /4Ya.
-FI : Fixed-format F90 source code.
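
The /Qfp_port entry above contrasts rounding to the declared precision at each assignment with keeping intermediate results in higher precision. The Python sketch below models that difference numerically, assuming single precision as the declared type and double precision as the higher precision. It illustrates the arithmetic only, not the code the compiler generates, and the helper names are hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_round_each_step(start, increment, count):
    """Model rounding to single precision after every addition."""
    total = to_single(start)
    for _ in range(count):
        total = to_single(total + increment)
    return total


def sum_round_at_end(start, increment, count):
    """Model keeping intermediates in double precision and rounding once at the end."""
    total = start
    for _ in range(count):
        total += increment
    return to_single(total)


# 2**-25 is a quarter of the spacing between 1.0 and the next
# single-precision value, so each rounded addition falls back to 1.0.
tiny = 2.0 ** -25

print(sum_round_each_step(1.0, tiny, 4))
print(sum_round_at_end(1.0, tiny, 4))
```

With rounding at every step the four small increments are lost, while the higher-precision accumulation lands on the next representable single-precision value above 1.0. Differences of this kind are what the option is meant to remove when results are compared across platforms.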