The following text (courtesy of Intel Corp.) describes the compiler flags
used for Unisys CPU2000 and SPECompm results generated with the Intel 7.1 compilers on 
Linux-based systems.

Description of compiler flags for Intel C/C++ Compiler 7.1
----------------------------------------------------------
-O1        Optimize for maximum speed, but disable some optimizations
           which increase code size for a small speed benefit. Turns off software
           pipelining to reduce code size. Enables the same optimizations as -O2
           except for loop unrolling. Optimizations include global register
           allocation, instruction scheduling, register variable detection, common
           subexpression elimination, dead code elimination, variable renaming, copy
           propagation, constant propagation, strength reduction/induction variable
           simplification, and tail recursion elimination.

-O2        Enable the -O1 optimizations plus software pipelining, loop
           unrolling, and inlining of intrinsics. This is the default.

-O3        Enable the -O2 optimizations plus more aggressive optimizations,
           including loop transformations, OpenMP, prefetching, and scalar
           replacement. These high-level optimizations use the properties of
           source code constructs such as loops and arrays in applications
           written in high-level programming languages. Optimizes for maximum
           speed, but may not improve performance for some programs.

-O0        Disable optimizations (turns off -O1, -O2, and -O3).

-O         Same as -O2

-ansi_alias Directs the compiler to assume the following:
           - Arrays are not accessed out of bounds.
           - Pointers are not cast to non-pointer types, and vice versa.
           - References to objects of two different scalar types cannot alias.
             For example, an object of type int cannot alias with an object of
             type float, and an object of type float cannot alias with an object
             of type double.
           If your program satisfies the above conditions, setting the -ansi_alias
           flag will help the compiler better optimize the program.

-ip        Enable single-file interprocedural (IP) optimizations.

-ipo       Enable multi-file interprocedural (IP) optimizations, including:
              - inline function expansion
              - interprocedural constant propagation
              - dead code elimination
              - propagation of function characteristics
              - passing arguments in registers
              - loop-invariant code motion

-openmp    Enables the parallelizer to generate multi-threaded code based on the 
           OpenMP directives.  The -openmp option only works at an optimization
           level of -O2 (the default) or higher.

-prof_gen  Instrument the program for profiling. This is the first phase of
           two-phase profile-guided optimization.

-prof_use  Instructs the compiler to produce a profile-optimized 
           executable and merges available dynamic information (.dyn) 
           files into a pgopti.dpi file. If you perform multiple 
           executions of the instrumented program, -prof_use merges 
           the dynamic information files again and overwrites the 
           previous pgopti.dpi file.
           Without any other options, the current directory is
           searched for .dyn files.

-static    Prevents linking with shared libraries 


Description of compiler flags for Intel FORTRAN Compiler 7.1
------------------------------------------------------------
-O1        Optimize for maximum speed, but disable some optimizations
           which increase code size for a small speed benefit. Turns off software
           pipelining to reduce code size. Enables the same optimizations as -O2
           except for loop unrolling. Optimizations include global register
           allocation, instruction scheduling, register variable detection, common
           subexpression elimination, dead code elimination, variable renaming, copy
           propagation, constant propagation, strength reduction/induction variable
           simplification, and tail recursion elimination.

-O2        Enable the -O1 optimizations plus software pipelining, loop
           unrolling, and inlining of intrinsics. This is the default.

-O3        Enable the -O2 optimizations plus more aggressive optimizations,
           including loop transformations, OpenMP, prefetching, and scalar
           replacement. These high-level optimizations use the properties of
           source code constructs such as loops and arrays in applications
           written in high-level programming languages. Optimizes for maximum
           speed, but may not improve performance for some programs.

-72,-80,-132 Specifies 72-, 80-, or 132-column lines for fixed-form source only.

-FI        Specifies that the source code is in fixed format (default for .for, 
           .f, or .ftn). 

-ftz       Enable flush-to-zero mode, where denormal results produced by
           gradual underflow are flushed to zero.

-ip        Enable single-file interprocedural (IP) optimizations.

-ipo       Enable multi-file interprocedural (IP) optimizations, including:
              - inline function expansion
              - interprocedural constant propagation
              - dead code elimination
              - propagation of function characteristics
              - passing arguments in registers
              - loop-invariant code motion

-openmp    Enables the parallelizer to generate multi-threaded code based on the 
           OpenMP directives.  The -openmp option only works at an optimization
           level of -O2 (the default) or higher.

-prof_gen   Instrument the program for profiling. This is the first phase of
            two-phase profile-guided optimization.

-prof_use   Instructs the compiler to produce a profile-optimized 
            executable and merges available dynamic information (.dyn) 
            files into a pgopti.dpi file. If you perform multiple 
            executions of the instrumented program, -prof_use merges 
            the dynamic information files again and overwrites the 
            previous pgopti.dpi file.
            Without any other options, the current directory is
            searched for .dyn files.

-stack_temps Tells the compiler to allocate temporaries on the stack whenever possible.
            OpenMP programs which repeatedly allocate heap memory sometimes show poor
            performance as the number of threads increases.  Allocating arrays on the
            stack using -stack_temps will sometimes eliminate such performance problems.

-static     Prevents linking with shared libraries
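
The two-phase profile-guided build described under -prof_gen and -prof_use can
be sketched as three command lines: an instrumented build, a training run, and
a final build that consumes the collected profile. The sketch below only
assembles those command lines. The helper name pgo_commands, the driver name
"icc", the file names, and the choice of -O3 for the final build are
assumptions made for illustration, not part of the flag descriptions above.

```python
# Sketch of a two-phase profile-guided build as a list of command lines.
# Assumptions: the compiler driver is called "icc", and the training run is
# the instrumented executable started from the current directory.


def pgo_commands(sources, exe, compiler="icc", opt="-O3"):
    """Return the three command lines of a two-phase profile-guided build."""
    # Phase 1: build an instrumented executable.
    instrumented = [compiler, "-prof_gen", *sources, "-o", exe]
    # Training run: expected to leave .dyn files where -prof_use looks for them.
    training_run = ["./" + exe]
    # Phase 2: rebuild, letting the compiler merge and use the .dyn files.
    optimized = [compiler, "-prof_use", opt, *sources, "-o", exe]
    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    for cmd in pgo_commands(["main.c", "util.c"], "app"):
        print(" ".join(cmd))
```

Running the three commands in order from one directory matches the default
behavior described above, where -prof_use searches the current directory for
.dyn files.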


ALTERNATE PEAK SOURCE:

SPEC OMPL2001 (srcalt=ompl) 64-bit source modified for OMPM2001
Used for 320.wquake_m, 324.apsi_m, 328.fma3d_m


USER ENVIRONMENT:

Variable                   Description                                                                

OMP_SCHEDULE          Sets the run-time schedule type and chunk size.                       
                      Default is static, no chunk size specified.   
                                                  
OMP_NUM_THREADS       Sets the number of threads to use during execution.
                      Default is the number of processors.

OMP_DYNAMIC           Enables (true) or disables (false) the dynamic adjustment       
                      of the number of threads. Default is false. 

OMP_NESTED            Enables (true) or disables (false) nested parallelism.
                      Default is false.

KMP_STACKSIZE         Sets the number of bytes to allocate for each parallel  
                      thread to use as its private stack. Use the optional    
                      suffix b, k, m, g, or t, to specify bytes, kilobytes,
                      megabytes, gigabytes, or terabytes. Default for IA-32 is
                      2m. Default for Itanium is 4m.

KMP_LIBRARY           Selects the OpenMP run-time library execution mode.
                      The options for the variable value are: serial,
                      turnaround, or throughput. The default value of
                      throughput is used if this variable is not specified.
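
As an illustration of the KMP_STACKSIZE suffixes listed above, the following
sketch converts a value such as "2m" into a byte count. It assumes that each
suffix is a binary multiple (k = 1024 bytes, m = 1024 k, and so on), which the
description above does not state. The function name stacksize_bytes is
hypothetical. Values without a suffix are rejected instead of guessing a unit.

```python
# Convert a KMP_STACKSIZE-style value such as "2m" to a byte count.
# Assumption: suffixes are binary multiples (k = 1024 bytes, and so on).

_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def stacksize_bytes(value):
    """Return the number of bytes described by a value like '4m' or '512k'."""
    text = value.strip().lower()
    if len(text) < 2 or text[-1] not in _UNITS or not text[:-1].isdecimal():
        # A bare number is rejected here rather than assuming a default unit.
        raise ValueError("expected digits followed by b, k, m, g, or t: %r" % value)
    return int(text[:-1]) * _UNITS[text[-1]]


if __name__ == "__main__":
    print(stacksize_bytes("2m"))  # the IA-32 default listed above
    print(stacksize_bytes("4m"))  # the Itanium default listed above
```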


KMP_LIBRARY Execution Modes.

The compiler with OpenMP enables you to run an application under different execution
modes that can be specified at run time. The libraries support the serial, turnaround,
and throughput modes. These modes are selected by using the KMP_LIBRARY environment
variable at run time.

Serial
The serial mode forces parallel applications to run on a single processor.

Turnaround
In a dedicated (batch or single user) parallel environment where all processors are
exclusively allocated to the program for its entire run, it is most important to
effectively utilize all of the processors all of the time. The turnaround mode is
designed to keep active all of the processors involved in the parallel computation in
order to minimize the execution time of a single job. In this mode, the worker threads
actively wait for more parallel work, without yielding to other threads.
Note:
Avoid over-allocating system resources. This occurs if either too many threads have been
specified, or if too few processors are available at run time. If system resources are
over-allocated, this mode will cause poor performance. The throughput mode should be
used instead if this occurs.

Throughput
In a multi-user environment where the load on the parallel machine is not constant or
where the job stream is not predictable, it may be better to design and tune for
throughput. This minimizes the total time to run multiple jobs simultaneously. In this
mode, the worker threads will yield to other threads while waiting for more parallel work.
The throughput mode is designed to make the program aware of its environment (that is,
the system load) and to adjust its resource usage to produce efficient execution in a
dynamic environment. This mode is the default.
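
The guidance above on choosing between turnaround and throughput can be
expressed as a small helper that prepares the environment for a run. This is a
simplified sketch: it treats "threads do not exceed processors" as the only
test for over-allocation and assumes the caller already knows whether the
machine is dedicated. The function name openmp_env and its parameters are
hypothetical.

```python
import os

# Build an environment for an OpenMP run, picking KMP_LIBRARY from the
# thread count. Assumption: over-allocation means more threads than processors.


def openmp_env(threads, processors=None, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS and KMP_LIBRARY set."""
    if processors is None:
        processors = os.cpu_count() or 1
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    # Use turnaround only when every thread can have its own processor;
    # otherwise fall back to throughput, as the note above advises.
    env["KMP_LIBRARY"] = "turnaround" if threads <= processors else "throughput"
    return env


if __name__ == "__main__":
    env = openmp_env(4)
    print(env["OMP_NUM_THREADS"], env["KMP_LIBRARY"])
```

The returned dictionary can be passed as the env argument of subprocess.run
when launching the benchmark executable.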


SYSTEM TUNABLES:

ulimit -s    Provides control over the resources available to the shell and to
             processes started by it.
             The -s option sets the maximum stack size (kbytes).
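
To check which stack limit a process actually inherited from the shell, the
limit can be read back from inside a program. The sketch below uses the
resource module from the Python standard library (Unix only), which reports
the limit in bytes, and converts it to the kbytes unit used by ulimit -s. The
function name stack_limit_kbytes is hypothetical.

```python
import resource

# Read the current stack size limit, reported in kbytes like "ulimit -s".


def stack_limit_kbytes():
    """Return (soft, hard) stack limits in kbytes; None means unlimited."""
    def to_kbytes(limit):
        return None if limit == resource.RLIM_INFINITY else limit // 1024

    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return to_kbytes(soft), to_kbytes(hard)


if __name__ == "__main__":
    print(stack_limit_kbytes())
```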