MPI2007 Result Flag Description

Base Optimization Flags

C benchmarks

- -O3
- COPTIMIZE
- Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times.
  The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations.
  The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets. On IA-32 Windows platforms, -O3 sets the following:
  
  /GF (/Qvc7 and above), /Gf (/Qvc6 and below), and /Ob2
- Includes:
  - -GF
  - -Gf
  - -Ob_n
  - -O2
    - -Oi-
    - -Gs
    - -Oy
    - -Gy
    - -Os
    - -GF
    - -Gf
    - -Ob_n
    - -Og
    - -O1
      - -unroll_n
      - -Oi-
      - -Op-
      - -Oy
      - -Gy
      - -Os
      - -GF
      - -Gf
      - -Ob_n
      - -Og
- -xCORE-AVX2
- COPTIMIZE
- May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions for Intel(R) processors. Optimizes for Intel(R) processors that support Intel(R) AVX2 instructions.
- -no-prec-div
- COPTIMIZE
- (disable/enable[default] -[no-]prec-div)
  -prec-div improves precision of floating-point divides. It has a slight impact on speed. -no-prec-div disables this option and enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, the value produced by this transformation is sometimes less accurate than full IEEE division. When fully precise IEEE division is important, do not use -no-prec-div; the default -prec-div then stays in effect and gives more accurate results, with some loss of performance.
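
The reciprocal transformation described for -no-prec-div can be illustrated by hand in C. This is only a sketch of the arithmetic, not compiler output: the function names are hypothetical, and the `volatile` qualifier is there solely to keep the two rounding steps separate in this demonstration.

```c
#include <assert.h>

/* Full-precision form: a single correctly rounded division. */
double divide_exact(double a, double b)
{
    assert(b != 0.0);
    return a / b;
}

/* Reciprocal form: 1/b is rounded first, then the product is rounded again. */
double divide_by_reciprocal(double a, double b)
{
    assert(b != 0.0);
    volatile double r = 1.0 / b;  /* volatile: keep this as a separate step */
    return a * r;
}

/* Absolute difference between the two forms for the given operands. */
double reciprocal_error(double a, double b)
{
    double d = divide_exact(a, b) - divide_by_reciprocal(a, b);
    return d < 0.0 ? -d : d;
}
```

When 1/B is exactly representable (for example B = 2.0), the two forms agree. For other operands the reciprocal form can differ from the exact quotient in the last bits, which is the loss of precision the flag description refers to.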

C++ benchmarks

126.lammps

- -O3
- CXXOPTIMIZE
- Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times.
  The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations.
  The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets. On IA-32 Windows platforms, -O3 sets the following:
  
  /GF (/Qvc7 and above), /Gf (/Qvc6 and below), and /Ob2
- Includes:
  - -GF
  - -Gf
  - -Ob_n
  - -O2
    - -Oi-
    - -Gs
    - -Oy
    - -Gy
    - -Os
    - -GF
    - -Gf
    - -Ob_n
    - -Og
    - -O1
      - -unroll_n
      - -Oi-
      - -Op-
      - -Oy
      - -Gy
      - -Os
      - -GF
      - -Gf
      - -Ob_n
      - -Og
- -xCORE-AVX2
- CXXOPTIMIZE
- May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions for Intel(R) processors. Optimizes for Intel(R) processors that support Intel(R) AVX2 instructions.
- -no-prec-div
- CXXOPTIMIZE
- (disable/enable[default] -[no-]prec-div)
  -prec-div improves precision of floating-point divides. It has a slight impact on speed. -no-prec-div disables this option and enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, the value produced by this transformation is sometimes less accurate than full IEEE division. When fully precise IEEE division is important, do not use -no-prec-div; the default -prec-div then stays in effect and gives more accurate results, with some loss of performance.
- -ansi-alias
- CXXOPTIMIZE
- Enable/disable(DEFAULT) use of ANSI aliasing rules in optimizations; user asserts that the program adheres to these rules.
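
With -ansi-alias the optimizer may assume that objects of unrelated types do not overlap in memory, so code that reads an object through a pointer of a different type is no longer safe. A minimal sketch of a conforming alternative is shown below in C (the same idea applies in C++). The function name is hypothetical, and the code assumes that `float` is a 32-bit IEEE 754 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Conforming: copy the object representation instead of casting pointers. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);  /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Non-conforming pattern, shown only as a comment because it breaks the
 * aliasing rules that -ansi-alias asks the programmer to respect:
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */
```

Compilers typically turn the `memcpy` call into a single register move, so the conforming form usually costs nothing at run time.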

Fortran benchmarks

- -O3
- FOPTIMIZE
- Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times.
  The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations.
  The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets. On IA-32 Windows platforms, -O3 sets the following:
  
  /GF (/Qvc7 and above), /Gf (/Qvc6 and below), and /Ob2
- Includes:
  - -GF
  - -Gf
  - -Ob_n
  - -O2
    - -Oi-
    - -Gs
    - -Oy
    - -Gy
    - -Os
    - -GF
    - -Gf
    - -Ob_n
    - -Og
    - -O1
      - -unroll_n
      - -Oi-
      - -Op-
      - -Oy
      - -Gy
      - -Os
      - -GF
      - -Gf
      - -Ob_n
      - -Og
- -xCORE-AVX2
- FOPTIMIZE
- May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions for Intel(R) processors. Optimizes for Intel(R) processors that support Intel(R) AVX2 instructions.
- -no-prec-div
- FOPTIMIZE
- (disable/enable[default] -[no-]prec-div)
  -prec-div improves precision of floating-point divides. It has a slight impact on speed. -no-prec-div disables this option and enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, the value produced by this transformation is sometimes less accurate than full IEEE division. When fully precise IEEE division is important, do not use -no-prec-div; the default -prec-div then stays in effect and gives more accurate results, with some loss of performance.
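
The -O3 description in this section mentions padding certain power-of-two arrays for better cache use. The idea can be sketched by hand: when the row length of a two-dimensional array is a power of two, elements in the same column of consecutive rows tend to map to the same cache sets, and adding a few unused elements per row spreads them out. The sketch below uses C and row-major order for illustration (Fortran arrays are column-major, where the same idea applies to the leading dimension). The function names and pad size are assumptions and do not reflect what the compiler actually chooses.

```c
#include <assert.h>
#include <stddef.h>

/* Row stride, in elements, after adding `pad` unused elements per row. */
size_t padded_stride(size_t cols, size_t pad)
{
    return cols + pad;
}

/* Linear index of element (row, col) in a row-major array with padded rows. */
size_t padded_index(size_t row, size_t col, size_t cols, size_t pad)
{
    assert(col < cols);  /* the padding elements are never addressed */
    return row * padded_stride(cols, pad) + col;
}
```

With 1024 columns and a pad of 8, consecutive rows start 1032 elements apart instead of 1024, so column accesses no longer stride by an exact power of two.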

Benchmarks using both Fortran and C

- -O3
- COPTIMIZE, FOPTIMIZE
- Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
  - Loop unrolling, including instruction scheduling
  - Code replication to eliminate branches
  - Padding the size of certain power-of-two arrays to allow more efficient cache use.
  On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times.
  The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations.
  The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets. On IA-32 Windows platforms, -O3 sets the following:
  
  /GF (/Qvc7 and above), /Gf (/Qvc6 and below), and /Ob2
- Includes:
  - -GF
  - -Gf
  - -Ob_n
  - -O2
    - -Oi-
    - -Gs
    - -Oy
    - -Gy
    - -Os
    - -GF
    - -Gf
    - -Ob_n
    - -Og
    - -O1
      - -unroll_n
      - -Oi-
      - -Op-
      - -Oy
      - -Gy
      - -Os
      - -GF
      - -Gf
      - -Ob_n
      - -Og
- -xCORE-AVX2
- COPTIMIZE, FOPTIMIZE
- May generate Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2), Intel(R) AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 instructions for Intel(R) processors. Optimizes for Intel(R) processors that support Intel(R) AVX2 instructions.
- -no-prec-div
- COPTIMIZE, FOPTIMIZE
- (disable/enable[default] -[no-]prec-div)
  -prec-div improves precision of floating-point divides. It has a slight impact on speed. -no-prec-div disables this option and enables optimizations that give slightly less precise results than full IEEE division.
  
  When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
  
  However, the value produced by this transformation is sometimes less accurate than full IEEE division. When fully precise IEEE division is important, do not use -no-prec-div; the default -prec-div then stays in effect and gives more accurate results, with some loss of performance.
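
Loop unrolling, listed among the -O3 optimizations in this section, can be pictured with a hand-written equivalent. The sketch below shows a plain summation loop next to a four-way unrolled version. The compiler performs this kind of rewrite automatically and may pick a different unroll factor; the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one addition and one branch per element. */
double sum_rolled(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Four-way unrolled loop: four independent accumulators per iteration,
 * followed by a remainder loop for the last n % 4 elements. */
double sum_unrolled4(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
        s += x[i];
    return s;
}
```

The unrolled version adds the elements in a different order, so for general floating-point data its result can differ from the rolled loop in the last bits. For small integer values, as in a quick check, both give the same sum.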

Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.

System and Other Tuning Information

SGI MPT 2.0x options and environment variables

Job startup command and options

The mpiexec_mpt command launches a Message Passing Toolkit (MPT) MPI program in a batch scheduler-managed cluster environment. mpiexec_mpt uses the list of cluster nodes it receives from the batch scheduler to generate and issue an appropriate mpirun command to launch the multi-node job.

PBS Pro's mpiexec command provides the standard mpiexec interface on Altix systems running ProPack 4 or later. Its functionality is equivalent to that of mpiexec_mpt.

Environment variables

Determines the maximum number of nonblocking sends and receives that can simultaneously exist for any single MPI process. MPI generates an error message if this limit (or the default, if not set) is exceeded. Default: 16384

Determines the maximum number of data types that can simultaneously exist for any single MPI process. MPI generates an error message if this limit (or the default, if not set) is exceeded. Default: 1024

If the MPI library uses the IB driver as the inter-host interconnect it will by default use a single IB fabric. If this is set to 2, the library will try to make use of multiple available separate IB fabrics and split MPI traffic across them. Default: 1

For very large MPI jobs, the time and resource cost of creating an InfiniBand connection between every pair of ranks at job start may be prohibitive. If this variable is set to a number no greater than the number of ranks, MPT creates InfiniBand RC Queue Pairs (QPs) lazily, on demand. If it is set to a number greater than the number of ranks, MPT attempts to allocate all the InfiniBand RC QPs it needs at job start. If this variable is not modified and InfiniBand RC QPs are in use, MPT compares the default value against the number of ranks using the same criteria. If it is not modified and InfiniBand XRC QPs are in use, MPT attempts to allocate the QPs at job launch but may need to switch to lazy allocation if the space requirements are too large. Default: 1025
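
The rule for RC queue pairs in the paragraph above reduces to a single comparison. The sketch below restates it in C. It is not MPT code: the function and parameter names are hypothetical, and it covers only the RC case, not the XRC fallback.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when RC queue pairs would be created lazily (on demand),
 * false when they would all be allocated at job start.
 * `threshold` stands for the value of the variable described above. */
bool use_lazy_qp_creation(long threshold, long num_ranks)
{
    assert(num_ranks > 0);
    return threshold <= num_ranks;
}
```

With the default of 1025, a 1024-rank job allocates its queue pairs at start, while a job with 1025 or more ranks creates them lazily.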

Directs MPT to open specific IB ports in each rank. If MPI_IB_DEVS is empty or not defined, MPT will assign ranks to IB ports by the formula "local rank modulo number of ports." The first rank on each host will use the first port on that host, etc. By default MPT will only use the first working port on the first HCA with a working port.
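
The "local rank modulo number of ports" formula quoted above can be written out as a one-line function. This is a sketch of the arithmetic only, with hypothetical names; it does not model how MPT discovers ports or HCAs.

```c
#include <assert.h>

/* Index of the IB port a rank would use, given its rank number on the
 * local host and the number of ports available on that host. */
unsigned ib_port_for_local_rank(unsigned local_rank, unsigned num_ports)
{
    assert(num_ports > 0);
    return local_rank % num_ports;
}
```

On a host with two ports, local ranks 0, 1, 2, 3 map to ports 0, 1, 0, 1, so the first rank on each host uses the first port.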

Other Tuning Information

Removes limits on the maximum size of the automatically extended stack region of the current process and each process it creates.
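
The extracted text does not show the command behind this description. As an illustration only, a process can adjust its own stack limit through the POSIX `getrlimit` and `setrlimit` calls; child processes inherit the result. The sketch below raises the soft limit to the hard limit, which makes the stack unlimited only when the hard limit is itself `RLIM_INFINITY`. The function name is hypothetical.

```c
#include <sys/resource.h>

/* Raise the soft stack limit of the calling process to its hard limit.
 * Returns 0 on success and -1 on error. */
int raise_stack_limit(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;  /* soft limit may always go up to the hard limit */
    return setrlimit(RLIMIT_STACK, &lim);
}
```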

For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact webmaster@spec.org
Copyright 2006-2010 Standard Performance Evaluation Corporation
Tested with SPEC MPI2007 v2.0.1.
Report generated on Thu Mar 31 11:05:38 2016 by SPEC MPI2007 flags formatter v1445.

MPI2007 Flag Description
SGI SGI Rackable C2112-4GP3 (Intel Xeon E5-2699 v4, 2.20 GHz)

Base Compiler Invocation

C benchmarks

C++ benchmarks

126.lammps

Fortran benchmarks

Benchmarks using both Fortran and C

Base Portability Flags

121.pop2

127.wrf2

130.socorro

Base Optimization Flags

C benchmarks

C++ benchmarks

126.lammps

Fortran benchmarks

Benchmarks using both Fortran and C

Base Other Flags

C benchmarks

C++ benchmarks

126.lammps

Fortran benchmarks

Benchmarks using both Fortran and C

Implicitly Included Flags

System and Other Tuning Information

SGI MPT 2.0x options and environment variables

Job startup command and options

Environment variables

Other Tuning Information

