MPI2007 Frequently asked questions

Version 2.04

Last updated: 14 Jan 2010 10:45PM cgp

Abbreviated Contents

This FAQ Document

The SPEC Organization

MPI2007 Benchmarks

Benchmarking

MPI2007 Details

Variations Between MPI2007 Releases

Relation to other SPEC suites

Steps of usage, including troubleshooting

Getting Started

Installation

Invoking runspec

Building the executables

Setting up the run directories

Running the benchmarks

Validating a Run

MPI2007 Measurements

Metrics

Timing

Reporting Run Results

(Click on an item above to go to the detailed contents about that item.)

Detailed Contents

This FAQ Document

Document.01: What is this document for?

Document.02: How is this document organized?

Document.03: Who wrote this document?

Document.04: Where is the latest version of this document?

Document.05: What other documentation is there?

Document.06: Where is the list of known problems with SPEC MPI2007?

Document.07: Now that I have read this document, what should I do next?

The SPEC Organization

SPEC.01: What is SPEC?

SPEC.02: Where do I find more information on SPEC?

SPEC.03: How do I join SPEC?

SPEC.04: How do I contact SPEC for more information or for technical support?

MPI2007 Benchmarks

Benchmarking

MPI2007.Benchmarking.01: What is a benchmark?

MPI2007.Benchmarking.02: Should I benchmark my own application?

MPI2007.Benchmarking.03: If not my own application, then what?

MPI2007.Benchmarking.04: Some of the benchmark names sound familiar; are these comparable to other programs?

MPI2007 Details

MPI2007.Details.01: What does SPEC MPI2007 measure?

MPI2007.Details.02: Why use SPEC MPI2007?

MPI2007.Details.03: What are the limitations of SPEC MPI2007?

MPI2007.Details.04a: What are the limits on the medium data set for SPEC MPI2007?

MPI2007.Details.04b: What are the limits on the large data set for SPEC MPI2007?

MPI2007.Details.05: What kinds of parallelism can I use with MPI2007?

MPI2007.Details.06: Does SPEC MPI2007 have "speed" runs and "rate" runs?

MPI2007.Details.07: What criteria were used to select the benchmarks?

MPI2007.Details.08: What source code is provided? What exactly makes up these suites?

MPI2007.Details.09: How do benchmarks combine C and Fortran sources?

MPI2007.Details.10: Do I need both C and Fortran compilers for all the combined-language benchmarks?

MPI2007.Details.11: What am I allowed to do with the source codes?

Variations Between MPI2007 Releases

MPI2007.Versions.1a: How does Version 2.0 of MPI2007 differ from Version 1.1?

MPI2007.Versions.1b: How did Version 1.1 of MPI2007 differ from Version 1.0?

MPI2007.Versions.2: Now that Version 2.0 is released, can I still make submissions with Version 1.1 or Version 1.0?

MPI2007.Versions.3a: Are results from Versions 1.0, 1.1 and 2.0 directly comparable with each other?

MPI2007.Versions.3b: Is there a way to translate measurements from the medium to the large suite or vice versa?

Relation to other SPEC suites

MPI2007.Relatives.01: SPEC CPU2006 and OMP2001 are already available. Why create SPEC MPI2007? Will it show anything different?

MPI2007.Relatives.02: Aren't some of the SPEC MPI2007 benchmarks already in SPEC CPU2006 and HPC2002? How are they different?

MPI2007.Relatives.03: Why were most of the benchmarks not carried over from CPU2006, HPC2002 or OMP2001?

MPI2007.Relatives.04: What happens to SPEC HPC2002 now that SPEC MPI2007 has been released?

MPI2007.Relatives.05: Is there a way to translate measurements of other suites to SPEC MPI2007 results or vice versa?

Steps of usage, including troubleshooting

Getting Started

Usage.GettingStarted.01: What does the user of the SPEC MPI2007 suite have to provide?

Usage.GettingStarted.02: What are the basic steps in running the benchmarks?

Installation

Usage.Installation.01: How/where do I get the MPI2007 package?

Usage.Installation.02: How am I notified of updates?

Usage.Installation.03: What is included in the SPEC MPI2007 package?

Usage.Installation.04: How do I install the package?

Usage.Installation.05: Why am I getting a message such as "./install.sh: /bin/sh: bad interpreter: Permission denied"?

Usage.Installation.06: The DVD drive is on system A, but I want to install on system B. What do I do?

Usage.Installation.07: Why did the installation fail with "Error occurred while processing: C:\Documents"?

Usage.Installation.08: How do I uninstall? This script uninstall.sh doesn't seem to be it.

Usage.Installation.09: What if the tools cannot be run or built on a system? Can the benchmarks be run manually?

Invoking runspec

Usage.Invocation.01: What parameters do I pass to runspec?

Usage.Invocation.02: Can I (or should I) use an old config file?

Usage.Invocation.03: I'm using a config file that came with the MPI2007 image. Why isn't it working?

Usage.Invocation.04: How do I write a config file?

Usage.Invocation.05: Is there a config file for Visual C++?

Usage.Invocation.06: When I say runspec, why does it say "Can't locate strict.pm"?

Usage.Invocation.07: Why am I getting messages about "specperl: bad interpreter: No such file or directory"?

Building the executables

Usage.Building.01: How long does it take to compile the benchmarks?

Usage.Building.02: How much memory do I need to compile the benchmarks?

Usage.Building.03: Can I build the benchmarks as either 32- or 64- bit binaries?

Usage.Building.04: I'm using Windows and I can't build any benchmarks. What does "CreateProcess((null), ifort ...) failed" mean?

Usage.Building.05: The file make.clean.err is mentioned, but it does not exist. Why not?

Usage.Building.06: Why is it rebuilding the benchmarks?

Usage.Building.07: Building one of the benchmarks fails. What should I do?

Usage.Building.08: Does the arrangement of compilation flags matter?

Usage.Building.09: What is a "Portability Flag"? How does it differ from an "Optimization Flag"?

Usage.Building.10: How are compilation flags managed for the combined-language benchmarks?

Usage.Building.11: How can I build different .o files using different flags?

Setting up the run directories

Usage.Setup.01: How much file system space do I need?

Usage.Setup.02: Why does the large data set suite need more space on a Big Endian system?

Usage.Setup.03: Do I need a shared file system?

Usage.Setup.04: What does "hash doesn't match after copy" mean?

Usage.Setup.05: Why does it say "ERROR: Copying executable failed"?

Running the benchmarks

Usage.Running.01: How long does it take to run the SPEC MPI2007 benchmark suites on my platform?

Usage.Running.02: Why does this benchmark take so long to run?

Usage.Running.03a: How much memory does it take to run the medium benchmarks?

Usage.Running.03b: How much memory does it take to run the large benchmarks?

Usage.Running.04a: How are the large and medium benchmark suites connected to the ltest/ltrain/lref and mtest/mtrain/mref data sets?

Usage.Running.04b: Given that I stated medium or large for the suite, why do I also have to qualify mtest/mtrain/mref or ltest/ltrain/lref instead of just saying test, train or ref?

Usage.Running.04c: Why did I get the error message "Benchmark does not support size ..."?

Usage.Running.04d: Why did I get the error message "read_reftime: "/spec/mpi2007/benchspec/MPI2007/*/data/.../reftime" does not exist"?

Usage.Running.05: The running of one of the benchmarks fails. What should I do?

Usage.Running.06: Why did 126.lammps fail?

Usage.Running.07: Why do I see large changes in runtime for 122.tachyon?

Usage.Running.08: How can I run a profiler, performance monitor, etc. in conjunction with a benchmark?

Usage.Running.09a: The MPI-2 Standard recommends that I use mpiexec to run each of the benchmarks. If I start the MPD process manager at the beginning, mpiexec ought to be more efficient than mpirun for running each benchmark. Why can't I do this?

Usage.Running.09b: I see published results that use the MPD and mpiexec. If I use the same configuration file as these, will my results be accepted?

Validating a Run

Usage.Validation.01: What is the difference between a valid and a reportable run?

Usage.Validation.02: I got a message about a miscompare...

Usage.Validation.03: How were the result tolerances chosen?

Usage.Validation.04: The benchmark ran, but it took less than 1 second and there was a miscompare. Help!

Usage.Validation.05: The .mis file says short

Usage.Validation.06: My compiler is generating bad code! Help!

Usage.Validation.07: The code is bad even with low optimization! Help!

Usage.Validation.08: I looked in the .mis file and it was just full of a bunch of numbers.

MPI2007 Measurements

Metrics

Measurements.Metrics.01: What metrics can be measured?

Measurements.Metrics.02: What is the difference between a base metric and a peak metric?

Measurements.Metrics.03a: What are the different data-set sizes?

Measurements.Metrics.03b: Why aren't performance metrics generated for the test and train data sets?

Measurements.Metrics.03c: Is there a way to translate measurements between the test or train and ref data sets?

Measurements.Metrics.04: Which SPEC MPI2007 metric should be used to compare performance?

Timing

Measurements.Timing.01: Why does SPEC use reference machines? What machines were used for SPEC MPI2007?

Measurements.Timing.02: How long does it take to run the SPEC MPI2007 benchmark suites on their reference platforms?

Measurements.Timing.03: The reports don't list the reference times. Where can I find them?

Measurements.Timing.04: Why aren't the scores for the reference platform all 1's?

Measurements.Timing.05: I bought a machine from company XYZ. It doesn't score as high as what was published...

Measurements.Timing.06: The machines we're shipping do not yield the same measurements as what we published...

Measurements.Timing.07: How was the 5% tolerance decided?

Reporting Run Results

Measurements.Reporting.01: Where are SPEC MPI2007 results available?

Measurements.Reporting.02: How do I submit my results?

Measurements.Reporting.03: How do I edit a submitted report?

Measurements.Reporting.04: How do I determine the Hardware & Software availability dates? What about Firmware?

Measurements.Reporting.05: How do I describe my Interconnect topology?

Measurements.Reporting.06: What do I do if a flag or library changes in the GA?

Measurements.Reporting.07: What's all this about "Submission Check -> FAILED" littering my log file and my screen?

Measurements.Reporting.08: What is a "flags file"? What does "Unknown Flags" mean in a report?

Measurements.Reporting.09: Can SPEC MPI2007 results be published outside of the SPEC web site? Do the rules still apply?

Measurements.Reporting.10: Can I use "derived" metrics, such as cost/performance or flops?

Measurements.Reporting.11: It's hard to cut/paste into my spreadsheet...

This FAQ Document

Document.01: What is this document for?

This document answers Frequently Asked Questions and provides background information about the SPEC MPI2007 benchmark suite, as well as providing usage and troubleshooting details. SPEC hopes that this material will help you understand what the benchmark suite can, and cannot, provide; that it will serve as a shortcut reference for following the process and solving problems; and that it will help you make efficient use of the product.

Document.02: How is this document organized?

This document is organized as a series of questions and answers. It follows the natural process flow from the principles of SPEC and the MPI2007 benchmarks, to the steps of benchmarking a real system (including dealing with problems you may run into), and finally what to do with the benchmark measurements. Hopefully this arrangement will allow you to find your question in the list by following the categories.

Document.03: Who wrote this document?

This document is the merger of earlier "faq" and "readme1st" documents developed for SPEC/CPU2006, which in turn had evolved through earlier releases of SPEC/cpu. Contributors include

Document.04: Where is the latest version of this document?

The latest version of this document may be found at http://www.spec.org/mpi2007/Docs/faq.html.

Document.05: What other documentation is there?

The website http://www.spec.org/mpi2007/Docs/ contains links to all the MPI2007 documents, describing the benchmark programs, how to install and run them, and technical support and other documents.

On an installed machine, the Docs directory contains copies of these documents that are current at the time the MPI2007 image was packaged.

Document.06: Where is the list of known problems with SPEC MPI2007?

Please see the link http://www.spec.org/mpi2007/Docs/errata.html for an updated list of known problems with the SPEC MPI2007 suite.

Document.07: Now that I have read this document, what should I do next?

If you haven't bought MPI2007, it is hoped that you will consider doing so. If you are ready to get started using the suite, then you should pick a system that meets the requirements as described in

system-requirements.html

and install the suite, following the instructions in

install-guide-unix.html

or

install-guide-windows.html

Next, read runrules.html to learn how to perform a reportable run. runspec.html shows how to execute the benchmarks, and config.html shows how to construct a configuration file that controls how the benchmarks are compiled and executed.

The SPEC Organization

SPEC.01: What is SPEC?

SPEC is the Standard Performance Evaluation Corporation. SPEC is a non-profit organization whose members include computer hardware vendors, software companies, universities, research organizations, systems integrators, publishers and consultants. SPEC's goal is to establish, maintain and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community benefits from objective tests which can serve as a common reference point.

SPEC.02: Where do I find more information on SPEC?

The website http://www.spec.org/ contains information on membership, the benchmark suites, and the posted benchmark results.

SPEC.03: How do I join SPEC?

The website http://www.spec.org/spec/membership.html describes the SPEC organization and how to join each of the component committees.

SPEC.04: How do I contact SPEC for more information or for technical support?

SPEC can be contacted in several ways. For general information, including other means of contacting SPEC, please see SPEC's Web Site at:

http://www.spec.org/

General questions can be emailed to: info@spec.org
MPI2007 Technical Support Questions can be sent to: mpi2007support@spec.org

MPI2007 Benchmarks

Benchmarking

MPI2007.Benchmarking.01: What is a benchmark?

A benchmark is "a standard of measurement or evaluation" (Webster’s II Dictionary). A computer benchmark is typically a computer program that performs a strictly defined set of operations - a workload - and returns some form of result - a metric - describing how the tested computer performed. Computer benchmark metrics usually measure speed: how fast was the workload completed; or throughput: how many workload units per unit time were completed. Running the same computer benchmark on multiple computers allows a comparison to be made.

MPI2007.Benchmarking.02: Should I benchmark my own application?

Ideally, the best comparison test for systems would be your own application with your own workload. Unfortunately, it is often impractical to get a wide base of reliable, repeatable and comparable measurements for different systems using your own application with your own workload. Problems might include generation of a good test case, confidentiality concerns, difficulty ensuring comparable conditions, time, money, or other constraints.

MPI2007.Benchmarking.03: If not my own application, then what?

You may wish to consider using standardized benchmarks as a reference point. Ideally, a standardized benchmark will be portable, and may already have been run on the platforms that you are interested in. However, before you consider the results you need to be sure that you understand the correlation between your application/computing needs and what the benchmark is measuring. Are the benchmarks similar to the kinds of applications you run? Do the workloads have similar characteristics? Based on your answers to these questions, you can begin to see how the benchmark may approximate your reality.

Note: A standardized benchmark can serve as reference point. Nevertheless, when you are doing vendor or product selection, SPEC does not claim that any standardized benchmark can replace benchmarking your own actual application.

MPI2007.Benchmarking.04: Some of the benchmark names sound familiar; are these comparable to other programs?

Many of the SPEC benchmarks are derived from publicly available application programs. The individual benchmarks in this suite may be similar, but are NOT identical to benchmarks or programs with similar names which may be available from sources other than SPEC. In particular, SPEC has invested significant effort to improve portability and to minimize hardware dependencies, to avoid unfairly favoring one hardware platform over another. For this reason, the application programs in this distribution may perform differently from commercially available versions of the same application.

Therefore, it is not valid to compare SPEC MPI2007 benchmark results with anything other than other SPEC MPI2007 benchmark results.

MPI2007 Details

MPI2007.Details.01: What does SPEC MPI2007 measure?

SPEC MPI2007 focuses on performance of compute intensive applications using the Message-Passing Interface (MPI), which means these benchmarks emphasize the performance of:

It is important to remember the contribution of all these components. SPEC MPI performance intentionally depends on more than just the processor.

SPEC MPI2007 is not intended to stress other computer components such as the operating system, graphics, or the I/O system. They may have an effect in some cases, if the component is exceptionally slow or if there is an unusually small resource bound, e.g. paging delays due to a slow disk or too little memory. Note that there are many other SPEC benchmarks, including benchmarks that specifically focus on graphics, distributed Java computing, webservers, and network file systems.

MPI2007.Details.02: Why use SPEC MPI2007?

SPEC MPI2007 provides a comparative measure of MPI-parallel, floating point, compute intensive performance, across the widest practical range of cluster and SMP hardware. If this matches with the type of workloads you are interested in, SPEC MPI2007 provides a good reference point.

Other advantages to using SPEC MPI2007 include:

MPI2007.Details.03: What are the limitations of SPEC MPI2007?

As described above, the ideal benchmark for vendor or product selection would be your own workload on your own application. Please bear in mind that no standardized benchmark can provide a perfect model of the realities of your particular system and user community.

MPI2007.Details.04a: What are the limits on the medium data set for SPEC MPI2007?

The medium data set is intended to work with a range of rank sizes from 4 to 128 ranks. Valid runs up to 512 ranks have been reported but some workloads, notably 113.GemsFDTD, do not scale positively beyond 256 ranks.

MPI2007.Details.04b: What are the limits on the large data set for SPEC MPI2007?

The large data set is intended to work with a range of rank sizes from 64 to 2048 ranks. 121.pop2 is not designed to work below 64 ranks. 137.lu and 145.lGemsFDTD fail above 2048 ranks, while 132.zeusmp2 will limit itself to 2048 ranks if more are specified. Some of the other benchmarks will fail at even higher rank counts.

MPI2007.Details.05: What kinds of parallelism can I use with MPI2007?

The benchmarks utilize process-level parallelism as managed by the MPI standard. Thread-level parallelism is not allowed:

Note that this is a deviation from CPU2006, OMP2001, and HPC2002. Low-level hardware parallelism is allowed to the extent that it does not require parallel software threads.

MPI2007.Details.06: Does SPEC MPI2007 have "speed" runs and "rate" runs?

A SPEC/CPU benchmark is intended to run on a single processor core. On multiprocessor systems, a "rate" run consists of multiple copies of the benchmark running on the system concurrently. This measures the total system computing capacity, as opposed to a "speed" run which only runs a single copy on an otherwise empty system.

SPEC/HPG benchmarks are parallel programs, so only one copy of the benchmark runs on the system at a time. The program itself uses multiple processor cores. A SPEC OMP2001 benchmark decomposes into multiple threads under the OpenMP standard, while a SPEC MPI2007 benchmark decomposes into multiple processes under the MPI standard. These have characteristics in common with both "speed" and "rate" runs but do not precisely correspond to either one.

Note that an OMP2001 or MPI2007 benchmark usually runs faster on larger systems, since the problem is divided into smaller pieces which execute more quickly, while with a SPEC CPU "rate" run, a benchmark tends to take longer to run on larger systems since more copies of the same program tend to cause contention for system resources.

MPI2007.Details.07: What criteria were used to select the benchmarks?

In the process of selecting applications to use as benchmarks, SPEC considered the following criteria:

Note that not every benchmark satisfies every criterion. 122.tachyon, for example, is "embarrassingly parallel". Also, the total memory requirement seems to have a lower bound of 16GB for medium and 128GB for large, even when only a few ranks are being run.

MPI2007.Details.08: What source code is provided? What exactly makes up these suites?

MPI2007 is composed of MPI-parallel compute-intensive applications provided as source code. In the medium suite, 2 are written in C, 4 are written in Fortran, 6 combine C and Fortran, and 1 is written in C++. In the large suite, 3 are written in C, 4 are written in Fortran, 4 combine C and Fortran, and 1 is written in C++. The benchmarks are:

Benchmark        Suite          Language     Application domain
104.milc         medium         C            Physics: Quantum Chromodynamics (QCD)
107.leslie3d     medium         Fortran      Computational Fluid Dynamics (CFD)
113.GemsFDTD     medium         Fortran      Computational Electromagnetics (CEM)
115.fds4         medium         C/Fortran    Computational Fluid Dynamics (CFD)
121.pop2         medium,large   C/Fortran    Ocean Modeling
122.tachyon      medium,large   C            Graphics: Parallel Ray Tracing
125.RAxML        large          C            DNA Matching
126.lammps       medium,large   C++          Molecular Dynamics Simulation
127.wrf2         medium         C/Fortran    Weather Prediction
128.GAPgeofem    medium,large   C/Fortran    Heat Transfer using Finite Element Methods (FEM)
129.tera_tf      medium,large   Fortran      3D Eulerian Hydrodynamics
130.socorro      medium         C/Fortran    Molecular Dynamics using Density-Functional Theory (DFT)
132.zeusmp2      medium,large   C/Fortran    Physics: Computational Fluid Dynamics (CFD)
137.lu           medium,large   Fortran      Computational Fluid Dynamics (CFD)
142.dmilc        large          C            Physics: Quantum Chromodynamics (QCD)
143.dleslie      large          Fortran      Computational Fluid Dynamics (CFD)
145.lGemsFDTD    large          Fortran      Computational Electromagnetics (CEM)
147.l2wrf2       large          C/Fortran    Weather Prediction

Descriptions of the benchmarks, with reference to papers, web sites, and so forth, can be found in the individual benchmark descriptions (click the links above). Some of the benchmarks also provide additional details, such as documentation from the original program, in the nnn.benchmark/Docs directories in the SPEC benchmark tree.

The numbers used as part of the benchmark names provide an identifier to help distinguish programs from one another. For example, some programs in the large suite derive from the same source codes as, but are not identical to, programs in the medium suite:

medium          large
104.milc        142.dmilc
107.leslie3d    143.dleslie
113.GemsFDTD    145.lGemsFDTD
127.wrf2        147.l2wrf2

In the other cases, the source codes are identical between the medium and large suites but the data sets are not.

If you care to dig more deeply, you will see that programs in other SPEC suites derive from the same sources as well, such as 361.wrf_m from the HPC2002 suite and 433.milc from the CFP2006 suite, and these need to be distinguished from the MPI2007 versions. Note: even if a program has the same name as in another suite - for example, 127.wrf2 vs. 361.wrf_m from the HPC2002 suite - the updated workload and updated source code mean that it is not valid to compare the SPEC MPI2007 result to the result with the older SPEC HPC2002 benchmark.

MPI2007.Details.09: How do benchmarks combine C and Fortran sources?

In the combined-language benchmarks, most of the C files are used to implement I/O and memory-allocation utilities, or specific mathematical calculations like FFT, Bessel functions, or random number generation. But fundamental pieces of 115.fds4, 127.wrf2, 128.GAPgeofem, and 130.socorro are also written in C. In 121.pop2, the C code is limited to the netcdf library, which is also used in 127.wrf2. How the source files are compiled and linked is described below.

MPI2007.Details.10: Do I need both C and Fortran compilers for all the combined-language benchmarks?

You will only be able to study a few of the benchmarks if you do not have compilers for both C and Fortran. Of the combined-language benchmarks, you may be able to compile the Fortran files in 121.pop2 and link them with a precompiled netcdf library on your system. The C components cannot be replaced in the other combined-language benchmarks. It is important to note that for a reportable MPI2007 result, the SPEC MPI2007 Run and Reporting Rules (a) do not allow the netcdf library substitution, and (b) require that all of the benchmarks in the suite be compiled and run.

MPI2007.Details.11: What am I allowed to do with the source codes?

The SPEC license allows you to compile and run the benchmarks. Source code changes are not allowed, beyond the use of src.alt patches that are specially approved and packaged by SPEC. If you are interested in improving the codes or re-engineering them for your own purposes, contact the individual benchmark authors as identified in the HTML description file under benchspec/MPI2007/*/Docs for each benchmark.

Variations Between MPI2007 Releases

MPI2007.Versions.1a: How does Version 2.0 of MPI2007 differ from Version 1.1?

See the Changes document for the list of differences between Version 1.1 and 2.0. To summarize:

  1. The large suite has been added, including new data sets, modifications to the source codes, and extensions to the tools.
  2. The runspec tool requires that mtest/mtrain/mref or ltest/ltrain/lref be used in place of test/train/ref
  3. The Run and Reporting Rules document includes some changes.

MPI2007.Versions.1b: How did Version 1.1 of MPI2007 differ from Version 1.0?

See the Changes document for the list of differences between Version 1.0 and 1.1. To summarize:

  1. Code fixes have been made to several of the benchmarks, which will make them run correctly in more environments.
  2. The output reports are formatted better, and include more detailed descriptions of the systems under test.
  3. Corrections have been made to the documentation and other details have been added.
  4. Fixes have been made to the tools, and new functions have been added.
  5. We have not successfully tested this version under Windows.

MPI2007.Versions.2: Now that Version 2.0 is released, can I still make submissions with Version 1.1 or Version 1.0?

The changes in MPI2007 Version 2.0 are primarily to support the large data suite; you are still allowed to use Version 1.1 to submit results for the medium suite. The SPEC/HPG committee makes this exception so that non-SPEC/HPG members who purchased the medium suite are not required to buy the Version 2.0 update unless they plan to run the large data benchmarks. Since Version 2.0 contains changes to the tools, SPEC/HPG members are encouraged to use Version 2.0 to ensure that the new tools are being tested as much as possible.

Future changes to the MPI2007 benchmarks or tools may require that all users move up to the latest version. For example, Version 1.0 has not been allowed for submissions since Version 1.1 became available. If you are in a position where you can make valid runs with Version 1.0 but not with Version 1.1 or Version 2.0, please notify SPEC HPG.

MPI2007.Versions.3a: Are results from Versions 1.0, 1.1 and 2.0 directly comparable with each other?

Yes. The changes that have been made to the source codes, data sets and tools are intended to fix bugs, remove portability problems, and add new capabilities to the suites. They are not intended to affect the benchmark runtimes, and have been tested on a variety of platforms to show that they reproduce the same runtimes to within the required 5% tolerance.

MPI2007.Versions.3b: Is there a way to translate measurements from the medium to the large suite or vice versa?

No. There is no formula for converting between medium and large results; they are different suites. We expect some correlation between the two, i.e. machines with higher results with one suite tend to have higher results with the other, but there is no universal formula for all systems.

Relation to other SPEC suites

MPI2007.Relatives.01: SPEC CPU2006 and OMP2001 are already available. Why create SPEC MPI2007? Will it show anything different?

Technology is always changing, and the benchmarks have to adapt as it progresses. SPEC needed to address the following issues:

Application type:
Many native MPI-parallel applications have been developed for cluster systems and are widely used in industry and academia. SPEC feels that standard benchmarks need to be available for comparing cluster systems for the same reason that SPEC CPU was developed for measuring serial CPU performance and SPEC OMP was developed for measuring SMP performance.

Moving target:
OMP2001 had been available for SMP systems for six years. In the meantime, clusters had become increasingly popular as a low-cost and flexible way to configure parallel systems. The HPC2002 suite allowed MPI parallelism but was not an adequate bridge because it contained only 3 benchmarks, had too short a runtime, and was not as strictly standardized as MPI2007 or OMP2001.

Application size:
As applications grow in complexity and size, older suites become less representative of what runs on current systems. For MPI2007, SPEC included some programs with both larger resource requirements and more complex source code than previous suites.

Run-time:
As of spring 2007, many of the OMP2001 benchmarks finish in less than 5 minutes on leading-edge processors/systems. Small changes or fluctuations in system state or measurement conditions can therefore have a significant impact on the observed run time. SPEC chose longer run times for the CPU2006 and MPI2007 benchmarks to account for future performance improvements and keep this from becoming an issue over the lifetime of the suites.

MPI2007.Relatives.02: Aren't some of the SPEC MPI2007 benchmarks already in SPEC CPU2006 and HPC2002? How are they different?

Some of the benchmarks in CPU2006 and MPI2007 derive from the same source codes. CPU2006 benchmarks are serialized versions of the applications, while the corresponding MPI2007 benchmarks preserve their original MPI-parallel nature. They have all been given different workloads to better exercise a large parallel machine, and in some cases have additional source-code modifications for parallel scalability and portability. Therefore, for example, results with the CPU2006 benchmark 433.milc may be strikingly different from results with the MPI2007 benchmark 104.milc.

One benchmark, 127.wrf2, is derived from a more current version of the WRF source than is 361.wrf_m from the earlier HPC2002 suite. Its larger workload is better for exercising the performance of modern parallel systems. Further, the HPC2002 rules allow the use of OMP parallelism, while this capability has been removed from the MPI2007 source code. So, again, results with the HPC2002 benchmark 361.wrf_m may be strikingly different from results with the MPI2007 benchmark 127.wrf2.

MPI2007.Relatives.03: Why were most of the benchmarks not carried over from CPU2006, HPC2002 or OMP2001?

Many applications in the CPU2006 suite were not designed to run with MPI parallelism, so would not realistically measure MPI performance. The benchmarks in the HPC2002 and OMP2001 suites either

  1. are older applications that have been replaced by more general, more exact, or more efficient applications that solve the same problem,
  2. are not native MPI-parallel applications,
  3. do not scale well to the sizes of modern systems,
  4. could not, for one reason or another, be given a longer-running or more robust workload, or
  5. were judged by SPEC not to add significant performance information compared to the other benchmarks under consideration.

MPI2007.Relatives.04: What happens to SPEC HPC2002 now that SPEC MPI2007 has been released?

The HPC2002 suite was retired in 2008. No further results will be accepted for publication. The MPI2007 suite does a better job of measuring and standardizing MPI performance.

MPI2007.Relatives.05: Is there a way to translate measurements of other suites to SPEC MPI2007 results or vice versa?

There is no formula for converting CPU2006, OMP2001 or any other measurements to MPI2007 results or vice versa; they are different products. We expect some correlation between any two given suites, i.e. machines with higher results on one suite tend to have higher results on another, but there is no universal formula for all systems.

SPEC encourages SPEC licensees to publish MPI2007 numbers on older platforms to provide a historical perspective on performance.

Steps of usage, including troubleshooting

Getting Started

Usage.GettingStarted.01: What does the user of the SPEC MPI2007 suite have to provide?

Briefly, you need an SMP or cluster system running Unix, Linux, or Microsoft Windows, together with compilers.

For cluster configurations, the file system will need to be shared across the nodes. See the system-requirements.html document for the detailed requirements as well as a listing of systems that have successfully run the benchmarks.

Note: links to SPEC MPI2007 documents on this web page assume that you are reading the page from a directory that also contains the other SPEC MPI2007 documents. If by some chance you are reading this web page from a location where the links do not work, try accessing the referenced documents at one of the following locations:

Usage.GettingStarted.02: What are the basic steps in running the benchmarks?

Installation and use are covered in detail in the SPEC MPI2007 User Documentation. The basic steps are:

If you wish to generate results suitable for quoting in public, you will need to carefully study and adhere to the run rules.
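As a rough sketch of those steps on a Unix-type system (the config file name, rank count, and exact option spellings below are illustrative; consult runspec.html and your own config file for the values that apply to you):

cd /spec/mpi2007                                                # Top of the installed SPEC tree.
. ./shrc                                                        # Set up the environment (use cshrc for csh-family shells).
runspec --config=mycluster.cfg --action=build medium            # Build the medium-suite executables.
runspec --config=mycluster.cfg --size=mtest --ranks=16 medium   # Quick correctness check on the small workload.
runspec --config=mycluster.cfg --reportable --ranks=16 medium   # Full run suitable for submission.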

Installation

Usage.Installation.01: How/where do I get the MPI2007 package?

If you're reading this from the Docs directory on your system, then the MPI2007 package is likely to be installed already. If the package is not installed (you are probably reading this from the SPEC public website), you will need to (a) contact your company's HPG representative for the DVD or tarfile, (b) download the tarfile from the SPEC members' website, or (c) order the DVD from SPEC.

If your company is not already a member of SPEC/HPG, you will need to join or specially purchase a copy of the DVD, as described on the SPEC public website.

Usage.Installation.02: How am I notified of updates?

Your company's SPEC/HPG representative follows the hpgmail mailing list and will receive notices of any new editions of the MPI2007 benchmark kit. Make arrangements with them to ensure you have access to the latest kit and any patches.

Alternatively, you can check the SPEC members' website to see whether a new version or any patches have been posted. This may not tell you whether a new kit or patch will soon become available, however, and you may need it for technical reasons, so it is best to stay informed of such activity from within the HPG committee.

Usage.Installation.03: What is included in the SPEC MPI2007 package?

SPEC provides the following on the SPEC MPI2007 media (DVD):

Usage.Installation.04: How do I install the package?

If you are installing from the DVD sent to you by SPEC, use the following steps:

su - spec                          # Become whatever ID will be used to make the runs.
mkdir -p /spec/mpi2007             # Create the directory to put the tree.
cd /spec/mpi2007                   # Go there.
cp -R /mount/dvd/* .               # Copy the directory tree from wherever the DVD is mounted.
./install.sh                       # Run the installer script.
Use path /spec/mpi2007? yes        # Tell it where to put the directory tree.

If you are installing from a tar-file downloaded from the SPEC website, use the following steps:

su - spec                          # Become whatever ID will be used to make the runs.
mkdir -p /spec/mpi2007             # Create the directory to put the tree.
cd /spec/mpi2007                   # Go there.
cp $SPEC_KITS/mpi2007.tar.bz2 .    # Bring over the file.
bunzip2 mpi2007.tar.bz2            # Uncompress it.
tar -xvf mpi2007.tar               # Unroll the directory tree.
./install.sh                       # Run the installer script.
Use path /spec/mpi2007? yes        # Tell it where the directory tree is.

Usage.Installation.05: Why am I getting a message such as "./install.sh: /bin/sh: bad interpreter: Permission denied"?

If you are installing from the DVD, check to be sure that your operating system allows programs to be executed from the DVD. For example, some Linux man pages for mount suggest setting the properties for the CD or DVD drive in /etc/fstab to "/dev/cdrom /cd iso9660 ro,user,noauto,unhide", which is notably missing the property "exec". Add exec to that list in /etc/fstab, or add it to your mount command. Notice that the sample Linux mount command in install-guide-unix.html does include exec.
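For example, on a Linux system the exec property could be added in either place; the device name and mount point below are the ones quoted above and may be different on your system:

/dev/cdrom  /cd  iso9660  ro,user,noauto,unhide,exec  0 0       # /etc/fstab entry with exec added

mount -o ro,exec /dev/cdrom /cd                                 # or specify exec directly on the mount command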

Perhaps install.sh lacks permission to run because you tried to copy all the files from the DVD, in order to move them to another system. If so, please don't do that. There's an easier way. See the next question.

Usage.Installation.06: The DVD drive is on system A, but I want to install on system B. What do I do?

The installation guides have an appendix just for you, which describes installing from the network or installing from a tarfile. See the appendix to install-guide-unix.html or install-guide-windows.html.

Usage.Installation.07: Why did the installation fail with "Error occurred while processing: C:\Documents"?
I was installing on Windows using the tar file. It said:

C:\Documents and Settings\John\mpi2007> install
The environment variable SPEC should point to the source of the SPEC distribution as an absolute path.
I will now try to set the variable for you...

SPEC is set to C:\Documents and Settings\John\mpi2007
If this is NOT what you want, press control-C
Press any key to continue . . .
Installing from "C:\Documents and Settings\John\mpi2007"

Checking the integrity of your source tree...

 Depending on the amount of memory in your system, and the speed of your destination disk, this may take more than 10 minutes.
 Please be patient.

The system cannot find the file specified.
Error occurred while processing: C:\Documents.
The system cannot find the file specified.
Error occurred while processing: and.
The system cannot find the path specified.
C:\Documents and Settings\John\mpi2007\tools\bin\windows-i386\specmd5sum: MANIFEST.tmp: no properly formatted MD5 checksum lines found
Package integrity check failed!
Installation NOT completed!

The problem is that the SPEC tools do NOT support spaces in path names. Sorry. This is a limitation of the SPEC toolset and there are currently no plans to change this requirement. Please use a path that does not contain spaces.

Usage.Installation.08: How do I uninstall? This script uninstall.sh doesn't seem to be it.

You are correct that uninstall.sh does not remove the whole product; it only removes the SPEC tool set, and does not affect the benchmarks (which consume the bulk of the disk space). At this time, SPEC does not provide an uninstall utility for the suite as a whole. But it's easy to do: on Unix systems, use rm -Rf on the directory where you installed the suite: e.g.

rm -Rf /home/cs3000/saturos/spec/mpi2007

On Windows systems, select the top directory in Windows Explorer and delete it.

If you have been using the output_root feature, you will have to track those down separately.

Note: instead of deleting the entire directory tree, some users find it useful to keep the config and result subdirectories, while deleting everything else.

Usage.Installation.09: What if the tools cannot be run or built on a system? Can the benchmarks be run manually?

To generate rule-compliant results, an approved toolset must be used. If several attempts at using the SPEC-provided tools are not successful, you should contact SPEC for technical support. SPEC may be able to help you, but this is not always possible -- for example, if you are attempting to build the tools on a platform that is not available to SPEC.

If you just want to work with the benchmarks and do not care to generate publishable results, SPEC provides information about how to do so.

Invoking runspec

Usage.Invocation.01: What parameters do I pass to runspec?

The runspec document explains all the parameters available to runspec. Some important examples are the following:

Many of these parameters can also be hard-coded into the config file that is passed into runspec. Note that some of them, such as running peak without base, will result in a non-reportable run.
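A few illustrative invocations follow; the config file name, rank counts, and benchmark choices are placeholders, and runspec.html remains the authoritative reference for every option:

runspec --config=mycluster.cfg --ranks=32 --tune=base,peak medium                     # Base and peak tuning, whole medium suite.
runspec --config=mycluster.cfg --ranks=32 --size=mtrain 104.milc                      # One benchmark, training workload only.
runspec --config=mycluster.cfg --ranks=32 --iterations=1 --noreportable 126.lammps    # Single, non-reportable iteration.
runspec --config=mycluster.cfg --rebuild --action=build large                         # Force a rebuild of the large-suite binaries.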

Usage.Invocation.02: Can I (or should I) use an old config file?

This is a risky issue. The SPEC/HPG committee will require a submitted rawfile to accurately and completely describe the system under test. A config file used to measure a different system is unlikely to contain a correct description of the system you are measuring. So expect to have to correct the details by (a) editing the config-file prior to the run, (b) editing the resulting rawfile prior to submitting it, or (c) editing the pending submission file before it is accepted for publication by the committee.

The real risk is that the config-file contains compilation flags or environment settings which are disallowed for a base run, in which case the whole measurement is invalid and cannot be corrected by editing the result file. In this regard, you can generally trust a config-file that has passed review in a prior submission. It is possible though, that in the time since that prior measurement, your compiler or OS has changed such that some portability flag or environment setting is no longer necessary, and a new measurement from that config-file will not pass review.

Usage.Invocation.03: I'm using a config file that came with the MPI2007 image. Why isn't it working?

The config subdirectory contains config files that were valid at the time the MPI2007 directory tree was packaged for delivery. There is one config file for each platform that MPI2007 was tested on. These are meant to serve as examples of use on each platform and are used for testing builds of the MPI2007 suite.

Even if you are using the same series of compiler and/or OS as the config file in the directory, these components may have evolved since the config file was written, and portability settings in the config file may no longer be necessary, in which case the config file cannot be used to make a valid submission. Even if all the portability settings are correct for your system, the optimization settings are unlikely to be ideal for it.

Since these example config files are risky to use, runspec will print a warning that you may be using a stale config-file, and pause briefly to give you a chance to stop it from proceeding:

=============================================================================
Warning:  You appear to be using one of the config files that is supplied
with the SPEC MPI2007 distribution.  This can be a fine way to get started.

Each config file was developed for a specific combination of compiler / OS /
hardware.  If your platform uses different versions of the software or
hardware listed, or operates in a different mode (e.g. 32- vs. 64-bit mode),
there is the possibility that this configuration file may not work as-is. If
problems arise please see the technical support file at

  http://www.spec.org/mpi2007/Docs/techsupport.html

A more recent config file for your platform may be among result submissions at

  http://www.spec.org/mpi2007/ 

Generally, issues with compilation should be directed to the compiler vendor.
You can find hints about how to debug problems by looking at the section on
"Troubleshooting" in
  http://www.spec.org/mpi2007/Docs/config.html

This warning will go away if you rename your config file to something other
than one of the names of the presupplied config files.

==================== The run will continue in 30 seconds ====================

Usage.Invocation.04: How do I write a config file?

If one is available, start with a config file used with the same compiler and OS in a successful submission. Read the config-file document to understand line-by-line how it works. Then read the compiler, library and OS documentation to see what each setting does, and compare this with the run rules document to make sure that the arrangement is allowed.

Once you understand these details, you will be able to modify the config file so that it (a) works and (b) is compliant with the SPEC MPI2007 submission rules.
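As a very rough illustration of the overall shape, a config file has a header section that describes the system under test, followed by named sections that set compiler and flag variables. The values below (compiler names, optimization flags, and the submit command) are generic placeholder assumptions rather than recommendations; config.html defines the actual syntax and the full set of fields:

    # Header: fields describing the test and the system under test
    license_num  = 9999                  # your SPEC license number
    test_sponsor = Example Company
    tester       = Example Company

    # How the tools launch an MPI program; $ranks and $command are substituted at run time
    submit = mpirun -np $ranks $command

    # Compilers used for every benchmark
    default=default=default=default:
    CC  = mpicc
    CXX = mpicxx
    FC  = mpif90

    # Base optimization: one set of flags per language, applied to all benchmarks
    default=base=default=default:
    COPTIMIZE   = -O2
    CXXOPTIMIZE = -O2
    FOPTIMIZE   = -O2

    # Per-benchmark portability sections are added here only when needed
    # (see "Portability Flags" under "Building the executables", below)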

Usage.Invocation.05: Is there a config file for Visual C++?

Users of Microsoft Visual C++ should note that there are two sample config files that can be used as a starting point: one for Intel with Visual C++, the other for PGI with Visual C++. These starting points can be found in config\example-hp-win-vs_intel-hpmpi.cfg and config\example-hp-win-vs_pgi-hpmpi.cfg.

Usage.Invocation.06: When I say runspec, why does it say "Can't locate strict.pm"?
For example:

D:\mpi2007>runspec --help
Can't locate strict.pm in @INC (@INC contains: .) at
D:\mpi2007\bin\runspec line 62.
BEGIN failed--compilation aborted at D:\mpi2007\bin\runspec line 62.

You can't use runspec if its path is not set correctly. On Unix, Linux, or Mac OS X, you should source shrc or cshrc, as described in runspec.html section 2.4. For Windows, please edit shrc.bat and make the adjustments described in the comments. Then, execute that file, as described in runspec.html section 2.5.

Usage.Invocation.07: Why am I getting messages about "specperl: bad interpreter: No such file or directory"?
For example:

bash: /mpi2007newdir/bin/runspec: /mpi2007/bin/specperl: bad interpreter: No such file or directory

Did you move the directory where runspec was installed? If so, you can probably put everything to rights, just by going to the new top of the directory tree and typing "bin/relocate".

For example, the following unwise sequence of events is repaired after completion of the final line.

Top of SPEC benchmark tree is '/mpi2007'
Everything looks okay.  cd to /mpi2007,
source the shrc file and have at it!
$ cd /mpi2007
$ . ./shrc
$ cd ..
$ mv mpi2007 mpi2007newdir
$ runspec -h | head
bash: runspec: command not found
$ cd mpi2007newdir/
$ . ./shrc
$ runspec --help | head
bash: /mpi2007newdir/bin/runspec: /mpi2007/bin/specperl: bad interpreter: No such file or directory
$ bin/relocate

Building the executables

Usage.Building.01: How long does it take to compile the benchmarks?

Compile times can vary markedly between compilers offering different degrees of optimization. Likewise, the choice of compiler flags will affect compile times. The reference clusters for the medium and large benchmark suites each took about 1 hour to build the base versions of the benchmark executables.

Usage.Building.02: How much memory do I need to compile the benchmarks?

Likewise, the memory required for compilation can vary markedly with the compiler and the degree of optimization that is applied.

Usage.Building.03: Can I build the benchmarks as either 32- or 64- bit binaries?

For the Medium input set, we expect the individual ranks to run within 2GB of memory per rank, given at least 4 ranks. On some systems, this is sufficient to fit in a 32-bit address space so the binaries can be compiled as 32-bit. Note that on some of the benchmarks, as you increase the number of ranks, the memory requirement decreases in each of the ranks, so you may be able to run a benchmark as a 32-bit binary by adding more nodes to the system.

You can choose to compile your peak binaries as 32- or 64-bit individually for each benchmark. For base you will need to compile all benchmarks of the same language the same way, unless you can demonstrate that this is not possible. In that case 64-bit must be chosen as the default, and benchmarks that cannot be run as 64-bit binaries can be compiled as 32-bit.
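For example, with a compiler that accepts GCC-style -m64/-m32 options (your compiler may spell these differently), a config file fragment along the following lines builds everything as 64-bit for base while trying a 32-bit build of one benchmark in peak:

    # Base: all benchmarks of a given language built the same way (64-bit here)
    default=base=default=default:
    COPTIMIZE = -O2 -m64
    FOPTIMIZE = -O2 -m64

    # Peak: the address size may be chosen benchmark by benchmark
    104.milc=peak=default=default:
    COPTIMIZE = -O2 -m32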

Usage.Building.04: I'm using Windows and I can't build any benchmarks. What does "CreateProcess((null), ifort ...) failed" mean?
For example:

Error with make 'specmake build': check file
'D:\mpi2007\benchspec\MPI2007\121.pop2\run\build_base_mpi2007.win32.fast.exe.0000\make.err'
  Error with make!
*** Error building 121.pop2
-----
make.err:
process_begin: CreateProcess((null), ifort -c -Foblock_solver.obj -Ox block_solver.f, ...) failed.
make (e=2): The system cannot find the file specified.

specmake: *** [block_solver.obj] Error 2     

This CreateProcess failure occurs on Windows when specmake cannot find your compiler. (Recall from system-requirements.html that the benchmarks are supplied in source code form, and that they must be compiled.)

The reason that it cannot find your compiler is, most likely, because:

  1. You have not edited shrc.bat to correctly reference your compiler, or
  2. You have used a config file that references a compiler that does not exist.

To fix your problem, investigate and address both items.

  1. In shrc.bat, you need to either:

    1. Reference a vendor-supplied file that sets the path.

      The supplied shrc.bat mentions sample .bat files, such as c:\program files\microsoft visual studio .NET 2003\Vc7\Bin\vcvars32.bat (Visual C++) and c:\program files\Intel\Compiler\C++\9.1\IA32\Bin\iclvars.bat (Intel C++), but other compilers also sometimes provide similar .bat files. If you want to use this option, but xxvars.bat isn't where it is shown in the example in shrc.bat, you might try searching your hard drive for *vars*bat.

      As of August, 2006 the following was observed working for Visual C++ (Express edition), but your paths may be different depending on version and installation options chosen:
      call "c:\program files\microsoft visual studio 8\Common7\Tools\vsvars32.bat"
      Notice above that the reference in this case is to vsvars32.bat, not vcvars32.bat.

    2. Or, edit the path yourself.

      Notice the PGI example in the install guide. If you want to use this option, but you can't figure out what path to set, try looking in the documentation for your compiler under topics such as "setting the path", or "command line", or "environment variables". The documentation should mention whether any other environment variables (such as LIB or INCLUDE) are required, in addition to PATH.

  2. You must also use a config file that is appropriate for your compiler.

    Microsoft C++ users: please note that the config file example-hp-win-em64t-icl10-hpmpi.cfg isn't really appropriate for you, because the Intel compiler has additional switches that the Microsoft compiler does not recognize, and it spells the compiler name differently ("icl" vs. "cl"). If you use the HP C++ config file as a starting point, read the comments in the config file, and be prepared to make adjustments.

Usage.Building.05: The file make.clean.err is mentioned, but it does not exist. Why not?
The tools print a message such as:
Building 104.milc mref base mpi2007.win32.fast.exe default:
(build_base_mpi2007.win32.fast.exe.0000)
Error with make.clean 'specmake clean': check file
'D:\mpi2007\benchspec\MPI2007\104.milc\run\build_base_mpi2007fast.exe.0000/make.clean.err'
*** Error building 104.milc
If you wish to ignore this error, please use '-I' or ignore errors.
But on investigation, the file make.clean.err does not exist. Why not?

A missing .err file has been reported on Microsoft Windows if a path from shrc.bat uses quoted strings that include semicolons within the quotes, such as:

set PATH="%PATH%;d:\My Compiler Path\bin"  <--- wrong
set PATH="%PATH%;d:\mycompiler\bin"        <--- wrong

Paths such as the above may appear to work if you enter them at the command line. But when the tools call CreateProcess, they cause mysterious-seeming failures. The correct way to do it is:

set PATH=%PATH%;"d:\My Compiler Path\bin" 

or

set PATH=%PATH%;d:\mycompiler\bin     

Notice that in both cases, there are no semicolons inside of quotes.

Usage.Building.06: Why is it rebuilding the benchmarks?

You changed something, and the tools thought that it might affect the generated binaries. See the discussion of unknown options in config.html.

Usage.Building.07: Building one of the benchmark fails. What should I do?

The building of a benchmark is done using a version of the make tool adapted by SPEC. This tool is called specmake. If the building of a benchmark fails, you can list the actual commands performed by specmake; this can help you find the reason for the build failure. How to do this is described in the utility.html document, in the section describing specmake.
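For example, assuming a build directory like the ones shown elsewhere in this document, a sequence along these lines displays the commands without executing them (see utility.html for the full description of specmake):

cd $SPEC/benchspec/MPI2007/121.pop2/run/build_base_*.0000   # an existing build directory for the failing benchmark
specmake -n                                                 # print the compile and link commands without running them
specmake                                                    # or re-run the build by hand to see the error directly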

Usage.Building.08: Does the arrangement of compilation flags matter?

Typically, when you invoke a compiler, you may specify a sequence of optional flags on the command line that control resource usage, the degree of optimization, and other details. In the MPI2007 config file, sequences of these flags are assigned to specific variables

    FPORTABILITY  = -fixed
    COPTIMIZE     = -O5
    EXTRA_LDFLAGS = -bmaxdata:0x40000000 
	

which apply the flags, in turn, to the appropriate stage of the build process. It matters which variable each flag is assigned to, as some flags are passed to the pre-processor, others to the compiler, and still others to the linker. Further, for a particular variable, the order of the flags can matter, as shown in these examples:

Example from SUN compiler:

(a)   COPTIMIZE = -fast -xO3

(b)   COPTIMIZE = -xO3 -fast
	

The -fast flag is an aggregate which is equivalent to a sequence of other flags, one of which is -xO5. In case (a), the explicit -xO3 will override the -xO5 that is implied by -fast. In case (b), since -fast comes after the -xO3, the implicit -xO5 will override the explicit -xO3 so the -xO3 is ignored.

Example from Intel compiler (version 11.0):

(a)   COPTIMIZE = -fast -xSSE3

(b)   COPTIMIZE = -xSSE3 -fast
	

The -fast aggregate includes an implicit -xHost flag; in case (a) this is overridden by the explicit -xSSE3 flag, and in case (b) the implicit -xHost overrides the -xSSE3 flag.

Run Rule 2.3.2.1 requires that all flags must be applied in the same order for all compiles of a given language. This way, for base compiles, the flags have the same effect on all the benchmarks of the same language, regardless of how the flags interact with each other.

To understand the application of flags in the MPI2007 build process, refer to the config file document. To understand the effects of flags on the compilation, you will need to refer to the documentation for the particular compiler, and keep in mind that the set of available flags, their effects, and the precedence between them may vary between releases of the same compiler.

Usage.Building.09: What is a "Portability Flag"? How does it differ from an "Optimization Flag"?

Run & Reporting Rule 2.2.4 specifies the nature and usage of Portability Flags. This is a fundamental concept in MPI2007 and other SPEC benchmark suites. For benchmark runs at the base optimization level, all files in the same language are required to be compiled with the same optimization flags. It may not be the case, though, that all the benchmark sources will compile with the same flags, and special Portability Flags may need to be applied benchmark-by-benchmark. These break down into the following general types:

  1. Variations in the language level/dialect. An example is -qfixed specifying that Fortran sources are in fixed-column format.
  2. Ambiguity in the language standard. An example is -qchars=signed specifying that 8-bit integers be treated as signed quantities; the C language standard allows the compiler to implement them as signed or unsigned. Note that the source code would be more portable if the data declarations had been written as signed char.
  3. Linkage conventions. Some of the benchmark sources contain #ifdef statements for settings like NOUNDERSCORE or SPEC_MPI_CASE_FLAG, indicating how the names of compiled functions are mangled in the linkage tables of the object files. Note that the mangling convention is implementation-dependent, and lies outside the C/C++ and Fortran language standard specifications; #ifdef statements have been added to the source codes to accommodate all the conventions we have run into.
  4. Library compatibility. An example is SPEC_HPG_MPI2 indicating the level of the MPI implementation.
  5. Operating System compatibility. Examples are SPEC_MPI_LINUX and SPEC_MPI_NO_NCCONFIG_SOLARIS.
  6. Architectural compatibility. Examples are SPEC_MPI_WORDS_LITTLEENDIAN indicating that the system is a little-endian machine and cray2 indicating that this is a Cray-2 machine.

Note that these are not optimizations -- you need them because, otherwise, the benchmark will not compile, link, or run on your system configuration. Further, flags that enable other optimizations cannot be used as Portability Flags. For example, you may find that, for the base compile, one of the benchmarks fails if you apply a high degree of optimization to all of them. A flag like -qstrict, which limits the effects of the other optimizations, might make the failing benchmark work, but -qstrict must be treated as an Optimization Flag and applied to all the other benchmarks of the same language.

The SPEC/HPG committee needs to approve the use of any Portability Flags in your submittable runs. You will need to demonstrate that (a) they are necessary to run the benchmark and (b) they do not affect performance; they need to be identified specially in the submitter's XML flags file and are usually assigned to the CPORTABILITY, CXXPORTABILITY or FPORTABILITY variables in your config file. Sample configuration files for many system architectures are contained in the SPEC MPI2007 directory tree; they include the Portability Flags for those architectures. Note that the set of necessary Portability Flags can change with different compiler or OS levels, and can change between versions of the MPI2007 benchmark suite as well.
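As a purely illustrative config-file sketch (the benchmark chosen and the flags shown are hypothetical examples drawn from the list above, not recommendations for any particular system), Portability Flags are typically scoped to an individual benchmark while the Optimization Flags stay global:

    default=default=default=default:
    COPTIMIZE     = -O3
    FOPTIMIZE     = -O3

    127.wrf2=default=default=default:
    CPORTABILITY  = -DSPEC_MPI_CASE_FLAG -DSPEC_MPI_LINUX
    FPORTABILITY  = -qfixed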

Usage.Building.10: How are compilation flags managed for the combined-language benchmarks?

Run & Reporting Rule 2.3.2.1 explains the conventions for building mixed-language benchmarks. For each benchmark source language (C, C++ and Fortran), the config file conventions provide separate sets of variables for compiling and linking the executables:

	  CPORTABILITY =               CXXPORTABILITY =                FPORTABILITY =
	  COPTIMIZE    =               CXXOPTIMIZE    =                FOPTIMIZE    =
	  LDCFLAGS     =               LDCXXFLAGS     =                LDFFLAGS     =
	  EXTRA_CFLAGS =               EXTRA_CXXFLAGS =                EXTRA_FFLAGS =
	

In the combined-language benchmarks, the C variables are used to control the compilation of the C source files and the Fortran variables are used to control the compilation of the Fortran source files. Since Fortran is the primary language in each of the combined-language benchmarks of MPI2007 -- the C files contain support routines -- the Fortran rules are used to control the linkage stage. For example,

	default=base=default=default:
	FOPTIMIZE = -O4 -qstrict -qalias=nostd -qhot=level=0 -qsave -bdatapsize:64K -bstackpsize:64K -btextpsize:64K
	COPTIMIZE = -O5 -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K
	

causes the C source files to be compiled with -O5 -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K, while -O4 -qstrict -qalias=nostd -qhot=level=0 -qsave -bdatapsize:64K -bstackpsize:64K -btextpsize:64K is used to compile the Fortran source files and link the executable. The SPEC MPI2007 reports can be confusing because they list the flags as if they were merged and applied together, stating in this case

      Benchmarks using both Fortran and C:

          -O5 -D_ILS_MACROS -bdatapsize:64K -bstackpsize:64K -btextpsize:64K
		  -O4 -qstrict -qalias=nostd -qhot=level=0 -qsave
	

Note that if the same sequence of flags is assigned to both the COPTIMIZE and FOPTIMIZE variables, the combined listing will read correctly, since duplicate occurrences of the same flag are not shown. When both sequences are identical, you can use a single assignment

	default=base=default=default:
	OPTIMIZE =
	LDFLAGS  =
	

instead, and it will apply to both languages.
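For instance, reusing flags that already appear in the examples above (purely for illustration, not a recommended setting):

    default=base=default=default:
    OPTIMIZE = -O5 -bdatapsize:64K -bstackpsize:64K -btextpsize:64K
    LDFLAGS  = -bmaxdata:0x40000000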

Usage.Building.11: How can I build different .o files using different flags?

The SPEC tools will apply the same set of compilation flags to each source file of a given language, consistent with the MPI2007 Run Rules. For the purpose of debugging, however, you may need to circumvent this and compile specific .o files with flags different from the ones inherited from the config-file. Note that this cannot be done in a reportable run, so be sure not to use the "dirty" executables beyond your debugging effort.

  1. You need to edit the benchspec/Makefile.defaults to override the default rules inherited from the config file contents. Save off a copy of the file before you do this.
  2. Add an explicit rule for each .o file being special-cased:
    %$(OBJ): %.cpp
    	$(CXXC) $(CXXOBJOPT) $(FINAL_CXXFLAGS) $<
    ifdef NEEDATFILE
    	$(ECHO) $@ >> objectnames
    endif
    
    angle.o: angle.cpp
    	$(CXXC) $(CXXOBJOPT) $(OPTIMIZE) -DSPEC_MPI -DNDEBUG -DFFT_NONE -O5 -qipa=noobject -qipa=threads -q64 $(PORTABILITY) $(CXXPORTABILITY) $<
    ifdef NEEDATFILE
    	$(ECHO) $@ >> objectnames
    endif
    
    The first form of the rule applies generically to all files with the .cpp extension; the second form applies specifically to the angle.cpp source file and its corresponding .o file. The compilation rule is an "unrolling" of the FINAL_CXXFLAGS and CXXOPTIMIZE variables, which are composed from the optimization and portability flags from the config file and the portability flags from the benchspec/MPI2007/<benchmark>/Spec/object.pm file. Note that different benchmarks may use the same file names, e.g. main.f, so you should only debug one benchmark at a time with this method.
  3. The runspec tool automatically rebuilds binaries when it detects changes to the compilation rules in the config file. It does no such checking with the source files or the SPEC tools themselves. You will need to delete the executable in benchspec/MPI2007/<benchmark>/exe in order to force the rebuild to take place.
  4. Now use runspec to build the executable with the Makefile.defaults overrides, then run it and validate the results, as sketched below.
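A sketch of steps 3 and 4, assuming a Unix-like shell (the benchmark and config-file names are illustrative, and --action=build is the runspec option for building without running, as described in the runspec documentation):

    rm benchspec/MPI2007/126.lammps/exe/*                  # step 3: delete the old executable
    runspec --config myconfig --action=build 126.lammps    # step 4: rebuild using the Makefile.defaults overrides
    runspec --config myconfig --size mtest 126.lammps      # then run and validate with a small workload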

Setting up the run directories

Usage.Setup.01: How much file system space do I need?

For the medium benchmark suite, 10GB of free disk space should be enough to contain the SPEC tree, your binaries, and all the intermediate files from the compile and run. For the large benchmark suite, you will need 17GB of free disk space on a Big Endian system and 21GB on a Little Endian system.

If you build more than one set of binaries, you will need additional space for them and any runs you make with them. See the system-requirements.html document for the detailed requirements.

Usage.Setup.02: Why does the large data set suite need more space on a Little Endian system?

The 147.l2wrf2 benchmark uses 4GB of input data which has to be coded differently for Big Endian and Little Endian systems. To reduce the size of the MPI2007 delivery DVD and tarball, SPEC HPG is only including the Big Endian input files. The Little Endian input files are generated when the -DSPEC_MPI_WORDS_LITTLEENDIAN flag is set. Note that the time to generate the Little Endian input files is not counted in the benchmark run time.

Usage.Setup.03: Do I need a shared file system?

The MPI2007 run rules require that the benchmarks be run inside a single file system. When running within a single OS partition on an SMP, nearly any file system will do. When running across a cluster, real or virtual, this single file system will need to be shared across all the nodes or partitions.

Usage.Setup.04: What does "hash doesn't match after copy" mean? I got this strange, difficult to reproduce message:
hash doesn't match after copy ... in copy_file (1 try total)!  Sleeping 2 seconds...
followed by several more tries and sleeps. Why?

During benchmark setup, certain files are checked against what they are expected to contain. If a file does not match, you might see this message.

If the condition persists, try turning up the verbosity level. Look at the files with other tools; do they exist? Can you see differences? Try a different disk and controller. And, check for the specific instance of this message described in the next question.
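For example (an illustrative invocation; the exact verbosity options are described with the runspec utility), you might repeat the failing setup at a high verbosity level and capture the output for inspection:

    runspec --config myconfig --size mtest --verbose 99 121.pop2 2>&1 | tee setup.log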

Usage.Setup.05: Why does it say "ERROR: Copying executable failed"? I got this strange, difficult to reproduce message:
ERROR: Copying executable to run directory FAILED
or
ERROR: Copying executable from build dir to exe dir FAILED!
along with the bit about hashes not matching from the previous question. Why?

Perhaps you have attempted to build the same benchmark twice in two simultaneous jobs.

On most operating systems, the SPEC tools don't mind concurrent jobs. They use your operating system's locking facilities to write the correct outputs to the correct files, even if you fire off many runspec commands at the same time.

But there's one case of simultaneous building that is difficult for the tools to defend against: please don't try to build the very same executable from two different jobs at the same time. Notice that if you say something like this:

$ tail myconfig.cfg
130.socorro=peak:
basepeak=yes
$ runspec --config myconfig --size mtest --tune base 130.socorro &
$ runspec --config myconfig --size mtest --tune peak 130.socorro &

then you are trying to build the same benchmark twice in two different jobs, because of the presence of basepeak=yes. Please don't try to do that.

Running the benchmarks

Usage.Running.01: How long does it take to run the SPEC MPI2007 benchmark suites on my platform?

This depends on the data set size, the compiler, and the machine that is running the benchmarks. The reference cluster for the medium suite was sold circa 2003 and is correspondingly slower than contemporary machines, so expect a 2-iteration base run of the medium workload to take less than 24 hours on a 16-core system.

Expect larger data set sizes to take longer to process than the medium sized data set on the same system.

Usage.Running.02: Why does this benchmark take so long to run?

Please understand that the suite has been designed to be useful for at least 5 years. Benchmarks that seem slow today probably will not seem slow at the end of life of the suite. There is a bit more on this topic in the questions about the relationship between MPI2007 and CPU2006 and OMP2001.

On the flipside -- if the benchmark is taking much longer than you think it should, check that the rank counts are being set correctly. If the rank count is not set, you may be running the benchmark as a single process, which won't work for some of the benchmarks and will make the others take an extraordinary amount of time to finish.

Usage.Running.03a: How much memory does it take to run the medium benchmarks?

For the Medium data set, the intention is that each benchmark in the suite be runnable with a minimum of (1) 16GB of memory for the whole system, or (2) 1GB of memory per rank, whichever is larger. If you run one rank per processor core, which is the usual case, the second requirement implies 1GB of memory per enabled core on the system. For example, a 32-rank run needs at least max(16GB, 32 x 1GB) = 32GB.

You may find that 1GB per rank is not sufficient on your system because of other demands on memory from the OS or libraries, or because the process images are fragmented in memory. We expect that 2GB of memory per rank, with the overriding 16GB system minimum, should be sufficient in all cases.

See the system-requirements.html document for a detailed list of individual resource requirements.

Usage.Running.03b: How much memory does it take to run the large benchmarks?

For the Large data set, the intention is that each benchmark in the suite be runnable with 2GB of memory per rank. If you run one rank per processor core, which is the usual case, this implies 2GB of memory per enabled core on the system.

A minimum of 64 ranks is required for the benchmarks to work, which implies a minimum memory requirement of 128GB, although more may be required on your system. On a cluster running only a few ranks per node, the 2GB per rank may not be sufficient because of other demands on memory from the OS or libraries, or because the process images are fragmented in memory.

See the system-requirements.html document for a detailed list of individual resource requirements.

Usage.Running.04a: How are the large and medium benchmark suites connected to the ltest/ltrain/lref and mtest/mtrain/mref data sets?

The MPI2007 large benchmarks are intended to exploit higher degrees of parallelism than the medium benchmarks, and their data sets are written to provide workloads for larger systems. In most cases the benchmark sources have been modified (such as 104.milc versus 142.dmilc), but there are other cases where they have not (such as 121.pop2). In either case, the data set name determines which workload is run:

runspec ... -i mref 104.milc         # "medium" run of 104.milc
runspec ... -i lref 142.dmilc        # "large"  run of 142.dmilc

runspec ... -i mref 121.pop2         # "medium" run of 121.pop2
runspec ... -i lref 121.pop2         # "large"  run of 121.pop2

The large suite provides the ltest and ltrain data sets for validation testing, analogous with the mtest and mtrain data sets provided with the medium suite:

runspec ... -i mtest  medium         # test  run of medium suite
runspec ... -i mtrain medium         # train run of medium suite
runspec ... -i mref   medium         # ref   run of medium suite

runspec ... -i ltest  large          # test  run of large suite
runspec ... -i ltrain large          # train run of large suite
runspec ... -i lref   large          # ref   run of large suite

Usage.Running.04b: Given that I stated medium or large for the suite, why do I also have to qualify mtest/mtrain/mref or ltest/ltrain/lref instead of just saying test, train or ref?

You may ask this question if you are used to other suites such as SPEC OMPM2001 and OMPL2001, where the benchmark names were distinct between the two suites. With MPI2007, benchmark names like 121.pop2 do not imply a data set size, so a form like

runspec -c ... -i ref ... 121.pop2

doesn't determine whether the medium (mref) or large (lref) input set is to be used. The tools require the exact specification in all cases.

Note that, if this apparent redundancy still violates your aesthetic sense, you can use config-file macros like the following

%ifdef %{INPUT}
%if '%{SIZE}' eq 'medium'
size = m%{INPUT}		# This expands to mtest/mtrain/mref.
%elif '%{SIZE}' eq 'large'
size = l%{INPUT}		# This expands to ltest/ltrain/lref.
%endif
%endif

to create command-line forms like

runspec --define SIZE=medium --define INPUT=ref   ...
runspec --define SIZE=large  --define INPUT=train ...

While this approach decouples the size of the suite from the size of the data set, it still isn't perfect: you still pass more fields than you think you should, and you can still put together illegal forms like

runspec --define SIZE=medium --define INPUT=ref 142.dmilc
runspec --define SIZE=large  --define INPUT=ref 104.milc

Usage.Running.04c: Why did I get the error message Benchmark does not support size ...?

Version 1.0 of MPI2007 contained a data set with three elements: test, train, and ref. The ref element was to be used in reportable runs and the test and train elements were to be used to test the binaries and the tools. Version 1.1 of MPI2007 renamed these elements to mtest, mtrain, and mref since they compose the medium sized data set. Version 2.0 of MPI2007 added the large sized data set with the corresponding ltest, ltrain, and lref elements.

With the current version of MPI2007, if you use any of the test/train/ref elements as a parameter to runspec, i.e.

runspec -c ... -i test  .... 104.milc
runspec -c ... -i train .... 104.milc
runspec -c ... -i ref   .... 104.milc

it will issue the corresponding error message

Benchmark '104.milc' does not support size 'test'
Benchmark '104.milc' does not support size 'train'
Benchmark '104.milc' does not support size 'ref'

indicating that there is no such input defined. You should be using mtest/mtrain/mref as the input designator for the medium benchmarks and ltest/ltrain/lref as the input designator for the large benchmarks. Note that you will see an analogous error if you use mtest/mtrain/mref with a large benchmark or ltest/ltrain/lref with a medium benchmark.

Usage.Running.04d: Why did I get the error message read_reftime: "/spec/mpi2007/benchspec/MPI2007/*/data/.../reftime" does not exist?

As with the question above, this is a result of your using the test/train/ref input instead of mtest/mtrain/mref or ltest/ltrain/lref, and the runspec tool is unable to find it. You will see analogous errors if you use mtest/mtrain/mref with a benchmark that only works with the large data sets (such as 125.RAxML) or you use ltest/ltrain/lref with a benchmark that only works with the medium data sets (such as 107.leslie3d).

Usage.Running.05: The running of one of the benchmarks fails. What should I do?

The running of a benchmark is done using a SPEC tool called specinvoke. When something goes wrong, it is often useful to try the commands by hand. Ask specinvoke what it did, by using the -n switch. This is described in the utility.html document, in the section describing specinvoke.
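For example (a sketch; the run-directory name is illustrative, and the -f compare.cmd form matches the invocation shown in the miscompare example later in this FAQ):

    cd benchspec/MPI2007/121.pop2/run/run_base_mref_oct09a.0000
    specinvoke -n                    # show the commands used to run the benchmark
    specinvoke -n -f compare.cmd     # show the commands used to validate the output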

Usage.Running.06: Why did 126.lammps fail?

The SPEC/HPG committee is aware of a bug with 126.lammps, likely within the MPI middleware or C run-time library, or the application itself. The symptom of this behavior is an entry like

    ERROR on proc 52: Failed to reallocate 720000 bytes for array atom:x 
	

in one of the output files

	friction.out.big
	   chain.out.big
	   chute.out.big
	     eam.out.big
	      lj.out.big
	largeeam.out.big
	

This error message is followed by an MPI-specific error message which varies between MPI implementations.

The allocation failure happens in spite of the machine having sufficient free memory, indicating that this is a software bug. The SPEC High Performance Group has tested the application extensively with a memory allocation consistency checker, without detecting any such erroneous application behavior, suggesting that the inconsistency is contained within the MPI middleware or C run-time library rather than the application code. The failure might also depend on interactions between the compiler and the libraries.

The failure has only manifested itself using MPI implementations or C run-time libraries which include ptmalloc, a dynamic memory allocation library, and appears to go away with the current version, ptmalloc3. The earlier ptmalloc2 library -- possibly in combination with certain parameter settings from the run-time environment -- is most likely the source of this allocation failure.

Since we have a high degree of confidence that the failure lies outside the application sources and toolset distributed by SPEC, we advise any user experiencing this failure to report it to the MPI vendor in question or to the community that maintains the MPI implementation, whichever is appropriate.

Usage.Running.07: Why do I see large changes in runtime for 122.tachyon?

An instability in the performance of this benchmark has been observed for core counts of 128 and above when older (before 1.3.1) versions of OpenMPI are used. Results are bi-modal, differing (randomly) by a factor of two from run to run. Upgrading to a recent version of OpenMPI removes this problem.

Usage.Running.08: How can I run a profiler, performance monitor, etc. in conjunction with a benchmark?

The config files allow declarations for invoking profilers and monitors etc. with the benchmarks. One method is with the submit directive:

submit = mpich $command

which can be rewritten to invoke a wrapper program:

submit = mpich run_profiler $command

The wrapper program starts the profiler or monitor, and then runs the $command. Note that the stdout from the $command must be captured and passed back as the output of the run_profiler wrapper.
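A minimal sketch of such a wrapper for a Unix-like shell, where start_profiler and stop_profiler are placeholders for whatever commands control your profiler (they are not real tools):

    #!/bin/sh
    # run_profiler: start the monitor, run the benchmark command that the SPEC
    # tools append as arguments, and pass the benchmark's stdout straight through.
    start_profiler                  # placeholder: start your profiler or monitor
    "$@"                            # run the benchmark command; its stdout becomes ours
    status=$?
    stop_profiler                   # placeholder: stop your profiler or monitor
    exit $status                    # report the benchmark's exit status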

A second form of declaration is

monitor_pre_bench  = run_profiler
submit             = mpich $command
monitor_post_bench = end_profiler

which invokes the profiler before each benchmark is executed, then stops it after the benchmark completes. These directives are described in the config file document.

Usage.Running.09a: The MPI-2 Standard recommends that I use mpiexec to run each of the benchmarks. If I start the MPD process manager at the beginning, mpiexec ought to be more efficient than mpirun for running each benchmark. Why can't I do this?

Implementations derived from MPICH2, like the Intel MPI, can be configured with an MPI daemon that runs in the background. MPI jobs can be started with reduced overhead since some of the setup work has already been done when the daemon started. It would be used with an invocation form like

    mpdboot ... ; runspec ....
   

and the config-file would contain the form

    submit = mpiexec ... $command           # Intel
   

to run the benchmarks.

However, SPEC MPI2007 Run Rule 3.2.6 requires that the benchmark runtimes include the overhead of starting up the MPI environment, so if a daemon is invoked as part of the startup, it must be done as part of the startup of each benchmark. This can be arranged in the config-file using forms like

    submit = mpdboot ... ; mpiexec ... $command      # Causes the "mpdboot" overhead to be measured along with the benchmark.
    submit = mpirun ... $command                     # More compact & concise form.
   

instead of being done before the benchmark times are measured, i.e.

    mpdboot ... ; runspec ...
   

Note that other MPI implementations (MPICH, MPICH2, MVAPICH, OpenMPI, IBM POE, etc.) provide forms equivalent to mpirun.

Usage.Running.09b: I see published results that use the MPD and mpiexec. If I use the same configuration file as these, will my results be accepted?

Run Rule 3.2.6 was modified on August 12th of 2009, after 61 results had been published using mpiexec or equivalent constructs. For most of these results, subsequent measurements had shown that the runtime difference was not significant, so these results were allowed to stand. One pair of results was marked Non Compliant because the run time difference was too large, and a new pair of results using mpirun was reviewed and published to replace it. No new results will be accepted that use this convention. You may base your configuration file on the updated results, but see the caveats on using old configuration files.

Validating a Run

Usage.Validation.01: What is the difference between a valid and a reportable run?

The SPEC tools consider a run to be invalid if it does not satisfy all the following criteria:

  1. All the benchmark runs produce valid results.
  2. The mref or lref data set is used. (Potential future releases of MPI2007 may include larger data sets such as xref.)
  3. The base optimization level is tested. The peak optimization level is optional.
  4. At least two iterations are run.
  5. Certain flags like check_md5 and reportable have the correct settings (an illustrative config-file fragment follows this list).
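An illustrative config-file fragment touching on these criteria (the values shown are examples consistent with the list above, not required settings):

    reportable = yes        # enable the tools' reportable-run checks
    check_md5  = yes        # verify that the binaries match the config file that built them
    iterations = 3          # at least two iterations are required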

The output report will state "invalid run" if any of these criteria fail to hold. A report that doesn't state this, though, may still contain violations of the SPEC MPI2007 Run and Reporting Rules, and isn't guaranteed to pass the SPEC/HPG committee review. The SPEC tools cannot determine that a portability flag is really required, for example.

A report that passes SPEC/HPG review must also give a complete and accurate description of the system under test. Reports are typically submitted by the company that developed the system under test, so the submitter is typically in the best position to complete the details of the report. The committee, on the other hand, has developed the conventions for how this information is to be stated, and may require edits to the report before it will be published.

Usage.Validation.02: I got a message about a miscompare. The tools said something like:
Running Benchmarks
  Running 121.pop2 mref (ref) base oct09a default
Error with '/spec/mpi2007/bin/specinvoke -E -d /spec/mpi2007/benchspec/MPI2007/121.pop2/run/run_base_mref_oct09a.0000
                                         -c 1 -e compare.err -o compare.stdout -f compare.cmd'
          : check file '/spec/mpi2007/benchspec/MPI2007/121.pop2/run/run_base_mref_oct09a.0000/.err'
*** Miscompare of pop2.out, see /spec/mpi2007/benchspec/MPI2007/121.pop2/run/run_base_mref_oct09a.0000/pop2.out.mis
Error: 1x121.pop2
Producing Reports
mach: default
  ext: oct09a
    size: mref (ref)
      set: medium

Why did it say that? What's the problem?

We don't know. Many things can cause a benchmark to miscompare, so we really can't tell you exactly what's wrong based only on the fact that a miscompare occurred.

But don't panic.

Please notice, if you read the message carefully, that there's a suggestion of a very specific file to look in. It may be a little hard to read if you have a narrow terminal window, as in the example above, but if you look carefully you'll see that it says:

*** Miscompare of pop2.out, see 
/spec/mpi2007/benchspec/MPI2007/121.pop2/run/run_base_mref_oct09a.0000/pop2.out.mis

Now's the time to look inside that file. Simply doing so may provide a clue as to the nature of your problem.

On Unix systems, change your current directory to the run directory using the path mentioned in the message, for example:

cd /spec/mpi2007/benchspec/MPI2007/121.pop2/run/run_base_mref_oct09a.0000/

On Microsoft Windows systems, remember to turn the slashes backwards in your cd command.

Then, have a look at the file that was mentioned, using your favorite text editor. If the file does not exist, then check your paths, and check to see whether you have run out of disk space.

You can ask for more detail about the miscompare to be listed in the file. How to do this is described in the utility.html document, in the section on specdiff.

Usage.Validation.03: How were the result tolerances chosen?

In a validating run, the SPEC tools compare the program results against a reference output, and the results are valid if they differ by no more than a specified margin of tolerance. The tolerances themselves are the smallest margins that permitted a "pass" on all the test platforms.

Because of communication race conditions between processes in the MPI2007 applications, they behave less deterministically than the single-process applications of CPU2006 or the threaded applications of OMP2001. Consequently, the margin of tolerance for the MPI2007 applications is wider than it is for CPU2006 or OMP2001.

Usage.Validation.04: The benchmark ran, but it took less than 1 second and there was a miscompare. Help!

If the benchmark took less than 1 second to execute, it didn't execute properly. There should be one or more .err files in the run directory which will contain some clues about why the benchmark failed to run. Common causes include libraries that were used for compilation but not available during the run, executables that crash with access violations or other exceptions, and permissions problems. See also the suggestions in the next question.

Usage.Validation.05: I looked in the .mis file and it said something like:

'rand.234923.out' short

What does short mean?

If a line like the above is the only line in the .mis file, it means that the benchmark failed to produce any output. In this case, the corresponding error file (look for files with .err extensions in the run directory) may have a clue. In the example above, it was Segmentation Fault - core dumped. For problems like this, the first things to examine are the MPI parameters, followed by the portability flags used to build the benchmark.

Have a look at the sample config files in $SPEC/config or, on Windows, %SPEC%\config. If you constructed your own config file based on one of those, maybe you picked a starting point that was not really appropriate (e.g. you picked a 32-bit config file but are using 64-bit compilation options). Have a look at other samples in that directory. Check at www.spec.org/mpi2007 to see if there have been any result submissions using the platform that you are trying to test. If so, compare your portability flags to the ones in the config files for those results.

If the portability flags are okay, your compiler may be generating bad code.

Usage.Validation.06: My compiler is generating bad code! Help!

Try reducing the optimization that the compiler is doing. Instructions for doing this will vary from compiler to compiler, so it's best to ask your compiler vendor for advice if you can't figure out how to do it for yourself.

Usage.Validation.07: My compiler is generating bad code with low or no optimization! Help!

If you're using a beta compiler, try dropping down to the last released version, or get a newer copy of the beta. If you're using a version of GCC that shipped with your OS, you may want to try getting a "vanilla" (no patches) version and building it yourself.

Usage.Validation.08: I looked in the .mis file and it was just full of a bunch of numbers.

In this case, the benchmark is probably running, but it's not generating answers that are within the tolerances set. See the suggestions for how to deal with compilers that generate bad code in the previous two questions. In particular, you might see if there is a way to encourage your compiler to be careful about optimization of floating point expressions.

MPI2007 Measurements

Metrics

Measurements.Metrics.01: What metrics can be measured?

After the benchmarks are run on the system under test (SUT), a ratio for each of them is calculated using the run time on the SUT and a SPEC-determined reference time. From these ratios, the following metrics are calculated: SPECmpiM_base2007 and SPECmpiM_peak2007 for the medium suite, and SPECmpiL_base2007 and SPECmpiL_peak2007 for the large suite.

Larger data sets may be added later on, with corresponding metrics.

In all cases, a higher score means better performance on the given workload.
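As a sketch with partly hypothetical numbers (the 1565-second reference time for 104.milc appears in the reference-time table later in this FAQ; the 600-second median run time is invented for illustration), the per-benchmark ratio and the overall metric are computed as:

    \[ \mathrm{ratio}_i = \frac{\mathrm{reftime}_i}{\mathrm{runtime}_i}, \qquad
       \mathrm{ratio}_{\text{104.milc}} = \frac{1565}{600} \approx 2.61, \qquad
       \text{metric} = \Bigl( \prod_{i=1}^{n} \mathrm{ratio}_i \Bigr)^{1/n} \]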

Measurements.Metrics.02: What is the difference between a base metric and a peak metric?

In order to provide comparisons across different computer hardware, SPEC provides the benchmarks as source code. Thus, in order to run the benchmarks, they must be compiled. There is agreement that the benchmarks should be compiled the way users compile programs. But how do users compile programs?

A wide range of usage models can be imagined, ranging in a continuum from -Odebug at the low end, to inserting directives and/or re-writing the source code at the high end. Which points on this continuum should SPEC MPI2007 allow?

SPEC recognizes that any point chosen from that continuum might seem arbitrary to those whose interests lie at a different point. Nevertheless, choices must be made.

For MPI2007, SPEC has chosen to allow two types of compilation:

  1. The base metric, for which all benchmarks of a given language must be compiled with the same flags, in the same order.
  2. The peak metric, for which different compilation flags (and other optimizations) may be used for each benchmark.

Note that options allowed under the base metric rules are a subset of those allowed under the peak metric rules. A legal base result is also legal under the peak rules but a legal peak result is NOT necessarily legal under the base rules.

A full description of the distinctions and required guidelines can be found in the SPEC MPI2007 Run and Reporting Rules.

Measurements.Metrics.03a: What are the different data-set sizes?

MPI2007 Version 2.0 includes six data sets:

           mtest       ltest
           mtrain      ltrain
           mref        lref

The ones beginning with m (mtest, mtrain and mref) for medium are used to benchmark medium-sized systems, and the ones beginning with l (ltest, ltrain and lref) for large are used to benchmark large-sized systems.

The ones ending in test and train (mtest, ltest, mtrain and ltrain) are used for the purpose of validating the correctness of the compile and the run environment. Note that the term train is a holdover from previous suites that allowed feedback-directed optimization using measurements from training runs. For the purposes of MPI2007 it simply refers to a larger testing data set.

The ones ending in ref (mref and lref) are the largest and are used in reportable runs. Since they run longer, they give a clearer picture of system performance. In the future they may be supplemented with an even larger xref data set.

Measurements.Metrics.03b: Why aren't performance metrics generated for the test and train data sets?

The mtest, ltest, mtrain and ltrain data sets are provided for the purpose of validating the correctness of the compile and the run environment. Running with these data sets will give you a report that includes the run time, but no performance metric is derived because these input sets are not intended for performance measurement. The MPI2007 source directories include reference run times for these data sets; these have to be included because of the way the report-generation tools are built.

Measurements.Metrics.03c: Is there a way to translate measurements between the test or train and ref data sets?

No. There is no formula for converting between test, train and ref measurements. The smaller mtest/mtrain and ltest/ltrain data sets would show different scaling behavior than the larger mref and lref data sets, and their smaller memory footprints would show less difference in performance between different cache sizes. We expect some correlation between them all, i.e. machines with higher results with one data set tend to have higher results with the other, but there are no universal conversion formulae for all systems.

Measurements.Metrics.04: Which SPEC MPI2007 metric should be used to compare performance?

On the tuning side, a base measurement is required in every reportable run. The corresponding peak measurement is optional, but is useful to show the highest performance your system can achieve.

On the data set side, the size of your machine may determine the choice for you: the mref data set may not work above 128 ranks, and the lref data set may not work below 64 ranks. Beyond that, you may want to compare your platform against platforms that already have published mref or lref results.

Note that the data sets used for validation -- mtest, ltest, mtrain and ltrain -- do not generate any performance metric.

Timing

Measurements.Timing.01: Why does SPEC use reference machines? What machines were used for SPEC MPI2007?

SPEC uses a reference machine to normalize the performance metrics used in the MPI2007 suites. Each benchmark is run and measured on this machine to establish a reference time for that benchmark. These times are then used in the SPEC calculations. Note that the medium and large data set benchmarks use two different reference machines:

medium data set: The reference machine is an 8-node cluster of Celestica A2210 (AMD Serenade) systems connected by a TCP (GigE) Interconnect. Each A2210 system contains two 940 sockets, each holding one single-core AMD Opteron 848 processor chip running at 2200 MHz with 1 MB I+D L2 cache, plus 4 GB of DDR 333 memory per socket. The MPI implementation is MPICH2 version 1.0.3 running on SLES 9 SP3 Linux OS with Pathscale 2.5 compilers. Measurements of the reference platform are posted on the SPEC Website. 16 ranks were used.

large data set: The reference machine is a 64-node cluster of Intel SR1560SF systems connected by a TCP (GigE) Interconnect (oversubscribed by about 7:1). Each SR1560SF system contains two CPU sockets, each holding one quad-core Intel Xeon X5482 processor chip running at 3200 MHz with 12 MB I+D L2 cache (6 MB shared per 2 cores), plus 16 GB of FBDIMM DDR2-667 memory per system. The MPI implementation is Intel MPI version 3.2 running on RedHat EL 5 Update 2 OS with Intel 11.1 compilers. 64 ranks were used.

Different reference machines are used because

  1. The large data set benchmarks are designed for larger clusters than the one used to evaluate the medium data set benchmarks.
  2. Node designs have progressed over the years, including faster CPU frequencies, larger caches and denser memory.
These machines differ dramatically from the ones used for SPEC/OMP and SPEC/CPU, since the MPI2007 suites represent a fundamentally different class of applications.

Note also that when comparing any two systems measured with MPI2007, their performance relative to each other would remain the same even if a different reference machine were used. This is a consequence of the mathematics involved in calculating the individual and overall (geometric mean) metrics.
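A sketch of that algebra (the notation is ours, not SPEC's): for two systems A and B with median run times t_{A,i} and t_{B,i} on benchmark i, and reference times ref_i,

    \[ \frac{\mathrm{score}_A}{\mathrm{score}_B}
       = \frac{\bigl( \prod_i \mathrm{ref}_i / t_{A,i} \bigr)^{1/n}}
              {\bigl( \prod_i \mathrm{ref}_i / t_{B,i} \bigr)^{1/n}}
       = \Bigl( \prod_i \frac{t_{B,i}}{t_{A,i}} \Bigr)^{1/n} \]

The reference times cancel, so the ratio between the two systems' scores does not depend on which reference machine was chosen.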

Measurements.Timing.02: How long does it take to run the SPEC MPI2007 benchmark suites on their reference platforms?

For the medium benchmark suite, the original reference cluster took about 1 hour to build the base versions of the executables and 12 hours to finish a single iteration of the mref workload. So a minimally rule-conforming run would take 24 hours (for 2 iterations) and the more common 3-iteration run would take 36 hours.

A more modern cluster was used to calibrate the large benchmark suite; it took about 30 minutes to build the base versions of the executables and 12 hours to finish a single iteration of the lref workload. So, again, a minimally rule-conforming run would take 24 hours (for 2 iterations) and the more common 3-iteration run would take 36 hours.

See the question above for details of the two reference clusters.

Measurements.Timing.03: The reports don't list the reference times. Where can I find them?

Right here:

                    Medium    Large
104.milc             1565      
107.leslie3d         5220
113.GemsFDTD         6308
115.fds4             1951
121.pop2             4128      3891
122.tachyon          2797      1944
125.RAxML                      2919
126.lammps           2915      2459
127.wrf2             7796
128.GAPgeofem        2065      5934
129.tera_tf          2768      1099
130.socorro          3817
132.zeusmp2          3103      2120
137.lu               3676      4202
142.dmilc                      3684
143.dleslie                    2924
145.lGemsFDTD                  4411
147.l2wrf2                     8204

You can also extract these using the command grep '[0-9]' benchspec/MPI2007/*/data/[ml]ref/reftime from the top of the MPI2007 directory-tree. The MPI2007 reports include the ranks run, the run time, and the SPEC score for each observation. All this is useful, but the reference times, which were listed in the CPU2000, OMP2001, and HPC2002 reports, are omitted to limit the size of the table.

Measurements.Timing.04: Why aren't the scores for the reference platform all 1's?

To derive the MPI2007 score, the reference time for each benchmark is divided by the median of its measured times, and the Geometric Mean of these quotients is the actual score. If you look at the scores for the medium data set reference platform you will see that they vary from 0.96 to 1.04, while you would expect them all to be 1.00 since the reference times are being divided by themselves. There are two reasons that this is not the case:

  1. The reference times are rounded to the closest integer for simplicity. More precise floating-point numbers could have been used to achieve quotients of 1.00, but this would imply an unrealistic degree of accuracy in the measurements.
  2. Measurements of the reference platform are gathered under reportable conditions, and are subject to the same run-to-run variations as with any other platform. An initial calibration measurement was used to derive the reference times used in the report-generation tools; then a second measurement, with the finished tools, was used to generate the posted report.

Measurements.Timing.05: I bought a machine from company XYZ. When I run the MPI2007 benchmark suite, it doesn't score as high as what was published.
What do I do?

SPEC member companies follow due diligence in preparing and reviewing the reports for publication, and these kinds of problems have not been an issue thus far. Note that SPEC/HPG allows a 5% tolerance in measured performance, to account for run-to-run variability. Measurements within this range are considered to be consistent. Measurements which are consistently below this range would indicate that something is different in the hardware or software configuration.

First, make sure that your system is configured the same as the one in the published report, and use the same config file as the one used in the report. If the results still don't match, notify company XYZ that the performance doesn't match, either through their Marketing department or through their SPEC/HPG representative. They may be able to correct the problem. If not, contact the SPEC/HPG committee at mpi2007support@spec.org.

In many cases it is not possible to exactly reproduce the configuration that was used to produce the published report: the size or type of the memory cards may be different, or the OS or compilers may be at different levels than were used in the published run. These kinds of differences could account for the gap in performance. SPEC/HPG would require the vendor to add further details to the published report so that it is valid for the configuration that was actually used.

Measurements.Timing.06: The machines we're shipping do not yield the same measurements as what we published. Is this a problem?

SPEC/HPG recognizes that changes from pre-production to GA hardware and software, or changes between levels of hardware or software releases, can cause degradations in performance as measured by the benchmarks. A general tolerance of 5% is allowed, to account for run-to-run variations, and a degradation within this 5% is acceptable even if it is due to specific causes.

Measurement discrepancies have not been an issue so far. SPEC member companies have been careful with their measurement and review procedures, and in at least one case have withdrawn and replaced, on their own initiative, results that had been internally determined to be irreproducible.

If you do not correct the results, you run the risk that disappointed customers will return your systems or that competitors' marketing literature will exploit these inconsistencies and damage your credibility. SPEC itself could publish comments on your claims or mark your published results as invalid.

Measurements.Timing.07: How was the 5% tolerance decided?

Benchmark timings cannot be exactly reproduced even between consecutive runs of the same binary with the same system settings. The reproducibility tolerance is expected to account for allowable run-to-run variability, and any degradation beyond this is likely to indicate deficiencies in the hardware or software.

Since MPI-parallel applications are composed of communicating asynchronous processes, they show higher run-to-run variability than do serial benchmarks. SPEC/HPG studied the benchmark runtimes on a variety of systems used in the MPI2007 internal acceptance tests and chose 5% as a tolerance that such systems can be expected to meet. As MPI2007 comes into wider use and is measured across a larger set of machines, the SPEC/HPG committee may widen this tolerance if that proves necessary.

If you are concerned about the stability of your measured results, you can improve it by running more iterations of the benchmark suite. Statistically, both the median measurements of the individual benchmarks and the geometric mean composite of the whole suite will show much lower variability than the individual benchmark measurements themselves. At least two iterations of benchmark runs are required in a reportable run, and increasing this iteration count will reduce the variability of the composite even further.
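For example (an illustrative invocation; check the runspec documentation for the exact option spellings on your installation), a reportable run of the medium suite with five iterations might be requested as:

    runspec --config myconfig --reportable --iterations 5 medium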

Reporting Run Results

Measurements.Reporting.01: Where are SPEC MPI2007 results available?

Results for measurements submitted to and approved by SPEC are posted at http://www.spec.org/mpi2007.

Measurements.Reporting.02: How do I submit my results?

The runspec utility leaves a file with a path of the form result/MPIM2007.*.rsf. If the run was reportable, you can mail this file (or attach it to a message) to submpi2007@spec.org. If your company is not a member of SPEC/HPG, you will need to make arrangements with the committee to have the results reviewed.

Measurements.Reporting.03: How do I edit a submitted report?

The SPEC members' website contains lists of reports that have been submitted or are under review. Once your company has joined SPEC, you will have access to these web pages. To edit a submitted report, download the .sub file for the submission, edit it, and mail it to resubmpi2007@spec.org. Note that the file will not be accepted if you edit anything below the line

# =============== do not edit below this point ===================

If your company is not a member of SPEC/HPG, you will need to make arrangements with the editor at editor@spec.org to receive a copy of the editable .sub file.

Measurements.Reporting.04: How do I determine the Hardware & Software availability dates? What about Firmware?

A SPEC MPI2007 result report is intended to contain enough information to be able to construct an approximation of the system and reproduce the result within the allowable margin of error. This margin of error gives some flexibility in the level of detail to which the system is reported -- for example, any reasonable choice of power cable is unlikely to affect the performance, nor is the arrangement of the external network, so these components are not reported at all. Some components, such as the disk and the memory DIMMs, are described using a few specifications (such as size and speed), rather than providing specific model names or numbers.

In the result reports, the Hardware Availability and Software Availability fields contain the earliest date at which all the necessary hardware and software components are available. So if some hardware components become available after others, the overall hardware availability date is the availability date of the latest component; likewise the overall software availability date is the availability date of the latest software component. The dates are set in the hw_avail and sw_avail fields of the initial config file or the final .rsf file from the run.

If components of your system are no longer available at the time of publication, SPEC/HPG does not necessarily prevent you from publishing the results. If an old component can be replaced by one that is currently available, or planned to become available within the 90-day window, and the system would not perform more than 5% worse using the newer component, you can report the system as using the new component instead. In the case where the main component (i.e. the processing node) is neither available nor has a replacement, the overall performance measurements are unlikely to be competitive with more current systems, in which case reproducing results will not be an issue.

Firmware availability dates are not reported nor are they rolled into either of the Hardware Availability or Software Availability dates.

Measurements.Reporting.05: How do I describe my Interconnect topology?

Ordinarily you use the following forms

interconnect_XXX_hw_topo             = single switch
spec.mpi2007.interconnect_XXX_hw_topo: single switch
   

to describe the Interconnect topology. The first form is what you would write in your config file and the second form is used in the header of the generated .rsf file. Note that if your config file does not contain the first form, the generated .rsf file may or may not contain the second form with an empty field, depending on whether the Interconnect had been declared; you will need to complete the field in the .rsf file yourself.

The topo field is used to describe the arrangement of the switches and cables. The internal arrangement of the switches is not to be described, unless it is reconfigurable. The topo field is assigned values like

single switch
two-level fat-tree
hypercube
3x4 torus
	

which are simple mathematical specifications of symmetric arrangements. In the case of an asymmetric or heterogeneous arrangement, or if you had to explicitly configure any switches beyond the default settings, you will need to explain the arrangement in the notes_interconnect_XXX section.

Measurements.Reporting.06: What do I do if a flag or library changes in the GA?

If you report a measurement using pre-GA software, some component may change by the time it reaches GA. If only the version number changes, you can edit the published reports to state the GA version number.

If a compilation flag or library name changes such that the form you had used is invalid, you can append a note to the published reports to state the correct form to use. Note that if a third party downloads the config file from your submission and runs it on a GA system, it will fail due to the invalid flag or library name. The information you provide should be enough for them to recognize the problem and apply the change.

If the semantics of a compilation flag or the behavior of a library changes in a way that the benchmarks still work but do not deliver the claimed level of performance, and some other choice of settings will achieve the stated level of performance, you need to repeat your measurement under these GA conditions and resubmit it to SPEC.

Measurements.Reporting.07: What's all this about "Submission Check -> FAILED" littering my log file and my screen?
At the end of my run, why did it print something like this?

format: Submission Check -> FAILED.  Found the following errors:
- The "hw_memory" field is invalid.
  It must contain leading digits, followed by a space, and a standard unit abbreviation.
  Acceptable abbreviations are KB, MB, GB, and TB.
  The current value is "20480 Megabytes".

A complete, reportable result has various descriptive fields filled in for the benefit of readers. These fields are listed in the table of contents for config.html. If you wish to submit a result to SPEC for publication at www.spec.org/mpi2007, these fields not only have to be filled in; they also have to follow certain formats. Although you are not required to submit your result to SPEC, for convenience the tools try to tell you which reporting details will need to be corrected. In the above example, the tools would stop complaining if the field hw_memory said something like "20 GB" instead of "20480 Megabytes".

Notice that you can repair minor formatting problems such as these without doing a re-run of your tests. You are allowed to edit the rawfile, as described in utility.html.
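For example, using the raw-file field syntax shown in the Interconnect topology question earlier in this FAQ, the offending field would be edited so that the line reads something like:

    spec.mpi2007.hw_memory: 20 GB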

Measurements.Reporting.08: What is a "flags file"? What does "Unknown Flags" mean in a report?

SPEC MPI2007 provides benchmarks in source code form, which are compiled under control of SPEC's toolset. Compilation flags (such as -O5 or -unroll) are detected and reported by the tools with the help of flag description files. Therefore, to do a complete run, you need to (1) point to an existing flags file (easy) or (2) modify an existing flags file (slightly harder) or (3) write one from scratch (definitely harder).

  1. Find an existing flags file by noticing the address of the .xml file at the bottom of any result published at www.spec.org/mpi2007. You can use the --flagsurl switch to point your own runspec command at that file, or you can reference it from your config file with the flagsurl option. For example:
       runspec --config=myconfig --flagsurl=http://www.spec.org/mpi2007/flags/sun-studio.xml medium
  2. You can download the .xml flags file referenced at the bottom of any published result at www.spec.org/mpi2007. Warning: clicking on the .xml link may just confuse your web browser; it's probably better to use whatever methods your browser provides to download a file without viewing it - for example, control-click in Safari, right click in Internet Explorer. Then, look at it with a text editor.
  3. You can write your own flags file by following the instructions in flag-description.html.

Notice that you do not need to re-run your tests if the only problem was Unknown Flags. You can just use runspec --rawformat --flagsurl, giving it your flags file and the raw file from the run (result/MPIM2007.*.rsf).

Measurements.Reporting.09: Can SPEC MPI2007 results be published outside of the SPEC web site? Do the rules still apply?

Yes, SPEC MPI2007 results can be freely published if all the run and reporting rules have been followed. The MPI2007 license agreement binds every purchaser of the suite to the run and reporting rules if results are quoted in public. A full disclosure of the details of a performance measurement must be provided on request.

SPEC strongly encourages that results be submitted for publication on SPEC's web site, since it ensures a peer review process and uniform presentation of all results.

The run and reporting rules for research and academic contexts recognize that it may not be practical to comply with the full set of rules in some contexts. It is always required, however, that non-compliant results be clearly distinguished from rule-compliant results.

Measurements.Reporting.10: Can I use "derived" metrics, such as cost/performance or flops?

The MPI2007 suite defines and calculates the speed metrics SPECmpiM_base2007, SPECmpiM_peak2007, SPECmpiL_base2007 and SPECmpiL_peak2007. These are the only MPI2007 metrics defined by SPEC. Any other metrics you derive from these are *not* endorsed by SPEC, and must not be referred to as SPEC metrics. For example, using the term "SPEC megaflops" is misleading, since SPEC does not define any such metric. This is a violation of the SPEC trademark.

Further, while the SPECmpiM_base2007, SPECmpiM_peak2007, SPECmpiL_base2007 and SPECmpiL_peak2007 measurements may have been reviewed and approved by SPEC, any calculations based on these measurements will not have passed the same approval process and should always be referred to as estimates. It is recommended that such information be limited to research reports, where the speculative nature of the comparisons is well understood, as opposed to marketing literature.

Measurements.Reporting.11: It's hard to cut/paste into my spreadsheet...

Files are available with the results formatted as comma-separated values, which are easy to copy and paste into spreadsheets. With published results, the .csv formatted file is listed next to the viewable formats like HTML and PDF. If you are viewing one of these formats in your browser, you can change the extension of the link address from .html or .pdf etc. to .csv to see it.

When running a new test, you can use the invocation form runspec --format=csv ... or add the entry output_format=csv ... to your config file. For a result you've just generated, you can use the command rawformat --format=csv MPIM2007.*.rsf to generate the corresponding .csv file.

 


Copyright © 2007-2010 Standard Performance Evaluation Corporation
All Rights Reserved