SPEC MPI2007: Changes in V1.1

Last updated: 2008/09/17 14:48:09
(To check for possible updates to this document, please see http://www.spec.org/mpi2007/Docs/ )

Introduction: Who Wants V1.1?

SPEC MPI2007 V1.1 is an incremental update to SPEC MPI2007 V1.0. Results generated with V1.1 are comparable to results from V1.0 and vice versa. V1.1 is intended to improve compatibility, stability, documentation and ease of use. Changes are intended to be useful to several kinds of users:

For users of new platforms:

Updates to 5 benchmarks, to improve portability and code correctness

For anyone who reads (or produces) a result:

Cleanup of graph formats
Improvements to report readability and several detailed changes to improve accuracy.
Clarifications and additions to the Run and Reporting Rules, including rules for disclosure of "user-built" systems, and comparison of estimates.

For researchers and developers:

Documentation of the monitor hooks
Easier investigation of alternate sources or workloads, via the convert_to_development utility.

For new users of the suite:

Clarifications and additions to the documentation.

For those who test many platforms:

Fewer unexpected rebuilds
The ability to bundle up a set of binaries and their associated config file, for easy transportation and use on other systems.
Many other features intended to make benchmarking easier, more productive, and less error-prone, as summarized in the table of contents.

If you have already used SPEC MPI2007 V1.0 and have existing configuration files, you should read through this document to avoid surprises when using V1.1. The changes most likely to affect you are the new build directory locations, the renamed data sets, the automatic setting of the test date, and the addition of debug logs.

Contents

(This table of contents proceeds in rough order of time for a user of the suite: you acquire a platform, ensure that you are familiar with the rules, build the benchmarks, run them, generate reports, and occasionally use utilities and other features.)

Benchmark source code changes

Run Rules Changes

1.5 estimates

4.2.8 user-built systems

Building Benchmarks

1. Build directories separated

2. Bundle up binaries and config file

3. Parallel builds on Windows too

4. Unexpected rebuilds reduced

Running the Suite

1. Per-benchmark basepeak and copies - behavior change

2. PreENV allows setting of environment variables

3. Runtime monitoring

4. Data Sets - name changes

Reports

1. CSV updated

2. Flag reporting - multiple files supported, flag order preserved, report readability

3. Graphs cleaned up

4. Links and attachments

5. Report names have changed

6. Seconds are reported with more digits

7. Submission check automatically included with rawformat

8. Test date automatically set

Utilities

1. Convert to Development

2. Dump alternative source

3. Index

4. Make alternative source

5. ogo top takes you to $GO

6. port_progress

7. specrxp

Other New and Changed Tools Features

1. Benchmark lists and sets can be referenced

2. Debug logs

3. Keeping temporaries

4. Submit lines continued

5. Submit notes

6. Trailing spaces in config files

V1.0 Errata

Documentation

Updated Feature Index

Note: links to SPEC MPI2007 documents on this web page assume that you are reading the page from a directory that also contains the other SPEC MPI2007 documents. If by some chance you are reading this web page from a location where the links do not work, try accessing the referenced documents at one of the following locations:

www.spec.org/mpi2007/Docs/
The $SPEC/Docs/ (Unix) or %SPEC%\Docs\ (Windows) directory on a system where SPEC MPI2007 has been installed.
The Docs/ directory on your SPEC MPI2007 distribution DVD.

Changes to benchmarks

The following benchmark changes were made in V1.1:

104.milc
- Updated the code to correct potential uninitialized memory errors that have been reported.
113.GemsFDTD
- Fixes an array addressing issue for rank counts above 256, and uses a STOP statement instead of CALL EXIT for rank termination.
127.wrf2
- Corrects subroutine calls which used illegal parameters.
129.tera_tf
- Corrects a potential race condition that exists in the code.
130.socorro
- Allows proper linking when MVAPICH is used with the PathScale compilers.

Run Rules Changes

Rule 1.5 now explains the philosophy of estimates, and rule 4.6 clarifies how estimates are marked.
New rule 4.2.8 covers disclosure of configurations for user-built systems.

Building Benchmarks

Build directories separated: Benchmarks are now built in directories named benchspec/MPI2007/nnn.benchmark/build/build... (or, on Windows, benchspec\MPI2007\nnn.benchmark\build\build...), rather than under the benchmark's run subdirectory. The change is intended to make it easier to copy, backup, or delete build and run directories separately from each other. (It may also make problem diagnosis easier in some situations, since your habit of removing all the run directories will no longer destroy essential evidence 10 minutes before the compiler developer says "Wait - what exactly happened at build time?").

If you prefer the V1.0 behavior, you can revert to it by setting build_in_build_dir to 0.
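A minimal sketch of that setting, assuming it goes in the header section of the config file (before the first named section) so that it applies to every benchmark:

```
# Revert to the V1.0 layout: build under each benchmark's run subdirectory
build_in_build_dir = 0
```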
Bundle up binaries and config file: You can now bundle up a set of binaries and their associated config file, for easy transportation and use on other systems.

WARNING: Although the features to create and use bundles are intended to make it easier to run SPEC MPI2007, the tester remains responsible for compliance with the run rules. And, of course, both the creators and the users of bundles are responsible for compliance with any applicable software licenses.
Parallel builds on Windows too: Users of Microsoft Windows systems can now use multiple processors to do parallel builds, by setting makeflags, for example:
```
makeflags = -j N
```
This feature has worked with SPEC CPU testing on Unix for many years; what's new in MPI2007 V1.1 is the ability to use it on Windows. Note that requesting a parallel build with makeflags = -j N causes multiple processors to be used at build time. It has no effect on whether multiple processors are used at run time, and so does not affect how you report on parallelism.
Unexpected rebuilds reduced: In V1.0, the tools were much more likely to trigger automatic rebuilds of the benchmark binaries than they are in V1.1, because unrecognized options (e.g. a mis-spelled CXXOPTIMZIE, or a user-defined option such as MY_OPTS) would be passed to specmake, and the tools had no way to know what specmake did with such options. Now, the tools record only what is actually used by specmake, plus the options that are sent to the shell (e.g. via fdo_pre0). With this more careful recording, config file changes do not trigger rebuilds unless they actually affect the generated binary.
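To illustrate the idea, here is a hypothetical config file fragment. MY_OPTS is a user-defined name invented for this example, and the fragment assumes that specmake expands $(MY_OPTS) as an ordinary make variable. Under the V1.1 behavior described above, editing MY_OPTS should trigger a rebuild because specmake uses it. Editing the misspelled entry should not:

```
# MY_OPTS (hypothetical) is referenced by OPTIMIZE, so specmake uses it
# and a change to its value changes the generated binary
MY_OPTS  = -O3
OPTIMIZE = $(MY_OPTS)

# Misspelled option name: specmake never reads it, so in V1.1
# changing this line is not expected to force a rebuild
CXXOPTIMZIE = -O2
```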
Data Sets - name changes: The names of the data sets that you select to run have changed. The old names test, train, and ref now denote classes of runs rather than specific data sets. The corresponding data sets are now named mtest, mtrain, and mref. Reportable runs automatically select mref. This change was made to support the future addition of large data sets (scheduled for V2.0).
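For a quick sanity check, a non-reportable run can select one of the renamed workloads. This sketch assumes that the reportable and size options can be set in the config file header, as well as on the command line (for example, with the runspec --size switch):

```
# Quick check with the small workload; reportable runs always use mref
reportable = 0
size       = mtest
```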

Running Benchmarks

Per-benchmark basepeak and copies - behavior change: If you select basepeak=1 for an individual benchmark, the number of copies in peak will be forced to be the same as in base. Note that in SPEC MPI2007 V1.0, you could set basepeak for a benchmark, and still change the number of copies in peak; this was deemed to be an error. If you want to run the same tuning in both base and peak, while changing the number of copies, you will need to build two binaries with the same compiler switches.
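A sketch of a per-benchmark setting, using a named section for 104.milc with peak tuning. The section marker is shown in the short benchmark=tuning: form; see config.html for the full syntax:

```
# Use base tuning for 104.milc in peak; as of V1.1 the peak copy count
# for this benchmark is forced to match the base copy count
104.milc=peak:
basepeak = 1
```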
PreENV allows setting of environment variables: The preenv config file option allows you to set environment variables prior to the execution of runspec.
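A minimal sketch, assuming the one-line-per-variable preENV_<name> form that the same toolset uses in SPEC CPU2006 V1.1, placed in the config file header. The variable and the path are hypothetical examples:

```
# Set LD_LIBRARY_PATH before runspec begins (hypothetical path)
preENV_LD_LIBRARY_PATH = /opt/mpi/lib
```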
Run-time monitoring: The monitor hooks have been a little-known feature of the SPEC CPU toolset for many years. They were first described in the ACM SIGARCH article SPEC CPU2006 Benchmark Tools and are now further described in monitors.html. The monitor hooks allow advanced users to instrument the suite in a variety of ways. SPEC can provide only limited support for their use; if your monitors break files or processes that the suite expects to find, you should be prepared to do significant diagnostic work on your own.
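As a small illustration, assume that monitor_pre_bench accepts a shell command that the tools run before each benchmark. Treat monitors.html as the authoritative description. The log path is hypothetical:

```
# Append a timestamp to a hypothetical log before each benchmark starts
monitor_pre_bench = date >> /tmp/mpi2007_monitor.log
```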

Reporting

CSV format updated: If you populate spreadsheets from your runs, you probably shouldn't be cutting and pasting from text files; you'll get more accurate data by using --output_format csv. The V1.1 CSV output includes much more of the information found in the other reports. All run times are now included, and the selected run times are listed separately. The flags used are also included. Although the new fields are not described in detail in the documentation, you are encouraged to explore them by taking the new CSV format out for a test drive. It is hoped that you will find the V1.1 format more complete and more useful.
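A sketch of requesting CSV output from the config file rather than the command line. It assumes that output_format accepts a comma-separated list there, just as --output_format does:

```
# Produce the usual text report plus a CSV file for spreadsheets
output_format = txt,csv
```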
Flag reporting - multiple files supported, flag order preserved, report readability: There are several changes to reporting on compiler flags:
1. You can now format a single result using multiple flags files. This feature is intended to make it easier for multiple results to share what should be shared, while separating what should be separated. Common elements (such as a certain version of a compiler) can be placed into one flags file, while the elements that differ from one system to another (such as platform notes) can be maintained separately. Suggestions on use of this feature can be found in flag-description.html.
2. The flag reporter now does a better job of reporting flags in the same order in which they appeared on the command line.
3. Flag reporting has been re-organized in an attempt to improve readability:
  1. Within the Optimization Flags section, the report no longer prints phrases such as "Fortran benchmarks (except as noted below):" because readers may not remember which benchmarks are in Fortran. Instead, all the Fortran benchmarks are enumerated, and if some use the same flags as others, that fact is noted in line, rather than at the top of the list.
  2. Within the Portability Flags section, benchmarks appear in order by number, rather than ordered by language.
  3. When the reporter detects that base and peak are sufficiently different from each other (e.g. different compilers, or different portability options) the flags report is ordered to put all the base information first, then all the peak information - for example:
```
       Base Compiler Invocation
       Base Portability Flags
       Base Optimization
       Peak Compiler Invocation
       Peak Portability Flags
       Peak Optimization
```
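For item 1 above, here is a sketch of a config file that names two flags files. It assumes the numbered flagsurl form (flagsurl0, flagsurl1, ...) described in flag-description.html. Both file paths are hypothetical:

```
# Flags shared by many results, e.g. one compiler version (hypothetical path)
flagsurl0 = /home/tester/flags/compiler-flags.xml
# Elements specific to one platform, e.g. platform notes (hypothetical path)
flagsurl1 = /home/tester/flags/platform-flags.xml
```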

Graphs cleaned up:

[Figure: sample graphs in the V1.0 format and in the V1.1 format]

Graphs have been changed to reduce the amount of shading, and to reduce painting of other pixels that were not essential to the data being presented (with a tip of the hat to Professor Tufte's notion of reducing "chartjunk", or apologies, depending on the reader's opinion of the change).

Links and attachments can now be added to reports.
Report names have changed:

In MPI2007 V1.0, final reports had names of the form
```
<suite>.<nnn>.<type>
```
for example, MPIM2007.003.ps, MPIM2007.003.txt, MPIM2007.022.pdf, and so forth. The form of the file names has changed to now be
```
<suite>.<nnn>.<workload>.<type>
```
for example, MPIM2007.003.ref.ps, MPIM2007.003.ref.txt, MPIM2007.022.ref.pdf.

There are two reasons for this change:
- For MPI2007, all "reportable" runs use a workload named "mref", but this is not necessarily true for other benchmark suites that use the same toolset. Designating the workload in the file name reduces possible ambiguity.
- If you select the mtest or mtrain workloads (with the --size switch), output files for V1.0 were already tagged with the workload designator; this change causes ref to match the other two.
Seconds are reported with more digits:
- Background: For certain values, the SPEC tools print 3 significant digits. This is intentional. For example, if one system has a SPECmpiM_peak2007 performance of 1234.456 and another has a SPECmpiM_peak2007 performance of 1229.987, it is arguable that the performance of these systems is not materially different. Given the reality of run-to-run variation (which is, sometimes, on the order of 4%), it makes sense to report both systems' SPECmpiM_peak2007 as 1230.
  
  Although there is agreement that it is acceptable to round SPEC's computed metrics to 3 significant digits, it has been noted that the argument is weaker for rounding of original observations. In particular, if we wish to acknowledge the reality of run-to-run variation, then it seems reasonable to report a time of 1234.456 seconds as an integral number of seconds (1234), rather than rounding to three significant digits, which in this case would mean rounding to the nearest 10 seconds (1230).
- Change made: Ever since the release of V1.0 of SPEC MPI2007, results posted on SPEC's web site (such as the HTML, PDF, and text formats) have used 3 significant digits for computed metrics, and seconds larger than 1000 have been reported as an integral number of seconds. As of V1.1, reports produced on your own test systems now behave the same way.
The Submission Check report is now automatically included in the output_format list when using rawformat. This change was made because the typical use of rawformat is to create final (submission quality) reports. Even if you don't plan to submit your result to SPEC, the checks that are done by Submission Check can help you to create reports that are more complete and more readable.
The test_date is now automatically set from the system clock, and you should not set it yourself.

New Utilities Features

Convert to Development: In order to assist with compliance with the run rules (so that results are meaningful and comparable), the SPEC CPU tools perform various checks to ensure that benchmark code, workloads, and tools match the original distribution. Sometimes, though, researchers or developers may want to work in an environment without these checks, for example, when modifying code to add performance instrumentation.

Prior to V1.1, doing so typically required that you abandon the tools. With V1.1, you now have another choice: you can continue using the SPEC supplied toolset in a development sandbox, via the convert_to_development utility.
Dump alternative source: dumpsrcalt is a utility that shows you the contents of src.alts.
Index: The index utility remains UNSUPPORTED, but is now documented for the first time.
Make alternative source: makesrcalt is a utility that creates packages containing newly developed alternative sources. It has been enhanced in V1.1, and is documented for the first time.
ogo top: If you type ogo without any parameters, or if you type ogo top, the command sets your current directory to $GO instead of to $SPEC.
port_progress: The port_progress utility is now documented.
specrxp: The specrxp utility validates flags files. It is called automatically, but you can also call it directly if you wish.

Other New and Changed Tools Features

Benchmark lists and sets: Two formerly undocumented features are now documented: your config file can reference benchmark lists and sets. Set references use the various "bset" files that are found in $SPEC/benchspec/MPI2007 or %SPEC%\benchspec\MPI2007. If you have already noticed this feature, please note that the definitions of the bsets have changed, and the number of bsets has been reduced.
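A sketch of a section marker that references a benchmark list. It assumes that comma-separated benchmark names are accepted where a single benchmark name would normally appear. See config.html for the exact rules and for the names of the available sets:

```
# Apply the same peak option to two benchmarks with one section marker
104.milc,113.GemsFDTD=peak:
basepeak = 1
```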
Debug logs: Failed runs now leave behind additional detail, in files such as MPI2007.001.log.debug. Temporary files are also left behind after a failed run. If you are managing disk space on a tight budget, you'll need to adjust your cleaning methods.
Keeping temporaries: If you are having trouble debugging your test setup (for example, if your new submit command or parallel_test option is failing), you may want to try the new keeptmp feature. When this option is set, the above-mentioned debug log is kept, along with the various temporary files that it mentions.

If you leave keeptmp at its default setting, temporary files will be automatically deleted after a successful run. If you are managing disk space on a tight budget, and keeping temporaries, you'll almost certainly need to adjust your cleaning methods.
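A minimal sketch, assuming keeptmp is set in the config file header:

```
# Keep the debug log and the temporary files it mentions, even after
# a successful run; remember to clean them up yourself
keeptmp = yes
```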
Submit lines continued: It is now possible to append a numeral to the submit keyword, so that a submit command can be continued over several lines. This feature is intended to improve the readability of your config file when using the submit feature.
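A sketch, assuming that each numbered submit line becomes one command and that the lines run in numeric order. The following are all assumptions for illustration: the script name dobmk, the mpirun launcher, its -np option, and the $ranks variable. Substitute whatever your MPI implementation and config.html call for:

```
# Write the benchmark command into a small script (hypothetical name)...
submit0 = echo 'cd $PWD'  >  dobmk
submit1 = echo '$command' >> dobmk
# ...then hand the script to the MPI launcher (launcher syntax assumed)
submit2 = mpirun -np $ranks sh dobmk
```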
Submit notes: The tools will now automatically insert a section with notes on your submit command for runs that use submit. You can customize the section.
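A sketch of customizing that section. It assumes that the automatic notes are written as notes_submit_ lines, and that lines you supply yourself take their place. The wording is illustrative only:

```
# Describe in your own words how the submit command launches the benchmarks
notes_submit_000 = Each benchmark was started through the submit command,
notes_submit_005 = which invokes the MPI launcher in a wrapper script.
```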
Trailing spaces are now stripped in config files, unless preceded by a backslash, as described in the section on whitespace.

V1.0 Errata

The errata items from V1.0 have been corrected. These include:

"ranks" in base section
Compound macro names have issues
Run Rules documentation, report format
%ifdef may not work
Config file documentation
Report format
Links in html reports

Documentation Updates

Documentation has been added for the new features mentioned in this document. Most of the changes are linked from the descriptions above. A few items might not be immediately obvious from the above links, and are called out here:

config.html	A new chapter About Alternate Sources was added. A new section on automatic rebuilds suggests a way to test whether proposed changes would force a rebuild (without actually doing the build). The documentation now tells you what happens with macros that aren't defined if you try to use them. All options that affect runspec are described together. In V1.0, there were two tables, one for the options that could be mentioned either on the command line or in the config file, and a separate table for options that can only be mentioned in a config file. A sidebar about quoting was added, to try to help reduce confusion when you are trying to ensure that variables are interpreted by the correct software. The documentation of log files now suggests some useful search strings that can help you as you try to find your way through a log. The documentation of submit was rewritten and expanded.
flag-description.html	Flag file types have been clarified, using an example that points to the three files for result #00001, as posted at www.spec.org. A complete example is provided to show how you can edit a flags file and use rawformat to incorporate it. A "Recommended Practices" section has been added. The discussions of replacement of example text - both <example> and <ex_replacement> - have been considerably expanded to explain the difference between the two, and examples of their use are shown.
runspec.html	The description of directory sharing via output_root now starts with a simple summary of the steps. More details are given about --review. The documentation now describes the run order for reportable runs. The output format subcheck is explained. The description of --update now explains that additional items might be updated, not just your flags files.

Updated Feature Index

These user-visible features are new, updated, or newly documented for SPEC MPI2007 V1.1:

monitor_pre
monitor_pre_bench
monitor_specrun_wrapper
monitor_wrapper
ogo top, destination of
port_progress
post_setup
preenv
rebuilds, reduced
reports, names of
search strings in logfiles
seconds, reporting thereof
sets of benchmarks
specrxp
Submission Check, included with rawformat
submit notes
submit, continuation of
test_date
trailing spaces, stripped
unpack_bundle
use_bundle