SPECjbb2000 Run and Reporting Rules
This document specifies how Release 1.0 of the SPECjbb2000 benchmark is to be run for measuring and publicly reporting performance results. These rules follow the norms laid down by SPEC, so that results generated with this benchmark are meaningful, comparable to other generated results, and repeatable (with documentation covering the factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
SPEC intends that this benchmark measure overall systems that provide environments for running server-side Java applications. It does not measure Enterprise Java Beans (EJBs), servlets, or Java Server Pages (JSPs).
The general philosophy behind the rules for running the SPECjbb2000 benchmark is to ensure that an independent party can reproduce the reported results.
For results to be publishable, SPEC expects:
SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the rules below, SPEC wants to increase awareness among implementors and end users of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.
Furthermore, SPEC expects that any public use of results from this benchmark shall be for configurations that are appropriate for public consumption and comparison.
In the case where it appears that the above guidelines have not been followed, SPEC may investigate such a claim and request that the offending optimization (e.g. a SPEC-benchmark specific pattern matching) be backed off and the results resubmitted. Or, SPEC may request that the vendor correct the deficiency (e.g. make the optimization more general purpose or correct problems with code generation) before submitting results based on the optimization.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECjbb2000 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to the benchmark and may rename the metrics. In the event that the workload or metric is changed, SPEC reserves the right to republish in summary form "adapted" results for previously published systems, converted to the new metric. In the case of other changes, a republication may necessitate retesting and may require support from the original test sponsor.
Tested systems must provide an environment suitable for running typical Java Version 1.1 programs and must be generally available for that purpose. Any tested system must include an implementation of the Java (tm) Virtual Machine as described by the following references, or as amended by SPEC for later Java versions:
The following are specifically allowed, within the bounds of the Java Platform:
The system must also include an implementation of those classes that are referenced by this benchmark as described within either of the following sets of references:
SPEC does not intend to check for implementation of APIs not used in the benchmark. For example, the benchmark does not use AWT, and SPEC does not intend to check for implementation of AWT.
Feedback directed optimization and precompilation are allowed. However, the classes listed at the end of this section, which are distributed in the file jbb_no_precompile.jar, must not be precompiled or feedback-optimized before the measured invocation of the benchmark.
The requirement to avoid precompilation of certain classes at compile time shall not be taken to forbid the use of run-time dynamic optimization tools that would observe the reference execution and dynamically modify the in-memory copy of the benchmark. However, such tools must not affect later executions of the same benchmark in any way. For example, if your product compiles any of the restricted classes and saves that compilation to disk, you would need to remove everything compiled during the previous run. Such tools would also have to be disclosed in the submission of a result, as with any other software component (see section 3.5).
The following set of classes, which are distributed in the file jbb_no_precompile.jar, must not be precompiled or feedback-optimized. The benchmarker must disclose the mechanism used to exclude these classes from precompilation or optimization, either via exclusion flags or explicit removal of precompiled code.
spec.jbb.Company
spec.jbb.NewOrder
spec.jbb.Orderline
spec.jbb.Order
spec.jbb.History
spec.jbb.Customer
spec.jbb.District
spec.jbb.Stock
spec.jbb.Address
spec.jbb.Warehouse
spec.jbb.Item
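One way a benchmarker might sanity-check an exclusion mechanism is to compare the classes a tool reports as precompiled against the restricted set. The following is a minimal sketch, assuming such a report is available as a list of fully qualified class names. The function name `restricted_precompiled` and the input format are hypothetical and are not part of SPECjbb2000.

```python
# Restricted classes, copied from the list above.
RESTRICTED_CLASSES = frozenset({
    "spec.jbb.Company", "spec.jbb.NewOrder", "spec.jbb.Orderline",
    "spec.jbb.Order", "spec.jbb.History", "spec.jbb.Customer",
    "spec.jbb.District", "spec.jbb.Stock", "spec.jbb.Address",
    "spec.jbb.Warehouse", "spec.jbb.Item",
})


def restricted_precompiled(precompiled_classes):
    """Return, sorted, the restricted classes present in precompiled_classes.

    precompiled_classes is a hypothetical input: any iterable of fully
    qualified class names that some tool reports as precompiled.
    """
    return sorted(RESTRICTED_CLASSES.intersection(precompiled_classes))


# Example: one restricted class slipped into the precompiled set.
print(restricted_precompiled(["spec.jbb.Order", "java.lang.String"]))
# prints ['spec.jbb.Order']
```

An empty result only shows that the supplied list is clean; it does not replace the required disclosure of the exclusion mechanism.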
The SPECjbb2000 benchmark binaries are provided in jar files containing the Java classes. Valid runs must use the provided jar files and these files must not be updated or modified in any way. While the source code of the benchmark is provided for reference, the benchmarker must not recompile any of the provided .java files. Any runs that used recompiled class files are not valid and cannot be reported or published.
A set of sequential points is run starting at 1 warehouse up through the minimum of 8 warehouses or 2 * N warehouses, where N is the number of warehouses where the peak throughput is observed. The sequence must increment by 1. The test may be configured to run beyond 2 * N warehouses. These points beyond the 2 * N point will appear in the report and on the graph, but will not be used to calculate the metric.
In some cases, the system under test may not be able to run all the points up to 2*N. If the system is able to run up to M warehouses, where N < M < 2*N, the test will still be marked valid and the missing points from M+1 to 2*N will be considered to have a throughput of 0 ops/sec for the purposes of metric computation. In this situation, the benchmarker is strongly encouraged to contact the hardware and software vendors for fixes that would allow the system to run to 2*N. If such fixes are not available, the benchmarker also has the option of disabling some CPUs and rerunning the test. Since the location of the peak, N, is strongly correlated to the number of CPUs in the system, reducing the number of CPUs will reduce N, and correspondingly, 2*N.
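The sequence rules above can be illustrated with a short sketch. It assumes the measured points are available as a mapping from warehouse count to throughput. The function names are hypothetical, and resolving a tie for the peak in favor of the smallest warehouse count is an assumption, since the rules do not address ties.

```python
def points_in_sequence(warehouse_counts):
    """Return True if the points start at 1 warehouse and increment by 1."""
    counts = list(warehouse_counts)
    return len(counts) > 0 and counts == list(range(1, len(counts) + 1))


def missing_points(throughputs):
    """List the warehouse counts in N+1..2N that were not run.

    throughputs maps warehouse count -> measured ops/sec.
    N is the warehouse count with the highest throughput; ties go to the
    smallest count (an assumption, not stated in the rules).
    """
    peak_throughput = max(throughputs.values())
    n = min(w for w, t in throughputs.items() if t == peak_throughput)
    return [w for w in range(n + 1, 2 * n + 1) if w not in throughputs]


# Illustrative numbers only: the peak is at N = 4 and the run stopped at M = 6.
run = {1: 1000, 2: 1900, 3: 2700, 4: 3400, 5: 3300, 6: 3200}
print(points_in_sequence(run.keys()))  # prints True
print(missing_points(run))             # prints [7, 8]
```

In this example N < M < 2*N, so the points at 7 and 8 warehouses would be counted as 0 ops/sec in the metric.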
The following are required for a valid run:
The SPECjbb2000 benchmark consists of a single executable, which runs on a single machine. Use of clusters or aggregations of machines is specifically disallowed. No network, database, or web server components are required, only a Java environment.
SPEC requires the use of a single file system to contain the directory tree for the benchmark being run. SPEC allows any type of file system (disk-based, memory-based, NFS, DFS, FAT, NTFS, etc.) to be used. The type of file system must be disclosed in reported results.
Any deviations from the standard, default configuration for the SUT will need to be documented so an independent party would be able to reproduce the result without further assistance.
These changes must be "generally available", i.e., available, supported and documented. For example, if a special tool is needed to change the OS state, it must be provided to users and documented.
There are a number of parameters, in two properties files, that control the operation of SPECjbb2000. Parameter usage is explained in the benchmark documentation. The properties in the "Fixed Input Parameters" section of the file "SPECjbb.props" must not be changed from the values as provided by SPEC. The properties in the "Changeable Input Parameters" section may be set to any desired value.
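A benchmarker could guard against accidental edits to the fixed parameters by comparing the properties file used for a run with the copy shipped by SPEC. The sketch below is illustrative only. It assumes simple `key=value` lines (the full Java properties syntax also allows other separators and line continuations, which are not handled here), and it requires the caller to supply the names of the fixed properties. The key names in the example are hypothetical placeholders, not real SPECjbb2000 property names.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def changed_fixed_properties(reference_text, candidate_text, fixed_keys):
    """Return, sorted, the fixed keys whose values differ from the reference."""
    reference = parse_properties(reference_text)
    candidate = parse_properties(candidate_text)
    return sorted(k for k in fixed_keys if reference.get(k) != candidate.get(k))


# Hypothetical property names, for illustration only.
shipped = "# Fixed Input Parameters\nexample.fixed_a=10\nexample.fixed_b=20\n"
used = "# Fixed Input Parameters\nexample.fixed_a=10\nexample.fixed_b=25\n"
print(changed_fixed_properties(shipped, used, ["example.fixed_a", "example.fixed_b"]))
# prints ['example.fixed_b']
```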
All benchmark settings must be reported, as well as the command line used for the reported run, and for precompilation, if any.
Both JVMs and native compilers are capable of modifying their behavior based on flags. Flags which do not break conformance to section 2.1 are allowed.
The sequence of points is run as described in section 2.3. The throughputs, in ops/second, for all the points from the peak at N warehouses through the point at 2 * N warehouses, inclusive, are averaged. SPECjbb2000 reports this average as the metric, SPECjbb2000 ops/sec. As detailed in section 2.3, missing points in the range N+1 to 2*N will be considered to have a throughput of 0 ops/second in the metric computation.
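The metric computation described above can be sketched as follows. This is an illustration, not the code of the SPECjbb2000 reporting tool. The function name is hypothetical, and resolving a tie for the peak in favor of the smallest warehouse count is an assumption.

```python
def specjbb2000_metric(throughputs):
    """Average the throughputs from N through 2*N warehouses, inclusive.

    throughputs maps warehouse count -> measured ops/sec.
    N is the warehouse count with the peak throughput (smallest count on a
    tie, which is an assumption). Points missing from the range N+1..2*N
    count as 0 ops/sec, as section 2.3 specifies.
    """
    peak_throughput = max(throughputs.values())
    n = min(w for w, t in throughputs.items() if t == peak_throughput)
    points = [throughputs.get(w, 0.0) for w in range(n, 2 * n + 1)]
    return sum(points) / len(points)


# Illustrative numbers only: peak at N = 4, run stopped at 6 warehouses,
# so the points at 7 and 8 warehouses contribute 0 ops/sec.
run = {1: 1000, 2: 1900, 3: 2700, 4: 3400, 5: 3300, 6: 3200}
print(specjbb2000_metric(run))  # prints 1980.0
```

The average always covers N + 1 points, so each missing point lowers the metric rather than shrinking the divisor.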
The reporting tool contained within SPECjbb2000 produces a graph of the throughput at all the measured points with warehouses on the horizontal axis and throughputs on the vertical axis. All points from 1 to the minimum of 8 or 2*N are required to be reported. Missing points in the range N+1 to 2*N will be reported to have a throughput of 0 ops/second. The points being averaged for the metric will be marked on the report.
The run must meet all requirements described in section 2.4 to be a valid run.
All components, both hardware and software, must be generally available within 3 months of the publication date in order to be a valid publication. However, if JVM licensing issues cause a change in software availability date after publication date, the change will be allowed to be made without penalty, subject to subcommittee review.
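As a rough illustration of the availability window, the sketch below checks whether a component's availability date falls within 3 months of the publication date. It interprets "within 3 months" as "no later than the same day of the month three calendar months later, clamped to the end of shorter months". That interpretation, and the function names, are assumptions rather than SPEC's definition.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted by a number of calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def available_in_time(publication, availability, window_months=3):
    """Return True if availability is no later than publication + window."""
    return availability <= add_months(publication, window_months)


print(available_in_time(date(2001, 1, 12), date(2001, 4, 12)))  # prints True
print(available_in_time(date(2001, 1, 12), date(2001, 4, 13)))  # prints False
```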
If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the performance of the released system to be 5% lower than that reported for the pre-release system, then the sponsor is requested to report a corrected test result.
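The 5% guideline can be expressed as a simple comparison. The sketch below treats a shortfall of 5% or more, relative to the pre-release result, as the trigger. Reading "5% lower" as "at least 5% lower" is an assumption, and the function name is hypothetical.

```python
def corrected_result_requested(pre_release_ops, released_ops, threshold=0.05):
    """Return True if the released result falls short by the threshold or more.

    Both arguments are SPECjbb2000 ops/sec values for the same configuration.
    """
    if pre_release_ops <= 0:
        raise ValueError("pre_release_ops must be positive")
    shortfall = (pre_release_ops - released_ops) / pre_release_ops
    return shortfall >= threshold


# Illustrative numbers only.
print(corrected_result_requested(10000, 9400))  # prints True (6% lower)
print(corrected_result_requested(10000, 9600))  # prints False (4% lower)
```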
All configuration properties contained in the descriptive properties file must be accurate.
The descriptive properties file contains a parameter config.sw.tuning which should be used to document any system tuning information. SPEC is aware that mechanisms for doing this include environment flags, command line flags, configuration files, registries, etc. Whatever the mechanism, it must be fully disclosed here.
SPEC is aware that sometimes the spelling of command line switches or environment variables, or even their presence, changes between beta releases and final releases. For example, suppose that during a product beta the tester specifies:
java -XX:fast -XX:architecture_level=3 -XX:unroll 16
but the tester knows that in the final release the architecture level will be automatically set by -Xfast, and the product is going to change to set the default unroll level to 16. In that case, the actual command line used for the run should be recorded in the command-line parameter, config.command_line, and the final form of the command line should be reported in the config.sw.tuning parameter of the descriptive properties file.
The tester is expected to exercise due diligence regarding such flag reporting, to ensure that the disclosure correctly records the intended final product.
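To make the flag-reporting example concrete, the sketch below renders the two entries as `key=value` lines. The property names `config.command_line` and `config.sw.tuning` come from the text above. The helper name is hypothetical, and the final-release form `java -Xfast` is our reading of the example (architecture level set automatically, unroll level 16 by default), not a real JVM command line.

```python
def tuning_disclosure(actual_command_line, final_command_line):
    """Return the two disclosure lines for the descriptive properties file."""
    return [
        "config.command_line=" + actual_command_line,
        "config.sw.tuning=" + final_command_line,
    ]


# Command line actually used during the beta, taken from the example above.
beta = "java -XX:fast -XX:architecture_level=3 -XX:unroll 16"
# Assumed final-release form; this is an interpretation of the example.
final = "java -Xfast"

for line in tuning_disclosure(beta, final):
    print(line)
```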
In order to publicly disclose SPECjbb2000 results, the benchmarker must adhere to these reporting rules in addition to having followed the run rules above. The goal of the reporting rules is to ensure the system under test is sufficiently documented such that someone could reproduce the test and its results.
Any SPECjbb2000 result produced in compliance with these run and reporting rules may be publicly disclosed and represented as valid SPECjbb2000 results.
Any test result not in full compliance with the run and reporting rules must not be represented using the SPECjbb2000 metric name.
Once you have a compliant run and wish to submit it to SPEC for review, you will need to provide the raw file created by the run.
Once you have the submission ready, please e-mail it to firstname.lastname@example.org
SPEC encourages the submission of results to SPEC for review by the relevant subcommittee and subsequent publication on SPEC's website. Vendors may publish compliant results independently; however, any SPEC member may request a full disclosure report for that result and the benchmarker must comply within 10 business days. Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.
Estimates are not allowed.
SPECjbb2000 results must not be publicly compared to results from any other benchmark. This would be a violation of the SPECjbb2000 reporting rules and, in the case of the TPC benchmarks, a serious violation of the TPC "fair use policy."
When competitive comparisons are made using SPECjbb2000 benchmark results, SPEC expects that the following template be used:
SPECjbb is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of (date). [The comparison presented is based on (basis for comparison).] For the latest SPECjbb2000 results visit http://www.spec.org/osg/jbb2000.
(Note: [...] above required only if selective comparisons are used.)
SPECjbb is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of Jan 12, 2001. The comparison presented is based on best performing 4-cpu servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECjbb2000 results visit http://www.spec.org/osg/jbb2000.
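For anyone generating such statements programmatically, the template can be filled in with a small helper. This is a minimal sketch; the function name `fair_use_statement` is hypothetical. The bracketed sentence is included only when a basis for comparison is supplied, matching the note that it is required only for selective comparisons.

```python
def fair_use_statement(as_of, basis=None):
    """Fill in the comparison template with a date and an optional basis."""
    parts = [
        "SPECjbb is a trademark of the Standard Performance Evaluation Corp. (SPEC).",
        "Competitive numbers shown reflect results published on www.spec.org as of "
        + as_of + ".",
    ]
    if basis is not None:
        # Only needed when a selective comparison is being made.
        parts.append("The comparison presented is based on " + basis + ".")
    parts.append(
        "For the latest SPECjbb2000 results visit http://www.spec.org/osg/jbb2000."
    )
    return " ".join(parts)


print(fair_use_statement(
    "Jan 12, 2001",
    "best performing 4-cpu servers currently shipping by Vendor 1, Vendor 2 and Vendor 3",
))
```

With these arguments the helper reproduces the example statement given above.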
The rationale for the template is to provide fair comparisons, by ensuring that:
Any results tested by an organization other than the vendor for one of the primary system components (server hardware or JVM) must be clearly identified as such. Performance comparisons may be based only upon the SPEC defined metric (SPECjbb2000 ops/s). Other information from the result page may be used to differentiate systems, i.e. used to define a basis for comparing a subset of systems based on some attribute like number of CPUs or memory size.
SPEC encourages use of the SPECjbb2000 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to run the required number of points, or may use research compilers that are unsupported and are not generally available.
Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results. Where the rules cannot be followed, SPEC requires the results be clearly distinguished from results officially submitted to SPEC, by disclosing the deviations from the rules.