SPECweb96 Release 1.0 Run and Reporting Rules
Table of Contents
- 1.0 Introduction
- 1.1 Philosophy
- 1.2 Caveat
- 2.0 Running the SPECweb96 Release 1.0 Benchmark
- 2.1 Environment
- 2.2 Measurement
- 3.0 Reporting Results for the SPECweb96 Release 1.0 Benchmark
- 3.1 Metrics And Reference Format
- 3.2 Server Configuration
- 3.3 Testbed Configuration
- 3.4 General Availability Dates
- 3.5 Test Sponsor
- 3.6 Notes/Summary of Tuning Parameters
- 3.7 Other Required Information
- 4.0 Building the SPECweb96 Release 1.0 Benchmark
1.0 Introduction
This document specifies how the benchmarks in the SPECweb96 Release 1.0
suite are to be run for measuring and publicly reporting performance
results. These rules have been established by the SPEC Web
Subcommittee and approved by the SPEC Open Systems Steering Committee. This
ensures that results generated with this suite are meaningful, comparable
to other generated results, and are repeatable (with documentation covering
factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere
to these Run and Reporting Rules.
1.1 Philosophy
The general philosophy behind the rules for running the SPECweb96 Release
1.0 benchmark is to ensure that an independent party can reproduce the
reported results.
The following attributes are expected:
- Proper use of the SPEC benchmark tools as provided.
- Availability of an appropriate full disclosure report.
- Support for all of the appropriate protocols.
Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for servers and configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:
- Hardware and software used to run this benchmark must provide a suitable environment for serving WWW documents.
- Optimizations utilized must improve performance for a larger class of workloads than just the ones defined by this benchmark suite.
- The server and its configuration are generally available, documented, supported, and encouraged by the providing vendor(s).
1.2 Caveat
SPEC reserves the right to adapt the benchmark codes, workloads, and rules
of SPECweb96 Release 1.0 as deemed necessary to preserve the goal of fair
benchmarking. SPEC will notify members and licensees whenever it makes
changes to the suite and will rename the metrics (e.g., from SPECweb96 to
SPECweb97a). In the event that a workload is removed, SPEC reserves the
right to republish in summary form "adapted" results for
previously published systems, converted to the new metric. In the case of
other changes, a republication may necessitate retesting and may require
support from the original test sponsor.
Relevant standards are cited in these run rules as URL references, and are
current as of the date of publication. Changes or updates to these
referenced documents or URL's may necessitate repairs to the links
and/or amendment of the run rules. The current run rules will be available
at the SPEC web site at http://www.spec.org. SPEC will notify
members and licensees whenever it makes changes to the suite.
2.0 Running the SPECweb96 Release 1.0 Benchmark
2.1 Environment
2.1.1 Protocols
As the WWW is defined by its interoperable protocol definitions, SPECweb requires adherence to the related protocol standards. The benchmark environment shall be governed by the following standards:
- HTTP 1.0
- Basic WWW protocol, as defined in http://www.w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html (an illustrative request/response exchange follows this list).
- RFC 761
- DoD standard Transmission Control Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc761.txt
- RFC 791
- Internet Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc791.txt
- RFC 792
- Internet Control Message Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc792.txt and updated by RFC 950.
- RFC 793
- Transmission Control Protocol, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc793.txt
- RFC 950
- Internet Standard Subnetting Procedure, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc950.txt
- RFC 1122
- Requirements for Internet hosts - communication layers, as defined in http://info.internet.isi.edu/in-notes/rfc/files/rfc1122.txt
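For illustration, a minimal HTTP 1.0 exchange of the kind the benchmark drives might look as follows. The request path, header values, and file size here are hypothetical and are not mandated by these rules:

    GET /file_set/dir00042/class1_3 HTTP/1.0

    HTTP/1.0 200 OK
    Content-Type: text/plain
    Content-Length: 4096

    [4096 bytes of file data]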
For further explanation of these protocols, the following might be helpful:
- RFC 1180
- TCP/IP tutorial [http://info.internet.isi.edu/in-notes/rfc/files/rfc1180.txt]
- RFC 1739
- A Primer On Internet and TCP/IP Tools [http://info.internet.isi.edu/in-notes/rfc/files/rfc1739.txt]
2.1.2 Server
For a run to be valid, the following attributes must hold true:
- The server supports the required protocols, and is not utilizing variations of these protocols to satisfy requests made during the benchmark. To ensure comparability of results, this release of SPECweb does not support other versions of the HTTP protocol such as 0.9 or 1.1.
- The value of TIME_WAIT must be at least 60 seconds. Rationale: SPEC intends to follow relevant standards wherever practical, but with respect to this performance-sensitive parameter that is difficult due to ambiguity in the standards. RFC 1122 requires that TIME_WAIT be 2 times the maximum segment life (MSL), and RFC 793 suggests a value of 2 minutes for MSL; so TIME_WAIT itself is effectively not limited by the standards. However, current TCP/IP implementations define a de facto lower limit for TIME_WAIT of 60 seconds, the value used in most BSD-derived UNIX implementations. SPEC expects that the protocol standards relating to TIME_WAIT will be clarified in time, and that future releases of SPECweb will require strict conformance with those standards.
- The server returns the complete and appropriate byte streams for each request made.
- The server logs the following information for each request made: address of the requestor, a date and time stamp accurate to at least 1 second, specification of the file requested, size of the file transferred, and the final status of the request.
- The server utilizes stable storage for all data files and server logs. The log file records must be written to non-volatile storage at least once every 60 seconds (a sketch appears at the end of this section).
- The server is composed of components that are generally available, or that shall be generally available within six months of the first publication of these results.
Any deviations from the standard, default configuration of the server must be documented in sufficient detail that an independent party can reproduce the result without further assistance.
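As an illustration of the stable-storage requirement above, the following C sketch writes each log record through stdio and forces the accumulated records to non-volatile storage at least once per 60 seconds. The function and variable names are illustrative and are not part of the benchmark; a real server would integrate this with its own logging path.

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static FILE  *log_fp;      /* opened elsewhere, e.g. fopen("access_log", "a") */
    static time_t last_sync;   /* time of the last forced write to disk */

    /* Append one log record; guarantee that at most 60 seconds of
       records are ever held only in volatile buffers. */
    void log_request(const char *record)
    {
        fputs(record, log_fp);
        if (time(NULL) - last_sync >= 60) {
            fflush(log_fp);           /* stdio buffer -> kernel */
            fsync(fileno(log_fp));    /* kernel buffers -> stable storage */
            last_sync = time(NULL);
        }
    }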
2.2 Measurement
2.2.1 File Set
The benchmark makes references to files located on the server. The
range of files accessed is determined by the particular level of
requested load for each measurement. The particular files referenced shall
be determined by the random workload generation in the benchmark
itself.
The benchmark suite provides tools for the creation of the files to be
used. It is the responsibility of the benchmarker to ensure that these
files are placed on the server so that they can be accessed properly by the
benchmark. These files, and only these files, shall be used as the target
file set. The benchmark shall perform internal validations to verify the
expected file(s); no modification or bypassing of this validation is
allowed.
2.2.2 Load Levels
Each benchmark run consists of a set of requested load levels for which an
actual measurement is made. The benchmark measures the actual level
achieved and the associated average response time for each of the requested
levels.
The measurement of all data points defining a performance curve is made
within a single benchmark run, starting with the lowest requested load
level and proceeding to the highest requested load level. The requested
load levels are specified in the parameter file as a list ordered from
lowest (leftmost) to highest (rightmost).
If any requested load level must be rerun for any reason, the entire
benchmark run must be restarted and the series of requested load levels
repeated. No server or testbed configuration changes, server reboots, or
file system initializations (e.g., "newfs") are allowed between
requested load levels.
The performance curve must consist of a minimum of 10 data points of
requested load, uniformly distributed across the range from zero to the
maximum requested load. Points in addition to these 10 uniformly
distributed points may also be reported.
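As a worked example of this requirement (values hypothetical): with a maximum requested load of 1000 HTTP operations per second and the minimum of 10 points, the requested levels would be 100, 200, ..., 1000 ops/sec. A minimal C sketch that generates such a uniformly distributed list:

    #include <stdio.h>

    int main(void)
    {
        int max_load = 1000;   /* hypothetical maximum requested load (ops/sec) */
        int n_points = 10;     /* minimum number of data points required */
        int i;

        /* Print the requested load levels, lowest to highest, as they
           would appear left to right in the parameter file. */
        for (i = 1; i <= n_points; i++)
            printf("%d ", max_load * i / n_points);
        putchar('\n');
        return 0;
    }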
2.2.3 Benchmark Parameters
All benchmark parameter values must be left at their default values when generating reportable SPECweb96 results, except as noted in the following list. (A sample parameter file excerpt appears at the end of this section.)
- Server
- The means of accessing the desired server shall be defined. This includes the name or address(es) of the server, as well as the proper port number.
- Load
- A collection of clients called load generators is used to generate an aggregate load on the server being tested.
In particular, there are several settings that cannot be changed without invalidating the result.
- Server Fileset
- The size of the fileset generated on the server by the benchmark is established as a function of requested throughput. Thus, fileset size is dependent on throughput across the entire results curve. This provides a more realistic server load since more files are being manipulated on the server as the load is increased. This reflects typical server use in real-world environments. The default parameters of the benchmark allow the automatic creation of valid total and working filesets on the server being measured.
- Time parameters
- RUNTIME, the time of measurement for which results are reported, must be the default 600 seconds for reportable results. The WARMUP_TIME must be set to the default of 300 seconds for reportable results.
- Workload parameters
- The workload specifics are fixed by the benchmark specification. The given name of a workload file may specify any workload file properly built by the fileset generation step of the benchmark.
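As a sketch, a parameter file for a reportable run might contain entries like the following. Only RUNTIME and WARMUP_TIME are named in these rules; the remaining parameter names and values are hypothetical stand-ins for whatever the shipped benchmark actually uses:

    SERVER       www.example.com     # name or address of the server under test
    PORT         80                  # HTTP port on the server
    LOAD         "100 200 300 400 500 600 700 800 900 1000"
                                     # requested load levels, lowest to highest
    RUNTIME      600                 # measurement interval; must stay at the default
    WARMUP_TIME  300                 # warm-up interval; must stay at the default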
3.0 Reporting Results for the SPECweb96 Release 1.0 Benchmark
3.1 Metrics And Reference Format
The report of results for the SPECweb96 benchmark is generated in ASCII and
HTML format by the provided SPEC tools. These tools may not be changed,
except for portability reasons with prior SPEC approval. This section
describes the report generated by those tools. The tools perform error
checking and will flag many error conditions as resulting in an
"invalid run". However, these automatic checks are only there for
your convenience, and do not relieve you of your responsibility to check
your own results and follow the run and reporting rules.
While SPEC believes that a full performance curve best describes a
server's performance, the need for a single figure of merit is
recognized. The benchmark single figure of merit, SPECweb96, is the peak
throughput measured during the run (reported in operations per second). For
a result to be valid, the peak throughput must be within 5% of the
corresponding requested load. The results of a benchmark run, composed of
several load levels, are plotted on a performance curve on the results
reporting page. The data values for the points on the curve are also
enumerated in a table.
No data point within 25% of the maximum reported throughput may be
reported where the number of failed requests for any file class is greater
than 1% of total requests for that file class, plus one. No data point
within 25% of the maximum reported throughput may be reported whose
"Actual Mix Pcnt" versus "Target Mix Pcnt" differs by
more than 10% of the "Target Mix Pcnt" for any workload class.
E.g., if the target mix percent is 0.35 then valid actual mix percents are
0.35 +/- 0.035.
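These validity rules can be summarized in code. The following C sketch expresses the three numeric checks from this section; the function and parameter names are illustrative and are not taken from the SPEC tools, which perform these checks internally:

    #include <math.h>

    /* Peak throughput must be within 5% of its requested load. */
    int peak_ok(double achieved_ops, double requested_ops)
    {
        return fabs(achieved_ops - requested_ops) <= 0.05 * requested_ops;
    }

    /* Within 25% of the peak, failed requests for a file class may
       not exceed 1% of that class's total requests, plus one. */
    int errors_ok(long failed, long total)
    {
        return (double)failed <= 0.01 * (double)total + 1.0;
    }

    /* Within 25% of the peak, the actual mix percentage must be within
       10% of the target mix percentage for every workload class,
       e.g. a target of 0.35 allows 0.35 +/- 0.035. */
    int mix_ok(double actual_pct, double target_pct)
    {
        return fabs(actual_pct - target_pct) <= 0.10 * target_pct;
    }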
3.1.1 Table Format
The server performance graph is constructed from a table containing the data points from a single run of the benchmark. The table consists of two columns (an illustrative excerpt follows the list):
- Throughput in terms of operations per second rounded to the nearest whole number
- Average Server Response Time
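An excerpt from such a table might look like this (the values are purely illustrative):

    Throughput     Average Server
    (ops/sec)      Response Time (msec)
        100             4.2
        200             4.6
        300             5.1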
3.1.2 Graphical Format
Server performance is depicted in a plot with the following format:
- Average Server Response Time is plotted on the Y-axis.
- Throughput is plotted on the X-axis.
All data points of the plot must be enumerated in the table described in paragraph 3.1.1.
3.1.3 Detailed Results
The SPEC tools optionally allow verbose output to be selected, in which case additional data are reported in a table:
- Requested Load
- Throughput in terms of operations per second rounded to the nearest whole number
- File class (1, 2, 3, or 4)
- Target Mix percentage
- Actual Mix percentage. This is flagged as an error if the mix requirements of paragraph 3.1 are not met.
- Operation Success Count
- Operation Error Count. This is flagged as an error if the error rate requirements of paragraph 3.1 are not met.
- Average Server Response Time (in milliseconds, rounded to the nearest tenth)
- Standard Deviation of Server Response Time
- 95% confidence interval of Server Response Time
3.2 Server Configuration
The system configuration information that is required to duplicate published performance results must be reported. This list is not intended to be all-inclusive, nor is each feature in the list required to be described. The rule of thumb is: if it affects performance or the feature is required to duplicate the results, describe it. All components must be generally available within 6 months of the original publication of a performance result.
3.2.1 Server Hardware
The following server hardware components must be reported:
- Vendor's name
- System model number, type and clock rate of processor, number of processors, and main memory size.
- Size and organization of primary, secondary, and other caches, per processor. If a level of cache is shared among processors in a system, that should be stated in the "notes" section.
- Memory configuration if this is an end-user option which may affect performance, e.g. interleaving and access time.
- Other hardware, e.g. write caches, or other accelerators
- Number, type, model, and capacity of disk controllers and drives
- Type of file system
3.2.2 Server Software
The following server software components must be reported:
- HTTP (Web) Server software and version.
- Operating System and version.
- The values of MSL (maximum segment life) and TIME-WAIT. If TIME-WAIT is not equal to 2*MSL, that must be noted. (Reference section 4.2.2.13 of RFC 1122).
- Any other software packages used during the benchmarking process.
- Other clarifying information required to reproduce benchmark results (e.g., number of daemons, server buffer cache size, disk striping, non-default kernel parameters), as well as the logging mode, must be stated in the "notes" section.
3.3 Testbed Configuration
3.3.1 Network Configuration
A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:
- Number, type, and model of network controllers
- Number and type of networks used
- Base speed of network
A network configuration notes section may be used to list the following
additional information:
- Number, type, model, and relationship of external network components to support server (e.g., external routers).
- Relationship of load generators, load generator type, and networks (including routers, etc. if applicable).
3.3.2 Load Generators
The following load generator hardware components must be reported:
- Number of load generator (client) systems
- Processes or threads concurrently generating load on each load generator
- System model number, processor type and clock rate, number of processors
- Main memory size
- Network Controller
- Operating System and Version
- Compiler and version used to compile benchmark (client code)
- Any non-default TCP or HTTP parameters
3.4 General Availability Dates
The dates of general customer availability, given as month and year, must be listed for the major components: hardware, HTTP server, and operating system. All system, hardware, and software features are required to be available within 6 months of the date of test.
3.5 Test Sponsor
The reporting page must list the date the test was performed (month and year), the organization that performed the test and is reporting the results, and the SPEC license number of that organization.
3.6 Notes/Summary of Tuning Parameters
This section is used to document:
- System state: single or multi-user
- System tuning parameters other than default
- Process tuning parameters other than default
- Background load, if any
- ANY portability changes made to the individual benchmark source code, including the module name and line number of each change.
- Additional information such as compilation options may be listed
- Critical customer-identifiable firmware or option versions such as network and disk controllers
- Additional important information required to reproduce the results, which does not fit in the space allocated above, must be listed here.
- If the configuration is large and complex, added information should be supplied either by a separate drawing of the configuration or by a detailed written description which is adequate to describe the system to a person who did not originally configure it.
3.7 Other Required Information
The following additional information is also required to appear on the results reporting page for SPECweb96 Release 1.0 results:
- General Availability of the System Under Test.
- The date (month/year) that the benchmark was run
- The name and location of the organization that ran the benchmark
- The SPEC license number
The following additional information may be required to be provided for SPEC's results review:
- ASCII versions of the server log file in the Common Log Format, as defined in http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html#LogFormat.
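For reference, a single Common Log Format record has the form "host ident authuser [date] \"request\" status bytes". A hypothetical entry from a benchmark run (all values illustrative) might read:

    192.0.2.17 - - [01/Aug/1996:14:02:07 -0400] "GET /file_set/dir00042/class1_3 HTTP/1.0" 200 4096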
4.0 Building the SPECweb96 Release 1.0 Benchmark
SPEC provides client driver software, which includes tools for running the
benchmark and reporting its results. This software implements various
checks for conformance with these run and reporting rules. Therefore, the
SPEC software must be used, except that necessary substitution of equivalent
functionality (e.g. file set generation) may be done only with prior
approval from SPEC. Any such substitution must be reviewed and deemed
"performance-neutral" by the OSSC.
You may not change this software without prior approval from SPEC. SPEC
permits minimal performance-neutral portability changes, but only with
prior approval. All changes must be reviewed and deemed
"performance-neutral" by the OSSC. Source code changes required
for standards compliance must be reported to SPEC, citing appropriate
standards documents. SPEC will consider incorporating such changes in
future releases. Whenever possible, SPEC will strive to develop and enhance
the benchmark to be standards-compliant. A portability change will be
allowed if, without the change:
- the benchmark code will not compile,
- the benchmark code does not execute, or
- the benchmark code produces invalid results,
and the changed code implements the same workload for the server in a
performance-neutral manner.
Special libraries may be used in conjunction with the benchmark code as
long as they do not replace routines in the benchmark source code, and they
are not "benchmark-specific".
Driver software includes C code (ANSI C) and perl scripts (perl5). SPEC
will provide prebuilt versions of perl and the driver code, or these may be
recompiled from the provided source. SPEC requires the user to provide OS
and server software to support HTTP 1.0 as described in section 2.