SPECsip_Infrastructure2011 Run and Reporting Rules

Revision Date: July 20th, 2011

1 Introduction

This document specifies the guidelines for how SPECsip_Infrastructure2011 is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by the SPEC SIP Subcommittee and approved by the SPEC Open Systems Steering Committee. They ensure that results generated with this suite are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results). Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.

1.1 Philosophy

SPEC believes the user community will benefit from an objective series of tests, which can serve as a common reference and be considered as part of an evaluation process. SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the list below, SPEC wants to raise the awareness of implementers and end users to issues of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking. SPEC expects that any public use of results from this benchmark suite shall be for Systems Under Test (SUTs) and configurations that are appropriate for public consumption and comparison. To ensure that results are relevant to end users, SPEC expects that the hardware and software implementations used for running the SPEC benchmarks adhere to the following conventions:

1.2 Caveat

SPEC reserves the right to investigate any case where it appears that these guidelines and the associated benchmark run and reporting rules have not been followed for a published SPEC benchmark result. SPEC may request that the result be withdrawn from the public forum in which it appears and that the benchmarker correct any deficiency in product or process before submitting or publishing future results.

SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECsip_Infrastructure2011 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees if changes are made to the benchmark and will rename the metrics (e.g., from SPECsip_Infrastructure2011 to SPECsip_Infrastructure2011a).

Relevant standards are cited in these run rules as URL references and are current as of the date of publication. Changes or updates to these referenced documents or URLs may necessitate repairs to the links and/or amendment of the run rules. The most current run rules will be available at the SPEC web site at http://www.spec.org. SPEC will notify members and licensees whenever it makes changes to the suite.

To help assure that the principles of consistency and fairness are met, any organization or individual who makes public use of SPEC benchmark results must do so in accordance with the SPEC Fair Use Rule, as posted at http://www.spec.org/fairuse.html. Where it appears that these guidelines have not been adhered to, SPEC may investigate and request that the published material be corrected.

2 Run Rules for the SPECsip_Infrastructure2011 Benchmark

The production of compliant SPECsip_Infrastructure2011 test results requires that the tests be run in accordance with these run rules. These rules relate to the requirements for the System Under Test (SUT) and the testbed (i.e. SUT, clients, and network), including applicable protocols or other standards, operation, configuration, test staging, optimizations and measurement.

2.1 Protocols

Because the Session Initiation Protocol (SIP) is defined by its interoperable protocol specifications, SPECsip_Infrastructure2011 requires adherence to the relevant protocol standards; in particular, the SIP server is expected to be SIP 2.0 compliant. The benchmark environment shall be governed by the following standards: For further explanation of these protocols, the following might be helpful: The current text of all IETF RFCs may be obtained from http://ietf.org/rfc.html. Any standard that a software product is marketed as adhering to must have passed the relevant test suites used to ensure compliance with that standard.
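For reference, the following is an illustrative SIP 2.0 REGISTER request in the format defined by RFC 3261. It is not taken from the benchmark workload; the user, host names, tags, and identifiers are placeholders:

    REGISTER sip:example.com SIP/2.0
    Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK776asdhds
    Max-Forwards: 70
    From: Alice <sip:alice@example.com>;tag=456248
    To: Alice <sip:alice@example.com>
    Call-ID: 843817637684230@client.example.com
    CSeq: 1826 REGISTER
    Contact: <sip:alice@client.example.com:5060>
    Expires: 3600
    Content-Length: 0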

2.2 General Availability

The entire testbed (SUT, clients, and network) must be comprised of components that are generally available, or that shall be generally available within three months of the first publication of the results. Products are considered generally available if they are orderable by ordinary customers and ship within a reasonable time frame; this time frame is a function of product size, classification, and common practice. Some limited quantity of the product must have shipped on or before the close of the stated availability window. Shipped products do not have to match the tested configuration in terms of CPU count, memory size, or disk count or size, but the tested configuration must be available to ordinary customers. The availability of support and documentation for the products must coincide with the release of the products. Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years; the five-year limit is waived for hardware used in client systems. Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years. Community supported (open source) software products have more complex requirements; please see Section 3.5.7. Information must be provided in the disclosure to identify any component that is no longer orderable by ordinary customers.

2.3 Stable Storage

The SUT must utilize stable storage for application-specific data. Application area systems are expected to safely store any application object they have accepted until the application disposes of that object. To do this, application area systems must be able to recover the application objects without loss from multiple power failures (including cascading power failures), operating system failures, and hardware failures of components (e.g., CPU) other than the storage medium itself (e.g., disk, non-volatile RAM). At any point where the data can be cached after the server has accepted the message and acknowledged its receipt, there must be a mechanism to ensure that any cached message survives a server failure. If an uninterruptible power system (UPS) is required by the SUT to meet the stable storage requirement, the benchmarker is not required to perform the test with a UPS in place, but must state in the disclosure that a UPS is required. Supplying the model number of an appropriate UPS is encouraged but not required. If a battery-backed component is used to meet the stable storage requirement, that battery must have sufficient power to maintain the data for at least 48 hours, to allow any cached data to be committed to media and the system to be gracefully shut down. The system or component must also be able to detect a low battery condition and either prevent the use of the component or provide for a graceful system shutdown.
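As a minimal sketch of this commit-before-acknowledge requirement, the following Java fragment forces an accepted message to the storage medium before any acknowledgment is sent. The class and method names are hypothetical and are not part of the benchmark kit; real servers would typically batch and optimize such writes.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Hypothetical sketch: persist an accepted message durably before
    // acknowledging it, so a cached copy cannot be lost to a power or
    // operating system failure after the acknowledgment is on the wire.
    public final class DurableMessageStore {
        private final FileChannel channel;

        public DurableMessageStore(Path storeFile) throws IOException {
            this.channel = FileChannel.open(storeFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        }

        public void acceptAndAcknowledge(byte[] message) throws IOException {
            channel.write(ByteBuffer.wrap(message)); // may still be cached
            channel.force(true);   // flush data and metadata to stable storage
            sendAcknowledgment();  // acknowledge only after the force succeeds
        }

        private void sendAcknowledgment() {
            // Transport-specific; omitted in this sketch.
        }
    }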

2.4 Single Logical Server

The SUT must present to application area clients the appearance and behavior of a single logical server for each protocol. Specifically, the SUT must present a single system view, in that the results of any application area transaction from a client that changes state on the SUT must be visible to all other clients on any subsequent application area transaction. For example, transaction state created by an INVITE can be modified by a subsequent re-INVITE or CANCEL operation, even if that operation originates from a different client than the one that sent the original INVITE. For this reason, the benchmark requires the SUT to expose a single IP address. All components in a SUT must be described in the submission disclosure (Section 4 below).
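One way to read the single-system-view requirement is that transaction state must be locatable no matter which client's request reaches the SUT. The following Java fragment is a minimal, hypothetical sketch (the class and method names are not part of the benchmark kit): it keys state by the SIP Call-ID so that a CANCEL arriving from a different client still finds the transaction created by the original INVITE.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Hypothetical sketch of the single-system-view rule: state created
    // by an INVITE must be visible to a later CANCEL or re-INVITE, even
    // when that request arrives from a different client.
    public final class TransactionStore {
        private final ConcurrentMap<String, String> stateByCallId =
                new ConcurrentHashMap<>();

        public void onInvite(String callId) {
            stateByCallId.put(callId, "PROCEEDING");
        }

        public boolean onCancel(String callId) {
            // Must succeed regardless of which client sent the INVITE.
            return stateByCallId.replace(callId, "PROCEEDING", "CANCELLED");
        }
    }

A SUT built from multiple internal components would need an equivalent shared or replicated store behind its single IP address.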

2.5 Application Logging

For a run to be valid, the following attributes related to logging must hold true:

2.6 Initializing the SUT for running the Benchmark

To make an official SPECsip_Infrastructure2011 test run, the benchmarker must perform the following steps:

2.7 Running the Benchmark

For statistical reasons, SPECsip_Infrastructure2011 requires a minimum of 20,000 supported subscribers to ensure a valid run; any run supporting fewer than 20,000 subscribers is invalid (see the sketch below). The benchmark consists of several periods:
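A minimal sketch of the validity floor follows, using a hypothetical helper rather than the actual validation logic in the benchmark tools:

    // Hypothetical sketch of the validity floor: a run supporting fewer
    // than 20,000 subscribers is invalid by rule.
    public final class RunValidator {
        private static final int MIN_SUPPORTED_SUBSCRIBERS = 20_000;

        public static boolean isValidRun(int supportedSubscribers) {
            return supportedSubscribers >= MIN_SUPPORTED_SUBSCRIBERS;
        }
    }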

2.8 Optimization

Benchmark-specific optimization is not allowed. Any optimization of either the configuration or the software used on the SUT must improve performance for a larger class of workloads than that defined by this benchmark and must be supported and recommended by the provider. Optimizations that take advantage of the benchmark's specific features are forbidden. Examples of inappropriate optimization include, but are not limited to:

2.9 Measurement

The provided SPECsip_Infrastructure2011 tools (e.g., binaries, JAR files, Perl scripts) must be used unmodified to run and produce measured SPECsip_Infrastructure2011 results. The SPECsip_Infrastructure2011 metric is a function of the workload, the associated benchmark-specific working set, and the defined benchmark-specific criteria. SPECsip_Infrastructure2011 results are not comparable to any other application area performance metric.

2.10 Metric

SPECsip_Infrastructure2011 expresses performance in terms of simultaneous number of supported subscribers. The definition of this metric is provided in detail in the Design Document.

2.11 Workload

The SPECsip_Infrastructure2011 workload is described in detail in the Design Document.

2.12 Quality of Service Criteria

The SPECsip_Infrastructure2011 benchmark has specific Quality of Service (QoS) criteria for response times, delivery times, and error rates. These criteria are specified in the Design Document and are checked by the benchmark tools.
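The actual thresholds and checking logic are defined by the Design Document and enforced by the provided benchmark tools. The fragment below is only a hypothetical sketch of the shape of such a check, with placeholder limits:

    // Hypothetical sketch of a QoS check; the real limits and checking
    // logic come from the Design Document and the benchmark tools.
    public final class QosCheck {
        private final long maxResponseTimeMs; // placeholder limit
        private final double maxErrorRate;    // placeholder limit

        public QosCheck(long maxResponseTimeMs, double maxErrorRate) {
            this.maxResponseTimeMs = maxResponseTimeMs;
            this.maxErrorRate = maxErrorRate;
        }

        public boolean passes(long observedResponseTimeMs, double observedErrorRate) {
            return observedResponseTimeMs <= maxResponseTimeMs
                    && observedErrorRate <= maxErrorRate;
        }
    }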

2.13 Load Generators

The SPECsip_Infrastructure2011 benchmark requires the use of one or more client systems to act as load generators. One client system is designated the harness client; this is the system on which the command that initiates the benchmark run is executed. Clients must be instruction-set compatible. Please refer to the User Guide and Design Document for more detail on these roles. A server component of the SUT may not be used as a load generator when testing to produce valid SPECsip_Infrastructure2011 results. A server component may be used as the harness client, but this is not recommended. In order to run the benchmark tools, the client systems must meet all applicable requirements, such as software versions or other configuration requirements.

2.14 SPECsip_Infrastructure2011 Parameters

The SPECsip_Infrastructure2011 User's Guide provides detailed documentation on what parameters are available to the user for modification.

3 Reporting Rules

In order to publicly disclose SPECsip_Infrastructure2011 results, the benchmarker must adhere to these reporting rules in addition to having followed the run rules above. The goal of the reporting rules is to ensure the SUT and testbed are sufficiently documented such that someone could reproduce the test and its results.

3.0.1 Publication

SPEC requires that each licensee test location (city, state/province, and country) measure and submit a single compliant result for review, and have that result accepted, before publicly disclosing or representing as compliant any SPECsip_Infrastructure2011 result. Only after acceptance of a compliant result from that test location by the subcommittee may the licensee publicly disclose future SPECsip_Infrastructure2011 results produced at that location in compliance with these run and reporting rules, without acceptance by the SPEC SIP subcommittee. The intent of this requirement is that the licensee test location demonstrate the ability to produce a compliant result before publicly disclosing additional results without review by the subcommittee. SPEC encourages the submission of results for review by the relevant subcommittee and subsequent publication on SPEC's web site. Licensees who have met the requirements stated above may publish compliant results independently; however, any SPEC member may request a full disclosure report for such a result, and the test sponsor must comply within 10 business days. Issues raised concerning a result's compliance with the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.

3.1 Metrics And Results Reports

The benchmark's single figure of merit, SPECsip_Infrastructure2011 Number of Supported Subscribers, is the number of users supported by the system while satisfying the appropriate QoS requirements as described in the Design Document. A complete benchmark result comprises this metric together with the benchmark-specific descriptions shown on the results reporting page; a detailed breakdown of each test is included on that page. The report of results for the SPECsip_Infrastructure2011 benchmark is generated in HTML by the provided SPEC tools. These tools may not be changed, except for portability reasons and then only with prior consent from the SPEC SIP Subcommittee. The tools perform error checking and will flag some error conditions as resulting in an "invalid run". However, these automatic checks exist only for debugging convenience; they do not relieve the benchmarker of the responsibility to check the results and to follow the run and reporting rules. The section of the output.raw file that contains the actual test measurements must not be altered. Corrections to the SUT descriptions may be made as needed to produce a properly documented disclosure.

3.2 Fair Use of SPECsip_Infrastructure2011 Results

Consistency and fairness are guiding principles for SPEC. To help assure that these principles are met, any organization or individual who makes public use of SPEC benchmark results must do so in accordance with the SPEC Fair Use Rule, as posted at http://www.spec.org/fairuse.html

3.3 Research and Academic usage of SPECsip_Infrastructure2011

Please consult the SPEC Fair Use Rule on Research and Academic Usage at http://www.spec.org/fairuse.html#Academic

3.4 Categorization of Results

SPECsip_Infrastructure2011 results are categorized into single-node and multiple-node results, where the terms single node and multiple node are as defined in this section. Multiple-node results are further divided into two types, homogeneous and heterogeneous. For submissions involving homogeneous nodes, the subcommittee also requires a submission on a corresponding single-node platform (see details in the following paragraphs).

A Single Node Platform for SPECsip_Infrastructure2011 consists of one or more processors executing a single instance of an OS and one or more instances of the same SIP server software. Externally attached storage for software may be used; all other performance-critical operations must be performed within the single server node. A single common set of NICs must be used to relay all SIP traffic. Example:

A Homogeneous Multi-Node Platform for SPECsip_Infrastructure2011 consists of two or more electrically equivalent Single Node servers in a single chassis or connected through a shared bus. Each node contains the same number and type of processing units and devices, and each node executes a single instance of an OS and one or more instances of the same SIP server software. Storage may be duplicated or shared. All incoming requests from the test harness must be load balanced, either by a single node that receives all incoming requests and balances the load across the other nodes (A) or by a separate load balancing appliance that serves that function (B). Each node must contain a single common set of NICs that must be used across all three workloads to relay all SIP traffic. If a separate load balancing appliance is used, it must be included in the SUT's definition.

A Heterogeneous/Solution Platform for SPECsip_Infrastructure2011 consists of any combination of server nodes and appliances that have been networked together to provide all the performance-critical functions measured by the benchmark. All incoming requests from the test harness must be load balanced, either by a single node that receives all incoming requests and balances the load across the other nodes or by a separate load balancing appliance that serves that function. Electrical equivalence between server nodes is not required. Storage may be duplicated or shared. Additional appliances that provide performance-critical operations, such as intelligent switches, may be included. All nodes and appliances used must be included in the SUT's definition. Examples: C & D.

3.5 Testbed Configuration

The system configuration information required to duplicate published performance results must be reported. The list below is not intended to be all-inclusive, nor is every performance-neutral feature required to be described. The rule is: if a feature affects performance or is required to duplicate the results, describe it. Any deviations from the standard, default configuration for the SUT must be documented so that an independent party could reproduce the result without further assistance. For most of the following configuration details, there is an entry in the configuration file and a corresponding entry in the tool-generated HTML result page. If information needs to be included that does not fit into these entries, the Notes sections must be used.
3.5.1 SUT Hardware
The SUT hardware configuration must not be changed between workload runs. However, not all hardware used in one workload is required to be used in another. In the case where multiple controllers are used for one workload, the same controllers must remain electronically connected, and some subset of those controllers must be used, for the other workloads. The documentation of the hardware for a result in the Heterogeneous/Solution Platform category must also include a diagram of the configuration. The following SUT hardware components must be reported:
3.5.2 SUT Software
The following SUT software components must be reported:
3.5.3 Network Configuration
A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:
3.5.4 Client Workload Generators
The following load generator hardware components must be reported:
3.5.5 Configuration Diagram (if applicable)
A Configuration Diagram of the SUT must be provided in a common graphics format (e.g. .png, .jpeg, .gif). This will be included in the html formatted results page. An example would be a line drawing that provides a pictorial representation of the SUT including the network connections between clients, server nodes, switches and the storage hierarchy and any other complexities of the SUT that can best be described graphically.
3.5.6 General Availability Dates
The dates of general customer availability must be listed for the major components: hardware, server software, and operating system, by month and year. All system, hardware, and software features are required to be available within three months of the first publication of the results. When multiple components have different availability dates, the latest availability date must be listed. Hardware products that are still supported by their original or primary vendor may be used if their original general availability date was within the last five years; the five-year limit is waived for hardware used in client systems. Software products that are still supported by their original or primary vendor may be used if their original general availability date was within the last three years. In the disclosure, the benchmarker must identify any component that can no longer be ordered by ordinary customers. If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the performance of the released system to be more than 5% lower than that reported for the pre-release system, then the sponsor shall resubmit a corrected test result. For example, if a pre-release configuration reported 100,000 supported subscribers, a released configuration measuring below 95,000 would require resubmission.
3.5.7 Rules on Community Supported Applications
In addition to the requirements stated in the OSG Policy Document (http://www.spec.org/osg/policy.html), the following guidelines apply to a submission that relies on Community Supported Applications. SPECsip_Infrastructure2011 does permit Community Supported Applications outside of a commercial distribution or support contract, provided they meet the following guidelines. The following eight items govern the admissibility of any Community Supported Application executed on the SUT in the context of a benchmark run or implementation:
  1. Open Source operating systems or hypervisors would still require a commercial distribution and support. The following rules do not apply to Operating Systems used in the publication.
  2. Only a "stable" release can be used in the benchmark environment; "non-stable" releases (alpha, beta, or release candidates) cannot be used. A stable release must be unmodified source code or binaries as downloaded from the Community Supported site. A "stable" release is one that is clearly denoted as a stable release, or a release that is available and recommended for general use. It must be a release that is not on a development fork and is not designated as an alpha, beta, test, preliminary, pre-release, prototype, or release candidate, or by any other term indicating that it may not be suitable for general use. The 3-month General Availability window (outlined above) does not apply to Community Supported Applications, since volunteer resources make predictable future release dates unlikely.
  3. The initial "stable" release of the application must be a minimum of 12 months old. Reason: This helps ensure that the software has real application to the intended user base and is not a benchmark special that's put out with a benchmark result and only available for the first three months to meet SPEC's forward availability window.
  4. At least two additional stable releases (major, minor, or bug fix) must have been completed, announced and shipped beyond the initial stable release. Reason: This helps establish a track record for the project and shows that it is actively maintained.
  5. The application must use a standard open source license such as one of those listed at http://www.opensource.org/licenses/.
  6. The "stable" release used in the actual test run must be the current stable release at the time the test result is run, or the prior "stable" release if the superseding/current "stable" release will be less than 3 months old at the time the result is made public (a sketch of this rule follows the list).
  7. The "stable" release used in the actual test run must be no older than 18 months. If there has not been a "stable" release within 18 months, then the open source project may no longer be active and as such may no longer meet these requirements. An exception may be made for mature projects (see below).
  8. In rare cases, open source projects may reach maturity where the software requires little or no maintenance and there may no longer be active development. If it can be demonstrated that the software is still in general use and recommended either by commercial organizations or active open source projects or user forums and the source code for the software is less than 20,000 lines, then a request can be made to the subcommittee to grant this software mature status. This status may be reviewed semi-annually. An example of a mature project would be the FastCGI library.
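As a sketch of the release-selection rule in item 6, with hypothetical types and a simplified notion of release dates:

    import java.time.LocalDate;

    // Hypothetical sketch of rule 6: the release under test must be the
    // current stable release, or the prior stable release if the current
    // one will be less than 3 months old when the result is made public.
    public final class ReleaseRule {
        public static boolean isAllowed(boolean isCurrentStable,
                                        boolean isPriorStable,
                                        LocalDate currentStableReleaseDate,
                                        LocalDate publicationDate) {
            if (isCurrentStable) {
                return true;
            }
            boolean currentTooYoung =
                    currentStableReleaseDate.plusMonths(3).isAfter(publicationDate);
            return isPriorStable && currentTooYoung;
        }
    }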
3.5.8 Test Sponsor
The reporting page must list:
3.5.9 Disclosure Notes
In general, any changes or tuning applied to the system should be documented in order to support reproducibility. The Notes section is used to document:

3.6 Log File Review

The submitter is required to keep the entire log files from both the SUT and the clients for the duration of the review period. The following additional information may be required for SPEC's results review:

4 Submission Requirements for SPECsip_Infrastructure2011

Once you have a valid run and wish to submit it to SPEC for compliance review, you will need to provide the following: Once the submission is ready, please email the SPECsip_Infrastructure2011 submission to subsipinf2011@spec.org. Retain the following for possible request during the review: SPEC requires the submission of results for review by the SPEC SIP subcommittee and subsequent publication on SPEC's web site. Estimates are not allowed.

5 The SPECsip_Infrastructure2011 Benchmark Kit

SPEC provides client driver software, which includes tools for running the benchmark and reporting its results. The client driver is written in Java; precompiled class files are included with the kit, so no build step is necessary. Because this software implements various checks for conformance with these run and reporting rules, the SPEC software must be used. The kit also includes Java code for generating the user database, C code for SIPp, and Perl code for post-processing the results output.