SPECvirt_sc2010 Frequently Asked Questions

Version 1.01

1. What is SPECvirt_sc2010?

SPECvirt_sc2010 is a software benchmark product developed by the Standard Performance Evaluation Corporation (SPEC), a non-profit group of computer vendors, system integrators, universities, research organizations, publishers, and consultants. SPECvirt_sc2010 is the first-generation SPEC benchmark for evaluating the performance of datacenter servers used for virtualized server consolidation. SPECvirt has been implemented as a standardized end-to-end benchmark designed to stress all layers of a system handling a workload representative of server consolidation. Performance-critical components include the server hardware, the virtualization technology, the guest (VM) operating systems, and the guest application software stacks. The benchmark is intended to be run by hardware vendors, virtualization software vendors, application software vendors, datacenter managers, and academic researchers.

2. Why did SPEC develop SPECvirt_sc2010?

SPEC's goals for its first virtualization benchmark shaped its design. Rather than offering a single benchmark workload that attempts to approximate the breadth of consolidated, virtualized server characteristics found today, SPECvirt_sc2010 uses a three-workload design: a web server, a Java application server, and an IMAP mail server. These workloads are derived from SPECweb2005, SPECjAppServer2004, and SPECmail2008, respectively. The SPECvirt_sc2010 harness, running on the client side, controls the workloads and implements the power-measurement methodology and techniques of SPECpower_ssj2008.

3. What does SPECvirt_sc2010 measure?

The benchmark presents an overall workload that achieves the maximum performance of the platform when running one or more sets of Virtual Machines called “tiles.” Scaling the workload on the SUT consists of running an increasing number of tiles. Peak performance is the point at which the addition of another tile either fails the Quality of Service criteria or fails to improve the overall metric.
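As a rough illustration, the tile-scaling procedure above amounts to a simple peak search: keep adding tiles until the next one either fails QoS or stops improving the metric. This sketch is not code from the benchmark kit; `measure` is a hypothetical stand-in for a full benchmark run.

```python
# Hypothetical sketch of the peak search described above; not part of
# the SPECvirt_sc2010 kit.

def find_peak(measure):
    """measure(tiles) -> (score, qos_ok).
    Returns (best_tiles, best_score): the last tile count whose run both
    passed QoS and improved the overall metric."""
    best_tiles, best_score = 0, 0.0
    tiles = 1
    while True:
        score, qos_ok = measure(tiles)
        if not qos_ok or score <= best_score:
            return best_tiles, best_score
        best_tiles, best_score = tiles, score
        tiles += 1

# Toy model: each tile adds 100 points, and QoS fails beyond 8 tiles.
peak = find_peak(lambda t: (t * 100.0, t <= 8))
print(peak)  # (8, 800.0)
```

In the toy model the ninth tile violates QoS, so the search stops and reports eight tiles as the peak.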

The benchmarker has the option of running with power monitoring enabled and can submit results to the performance with SUT power category, the performance with server-only power category, or both.

4. What kinds of workloads are used in the SPECvirt_sc2010 benchmark?

The suite consists of several SPEC workloads that represent applications that industry surveys report to be common targets of virtualization and server consolidation. We modified each of these standard workloads to match a typical server consolidation scenario's resource requirements for CPU, memory, disk I/O, and network utilization for each workload. The SPEC workloads used are modified versions of SPECweb2005, SPECjAppServer2004, and SPECmail2008.

5. When and where will SPECvirt_sc2010 results be available?

Initial SPECvirt_sc2010 results are available on SPEC's web site. Subsequent results are posted on an ongoing basis following each two-week review cycle: results submitted by the two-week deadline are reviewed by SPECvirt committee members for conformance to the run rules, and if accepted at the end of that period are then publicly released. Results disclosures are at this URL: http://www.spec.org/virt_sc2010/results.

6. What are the limitations of SPECvirt_sc2010?

SPECvirt_sc2010 is a standardized benchmark, which means that it is an abstraction of the real world. For example, all the database servers can use the same database archive to restore their copies of the database. This helps reduce the complexity of setting up the test.

7. Can I use SPECvirt_sc2010 to determine the size of the server I need?

SPECvirt_sc2010 results are not intended for use in sizing or capacity planning.

8. What is a tile?

A tile is a logical grouping of one of each kind of VM used within SPECvirt. For SPECvirt_sc2010, a tile consists of one Web Server, Mail Server, Application Server, Database Server, Infrastructure Server, and Idle Server. A valid SPECvirt_sc2010 benchmark result is achieved by correctly executing the benchmark workloads on one or more tiles.

9. What is a fractional tile?

When the SUT does not have sufficient system resources to support the full load of an additional tile, the benchmark offers the use of a fractional load tile. A fractional tile consists of an entire tile with all six VMs but running at a reduced percentage of its full load.
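As a back-of-the-envelope illustration (not part of the benchmark kit), the total offered load with a fractional tile can be thought of in tile-equivalents; the function name and interface here are hypothetical.

```python
# Hypothetical illustration of fractional-tile load accounting; not code
# from the SPECvirt_sc2010 kit.

def total_load(full_tiles: int, fraction: float = 0.0) -> float:
    """Total offered load in tile-equivalents: N tiles at full load plus
    one fractional tile (still containing all six VMs) at a reduced
    percentage of its full load."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0.0, 1.0)")
    return full_tiles + fraction

# A SUT driving 5 full tiles plus a sixth tile at 40% load:
print(total_load(5, 0.4))  # roughly 5.4 tile-equivalents
```

The point of the sketch is that the fractional tile changes only the offered load, not the VM count: the sixth tile still runs all six VMs.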

10. How can I obtain the benchmark?

SPECvirt_sc2010 is available via web download from the SPEC site for $3000 for new licensees and $1500 for academic and eligible non-profit organizations. The order form is at: http://www.spec.org/order.html.

11. What is included with SPECvirt_sc2010?

The benchmark kit includes the code necessary to run the driver system(s), the server-side file set generation tools, and the dynamic content implementations. The choice of application stack is at the tester's discretion.

12. What hardware is required to run the benchmark?

See the Run Rules and the User's Guide for more detailed information and requirements.

13. What if I have a problem configuring or running the SPECvirt_sc2010 benchmark?

You can find more information on how to set up and run the benchmark in the User's Guide and the Client Harness User's Guide. If these documents do not resolve your issue, please send email to bring the issue to the attention of the SPEC Virtualization subcommittee.

14. How can I submit SPECvirt_sc2010 results?

Only SPECvirt_sc2010 licensees can submit results. SPEC member companies submit results free of charge. Non-members may submit results for an additional fee. All results are subject to a two-week review by SPECvirt subcommittee members. Non-member submissions are also subject to a preliminary review. If they pass preliminary review, they may be submitted for the standard member review, and barring any issues will be published by SPEC upon payment of a fee. First-time submitters should contact SPEC's administrative office.

SPECvirt_sc2010 submissions must include both the raw output file and configuration information required by the benchmark. During the review process, other information may be requested by the subcommittee. You can find submission requirements in the run rules.

15. Where are the SPECvirt_sc2010 run rules?

The current version of the run rules can be found at http://www.spec.org/virt_sc2010/docs/SPECvirt_RunRules.html.

Note the following clarification to SPECvirt_sc2010 Release 1.0 Run and Reporting Rules v1.02, Section 2.2 Workload VMs, subsection: Infrastructure VM which states:

"The Infrastructure VM has the same requirements as the Web Server VM in its role as a web back-end (BeSim) for the web workload."

However, the current implementation allows HTTP 1.0 requests to be sent from the web server to the httpd on the infrastructure server. The PHP code allows non-persistent connections to BeSim back-end implementations that would otherwise not handle a persistent connection (for example, FastCGI on pre-fork Apache httpd). This behavior is controlled by the BESIM_PERSISTENT flag in SPECweb/Test.conf, which defaults to 0 (use HTTP 1.0 non-persistent connections to BeSim).
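For example, the default in SPECweb/Test.conf selects non-persistent connections. The flag name and default value are as described above; the surrounding file syntax may differ between kit versions.

```
# 0 = use HTTP 1.0 non-persistent connections to BeSim (default)
# 1 = use persistent connections to the BeSim back end
BESIM_PERSISTENT = 0
```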

16. Where can I go for more information?

The SPECvirt_sc2010 Design Document contains design information on the benchmark and workloads. The Run and Reporting Rules, the User's Guide, and the Client Harness User's Guide contain instructions for installing and running the benchmark. See: http://www.spec.org/osg/virtualization for the available information on SPECvirt_sc2010.

17. What control mechanism is used to drive the workloads?

SPEC developed a test harness driver to coordinate running the component workloads in one or more tiles on the SUT. The harness lets you run and monitor the benchmark; it collects measurement data as the test runs, post-processes the data at the end of the run, validates the results, and generates the test report.

18. What is the performance metric for SPECvirt_sc2010?

The benchmark supports three categories of results, each with its own primary metric. Results may be compared only within a given category; however, the benchmarker has the option of submitting results from a given test to one or more categories. The first category is Performance-Only, and its metric is SPECvirt_sc2010, which is expressed as "SPECvirt_sc2010 @ <6*Number_of_Tiles> VMs" on the reporting page. SPECvirt_sc2010_PPW (performance with SUT power) and SPECvirt_sc2010_ServerPPW (performance with server-only power) are performance-per-watt metrics obtained by dividing the peak performance by the peak power of the SUT or server, respectively, during the run measurement phase.
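As a hedged sketch (not taken from the benchmark kit), the relationship between the reported label and the per-watt metrics can be written out as simple arithmetic; the function names, label layout, and numeric inputs below are invented for illustration.

```python
# Hypothetical illustration of the SPECvirt_sc2010 metrics; not code
# from the benchmark kit.

VMS_PER_TILE = 6  # each tile contains six VMs

def metric_label(score: float, tiles: int) -> str:
    """Performance-only metric with the VM count shown on the
    reporting page (6 * number of tiles)."""
    return f"SPECvirt_sc2010 {score:g} @ {VMS_PER_TILE * tiles} VMs"

def perf_per_watt(score: float, peak_watts: float) -> float:
    """PPW metrics divide peak performance by peak power (of the SUT
    or of the server only) during the run measurement phase."""
    return score / peak_watts

print(metric_label(1500.0, 10))      # SPECvirt_sc2010 1500 @ 60 VMs
print(perf_per_watt(1500.0, 600.0))  # 2.5
```

A ten-tile result therefore reports 60 VMs, and the same run measured at 600 W of SUT power would yield a PPW of 2.5.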

19. Does the benchmark support multiple servers?

No. Currently the benchmark is designed for a single host system.

20. Can I report results for open source software?

Yes, you can use open source products when running the benchmark as long as you comply with open source requirements specified in the Run Rules.

21. Are the results independently audited?

The SPEC Virtualization subcommittee reviews all results but does not require that they be independently audited.

22. Can I announce my results before they are reviewed by the SPEC subcommittee?

No. SPEC must review and accept the result before it can be announced publicly.

23. Are results sensitive to components outside of the SUT -- e.g. client driver machines?

Yes, the client driver machines must be configured properly to accommodate the workloads. You also have the option to drive multiple tiles with one client if the client is well-configured. See the User's Guide for more information regarding hardware and software requirements for the clients.

24. Does SPECvirt_sc2010 have a power measurement component associated with the benchmark?

SPECvirt_sc2010 implements the SPECpower methodology for power measurement. The benchmarker has the option of running with power monitoring enabled and can submit results to any of three categories:

* performance only (SPECvirt_sc2010)
* performance/power for the SUT (SPECvirt_sc2010_PPW)
* performance/power for the server only (SPECvirt_sc2010_ServerPPW)

You can find more information on power measurement in the Client Harness User's Guide and Run Rules.

25. Can I compare the results of SPECvirt_sc2010 workloads to the results of the SPEC benchmarks from which they were derived? For example, can I compare a SPECweb2005 result to the result of the SPECvirt_sc2010 webserver component?

No. Several substantive changes have been made that make the SPECvirt_sc2010 workloads unique, so their results cannot be compared with results from the benchmarks they are derived from.

26. Can I compare SPECvirt_sc2010 results in different categories?

No. Results between the different SPECvirt_sc2010 categories cannot be compared.

27. Can I compare SPECvirt_sc2010 with other virtualization benchmarks?

No. SPECvirt_sc2010 is unique and not comparable to other benchmarks.

28. What is a "compliant" result of SPECvirt_sc2010?

A compliant benchmark result meets all the requirements of the SPECvirt_sc2010 run rules for a valid result. In addition to the run and reporting rules, several validation and tolerance checks are built-in to the benchmark. If you intend to publicly use the SPECvirt_sc2010 metrics, the result must be compliant and accepted by SPEC.

29. Can I run other workload levels?

Yes, for non-compliant runs only. You may set the load level for each or all workloads to be heavier or lighter as your needs dictate. You can set these load levels by changing parameters in the Control.config file and possibly each workload’s configuration file.

30. How long does it take to run the benchmark?

The run time is approximately three hours with default settings.

31. How can I use the benchmark to research performance related to a specific component of the benchmark such as the memory, storage, hypervisor, or the application server VM?

SPECvirt has been implemented as a standardized end-to-end benchmark designed to stress all layers of a system that handles a workload representative of server consolidation. Performance critical components include the server hardware (Processors, Memory, Network, Storage, etc.), the virtualization technology (hardware virtualization, operating system virtualization, and hardware partitioning), the guest (VM) operating systems, and the guest application software stacks. Selection and tuning of any of these components can have significant effects on the overall performance of the system under test.

The best way to differentiate the performance characteristics of different versions or products for a specific element of a system is to hold all other elements constant and change only the component you are interested in. For example, if you want to see the effects of RAID 5 vs. RAID 10, keep the other elements of the server, the virtualization products, and the guest VMs the same; install copies of the VMs on the RAID 5 storage and the RAID 10 storage while keeping other storage elements, such as the number of LUNs, the same; and run your tests. Similarly, if you want to compare versions of hypervisors, you need to keep the rest of the platform constant. If you change other elements, such as the software running on the VMs, you can significantly affect the overall results.

32. What types of virtualization platforms are supported by SPECvirt_sc2010?

SPECvirt_sc2010 supports hardware virtualization, operating system virtualization, and hardware partitioning. The benchmark does not address multiple host performance or application virtualization.