SPECvirt® Datacenter 2021 FAQ

Benchmark overview

1. What is the SPECvirt® Datacenter 2021 V1.0 benchmark?

The SPECvirt Datacenter 2021 benchmark is a software benchmark product developed by the Standard Performance Evaluation Corporation (SPEC), a non-profit group of computer vendors, system integrators, universities, research organizations, application software stacks. The benchmark is intended to be run by hardware vendors, virtualization software vendors, application software vendors, datacenter managers, and academic researchers.

2. How does the benchmark compare to the SPEC VIRT_SC® 2013 V1.1 benchmark?

SPEC VIRT_SC 2013 is a single-host benchmark and provides host-level information and performance. It utilizes several SPEC workloads representing applications that are common targets of virtualization and server consolidation on a single host. However, most of today’s datacenters use larger, more resource-intensive workloads and require clusters for reliability, availability, serviceability, and security. The SPECvirt Datacenter 2021 benchmark measures the efficiency of virtualization in a clustered solution for enhanced server optimization, flexibility, and application availability while reducing costs through server and datacenter consolidation.

3. What does the benchmark measure?

The benchmark presents an overall workload that achieves the maximum performance of the platform when running a set of application workloads against one or more sets of Virtual Machines called tiles. Scaling the workload on the SUT (System Under Test) consists of running an increasing number of tiles. Peak performance is the point at which the addition of another tile either fails the Quality of Service criteria or fails to improve the overall metric.
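The stopping rule described above can be sketched in a few lines of Python. This is only an illustration: `run_with_tiles` is a hypothetical callback standing in for one complete measurement at a given tile count, and `max_tiles` is an arbitrary safety limit. The benchmark kit does not expose such a function, and fractional tiles are ignored here.

```python
from typing import Callable, Tuple


def find_peak_tiles(
    run_with_tiles: Callable[[int], Tuple[bool, float]],
    max_tiles: int = 64,
) -> Tuple[int, float]:
    """Add tiles until QoS fails or the overall metric stops improving.

    run_with_tiles(n) is a hypothetical callback that returns
    (qos_passed, overall_metric) for a measurement with n tiles.
    """
    best_tiles, best_score = 0, 0.0
    for n in range(1, max_tiles + 1):
        qos_passed, score = run_with_tiles(n)
        # Peak is reached when the extra tile fails QoS or adds nothing.
        if not qos_passed or score <= best_score:
            break
        best_tiles, best_score = n, score
    return best_tiles, best_score


def fake_measurement(n: int) -> Tuple[bool, float]:
    # Made-up numbers for illustration only: QoS holds through six tiles,
    # but the score stops improving after five.
    scores = {1: 1.0, 2: 2.0, 3: 2.9, 4: 3.7, 5: 4.2, 6: 4.2, 7: 4.0}
    return n <= 6, scores.get(n, 0.0)


peak = find_peak_tiles(fake_measurement)
print(peak)
```

With the made-up data, the search stops at six tiles because the score no longer improves, so the peak is reported at five tiles.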

The primary goal of the benchmark is to provide a standard method for measuring a virtualization platform’s ability to model a dynamic datacenter environment. It models typical, modern-day usage of virtualized infrastructure, such as virtual machine (VM) resource provisioning, cross-node load balancing including management operations such as VM migrations, and VM power on/off. Its multi-host environment exercises datacenter operations under load. It dynamically provisions new workload tiles by either using a VM template or powering on existing VMs. As load reaches the maximum capacity of the cluster, hosts and additional load are added to the cluster to measure scheduler efficiency.

4. What is the performance metric for the benchmark?

The benchmark score for each tile is a supermetric that is the weighted geometric mean of the normalized throughput score for each workload. The overall metric is the sum of all the tiles’ scores divided by the number of hosts in the SUT. The metric is output in the format:

SPECvirt® Datacenter-2021 <Score per Host> per host @ <# hosts> hosts.
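As a rough illustration of the arithmetic, the following Python sketch computes a per-tile weighted geometric mean and the per-host metric. The workload names, the equal weights, and the three-decimal formatting are assumptions made for the example; the actual weights and normalization are defined by the benchmark, not by this FAQ.

```python
import math
from typing import Dict, List


def tile_score(normalized: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted geometric mean of normalized throughput scores for one tile."""
    total_weight = sum(weights[name] for name in normalized)
    log_sum = sum(weights[name] * math.log(value) for name, value in normalized.items())
    return math.exp(log_sum / total_weight)


def overall_metric(tile_scores: List[float], num_hosts: int) -> float:
    """Sum of all tile scores divided by the number of hosts in the SUT."""
    return sum(tile_scores) / num_hosts


def format_metric(score_per_host: float, num_hosts: int) -> str:
    # The number of decimal places is an assumption for this sketch.
    return f"SPECvirt Datacenter-2021 {score_per_host:.3f} per host @ {num_hosts} hosts"


# Hypothetical equal weights and made-up normalized throughput values.
weights = {"mail": 1.0, "web": 1.0, "collab": 1.0, "hammerdb": 1.0, "bigbench": 1.0}
normalized = {"mail": 2.0, "web": 0.5, "collab": 1.0, "hammerdb": 1.0, "bigbench": 1.0}

one_tile = tile_score(normalized, weights)
metric = overall_metric([1.0] * 8, num_hosts=4)
print(format_metric(metric, 4))
```

A geometric mean rewards balanced performance: in the made-up data, doubling one workload's throughput is exactly cancelled by halving another's.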

5. Are the management operations included in the metric?

Not directly. The efficiency of management operations such as powering on a VM and deploying a VM affects the overall score, but these operations are not directly included in the metric. Also, VM migrations are reported but are not part of the metric.

6. Does the benchmark support a single server?

Not for a compliant run. The benchmark requires a SUT cluster of hosts in multiples of four (four, eight, and so on). However, for debugging and research purposes (“research mode”), you can configure the benchmark to use fewer hosts, fewer workloads, varied run times, and so on, resulting in a non-compliant measurement.

7. How long does it take to run the benchmark?

The benchmark involves an initialization prior to the Measurement Interval (MI), the MI itself, and the post-MI data collection. The MI consists of three separate timed phases occurring sequentially over the course of three hours.

The total test time depends on the number of tiles and user pre-run scripts. For example, expect a ten-tile measurement to take 20 minutes for initialization, three hours for the MI, and 10 minutes for data collection (3.5 hours total).

Note that the benchmark allows the user to invoke pre-run and post-run scripts for customized data collection which may add time to the entire benchmark measurement.
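The ten-tile estimate above is a simple sum, which the following Python sketch reproduces. The function name and its parameters are hypothetical. Only the three-hour MI, the 20-minute initialization, and the 10-minute data collection come from the example above, and initialization time for other tile counts is not modeled.

```python
from datetime import timedelta


def total_run_time(
    initialization: timedelta,
    data_collection: timedelta,
    pre_run_scripts: timedelta = timedelta(0),
    post_run_scripts: timedelta = timedelta(0),
    measurement_interval: timedelta = timedelta(hours=3),
) -> timedelta:
    """Add up the phases of one measurement, including optional user scripts."""
    return (
        pre_run_scripts
        + initialization
        + measurement_interval
        + data_collection
        + post_run_scripts
    )


ten_tile_total = total_run_time(
    initialization=timedelta(minutes=20),
    data_collection=timedelta(minutes=10),
)
print(ten_tile_total.total_seconds() / 3600, "hours")
```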

8. What types of virtualization platforms does the benchmark support?

The benchmark supports Red Hat Virtualization (RHV) V4.3+ and VMware vSphere ESXi V6.7+. The SUT’s components must be generally available, documented, and supported by the vendor(s) or provider(s). We encourage users to create and measure additional virtualization platforms and their various SDKs. See Appendix A of the SPECvirt Datacenter 2021 Run and Reporting Rules for information on creating these.

Workloads

9. What is a tile?

A tile is a single unit of work composed of five workloads that are driven across 12 distinct virtual machines. The load on the SUT is scaled up by configuring additional sets of the 12 VMs and increasing the tile count for the benchmark. The five workloads are:

  • Three departmental workloads which simulate the stresses of:
    • a mail server

    • a web server environment

    • a pair of collaboration servers interacting with each other

  • A database server workload based on a modified version of HammerDB that exercises an OLTP environment

  • A big data workload based on a modified version of BigBench that utilizes an Apache/Hadoop environment to execute complex database queries

10. What is a fractional tile?

If a final full tile cannot be run on the SUT, we allow running additional workloads – another mail, then another web, then another collaboration server pair – until SUT saturation. If there are still available resources, then the benchmarker can run an additional HammerDB workload. These additional workloads are called a fractional tile, and each additional workload increases the tile fraction by 0.2 tiles.
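The fractional-tile arithmetic can be sketched as follows. The workload labels and function names are illustrative only. The order of the additional workloads and the 0.2 increment come from the description above.

```python
from typing import List

# Order in which additional workloads may be added after the last full tile.
FRACTIONAL_ORDER = ["mail", "web", "collaboration", "hammerdb"]


def fractional_workloads(extra_workloads: int) -> List[str]:
    """Return the workloads that make up the fractional tile, in order."""
    if not 0 <= extra_workloads <= len(FRACTIONAL_ORDER):
        raise ValueError("a fractional tile has between 0 and 4 extra workloads")
    return FRACTIONAL_ORDER[:extra_workloads]


def tile_count(full_tiles: int, extra_workloads: int) -> float:
    """Full tiles plus 0.2 tiles for each additional workload."""
    fractional_workloads(extra_workloads)  # validates the range
    # Round to one decimal place to avoid floating-point drift.
    return round(full_tiles + 0.2 * extra_workloads, 1)


print(tile_count(9, 3), fractional_workloads(3))
```

For example, nine full tiles plus an extra mail, web, and collaboration workload gives a tile count of 9.6.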

11. Can I run the benchmark in the cloud?

Yes, as long as the configuration meets the rules and requirements in SPECvirt Datacenter 2021 Run and Reporting Rules.

12. What are the limitations of the benchmark?

The benchmark is a standardized benchmark, which means that it is an abstraction of the real world. The benchmark clones the provided template to create the workload VMs. No tuning of the guest is allowed.

13. Can I use the benchmark to determine the size of the server I need?

No. This benchmark is not intended for use in sizing or capacity planning.

14. Can I run other workload levels?

Not for compliant results. However, you can adjust the level of each workload in research mode, which results in a non-compliant measurement.

Benchmark kit

15. How can I obtain the benchmark?

The benchmark is priced at $2500 for first-time licensees and is available via web download from the SPEC site at http://www.spec.org/order.html.

16. What is included with the benchmark?

A pre-configured template image as part of the benchmark kit must be used for all client and SUT workload VMs. No modifications can be made to the contents of the VM template. Tuning of the SUT hardware and the hypervisor is allowed.

17. What hardware is required to run the benchmark?

The SUT cluster must contain identically configured hosts in multiples of four (four, eight, 12, and so on) and use shared storage for the workload VMs. We recommend you use at least a 10 GbE network. You can find the detailed VM system resource requirements in Section 2 of the SPECvirt Datacenter 2021 User Guide.
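A quick sanity check of a planned cluster might look like the following Python sketch. It covers only the two conditions stated above (host count in multiples of four and identical host configuration). The dictionary keys describing a host are hypothetical, and passing this check does not by itself make a configuration compliant with the run and reporting rules.

```python
from typing import Dict, List


def is_valid_host_count(num_hosts: int) -> bool:
    """True when the host count is a positive multiple of four."""
    return num_hosts >= 4 and num_hosts % 4 == 0


def hosts_identical(hosts: List[Dict[str, str]]) -> bool:
    """True when every host description matches the first one."""
    return all(host == hosts[0] for host in hosts)


# Hypothetical host descriptions for illustration only.
cluster = [{"cpu": "2 x 32-core", "memory": "512 GB", "nic": "10 GbE"} for _ in range(4)]

cluster_ok = is_valid_host_count(len(cluster)) and hosts_identical(cluster)
print(cluster_ok)
```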

18. What if I have a problem configuring or running the benchmark?

You can post an entry in the benchmark support forum at https://www.spec.org/forums/index.php?board=4.0. You can also send email to the virt_datacenter2021@spec.org support alias.

19. Where are the run and reporting rules?

You can find the run and reporting rules at SPECvirt Datacenter 2021 Run and Reporting Rules.

20. What control mechanism is used to drive the workloads?

SPEC developed a measurement harness driver to coordinate running the component workloads in one or more tiles on the SUT. The harness is based on CloudPerf. It runs and monitors the benchmark, collects measurement data as the measurement runs, post-processes the data at the end of the run, validates the results, and generates the full disclosure report (FDR).

21. What is an SDK?

A software development kit (SDK) is a collection of software development tools in one installable package. Hypervisor SDKs communicate with a group of APIs that allow you to implement service features programmatically, such as using a script to deploy a template and power VMs on or off.

22. What is a toolset?

A toolset contains the scripts required to support hypervisor functions. The two default toolsets are included on the template VM:

  • Python for RHV

  • Perl for vSphere

You can create a toolset to support additional hypervisors and their various SDKs. See Appendix A of the SPECvirt Datacenter 2021 Run and Reporting Rules for rules and requirements.

23. What skills do I need to run the benchmark?

You need to have familiarity with virtualization concepts and implementations. You need deployment and management expertise with your selected hypervisor (Red Hat® Virtualization (RHV) and/or VMware vSphere®). You need to know how to allocate system resources to clusters and VMs including setting up and adding physical and virtual networks as well as setting up shared storage pools / datastores using the hypervisor’s management server. See Section 2 of the SPECvirt Datacenter 2021 User Guide for details.

Results submission, review, and result announcement

24. What is a compliant result of the benchmark?

A compliant benchmark result meets all the requirements of the run rules for a valid result. In addition to the run and reporting rules, several validation and tolerance checks are built into the benchmark. If you intend to use these metrics publicly, the result must be compliant, and SPEC must review and accept it.

25. When and where are the benchmark results available?

Submissions are posted to https://www.spec.org/virt_datacenter2021/results/.

26. Can I report results for an open source hypervisor?

Only if the virtualization solution is supported according to the SPECvirt Datacenter 2021 Run and Reporting Rules.

27. Are the results independently audited?

No, but SPEC must review and accept all publicly disclosed results.

28. How can I submit benchmark results?

Only SPECvirt Datacenter 2021 license holders can submit results. SPEC member companies can submit results free of charge, and non-members may submit results for an additional fee. All results are subject to a two-week review by SPEC virtualization committee members. First-time submitters should contact SPEC’s administrative office for guidance if needed.

Submissions must include both the raw output file and configuration information required by the benchmark. During the review process, other information may be requested by the subcommittee. You can find submission requirements in the SPECvirt Datacenter 2021 Run and Reporting Rules.

29. Can I announce my results before they are reviewed by the SPEC subcommittee?

No. SPEC must review and accept the result before it can be announced publicly.

30. Are results sensitive to components outside of the SUT – e.g. client driver machines?

Yes, the client driver machines must be configured properly to accommodate the workloads. You may use one or more physical systems for client load drivers, and clients are virtualized. See Section 2 of the SPECvirt Datacenter 2021 User Guide for more information regarding hardware and software requirements for the clients.

31. Does the benchmark have a power measurement component?

Not at this time.

Relationship to other benchmarks

32. Can I compare the results of the SPECvirt Datacenter 2021 workloads to the results of the benchmarks from which they were derived? For example, can I compare an open source HammerDB result to the result of the SPECvirt Datacenter 2021 HammerDB component?

No. Several substantive changes have been made that make the workloads unique.

33. Can I compare the benchmark with other virtualization benchmarks?

No. The benchmark is unique and not comparable to other benchmarks including SPEC VIRT_SC.