The SPEC Cloud® IaaS 2016 Benchmark is an industry-standard benchmark for measuring the performance of infrastructure-as-a-service (IaaS) cloud implementations. It comprises a benchmark testing tool, workloads, run rules, configuration requirements, test procedures, data collection, validation, metric definitions, reporting requirements, and peer review, all determined by multi-vendor consensus to give representative, comparable, vendor-neutral, accurate, and reproducible results. It is designed to stress and measure both the provisioning and runtime aspects of an IaaS cloud.
The benchmark suite is targeted for use by cloud providers, cloud consumers, hardware vendors, virtualization software vendors, application software vendors, and academic researchers.
The primary goal is to provide metrics that not only quantify the relative performance and capacity of a cloud, but also characterize how typical cloud application workloads behave as the underlying cloud resources are stretched and approach full capacity. The primary goals and objectives of the benchmark are:
It does not cover Platform-as-a-Service (PaaS) or Software-as-a-Service (SaaS) performance measurements.
The following terms are commonly used in this FAQ and are defined on the glossary page:
The SUT includes one or more cloud services under test. This includes all hardware, network, base software and management systems used for the cloud service. It does not include any client(s) or driver(s) necessary to generate the cloud workload, nor the network connections between the driver(s) and SUT. The actual set of SUT constituent pieces differs based on the relationship between the SUT and the tester.
SPEC Cloud IaaS 2016 is designed to measure the performance of an IaaS cloud running real workloads.
Yes.
No, not at this time.
The benchmark measures scalability, elasticity, and provisioning time. Of these, elasticity and provisioning time are directly comparable across clouds.
This benchmark has been tested on the following cloud platforms:
In theory, a cloud is infinitely scalable, and the benchmark is designed so that it can scale along with the cloud: if no issues occur as the scale grows, the benchmark can be made to run indefinitely.
However, in reality, a cloud is partitioned into data centers and zones, so a cloud provider, public or private, does have an upper limit on hardware and network resources.
If a third party runs SPEC Cloud IaaS 2016 on a public cloud, they are limited by the money they can spend on the cloud, which (typically) limits the number of instances that can be created. In SPEC Cloud IaaS 2016, the maximum number of application instances to create can be specified, which helps a tester manage the money spent while benchmarking the cloud.
If a public cloud provider tests its own cloud, it is obviously not limited by how much it has to spend.
SPEC Cloud IaaS 2016 stresses the control and data planes of a cloud by creating multiple instances (a minimum of 26 instances across 4 application instances). The benchmark cannot be used to measure the performance of a single instance.
However, when hundreds or thousands of instances exist in your cloud and are running customer workloads, will the performance of a single instance remain the same?
A cloud, by definition, is elastic and scalable (across CPU, memory, disk, network, control plane, data plane, etc.). The SPEC Cloud benchmark is designed to measure the scalability and elasticity of an IaaS cloud by running workloads that span multiple instances (application instances, or AIs) and that stress the CPU, memory, disk, and network of both the instances and the cloud.
The benchmark reports three primary metrics, namely Scalability, Elasticity, and Mean Instance Provisioning Time. The primary metrics are described below.
Scalability measures the total amount of work performed by application instances running in a cloud relative to a reference platform.
The total work performed by the benchmark is an aggregate of key workload metrics across all application instances running in a cloud, normalized by the corresponding workload metrics of a reference cloud platform. The reference platform metrics are an average of workload metrics measured across multiple cloud platforms during benchmark development. The application instances are launched according to a probability distribution and gradually increase the load on the cloud. Each application instance runs either the YCSB/Cassandra or the KMeans/Hadoop workload.
Elasticity measures whether the work performed by application instances scales linearly in a cloud. That is, for statistically similar work, the performance of N application instances in a cloud should match the performance of an application instance during the baseline phase, when no other load is introduced by the tester. Elasticity is expressed as a percentage (out of 100); the higher, the better. Elasticity is self-referential, i.e., it is measured against the cloud's own baseline rather than against a reference platform.
Mean Instance Provisioning Time measures the average time from the initial provisioning request for an instance until that instance is ready to accept SSH connections.
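To make the relationship between the three metrics concrete, the following is a minimal, illustrative Python sketch. It is not the normative computation: the exact formulas, the workload metrics they aggregate, and the run-validation rules are defined in the Run and Reporting Rules and Design documents. The per-AI performance values, the reference value, and the baseline value used below are assumptions chosen purely for illustration.

    # Illustrative sketch only -- NOT the exact SPEC Cloud IaaS 2016 formulas.
    from statistics import mean

    def scalability(ai_performance, reference_performance):
        """Aggregate work across all AIs, normalized by a reference platform value."""
        return sum(p / reference_performance for p in ai_performance)

    def elasticity(ai_performance, baseline_performance):
        """How close per-AI performance stays to this cloud's own baseline (percentage)."""
        ratios = [min(p / baseline_performance, 1.0) for p in ai_performance]
        return 100.0 * mean(ratios)

    def mean_provisioning_time(request_times, ssh_ready_times):
        """Average time from provisioning request until SSH is accepted."""
        return mean(ready - req for req, ready in zip(request_times, ssh_ready_times))

    # Example: 5 AIs whose throughput degrades slightly as the cloud fills up.
    perf = [105.0, 102.0, 98.0, 95.0, 90.0]
    print(scalability(perf, reference_performance=10.0))        # 49.0 "units of work"
    print(elasticity(perf, baseline_performance=105.0))         # ~93.3%
    print(mean_provisioning_time([0, 10, 20], [80, 95, 115]))   # ~86.7 seconds

Note how, in this sketch, elasticity is measured against the cloud's own baseline, whereas scalability is normalized against a reference platform value.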
a) Vendor A had a score of 10.5 @ 5 AIs while vendor B had a score of 9 @ 6 AIs, and both had the same elasticity of 90%. Both had a mean instance provisioning time of 100s.
Clearly, vendor A achieved a higher scalability score and can publicize it as such.
b) Vendor A had a score of 10.5 @ 5 AIs while vendor B had a score of 9 @ 6 AIs. Vendor A's elasticity was 85% while vendor B's elasticity was 90%. Both had a mean instance provisioning time of 100s.
Vendor B was able to do more work (6 AIs vs. 5 AIs) while maintaining better elasticity (more consistent performance). So while vendor A may publicize a higher scalability score, vendor B can claim more consistent performance due to its higher elasticity score.
c) Vendor A had a score of 10.5 @ 5 AIs, vendor B had a score of 9 @ 6 AIs, and vendor C had a score of 6 @ 4 AIs. Vendor A's elasticity was 85%, vendor B's was 90%, and vendor C's was 80%. Vendors A and B had a mean instance provisioning time of 100s, while vendor C had a mean instance provisioning time of 50s.
Vendor C had lower scalability and elasticity scores than vendors A and B, but its mean instance provisioning time was better than both. A customer to whom provisioning speed is important, but who does not care about running large workloads, may be inclined to dig deeper into vendor C's offering.
d) Vendor A has a scalability score of 10 @ 5 AIs and vendor B has a score of 30 @ 20 AIs.
Vendor A can say that its application instances do more work per instance than vendor B's (10/5 = 2 vs. 30/20 = 1.5), although vendor B achieves a higher overall scalability score. On the other hand, vendor B can say it achieves the larger scale. A customer to whom high scale is important may be interested in knowing more about vendor B's offering.
Either vendor A or B can be a whitebox or a blackbox cloud.
SPEC has identified multiple workload classifications already used in current cloud computing services. From this list, SPEC selected I/O- and CPU-intensive workloads for the initial benchmark. Within the wide range of I/O- and CPU-intensive workloads, SPEC selected a social-media NoSQL database transaction workload (YCSB/Cassandra) and a K-Means clustering workload using Hadoop (Hibench/KMeans).
Each workload runs across multiple instances, collectively referred to as an application instance. The benchmark instantiates a single application instance during the baseline phase and creates multiple application instances during the elasticity + scalability phase according to a uniform probability distribution. These application instances and the load they generate stress the provisioning as well as the run-time aspects of a cloud. The run-time aspects include the disk and network I/O, CPU, and memory of the instances running in a cloud.
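As a loose illustration of this launch pattern, here is a minimal sketch of an elasticity-phase driver loop. It is not the benchmark's actual driver; the inter-arrival bounds, the AI cap, and the alternation between the two workloads are placeholder assumptions.

    # Hypothetical launch loop -- placeholder values, not the real elasticity driver.
    import random
    import time

    MIN_DELAY_S = 60    # assumed lower bound on the inter-AI arrival delay
    MAX_DELAY_S = 180   # assumed upper bound
    MAX_AIS = 10        # tester-specified cap on application instances

    def launch_ai(workload):
        # In the real harness, this asks the cloud to provision a full AI
        # (all of its instances) and then starts the workload on it.
        print(f"provisioning one {workload} application instance")

    workloads = ["ycsb_cassandra", "kmeans_hadoop"]
    for i in range(MAX_AIS):
        launch_ai(workloads[i % len(workloads)])                # alternate the two workloads
        time.sleep(random.uniform(MIN_DELAY_S, MAX_DELAY_S))    # uniformly distributed gap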
Direct comparison with the open-source versions of these workloads is not recommended, as the SPEC Cloud IaaS 2016 Benchmark runs them in particular configurations.
CBTOOL supports over 20 workloads and allows easy addition of new ones. The supported workloads are available as part of the SPEC kit and in the CBTOOL source code online (https://github.com/ibmcb/cbtool/tree/master/scripts).
Yes. You can use one of the supported workloads in CBTOOL or add your own workload to run under the SPEC Cloud IaaS 2016 Benchmark harness. However, for a compliant run, the submitted results must be produced using the workloads (YCSB/Cassandra and KMeans/Hadoop) and their configurations as specified in the Run Rules document.
SPEC Cloud IaaS 2016 is available via web download from the SPEC site at $2000 for new licensees and $500 for academic and eligible non-profit organizations. The order form is at: http://www.spec.org/order.html.
The SPEC Cloud IaaS 2016 Benchmark comprises the test harness and the workloads. This release includes the benchmark harness (Cloudbench (CBTOOL), the baseline and elasticity drivers, and relevant configuration files) along with the YCSB and Hibench/KMeans workloads, Cassandra and Hadoop source code and operating-system packages, as well as scripts to produce benchmark reports and submission files.
The benchmark kit includes example scripts to facilitate testing and data collection.
The kit also includes the relevant documentation, namely the User Guide, the Run and Reporting Rules document, and the Design document. The documents may be updated from time to time; the latest copies are available on the SPEC website.
If your issue is not resolved by these documents, please send email to cloudiaas2016support@spec.org. For CBTOOL-specific questions, please post to the cbtool-users group: https://groups.google.com/forum/#!forum/cbtool-users
Initial SPEC Cloud IaaS 2016 results are available on SPEC's web site. Subsequent results are posted on an ongoing basis following each two-week review cycle: results submitted by the two-week deadline are reviewed by SPEC Cloud committee members for conformance to the run rules and, if accepted at the end of that period, are then publicly released. Results disclosures are at: http://www.spec.org/cloud_iaas2016/results.
Yes. Please follow the rules for open-source applications specified in Section 3.3.3 of the Run and Reporting Rules document.
No. There is no designated independent auditor, but the results must undergo peer review before they are published by SPEC.
No.
No.
A compliant run is a test that follows the SPEC Cloud IaaS 2016 Benchmark run and reporting rules.
For testing, yes. For instance, YCSB can be run with 10 million operations and 10 million records. But for a compliant submission, the workloads must be run with the parameters described in the Run Rules document.
The approximate duration for running the benchmark is as follows:
NOTE: This assumes your cloud can provision all VMs in parallel. Not all clouds can, and if yours cannot, the two phases can take longer.
It takes approximately 2-3 hours to prepare the benchmark images, assuming you follow the instructions in the user guide.
If the cloud under test is not supported by the benchmark, then an adapter must be developed. The instructions for adding an adapter are in the user guide. The adapter must be reviewed by the subcommittee prior to a submission.
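As a purely conceptual illustration of what an adapter must cover, here is a hypothetical skeleton. The class and method names below are invented for this sketch and are not CBTOOL's actual adapter interface; consult the user guide and the existing adapters in the CBTOOL source for the real module layout and method signatures.

    # Hypothetical skeleton -- invented names, not CBTOOL's real adapter API.
    class MyCloudAdapter:
        """Conceptually maps the harness's instance lifecycle onto a cloud's native API."""

        def __init__(self, endpoint, credentials):
            self.endpoint = endpoint        # cloud API endpoint (assumption)
            self.credentials = credentials  # authentication material (assumption)

        def create_instance(self, image, flavor, network):
            """Request a new instance and return its identifier."""
            raise NotImplementedError("call the cloud's create API here")

        def is_ssh_ready(self, instance_id):
            """Report whether the instance accepts SSH (feeds the provisioning-time metric)."""
            raise NotImplementedError("poll the instance here")

        def destroy_instance(self, instance_id):
            """Tear the instance down at the end of a run."""
            raise NotImplementedError("call the cloud's delete API here")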
No.
No.
Yes, only if your cloud is generally available to users other than yourself.
Only if those customizations or prioritizations are available to other users and not only to yourself.
Yes, only if you have full administrative access to, and visibility into, the hardware, and can configure the benchmark to provide the appropriate supporting evidence as described by the SPEC documentation, ensuring to SPEC that no other instances or workloads are actively provisioned in your cloud. If you do not have this level of control (defined as "complete" control in the documentation), then you must redefine your cloud as a private, black-box cloud, remove those prioritizations or customizations, and make the cloud generally available to other users. If you cannot do either of those things, then your submission is not valid. If you submit anyway without disclosing those customizations or prioritizations, and your submission is later challenged by another submission in such a way that it cannot be reproduced, then SPEC has the right to retroactively invalidate your submission.