The SPECjbb2015 Benchmark - Known Issues (June 24, 2015)
This is a place where SPEC has collected descriptions of (and solutions to) installation, build, and runtime problems encountered by people using the SPECjbb2015 benchmark. If your issue is not among the known issues, please bring it to the attention of SPECjbb2015 Support via e-mail to: firstname.lastname@example.org with "SPECjbb" in the subject line.
Response time spikes in Response-Throughput (RT) graph in the HTML report
- When using generational GC policies where a full system GC (Garbage Collection) takes much longer than a regular nursery GC, the RT graph will show many response time spikes. These mostly correspond to the RT step levels where an old generation GC with a long pause occurs. The RT graph is built by increasing the IR (Injection Rate) in 1% increments of HBIR (High Bound Injection Rate), observing each step for settling (3 sec minimum and 30 sec maximum) followed by 60 sec of steady state. To keep the total benchmark run length reasonable, each RT step level is tested for only around 60 sec of steady state. This is not long enough for every step to capture an old generation GC, which typically occurs only every ~5-10 minutes depending on heap size and system capacity. As a result, the RT steps where an old generation GC does occur show a higher 99th percentile response time than the other RT steps.
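To make the timing argument concrete, the following Java sketch models which RT steps have an old generation GC inside their 60 sec steady-state window. It is a simplified, hypothetical model, not benchmark code. It assumes every step uses the 3 sec minimum settle time and that old generation GCs occur at a fixed period (here 300 sec, i.e. 5 minutes). Real settle times vary between 3 and 30 sec, and real GC timing is not strictly periodic. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model: each RT step = settle time + steady-state window,
 * and old generation GCs happen at a fixed period. Reports the steps whose
 * steady-state window contains an old generation GC.
 */
final class OldGcSteps {
    static List<Integer> stepsWithOldGc(int steps, int settleSec, int steadySec, int gcPeriodSec) {
        if (gcPeriodSec <= 0) throw new IllegalArgumentException("gcPeriodSec must be > 0");
        List<Integer> hit = new ArrayList<>();
        long t = 0; // start time of the current step, in seconds
        for (int step = 0; step < steps; step++) {
            long steadyStart = t + settleSec;
            long steadyEnd = steadyStart + steadySec; // exclusive end of the window
            // First GC at or after steadyStart; GCs happen at gcPeriod, 2*gcPeriod, ...
            long nextGc = ((steadyStart + gcPeriodSec - 1) / gcPeriodSec) * gcPeriodSec;
            if (nextGc == 0) nextGc = gcPeriodSec; // no GC at t = 0
            if (nextGc < steadyEnd) hit.add(step);
            t = steadyEnd; // next step starts when this steady state ends
        }
        return hit;
    }

    public static void main(String[] args) {
        // 1% steps from 0% to 100% of HBIR, 3 s settle, 60 s steady state,
        // assumed old generation GC every 300 s.
        List<Integer> hit = stepsWithOldGc(101, 3, 60, 300);
        System.out.println(hit.size() + " of 101 steps contain an old-gen GC: " + hit);
    }
}
```

Under these assumptions, 20 of the 101 steps from 0% to 100% of HBIR (roughly one step in five) contain an old generation GC, which matches the pattern of isolated spikes in the RT graph.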
In some cases, benchmark metric "max-jOPS" run-to-run variability could be high
- The metric "max-jOPS" is determined while the RT graph is being built. RT graph building starts from the 0% step, increasing the IR (Injection Rate) in 1% increments of HBIR (High Bound Injection Rate) and observing each step for settling (3 sec minimum and 30 sec maximum) followed by 60 sec of steady state. Each RT step is evaluated against a passing criterion. The IR of the successful RT step just before the first RT step that fails the passing criterion (called the First Failure) is reported as "max-jOPS". Since each RT step is evaluated for only 60 sec, a very long GC pause can cause the First Failure to occur well before the full system capacity is reached. In this case, the user will see the red max-jOPS line in the RT graph much earlier than the usual end of the graph. Even after the First Failure, the benchmark keeps testing RT step levels until three consecutive RT steps fail, to show the user more clearly where the RT step failures occur. The user can also check the IR/PR accuracy graph at the end of the HTML report for details of the passing criterion. Under this evaluation criterion, variability in the duration and/or timing of long GC pauses within the RT graph will result in run-to-run variability of "max-jOPS". On most systems tested, run-to-run variability was very small.
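The selection rule described above can be sketched in Java as follows. This is a simplified, hypothetical illustration, not the benchmark's implementation. It takes a precomputed pass/fail result per RT step (step n at n% of HBIR), reports the IR of the last passing step before the First Failure, and counts how many steps would be run before three consecutive failures stop the search. The HBIR value and the pass pattern in main are made up.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the max-jOPS search.
 * passed[n] says whether the RT step at n% of HBIR met the passing criterion.
 */
final class MaxJopsSearch {
    /** IR of the last passing step before the First Failure (0 if step 0 fails). */
    static long maxJops(boolean[] passed, long hbir) {
        for (int step = 0; step < passed.length; step++) {
            if (!passed[step]) {
                // First Failure: max-jOPS is the IR of the step just before it
                return step == 0 ? 0 : irAtStep(step - 1, hbir);
            }
        }
        return irAtStep(passed.length - 1, hbir); // no failure in the given steps
    }

    /** Steps actually run: testing stops once maxConsecutiveFailures steps fail in a row. */
    static int stepsTested(boolean[] passed, int maxConsecutiveFailures) {
        int consecutive = 0;
        for (int step = 0; step < passed.length; step++) {
            consecutive = passed[step] ? 0 : consecutive + 1;
            if (consecutive == maxConsecutiveFailures) return step + 1;
        }
        return passed.length;
    }

    /** IR of RT step n, in 1% increments of HBIR. */
    static long irAtStep(int step, long hbir) {
        return hbir * step / 100;
    }

    public static void main(String[] args) {
        long hbir = 10_000;                  // hypothetical HBIR in jOPS
        boolean[] passed = new boolean[101]; // steps 0%..100% of HBIR
        Arrays.fill(passed, 0, 85, true);    // steps 0..84 pass, 85..100 fail
        System.out.println("max-jOPS without pause = " + maxJops(passed, hbir)); // 8400
        passed[40] = false;                  // one early failure, e.g. a long GC pause
        System.out.println("max-jOPS with pause = " + maxJops(passed, hbir));    // 3900
        System.out.println("steps tested = " + stepsTested(passed, 3));          // 88
    }
}
```

In this example, a single failure at the 40% step reports max-jOPS as 3900 instead of 8400, even though all later steps up to 84% pass. This is the kind of run-to-run variability described above.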
Benchmark metric "critical-jOPS" run-to-run variability could be high
- The metric "critical-jOPS" is calculated from the 99th percentile response time of all RT step levels up to the full system capacity, "max-jOPS". Because the critical-jOPS criterion is based on the 99th percentile response time, it is very sensitive to GC pauses. On most systems tested with an optimized configuration, critical-jOPS has very small run-to-run variability. In any configuration where the duration and timing of long GC pauses vary randomly, critical-jOPS may show more run-to-run variability. In particular, systems running the SUSE Linux OS exhibited very high run-to-run variability.
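The sensitivity of a 99th percentile to GC pauses can be seen with a small Java sketch. It uses the nearest-rank percentile definition and made-up response times. It only illustrates percentile behavior and is not the benchmark's critical-jOPS calculation; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class P99Sensitivity {
    /** Nearest-rank percentile: the smallest sample with at least p% of samples <= it. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] rt = new long[10_000]; // hypothetical response times (ms) for one RT step
        Arrays.fill(rt, 2);           // every request takes 2 ms
        System.out.println("p99 without pause = " + percentile(rt, 99) + " ms"); // 2 ms
        // Hypothetical 500 ms GC pause that stalls 1.5% of the requests in the step
        for (int i = 0; i < 150; i++) rt[i] = 500;
        System.out.println("p99 with pause = " + percentile(rt, 99) + " ms");    // 500 ms
    }
}
```

Once at least 1% of the requests in a step are stalled by a pause, the 99th percentile jumps from the typical response time to the pause length, even though 98.5% of requests are unaffected.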
In rare cases, benchmark metric max-jOPS > 100% HBIR
- The initial phase of the benchmark determines a rough approximation of the full system capacity, called HBIR (High Bound Injection Rate). On most systems tested, max-jOPS occurs at around 80-90% of HBIR. In some rare cases, max-jOPS can exceed 100% of HBIR.
Scaling of >16 groups inside a single OS image
- In testing, the benchmark scales very well when running a large number of groups across multiple OS images. Inside a single OS image, scaling is reasonable up to 16 groups. With more than 16 groups, max-jOPS and critical-jOPS scale poorly due to a network resource related bottleneck inside a single OS image. Once the exact cause is identified, this document will be updated accordingly.
No connections among SPECjbb2015-Distributed instances running across OS images when firewall enabled
- SPECjbb2015-Distributed instances running across OS images may not be able to connect if a firewall is enabled. The firewall blocks TCP/IP communication among Java instances running across OS images, so the instances cannot communicate with each other. Disabling the firewall should resolve this issue.
CPU utilization of less than 90%
- With good optimizations and tuning, a user should be able to achieve ~90% CPU utilization. If CPU utilization is below 90%, setting -Dspecjbb.forkjoin.workers= to 2 x the number of available processor threads for each backend may improve performance. By default, the benchmark tries to set this property to the number of available processor threads, but processor affinity and/or configurations running multiple groups make it difficult for the benchmark to determine the optimal value for this setting.
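The sizing suggestion above is simple arithmetic; a hypothetical Java helper that prints the resulting property is sketched below. The class name and the idea of passing a per-backend thread count are assumptions for illustration. With affinity or several groups per OS image, the user, not this helper, has to decide how many processor threads each backend actually has.

```java
/**
 * Hypothetical helper: prints the -Dspecjbb.forkjoin.workers value suggested
 * above, i.e. 2 x the processor threads available to one backend.
 */
final class ForkJoinWorkers {
    static int suggestedWorkers(int threadsPerBackend) {
        if (threadsPerBackend < 1) throw new IllegalArgumentException("threadsPerBackend must be >= 1");
        return 2 * threadsPerBackend;
    }

    public static void main(String[] args) {
        // Optional argument: processor threads available to one backend.
        // Without it, fall back to the threads this JVM reports.
        int threads = args.length > 0 ? Integer.parseInt(args[0])
                                       : Runtime.getRuntime().availableProcessors();
        System.out.println("-Dspecjbb.forkjoin.workers=" + suggestedWorkers(threads));
    }
}
```

For example, on a 64-thread OS image running 4 groups with 16 threads available to each backend, passing 16 as the argument prints -Dspecjbb.forkjoin.workers=32.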
Exception at the beginning of the run
- When multiple instances take a long time to complete the handshake with the controller, exceptions are thrown. These exceptions are harmless and can be ignored.
Submit errors during the run
- During the benchmark run, a "submit error" message is reported in several cases. Some of these errors are fatal while others are harmless. Please refer to the controller log for more detailed information about these error messages.
A "Validation level CORRECTNESS is missing from the Validation Reports" error occurs
- During the benchmark run, an attempt is made to test the load at 3 RT steps above max-jOPS. This demonstrates that the max-jOPS determined is indeed the full system sustained capacity, and not an artificially low value caused, for example, by a severe full system GC glitch. Some systems may not be able to recover from this load 3 steps above max-jOPS, and validation is then skipped, resulting in this error. In such cases, the user-tunable property "specjbb.controller.maxir.maxFailedPoints" can be lowered to "1", which should help the system recover and not skip the validation.
After a completed benchmark run, the ssh session is closed
- This behavior can be changed by removing the 'exit 0' line from the end of the script used to run the benchmark.
All benchmark results are located in the benchmark root directory
- Benchmark results can be written to any location by editing the result path in the script used to run the benchmark: in *.sh, change 'result=./$timestamp' to 'result=./result_dir/$timestamp'; in *.bat, change 'set result=%timestamp: =0%' to 'set result=result_dir\%timestamp: =0%', where result_dir is the desired path.
For the latest updates to this document, please check: http://www.spec.org/jbb2015/docs/knownissues.html.
Product and service names mentioned herein may be the trademarks of their respective owners.
Copyright (c) 2007-2015 Standard Performance Evaluation Corporation (SPEC).
All Rights Reserved.