SPECjbb2013 - Known Issues

This is a place where SPEC has collected descriptions (and solutions) to installation, build, and runtime problems encountered by people using the SPECjbb20013 benchmark. If your issue is not amongst the known issues, please bring it to the attention of SPECjbb2013 Support via e-mail to: support@spec.org with "SPECjbb" in the subject line.

Response time spikes in Response-Throughput (RT) graph in the HTML report

When using generational GC policies where full system GC (Garbage Collection) takes much longer than regular nursery GC time, RT graph will show many spikes in response time which mostly corresponds to RT step level where old generation GC with long pauses happens. RT graph is built by increasing the IR (Injection Rate) in 1% increments of HBIR (High Bound Injection Rate) and observing each step for settling (3 sec min and 30 sec maximum) and 60 sec steady state. Since the intent is to keep the total benchmark run length reasonable, each RT step level is tested for around 60 sec steady state. This 60 sec of steady state of each RT step is not long enough to capture old generational GCs whose frequency is every ~5-10 minutes based on heap size and system capacity. As a result, RT steps where old generational GC occurs show higher 99th percentile response time compared to other RT steps.

In some cases, benchmark metric “max-jOPS” run-to-run variability could be high

Metric “max-jOPS” is determined during RT graph building. RT graph building starts from 0% step while increasing the IR (Injection Rate) in 1% increments of HBIR (High Bound Injection Rate) and observing each step for settling (3 sec min and 30 sec maximum) and 60 sec steady state. Each RT step is evaluated for a passing criterion. The successful IR of the RT step just before the RT step where passing criterion fails (call First Failure), is called “max-jOPS”. Since each RT step is evaluated for 60 sec, if a very long GC pause happens, it is possible that First Failure may happen much before the full system capacity is reached. In this case, user will observe max-jOPS red color line in RT graph much earlier than usual end of the graph. Even after First Failure, benchmark keep testing RT step levels unless three continuous RT steps fail. This is to show user more clearly as where the failures of RT steps are happening. User can also look at the IR/PR accuracy graph at the end of the HTML report to observe passing criterion details. In above evaluation criterion, if long GC pause duration and/or its temporal location in RT graph have variability, this will result in “max-JOPS” run-to-run variability. On most systems, we tested very small run-to-run variability.

Benchmark metric “critical-jOPS” run-to-run variability could be high

Metric “critical-jOPS” is calculated based on 99th percentile of response time from all RT step level till full system capacity “max-jOPS” is reached. Criterion for critical-jOPS is 99th percentile of response time which is very sensitive to GC pauses. On most system tested with optimized configuration, critical-jOPS has very small run-to-run variability. Any configuration where long GC pause durations and temporal locations are random, critical-jOPS may show more run-to-run variability. In particular, systems running Suse Linux OS exhibited very high run-to-run variability.

In rare cases, benchmark metric max-jOPS > 100% HBIR

Initial phase of the benchmark determines a rough approximation of full system capacity called HBIR (High Bound Injection Rate). On most systems tested, max-jOPS occurs around 80-90% of HBIR. In some rare cases, it is possible that max-jOPS > 100% HBIR.

Scaling of >16 groups inside a single OS image

In testing, benchmark scales very well when running large number of groups across multiple OS images. When testing inside a single OS image, scaling is reasonable up to 16 groups. When running >16 groups, scaling of max-jOPS and critical-jOPS is poor due to some network resource related bottleneck inside a single OS image. Once more accurate reason is identified; this document will be accordingly updated.

No connections among SPECjbb2013-Distributed instances running across OS images when firewall enabled

SPECjbb2013-Distributed instances running across OS images may not be able to connect if firewall is enabled. Firewall blocks the TCP-IP communication among Java instances running across OS images and as a result different Java instances are not able to communicate with each other. Disabling the firewall should resolve this issue.

CPU utilization of less than 90%

With good optimizations and tuning a user should be able to achieve ~90% of CPU utilization. It is suggested that if CPU utilization is <90%, - Dspecjbb.forkjoin.workers= could be set 2 x that of available processor threads for each backend for better performance. Benchmark by default tries to set this property to available processor threads but affinity and/or running multiple groups configuration makes it complex for the benchmark to determine the optimal value for this setting.

Exception at the beginning of the run

When multiple instances take longer time for the handshake with the controller, it results in exceptions being thrown. These are harmless exceptions and can be ignored.

Submit errors during the run

During the benchmark run, “submit error” message is reported for several cases. Some of these exceptions are fatal while others are harmless. Please refer to controller log for more detailed information about these error messages.

Disclaimer

For latest update to this document, please check here: http://www.spec.org/jbb2013/docs/knownissues.html.
Product and service names mentioned herein may be the trademarks of their respective owners.
Copyright (c) 2007-2014 Standard Performance Evaluation Corporation (SPEC).
All Rights Reserved.