An Introduction to the KENBUS Benchmark

Ken J. McDonell
Pyramid Technology Corporation
Mountain View, California USA
E-mail: kenj@pyramid.com

Revised: December, 1990

1  Some Background - MUSBUS, Version 5.2

The Monash University Suite for Benchmarking UNIX* Systems (MUSBUS)
was developed over a number of years, starting in 1982.  The majority
of the development was done at Monash University, with refinements
and improvements coming from members of the world-wide UNIX community
too numerous to mention.

Refer to the companion document, An Introduction to the Monash
Benchmark Suite (MUSBUS), for a technical description of the last
version (5.2) of MUSBUS to be widely distributed.  Further background
on the philosophy of performance measurement that underlies MUSBUS
may be found in the paper Taking Performance Evaluation Out of the
``Stone Age'', which appears in the Proceedings of the Summer Usenix
Technical Conference, Phoenix, Arizona, June 1987, pages 407-417.

Over the years, MUSBUS has been used by system vendors for QA and
multi-user performance measurements, and by system purchasers for
competitive performance assessment.  The code has always been in the
public domain and has been widely distributed.

The KENBUS benchmark as distributed by SPEC is based upon MUSBUS
Version 5.2, but incorporates several important changes.  The purpose
of this document is to highlight those changes and to emphasize the
point that MUSBUS and KENBUS are different benchmarks, and hence
their results are in no way comparable.

_______________
* UNIX is a trademark of AT&T

2  Workload

The original intention of MUSBUS was that the simulated user workload
should be an input parameter to any performance measurement exercise.
But to provide some guidance, I included several example workloads,
of which one was the default workload used by the controlling scripts
that ran the benchmark.

Unfortunately, the majority of MUSBUS users have opted to use the
default workload, and it is this workload which is provided as an
invariant component of the KENBUS benchmark.

2.1  The ``default'' MUSBUS Workload

The default workload was not an arbitrary artifact.  At the time,
Monash University was searching for a new system to be acquired by
the Computer Science Department for research use.  The workload was
created by:

o  Analysis of the process accounting records over several months for
   the existing system, to identify which commands (tasks) were
   responsible for the largest components of resource utilization
   (principally CPU cycles).  The default workload reflects the
   relative frequencies of these tasks.  (A sketch of this style of
   analysis appears at the end of this section.)

o  Modification of the identified tasks to include audit-trace
   information each time they were run.  This audit-trace provided
   the information necessary to characterize the typical complexity
   of the tasks, e.g. source file sizes, typical patterns for grep,
   directory sizes relevant to ls, edit session complexity, etc.

In this way the ``default'' workload is a reasonably accurate
characterization of user activity in a research-oriented software
development environment.

SPEC has chosen to use this default workload in the KENBUS benchmark.
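
As an illustration of the first step of the accounting analysis
described above, consider the following sketch.  It assumes the raw
accounting records have already been reduced (e.g. by a tool such as
acctcom or sa) to one ``command cpu-seconds'' pair per line on
standard input; that input format, and the program itself, are
illustrative assumptions only and are not part of MUSBUS or KENBUS.

    /*
     * rank.c - sketch only: rank commands by total CPU consumption,
     * as in the workload characterization described in section 2.1.
     * Input (assumed, hypothetical format): one "command cpu-seconds"
     * pair per line on stdin.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAXCMD 256

    struct tally { char name[64]; double cpu; } tab[MAXCMD];
    int ntab = 0;

    int bycpu(const void *a, const void *b)
    {
        double d = ((const struct tally *)b)->cpu
                 - ((const struct tally *)a)->cpu;
        return d > 0 ? 1 : d < 0 ? -1 : 0;
    }

    int main(void)
    {
        char    name[64];
        double  cpu, total = 0;
        int     i;

        while (scanf("%63s %lf", name, &cpu) == 2) {
            for (i = 0; i < ntab; i++)
                if (strcmp(tab[i].name, name) == 0)
                    break;
            if (i == ntab && ntab < MAXCMD) {
                /* first time this command has been seen */
                strcpy(tab[ntab].name, name);
                tab[ntab].cpu = 0;
                ntab++;
            }
            if (i < MAXCMD) {
                tab[i].cpu += cpu;
                total += cpu;
            }
        }
        qsort(tab, ntab, sizeof(tab[0]), bycpu);
        /* commands at the top of this list are the candidate tasks */
        for (i = 0; i < ntab; i++)
            printf("%-20s %10.1f cpu-sec %6.1f%%\n", tab[i].name,
                tab[i].cpu,
                total > 0 ? 100.0 * tab[i].cpu / total : 0.0);
        return 0;
    }

The commands accounting for the bulk of the CPU time would then be
selected, and their relative frequencies carried over into the
simulated workload.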
The MUSBUS framework is much more generalized, and one could
anticipate subsequent SPEC benchmarks that use the same multi-user
benchmark framework, but radically different workloads, to
characterize system behaviour in other application environments.

2.2  Simulated User Typing Rate

The rate at which human users ``peck'' at keyboards is an input
parameter to MUSBUS, with a default value of 2 characters per second.
SPEC has fixed this value at 3 characters per second for KENBUS, in
response to some more recent human-factors studies suggesting this is
a more realistic ``average'' value.

At low levels of concurrency, the elapsed time to run the benchmark
is constrained by the rate at which users enter commands - this is
very important in the overall benchmark methodology, and reflects the
way real lightly loaded systems behave.  Since the time to enter a
script's keystrokes is inversely proportional to the typing rate,
raising the rate from 2 to 3 characters per second cuts the entry
time by one third.  Consequently, the elapsed times for KENBUS at low
loads will be of the order of 30% smaller than the elapsed times for
MUSBUS with the default workload and typing rate.

3  Terminal I/O

In the original MUSBUS framework, terminal I/O was given substantial
importance, and great care was taken to ensure that realistic amounts
of terminal I/O were generated and sent to real serial interfaces.
The rationale was simple - many systems wasted expensive cycles
handling mundane terminal character traffic, with a consequent
degradation in throughput for serious computation and work.

SPEC is operating under some different constraints, and in particular
there is no guarantee that a vendor's system will be configured with
any serial line hardware, and perhaps not even Ethernet interfaces.
Consequently, KENBUS assigns all terminal output to the ``bit
bucket'' - all of the writes still occur, they are just directed to
/dev/null (in UNIX parlance).  The end result is an ``even playing
field'' amongst vendors, but KENBUS is now a different benchmark to
MUSBUS.

One consequence of using /dev/null throughout is that KENBUS is now
operationally much simpler than MUSBUS, with all of the aggregate
baud rate checking from MUSBUS having been abandoned.

4  The ``Raw Speed'' Tests

In the MUSBUS distributions I included a range of ``raw speed''
tests.  These were only ever intended for diagnostic use, and suffer
(in some cases chronically) from the sorts of perversions that SPEC
Release 1.0 tried to address (e.g. aggressive compiler optimizations,
sensitivity to radically different system implementations,
susceptibility to conjuring and system configuration, etc.).
Thankfully, these tests are not distributed with KENBUS.

5  Wallclock Accuracy

MUSBUS includes a check for the accuracy of wallclock time.  This was
intended to detect problems in the implementation of sleep() and
alarm() that influenced the accuracy of the rate at which MUSBUS
generated user keystrokes.  Since the test is now universally passed,
and automating the check is very difficult, it has been dropped from
KENBUS (this has the desirable side-effect of trimming 60 seconds off
the iteration time when debugging the benchmark installation).

6  The Performance Metric

There was no single performance metric from MUSBUS; the benchmark
reported the mean and variance of elapsed and CPU times for several
different levels of concurrent activity.  Overall system performance
was gauged based upon heuristics related to degradation in elapsed
time and/or increasing CPU saturation.
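
By way of illustration only, the following sketch computes the style
of per-concurrency-level summary statistics MUSBUS reported.  The
input format assumed here (one elapsed time, in seconds, per line) is
an assumption for the example, not the actual MUSBUS log format.

    /*
     * stats.c - sketch only: mean and variance of a set of elapsed
     * times, as MUSBUS reported for each level of concurrent
     * activity.  One time (seconds) per input line is assumed.
     */
    #include <stdio.h>

    int main(void)
    {
        double  t, sum = 0, sumsq = 0, mean, var;
        int     n = 0;

        while (scanf("%lf", &t) == 1) {
            sum += t;
            sumsq += t * t;
            n++;
        }
        if (n < 2) {
            fprintf(stderr, "need at least two observations\n");
            return 1;
        }
        mean = sum / n;
        /* sample variance, with n-1 in the denominator */
        var = (sumsq - n * mean * mean) / (n - 1);
        printf("n=%d mean=%.2f sec variance=%.2f\n", n, mean, var);
        return 0;
    }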
For KENBUS the single performance metric is ``scripts per hour'', and
the maximum value is reported, with unconstrained freedom to vary the
concurrency level.
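
For concreteness, the metric reduces to a simple rate computation:
scripts per hour = scripts completed x 3600 / elapsed seconds.  A
minimal sketch follows; the figures in it are hypothetical, not
measured results.

    /*
     * sph.c - sketch only: the KENBUS "scripts per hour" rate
     * computation.  The figures below are hypothetical.
     */
    #include <stdio.h>

    int main(void)
    {
        double scripts = 180.0;   /* hypothetical: scripts completed */
        double elapsed = 1200.0;  /* hypothetical: elapsed seconds */

        printf("%.1f scripts per hour\n",
            scripts * 3600.0 / elapsed);
        return 0;
    }

The reported KENBUS result is then the maximum such rate observed
over all of the concurrency levels tried.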