Open Forum
Profile of a SPEC95 Benchmark Candidate
By Peter R. Homan Published March, 1995; see disclaimer.
The VORTEx benchmark was derived from a full Object Oriented Database Management System (OODBMS) and conforms to SPEC CINT95 (CPU component benchmark) guidelines. The VORTEx benchmark is a single-user object-oriented database transaction benchmark which contains a highly portable system kernel coded in integer ANSI C. The 39,000 Non-Comment Source Statements (NCSS) that comprise the benchmark source are distributed across 69 files. The VORTEx benchmark exercises a rich repertoire of database transaction activities, object class configurations, and entity relationships. The benchmark includes all object database functionality except for disk caching and permanent storage on disk of the database (commit). VORTEx is an acronym for "Virtual Object Run Time Expository."

This article presents background information on the VORTEx benchmark and on the experience of converting an existing application into a SPEC95 CPU component benchmark candidate.

Initial Screening Criteria
Before selecting VORTEx to provide a foundation for a CINT95 benchmark candidate, a number of issues had to be addressed in the screening process.

Other Alternatives?
While not an exhaustive search, a number of public domain database offerings were examined as potential benchmark candidates. When considered in terms of the SPEC95 development schedule, many of the database offerings exhibited potential difficulties meeting the requirements for multiple architecture compliance, compiler portability, simple build process, and workload derivation.

What makes it Unique?
A database benchmark opens up a new application area for the SPEC95 CPU integer benchmarks. Finding a benchmark in this application area that is also object-oriented further increases its appeal. A somewhat uncommon feature of the VORTEx OODBMS is that it supports multiple databases in a single environment. Each database in the environment is physically segregated and is a logical clustering of objects that are contextually related.
Objects may be globally related between databases (tuples, sets).

Does it Look Portable?
VORTEx in its full implementation supports object oriented language interfaces for C, C++, and FORTRAN. Under the hood, the database engine itself was written in ANSI C and has proved to be highly portable. Even before the development efforts for SPEC95, VORTEx had already been ported to a variety of platforms and compilers: UNIX (RISC), three compilers under DOS (x86), WIN32 (x86), and MAC (68K).

Does the Code Look Reasonable?
Code quality affects benchmark quality. The source code found in VORTEx exhibits high quality from many aspects. A rigorous semantic style has been adhered to throughout the development process of VORTEx. The program is written in a formalized syntax, and the format of the programming style is such that elements of the functional specification document are extracted directly from the source code. The code contains extensive runtime diagnostic routines.

Is the Code Well Designed?
Good application design helps make a good benchmark. VORTEx conforms to the "Golden Rules" of The Object-Oriented Database System Manifesto [Atkinson89], which describes the main features and characteristics a system must have to qualify for the designation of an object-oriented database system. The thirteen rules of conformance provide support for: complex objects, object identity, encapsulation, types and classes, class or type hierarchies, late binding, computational completeness, extensibility, persistence, secondary storage, concurrency, recovery, and an ad-hoc query facility.

Is it Hard to Build and Run?
Simplified build and runtime procedures are considered an asset for a benchmark. No special language preprocessor is needed to build the VORTEx benchmark. In addition, VORTEx supports using a pre-built schema, eliminating the need to generate database translation mappings at measurement time.

Workloads that are similar to existing standards help a benchmark to gain acceptance.
The workload of VORTEx has been modeled after common object-oriented database benchmarks with enhancements to vary the mix of transactions. (More details on this topic in the next section.)

CINT95 Conversion Requirements
Like the CINT92 suite, CINT95 focuses on testing a triad of system components: processor, cache, and compiler. SPEC95 brings with it an updated set of conversion criteria. To gain acceptance as a SPEC95 CPU component benchmark, a candidate must exhibit a number of desirable attributes after conversion. Here are some of the conversion criteria, and how VORTEx was modified to meet them.

Floating Point Content
One (usually minor) consideration for CINT95 is that the instruction mix for an integer benchmark contains less than one percent floating point operations. No modifications to the VORTEx source had to be made to meet this criterion.

Workloads
At the time of writing, a CINT95 candidate needs at least three workloads: a "reference" workload for the runtime measurement, a shorter "test" workload to validate the build, and a very short "train" workload to be used as input for feedback-directed compiler optimization. VORTEx, like the other SPEC95 component candidates, was modified to accept workload configuration data from an input file. The two smaller workloads are derived from the reference workload required for measurement runs.

Obtaining "realistic" workloads is also desirable for SPEC95 CPU component benchmarks. The VORTEx benchmark is somewhat compromised at the user interface. (SPEC95, like its predecessors, is a batch benchmark suite.) Does using a batch workload provide for a realistic approximation of the full application? If it is assumed that the sequence of generated events is sufficiently random in nature to simulate a series of database transactions (via an interactive interface), or even to simulate a single long transaction, then the VORTEx benchmark can be considered representative of typical OODBMS activities.
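
A generated event sequence can only stand in for interactive transactions if every platform sees the same sequence. The following is a minimal sketch in C of how a self-contained generator could drive a transaction mix. It is not taken from the VORTEx source: the generator shown is the well-known Park-Miller "minimal standard" generator (computed with Schrage's method so that it fits in 32-bit arithmetic), used here only for illustration, and the names `txn_kind`, `rng_seed`, `rng_next`, and `next_txn`, as well as the idea of expressing the mix as percentages, are assumptions.

```c
#include <assert.h>

/* Hypothetical transaction kinds; not the benchmark's actual set. */
enum txn_kind { TXN_LOOKUP, TXN_INSERT, TXN_DELETE, TXN_KIND_COUNT };

/* Generator state; always stays in the range 1 .. 2147483646. */
static long rng_state = 1L;

/* Set the starting point of the sequence. */
void rng_seed(long seed)
{
    assert(seed > 0L && seed < 2147483647L);
    rng_state = seed;
}

/*
 * Park-Miller generator: state = (16807 * state) mod (2^31 - 1).
 * Schrage's method keeps every intermediate value within 32 bits,
 * so the result does not depend on the width of long.
 */
long rng_next(void)
{
    const long a = 16807L;       /* multiplier */
    const long m = 2147483647L;  /* modulus, 2^31 - 1 */
    const long q = 127773L;      /* m / a */
    const long r = 2836L;        /* m % a */
    long hi = rng_state / q;
    long lo = rng_state % q;
    long t = a * lo - r * hi;

    rng_state = (t > 0L) ? t : t + m;
    return rng_state;
}

/*
 * Choose the next transaction kind. mix[] holds one percentage per
 * kind and is assumed to sum to 100 (as it might after being read
 * from a workload input file).
 */
enum txn_kind next_txn(const int mix[TXN_KIND_COUNT])
{
    long roll = rng_next() % 100L;   /* 0 .. 99 */
    long limit = 0L;
    int kind;

    for (kind = 0; kind < TXN_KIND_COUNT - 1; kind++) {
        limit += mix[kind];
        if (roll < limit)
            break;
    }
    return (enum txn_kind)kind;
}
```

Because the generator avoids the C library's `rand()`, whose output differs between implementations, two builds seeded identically produce the same series of transaction kinds. Shorter workloads could then be derived by lowering the iteration count while keeping the seed and mix unchanged.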
Run time
The run time of a SPEC95 CPU component benchmark for the reference workload must be in the range of 4 to 20 minutes on a "high-end" system of today. The run time range is an attempt to compensate for increasing system performance over the lifetime of the SPEC95 suite. By varying the number and type of database transactions, VORTEx can be adjusted for virtually any run time length.

Dataset Size
The SPEC95 CPU component benchmarks were not intended to measure the effects of virtual memory paging. The VORTEx input files can also be used to adjust dataset size. Limiting the dataset to a maximum near 40 MB prevents paging on most systems with 64 MB of memory.

I/O Activity
SPEC95 CPU component benchmarks are not designed to measure I/O activity. Database applications typically exhibit significant I/O rates, and the complete VORTEx OODBMS is no exception. In the benchmark version of VORTEx, a goal was set to minimize I/O as much as possible. Two major features of the VORTEx OODBMS were deactivated to reach this goal: the commit function (which establishes persistence of objects in a database repository), and the memory caching function (which utilizes a temporary disk file to access resident data). There is some I/O activity involved in the initial loading of the schema database (about 3 MB).

Profile Criteria
It is desirable for a SPEC95 CPU component benchmark candidate to exhibit a runtime profile spread evenly over the main procedures. (An even profile increases the program's resistance to benchmark-specific optimizations.) VORTEx has an even distribution of time spent in major procedures. The top six procedures of VORTEx all deal with memory management of objects (a distribution representative of non-paging database applications).

Performance Neutrality
For SPEC95 the attempt has been made (through the active participation of the many vendors) to keep the code base from being biased towards a particular hardware system.
Here are two examples from the development of the VORTEx benchmark.
It was noted that different compilers made distinctly different decisions
about what procedures to inline. The procedures in question were located at
the end of a chain of macro definitions. One of those procedures was called
A second example involves workload definition. The database transaction series is determined by a sequence of random numbers. A portable random number generator was incorporated in the source to ensure that all implementations experience the same transaction sequence.

Cache Effects
An attempt has been made in SPEC95 to select benchmarks that both reflect
real applications and stress the underlying hardware. The
VORTEx benchmark models a cyclic sequence of random events that exhibits
code variation and data access across a wide range of memory addresses. To
enhance cache stress, two modifications were made to the VORTEx benchmark.
First, the
Second, the insertion of new parts on every iteration of the inner loop sequence results in a better distribution of memory access. Insertions tend to create breaks in data access locality.

Portability
Since development began on the VORTEx benchmark for the CINT95 suite, the DBMS kernel and administrative modules have been ported to the wide range of hardware configurations found at SPEC benchathons and in vendor collections.

The VORTEx OODBMS was originally developed on the Intel x86 architecture under MS-DOS. When VORTEx was in its early phase of development, all of the MS-DOS C compilers supported only 16-bit data types. (These compilers were restricted to one-byte member and struct alignments.) Even though developed in this restrictive environment, VORTEx was designed from the start for 16- and 32-bit interchangeability (using typedefs for supported data types).

The initial port to a 32-bit architecture (with 32-bit pointers) exposed design aspects not considered before. Hardware dependencies (revealed in the generation of class topologies) were found to impact the internal database representation and behavior of object transactions. Conflicts with coordinating the internal and external class topology maps were due to byte and struct alignment variations, and compiler-dependent sizing of enumerated types.

Language compilation problems across various platforms have been minimized due to the adherence to standard C/C++ coding conventions. Several runtime errors were encountered, caused by compiler optimization (mostly loop variable ordering), but these problems were easily located and resolved in the source code. Several ANSI preprocessor commands were found to be order dependent and were interpreted inconsistently across platforms. Some macros had to be reconfigured for conformance.

One of the requirements of SPEC95 benchmarks is to be portable to architectures that support 64-bit data types.
For the VORTEx benchmark, this port revealed internalized database dependencies on the size of an address type (now eight bytes; was four bytes), and database schema inconsistencies. Built-in debugging tools in the VORTEx kernel were used to compare the internal database topology image and the external topology created by the compiler. Each major object topology was checked, corrected, and validated using this approach. These tools proved to be invaluable in resolving the remaining 64-bit port problems.

The VORTEx OODBMS is "mandated" to support both a Data Repository Interface (DRI) and an Object Management Interface (OMI), of which the DRI is an encapsulated subset. A major design issue was to maintain code modules that handle both schema and non-schema driven implementations of certain container class objects (such as a B+ tree and 2D dynamic arrays). The OODBMS kernel and API modules were subsequently revised to handle architecture-dependent issues which affected these class objects.

The multiple ports to 16-, 32-, and 64-bit (big- and little-endian) architectures, and to various operating systems, proved to be a rigorous test of the VORTEx OODBMS integrity. The basic requirement is to maintain a persistent database image decipherable by each of the target systems within a distributed client-server environment. A modification of the VORTEx integer benchmark (enhanced to add back complete database functionality) has proved to be a valuable tool in the VORTEx OODBMS verification and validation process.

Evolution of the Benchmark
In the beginning it was felt that existing class libraries and corresponding class methods (already constructed and tested in the VORTEx environment) could be combined to produce a viable benchmark. A time constraint of two months to project completion provided an additional (and compelling) reason for leveraging existing class modules.
At the time SPEC95 development was initiated, three application programs
had been written to validate and verify behavior of the VORTEx OODBMS. Each
program exercised and accessed a unique library of class objects. The
intent of the combined applications was to model each of the supported
entity-relationships, data member types (attributes),
"iterators", query commands, and various class inheritance
schemes of the VORTEx Object Definition Language (ODL). Names of the three
class modules developed are the
The first version of the benchmark was based on a single database of
instantiated objects found in the
Measurements of cache activity, instruction mix, and procedure profiles were collected from this test setup. For this first setup, the cache miss rate was disappointingly low, and the run time profile did not show sufficient variation.
The second attempt to design a benchmark was to create a VORTEx-based
implementation of the Engineering Database Benchmark (EDB0), [Cattell92].
EDB0 was designed to predict DBMS performance for engineering design
applications. Its primary features include object creation, object
navigation, and memory caching. For VORTEx, the
Hardware measurements from the second attempt still did not exhibit significant data cache activity or Cycles Per Instruction (CPI) variation. Why was this? The workload in EDB0 only performed navigation operations: lookups, traverses, and reverses. Objects were only touched in the database, but not modified. Also, there was no connectivity to other object types; the schema was simplistic. As a result, data was frequently referenced in the cache.
A third version was now considered. This version was to combine elements of
the
The current version of the integer benchmark "VORTEx" is based on
the EDB0 event sequence mentioned above; however, noteworthy modifications
were made. Transactions were added for: deletion of objects, query of
objects, traversal over different set types, and multiple databases. The
An important attribute of the VORTEx benchmark is that there is a good mix of transactions, data types, and memory accesses that does not compromise the model of "real-life" activities. Although a bit contrived (as all benchmarks tend to be), the integration of the three libraries gives one the semblance of reality. Each person owns a set of parts; each part has a paired draw object; each object is drawn on a 2D coordinate system.

Overview of the Benchmark

General Overview
Transactions to and from the database are translated through a schema. A schema is a machine-readable translation that maps an internally stored data block to a model viewable in the context of the application. All objects declared in the database schema of this benchmark are derived from a common base object, called "Image01". The database schema of class descriptors and entity relationships is processed and added to the VORTEx benchmark environment during a separate administration run. For a measurement run, the environment is loaded from a local disk file.

Those readers with access to the source code can obtain a complete perspective of the integrated benchmark application by looking at the files: draw*.h, rect*.h, emp*.h, and bmt01.h. The class descriptor for the base object "Image01" and its member methods is found in the header file obj01.h. ("Image01" is a pure virtual object, and is never instantiated.)

The VORTEx OODBMS was designed to be extensible for compliance with the Common Object Request Broker Architecture (CORBA) of the Object Management Group (OMG). The VORTEx Object Definition Language (ODL) is used to describe the interfaces that client objects call, and object implementations provide. All attributes and data types of the VORTEx ODL resolve to the basic data types defined in the OMG IDL [CORBA91]. The interface to the ORB is incomplete; however, the required topological mapping is embedded in and accessible from the database schema of the system environment.
The Object Definition Language (ODL) processor accepts modified ANSI C or C++ header files as input.

Library Overview
The schema as provided with the benchmark is pre-configured to manipulate three different databases: mailing list, parts list, and geometric data. Little-endian and big-endian versions of the schema are provided.
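
Since the pre-built schema comes in two byte orders, a run has to use the version that matches the host. The following minimal C sketch shows one common way to detect byte order at run time. It is an illustration only: the function names and the file names `schema.le` and `schema.be` are hypothetical, and nothing here describes how the benchmark itself selects a schema.

```c
#include <assert.h>

/*
 * Return 1 on a little-endian host and 0 on a big-endian host by
 * inspecting the first byte in memory of a known integer value.
 */
int host_is_little_endian(void)
{
    unsigned int probe = 1U;

    /* The check only makes sense if the probe spans several bytes. */
    assert(sizeof probe > 1);

    return *(const unsigned char *)&probe == 1;
}

/*
 * Pick the schema image that matches the host byte order.
 * Both file names are hypothetical placeholders.
 */
const char *schema_file_for_host(void)
{
    return host_is_little_endian() ? "schema.le" : "schema.be";
}
```

Shipping one image per byte order, instead of swapping bytes while loading, keeps the load path identical on every platform, which fits the goal of keeping schema handling out of the measured work.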
Three databases are instantiated (created) during the program execution: a
The