Editor's Note: The decision of the OPC project group in early 1995 to adopt composite numbers for each of the viewsets was not unanimous. Here is a dissenting viewpoint.
by Nathan Tuck
The idea of condensing the results of several benchmarks down to a single number is not a new one. The concepts behind SPEC have become accepted by many in both marketing and engineering as providing a somewhat fair approximation of comparative CPU performance. However, the decision by the OPC group to start reporting a weighted geometric mean should come as a disappointment to all customers who are seriously interested in being able to evaluate the graphics performance of their next probable purchase. How did this benchmark fail where SPEC mostly succeeded? To understand this, we need to examine the serious flaws in the group's reasoning when it is set against real-world graphics use.
Real-world graphics performance is not measured in polygons per second, z-buffered lit quads per second, or frames per second. It is measured by the rate at which data can be acceptably processed, manipulated and visualized. Although this can be a very abstract concept, it is an important one to remember.
Aggregate characterization of CPU performance, although difficult, is a simpler task than measuring graphics performance with a composite number. General-purpose CPUs encounter a wide variety of instruction and data mixes, which SPEC attempts to characterize by using a broad mix of small applications. With recent increases in application size and sophistication, it is quite likely that a single application will include many of these instruction mixes throughout its code. Thus, characterizing general performance by the CPU's ability to handle SPEC, although requiring a leap of faith, is not so heinous a crime.
What about the graphics subsystem? It is useful here to take a hint from the word "subsystem." It indicates custom equipment tailored for the application that the user had in mind. This is something that the OPC group must keep in mind when reporting figures for performance. Nobody currently will buy a $100 clone PC board for its ability to do texture-mapped virtual reality walkthroughs (though this may change) and no sane person buys a Reality Engine for its ability to draw an X-Window quickly. They are both bought based on their cost/performance for a given set of user-defined tasks.
We must all admit that not all graphics subsystems encounter the same mix of "instruction traffic." The users of machines without hardware texture mapping are rarely going to put up with the strain of asking their machine to do texture mapping, whereas those who can get it for free will probably turn it on if it aids in their visualization and manipulation of the data. This is not taken into account either in viewset development or in deriving a composite number from a viewset. Vendors already have a tacit target machine in mind when they estimate the amount of time that people spend using a given feature of their application. The users of the product also have the aforementioned aversion to using features that "hurt them."
The effect of this is to measure performance in a manner that penalizes hardware that exhibits a performance characterization that differs from the hardware that was in use when the viewset was developed. If advanced features were slow and seldom used on old hardware, improving their performance will not be rewarded by greatly increased benchmark performance, and may even be detrimental if they incur a slight penalty in the basic operations that compose the majority of the viewset weighting.
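The arithmetic behind this penalty can be sketched in a few lines. The following is only an illustration under stated assumptions: the two test categories, the frame rates, and the weights are hypothetical and are not taken from any OPC viewset, and `composite_ratio` is a name invented here. It assumes the standard definition of a weighted geometric mean.

```python
import math

def composite_ratio(old_rates, new_rates, weights):
    """Ratio of the new weighted geometric mean to the old one.

    Weights are normalized, so they need not sum to 1.
    """
    total = sum(weights)
    log_ratio = sum(
        w * math.log(new / old)
        for old, new, w in zip(old_rates, new_rates, weights)
    )
    return math.exp(log_ratio / total)

# Hypothetical frame rates for [basic drawing, texture-mapped drawing].
old_hardware = [100.0, 5.0]
new_hardware = [95.0, 20.0]   # textures 4x faster, basic paths 5% slower

# Weights derived from how the old hardware was used: textures were rare.
modest_gain = composite_ratio(old_hardware, new_hardware, [0.90, 0.10])
net_loss = composite_ratio(old_hardware, new_hardware, [0.98, 0.02])

print(f"texture weight 10%: composite changes by a factor of {modest_gain:.3f}")
print(f"texture weight  2%: composite changes by a factor of {net_loss:.3f}")
```

With these made-up numbers, a fourfold improvement in the seldom-weighted feature moves the composite by less than ten percent in the first case, and in the second case the small loss on the heavily weighted basic operations outweighs it entirely.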
Someone without immediate access to the full OPC published numbers, which is the case with most people, will have no basis for making a good system choice, particularly if they plan to use a different package or to use an established package in a way that differs from the benchmarked weightings.
The issue here is good marketing practice and honesty in the representation of the meaning behind OPC benchmarks. The reasoning given behind the development of a composite number has been primarily for marketing "performance." However, very real impacts upon performance are ignored in this composite number. Display list build times, for example, are completely ignored. Also, the OPC group's method of using a weighted geometric mean is highly suspect. Most users are looking for a system that performs evenly across a given feature set. A system with a few highly optimized paths and several weak ones is likely to be far less acceptable than an evenly performing one. The "smoothing" inherent in this method throws out the variance in the performance of systems and can only serve to penalize systems that have even performance and smooth degradation as advanced feature use increases.
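To make the point about lost variance concrete, here is a minimal sketch, assuming the standard weighted geometric mean and equal weights. The two systems and their frame rates are hypothetical, chosen only so that the composites coincide; the function names are illustrative and do not come from any OPC tool.

```python
import math

def weighted_geometric_mean(rates, weights):
    """exp( sum(w_i * ln r_i) / sum(w_i) ), the standard definition."""
    total = sum(weights)
    return math.exp(
        sum(w * math.log(r) for r, w in zip(rates, weights)) / total
    )

def spread(rates):
    """Slowest and fastest result, the detail a single number hides."""
    return min(rates), max(rates)

weights = [1, 1, 1, 1]                      # equal weights, an assumption

even_system = [20.0, 20.0, 20.0, 20.0]      # same speed on every test
uneven_system = [2.0, 20.0, 40.0, 100.0]    # a few fast paths, one very slow

for name, rates in (("even", even_system), ("uneven", uneven_system)):
    composite = weighted_geometric_mean(rates, weights)
    low, high = spread(rates)
    print(f"{name:>6}: composite {composite:.1f}, range {low:.0f} to {high:.0f} fps")
```

Both systems report the same composite, yet one of them drops to 2 frames per second on a quarter of the tests. Publishing the range, or some other measure of variance, next to the composite is the least that would let a reader tell the two apart.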
Along the political steamroller path to this development of a single-number rating system, SGI proposed a number of stopgaps to provide the customer more information, including characterization of variance along with any composite number, but these suggestions were rejected as being unmarketable. Perhaps some other vendors were afraid of having to characterize their systems as performing at 20 frames per second, plus or minus 18. This might cause customers to look at the actual numbers and to decide for themselves whether the system was right for them.
Other companies might be comfortable reporting an aggregate number in their product literature, but SGI will continue to stress the real overall performance and uses of our systems, whether they are at the top or the bottom of the illusory composite performance curve.
How can graphics benchmarkers and other interested parties best distill the information that will allow users to understand whether their data can be visualized on time? The simple answer is that they can't. Full disclosure is the only way that customers in the graphics world can be made aware of whether they are getting what they want.
Although Nathan Tuck spends much of his time in empty cinemas waiting for SGI technical credits to roll by, he manages to stay focused on graphics benchmarking issues with his company's high-end desktop systems. He can be reached at firstname.lastname@example.org or (415) 390-2539.