Weighted Geometric Mean Selected for SPECviewperf™ Composite Numbers
by Bill Licea-Kane
At its February 1995 meeting in Salt Lake City, a subcommittee within the SPECopc℠ project group was given the task of recommending a method for deriving a single composite metric for each viewset running under the SPECviewperf™ benchmark. Composite numbers had been discussed by the SPECopc group for more than a year.
In May 1995, the SPECopc project group decided to adopt a weighted geometric mean as the single composite metric for each viewset.
The formula for determining a weighted geometric mean is

$$\text{composite} = \prod_{i=1}^{n} x_i^{\,w_i}, \qquad \sum_{i=1}^{n} w_i = 1$$

where "n" is the number of individual tests in a viewset, "x_i" is the result of individual test i, and "w_i" is the weight of each individual test, expressed as a number between 0.0 and 1.0. (A test with a weight of "10.0%" has a "w" of 0.10. Note that the sum of the weights of the individual tests must equal 1.00.)
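As a minimal sketch of the calculation, the function below computes a weighted geometric mean from a list of per-test results and their weights. The function name, the validation checks, and the sample frame rates and weights are illustrative assumptions; they are not taken from SPECviewperf or from any published viewset.

```python
import math

def weighted_geometric_mean(results, weights):
    """Return the product of results[i] ** weights[i]; weights must sum to 1."""
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    if any(r <= 0 for r in results):
        raise ValueError("results must be positive")
    # Summing weighted logarithms is equivalent to multiplying the powers,
    # and avoids overflow when there are many tests.
    return math.exp(sum(w * math.log(r) for r, w in zip(results, weights)))

# Hypothetical viewset: three tests (frames/sec) weighted 50%, 30%, 20%.
fps = [20.0, 45.0, 80.0]
weights = [0.50, 0.30, 0.20]
composite = weighted_geometric_mean(fps, weights)
print(f"composite = {composite:.2f} frames/sec")
```

The composite always falls between the slowest and fastest individual results, and a test with a larger weight pulls the composite further toward its own result.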
Why the Weighted Geometric Mean?
The SPECopc subcommittee that recommended a method for determining composite numbers started with the description for assigning weights that is provided to each creator of a viewset: "Assign a weight to each path based on the percentage of time in each path..."
Given this description, the weighted geometric mean of each viewset is the correct composite metric. This composite metric is a derived quantity that is exactly as if you ran the viewset tests for 100 seconds, where test 1 was run for 100 × weight1 seconds, test 2 for 100 × weight2 seconds, and so on.
The end result would be the number of frames rendered divided by the total time, which is a rate in frames/second. It also has the desirable property of "bigger is better"; that is, the higher the number, the better the performance.
Given this description, the weighted harmonic mean would be as if you ran the viewset tests for 100 frames, where 100 × weight1 frames were drawn with test1, the next 100 × weight2 frames were drawn by test2, and so on. The 100 frames divided by the total time would be the weighted harmonic mean.
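The 100-frame description of the weighted harmonic mean can be checked numerically. The sketch below computes the weighted harmonic mean directly and also by timing a notional 100-frame run split by weight; the function names and the two frame rates are hypothetical values chosen for illustration.

```python
def weighted_harmonic_mean(rates, weights):
    """Return 1 / sum(weights[i] / rates[i]); weights are assumed to sum to 1."""
    return 1.0 / sum(w / r for r, w in zip(rates, weights))

def frames_over_total_time(rates, weights, total_frames=100.0):
    """Draw total_frames * weight frames with each test, then divide by total time."""
    # Time for each test = frames drawn with that test / frame rate of that test.
    total_time = sum(total_frames * w / r for r, w in zip(rates, weights))
    return total_frames / total_time

# Hypothetical frame rates (frames/sec) for two equally weighted tests.
rates = [10.0, 40.0]
weights = [0.5, 0.5]

direct = weighted_harmonic_mean(rates, weights)
simulated = frames_over_total_time(rates, weights)
print(f"weighted harmonic mean: {direct:.2f} frames/sec")
print(f"100 frames / total time: {simulated:.2f} frames/sec")
```

Both calculations give the same value because the frame count cancels out of the ratio, so the choice of 100 frames is arbitrary.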
Since the weights for the viewsets were selected based on percentage of time, not percentage of operations, we chose the weighted geometric mean over the weighted harmonic mean.
Consider for a moment a trivial example, where there are two tests, equally weighted in a viewset:

[Table: Test1 and Test2 results and weighted means for System A, System B, and System C]
System B is 10 percent faster at Test1 than System A. System C is 10 percent faster at Test2 than System A. But look at the weighted arithmetic means. System B's weighted arithmetic mean is only 0.1 percent higher than System A's, while System C's weighted arithmetic mean is 10 percent higher. Even normalization doesn't help here.
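The effect described above can be reproduced with a short sketch. The frame rates below are hypothetical, chosen only so that the percentages match those quoted in the text; they are not the values from the article's table. The helper names are likewise assumptions.

```python
import math

def weighted_arithmetic_mean(results, weights):
    return sum(w * r for r, w in zip(results, weights))

def weighted_geometric_mean(results, weights):
    return math.prod(r ** w for r, w in zip(results, weights))

# Two equally weighted tests.
weights = [0.5, 0.5]

# Hypothetical results in frames/sec as [Test1, Test2]; not the original table.
systems = {
    "System A": [1.0, 100.0],
    "System B": [1.1, 100.0],   # 10 percent faster than A at Test1
    "System C": [1.0, 110.0],   # 10 percent faster than A at Test2
}

def gain_over_a(mean_fn, name):
    """Percentage by which a system's composite exceeds System A's."""
    base = mean_fn(systems["System A"], weights)
    return 100.0 * (mean_fn(systems[name], weights) / base - 1.0)

for name in ("System B", "System C"):
    arith = gain_over_a(weighted_arithmetic_mean, name)
    geo = gain_over_a(weighted_geometric_mean, name)
    print(f"{name}: arithmetic +{arith:.2f}%, geometric +{geo:.2f}%")
```

With these assumed figures, the weighted arithmetic mean is dominated by whichever test has the larger absolute frame rate, while the weighted geometric mean rewards the two 10 percent improvements equally.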
Since our weights were percentage of time and since the results from SPECviewperf are expressed in frames/sec, we were not obligated to normalize. Normalization introduces many issues of its own, starting with something as simple as how to select a reference system.
We invite readers to select two different systems whose results are published in this newsletter and to use each one as the reference system. You will discover quickly that the normalized weighted geometric means change only in absolute magnitude. If the weighted geometric mean of System B is 10-percent higher than System A, for example, the normalized weighted geometric mean of System B will be 10-percent higher than System A, no matter what reference system you choose.
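The experiment suggested above can also be sketched in a few lines. All of the numbers here are hypothetical: the two systems, the two reference systems, and the weights are invented for illustration and do not come from any published results.

```python
import math

def weighted_geometric_mean(results, weights):
    return math.exp(sum(w * math.log(r) for r, w in zip(results, weights)))

def normalize(results, reference):
    """Divide each test result by the reference system's result for that test."""
    return [r / ref for r, ref in zip(results, reference)]

# Hypothetical viewset with two tests (frames/sec) and weights summing to 1.
weights = [0.6, 0.4]
system_a = [12.0, 30.0]
system_b = [15.0, 27.0]

# Two hypothetical reference systems with very different profiles.
references = {
    "reference X": [10.0, 10.0],
    "reference Y": [3.0, 50.0],
}

raw_ratio = (weighted_geometric_mean(system_b, weights)
             / weighted_geometric_mean(system_a, weights))
print(f"unnormalized B/A ratio: {raw_ratio:.4f}")

ratios = {}
for name, ref in references.items():
    norm_a = weighted_geometric_mean(normalize(system_a, ref), weights)
    norm_b = weighted_geometric_mean(normalize(system_b, ref), weights)
    ratios[name] = norm_b / norm_a
    print(f"{name}: A = {norm_a:.4f}, B = {norm_b:.4f}, B/A = {ratios[name]:.4f}")
```

The ratio is unchanged because normalizing divides every system's weighted geometric mean by the same constant, the weighted geometric mean of the reference system, which cancels when two systems are compared.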
Please don't rely exclusively on any synthetic benchmark such as SPECviewperf. In the end, isn't actual application performance on an actual computer system what you are really attempting to find?
Bill Licea-Kane is a former chair of SPECopc and a current member of SPEC's Board of Directors. He can be reached by e-mail at gpcopc-info@spec.org.