Standard Performance Evaluation Corporation


2007 SPEC Benchmark Workshop

The HPC Challenge Benchmark: A Candidate for Replacing Linpack in the TOP500?
Jack Dongarra
University of Tennessee
Oak Ridge National Laboratory

The HPC Challenge suite of benchmarks examines the performance of HPC architectures using kernels with memory access patterns more challenging than those of the High Performance Linpack (HPL) benchmark used in the Top500 list. The suite is designed to provide benchmarks that bound the performance of many real applications as a function of memory access characteristics, e.g., spatial and temporal locality, and to provide a framework for including additional benchmarks. The benchmarks are scalable, with the size of the data sets being a function of the largest HPL matrix for a system. The suite has been released by the DARPA HPCS program to help define the performance boundaries of future petascale computing systems. It is composed of several well-known computational kernels (STREAM, High Performance Linpack, DGEMM matrix multiply, matrix transpose, FFT, RandomAccess, and bandwidth/latency tests) that attempt to span the space of high and low spatial and temporal locality.
[slides (PDF)]
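As a rough illustration of the two locality extremes mentioned in the abstract, the sketch below contrasts a STREAM-style triad (unit-stride, high spatial locality) with a RandomAccess-style table update (scattered accesses, low locality). It is a minimal sketch, not the HPC Challenge reference code: the function names are hypothetical, and Python's `random` module stands in for the benchmark's own pseudo-random sequence.

```python
import random


def stream_triad(b, c, q):
    """STREAM-style triad: a[i] = b[i] + q * c[i], walking memory in order."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def random_update(table, n_updates, seed=0):
    """RandomAccess-style update: XOR pseudo-random values into scattered slots.

    Assumption: random.Random replaces the benchmark's own generator.
    """
    rng = random.Random(seed)
    size = len(table)
    for _ in range(n_updates):
        value = rng.getrandbits(64)
        table[value % size] ^= value  # the slot index depends on the random value
    return table


a = stream_triad([1.0, 2.0], [3.0, 4.0], 2.0)

table = list(range(8))
random_update(table, 100, seed=1)
random_update(table, 100, seed=1)  # XOR is self-inverse, so this undoes the first pass
```

Because XOR is its own inverse, replaying the same update sequence restores the table, which gives a cheap correctness check for the scattered-update kernel.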

Jack Dongarra holds an appointment as University Distinguished Professor of Computer Science in the Computer Science Department at the University of Tennessee, holds the title of Distinguished Research Staff in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL), and is an Adjunct Professor in the Computer Science Department at Rice University. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced computer architectures, programming methodology, and tools for parallel computers. His research includes the development, testing, and documentation of high-quality mathematical software. He has contributed to the design and implementation of the following open source software packages and systems: EISPACK, LINPACK, the BLAS, LAPACK, ScaLAPACK, Netlib, PVM, MPI, NetSolve, Top500, ATLAS, and PAPI. He has published approximately 200 articles, papers, reports, and technical memoranda, and he is coauthor of several books. He is a Fellow of the AAAS, ACM, and the IEEE and a member of the National Academy of Engineering.

Benchmarking Sparse Matrix-Vector Multiply in Five Minutes
Hormozd Gahvari, Mark Hoemmen, James Demmel, and Katherine Yelick (University of California, Berkeley)

We present a benchmark for evaluating the performance of sparse matrix-dense vector multiply (abbreviated as SpMV) on scalar uniprocessor machines. Though SpMV is an important kernel in scientific computation, there are currently no adequate benchmarks for measuring its performance across many platforms. Our benchmark serves as a reliable predictor of expected SpMV performance across many platforms and takes no more than five minutes to obtain its results.
[slides (PDF)]
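For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV on a matrix stored in compressed sparse row (CSR) form. The CSR layout and the function name `spmv_csr` are assumptions chosen for illustration; the abstract does not say which storage formats the benchmark exercises.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x, where A is given in CSR form.

    values  : nonzero entries, stored row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # The access x[col_idx[k]] is indirect, which is what makes
            # SpMV performance sensitive to the memory system.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each nonzero contributes one multiply and one add, so a timing harness would typically report 2 × (number of nonzeros) floating-point operations per call.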

SPEC MPI2007 Benchmarks for HPC Systems
Ron Lieberman (HP), Matthias S. Mueller (Dresden University of Technology), Tom Elken (QLogic Corporation)

[slides (PDF)]

Benchmarking for Power and Performance
Heather Hanson (IBM Austin Research Lab and University of Texas at Austin), Karthick Rajamani, Juan Rubio, Soraya Ghiasi and Freeman Rawson (IBM Austin Research Lab)

There has been a tremendous increase in focus on power consumption and cooling of computer systems from both the design and management perspectives. Managing power has significant implications for system performance, and has drawn the attention of the computer architecture and systems research communities. Researchers rely on benchmarks to develop models of system behavior and experimentally evaluate new ideas. But benchmarking for combined power and performance analysis has unique features distinct from traditional performance benchmarking.

In this extended abstract, we present our experiences with adapting performance benchmarks for use in power/performance research. We focus on two areas: the problem of variability and its effect on system power management, and that of collecting correlated power and performance data. For the first, we have learned that benchmarks should capture the two sources of workload variability (intensity and nature of activity) and that the benchmarking process should take into account the inherent variation in component power. Benchmarks not only have to test across all of these forms of variability, but they also must capture the dynamic nature of real workloads and real systems. The workload and the system's response to it change across time, and how fast and well the system responds to change is an important consideration in evaluating its power management capabilities. In the second area, we have developed tools for collecting correlated power and performance data, and we briefly discuss our experience with them.
[slides (PDF)]

Is CPU2006 the last of SPEC's CPU benchmarks?
John Henning (Sun Microsystems)


Benchmark Design for Robust Profile-Directed Optimization
Paul Berube, Jose Nelson Amaral (University of Alberta)

Profile-guided code transformations specialize program code according to the profile provided by execution on training data. Consequently, the performance of the code generated using this feedback is sensitive to the selection of training data. Used in this fashion, the principle behind profile-guided optimization techniques is the same as that of the off-line learning commonly used in the field of machine learning. However, scant use of proper validation techniques for profile-guided optimizations has appeared in the literature. Given the broad use of SPEC benchmarks in the computer architecture and optimizing compiler communities, SPEC is in a position to influence the proper evaluation and validation of profile-guided optimizations. Thus, we propose an evaluation methodology appropriate for profile-guided optimization based on cross-validation. Cross-validation is a methodology from machine learning that takes input sensitivity into account and provides a measure of the generalizability of results.
[slides (PDF)]
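The cross-validation idea can be sketched as a leave-one-out loop: the program is trained (profiled) on all inputs but one and then measured on the held-out input. This is a minimal sketch rather than the authors' methodology. The helper `toy_speedup` and the `taken_rate` numbers are synthetic stand-ins for a real compile-and-run step, not measured data.

```python
from statistics import mean


def leave_one_out(inputs, train_and_measure):
    """Evaluate each input with a profile built only from the other inputs."""
    results = {}
    for held_out in inputs:
        training = [i for i in inputs if i != held_out]
        results[held_out] = train_and_measure(training, held_out)
    return results


# Synthetic stand-in: each input is summarized by one branch-taken rate.
taken_rate = {"a": 0.9, "b": 0.8, "c": 0.1}


def toy_speedup(training, held_out):
    """Hypothetical model: the speedup shrinks as the training profile
    diverges from the behavior of the evaluation input."""
    profile = mean(taken_rate[i] for i in training)
    mismatch = abs(profile - taken_rate[held_out])
    return 1.0 + 0.2 * (1.0 - 2.0 * mismatch)


scores = leave_one_out(list(taken_rate), toy_speedup)
```

In this toy model, input `c` behaves unlike the others, so a profile trained on `a` and `b` yields a value below 1.0 for it. Evaluating on the training input itself would hide exactly this kind of input sensitivity.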

Designing a Workload Scenario for Benchmarking Message-Oriented Middleware
Kai Sachs (TU Darmstadt, Germany), Samuel Kounev (TU Darmstadt, Germany, University of Cambridge, UK), Marc Carter (IBM Hursley Labs), Alejandro Buchmann (TU Darmstadt, Germany)

Message-oriented middleware (MOM) is increasingly adopted as an enabling technology for modern information-driven applications like event-driven supply chain management, transport information monitoring, stock trading, and online auctions, to name just a few. There is strong interest in the commercial and research domains in a standardized benchmark suite for evaluating the performance and scalability of MOM. With all major vendors adopting JMS (Java Message Service) as a standard interface to MOM servers, there is at last a means for creating a standardized workload for evaluating products in this space. This paper describes a novel application in the supply chain management domain that has been specifically designed as a representative workload scenario for evaluating the performance and scalability of MOM products. This scenario is used as a basis in SPEC’s new SPECjms benchmark, which will be the world’s first industry-standard benchmark for MOM.
[slides (PDF)]

SPECjbb2005 -- A Year in the Life of a Benchmark
Alan Adamson (IBM Canada Ltd.), David Dagastine (Sun Microsystems), and Stefan Sarne (BEA Systems)

Performance benchmarks have a limited lifetime of currency and relevance. This paper discusses the process used in updating SPECjbb2000 to SPECjbb2005 and presents some initial reflections on the implications and effects of the update now active.
[slides (PDF)]

Measuring the Performance of Multithreaded Processors
Javier Vera, Francisco J. Cazorla, Alex Pajuelo, Oliverio J. Santana, Enrique Fernandez, Mateo Valero (Barcelona Supercomputing Center, Spain; Universitat Politecnica de Catalunya, Spain; Universidad de Las Palmas de Gran Canaria, Spain)

Multithreaded architectures are becoming increasingly popular, and many processor vendors have already shipped processors with multithreaded features. Despite this push toward multithreaded processors, there is still no clear procedure that defines how to measure the behavior of a multithreaded processor.

This paper presents FAME, a new evaluation methodology that aims to measure the performance of multithreaded processors fairly. FAME can be used in conjunction with any of the metrics proposed for multithreaded processors, such as IPC throughput, weighted speedup, etc. The idea behind FAME is to re-execute all threads in a multithreaded workload until all of them are fairly represented in the final measurements taken from the workload. These measurements are then combined with the corresponding metric to obtain a final value that quantifies the performance of the processor under consideration.
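The two metrics named above can be sketched as follows, using one common definition of each: IPC throughput as the sum of per-thread IPCs in the multithreaded run, and weighted speedup as the sum of each thread's multithreaded IPC divided by its IPC when run alone. The function names and sample IPC values are hypothetical, and the sketch does not model FAME's re-execution procedure itself.

```python
def ipc_throughput(ipc_mt):
    """Sum of the per-thread IPCs measured in the multithreaded run."""
    return sum(ipc_mt)


def weighted_speedup(ipc_mt, ipc_st):
    """Sum of each thread's multithreaded IPC relative to its IPC when
    it has the processor to itself (one common definition)."""
    if len(ipc_mt) != len(ipc_st):
        raise ValueError("need one single-thread IPC per thread")
    return sum(mt / st for mt, st in zip(ipc_mt, ipc_st))


# Hypothetical two-thread workload: each thread runs at half its solo IPC.
ipc_mt = [1.0, 0.5]
ipc_st = [2.0, 1.0]
throughput = ipc_throughput(ipc_mt)
speedup = weighted_speedup(ipc_mt, ipc_st)
```

Both metrics depend on per-thread measurements, which is why it matters that every thread is fairly represented in the measured interval.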

Characterization of Performance of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor
Sarah Bird, Aashish Phansalkar, Lizy K. John, Alex Mericas, Rajeev Indukuru (University of Texas at Austin)

The newly released CPU2006 benchmarks are long and have a large data access footprint. In this paper we study the behavior of the CPU2006 benchmarks on Intel's newly released Woodcrest processor, which is based on the Core microarchitecture. The CPU2000 benchmarks, the predecessors of the CPU2006 benchmarks, are also characterized to see whether the two suites stress the system in the same way. Specifically, we compare the ability of SPEC CPU2000 and CPU2006 to stress areas traditionally shown to impact CPI, such as branch prediction and first- and second-level caches, as well as new features unique to the Woodcrest processor.

The recently released Core microarchitecture based processors have many new features that help to increase the performance per watt rating. However, the impact of these features on various workloads has not been thoroughly studied. We use our results to analyze the impact of a new feature called "Macro-fusion" on the SPEC benchmarks. Macro-fusion reduces the run time and hence improves absolute performance. We found that although floating point benchmarks do not gain a lot from macro-fusion, it has a significant impact on a majority of the integer benchmarks.
[slides (PDF)]