SPECseis96 Benchmark Q&A

1. How many applications are included in the SPECseis96 Benchmark?

SPECseis96 currently includes one application named Seismic. The benchmark may include additional applications in the future.

2. Can you give some background on the Seismic application?

Seismic was developed in 1993 at ARCO by Charles Mosher and Siamak Hassanzadeh. The application began as a prototype system for seismic data processing on parallel computing architectures. The prototype was subsequently used to design and implement production seismic processing on ARCO's iPSC/860.

Seismic was designed from the start as a portable, parallel environment for developing, benchmarking, and sharing seismic application codes. The authors placed the code in the public domain as a benchmark suite in the hope that it could be used for code development and sharing between academia, US national laboratories, and industry. Working closely with the SPEC High Performance Group, the authors have committed to ensuring that Seismic's functionality continues to reflect current technology trends in the oil industry.

3. What metrics are reported for the SPECseis96 benchmark?

SPECseis96_SM
SPECseis96_MD
SPECseis96_LG
SPECseis96_XL

Since there is currently one application in SPECseis96, the metrics reflect performance of the single code.

4. Why are there four metrics?

Because there are four different problem sizes a benchmarker may wish to run. For the Seismic application, these problem sizes correspond to the number of seismic traces (and therefore the amount of data) processed.

Benchmarkers are not required to report on all problem sizes, but are left to decide which problem sizes are most meaningful for the benchmarked machine. Refer to the SPEChpc96 Run and Reporting Rules for more information.

5. Are there parallel versions of the Seismic application?

Yes. Seismic can be run in traditional serial mode, or it can be run in parallel mode. The parallel version of the code is implemented with the PVM (Parallel Virtual Machine) message-passing library. For more information, see the Oak Ridge National Laboratory web page for PVM.

Instructions on how to build and run the parallel version are included with the source code. For any given problem size, benchmarkers may report results for runs on differing numbers of processors, thereby showing scalability of the problem.

The parallel model used by Seismic is the SPMD, or single program multiple data, model. That is, several copies of the same executable (single program) run simultaneously on separate processors but they are operating on different parts of the problem domain (multiple data).
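For illustration only, the sketch below shows the general shape of an SPMD program written against the PVM C interface. It is not taken from the Seismic source; the group name, process count, and trace count are invented for the example.

    #include <stdio.h>
    #include <pvm3.h>

    #define NPROCS  4          /* assumed number of parallel processes   */
    #define NTRACES 1024       /* assumed total number of seismic traces */

    int main(void)
    {
        int mytid = pvm_mytid();             /* enroll this process in PVM        */
        int me    = pvm_joingroup("seis");   /* my instance number: 0..NPROCS-1   */

        pvm_barrier("seis", NPROCS);         /* wait until all copies have joined */

        /* Every copy runs this same program but owns a different block of traces. */
        int chunk = NTRACES / NPROCS;
        int first = me * chunk;
        int last  = first + chunk - 1;
        printf("process %d (tid %d): traces %d..%d\n", me, mytid, first, last);

        /* ... process the local traces, exchanging data with peers as needed ... */

        pvm_barrier("seis", NPROCS);
        pvm_exit();
        return 0;
    }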

6. Exactly how large are the problem sizes?

As mentioned previously, the problem sizes differ in the number of seismic traces that the application processes, which in turn determines the size of the input/output datasets.

The table below gives you some idea of the size of Seismic. Your process size will vary depending on whether you compile it for serial or PVM execution, the default size of your REAL and COMPLEX variables, etc. In this case, the program was compiled for serial execution on a 32-bit system and statically linked.

Problem Size   Serial Execution Time (Elapsed at 100 Mflops)   Memory (MB)   Size of Trace File (GB)
Small          0.5                                             approx. 2     0.11
Medium         8                                               approx. 23    2
Large          48                                              ---           19
Xlarge         240                                             ---           93

7. What are the code characteristics of Seismic?

The following list briefly summarizes the major code characteristics:

8. What is the general organization of Seismic?

A single run of Seismic consists of four invocations, or "passes", of the Seismic Executive. Each of these passes is parallelized, and each produces a results file and a timing file as output. The total execution time of Seismic is determined by adding the elapsed times, in seconds, of all four passes of the Seismic Executive.

The passes equate to the following tasks:

At the start of each invocation of the Seismic Executive, a group of parameters is read from the benchmark input file. These parameters specify the chain of processing functions to be performed by the executive, and they describe the problem, i.e. the dataset, to be operated upon. The Seismic Executive processes these tasks sequentially, but an individual task may have a parallel implementation. The tasks are outlined in the tables below.

Some tasks included in the Seismic application are not activated as part of the SPECseis96 benchmark. They are FXMG, TR12, TR23, and XSUM. Because these tasks are traditional parts of the ARCO Seismic application, which has been widely used within the seismic community, they are included for completeness.

Pass 1 - Generate seismic data
Task Name   Description                                  Parallelism
VSBF        Data generation                              none
GEOM        Generate source/receiver geometry            parallel, no communications required
DGEN        Generate seismic data                        parallel, no communications required
FANF        Apply spatial filters to data                parallel, no communications required
DCON        Apply predictive deconvolution               parallel, no communications required
NMOC        Apply moveout corrections                    parallel, no communications required
PFWR        Parallel write                               parallel, see description
VRFY        Validate                                     none
RATE        Measure processing rate                      none
Pass 2 - Stack the data
Task Name   Description                                  Parallelism
PFRD        Parallel read                                parallel, see description
DMOC        Apply residual moveout corrections           transformed domain parallel, fair amount of communications
STAK        Sum input traces into zero offset section    transformed domain parallel, fair amount of communications
PFWR        Parallel write                               parallel, see description
VRFY        Validate                                     none
RATE        Measure processing rate                      none
Pass 3 - Perform time migration
Task Name   Description                                  Parallelism
PFRD        Parallel read                                parallel, see description
M3FK        Fourier domain migration                     transformed domain parallel, fair amount of communications
PFWR        Parallel write                               parallel, see description
VRFY        Validate                                     none
RATE        Measure processing rate                      none
Pass 4 - Perform depth migration
Task Name   Description                                  Parallelism
VSBF        Data generation                              none
PFRD        Parallel read                                parallel, see description
MG3D        Finite difference migration                  domain decomposition parallel, small amount of communications
PFWR        Parallel write                               parallel, see description
VRFY        Validate                                     none
RATE        Measure processing rate                      none

9. What exactly is meant by "parallel I/O", as in tasks PFRD and PFWR? Specifically, how is this implemented?

Seismic reads two-dimensional arrays from HDF-style files on disk. Parallel I/O is supported both for a single large file read by multiple processes and for a separate file read by each process. Note that a significant part of the seismic processing flow requires data to be read in transposed fashion across all processes.

During each pass of the Seismic Executive, data is read from disk at the beginning and written to disk at the end. These read and write operations happen in parallel, in one of two modes: to a single filesystem shared by all processes, or to separate filesystems, one per process.

In the case of using separate filesystems, each process operates on a separate disk file (which may or may not reside on a physically different disk); hence the I/O is fully parallel.

In the case of using a shared filesystem on the same disk, each process computes an individual offset into the file based on its assigned data partition. The I/O operations are logically fully parallel because they access non-overlapping sections of the disk file. Note, however, that there may be some serialization due to the operating system coordinating access to the same disk. Depending on how the application is linked, it may be necessary to add or enable synchronization that coordinates parallel access to the I/O libraries. Synchronization is necessary if the I/O library is shared by the parallel processes, which may be the case on shared-memory systems.
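As a rough illustration of the shared-file mode, the sketch below uses plain C stdio to seek each process to its own non-overlapping byte range before reading. The file name, trace record size, and trace count are assumptions made for the example, not values used by the benchmark.

    #include <stdio.h>
    #include <stdlib.h>

    #define TRACE_BYTES 4096L              /* assumed size of one trace record */

    int main(int argc, char **argv)
    {
        int  my_rank  = (argc > 1) ? atoi(argv[1]) : 0;   /* this process's rank  */
        int  nprocs   = (argc > 2) ? atoi(argv[2]) : 4;   /* total process count  */
        long ntraces  = 1024;                             /* assumed trace count  */
        long per_proc = ntraces / nprocs;

        FILE *fp = fopen("traces.dat", "rb");             /* shared trace file    */
        if (!fp) { perror("traces.dat"); return 1; }

        /* seek to the first trace owned by this process */
        long offset = (long)my_rank * per_proc * TRACE_BYTES;
        if (fseek(fp, offset, SEEK_SET) != 0) { perror("fseek"); return 1; }

        /* read only this process's slice; other ranks read disjoint ranges */
        char  *buf = malloc((size_t)(per_proc * TRACE_BYTES));
        size_t got = fread(buf, TRACE_BYTES, (size_t)per_proc, fp);
        printf("rank %d read %zu traces starting at byte %ld\n", my_rank, got, offset);

        free(buf);
        fclose(fp);
        return 0;
    }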

10. What exactly is meant by "domain decomposition parallelism", as in tasks GEOM, DCON, DGEN, FANF, and NMOC?

As mentioned in question 5, the parallel implementation of Seismic follows the SPMD model. This model requires that the data space (or problem domain) be divided (or decomposed) into equal parts. Thus, multiple processes can operate in parallel on the separate, smaller pieces of the problem.

The parallel processes communicate periodically in phases, exchanging small amounts of data, and they communicate again at the end of the computation to collect results. The communication phases are relatively short compared to the computation phases, which allows the Seismic application to exploit parallel processors efficiently.
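As an illustration of what one such communication phase might look like, the sketch below has each process exchange a small boundary strip with its left and right neighbours through PVM. The group name, message tag, and strip length are invented for the example and do not come from the benchmark code.

    #include <stdio.h>
    #include <pvm3.h>

    #define NPROCS 4                   /* assumed number of processes   */
    #define EDGE   64                  /* assumed boundary strip length */

    int main(void)
    {
        float left_edge[EDGE] = {0}, right_edge[EDGE] = {0};
        int   tag = 10;
        int   me  = pvm_joingroup("seis");     /* my instance number 0..NPROCS-1 */

        pvm_barrier("seis", NPROCS);           /* make sure everyone has joined  */

        /* ... local computation fills left_edge / right_edge ... */

        if (me > 0) {                          /* ship my left strip to the left neighbour   */
            pvm_initsend(PvmDataDefault);
            pvm_pkfloat(left_edge, EDGE, 1);
            pvm_send(pvm_gettid("seis", me - 1), tag);
        }
        if (me < NPROCS - 1) {                 /* ship my right strip to the right neighbour */
            pvm_initsend(PvmDataDefault);
            pvm_pkfloat(right_edge, EDGE, 1);
            pvm_send(pvm_gettid("seis", me + 1), tag);
        }
        if (me > 0) {                          /* receive the left neighbour's strip */
            pvm_recv(pvm_gettid("seis", me - 1), tag);
            pvm_upkfloat(left_edge, EDGE, 1);
        }
        if (me < NPROCS - 1) {                 /* receive the right neighbour's strip */
            pvm_recv(pvm_gettid("seis", me + 1), tag);
            pvm_upkfloat(right_edge, EDGE, 1);
        }

        printf("process %d finished a communication phase\n", me);
        pvm_exit();
        return 0;
    }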

11. What exactly is meant by "transformed domain parallelism", as in tasks M3FK, DMOC, and STAK?

Transformed domain parallelism is similar to domain decomposition parallelism. The difference is that certain algorithms in Seismic require the seismic data space to be transposed. To accomplish this, the processes must perform more communications.
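The sketch below illustrates the heavier, all-to-all style of exchange that a transpose implies, again through the PVM C interface. The block size, group name, and tag are assumptions for the example; the benchmark may organize its transpose differently.

    #include <stdio.h>
    #include <pvm3.h>

    #define NPROCS 4
    #define BLOCK  256                 /* assumed floats per exchanged sub-block */

    int main(void)
    {
        static float sendblk[NPROCS][BLOCK];   /* sendblk[p] is destined for process p */
        static float recvblk[NPROCS][BLOCK];   /* recvblk[p] arrives from process p    */
        int tag = 20, p, i;
        int me  = pvm_joingroup("seis");

        pvm_barrier("seis", NPROCS);

        /* ... local computation fills sendblk ... */

        for (p = 0; p < NPROCS; p++) {
            if (p == me) {                             /* keep our own block locally */
                for (i = 0; i < BLOCK; i++)
                    recvblk[me][i] = sendblk[me][i];
                continue;
            }
            pvm_initsend(PvmDataDefault);              /* ship the block meant for p */
            pvm_pkfloat(sendblk[p], BLOCK, 1);
            pvm_send(pvm_gettid("seis", p), tag);
        }
        for (p = 0; p < NPROCS; p++) {
            if (p == me)
                continue;
            pvm_recv(pvm_gettid("seis", p), tag);      /* collect the block from p */
            pvm_upkfloat(recvblk[p], BLOCK, 1);
        }

        printf("process %d completed the transpose exchange\n", me);
        pvm_exit();
        return 0;
    }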

12. What input parameters may a benchmarker legally change?

As mentioned in question #7 above, Seismic can be configured to write all trace data to a single file in a single filesystem, or it can write to multiple files in multiple filesystems.

The SPEChpc96 run tools automatically set up the benchmark for the single-file method of I/O. If this is how you wish to run the benchmark, there is no need to make any changes to the run script.

If you wish to write to multiple trace files, there are two things that must change. First, in the run scripts, the "nfiles" parameter should be set to the desired number of files. For instance, on a cluster of 8 workstations, this parameter might be set to 8.

Next, specify the paths to the directories in which the output files will be created. Do this by creating a text file that contains the directory paths you wish to use, separated by whitespace. Then, in the run script, set the "PATHFILE" variable to point to this text file. The trace files that make up the benchmark output dataset are allocated to the paths in round-robin order.
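The following small C program sketches the round-robin mapping. The path-file name and the file count are placeholders for whatever you configure in the run script.

    #include <stdio.h>

    int main(void)
    {
        char paths[64][256];
        int  npaths = 0, nfiles = 8;        /* "nfiles" as set in the run script */

        /* read the whitespace-separated directory list (placeholder file name) */
        FILE *fp = fopen("pathfile.txt", "r");
        if (!fp) { perror("pathfile.txt"); return 1; }
        while (npaths < 64 && fscanf(fp, "%255s", paths[npaths]) == 1)
            npaths++;
        fclose(fp);
        if (npaths == 0) { fprintf(stderr, "no paths found\n"); return 1; }

        /* trace file i is placed in directory i modulo the number of paths */
        for (int i = 0; i < nfiles; i++)
            printf("trace file %d -> %s\n", i, paths[i % npaths]);

        return 0;
    }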

13. Is there any sort of debugging capability built into the application?

Yes, debug tracing can be enabled for any subset of tasks executed by the Seismic Executive (e.g. FANF, MG3D, STAK, and so on). Trace messages are written to stdout. To turn debug tracing on, edit the appropriate run script and add the line <task-name> debug="yes" in the appropriate locations.

14. What sort of load balancing does the Seismic application do?

The load balancing scheme is rudimentary. The parallel processes operate on fixed data partitions that are determined at the beginning of the program. Before each communication phase, the processes synchronize via a barrier operation; hence they wait for the slowest process to catch up before continuing computation. This scheme is therefore best suited to systems of homogeneous processors, where it performs quite well.
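As a rough illustration, the sketch below times the barrier wait on each process, which is one simple way to see how long a process sits waiting for the slowest one. The group name and process count are assumptions for the example.

    #include <stdio.h>
    #include <sys/time.h>
    #include <pvm3.h>

    #define NPROCS 4                        /* assumed number of processes */

    static double wall_seconds(void)        /* wall-clock time in seconds */
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1.0e6;
    }

    int main(void)
    {
        int me = pvm_joingroup("seis");     /* illustrative group name */

        /* ... computation on this process's fixed data partition ... */

        double t0 = wall_seconds();
        pvm_barrier("seis", NPROCS);        /* wait here for the slowest process */
        double waited = wall_seconds() - t0;

        printf("process %d waited %.3f s at the barrier\n", me, waited);

        /* ... communication phase, then the next computation phase ... */

        pvm_exit();
        return 0;
    }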

Note, however, that the finite difference algorithms in pass four of the Seismic Executive do exhibit some load imbalance. This can be traced to the methods used for solving the system of linear equations given a tridiagonal matrix.

15. How long do the four passes of the Seismic Executive execute relative to each other?

For the Small and Medium problem sizes, the ratio is roughly as follows:

Pass              Small   Medium   Large   Xlarge
Data Generation   20%     10%      ---     ---
Stacking          10%     3%       ---     ---
Time Migration    2%      2%       ---     ---
Depth Migration   68%     85%      ---     ---

16. How do I know if Seismic produced correct answers?

All of the SPEChpc96 benchmarks have built-in verification procedures. In the case of Seismic, the standard output from the benchmark run will contain four messages, one for each pass of the Seismic Executive, informing whether the pass succeeded or failed. A successful run of Seismic requires that all four passes succeed.

If you are using the SPEChpc96 run tools, then your results directory will contain a file called "validation" that records the verification results of all four passes of the Seismic Executive.

The verification process is outlined as follows:

The standard output from each pass records the maximum absolute difference that was computed.
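For illustration, the small program below shows the kind of check this implies: compute the maximum absolute difference between the produced values and a reference, then compare it against a tolerance. The data values and the tolerance here are made up; the benchmark's own reference data and thresholds ship with the distribution.

    #include <math.h>
    #include <stdio.h>

    #define N 8                                   /* assumed number of samples */

    int main(void)
    {
        /* stand-in data: computed result vs. reference result */
        double result[N]    = {1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00};
        double reference[N] = {1.00, 2.00, 3.01, 4.00, 5.00, 6.00, 7.00, 8.00};
        double tol = 0.05;                        /* illustrative tolerance */

        double maxdiff = 0.0;
        for (int i = 0; i < N; i++) {
            double d = fabs(result[i] - reference[i]);
            if (d > maxdiff)
                maxdiff = d;
        }

        printf("maximum absolute difference = %g\n", maxdiff);
        printf("%s\n", maxdiff <= tol ? "pass: VERIFIED" : "pass: FAILED");
        return 0;
    }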

17. In the PVM version of the code, what are the characteristics of the messages passed?

The following table describes the message traffic for a 4-processor PVM run of Seismic. These results correspond to a binary compiled and run on a 32-bit system.

Message Traffic for 4-Process PVM Run
Pass/Prob Size   Num Msgs Sent   Total Bytes Sent   Mean Bytes/Msg   Smallest Msg in Bytes   Largest Msg in Bytes
1 / Small        69              16164              234              4                       2468
2 / Small        159             3744010            23547            4                       38912
3 / Small        63              1.5 MB             25097            4                       262144
4 / Small        372             2.3 MB             6421             4                       301568
1 / Medium       69              16164              234              4                       2468
2 / Medium       447             55.5 MB            123174           4                       143360
3 / Medium       63              23.3 MB            38711            4                       4063232
4 / Medium       120             14.7 MB            128214           4                       4444160

18. Are there any additional references I might consult regarding this code?

Yes, there are. Primarily, you will find a PostScript copy of the original documentation for the ARCO Seismic Benchmark, as published by ARCO Exploration and Technology, in the ./doc directory of the benchmark distribution. Be aware that some minor modifications have been made to the SPEChpc96 Seismic application, such as adding self-validation and enhancing the I/O functions, which are not noted in this document. Instead, these changes are listed in the ./doc/CHANGES file.

Rudolf Eigenmann and Siamak Hassanzadeh. Benchmarking with Real Industrial Applications: The SPEC High-Performance Group. IEEE Computational Science and Engineering, Spring issue, 1996. Also available at http://www.spec.org/hpg/hpc96/docs/RelatedPublications/cse.ps (PostScript)

Research papers describing sequential code and/or algorithms: Yilmaz, Ozdogan, 1990, Seismic Data Processing: Investigations in Geophysics, vol. 2, Society of Exploration Geophysicists, P.O. Box 702740, Tulsa, Oklahoma, 74170.

Research papers describing parallel code and/or algorithms: Mosher, C., Hassanzadeh, S., and Schneider, D., 1992, A Benchmark Suite for Parallel Seismic Processing, Supercomputing 1992 proceedings.

Bill Pottenger and Rudolf Eigenmann. Targeting a Shared Address Space version of the Seismic Benchmark Seis1.1. http://www.spec.org/hpc96/docs/RelatedPublications/sas.ps (PostScript)