SPEC ACCEL Benchmark Description

117.bfs

Lijuan Luo, John A. Stratton

Electronic Design Automation, Graph Traversals

The Breadth-first Search algorithm is commonly used in graph problems such as finding the shortest path between two nodes. The Parboil benchmark is credited to Luo et al., and specifically optimized for a particular EDA application finding the shortest paths between a source node and every other node in a graph. In a sequential implementation, the CPU takes every node in the current frontier and enqueues all unexplored neighbors to the next frontier. This process iterates until all the nodes in the graph have been visited.

In the parallel implementations, a task is created for every node in the current frontier. Each task explores all the neighbors of the node for unvisited nodes, and updates the next frontier accordingly. Each task adds a variable number of items to the work queue for the next frontier, which suggested a dynamic queue with atomic updates as the primary data structure. However, atomic updates to the variable representing the queue tail can be a major point of contention for a single, centralized queue.

To reduce the amount of contention when threads update the next frontier, multiple levels of privatization were applied to the queue and its tail, resulting in a hierarchical queue management system. At the lowest levels, threads are hashed into one of several local queues to reduce local memory contention. When the tasks finish discovering new frontier nodes, they merge the local queues into a compact, centralized queue for the group. Finally, the group commits the group queue contents to the global queue with one representative thread performing one atomic update to the global tail. The complexity of this parallel algorithm is O(V+E) with V being the number of vertices and E being the number of edges in the graph.

The SPEC benchmark computes the SSSP for a number of sample source nodes in parallel, and outputs the average distance computed for each node in the graph given the sample of source nodes.

117.bfs is adapated from the Parboil BFS benchmark.

117.bfs's input consists of a graph definition file, which lists the number of nodes, followed by the node list with the format "edgeStart degree" for each line. The first field defines the entry in the edge list for the first edge leaving the given node, and the second field defining the number of edges leaving the node. Following the node list is the number of nodes to sample as sources, the number of edges, and the edge descriptors themselves, in "destinationNode edgeWeight" format. Each edge's source node is defined by which node references it as part of its edge list.

117.bfs outputs a file containing the number of nodes as the first line, and another line for each node holding the nodeID and the average distance from the SSSP computations.

The output file bfs_.out contains detailed timing information about the run. It also shows which device was selected along with what devices where available to OpenCL.

*C, OpenCL*

OpenCL

None.

- L. Luo, M. Wong, and W.-m. Hwu. An effective GPU implementation of breadth-first search. In Proceedings of the 47th Design Automation Conference, pages 52-55, June 2010.

Last updated: $Date: 2015-03-02 15:15:22 -0500 (Mon, 02 Mar 2015) $