I-Jui Sung, John A. Stratton
The Lattice-Boltzmann Method (LBM) is a method of solving the systems of partial differential equations governing fluid dynamics. Its implementations typically represent a cell of the lattice with 20 words of data: 18 represent fluid flows through the 6 faces and 12 edges of the lattice cell, one represents the density of fluid within the cell, and one represents the cell type or other properties, e.g., to differentiate obstacles from fluid. In each timestep, every cell uses its input flows to compute the resulting output flows from that cell and an updated local fluid density.
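The 20-word cell described above could be written down roughly as follows. This is only a sketch: the names Cell, N_FLOWS, CELL_OBSTACLE and cell_is_obstacle are hypothetical and not taken from the benchmark source, and it assumes that a "word" is 32 bits and that the obstacle property is stored as a single flag bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; they are not the benchmark's own. */
enum { N_FLOWS = 18 };        /* 6 face flows + 12 edge flows */
enum { CELL_OBSTACLE = 1 };   /* assumed flag bit marking a solid cell */

typedef struct {
    float    flow[N_FLOWS];   /* fluid flows through faces and edges */
    float    density;         /* fluid density within the cell */
    uint32_t flags;           /* cell type or other properties */
} Cell;

/* Returns nonzero if the cell is marked as an obstacle rather than fluid. */
int cell_is_obstacle(const Cell *c)
{
    assert(c != NULL);
    return (c->flags & CELL_OBSTACLE) != 0;
}
```

With 4-byte floats, this structure occupies 80 bytes per cell, which is the amount of data each cell must move through memory in a timestep before any layout optimization.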
Although some similarities are apparent, the major difference between LBM and a stencil application is that no input data is shared between cells; the fluid flowing into a cell is not read by any other cell. With no data reuse to exploit, the application has been memory-bandwidth bound in current studies, and optimization efforts have focused on improving achieved memory bandwidth.
To achieve better bandwidth, optimizations for GPU architectures focus on the layout of the lattice data. The effective bandwidth delivered by GPU architectures is primarily determined by two factors: coalescing and bank conflicts. Coalescing is an attribute of a single instruction executed for many work-items. If the addresses accessed by those work-items are contiguous, then the memory system delivers a single DRAM burst to satisfy the "coalesced" accesses. Memory bank conflicts arise from the distributed nature of the DRAM memory system. Different DRAM modules, or "banks", contain different sets of addresses. If the active work-items in the GPU all happen to access different addresses in the same memory bank, the accesses are serialized at that bank while the other banks remain idle. The memory bandwidth delivered is then much lower than the peak bandwidth that would otherwise be available.
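Both effects can be illustrated with a small address model. The sketch below is not a description of any particular GPU: the function names, the segment size standing in for one DRAM burst, and the simple modulo bank mapping are all assumptions, and real hardware often uses different burst sizes and hashed bank mappings.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct aligned segments touched when n_items work-items each
   read one word, with work-item i reading byte address
   base + i * stride_bytes. segment_bytes is an assumed burst size. */
size_t segments_touched(size_t base, size_t stride_bytes,
                        size_t n_items, size_t segment_bytes)
{
    assert(segment_bytes > 0);
    size_t count = 0;
    size_t last = 0;
    for (size_t i = 0; i < n_items; ++i) {
        size_t seg = (base + i * stride_bytes) / segment_bytes;
        /* Addresses never decrease, so a new segment shows up as a change. */
        if (i == 0 || seg != last) {
            ++count;
            last = seg;
        }
    }
    return count;
}

/* Assumed bank mapping: consecutive interleave_bytes chunks rotate
   through n_banks banks. */
size_t bank_of(size_t addr, size_t interleave_bytes, size_t n_banks)
{
    assert(interleave_bytes > 0 && n_banks > 0);
    return (addr / interleave_bytes) % n_banks;
}
```

Under these assumptions, 32 work-items reading consecutive 4-byte words touch one 128-byte segment, whereas the same work-items reading with an 80-byte stride (one 20-word cell apart) touch 20 segments. Likewise, with 8 banks interleaved every 256 bytes, addresses that are 2048 bytes apart all map to the same bank and would be serialized.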
The LBM benchmark uses the most logical layout for software engineering: a large array of cell structures. The header file layout_config.h contains the macros that define the data structure layout.
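To make the layout question concrete, the sketch below contrasts the array-of-structures indexing described above with a structure-of-arrays alternative. It does not reproduce the macros in layout_config.h; the names WORDS_PER_CELL, index_aos and index_soa are hypothetical, and the structure-of-arrays variant is shown only for comparison.

```c
#include <assert.h>
#include <stddef.h>

enum { WORDS_PER_CELL = 20 };  /* hypothetical name for the 20 words per cell */

/* Array of structures: the 20 words of one cell are adjacent in memory. */
size_t index_aos(size_t cell, size_t field, size_t n_cells)
{
    assert(cell < n_cells && field < WORDS_PER_CELL);
    (void)n_cells;  /* only needed for the bounds check */
    return cell * WORDS_PER_CELL + field;
}

/* Structure of arrays (comparison only): the same field of neighboring
   cells is adjacent in memory. */
size_t index_soa(size_t cell, size_t field, size_t n_cells)
{
    assert(cell < n_cells && field < WORDS_PER_CELL);
    return field * n_cells + cell;
}
```

With the array of structures, work-items that handle neighboring cells and read the same field access words that are 20 apart; with the structure of arrays, those words are contiguous, which is the access pattern that coalescing favors.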
This benchmark is modified and extended from the SPEC OMP benchmark of the same name.
104.lbm's input consists of an obstacle file, which defines which cells in the simulation volume are fluid and which are part of a solid object.
104.lbm outputs the final velocity field at the end of the simulation, with the three-component fluid velocity for a cell on each line.
The output file lbm.out contains detailed timing information about the run. It also shows which device was selected and which devices were available to OpenCL, along with progress updates of the run.
Last updated: $Date: 2014-02-03 16:05:20 -0500 (Mon, 03 Feb 2014) $