We describe a methodology for constructing parallel algorithms that are transportable among parallel computers having different numbers of processors, different bandwidths of interprocessor communication, and different periodicities of global synchronization. We do this for the bulk-synchronous parallel (BSP) model, which abstracts the characteristics of a parallel machine into three numerical parameters p, g, and L, corresponding to processors, bandwidth, and periodicity, respectively. The model differentiates memory that is local to a processor from memory that is not but, for the sake of universality, does not differentiate network proximity. The advantages of this model in supporting shared-memory or PRAM-style programming have been treated elsewhere. Here we emphasize the viability of an alternative direct style of programming in which, for the sake of efficiency, the programmer retains control of memory allocation. We show that optimality to within a multiplicative factor close to one can be achieved for the problems of Gauss-Jordan elimination and sorting by transportable algorithms that apply over a wide range of values of the parameters p, g, and L. While these algorithms are fairly simple in themselves, descriptions of their behavior in terms of these parameters are somewhat complicated. The main reward of quantifying these complications, whether derived by analysis, as we do here, or by simulation, is that it allows software to be written once and for all that can be transported efficiently among parallel machines that fit the model. By way of contrast to these direct algorithms, we also give some simulation results for PRAMs on the BSP that identify the level of slack at which corresponding efficiencies can be approached by shared-memory simulations, provided that the bandwidth parameter g is good enough.
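The parameters p, g, and L enter algorithm analysis through the standard BSP cost accounting, under which a superstep performing at most w units of local work and routing an h-relation (at most h messages sent or received per processor) costs on the order of w + g·h + L. The sketch below illustrates this accounting; the formula is the conventional BSP cost model, and the machine parameters used are illustrative values, not figures from the paper.

```python
# Minimal sketch of BSP cost accounting (assumes the conventional
# per-superstep cost w + g*h + L; parameter values below are illustrative).

def superstep_cost(w: float, h: float, g: float, L: float) -> float:
    """Cost of one superstep: local work w, plus routing an h-relation
    at gap g per message, plus the synchronization periodicity L."""
    return w + g * h + L

def program_cost(supersteps, g: float, L: float) -> float:
    """Total cost of a BSP program given as a list of (w, h) pairs."""
    return sum(superstep_cost(w, h, g, L) for w, h in supersteps)

# Comparing the same algorithm on two hypothetical machines: one with
# cheap communication (low g, low L), one with expensive communication.
steps = [(1000, 50), (800, 20), (1200, 100)]
cheap = program_cost(steps, g=1.0, L=25.0)
costly = program_cost(steps, g=8.0, L=400.0)
```

Because g and L multiply the communication and synchronization terms, the same sequence of supersteps can be work-dominated on one machine and communication-dominated on another; this is the sensitivity that the transportable analyses in the paper make explicit.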