An oversampling-based randomized algorithm is introduced for sorting n keys of any abstract data type on the p processors of a latency-tolerant parallel system such as a bulk-synchronous parallel computer. The algorithm is asymptotically, for large n, optimal in computation and communication compared to the best available sequential sorting algorithm, even when constant factors are taken into consideration. Its parallel time is within a (1 + o(1))/p multiplicative factor of the corresponding sequential method for sorting, improving upon other approaches for a wider range of values of p relative to n. It also improves upon other randomized sorting algorithms that have been developed for latency-tolerant models of computation on the amount of parallel slack (ratio n over p) required to achieve optimal speedup and also, on the associated failure probability. For values of p closer to n than other latency-tolerant randomized approaches, it can be turned into a PRAM-like algorithm but for such cases a speedup of O(p) rather than p/(1 + o(1)) is then achievable. Although the general framework of the proposed algorithm relies on the well-established prior idea of sample-based sorting, there are some novel features in its design such as the choice of splitter size, the way keys are split, and the handling of the base case of the recursive sorting algorithm that contribute to its performance.
All Science Journal Classification (ASJC) codes
- Discrete Mathematics and Combinatorics
- BSP model
- Randomized sorting
- latency-tolerant algorithms
- random sampling