Tunnel: Parallel-inducing sort for large string analytics

Zhihui Du, Sen Zhang, David A. Bader

Research output: Contribution to journalArticlepeer-review

Abstract

The suffix array is a crucial data structure for efficient string analysis. Over the course of twenty-six years, sequential suffix array construction algorithms have achieved O(n) time complexity and in-place sorting. In this paper, we present the Tunnel algorithm, the first large-scale parallel suffix array construction algorithm with a time complexity of O[Formula presented] based on the parallel random access machine (PRAM) model. The Tunnel algorithm is built on three key ideas: dividing the problem of size O(n) into p sub-problems of reduced size O[Formula presented] by replacing long suffixes with shorter prefixes of size at most a constant D; introducing a Tunnel mechanism to efficiently induce the order of a set of suffixes with long common prefixes; developing a strategy to transform a partially ordered suffix set into a total order relation by iteratively applying the Tunnel inducing method. We provide a detailed description of the algorithm, along with a thorough analysis of its time and space complexity, to demonstrate its correctness and efficiency. The proposed Tunnel algorithm exhibits scalable performance, making it suitable for large string analytics on large-scale parallel systems.

Original languageEnglish (US)
Pages (from-to)650-663
Number of pages14
JournalFuture Generation Computer Systems
Volume149
DOIs
StatePublished - Dec 2023

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords

  • Parallel sorting
  • String algorithm
  • String analysis
  • Suffix array

Fingerprint

Dive into the research topics of 'Tunnel: Parallel-inducing sort for large string analytics'. Together they form a unique fingerprint.

Cite this