Parallel Suffix Sorting for Large String Analytics

Zhihui Du, Sen Zhang, David A. Bader

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

The suffix array is a fundamental data structure to support string analysis efficiently. It took about 26 years for the sequential suffix array construction algorithm to achieve O(n) time complexity and in-place sorting. In this paper, we develop the D-Limited Parallel Induce (DLPI) algorithm, the first O(np) time parallel suffix array construction algorithm. The basic idea of DLPI includes two aspects: dividing the O(n) size problem into p reduced sub-problems with size O(np) so we can handle them on p processors in parallel; developing an efficient parallel induce sorting method to achieve correct order for all the reduced sub-problems. The complete algorithm description is given to show the implementation method of the proposed idea. The time and space complexity analysis and proof are also given to show the correctness and efficiency of the proposed algorithm. The proposed DLPI algorithm can handle large strings with scalable performance.

Original languageEnglish (US)
Title of host publicationParallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers
EditorsRoman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski
PublisherSpringer Science and Business Media Deutschland GmbH
Pages71-82
Number of pages12
ISBN (Print)9783031304415
DOIs
StatePublished - 2023
Event14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022 - Gdansk, Poland
Duration: Sep 11 2022Sep 14 2022

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume13826 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022
Country/TerritoryPoland
CityGdansk
Period9/11/229/14/22

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Optimal Algorithm
  • Parallel Sorting
  • String Algorithm
  • String Analysis
  • Suffix Array

Fingerprint

Dive into the research topics of 'Parallel Suffix Sorting for Large String Analytics'. Together they form a unique fingerprint.

Cite this