Is text compression by prefixes and suffixes practical?

A. S. Fraenkel, M. Mor, Y. Perl

Research output: Contribution to journalArticlepeer-review

20 Scopus citations

Abstract

One approach to text compression is to replace high-frequency variable-length fragments of words by fixed-length codes pointing to a compression table containing these high-frequency fragments. It is shown that the problem of optimal fragment compression is NP-hard even if the fragments are restricted to prefixes and suffixes. This seems to be a simplest fragment compression problem which is NP-hard, since a polynomial algorithm for compressing by prefixes only (or suffixes only) has been found recently. Various compression heuristics based on using both prefixes and suffixes have been tested on large Hebrew and English texts. The best of these heuristics produce a net compression of some 37% for Hebrew and 45% for English using a prefix/suffix compression table of size 256.

Original languageEnglish (US)
Pages (from-to)371-389
Number of pages19
JournalActa Informatica
Volume20
Issue number4
DOIs
StatePublished - Dec 1983
Externally publishedYes

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Is text compression by prefixes and suffixes practical?'. Together they form a unique fingerprint.

Cite this