Is text compression by prefixes and suffixes practical?

A. S. Fraenkel, M. Mor, Y. Perl

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

One approach to text compression is to replace high-frequency variable-length fragments of words by fixed-length codes pointing to a compression table containing these high-frequency fragments. It is shown that the problem of optimal fragment compression is NP-hard even if the fragments are restricted to prefixes and suffixes. This seems to be a simplest fragment compression problem which is NP-hard, since a polynomial algorithm for compressing by prefixes only (or suffixes only) has been found recently. Various compression heuristics based on using both prefixes and suffixes have been tested on large Hebrew and English texts. The best of these heuristics produce a net compression of some 37% for Hebrew and 45% for English using a prefix/suffix compression table of size 256.

Original languageEnglish (US)
Title of host publicationResearch and Development in Information Retrieval - Proceedings
EditorsGerard Salton, Hans-Jochen Schneider
PublisherSpringer Verlag
Pages289-311
Number of pages23
ISBN (Print)9783540119784
DOIs
StatePublished - 1983
Externally publishedYes
Event5th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1982 - Berlin, Germany
Duration: May 18 1982May 20 1982

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume146 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other5th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1982
CountryGermany
CityBerlin
Period5/18/825/20/82

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Is text compression by prefixes and suffixes practical?'. Together they form a unique fingerprint.

Cite this