Fast retrieval of electronic messages that contain mistyped words or spelling errors

Jason Tsong Li Wang, Chia Yo Chang

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their "fingerprints" into a number of "fingerprint files." When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of nonqualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound.
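The hash-and-vote scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: fingerprints are taken to be overlapping q-grams, the "fingerprint files" are modeled as an in-memory dictionary, and the pruning threshold is the standard q-gram lemma (a string of length m matched with at most k edits shares at least m - q + 1 - k*q q-grams with its match), which stands in for the lower bound derived in the paper. All names (`qgrams`, `build_index`, `vote_threshold`, `candidates`) are hypothetical.

```python
from collections import Counter, defaultdict

Q = 2  # assumed fingerprint (q-gram) length; not taken from the paper


def qgrams(text, q=Q):
    """Return all overlapping q-grams of text, lowercased."""
    text = text.lower()
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def build_index(messages, q=Q):
    """Map each q-gram to a Counter of message id -> occurrence count.

    The dictionary plays the role of the fingerprint files (an assumption).
    """
    index = defaultdict(Counter)
    for msg_id, msg in enumerate(messages):
        for gram in qgrams(msg, q):
            index[gram][msg_id] += 1
    return index


def vote_threshold(query, k, q=Q):
    """q-gram lemma bound: minimum shared q-grams for a match with <= k edits."""
    return len(query) - q + 1 - k * q


def candidates(index, n_messages, query, k, q=Q):
    """Return ids of messages whose votes reach the lower bound."""
    bound = vote_threshold(query, k, q)
    if bound <= 0:
        # The bound is vacuous here, so no message can be pruned.
        return list(range(n_messages))
    votes = Counter()  # histogram of votes over messages
    for gram, need in Counter(qgrams(query, q)).items():
        for msg_id, have in index.get(gram, {}).items():
            # Count shared q-grams with multiplicity.
            votes[msg_id] += min(need, have)
    return sorted(m for m, v in votes.items() if v >= bound)


messages = [
    "Please recieve the attached report",
    "Meeting moved to Friday",
    "We will receive the parcel tomorrow",
]
index = build_index(messages)
survivors = candidates(index, len(messages), "receive", k=2)
```

In this sketch the second message is pruned because it shares too few 2-grams with the query, while the misspelled "recieve" still collects enough votes to survive. Surviving messages are only candidates: a full approximate string match would still have to confirm each one.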

Original language: English (US)
Pages (from-to): 441-451
Number of pages: 11
Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Volume: 27
Issue number: 3
State: Published - 1997

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering
