Fast retrieval of electronic documents in digital libraries (extended abstract)

Jason T.L. Wang, Chia Yo Chang

Research output: Contribution to journalConference articlepeer-review

Abstract

This paper presents an index structure for retrieving electronic documents in digital libraries. The documents we consider may contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those documents that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the documents. Our approach is to store the documents sequentially in a database and hash their 'fingerprints' into a number of 'fingerprint files.' When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the documents. We derive a lower bound, based on which one can prune a large number of nonqualifying documents (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound.

Original languageEnglish (US)
Pages (from-to)208-215
Number of pages8
JournalProceedings of the International Conference on Tools with Artificial Intelligence
StatePublished - 1995
EventProceedings of the 1995 IEEE 7th International Conference on Tools with Artificial Intelligence - Herndon, VA, USA
Duration: Nov 5 1995Nov 8 1995

All Science Journal Classification (ASJC) codes

  • Software

Fingerprint

Dive into the research topics of 'Fast retrieval of electronic documents in digital libraries (extended abstract)'. Together they form a unique fingerprint.

Cite this