TY - JOUR
T1 - Fast retrieval of electronic messages that contain mistyped words or spelling errors
AU - Wang, Jason Tsong Li
AU - Chang, Chia Yo
N1 - Funding Information:
Manuscript received April 29, 1995; revised March 8, 1996. This work was supported in part by the National Science Foundation under Grants IRI-9531548 and IRI-9224602, by the New Jersey Institute of Technology under Grant SBR-421280, and by a Grant from the AT&T Foundation.
PY - 1997
Y1 - 1997
N2 - This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their "fingerprints" into a number of "fingerprint files." When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of nonqualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound.
AB - This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their "fingerprints" into a number of "fingerprint files." When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of nonqualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound.
UR - http://www.scopus.com/inward/record.url?scp=0031164072&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031164072&partnerID=8YFLogxK
U2 - 10.1109/3477.584951
DO - 10.1109/3477.584951
M3 - Article
C2 - 18255883
AN - SCOPUS:0031164072
SN - 1083-4419
VL - 27
SP - 441
EP - 451
JO - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
JF - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
IS - 3
ER -