TEXPROS (TEXT PROcessing System) is an intelligent document processing system; it supports storing, extracting, classifying, categorizing, retrieving, and browsing information from a variety of office documents. This article presents a retrieval subsystem for TEXPROS that is capable of processing incomplete, imprecise, and vague queries and of providing semantically meaningful responses to the user. The design of the retrieval subsystem integrates several mechanisms for achieving these goals. First, a system catalog, including a thesaurus, is used to store the knowledge about the database. Second, there is a query transformation mechanism composed of context construction and algebraic query formulation modules. Given an incomplete or imprecise query, the context construction module searches the system for the required terms and constructs a query that has a complete and precise representation. The resulting query is then formulated into an algebraic expression. Third, in practice, the user may not have a clear idea of what he is searching for. A browsing mechanism is employed in such situations to assist the user in the retrieval process. With the browser, vague queries can be entered into the system until sufficient information is obtained for the user to construct a query for his request. Finally, when query processing fails and returns a null answer, a generalizer mechanism is used to give the user a cooperative explanation for the null answer. The presented techniques will contribute to our research toward the development of highly intelligent data processing facilities beyond the present scope of database technology.
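As a rough illustration of two of the ideas summarized above (thesaurus-based resolution of imprecise query terms, and a cooperative explanation for a null answer), the following minimal sketch may help. It is not the TEXPROS implementation: the thesaurus entries, the document records, and the function names `normalize`, `select`, and `explain_null` are all hypothetical, and the explanation strategy (dropping one condition at a time) is a simple stand-in for whatever the generalizer actually does.

```python
# Hypothetical thesaurus: maps a user's term to the catalog's preferred term.
THESAURUS = {"memo": "memorandum", "sender": "author", "writer": "author"}

# Hypothetical document store: each record is a flat attribute dictionary.
DOCUMENTS = [
    {"type": "memorandum", "author": "Lee", "year": 1994},
    {"type": "letter", "author": "Kim", "year": 1995},
]


def normalize(query):
    """Rewrite imprecise attribute names and string values via the thesaurus."""
    resolved = {}
    for attr, value in query.items():
        attr = THESAURUS.get(attr, attr)
        if isinstance(value, str):
            value = THESAURUS.get(value, value)
        resolved[attr] = value
    return resolved


def select(query):
    """Return the documents that satisfy every attribute-value condition."""
    return [d for d in DOCUMENTS if all(d.get(a) == v for a, v in query.items())]


def explain_null(query):
    """List the conditions whose removal alone would make the query succeed."""
    culprits = []
    for attr in query:
        relaxed = {a: v for a, v in query.items() if a != attr}
        if select(relaxed):
            culprits.append(attr)
    return culprits


# An imprecise query: "memo" and "sender" are not catalog terms.
query = normalize({"type": "memo", "sender": "Lee", "year": 1995})
answer = select(query)
explanation = explain_null(query) if not answer else []
```

Here the normalized query matches no document, and relaxing only the `year` condition would yield a result, so `explanation` points the user at that condition instead of returning a bare empty answer.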
All Science Journal Classification (ASJC) codes
- Earth and Planetary Sciences(all)
- Data management systems
- information retrieval
- integrated systems
- intelligent toolbox
- office automation