Keyword search lets Web users access XML data without learning a structured query language or studying possibly complex data schemas. Existing work has addressed the problem of selecting qualified data nodes that match the keywords and connecting them in a meaningful way, in the spirit of inferring the where clause in XQuery. However, how to infer the return clause for a keyword search remains an open problem. To address this challenge, we present XSeek, a keyword search engine for data-centric XML that infers the semantics of a search and identifies return nodes effectively. XSeek recognizes possible entities and attributes inherently represented in the data. It also distinguishes between predicates and return specifications among the query keywords. Then, based on an analysis of both the XML data structure and the keyword patterns, XSeek generates return nodes. Furthermore, when a query is ambiguous and it is hard or impossible to determine the desired return information, XSeek clusters the query results according to their semantics at a user-specified granularity and lets the user browse and select the desired ones. Extensive experimental studies show the effectiveness and efficiency of XSeek.
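To make the two inference steps above more concrete, the following is a minimal Python sketch, not XSeek's actual implementation. It rests on assumptions that the abstract does not state: a tag that repeats under one parent is treated as an entity, a leaf element holding a text value is treated as an attribute, a keyword that matches a tag name is treated as a return specification, and a keyword that matches a text value is treated as a predicate. The sample document and the function names `classify_nodes` and `classify_keywords` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; not taken from the paper.
SAMPLE = """
<league>
  <team><name>Rockets</name><city>Houston</city>
    <player><name>Ming</name><position>center</position></player>
    <player><name>Francis</name><position>guard</position></player>
  </team>
  <team><name>Mavericks</name><city>Dallas</city>
    <player><name>Nowitzki</name><position>forward</position></player>
  </team>
</league>
"""


def classify_nodes(root):
    """Label each tag as 'entity', 'attribute', or 'connection'.

    Assumed heuristic: a tag that occurs more than once under the same
    parent is an entity; a leaf element with a text value is an attribute;
    anything else is a connection node. Tags are keyed by name only, which
    is a simplification.
    """
    categories = {}
    for parent in root.iter():
        counts = Counter(child.tag for child in parent)
        for child in parent:
            if counts[child.tag] > 1:
                # Evidence of repetition overrides any earlier label.
                categories[child.tag] = "entity"
            elif len(child) == 0 and (child.text or "").strip():
                categories.setdefault(child.tag, "attribute")
            else:
                categories.setdefault(child.tag, "connection")
    return categories


def classify_keywords(root, keywords):
    """Label each keyword as 'return', 'predicate', or 'unmatched'.

    Assumed heuristic: a keyword equal to a tag name asks for that kind of
    node (return specification); a keyword equal to a word in some text
    value restricts the results (predicate).
    """
    tags = {el.tag.lower() for el in root.iter()}
    values = {w.lower() for el in root.iter() for w in (el.text or "").split()}
    roles = {}
    for kw in keywords:
        k = kw.lower()
        if k in tags:
            roles[kw] = "return"
        elif k in values:
            roles[kw] = "predicate"
        else:
            roles[kw] = "unmatched"
    return roles


root = ET.fromstring(SAMPLE.strip())
node_categories = classify_nodes(root)
keyword_roles = classify_keywords(root, ["Houston", "player"])
```

Under these assumptions, the query "Houston player" would be read as a request for player nodes restricted by the value Houston, which mirrors the split between a where clause and a return clause described above. A real system would also need to handle a keyword that matches both a tag name and a value, which this sketch resolves arbitrarily in favor of the tag name.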
All Science Journal Classification (ASJC) codes
- Information Systems
Keywords
- Keyword search
- Result clustering