Almost every text search engine uses snippets to complement its ranking scheme, because user searches are inherently ambiguous and their relevance semantics are difficult to assess. Although XML is a standard representation format for Web data, research on generating result snippets for XML search remains limited. To tackle this important yet open problem, this article presents eXtract, a system that generates snippets for XML search results. We identify that a good XML result snippet should be a small, meaningful information unit that effectively summarizes the query result and differentiates it from others, so that users can quickly assess the relevance of the result. We have designed and implemented a novel algorithm to satisfy these requirements. Furthermore, we propose clustering the query results based on their snippets. Since XML result clustering can only be done at query time, snippet-based clustering significantly improves efficiency with little loss of clustering accuracy. We verified the efficiency and effectiveness of our approach through experiments.
All Science Journal Classification (ASJC) codes
- Information Systems