TY - GEN
T1 - Efficient keyword search on large tree structured datasets
AU - Dimitriou, Aggeliki
AU - Theodoratos, Dimitri
PY - 2012
Y1 - 2012
N2 - Keyword search is the most popular paradigm for querying XML data on the web. In this context, three challenging problems are (a) to avoid missing useful results in the answer set, (b) to rank the results with respect to some relevance criterion and (c) to design algorithms that can efficiently compute the results on large datasets. In this paper, we present a novel multi-stack based algorithm that returns as an answer to a keyword query all the results ranked on their size. Our algorithm exploits a lattice of stacks each corresponding to a partition of the keyword set of the query. This feature empowers a linear time performance on the size of the input data for a given number of query keywords. As a result, our algorithm can run efficiently on large input data for several keywords. We also present a variation of our algorithm which accounts for infrequent keywords in the query and show that it can significantly improve the execution time. An extensive experimental evaluation of our approach confirms the theoretical analysis, and shows that it scales smoothly when the size of the input data and the number of input keywords increases.
AB - Keyword search is the most popular paradigm for querying XML data on the web. In this context, three challenging problems are (a) to avoid missing useful results in the answer set, (b) to rank the results with respect to some relevance criterion and (c) to design algorithms that can efficiently compute the results on large datasets. In this paper, we present a novel multi-stack based algorithm that returns as an answer to a keyword query all the results ranked on their size. Our algorithm exploits a lattice of stacks each corresponding to a partition of the keyword set of the query. This feature empowers a linear time performance on the size of the input data for a given number of query keywords. As a result, our algorithm can run efficiently on large input data for several keywords. We also present a variation of our algorithm which accounts for infrequent keywords in the query and show that it can significantly improve the execution time. An extensive experimental evaluation of our approach confirms the theoretical analysis, and shows that it scales smoothly when the size of the input data and the number of input keywords increases.
KW - Keyword search
KW - LCA
KW - Ranking
KW - Search algorithm
KW - Tree-structured data
KW - XML
UR - http://www.scopus.com/inward/record.url?scp=84863469120&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863469120&partnerID=8YFLogxK
U2 - 10.1145/2254736.2254749
DO - 10.1145/2254736.2254749
M3 - Conference contribution
AN - SCOPUS:84863469120
SN - 9781450311984
T3 - KEYS 2012 - Proceedings of the 3rd International Workshop on Keyword Search on Structured Data
SP - 63
EP - 74
BT - KEYS 2012 - Proceedings of the 3rd International Workshop on Keyword Search on Structured Data
T2 - 3rd International Workshop on Keyword Search on Structured Data, KEYS 2012
Y2 - 20 May 2012 through 20 May 2012
ER -