TY - GEN
T1 - Diversification of keyword query result patterns
AU - Aksoy, Cem
AU - Dass, Ananya
AU - Theodoratos, Dimitri
AU - Wu, Xiaoying
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
N2 - Keyword search allows the users to search for information on tree data without making use of a complex query language and without knowing the schema of the data sources. However, keyword queries are usually ambiguous in expressing the user intent. Most of the current keyword search approaches either filter or use a scoring function to rank the candidate result set. These techniques do not differentiate the results and might return to the user a result set which is not the intended. To address this problem, we introduce in this paper an original approach for diversification of keyword search results on tree data which aims at returning a subset of the candidate result set trading off relevance for diversity. We formally define the problem of diversification of patterns of keyword search results on tree data as an optimization problem. We introduce relevance and diversity measures on result pattern sets. We design a greedy heuristic algorithm that chooses top-k most relevant and diverse result patterns for a given keyword query. Our experimental results show that the introduced relevance and diversity measures can be used effectively and that our algorithm can efficiently compute a set of result patterns for keyword queries which is both relevant and diverse.
UR - http://www.scopus.com/inward/record.url?scp=84976615700&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84976615700&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-39958-4_14
DO - 10.1007/978-3-319-39958-4_14
M3 - Conference contribution
AN - SCOPUS:84976615700
SN - 9783319399577
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 171
EP - 183
BT - Web-Age Information Management - 17th International Conference, WAIM 2016, Proceedings
A2 - Cui, Bin
A2 - Lian, Xiang
A2 - Liu, Dexi
A2 - Zhang, Nan
A2 - Xu, Jianliang
PB - Springer Verlag
T2 - 17th International Conference on Web-Age Information Management, WAIM 2016
Y2 - 3 June 2016 through 5 June 2016
ER -