Cohesive keyword search on tree data

Aggeliki Dimitriou, Dimitri Theodoratos, Ananya Dass, Yannis Vassiliou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Keyword search is the most popular querying technique on semistructured data. Keyword queries are simple and convenient. However, as a consequence of their imprecision, there is usually a huge number of candidate results, of which only very few match the user's intent. Unfortunately, the existing semantics for keyword queries are ad hoc, and they generally fail to "guess" the user's intent. Therefore, the quality of their answers is poor, and the existing algorithms do not scale satisfactorily. In this paper, we introduce the novel concept of cohesive keyword queries for tree data. Intuitively, a cohesiveness relationship on keywords indicates that they should form a cohesive whole in a query result. Cohesive keyword queries allow term nesting and keyword repetition, and they bridge the gap between flat keyword queries and structured queries. Although more expressive, they are as simple as flat keyword queries and do not require any schema knowledge. We provide formal semantics for cohesive keyword queries and rank query results based on the proximity of the keyword instances. We design a stack-based algorithm that efficiently evaluates cohesive keyword queries. Our experiments demonstrate that our approach outperforms previous filtering semantics in quality and that our algorithm scales smoothly on queries of as many as 20 keywords on large datasets.
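The abstract does not spell out the cohesive semantics or the stack-based algorithm, so what follows is only a minimal sketch of a generic building block common to keyword search on trees, not the authors' method. It assumes nodes carry Dewey labels (the abstract names no labeling scheme), roots each candidate result at the lowest common ancestor (LCA) of one instance per keyword, and uses a hypothetical proximity score: the total depth of the instances below that root. The brute-force enumeration ignores cohesiveness constraints entirely and is exactly the kind of cost a stack-based evaluation is meant to avoid.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumption: a node is identified by its Dewey label, e.g. (1, 2, 3)
# is the 3rd child of the 2nd child of the root (1,).
Dewey = Tuple[int, ...]


def lca(labels: List[Dewey]) -> Dewey:
    """Lowest common ancestor = longest common prefix of the Dewey labels."""
    prefix = []
    for parts in zip(*labels):
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


def spread(labels: Tuple[Dewey, ...], root: Dewey) -> int:
    """Hypothetical proximity score: sum of edge distances from the root
    to each keyword instance (lower means the instances are closer)."""
    return sum(len(label) - len(root) for label in labels)


def rank_candidates(
    postings: Dict[str, List[Dewey]],
) -> List[Tuple[int, Dewey, Tuple[Dewey, ...]]]:
    """Brute force: pick one instance per keyword, root the candidate at
    their LCA, and sort candidates by ascending spread."""
    results = []
    for combo in product(*postings.values()):
        root = lca(list(combo))
        results.append((spread(combo, root), root, combo))
    results.sort()
    return results


if __name__ == "__main__":
    # Hypothetical inverted lists: keyword -> Dewey labels of matching nodes.
    postings = {"xml": [(1, 1, 1), (1, 2, 1)], "search": [(1, 1, 2), (1, 3)]}
    best_score, best_root, _ = rank_candidates(postings)[0]
    print(best_score, best_root)  # prints: 2 (1, 1)
```

The tightest candidate is rooted at node (1, 1), where both keywords occur as siblings one level down; a real evaluator would additionally check the cohesiveness relationships of the query before ranking.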

Original language: English (US)
Title of host publication: Advances in Database Technology - EDBT 2016
Subtitle of host publication: 19th International Conference on Extending Database Technology, Proceedings
Editors: Ioana Manolescu, Evaggelia Pitoura, Amelie Marian, Sofian Maabout, Letizia Tanca, Georgia Koutrika, Kostas Stefanidis
Publisher: OpenProceedings.org
Pages: 137-148
Number of pages: 12
ISBN (Electronic): 9783893180707
State: Published - 2016
Event: 19th International Conference on Extending Database Technology, EDBT 2016 - Bordeaux, France
Duration: Mar 15 2016 to Mar 18 2016

Publication series

Name: Advances in Database Technology - EDBT
Volume: 2016-March
ISSN (Electronic): 2367-2005

Other

Other: 19th International Conference on Extending Database Technology, EDBT 2016
Country/Territory: France
City: Bordeaux
Period: 3/15/16 to 3/18/16

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications
