A heuristic approach for checking containment of generalized tree-pattern queries

Pawel Placek, Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Timos Sellis

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

Query processing techniques for XML data have focused mainly on tree-pattern queries (TPQs). However, the need for querying XML data sources whose structure is very complex or not fully known to the user, and the need to integrate multiple XML data sources with different structures have driven, recently, the suggestion of query languages that relax the complete specification of a tree pattern. In order to implement the processing of such languages in current DBMSs, their containment problem has to be efficiently solved. In this paper, we consider a query language which generalizes TPQs by allowing the partial specification of a tree pattern. Partial tree-pattern queries (PTPQs) constitute a large fragment of XPath that flexibly permits the specification of a broad range of queries from keyword queries without structure, to queries with partial specification of the structure, to complete TPQs. We address the containment problem for PTPQs. This problem becomes more complex in the context of PTPQs because the partial specification of the structure allows new, non-trivial, structural expressions to be inferred from those explicitly specified in a query. We show that the containent problem cannot be characterized by homomorphisms between PTPQs, even when PTPQs are put in a canonical form that comprises all derived structural expressions. We provide necessary and sufficient conditions for this problem in terms of homomorphisms between PTPQs and (a possibly exponential number of) TPQs. To cope with the high complexity of PTPQ containment, we suggest a heuristic approach for this problem that trades accuracy for speed. An extensive experimental evaluation of our heuristic shows that our heuristic approach can be efficiently implemented in a query optimizer.

Original languageEnglish (US)
Title of host publicationProceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Pages551-560
Number of pages10
DOIs
StatePublished - 2008
Event17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
Duration: Oct 26 2008Oct 30 2008

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Other

Other17th ACM Conference on Information and Knowledge Management, CIKM'08
Country/TerritoryUnited States
CityNapa Valley, CA
Period10/26/0810/30/08

All Science Journal Classification (ASJC) codes

  • General Decision Sciences
  • General Business, Management and Accounting

Keywords

  • Partial tree-pattern query
  • Query containment
  • XML

Fingerprint

Dive into the research topics of 'A heuristic approach for checking containment of generalized tree-pattern queries'. Together they form a unique fingerprint.

Cite this