TY - GEN
T1 - Efficient evaluation of generalized tree-pattern queries with same-path constraints
AU - Wu, Xiaoying
AU - Theodoratos, Dimitri
AU - Souldatos, Stefanos
AU - Dalamagas, Theodore
AU - Sellis, Timos
PY - 2009
Y1 - 2009
N2 - Querying XML data is based on the specification of structural patterns which in practice are formulated using XPath. Usually, these structural patterns are in the form of trees (Tree-Pattern Queries - TPQs). Requirements for flexible querying of XML data including XML data from scientific applications have motivated recently the introduction of query languages that are more general and flexible than TPQs. These query languages correspond to a fragment of XPath larger than TPQs for which efficient non-main-memory evaluation algorithms are not known. In this paper, we consider a query language, called Partial Tree-Pattern Query (PTPQ) language, which generalizes and strictly contains TPQs. PTPQs represent a broad fragment of XPath which is very useful in practice. We show how PTPQs can be represented as directed acyclic graphs augmented with same-path constraints. We develop an original polynomial time holistic algorithm for PTPQs under the inverted list evaluation model. To the best of our knowledge, this is the first algorithm to support the evaluation of such a broad structural fragment of XPath. We provide a theoretical analysis of our algorithm and identify cases where it is asymptotically optimal. In order to assess its performance, we design two other techniques that evaluate PTPQs by exploiting the state-of-the-art existing algorithms for smaller classes of queries. An extensive experimental evaluation shows that our holistic algorithm outperforms the other ones.
AB - Querying XML data is based on the specification of structural patterns which in practice are formulated using XPath. Usually, these structural patterns are in the form of trees (Tree-Pattern Queries - TPQs). Requirements for flexible querying of XML data including XML data from scientific applications have motivated recently the introduction of query languages that are more general and flexible than TPQs. These query languages correspond to a fragment of XPath larger than TPQs for which efficient non-main-memory evaluation algorithms are not known. In this paper, we consider a query language, called Partial Tree-Pattern Query (PTPQ) language, which generalizes and strictly contains TPQs. PTPQs represent a broad fragment of XPath which is very useful in practice. We show how PTPQs can be represented as directed acyclic graphs augmented with same-path constraints. We develop an original polynomial time holistic algorithm for PTPQs under the inverted list evaluation model. To the best of our knowledge, this is the first algorithm to support the evaluation of such a broad structural fragment of XPath. We provide a theoretical analysis of our algorithm and identify cases where it is asymptotically optimal. In order to assess its performance, we design two other techniques that evaluate PTPQs by exploiting the state-of-the-art existing algorithms for smaller classes of queries. An extensive experimental evaluation shows that our holistic algorithm outperforms the other ones.
UR - http://www.scopus.com/inward/record.url?scp=69049104538&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69049104538&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02279-1_27
DO - 10.1007/978-3-642-02279-1_27
M3 - Conference contribution
AN - SCOPUS:69049104538
SN - 3642022782
SN - 9783642022784
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 361
EP - 379
BT - Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings
T2 - 21st International Conference on Scientific and Statistical Database Management, SSDBM 2009
Y2 - 2 June 2009 through 4 June 2009
ER -