TY - GEN
T1 - Efficient evaluation of generalized path pattern queries on XML data
AU - Wu, Xiaoying
AU - Souldatos, Stefanos
AU - Theodoratos, Dimitri
AU - Dalamagas, Theodore
AU - Sellis, Timos
PY - 2008
Y1 - 2008
N2 - Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path-patterns or tree-patterns. Requirements inflexible querying of XML data have motivated recently the introduction of query languages that allow a partial specification of path-patterns in a query. In this paper, we focus on the efficient evaluation of partial path queries, a generalization of path pattern queries. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries under the indexed streaming evaluation model. The first one exploits a structural summary of data to generate a set of path-patterns that together are equivalent to a partial path query. To evaluate these path-patterns, we extend PathStack so that it can work on path-patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. An analysis of the algorithms and extensive experimental evaluation shows that the holistic algorithm outperforms the other ones.
AB - Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path-patterns or tree-patterns. Requirements inflexible querying of XML data have motivated recently the introduction of query languages that allow a partial specification of path-patterns in a query. In this paper, we focus on the efficient evaluation of partial path queries, a generalization of path pattern queries. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries under the indexed streaming evaluation model. The first one exploits a structural summary of data to generate a set of path-patterns that together are equivalent to a partial path query. To evaluate these path-patterns, we extend PathStack so that it can work on path-patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. An analysis of the algorithms and extensive experimental evaluation shows that the holistic algorithm outperforms the other ones.
KW - XML
KW - XPath query evaluation
UR - http://www.scopus.com/inward/record.url?scp=57349197038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=57349197038&partnerID=8YFLogxK
U2 - 10.1145/1367497.1367610
DO - 10.1145/1367497.1367610
M3 - Conference contribution
AN - SCOPUS:57349197038
SN - 9781605580852
T3 - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
SP - 835
EP - 844
BT - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
T2 - 17th International Conference on World Wide Web 2008, WWW'08
Y2 - 21 April 2008 through 25 April 2008
ER -