Evaluation Techniques for Generalized Path Pattern Queries on XML Data

Xiaoying Wu, Dimitrios Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Timos Sellis

Research output: Contribution to journalArticlepeer-review

8 Scopus citations

Abstract

Finding the occurrences of structural patterns in XML data is a key operation in XML query processing. Existing algorithms for this operation focus almost exclusively on path patterns or tree patterns. Current applications of XML require querying of data whose structure is complex or is not fully known to the user, or integrating XML data sources with different structures. These applications have motivated recently the introduction of query languages that allow a partial specification of path patterns in a query. In this paper, we consider partial path queries, a generalization of path pattern queries, and we focus on their efficient evaluation under the indexed streaming evaluation model. Our approach explicitly deals with repeated labels (that is, multiple occurrences of the same label in a query). We show that partial path queries can be represented as rooted dags for which a topological ordering of the nodes exists. We present three algorithms for the efficient evaluation of these queries. The first one exploits a structural summary of data to generate a set of path patterns that together are equivalent to a partial path query. To evaluate these path patterns, we extend a previous algorithm for path-pattern queries so that it can work on path patterns with repeated labels. The second one extracts a spanning tree from the query dag, uses a stack-based algorithm to find the matches of the root-to-leaf paths in the tree, and merge-joins the matches to compute the answer. Finally, the third one exploits multiple pointers of stack entries and a topological ordering of the query dag to apply a stack-based holistic technique. We analyze our algorithms and perform extensive experimental evaluations. Our experimental results show that the holistic algorithm outperforms the other ones. Our approaches are the first ones to efficiently evaluate this class of queries in the indexed streaming model.

Original languageEnglish (US)
Pages (from-to)441-474
Number of pages34
JournalWorld Wide Web
Volume13
Issue number4
DOIs
StatePublished - Sep 3 2010

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords

  • XML
  • XPath query evaluation

Fingerprint Dive into the research topics of 'Evaluation Techniques for Generalized Path Pattern Queries on XML Data'. Together they form a unique fingerprint.

Cite this