The wide adoption of XML has increased interest in data models based on tree-structured data. Querying capabilities are provided through tree-pattern queries (TPQs). The need to query tree-structured data sources whose structure is not fully known, and the need to integrate multiple data sources with different tree structures, have recently motivated query languages that relax the complete specification of a tree pattern. Assigning semantics to the queries of these languages so that they return meaningful answers is a challenging issue. In this paper, we introduce a query language that allows the specification of partial tree-pattern queries (PTPQs). The structure in a PTPQ can be specified fully, partially, or not at all. We define index graphs, which summarize the structural information of data trees. Using index graphs, we show that PTPQs can be evaluated by generating an equivalent set of "complete" TPQs. We suggest an original approach that exploits the set of complete TPQs of a PTPQ to assign meaningful semantics to the PTPQ language. In contrast to previous approaches, which operate locally on the data to compute meaningful answers (usually by computing lowest common ancestors), our approach operates globally on index graphs to detect meaningful complete TPQs. We implemented and experimentally evaluated our approach on DBLP-based data sets with irregularities. A comparison with previous approaches shows that it finds all the meaningful answers where the others fail (perfect recall). Further, it outperforms approaches with similar recall in excluding meaningless answers (better precision). Finally, it is superior to, and scales better than, the only previous approach that allows structural constraints in queries. Because our approach generates TPQs, it can be easily implemented on top of an XQuery engine.
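The idea of evaluating a structure-free query through a structural summary can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define index graphs, so the sketch assumes a much simpler summary, namely the set of distinct root-to-node label paths of a data tree. The function names `path_summary` and `complete_patterns` and the sample document `DOC` are hypothetical, chosen only for illustration. Each combination of summary paths matching the query labels stands in for one candidate "complete" pattern; deciding which candidates are meaningful is the part the paper addresses and is not attempted here.

```python
import xml.etree.ElementTree as ET
from itertools import product


def path_summary(root):
    """Collect every distinct root-to-node label path as a tuple of tags.

    Assumption: this path set is a simplified stand-in for an index graph.
    """
    paths = set()
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return paths


def complete_patterns(paths, labels):
    """Enumerate candidate fully specified patterns for a list of labels.

    For each query label, pick one summary path that ends in that label;
    every combination of picks is one candidate pattern.
    """
    choices = [sorted(p for p in paths if p[-1] == label) for label in labels]
    return [tuple(combo) for combo in product(*choices)]


# Hypothetical DBLP-like document with two record types.
DOC = """<dblp>
  <article><author/><title/></article>
  <book><author/><title/><editor/></book>
  <article><author/><title/></article>
</dblp>"""

summary = path_summary(ET.fromstring(DOC))
patterns = complete_patterns(summary, ["author", "title"])

for pattern in patterns:
    print(" , ".join("/".join(path) for path in pattern))
```

In this toy setting, the query on `author` and `title` yields candidates that pair an author and a title under the same record type as well as candidates that mix an `article` author with a `book` title. Filtering out the mixed candidates by reasoning over the summary, rather than over the data, is the kind of global decision the abstract describes.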
All Science Journal Classification (ASJC) codes
- Information Systems and Management
Keywords
- Keyword query
- Meaningful answer
- Partial tree-pattern query
- Query language semantics
- Structural summary of XML data