A novel framework for the efficient evaluation of hybrid tree-pattern queries on large data graphs

Xiaoying Wu, Dimitri Theodoratos, Dimitrios Skoutas, Michael Lan

Research output: Contribution to journalArticlepeer-review

Abstract

Graph pattern matching is a fundamental operation for exploring and analyzing large graphs, which are widely used to represent entities and their relationships in a plethora of applications. However, existing algorithms do not perform satisfactorily as they generate a large number of intermediate results. In this paper, we introduce a novel framework to address the problem of efficiently finding homomorphic matches of hybrid tree-patterns over large graphs. These are patterns that can include two types of edges: child edges (which are mapped to edges in the data graph) and/or descendant edges (which are mapped to paths in the data graph), thus allowing for higher expressiveness and flexibility in query formulation. We introduce the concept of answer graph to compactly represent the query results and exploit computation sharing. Leveraging the answer graph, our approach is able to avoid the generation of intermediate results not participating in the query answer, and can produce the query answer in time linear on the size of the input data graph and the output. Moreover, the answer graph allows query counting without explicitly enumerating query results. We design a bottom-up algorithm (BUP) and a simulation-based algorithm (SIM) for building the answer graph, each having their own strengths in different cases. We also present two algorithms for enumerating the results using the answer graph. Our extensive experimental evaluation on real and synthetic datasets shows that our proposed techniques greatly outperform the state-of-the-art graph matching algorithms as well as a popular graph DBMS on evaluating various hybrid tree-pattern queries. Our source code, datasets and queries are publicly available at https://github.com/wuxyng/TwigMatchOnGraph.

Original languageEnglish (US)
Article number102249
JournalInformation Systems
Volume117
DOIs
StatePublished - Jul 2023

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords

  • Data graph
  • Graph homomorphism
  • Graph simulation
  • Hybrid tree pattern
  • Pattern matching

Fingerprint

Dive into the research topics of 'A novel framework for the efficient evaluation of hybrid tree-pattern queries on large data graphs'. Together they form a unique fingerprint.

Cite this