Abstract
Graph pattern matching is a fundamental operation for exploring and analyzing large graphs, which are widely used to represent entities and their relationships in a plethora of applications. However, existing algorithms do not perform satisfactorily as they generate a large number of intermediate results. In this paper, we introduce a novel framework to address the problem of efficiently finding homomorphic matches of hybrid tree-patterns over large graphs. These are patterns that can include two types of edges: child edges (which are mapped to edges in the data graph) and/or descendant edges (which are mapped to paths in the data graph), thus allowing for higher expressiveness and flexibility in query formulation. We introduce the concept of answer graph to compactly represent the query results and exploit computation sharing. Leveraging the answer graph, our approach is able to avoid the generation of intermediate results not participating in the query answer, and can produce the query answer in time linear on the size of the input data graph and the output. Moreover, the answer graph allows query counting without explicitly enumerating query results. We design a bottom-up algorithm (BUP) and a simulation-based algorithm (SIM) for building the answer graph, each having their own strengths in different cases. We also present two algorithms for enumerating the results using the answer graph. Our extensive experimental evaluation on real and synthetic datasets shows that our proposed techniques greatly outperform the state-of-the-art graph matching algorithms as well as a popular graph DBMS on evaluating various hybrid tree-pattern queries. Our source code, datasets and queries are publicly available at https://github.com/wuxyng/TwigMatchOnGraph.
Original language | English (US) |
---|---|
Article number | 102249 |
Journal | Information Systems |
Volume | 117 |
DOIs | |
State | Published - Jul 2023 |
All Science Journal Classification (ASJC) codes
- Software
- Information Systems
- Hardware and Architecture
Keywords
- Data graph
- Graph homomorphism
- Graph simulation
- Hybrid tree pattern
- Pattern matching