TY - JOUR
T1 - Discovering closed and maximal embedded patterns from large tree data
AU - Wu, Xiaoying
AU - Theodoratos, Dimitri
AU - Mamoulis, Nikos
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/5
Y1 - 2021/5
N2 - Many current applications and systems produce large tree datasets and export, exchange, and represent data in tree-structured form. Extracting informative patterns from large data trees is an important research direction with multiple applications in practice. Pattern mining research initially focused on mining induced patterns and gradually evolved into mining embedded patterns. A well-known problem of frequent pattern mining is the huge number of patterns it produces. This affects not only the efficiency but also the effectiveness of mining. A typical solution to this problem is to summarize frequent patterns through closed and maximal patterns. No previous work addresses the problem of mining closed and/or maximal embedded tree patterns, not even in the framework of mining multiple small trees. We address the problem of summarizing embedded tree patterns extracted from large data trees, by defining and mining closed and maximal embedded unordered tree patterns. We design an embedded frequent pattern mining algorithm extended with a local closedness checking technique. This algorithm is called closedEmbTM-eager as it eagerly eliminates non-closed patterns. To mitigate the generation of intermediate patterns, we devise pattern search space pruning rules to proactively detect and prune branches in the pattern search space which do not correspond to closed patterns. The pruning rules are accommodated into the extended embedded pattern miner to produce a new algorithm, called closedEmbTM-prune, for mining all the closed and maximal embedded frequent patterns. Our extensive experiments on synthetic and real large-tree datasets demonstrate that, on dense datasets, closedEmbTM-prune not only generates a complete closed and maximal pattern set which is substantially smaller than that generated by the embedded pattern miner, but also runs much faster with negligible overhead on pattern pruning.
AB - Many current applications and systems produce large tree datasets and export, exchange, and represent data in tree-structured form. Extracting informative patterns from large data trees is an important research direction with multiple applications in practice. Pattern mining research initially focused on mining induced patterns and gradually evolved into mining embedded patterns. A well-known problem of frequent pattern mining is the huge number of patterns it produces. This affects not only the efficiency but also the effectiveness of mining. A typical solution to this problem is to summarize frequent patterns through closed and maximal patterns. No previous work addresses the problem of mining closed and/or maximal embedded tree patterns, not even in the framework of mining multiple small trees. We address the problem of summarizing embedded tree patterns extracted from large data trees, by defining and mining closed and maximal embedded unordered tree patterns. We design an embedded frequent pattern mining algorithm extended with a local closedness checking technique. This algorithm is called closedEmbTM-eager as it eagerly eliminates non-closed patterns. To mitigate the generation of intermediate patterns, we devise pattern search space pruning rules to proactively detect and prune branches in the pattern search space which do not correspond to closed patterns. The pruning rules are accommodated into the extended embedded pattern miner to produce a new algorithm, called closedEmbTM-prune, for mining all the closed and maximal embedded frequent patterns. Our extensive experiments on synthetic and real large-tree datasets demonstrate that, on dense datasets, closedEmbTM-prune not only generates a complete closed and maximal pattern set which is substantially smaller than that generated by the embedded pattern miner, but also runs much faster with negligible overhead on pattern pruning.
KW - Embedded tree pattern
KW - Frequent pattern mining
KW - Hierarchical graph data
KW - Maximal and closed frequent pattern
KW - Pattern summarization
UR - http://www.scopus.com/inward/record.url?scp=85105693215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105693215&partnerID=8YFLogxK
U2 - 10.1016/j.datak.2021.101890
DO - 10.1016/j.datak.2021.101890
M3 - Article
AN - SCOPUS:85105693215
SN - 0169-023X
VL - 133
JO - Data and Knowledge Engineering
JF - Data and Knowledge Engineering
M1 - 101890
ER -