TY - GEN
T1 - Efficiently mining homomorphic patterns from large data trees
AU - Wu, Xiaoying
AU - Theodoratos, Dimitri
AU - Peng, Zhiyong
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
N2 - Finding interesting tree patterns hidden in large datasets is a central topic in data mining with many practical applications. Unfortunately, previous contributions have focused almost exclusively on mining induced patterns from a set of small trees. The problem of mining homomorphic patterns from a large data tree has been neglected. This is mainly due to the challenging unbounded redundancy that homomorphic tree patterns can display. However, mining homomorphic patterns allows for discovering large patterns which cannot be extracted when mining induced or embedded patterns. Large patterns better characterize big trees which are important for many modern applications in particular with the explosion of big data. In this paper, we address the problem of mining frequent homomorphic tree patterns from a single large tree. We propose a novel approach that extracts non-redundant maximal homomorphic patterns. Our approach employs an incremental frequency computation method that avoids the costly enumeration of all pattern matchings required by previous approaches. Matching information of already computed patterns is materialized as bitmaps a technique that not only minimizes the memory consumption but also the CPU time. We conduct detailed experiments to test the performance and scalability of our approach. The experimental evaluation shows that our approach mines larger patterns and extracts maximal homomorphic patterns from real datasets outperforming stateof- the-art embedded tree mining algorithms applied to a large data tree.
AB - Finding interesting tree patterns hidden in large datasets is a central topic in data mining with many practical applications. Unfortunately, previous contributions have focused almost exclusively on mining induced patterns from a set of small trees. The problem of mining homomorphic patterns from a large data tree has been neglected. This is mainly due to the challenging unbounded redundancy that homomorphic tree patterns can display. However, mining homomorphic patterns allows for discovering large patterns which cannot be extracted when mining induced or embedded patterns. Large patterns better characterize big trees which are important for many modern applications in particular with the explosion of big data. In this paper, we address the problem of mining frequent homomorphic tree patterns from a single large tree. We propose a novel approach that extracts non-redundant maximal homomorphic patterns. Our approach employs an incremental frequency computation method that avoids the costly enumeration of all pattern matchings required by previous approaches. Matching information of already computed patterns is materialized as bitmaps a technique that not only minimizes the memory consumption but also the CPU time. We conduct detailed experiments to test the performance and scalability of our approach. The experimental evaluation shows that our approach mines larger patterns and extracts maximal homomorphic patterns from real datasets outperforming stateof- the-art embedded tree mining algorithms applied to a large data tree.
UR - http://www.scopus.com/inward/record.url?scp=84962339307&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962339307&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-32025-0_12
DO - 10.1007/978-3-319-32025-0_12
M3 - Conference contribution
AN - SCOPUS:84962339307
SN - 9783319320243
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 180
EP - 196
BT - Database Systems for Advanced Applications - 21st International Conference, DASFAA 2016, Proceedings
A2 - Navathe, Shamkant B.
A2 - Wu, Weili
A2 - Shekhar, Shashi
A2 - Du, Xiaoyong
A2 - Xiong, Hui
A2 - Wang, X. Sean
PB - Springer Verlag
T2 - 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016
Y2 - 16 April 2016 through 19 April 2016
ER -