TY - GEN
T1 - Leveraging homomorphisms and bitmaps to enable the mining of embedded patterns from large data trees
AU - Wu, Xiaoying
AU - Theodoratos, Dimitri
N1 - Publisher Copyright:
© 2015, Springer International Publishing Switzerland, All rights Reserved.
PY - 2015
Y1 - 2015
N2 - Finding interesting tree patterns hidden in large datasets is an important research area that has many practical applications. Along the years, research has evolved from mining induced patterns to mining embedded patterns. Embedded patterns allow for discovering useful relationships which cannot be captured by induced patterns. Unfortunately, previous contributions have focused almost exclusively on mining patterns from a set of small trees. The problem of mining embedded patterns from large data trees has been neglected. This is mainly due to the complexity of this task related to the problem of unordered tree embedding test being NP-Complete. However, mining embedded patterns from large trees is important for many modern applications that arise naturally and in particular with the explosion of big data. In this paper, we address the problem of mining unordered frequent embedded tree patterns from large trees. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally and avoids the costly enumeration of all pattern matchings required by previous approaches. A further originality of our approach is that matching information of already computed patterns is materialized as bitmaps. This technique not only minimizes the memory consumption but also reduces CPU costs by translating pattern evaluation to bitwise operations. An extensive experimental evaluation shows that our approach not only mines embedded patterns from real datasets up to several orders of magnitude faster than state-of-theart tree mining algorithms applied to large data trees but also scales well empowering the extraction of patterns from large datasets where previous approaches fail.
AB - Finding interesting tree patterns hidden in large datasets is an important research area that has many practical applications. Along the years, research has evolved from mining induced patterns to mining embedded patterns. Embedded patterns allow for discovering useful relationships which cannot be captured by induced patterns. Unfortunately, previous contributions have focused almost exclusively on mining patterns from a set of small trees. The problem of mining embedded patterns from large data trees has been neglected. This is mainly due to the complexity of this task related to the problem of unordered tree embedding test being NP-Complete. However, mining embedded patterns from large trees is important for many modern applications that arise naturally and in particular with the explosion of big data. In this paper, we address the problem of mining unordered frequent embedded tree patterns from large trees. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally and avoids the costly enumeration of all pattern matchings required by previous approaches. A further originality of our approach is that matching information of already computed patterns is materialized as bitmaps. This technique not only minimizes the memory consumption but also reduces CPU costs by translating pattern evaluation to bitwise operations. An extensive experimental evaluation shows that our approach not only mines embedded patterns from real datasets up to several orders of magnitude faster than state-of-theart tree mining algorithms applied to large data trees but also scales well empowering the extraction of patterns from large datasets where previous approaches fail.
KW - Bitmap view
KW - Holistic twig-join algorithm
KW - Tree pattern mining
UR - http://www.scopus.com/inward/record.url?scp=84942587295&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942587295&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-18120-2_1
DO - 10.1007/978-3-319-18120-2_1
M3 - Conference contribution
AN - SCOPUS:84942587295
SN - 9783319181196
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 20
BT - Database Systems for Advanced Applications - 20th International Conference, DASFAA 2015, Proceedings Hanoi, Vietnam, April 20-23, 2015 Proceedings, Part I
A2 - Shahabi, Cyrus
A2 - Cheema, Muhammad Aamir
A2 - Renz, Matthias
A2 - Zhou, Xiaofang
PB - Springer Verlag
T2 - 20th International Conference on Database Systems for Advanced Applications, DASFAA 2015
Y2 - 20 April 2015 through 23 April 2015
ER -