Abstract
Discovering informative tree patterns hidden in large datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Mixed patterns capture all the information that embedded or induced patterns can extract, as well as more detailed information that neither of the other two can extract. Unfortunately, the problem of extracting unconstrained mixed patterns from data trees has not been addressed until now. In this paper, we address the problem of mining unordered frequent mixed patterns from large trees. We propose a novel approach that extracts most-specific mixed patterns without redundancy. Our approach uses effective pruning techniques to reduce the pattern search space. It exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by older approaches. An extensive experimental evaluation shows that our approach not only mines mixed patterns from real and synthetic datasets up to several orders of magnitude faster than older state-of-the-art embedded tree mining algorithms applied to large data trees, but also scales well, enabling the extraction of informative mixed patterns from large datasets for which no previous approaches exist.
Original language | English (US) |
---|---|
Pages (from-to) | 279-294 |
Number of pages | 16 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 10177 LNCS |
State | Published - 2017 |
Event | 22nd International Conference on Database Systems for Advanced Applications, DASFAA 2017 - Suzhou, China; Duration: Mar 27 2017 → Mar 30 2017 |
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- General Computer Science