Efficiently discovering most-specific mixed patterns from large data trees

Xiaoying Wu, Dimitri Theodoratos

Research output: Contribution to journal › Conference article › peer-review

Abstract

Discovering informative tree patterns hidden in large datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Mixed patterns can extract all the information that induced or embedded patterns extract, as well as more detailed information that neither of the other two can capture. Unfortunately, the problem of extracting unconstrained mixed patterns from data trees has not been addressed until now. In this paper, we address the problem of mining unordered frequent mixed patterns from large trees. We propose a novel approach that nonredundantly extracts most-specific mixed patterns. Our approach uses effective pruning techniques to reduce the pattern search space. It exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by older approaches. An extensive experimental evaluation shows that our approach not only mines mixed patterns from real and synthetic datasets up to several orders of magnitude faster than older state-of-the-art embedded tree mining algorithms applied to large data trees, but also scales well, enabling the extraction of informative mixed patterns from large datasets for which no previous approaches exist.
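
To make the distinction between induced, embedded, and mixed patterns concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a mixed pattern is a labeled tree whose edges are each marked as either a child edge or a descendant edge, and it assumes support is counted as the number of data nodes that can serve as the image of the pattern root under some homomorphism. The names `Node`, `PatternNode`, `matches`, and `root_support` are hypothetical, and the naive recursive check below does none of the pruning or incremental support computation described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the data tree (hypothetical representation)."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class PatternNode:
    """A pattern node; each edge is (axis, child), axis is 'child' or 'descendant'."""
    label: str
    edges: list = field(default_factory=list)


def descendants(node):
    """Yield every proper descendant of a data node."""
    for child in node.children:
        yield child
        yield from descendants(child)


def matches(p, d):
    """Return True if some homomorphism maps pattern node p to data node d."""
    if p.label != d.label:
        return False
    for axis, q in p.edges:
        # A child edge must map to a parent-child pair in the data tree,
        # a descendant edge to an ancestor-descendant pair.
        candidates = d.children if axis == "child" else descendants(d)
        if not any(matches(q, c) for c in candidates):
            return False
    return True


def root_support(pattern, tree):
    """Count data nodes that can be the image of the pattern root (assumed support)."""
    nodes = [tree, *descendants(tree)]
    return sum(1 for d in nodes if matches(pattern, d))


# Data tree: a( b(c), d( a( e( b(c) ) ) ) )
inner_a = Node("a", [Node("e", [Node("b", [Node("c")])])])
tree = Node("a", [Node("b", [Node("c")]), Node("d", [inner_a])])

# Induced pattern: only child edges.
induced = PatternNode("a", [("child", PatternNode("b"))])
# Embedded pattern: only descendant edges.
embedded = PatternNode("a", [("descendant", PatternNode("b"))])
# Mixed pattern: both kinds of edges in the same pattern.
mixed = PatternNode("a", [("child", PatternNode("e")),
                          ("descendant", PatternNode("b"))])

for name, pat in [("induced", induced), ("embedded", embedded), ("mixed", mixed)]:
    print(name, root_support(pattern=pat, tree=tree))
```

In this toy tree, the mixed pattern requires `e` to be a direct child of `a` while allowing `b` at any depth below `a`, a constraint that a purely induced or purely embedded pattern cannot express on its own.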

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
