Mining frequent agreement subtrees in phylogenetic databases

Sen Zhang, Jason T.L. Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

13 Scopus citations

Abstract

We present a new data mining problem to discover frequent agreement subtree patterns from a database of rooted phylogenetic trees. This problem is a natural extension of the traditional MAST (maximum agreement subtree) problem. To solve the problem, we first present a novel canonical form for leaf-labeled trees and an efficient tree expansion algorithm for generating candidate subtrees level by level. We then show how to efficiently discover all frequent agreement subtrees from a given set of phylogenetic trees, through an Apriori-like data mining approach. We discuss the correctness and completeness of the proposed method. Experimental results demonstrate that the proposed method can discover interesting patterns from different phylogenetic trees for multiple species. The algorithms were implemented in C++ and integrated into an online toolkit, which is fully operational and accessible on the World Wide Web.

Original languageEnglish (US)
Title of host publicationProceedings of the Sixth SIAM International Conference on Data Mining
PublisherSociety for Industrial and Applied Mathematics
Pages222-233
Number of pages12
ISBN (Print)089871611X, 9780898716115
DOIs
StatePublished - Jan 1 2006
EventSixth SIAM International Conference on Data Mining - Bethesda, MD, United States
Duration: Apr 20 2006Apr 22 2006

Publication series

NameProceedings of the Sixth SIAM International Conference on Data Mining
Volume2006

Other

OtherSixth SIAM International Conference on Data Mining
CountryUnited States
CityBethesda, MD
Period4/20/064/22/06

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Fingerprint Dive into the research topics of 'Mining frequent agreement subtrees in phylogenetic databases'. Together they form a unique fingerprint.

Cite this