TY - GEN
T1 - Mining frequent agreement subtrees in phylogenetic databases
AU - Zhang, Sen
AU - Wang, Jason T.L.
PY - 2006
Y1 - 2006
N2 - We present a new data mining problem to discover frequent agreement subtree patterns from a database of rooted phylogenetic trees. This problem is a natural extension of the traditional MAST (maximum agreement subtree) problem. To solve the problem, we first present a novel canonical form for leaf-labeled trees and an efficient tree expansion algorithm for generating candidate subtrees level by level. We then show how to efficiently discover all frequent agreement subtrees from a given set of phylogenetic trees, through an Apriori-like data mining approach. We discuss the correctness and completeness of the proposed method. Experimental results demonstrate that the proposed method can discover interesting patterns from different phylogenetic trees for multiple species. The algorithms were implemented in C++ and integrated into an online toolkit, which is fully operational and accessible on the World Wide Web.
UR - http://www.scopus.com/inward/record.url?scp=33745457798&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745457798&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972764.20
DO - 10.1137/1.9781611972764.20
M3 - Conference contribution
AN - SCOPUS:33745457798
SN - 089871611X
SN - 9780898716115
T3 - Proceedings of the Sixth SIAM International Conference on Data Mining
SP - 222
EP - 233
BT - Proceedings of the Sixth SIAM International Conference on Data Mining
PB - Society for Industrial and Applied Mathematics
T2 - Sixth SIAM International Conference on Data Mining
Y2 - 20 April 2006 through 22 April 2006
ER -