Discovering frequent agreement subtrees from phylogenetic data

Sen Zhang, Jason T.L. Wang

Research output: Contribution to journalArticlepeer-review

21 Scopus citations

Abstract

We study a new data mining problem concerning the discovery of frequent agreement subtrees (FASTs) from a set of phylogenetic trees. A phylogenetic tree, or phylogeny, is an unordered tree in which the order among siblings is unimportant. Furthermore, each leaf in the tree has a label representing a taxon (species or organism) name whereas internal nodes are unlabeled. The tree may have a root, representing the common ancestor of all species in the tree, or may be unrooted. An unrooted phylogeny arises due to the lack of sufficient evidence to infer a common ancestor of the taxa in the tree. The FAST problem addressed here is a natural extension of the MAST (maximum agreement subtree) problem widely studied in the computational phylogenetics community. The paper establishes a framework for tackling the FAST problem for both rooted and unrooted phylogenetic trees using data mining techniques. We first develop a novel canonical form for rooted trees together with a phylogeny-aware tree expansion scheme for generating candidate subtrees level by level. Then we present an efficient algorithm to find all frequent agreement subtrees in a given set of rooted trees, through an Apriori-like approach. We show the correctness and completeness of the proposed method. Finally we discuss extensions of the techniques to unrooted trees. Experimental results demonstrate that the proposed methods work well, capable of finding interesting patterns in both synthetic data and real phylogenetic trees.

Original languageEnglish (US)
Pages (from-to)68-82
Number of pages15
JournalIEEE Transactions on Knowledge and Data Engineering
Volume20
Issue number1
DOIs
StatePublished - Jan 2008

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Keywords

  • Algorithmic design
  • Computational phylogenetics
  • Data mining
  • Evolutionary bioinformatics
  • Pattern discovery

Fingerprint

Dive into the research topics of 'Discovering frequent agreement subtrees from phylogenetic data'. Together they form a unique fingerprint.

Cite this