TY - JOUR
T1 - Combinatorial perspectives on Dollo-k characters in phylogenetics
AU - Bouckaert, Remco
AU - Fischer, Mareike
AU - Wicke, Kristina
N1 - Publisher Copyright:
© 2021 Elsevier Inc.
PY - 2021/10
Y1 - 2021/10
N2 - Recently, the perfect phylogeny model with persistent characters has attracted great attention in the literature. It is based on the assumption that complex traits or characters can only be gained once and lost once in the course of evolution. Here, we consider a generalization of this model, namely Dollo parsimony, that allows for multiple character losses. More precisely, we take a combinatorial perspective on the notion of Dollo-k characters, i.e. traits that are gained at most once and lost precisely k times throughout evolution. We first introduce an algorithm based on the notion of spanning subtrees for finding a Dollo-k labelling for a given character and a given tree in linear time. We then compare persistent characters (consisting of the union of Dollo-0 and Dollo-1 characters) and general Dollo-k characters. While it is known that there is a strong connection between Fitch parsimony and persistent characters, we show that Dollo parsimony and Fitch parsimony are in general very different. Moreover, while it is known that there is a direct relationship between the number of persistent characters and the Sackin index of a tree, a popular index of tree balance, we show that this relationship does not generalize to Dollo-k characters. In fact, determining the number of Dollo-k characters for a given tree is much more involved than counting persistent characters, and we end this manuscript by introducing a recursive approach for the former. This approach leads to a polynomial time algorithm for counting the number of Dollo-k characters, and both this algorithm as well as the algorithm for computing Dollo-k labellings are publicly available in the Babel package for BEAST 2.
AB - Recently, the perfect phylogeny model with persistent characters has attracted great attention in the literature. It is based on the assumption that complex traits or characters can only be gained once and lost once in the course of evolution. Here, we consider a generalization of this model, namely Dollo parsimony, that allows for multiple character losses. More precisely, we take a combinatorial perspective on the notion of Dollo-k characters, i.e. traits that are gained at most once and lost precisely k times throughout evolution. We first introduce an algorithm based on the notion of spanning subtrees for finding a Dollo-k labelling for a given character and a given tree in linear time. We then compare persistent characters (consisting of the union of Dollo-0 and Dollo-1 characters) and general Dollo-k characters. While it is known that there is a strong connection between Fitch parsimony and persistent characters, we show that Dollo parsimony and Fitch parsimony are in general very different. Moreover, while it is known that there is a direct relationship between the number of persistent characters and the Sackin index of a tree, a popular index of tree balance, we show that this relationship does not generalize to Dollo-k characters. In fact, determining the number of Dollo-k characters for a given tree is much more involved than counting persistent characters, and we end this manuscript by introducing a recursive approach for the former. This approach leads to a polynomial time algorithm for counting the number of Dollo-k characters, and both this algorithm as well as the algorithm for computing Dollo-k labellings are publicly available in the Babel package for BEAST 2.
KW - Dollo parsimony
KW - Fitch algorithm
KW - Persistent characters
KW - Sackin index
KW - Spanning trees
UR - http://www.scopus.com/inward/record.url?scp=85111663308&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111663308&partnerID=8YFLogxK
U2 - 10.1016/j.aam.2021.102252
DO - 10.1016/j.aam.2021.102252
M3 - Article
AN - SCOPUS:85111663308
SN - 0196-8858
VL - 131
JO - Advances in Applied Mathematics
JF - Advances in Applied Mathematics
M1 - 102252
ER -