## Abstract

In this paper we present a dynamic programming algorithm for identifying the similar consensus (SC) (or the largest approximately common substructures) of two ordered labeled trees based on the alignment distance. We consider a substructure of a tree T to be a connected subgraph of T. Given two trees T_{1}, T_{2} and an integer d, the SC problem is to find a substructure U_{1} of T_{1} and a substructure U_{2} of T_{2} such that U_{1} is within distance d of U_{2} and where there does not exist any other substructure V_{1} of T_{1} and V_{2} of T_{2} such that V_{1} and V_{2} satisfy the distance constraint and the sum of the sizes of V_{1} and V_{2} is greater than the sum of the sizes of U_{1} and U_{2}. The proposed algorithm solves the SC problem in time O(d^{2} × |T_{1}| × |T_{2}| × (G_{1} + G_{2})^{2}), where |T_{i}|, i = 1, 2, is the size of tree T_{i} and G_{i} is the maximum degree of T_{i}. This is as fast as the best known algorithm for aligning two ordered labeled trees when the distance allowed in the common substructures is a constant independent of the input trees. To demonstrate the utility of our algorithm, we discuss its application to characterizing the conformation space of RNA sequences pertaining to the poliovirus, human rhinovirus and coxsackievirus, where the RNA structures are represented by ordered labeled trees.

Original language | English (US) |
---|---|

Pages (from-to) | 165-189 |

Number of pages | 25 |

Journal | Information sciences |

Volume | 126 |

Issue number | 1 |

DOIs | |

State | Published - Jul 2000 |

## All Science Journal Classification (ASJC) codes

- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence