## Abstract

In this paper we present a dynamic programming algorithm for identifying the largest approximately common substructures (LACSs) of two ordered labeled trees. We consider a substructure of a tree T to be a connected subgraph of T. Given two trees T_{1}, T_{2} and an integer d, the LACS problem is to find a substructure U_{1} of T_{1} and a substructure U_{2} of T_{2} such that U_{1} is within distance d of U_{2} and where there does not exist any other substructure V_{1} of T_{1} and V_{2} of T_{2} such that V_{1} and V_{2} satisfy the distance restriction and the sum of the sizes of V_{1} and V_{2} is greater than the sum of the sizes of U_{1} and U_{2}. The distance measure considered in the paper is a restricted edit distance originated from Selkow. The LACS problem is motivated by the studies of program and document comparison. The proposed algorithm solves the problem in time O(d^{2}×|T_{1}|×|T_{2}|), which is as fast as the best known algorithm for calculating Selkow's distance of two trees when the distance allowed in the common substructures is a constant independent of the input trees.

Original language | English (US) |
---|---|

Pages (from-to) | 367-386 |

Number of pages | 20 |

Journal | Information sciences |

Volume | 121 |

Issue number | 3 |

DOIs | |

State | Published - Dec 2 1999 |

## All Science Journal Classification (ASJC) codes

- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence