Identifying approximately common substructures in trees based on a restricted edit distance

Jason T.L. Wang, Kaizhong Zhang, Chia Yo Chang

Research output: Contribution to journalArticlepeer-review

9 Scopus citations

Abstract

In this paper we present a dynamic programming algorithm for identifying the largest approximately common substructures (LACSs) of two ordered labeled trees. We consider a substructure of a tree T to be a connected subgraph of T. Given two trees T1, T2 and an integer d, the LACS problem is to find a substructure U1 of T1 and a substructure U2 of T2 such that U1 is within distance d of U2 and where there does not exist any other substructure V1 of T1 and V2 of T2 such that V1 and V2 satisfy the distance restriction and the sum of the sizes of V1 and V2 is greater than the sum of the sizes of U1 and U2. The distance measure considered in the paper is a restricted edit distance originated from Selkow. The LACS problem is motivated by the studies of program and document comparison. The proposed algorithm solves the problem in time O(d2×|T1|×|T2|), which is as fast as the best known algorithm for calculating Selkow's distance of two trees when the distance allowed in the common substructures is a constant independent of the input trees.

Original languageEnglish (US)
Pages (from-to)367-386
Number of pages20
JournalInformation sciences
Volume121
Issue number3
DOIs
StatePublished - Dec 2 1999

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Fingerprint Dive into the research topics of 'Identifying approximately common substructures in trees based on a restricted edit distance'. Together they form a unique fingerprint.

Cite this