Approximate tree matching in the presence of variable length don′t cares

Kaizhong Zhang, Dennis Shasha, Jason T.L. Wang

Research output: Contribution to journalArticlepeer-review

71 Scopus citations

Abstract

Ordered labeled trees are trees in which the sibling order matters. This paper presents algorithms for three problems having to do with approximate matching for such trees with variable length don′t cares (VLDCs). In strings, a VLDC symbol in the pattern may substitute for zero or more symbols in the data string. For example, if “com*er” is the pattern, then the “*” would substitute for the substring “put” when matching the data string “computer.” Approximate VLDC matching in strings means that after the best possible substitution, the pattern still need not be the same as the data string for a match to be allowed. For example, “com*er” matches “counter” within distance one (representing the cost of removing the “m” from “com*er” and having the “*” substitute for “unt”). We generalize approximate VLDC string matching to three algorithms for approximate VLDC matching on trees. The time complexity of our algorithms is O(|P| × |D| × min(depth(P), leaves(P)) × min(depth(D), leaves(D))) (where |P| and |D| are the number of nodes respectively of the pattern P and the data tree D), the same as for the best approximate tree matching algorithm without VLDCs previously reported.

Original languageEnglish (US)
Pages (from-to)33-66
Number of pages34
JournalJournal of Algorithms
Volume16
Issue number1
DOIs
StatePublished - Jan 1994

All Science Journal Classification (ASJC) codes

  • Control and Optimization
  • Computational Mathematics
  • Computational Theory and Mathematics

Fingerprint

Dive into the research topics of 'Approximate tree matching in the presence of variable length don′t cares'. Together they form a unique fingerprint.

Cite this