Pattern Discovery in Combinatorial Databases: Algorithms, Applications, and Software for the Scientific Community

Project: Research project

Project Details


This is an interinstitutional collaborative project. Combinatorial data consisting of sequences, trees, and graphs arise in many scientific disciplines. For example, the primary structure of a protein is a sequence, whereas its tertiary structure is a graph. Comparing such data to find similarities entails the use of a 'distance metric' that measures the difference between two data items. Numerous distance metrics are possible. This work consists primarily of (i) inventing efficient ways to compute known distance metrics; (ii) developing a data structure to decide which of a set of data items is 'closest' (according to a given distance metric) to a new data item; (iii) developing techniques and software for discovering patterns with minimum or near-minimum distance to a given set of data items with respect to a given distance metric; and (iv) developing software to solve such discovery problems on networks of occasionally idle workstations. This work will help every field in which approximate matching is important. Significant applications are expected in molecular biology and rational drug design, as well as in finding patterns in linguistic strings.
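As an illustration of what a 'distance metric' and a 'closest' item can mean for sequence data, the sketch below uses the classic edit (Levenshtein) distance and a plain linear scan. Both are assumptions chosen for brevity: the abstract does not name a specific metric, and the project's proposed data structure is not described here. The function names `edit_distance` and `closest` are hypothetical.

```python
from typing import Iterable, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from the first i items of a to the empty sequence
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def closest(query: Sequence, items: Iterable[Sequence]) -> Sequence:
    """Return the stored item with the smallest edit distance to query (linear scan)."""
    return min(items, key=lambda item: edit_distance(query, item))


if __name__ == "__main__":
    database = ["TTTT", "ACGA", "GGGG"]  # illustrative data only
    print(closest("ACGT", database))
```

A linear scan computes the distance to every stored item; an index structure of the kind mentioned in item (ii) would aim to avoid most of those computations.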

Effective start/end date: 8/1/96 to 1/31/00


  • National Science Foundation: $207,532.00

