The goal of this research project is to make it possible to process approximate queries on combinatorial structures such as trees and graphs in a fast (i.e., sublinear with respect to the number of objects being searched) and efficient manner. The ultimate goal is to be as fast as approximate keyword search in text documents. The approach consists of: (1) discovering the best heuristics for answering these NP-complete problems in polynomial time with high quality, and (2) discovering data structures to make queries on thousands or millions of such objects fast. The results of this project will provide techniques and software to search for patterns among graph and tree structured data. Possible applications of such a search engine include searches among proteins, chemical compounds, neuroanatomical structures, Web/text filters and XML documents. The techniques developed in this research are in particular suitable for bioinformatics and biocomputing applications. Professors Shasha and Wang plan to collaborate with five additional researchers in these areas: Jack Collins, National Cancer Institute working in small molecule-protein docking for drug design; Michael Donohue, Professor of Biology and Director of the Harvard University Herbaria planning to apply this research to phylogenic trees; Bruce Shapiro, National Cancer Institute developing algorithms and computational systems for determining structure/function of nucleic acids; Cathy Wu, National Biomedical Research Foundation doing research in analysis and classification of protein sequences; and Daniel Zaharevitz, National Cancer Institute whose goal is to make biological and structural data more available to the research community via better search capabilities. These collaborators help to motivate and validate the tree and graph matching tools and algorithms development for biological applications.
|Effective start/end date||9/15/00 → 8/31/04|
- National Science Foundation: $194,656.00