Scientific data classification is the task of determining whether an unlabeled scientific object belongs to an existing class. It is an important operation in the management of scientific databases. In this paper we present a case study in scientific data classification: a tool for classifying DNA sequences. The tool works by generating and matching gapped fingerprints of DNA sequences. Experimental results from applying the tool to a set of Alu sequences demonstrate its good performance. While the reported research focuses on DNA classification, our techniques should generalize to any domain (e.g., multimedia) where data are naturally represented as sequences.
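The abstract does not define "gapped fingerprint", so the following is only an illustrative sketch under one assumed reading: a fingerprint is a pair of short segments that occur in order in a sequence, separated by a small gap of arbitrary content. The function names, the segment length `k`, the gap bound `max_gap`, the Jaccard similarity measure, and the decision threshold are all hypothetical choices made for this example and are not taken from the paper.

```python
def gapped_fingerprints(seq, k=3, max_gap=2):
    """Collect (left, right) segment pairs of length k that occur in order
    in seq with 0..max_gap arbitrary characters between them.
    Assumed definition, chosen for illustration only."""
    fingerprints = set()
    n = len(seq)
    for i in range(n - k + 1):
        left = seq[i:i + k]
        for gap in range(max_gap + 1):
            j = i + k + gap          # start of the right-hand segment
            if j + k > n:            # right segment would run off the end
                break
            fingerprints.add((left, seq[j:j + k]))
    return fingerprints


def similarity(a, b):
    """Jaccard similarity between two fingerprint sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def belongs_to_class(seq, class_members, threshold=0.3, k=3, max_gap=2):
    """Decide membership by the best fingerprint match against any known
    member of the class. The threshold is a hypothetical parameter."""
    query = gapped_fingerprints(seq, k, max_gap)
    best = max(
        (similarity(query, gapped_fingerprints(m, k, max_gap))
         for m in class_members),
        default=0.0,
    )
    return best >= threshold


# Toy usage with made-up sequences (not real Alu data).
known_members = ["ACGTACGTAC"]
same_family = belongs_to_class("ACGTACGTAC", known_members)
unrelated = belongs_to_class("TTTTTTTTTT", known_members)
```

Because the gap length itself is not stored in the fingerprint, two sequences that differ by a short insertion between the segments can still share fingerprints, which is the intuition this sketch tries to convey.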
Proceedings of the International Conference on Tools with Artificial Intelligence
Published - Dec 1 1997