Abstract
Scientific data mining is the activity of finding significant information in scientific data. This paper presents an example of scientific data mining: the discovery of approximately common patterns in RNA secondary structures. We represent an RNA secondary structure by an ordered labeled tree based on a previously proposed scheme. The patterns in the trees are substructures that can differ in both substitutions and deletions/insertions of nodes of the trees. Our techniques incorporate approximate tree matching algorithms and novel heuristics for discovery and optimization. Experimental results obtained by running these algorithms on both generated data and RNA secondary structures demonstrate their good performance. The optimization heuristics speed up the discovery algorithm by a factor of 10, and the optimized approach is 100,000 times faster than the brute-force method.
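To make the notion of trees that "differ in both substitutions and deletions/insertions of nodes" concrete, the following minimal sketch computes a unit-cost edit distance between two ordered labeled trees using the generic textbook forest recurrence. It is not the paper's discovery algorithm or its optimization heuristics. The helper names `tree`, `forest_distance`, and `tree_distance`, the unit costs, and the node labels (`N`, `M`, `H`, `B`) are all assumptions chosen for illustration, not taken from the paper.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) tuple."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1

    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Match the two rightmost roots: substitution costs 1 if labels differ.
    substitute = (
        forest_distance(f_kids, g_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if f_label == g_label else 1)
    )
    return min(delete, insert, substitute)


def tree_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


# Hypothetical labels standing in for structural elements of an RNA molecule.
t1 = tree("N", tree("M", tree("H"), tree("B", tree("H"))))
t2 = tree("N", tree("M", tree("H"), tree("H")))

print(tree_distance(t1, t2))  # 1: deleting the B node turns t1 into t2
```

Under this sketch, two substructures would count as approximately matching when their distance falls within a chosen threshold; how the paper defines and searches for such patterns is described in the article itself.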
Original language | English (US) |
---|---|
Pages (from-to) | 77-96 |
Number of pages | 20 |
Journal | International Journal of Software Engineering and Knowledge Engineering |
Volume | 8 |
Issue number | 1 |
State | Published - Mar 1998 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Software
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Artificial Intelligence
Keywords
- Ordered labeled trees
- Pattern matching
- Query optimization heuristics
- RNA secondary structures
- Scientific databases