Abstract
Identifying or predicting coding sequences within genomic DNA has been a major rate-limiting step in the pursuit of genes. Currently available programs are far from powerful enough to elucidate gene structure completely. In this paper, we develop effective hidden Markov models (HMMs) to represent the consensus and degeneracy features of splicing junction sites in eukaryotic genes. The HMM system built from these models is fully trained using an expectation maximization (EM) algorithm, and its performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system correctly detects 92% of the true donor sites and 91.5% of the true acceptor sites in a test data set containing real vertebrate gene sequences. These results suggest that our approach provides a useful tool for discovering splicing junction sites in eukaryotic genes.
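
The evaluation protocol named in the abstract (10-way cross-validation, reported as the fraction of true sites detected) can be illustrated with a minimal sketch. The abstract does not describe how the folds were formed or how detection was scored, so the interleaved fold assignment and the helper names `k_fold_indices` and `sensitivity` below are assumptions made purely for illustration, not the authors' procedure.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_items-1 into k (train, test) partitions.

    Assumption: items are assigned to folds in an interleaved fashion;
    the paper's actual fold construction is not stated in the abstract.
    """
    if not 1 < k <= n_items:
        raise ValueError("k must satisfy 1 < k <= n_items")
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is every index that is not in the held-out fold.
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


def sensitivity(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of true sites that were detected: TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp + fn == 0:
        raise ValueError("no true sites to evaluate")
    return tp / (tp + fn)
```

In such a setup, a model would be trained on each `train` partition and scored with `sensitivity` on the matching `test` partition, with the per-fold scores then averaged.
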
Original language | English (US) |
---|---|
Pages (from-to) | 139-163 |
Number of pages | 25 |
Journal | Information Sciences |
Volume | 139 |
Issue number | 1-2 |
State | Published - Nov 2001 |
Event | Bioinformatics - Atlantic City, NJ, United States; Duration: Feb 27 2000 → Mar 3 2000 |
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence
Keywords
- Bioinformatics
- Computational biology
- Gene finding
- Hidden Markov models
- Splicing junction