Effective hidden Markov models for detecting splicing junction sites in DNA sequences

Michael M. Yin, Jason T.L. Wang

Research output: Contribution to journal › Conference article › peer-review

32 Scopus citations

Abstract

Identification or prediction of coding sequences from within genomic DNA has been a major rate-limiting step in the pursuit of genes. Programs currently available are far from being powerful enough to elucidate the gene structure completely. In this paper, we develop effective hidden Markov models (HMMs) to represent the consensus and degeneracy features of splicing junction sites in eukaryotic genes. Our HMM system based on the developed HMMs is fully trained using an expectation maximization (EM) algorithm and the system performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system can correctly detect 92% of the true donor sites and 91.5% of the true acceptor sites in the test data set containing real vertebrate gene sequences. These results suggest that our approach provides a useful tool in discovering the splicing junction sites in eukaryotic genes.
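As a rough illustration of how an HMM can score a candidate junction site, the sketch below computes the log-likelihood of a short DNA window with the standard forward algorithm. It is not the model from the paper: the abstract does not give the topology or parameters, so the four states (`exon`, `G`, `T`, `intron`) and every probability here are invented for demonstration, and EM training is not shown.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | model) using the forward algorithm.

    start[i]    : probability that the first state is i
    trans[i][j] : probability of moving from state i to state j
    emit[i][c]  : probability that state i emits nucleotide c
    """
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][seq[0]]) for i in range(n)]
    for c in seq[1:]:
        alpha = [
            log_sum_exp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j][c])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


# Hypothetical toy model (all numbers are assumptions, not from the paper).
# States: 0 = exon background, 1 = consensus G, 2 = consensus T, 3 = intron.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
mostly_g = {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01}
mostly_t = {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97}

start = [1.0, 0.0, 0.0, 0.0]
trans = [
    [0.8, 0.2, 0.0, 0.0],  # exon stays in exon or enters the junction
    [0.0, 0.0, 1.0, 0.0],  # G is always followed by the T state
    [0.0, 0.0, 0.0, 1.0],  # T is always followed by intron
    [0.0, 0.0, 0.0, 1.0],  # intron stays in intron
]
emit = [uniform, mostly_g, mostly_t, uniform]

donor_like = forward_log_likelihood("CAGGTAAG", start, trans, emit)
decoy = forward_log_likelihood("CAGCCAAG", start, trans, emit)
print("window with GT scores higher:", donor_like > decoy)
```

In a real detector, the parameters would be estimated from labelled sequences (for example with EM, as the abstract describes) and a decision threshold would be tuned under cross-validation rather than fixed by hand.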

Original language: English (US)
Pages (from-to): 139-163
Number of pages: 25
Journal: Information Sciences
Volume: 139
Issue number: 1-2
State: Published - Nov 1 2001
Event: Bioinformatics - Atlantic City, NJ, United States
Duration: Feb 27 2000 → Mar 3 2000

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Keywords

  • Bioinformatics
  • Computational biology
  • Gene finding
  • Hidden Markov models
  • Splicing junction
