Application of hidden Markov models to biological data mining: A case study

Michael M. Yin, Jason Wang

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

In this paper we present an example of biological data mining: the detection of splicing junction acceptors in eukaryotic genes. Identification or prediction of transcribed sequences from within genomic DNA has been a major rate-limiting step in the pursuit of genes. Currently available programs are far from powerful enough to elucidate gene structure completely. Here we develop a hidden Markov model (HMM) to represent the degeneracy features of splicing junction acceptor sites in eukaryotic genes. The HMM system is fully trained using an expectation maximization (EM) algorithm, and system performance is evaluated using the 10-way cross-validation method. Experimental results show that our HMM system correctly classifies more than 94% of the candidate sequences (including true and false acceptor sites) into the right categories. About 90% of the true acceptor sites and 96% of the false acceptor sites in the test data are classified correctly. These results are very promising considering that only local information in the DNA is used. The proposed model will be a very important component of an effective and accurate gene structure detection system currently being developed in our lab.
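
The abstract does not give the model's topology or parameters, so the following is only a minimal sketch of the general idea of classifying a candidate site with an HMM: score the sequence with the forward algorithm under an "acceptor" model and under a background model, and pick the higher likelihood. The names `forward_log_likelihood`, `classify`, `TOY_ACCEPTOR`, and `BACKGROUND`, the two-state layout, and every probability below are hypothetical illustrations, not the authors' system; in practice the parameters would be estimated with EM rather than written by hand.

```python
import math


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the scaled forward algorithm."""
    states = range(len(start))
    # Forward variables for the first symbol.
    alpha = [start[s] * emit[s][seq[0]] for s in states]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate one step through the transitions, then emit seq[t].
            alpha = [
                sum(alpha[r] * trans[r][s] for r in states) * emit[s][seq[t]]
                for s in states
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        # Rescale to avoid underflow and accumulate the log of the scale.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def classify(seq, acceptor_model, background_model):
    """Label a candidate as 'true' if the acceptor model explains it better."""
    score_acceptor = forward_log_likelihood(seq, *acceptor_model)
    score_background = forward_log_likelihood(seq, *background_model)
    return "true" if score_acceptor > score_background else "false"


# Hypothetical toy acceptor model: a pyrimidine-rich state followed by a
# purine-rich state. All numbers are made up for illustration.
TOY_ACCEPTOR = (
    [1.0, 0.0],                       # start probabilities
    [[0.8, 0.2], [0.0, 1.0]],         # transition probabilities
    [
        {"A": 0.05, "C": 0.45, "G": 0.05, "T": 0.45},
        {"A": 0.45, "C": 0.05, "G": 0.45, "T": 0.05},
    ],                                # emission probabilities per state
)

# Hypothetical background model: one state emitting bases uniformly.
BACKGROUND = (
    [1.0],
    [[1.0]],
    [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}],
)

if __name__ == "__main__":
    for candidate in ("TTTTCCAG", "ACGTACGT"):
        print(candidate, classify(candidate, TOY_ACCEPTOR, BACKGROUND))
```

Comparing two likelihoods is one common way to turn generative models into a classifier; the abstract does not say whether the authors used this decision rule or a different one, such as a threshold on a single model's score.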

Original language: English (US)
Pages (from-to): 352-358
Number of pages: 7
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 4057
State: Published - Jan 1 2000
Event: Data Mining and Knowledge Discovery: Theory, Tools, and Technology II - Orlando, FL, USA
Duration: Apr 24 2000 to Apr 25 2000

All Science Journal Classification (ASJC) codes

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
