Mining genes in DNA using GeneScout

Michael M. Yin, Jason T.L. Wang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


In this paper, we present a new system, called GeneScout, for predicting gene structures in vertebrate genomic DNA. The system contains specially designed hidden Markov models (HMMs) for detecting functional sites, including protein-translation start sites and mRNA splice-junction donor and acceptor sites. Our main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to that of analyzing the paths in the graph G. A dynamic programming algorithm is used to find the optimal path in G. The proposed system is trained using an expectation-maximization (EM) algorithm, and its performance on vertebrate gene prediction is evaluated using 10-fold cross-validation. Experimental results indicate that the proposed system performs well and that it complements a widely used gene detection system.
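The graph formulation in the abstract can be illustrated with a minimal sketch. It is not GeneScout's implementation: the function name `best_path`, the node labels, and the edge scores are all hypothetical, and the scores merely stand in for whatever site and region scores an HMM-based system would assign. The sketch only shows the general technique of finding the highest-scoring path in a directed acyclic graph by dynamic programming over a topological ordering.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, float]  # (successor node, score of the edge)


def best_path(
    graph: Dict[str, List[Edge]],
    order: List[str],
    source: str,
    sink: str,
) -> Tuple[float, List[str]]:
    """Return (score, path) for the highest-scoring source-to-sink path.

    `order` must be a topological ordering of the nodes in `graph`.
    """
    neg_inf = float("-inf")
    best = {node: neg_inf for node in order}  # best score reaching each node
    back: Dict[str, str] = {}                 # predecessor on the best path
    best[source] = 0.0

    # Relax outgoing edges in topological order: when a node is visited,
    # its own best score is already final.
    for node in order:
        if best[node] == neg_inf:
            continue  # not reachable from the source
        for succ, score in graph.get(node, []):
            candidate = best[node] + score
            if candidate > best[succ]:
                best[succ] = candidate
                back[succ] = node

    if best[sink] == neg_inf:
        raise ValueError("sink is not reachable from source")

    # Follow the back-pointers from the sink to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    path.reverse()
    return best[sink], path


# Hypothetical candidate sites (label@position) with made-up log-scores.
toy_graph: Dict[str, List[Edge]] = {
    "begin": [("start@12", -1.0)],
    "start@12": [("donor@80", -2.0), ("donor@95", -3.0)],
    "donor@80": [("acceptor@150", -2.0)],
    "donor@95": [("acceptor@150", -0.5)],
    "acceptor@150": [("stop@210", -1.0)],
    "stop@210": [("end", 0.0)],
    "end": [],
}
toy_order = [
    "begin", "start@12", "donor@80", "donor@95",
    "acceptor@150", "stop@210", "end",
]

score, path = best_path(toy_graph, toy_order, "begin", "end")
print(score, path)
```

In this toy graph the donor candidate with the worse incoming score ends up on the best overall path, which is why the whole path is scored rather than each site being chosen greedily.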

Original language: English (US)
Title of host publication: Proceedings - 2002 IEEE International Conference on Data Mining, ICDM 2002
Number of pages: 4
State: Published - 2002
Externally published: Yes
Event: 2nd IEEE International Conference on Data Mining, ICDM '02 - Maebashi, Japan
Duration: Dec 9 2002 to Dec 12 2002

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786


Other: 2nd IEEE International Conference on Data Mining, ICDM '02

All Science Journal Classification (ASJC) codes

  • General Engineering


  • Bioinformatics
  • Data mining
  • Gene finding
  • Hidden Markov models
  • Knowledge discovery

