DNA sequence classification via an expectation maximization algorithm and neural networks: A case study

Qicheng Ma, Jason T.L. Wang, Dennis Shasha, Cathy H. Wu

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

This paper presents new techniques for biosequence classification, with a focus on recognizing E. coli promoters in DNA. Specifically, given an unlabeled DNA sequence S, we want to determine whether or not S is an E. coli promoter. We use an expectation maximization (EM) algorithm to locate the -35 and -10 binding sites in an E. coli promoter sequence. Our EM algorithm differs from previously published EM algorithms in how it treats the lengths of two spacers: the spacer between the -35 and -10 binding sites, and the spacer between the -10 binding site and the transcriptional start site. Instead of assuming a uniform distribution for these lengths, our algorithm deduces their probability distribution. Based on the located binding sites, we select features in each E. coli promoter sequence according to their information content and represent the features using an orthogonal encoding method. We then feed the features to a neural network (NN) for promoter recognition. Empirical studies show that the proposed approach achieves good performance on different datasets.
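The abstract does not spell out the feature pipeline, so the following is only a minimal sketch under stated assumptions. It assumes that "orthogonal encoding" means the common one-hot code of four bits per nucleotide, and that information content is measured per position of a set of aligned binding sites against a uniform background (2 minus the Shannon entropy, in bits). The function names, the top-k selection rule, and the toy input sites are hypothetical illustrations, not the authors' method or data.

```python
import math
from collections import Counter

BASES = "ACGT"  # fixed base order for the one-hot (orthogonal) code


def orthogonal_encode(seq):
    """One-hot encode a DNA string: A=1000, C=0100, G=0010, T=0001.
    Any other symbol (e.g. N) becomes 0000. Returns a flat list of ints."""
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def information_content(aligned_sites):
    """Per-position information content in bits, assuming a uniform
    background over A, C, G, T: IC_i = 2 - H_i (hypothetical choice)."""
    scores = []
    for i in range(len(aligned_sites[0])):
        counts = Counter(site[i] for site in aligned_sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        scores.append(2.0 - entropy)
    return scores


def select_positions(aligned_sites, k):
    """Hypothetical rule: keep the k most informative positions,
    returned in left-to-right order (ties keep the lower index)."""
    ic = information_content(aligned_sites)
    ranked = sorted(range(len(ic)), key=lambda i: ic[i], reverse=True)
    return sorted(ranked[:k])


def features(seq, positions):
    """Orthogonally encode only the selected positions of a located site."""
    return orthogonal_encode("".join(seq[i] for i in positions))


if __name__ == "__main__":
    # Toy, made-up aligned -10 sites, for illustration only.
    sites = ["TATAAT", "TATAAT", "TAGAAT", "TACAAT"]
    keep = select_positions(sites, 5)
    print(keep)  # [0, 1, 3, 4, 5]: position 2 varies, so it is dropped
    print(features("TATAAT", keep))
```

In this sketch each retained position contributes four input units, so a network fed these features would have 4k inputs per located site. A fully conserved position scores 2 bits and is kept first; the variable third position of the toy sites scores 0.5 bits and is the one discarded.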

Original language: English (US)
Pages (from-to): 468-475
Number of pages: 8
Journal: IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews
Volume: 31
Issue number: 4
State: Published - Nov 2001
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering

Keywords

  • Bayesian inference
  • Bioinformatics
  • Data mining
  • Expectation maximization (EM)
  • Neural networks (NNs)
  • Promoter recognition
