Computational methods for identifying number of clusters in gene expression data

Habtom Ressom, Dali Wang, Padma Natarajan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the advent of microarray technology, there is a growing need to reliably extract biologically significant information from massive gene expression data. Clustering is one of the key steps in analyzing gene expression data: it identifies groups of genes that manifest similar expression patterns. Many algorithms for clustering gene expression data have been reported in the literature. However, there has been limited progress on cluster validation and on identifying the number of clusters present in gene expression data. In this paper, we investigate the relative merits of four algorithms in clustering two gene expression data sets. The clustering methods we investigated are the popular self-organizing maps (SOM), adaptive double self-organizing maps (ADSOM), fuzzy c-means (FCM), and a model-based clustering method. Their corresponding clusters are validated using the figure of merit (FOM), a hierarchical tree-based index, the Xie-Beni index, which gives a measure of compactness and separation of clusters, and an approximation called the Bayesian information criterion (BIC). Our intent is to provide a useful guide for choosing the appropriate computational method for identifying the number of clusters in gene expression data analysis. We observed that ADSOM outperformed the three other clustering methods in detecting the number of clusters present in the two gene expression data sets.
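As an illustration of the "compactness and separation" idea behind the Xie-Beni index mentioned in the abstract, here is a minimal Python sketch of the index in its commonly cited form: the membership-weighted within-cluster scatter divided by the number of points times the smallest squared distance between two cluster centers. The abstract does not give the paper's exact formulation, so the function names, the fuzzifier value m = 2, and the toy data are assumptions made only for this example. Lower values indicate more compact, better separated clusters.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni_index(data, centers, memberships, m=2.0):
    """Xie-Beni index (illustrative sketch, not the paper's code).

    data        : list of n points, each a list of floats
    centers     : list of c cluster centers, c >= 2
    memberships : c x n matrix; memberships[i][k] is the degree to
                  which point k belongs to cluster i
    m           : fuzzifier exponent (assumed to be 2.0 here)
    """
    if len(centers) < 2:
        raise ValueError("need at least two cluster centers")
    # Compactness: membership-weighted squared distances to the centers.
    compactness = sum(
        (memberships[i][k] ** m) * squared_distance(point, center)
        for i, center in enumerate(centers)
        for k, point in enumerate(data)
    )
    # Separation: smallest squared distance between any two centers.
    separation = min(
        squared_distance(a, b) for a, b in combinations(centers, 2)
    )
    return compactness / (len(data) * separation)


# Hypothetical one-dimensional toy data with two obvious groups.
toy_data = [[0.0], [1.0], [9.0], [10.0]]

# A partition that matches the two groups (crisp memberships for simplicity).
good_xb = xie_beni_index(
    toy_data,
    centers=[[0.5], [9.5]],
    memberships=[[1, 1, 0, 0], [0, 0, 1, 1]],
)

# A poor partition whose centers sit close together.
poor_xb = xie_beni_index(
    toy_data,
    centers=[[0.0], [1.0]],
    memberships=[[1, 0, 0, 0], [0, 1, 1, 1]],
)
```

On this toy data the partition that matches the two groups yields a smaller index than the poor one. When the number of clusters is unknown, a typical use is to compute the index for each candidate number and prefer the one with the lowest value.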

Original language: English (US)
Title of host publication: Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence
Editors: O. Coastillo
Pages: 233-238
Number of pages: 6
State: Published - Dec 1 2003
Externally published: Yes
Event: Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence - Cancun, Mexico
Duration: May 19 2003 to May 21 2003

Publication series

Name: Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence

Other

Other: Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence
Country: Mexico
City: Cancun
Period: 5/19/03 to 5/21/03

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Keywords

  • Cluster validation
  • Clustering
  • Gene expression data
  • Microarray
