TY - GEN
T1 - Computational methods for identifying number of clusters in gene expression data
AU - Ressom, Habtom
AU - Wang, Dali
AU - Natarajan, Padma
PY - 2003
Y1 - 2003
N2 - With the advent of microarray technology, there is a growing need to reliably extract biologically significant information from massive gene expression data. Clustering is one of the key steps in analyzing gene expression data: it identifies groups of genes that manifest similar expression patterns. Many algorithms for clustering gene expression data have been reported in the literature. However, there has been limited progress on cluster validation and on identifying the number of clusters present in gene expression data. In this paper, we investigate the relative merits of four algorithms in clustering two gene expression data sets. The clustering methods we investigated are the popular self-organizing maps (SOM), adaptive double self-organizing maps (ADSOM), fuzzy c-means (FCM), and a model-based clustering method. Their corresponding clusters are validated using the figure of merit (FOM), a hierarchical tree-based index, the Xie-Beni index that gives a measure of compactness and separation of clusters, and an approximation called the Bayesian information criterion (BIC). Our intent is to provide a useful guide for choosing the appropriate computational method for identifying the number of clusters in gene expression data analysis. We observed that ADSOM outperformed the three other clustering methods in detecting the number of clusters present in the two gene expression data sets.
AB - With the advent of microarray technology, there is a growing need to reliably extract biologically significant information from massive gene expression data. Clustering is one of the key steps in analyzing gene expression data: it identifies groups of genes that manifest similar expression patterns. Many algorithms for clustering gene expression data have been reported in the literature. However, there has been limited progress on cluster validation and on identifying the number of clusters present in gene expression data. In this paper, we investigate the relative merits of four algorithms in clustering two gene expression data sets. The clustering methods we investigated are the popular self-organizing maps (SOM), adaptive double self-organizing maps (ADSOM), fuzzy c-means (FCM), and a model-based clustering method. Their corresponding clusters are validated using the figure of merit (FOM), a hierarchical tree-based index, the Xie-Beni index that gives a measure of compactness and separation of clusters, and an approximation called the Bayesian information criterion (BIC). Our intent is to provide a useful guide for choosing the appropriate computational method for identifying the number of clusters in gene expression data analysis. We observed that ADSOM outperformed the three other clustering methods in detecting the number of clusters present in the two gene expression data sets.
KW - Cluster validation
KW - Clustering
KW - Gene expression data
KW - Microarray
UR - http://www.scopus.com/inward/record.url?scp=1542747721&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1542747721&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:1542747721
SN - 0889863474
T3 - Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence
SP - 233
EP - 238
BT - Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence
A2 - Castillo, O.
T2 - Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence
Y2 - 19 May 2003 through 21 May 2003
ER -