Gene selection for microarray data analysis using principal component analysis

Antai Wang, Edmund A. Gehan

Research output: Contribution to journal › Article › peer-review

59 Scopus citations


Principal component analysis (PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data, which simplifies subsequent analysis and allows the data to be summarized parsimoniously. It has become a useful tool in microarray data analysis. For a typical microarray data set, it is often difficult to compare the overall difference in gene expression between observations from different groups, or to classify observations, on the basis of a very large number of genes. In this paper, we propose a gene selection method based on the strategy proposed by Krzanowski. We demonstrate the effectiveness of this procedure using a cancer gene expression data set and compare it with several other gene selection strategies. Among the strategies compared, the proposed method selects the gene subset that best preserves the original data structure.
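To make the idea of "preserving the original data structure" concrete, the sketch below shows a Krzanowski-style criterion as it is commonly described: take the first k principal component scores of all genes as a reference configuration of the samples, compute the same scores from a candidate gene subset, and measure the Procrustes residual M² left after optimally rotating one configuration onto the other. The greedy backward elimination, the column centering without scaling, and the helper names `pc_scores`, `procrustes_m2` and `backward_select` are all assumptions made for illustration; this is a minimal sketch, not the authors' exact procedure.

```python
import numpy as np


def pc_scores(X, k):
    """First k principal component scores of X (rows = samples, columns = genes)."""
    Xc = X - X.mean(axis=0)  # column-center each gene (assumption: no scaling)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]  # n x k score matrix


def procrustes_m2(Y, Z):
    """Procrustes residual M^2 = tr(Y'Y) + tr(Z'Z) - 2 * (sum of singular values of Z'Y).

    This is the squared Frobenius distance between Y and Z after the best
    orthogonal rotation of Z; both inputs are assumed centered and n x k.
    """
    sv = np.linalg.svd(Z.T @ Y, compute_uv=False)
    return np.trace(Y.T @ Y) + np.trace(Z.T @ Z) - 2.0 * sv.sum()


def backward_select(X, k, q):
    """Hypothetical greedy backward elimination down to q genes.

    At each step, drop the gene whose removal gives the smallest M^2
    against the reference configuration computed from all genes.
    """
    if q < k:
        raise ValueError("need q >= k so the subset still yields k components")
    Y = pc_scores(X, k)  # reference configuration from all genes
    keep = list(range(X.shape[1]))
    while len(keep) > q:
        best_m2, best_j = np.inf, None
        for j in keep:
            trial = [g for g in keep if g != j]
            m2 = procrustes_m2(Y, pc_scores(X[:, trial], k))
            if m2 < best_m2:
                best_m2, best_j = m2, j
        keep.remove(best_j)
    return keep
```

For example, `backward_select(X, k=2, q=4)` on a samples-by-genes matrix `X` returns the column indices of four retained genes. Each elimination step recomputes one SVD per remaining gene, so on a full microarray data set with thousands of genes one would likely prefilter genes first; that cost trade-off is a property of this sketch, not something stated in the abstract.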

Original language: English (US)
Pages (from-to): 2069-2087
Number of pages: 19
Journal: Statistics in Medicine
Issue number: 13
State: Published - Jul 15 2005
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Statistics and Probability

Keywords

  • Gene selection
  • Microarray data
  • Principal component analysis
  • Procrustes criterion
