Abstract
Principal component analysis (PCA) is widely used in multivariate data analysis to reduce the dimensionality of the data, which simplifies subsequent analysis and allows the data to be summarized parsimoniously. It has become a useful tool in microarray data analysis. For a typical microarray data set, it is often difficult to compare overall gene expression differences between observations from different groups, or to perform classification, when a very large number of genes is involved. In this paper, we propose a gene selection method based on the strategy proposed by Krzanowski. We demonstrate the effectiveness of this procedure using a cancer gene expression data set and compare it with several other gene selection strategies. Among the strategies compared, the proposed method selects the gene subset that best preserves the original data structure.
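The abstract does not spell out the selection procedure, so the following is only a minimal sketch of how a Procrustes-type criterion in the spirit of Krzanowski's strategy might be computed. It assumes the usual form of the criterion, M² = tr(YY' + ZZ' − 2Σ), where Y holds the first k principal component scores of the full data, Z holds those of a candidate gene subset, and Σ contains the singular values of Z'Y. The greedy backward elimination loop and the names `pc_scores`, `procrustes_m2`, and `backward_select` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pc_scores(X, k):
    """Scores of the first k principal components of column-centred X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def procrustes_m2(Y, Z):
    """Procrustes sum of squares between two n x k configurations.

    M^2 = tr(YY' + ZZ' - 2*Sigma), with Sigma the singular values of Z'Y.
    Smaller values mean Z reproduces the structure of Y more closely.
    """
    sigma = np.linalg.svd(Z.T @ Y, compute_uv=False)
    return np.sum(Y ** 2) + np.sum(Z ** 2) - 2.0 * sigma.sum()

def backward_select(X, k, q):
    """Greedy backward elimination down to q variables (assumes q >= k).

    At each step, drop the variable whose removal gives the smallest M^2
    against the full-data PC scores. This search strategy is an assumption
    made for illustration.
    """
    Y = pc_scores(X, k)
    kept = list(range(X.shape[1]))
    while len(kept) > q:
        losses = []
        for d in kept:
            cols = [j for j in kept if j != d]
            losses.append(procrustes_m2(Y, pc_scores(X[:, cols], k)))
        kept.pop(int(np.argmin(losses)))
    return kept
```

Here `X` is an observations-by-genes matrix, `k` is the number of components taken to represent the data structure, and `q` is the number of genes to retain. For real microarray data with thousands of genes, this exhaustive per-step scan would be slow and would likely need a pre-filtering step.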
Original language | English (US) |
---|---|
Pages (from-to) | 2069-2087 |
Number of pages | 19 |
Journal | Statistics in Medicine |
Volume | 24 |
Issue number | 13 |
State | Published - Jul 15 2005 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Epidemiology
- Statistics and Probability
Keywords
- Gene selection
- Microarray data
- Principal component analysis
- Procrustes criterion