Selection of the number of clusters via the bootstrap method

Yixin Fang, Junhui Wang

Research output: Contribution to journal › Article › peer-review

107 Scopus citations

Abstract

Here the problem of selecting the number of clusters in cluster analysis is considered. Recently, the concept of clustering stability, which measures the robustness of any given clustering algorithm, has been utilized in Wang (2010) for selecting the number of clusters through cross validation. In this paper, an estimation scheme for clustering instability is developed based on the bootstrap, and then the number of clusters is selected so that the corresponding estimated clustering instability is minimized. The proposed selection criterion's effectiveness is demonstrated on simulations and real examples.
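The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the estimator, so the details below are assumptions made for illustration: k-means is used as the base algorithm, pairs of bootstrap samples are drawn, both fitted clusterings are applied to the original data, and their disagreement is measured as the fraction of point pairs that are grouped together by one clustering but not by the other. All function names (`kmeans_centroids`, `clustering_distance`, `bootstrap_instability`, `select_k`) and the toy data are hypothetical.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def kmeans_centroids(points, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-first seeding; returns the centroids."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        new = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
               for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids

def clustering_distance(labels_a, labels_b):
    """Fraction of point pairs on whose co-membership the labelings disagree."""
    n = len(labels_a)
    disagree = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i in range(n) for j in range(i + 1, n))
    return disagree / (n * (n - 1) / 2)

def bootstrap_instability(points, k, n_boot, rng):
    """Average distance between clusterings fitted on paired bootstrap samples."""
    total = 0.0
    for _ in range(n_boot):
        # Two resamples of the data, drawn with replacement.
        c1 = kmeans_centroids(rng.choices(points, k=len(points)), k, rng)
        c2 = kmeans_centroids(rng.choices(points, k=len(points)), k, rng)
        # Compare both fitted clusterings on the original data.
        labels1 = [nearest(p, c1) for p in points]
        labels2 = [nearest(p, c2) for p in points]
        total += clustering_distance(labels1, labels2)
    return total / n_boot

def select_k(points, candidates, n_boot=20, seed=0):
    """Return the candidate k with the smallest estimated instability."""
    rng = random.Random(seed)
    scores = {k: bootstrap_instability(points, k, n_boot, rng) for k in candidates}
    return min(scores, key=scores.get), scores

# Toy data (an assumption for illustration): three well-separated 2-D groups.
data_rng = random.Random(1)
points = [(cx + data_rng.gauss(0, 0.3), cy + data_rng.gauss(0, 0.3))
          for cx, cy in [(0, 0), (10, 0), (5, 9)] for _ in range(20)]
best_k, scores = select_k(points, candidates=[2, 3, 4])
print(best_k, scores)
```

In this sketch `select_k` returns both the chosen number of clusters and the per-candidate instability estimates, so the scores can be inspected rather than trusting the minimizer alone. Ties are broken in favor of the first candidate listed, which is an arbitrary choice here and not something the abstract specifies.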

Original language: English (US)
Pages (from-to): 468-477
Number of pages: 10
Journal: Computational Statistics and Data Analysis
Volume: 56
Issue number: 3
State: Published - Mar 1 2012
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics

Keywords

  • Cluster analysis
  • K-means
  • Spectral clustering
  • Stability
