TY - JOUR
T1 - Accelerated Two-Stage Particle Swarm Optimization for Clustering Not-Well-Separated Data
AU - Xu, Xiangping
AU - Li, Jun
AU - Zhou, Meng Chu
AU - Xu, Jun
AU - Cao, Jinde
N1 - Funding Information:
Manuscript received June 28, 2017; revised February 13, 2018; accepted May 10, 2018. Date of publication June 29, 2018; date of current version October 15, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61773115, Grant 61374069, Grant 61374148, Grant 61573096, and Grant 61771249, and in part by the Jiangsu Province Natural Science Foundation under Grant BK20161427. This paper was recommended by Associate Editor S. Mostaghim. (Corresponding author: Jun Li.) X. Xu and J. Li are with the Ministry of Education Key Laboratory of Measurement and Control of CSE, Southeast University, Nanjing 210096, China (e-mail: 230169103@seu.edu.cn; j.li@seu.edu.cn). M. Zhou is with the Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102 USA, and also with the Renewable Energy Research Group, King Abdulaziz University, Jeddah 21589, Saudi Arabia (e-mail: zhou@njit.edu).
Publisher Copyright:
© 2018 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - Cluster analysis is a data mining technique that has been widely used to exploit useful information in a great amount of data. Because of their evaluation mechanism based on an intracluster distance (ICD) function, traditional single-objective clustering algorithms are not appropriate for not-well-separated data. Specifically, they may easily result in the drop of the optimal solution accuracy on their late stages of search when dealing with the latter. To overcome the problem, in this paper a novel index reflecting the similarity of data within a cluster is presented and called intracluster cohesion (ICC). However, if a multiobjective method is used to cluster with ICD and ICC as the specified objectives, its clustering accuracy may depend on one's experience. Motivated by these, we propose an accelerated two-stage particle swarm optimization (ATPSO) in which K-means is utilized to accelerate particles' convergence during the population initialization. Its clustering process consists of two stages. First, the main objective of minimizing ICD is to execute preliminary clustering; second, ICC is optimized to promote the clustering accuracy. Extensive experiments with the help of 17 open-source clustering sets in various geometric distributions are conducted. The results show that ATPSO outperforms PSO, K-means PSO (KPSO), chaotic PSO (CPSO), and accelerated CPSO in terms of accuracy, and its efficiency is approximate to that of KPSO. Its convergence trend indicates that the adoption of the proposed ICC contributes to the clustering accuracy. Remarkably, compared with the Pareto-based multiobjective PSO, ATPSO can detect clusters more accurately and quickly through the proposed two-stage search.
AB - Cluster analysis is a data mining technique that has been widely used to exploit useful information in a great amount of data. Because of their evaluation mechanism based on an intracluster distance (ICD) function, traditional single-objective clustering algorithms are not appropriate for not-well-separated data. Specifically, they may easily result in the drop of the optimal solution accuracy on their late stages of search when dealing with the latter. To overcome the problem, in this paper a novel index reflecting the similarity of data within a cluster is presented and called intracluster cohesion (ICC). However, if a multiobjective method is used to cluster with ICD and ICC as the specified objectives, its clustering accuracy may depend on one's experience. Motivated by these, we propose an accelerated two-stage particle swarm optimization (ATPSO) in which K-means is utilized to accelerate particles' convergence during the population initialization. Its clustering process consists of two stages. First, the main objective of minimizing ICD is to execute preliminary clustering; second, ICC is optimized to promote the clustering accuracy. Extensive experiments with the help of 17 open-source clustering sets in various geometric distributions are conducted. The results show that ATPSO outperforms PSO, K-means PSO (KPSO), chaotic PSO (CPSO), and accelerated CPSO in terms of accuracy, and its efficiency is approximate to that of KPSO. Its convergence trend indicates that the adoption of the proposed ICC contributes to the clustering accuracy. Remarkably, compared with the Pareto-based multiobjective PSO, ATPSO can detect clusters more accurately and quickly through the proposed two-stage search.
KW - Clustering
KW - intracluster cohesion (ICC)
KW - particle swarm optimization (PSO)
KW - two-stage strategy
UR - http://www.scopus.com/inward/record.url?scp=85059111373&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059111373&partnerID=8YFLogxK
U2 - 10.1109/TSMC.2018.2839618
DO - 10.1109/TSMC.2018.2839618
M3 - Article
AN - SCOPUS:85059111373
SN - 2168-2216
VL - 50
SP - 4212
EP - 4223
JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems
JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems
IS - 11
M1 - 8400589
ER -