TY - JOUR
T1 - On a two-stage progressive clustering algorithm with graph-augmented density peak clustering
AU - Niu, Xinzheng
AU - Zheng, Yunhong
AU - Liu, Wuji
AU - Wu, Chase Q.
N1 - Funding Information:
This research is sponsored by the Science and Technology Planning Project of Sichuan Province under Grant No. 2020YFG0054, and the Scientific Research Project of State Grid Sichuan Electric Power Company Information and Communication Company under Grant No. SGSCXT00XGJS1800219. We would also like to thank Bowen Shi for his comments, which helped improve some parts of this research.
Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2022/2
Y1 - 2022/2
N2 - Due to the rapidly growing volume and velocity of big data, real-time streaming data analysis has become increasingly important in many applications. To discover knowledge from such data, a wide range of machine learning techniques have been proposed and used in practice. Among them, clustering, which aims at grouping objects into different classes on the basis of their similarity, is the most common form of unsupervised learning. However, most existing clustering algorithms are designed for static data, and hence are not best suited for streaming data. In this paper, we propose PC-DPC, a two-stage progressive clustering algorithm with graph-augmented density peak clustering. PC-DPC first identifies clusters of streaming data using an improved density peak clustering algorithm, and then merges newly arriving data into the existing data pool by measuring inter-cluster structural similarity, which considers the distance between a center and representative points. We illustrate the superiority of PC-DPC over several state-of-the-art clustering algorithms in terms of clustering accuracy and running time on publicly available benchmark datasets.
KW - Progressive clustering
KW - Streaming data
KW - Structural similarity
UR - http://www.scopus.com/inward/record.url?scp=85120998845&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120998845&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2021.104566
DO - 10.1016/j.engappai.2021.104566
M3 - Article
AN - SCOPUS:85120998845
SN - 0952-1976
VL - 108
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 104566
ER -