TY - GEN
T1 - Density peak-based pre-clustering support vector machine for multi-class imbalanced classification
AU - Di, Zonglin
AU - Kang, Qi
AU - Peng, Daogang
AU - Zhou, Mengchu
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Imbalanced classification using a support vector machine (SVM) is a common but crucial problem in machine learning. Compared with binary classification, multi-class classification is much more complicated. Most existing studies on imbalanced classification using SVM focus on the binary case, while only a few address imbalanced classification with multiple classes. Pre-clustering is a useful technique for preparing suitable data from an imbalanced dataset for a classifier. It can be used to extract the features of a dataset first and thereby improve classification performance. Density peak clustering based on Euclidean distance has proven effective and general. Motivated by this and the fact that the number of clusters is known in multi-class classification using a one-vs-rest strategy, we combine density peak clustering and SVM to propose a new pre-clustering method for effective imbalanced classification with multiple classes. Specifically, we transform a multi-class classification problem into several binary classification tasks. The results on five public datasets in terms of F-measure, G-mean and area under the curve (AUC) show its superiority over the original SVM and over SVM combined with other methods, including random under-sampling, the Synthetic Minority Oversampling Technique (SMOTE), pre-clustering using K-Means, and EasyEnsemble, using either a one-vs-rest or one-vs-one strategy.
AB - Imbalanced classification using a support vector machine (SVM) is a common but crucial problem in machine learning. Compared with binary classification, multi-class classification is much more complicated. Most existing studies on imbalanced classification using SVM focus on the binary case, while only a few address imbalanced classification with multiple classes. Pre-clustering is a useful technique for preparing suitable data from an imbalanced dataset for a classifier. It can be used to extract the features of a dataset first and thereby improve classification performance. Density peak clustering based on Euclidean distance has proven effective and general. Motivated by this and the fact that the number of clusters is known in multi-class classification using a one-vs-rest strategy, we combine density peak clustering and SVM to propose a new pre-clustering method for effective imbalanced classification with multiple classes. Specifically, we transform a multi-class classification problem into several binary classification tasks. The results on five public datasets in terms of F-measure, G-mean and area under the curve (AUC) show its superiority over the original SVM and over SVM combined with other methods, including random under-sampling, the Synthetic Minority Oversampling Technique (SMOTE), pre-clustering using K-Means, and EasyEnsemble, using either a one-vs-rest or one-vs-one strategy.
UR - http://www.scopus.com/inward/record.url?scp=85076726688&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076726688&partnerID=8YFLogxK
U2 - 10.1109/SMC.2019.8914451
DO - 10.1109/SMC.2019.8914451
M3 - Conference contribution
AN - SCOPUS:85076726688
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 27
EP - 32
BT - 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
Y2 - 6 October 2019 through 9 October 2019
ER -