TY - GEN
T1 - An Adaptive Pre-clustering Support Vector Machine for Binary Imbalanced Classification
AU - Di, Zonglin
AU - Yao, Siya
AU - Kang, Qi
AU - Zhou, Mengchu
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Imbalanced classification is a common but critical problem in machine learning and artificial intelligence. Derived from structural risk minimization, the Support Vector Machine (SVM) enjoys a strong reputation in classification. However, the original SVM is not well suited to imbalanced classification, and existing modifications of SVM for this kind of problem fail to take the distribution of the dataset into full consideration, which leads to a notable loss in classification performance. Recently, Adaptive Clustering by Fast Search and Find of Density Peaks (ADPclust) was proposed; it finds cluster centroids in a sample space automatically and more reliably by using adaptive density peak detection and silhouette theory. Motivated by this, this work proposes an adaptive pre-clustering SVM (AP-SVM) that uses the distribution information of the original dataset to yield balanced sub-datasets for accurate and efficient classification. Specifically, given a dataset, AP-SVM clusters the majority class into several groups and then undersamples every cluster to re-balance the dataset used in the SVM classification step. Experiments on 10 public binary datasets, evaluated using Area Under Curve (AUC), F-Measure, and G-Mean, show the superiority of the proposed method over SVM, the Synthetic Minority Over-sampling Technique (SMOTE), Undersampling-SVM (U-SVM), K-Means, Fuzzy C-Means, and EasyEnsemble.
UR - http://www.scopus.com/inward/record.url?scp=85062206106&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062206106&partnerID=8YFLogxK
U2 - 10.1109/SMC.2018.00124
DO - 10.1109/SMC.2018.00124
M3 - Conference contribution
AN - SCOPUS:85062206106
T3 - Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018
SP - 681
EP - 686
BT - Proceedings - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018
Y2 - 7 October 2018 through 10 October 2018
ER -