BO-SMOTE: A Novel Bayesian-Optimization-Based Synthetic Minority Oversampling Technique

Shen Yan, Ziyan Zhao, Shixin Liu, Mengchu Zhou

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

An oversampling technique balances a dataset by increasing the number of minority samples. It is a common and effective method in imbalanced learning. However, most oversampling methods generate minority samples with some degree of randomness, which can degrade the prediction performance of subsequent classifiers. This study treats the prediction made by classifiers as a black-box optimization problem whose objective is to improve the classification accuracy of subsequent classifiers on minority samples. A solution of this optimization problem can be regarded as a minority sample that can be added to the imbalanced dataset. The minority samples are generated iteratively by Bayesian optimization (BO). We determine two valuable intervals for each 1-D continuous feature: the interval with the densest minority samples, and the interval with the sparsest majority samples distributed among the minority samples. By adjusting the proportion of samples generated in the two intervals, the presented algorithm can be flexibly applied to different datasets. To reduce the noise that may be caused by the exploration phase of BO, a sample selection procedure eliminates the samples that are worse than those generated at the previous iteration. Because the samples are generated on the principle of improving classifier performance, the negative effects of randomness are avoided. Experimental results on twenty open imbalanced datasets show that the proposed method obtains better results than existing state-of-the-art oversampling models, thus advancing the important field of imbalanced learning.
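The abstract does not say how the two intervals are located, so the following is only an illustrative sketch under stated assumptions, not the authors' procedure. It assumes a fixed-width window slid across the range of the minority values of a single feature, and reads "sparsest majority samples distributed among the minority samples" as the window within that range containing the fewest majority values. The names `window_count`, `candidate_intervals`, `split_budget`, `width`, `steps`, and `dense_ratio` are hypothetical. The BO step that would propose samples inside each interval is not shown.

```python
from bisect import bisect_left, bisect_right


def window_count(sorted_vals, lo, hi):
    """Count values v with lo <= v <= hi; sorted_vals must be sorted."""
    return bisect_right(sorted_vals, hi) - bisect_left(sorted_vals, lo)


def candidate_intervals(minority, majority, width, steps=50):
    """Return (dense_interval, sparse_interval) for one continuous feature.

    Assumption: both intervals share a fixed width and are searched only
    over the range spanned by the (non-empty) minority values.
    """
    mino, majo = sorted(minority), sorted(majority)
    lo, hi = mino[0], mino[-1]
    span = max(hi - lo - width, 0.0)
    # Candidate window start points, evenly spaced over the minority range.
    starts = [lo + span * i / steps for i in range(steps + 1)]
    # Window holding the most minority values (first one wins on ties).
    dense = max(starts, key=lambda s: window_count(mino, s, s + width))
    # Window holding the fewest majority values (first one wins on ties).
    sparse = min(starts, key=lambda s: window_count(majo, s, s + width))
    return (dense, dense + width), (sparse, sparse + width)


def split_budget(n_samples, dense_ratio):
    """Split a generation budget between the dense and sparse intervals."""
    n_dense = round(n_samples * dense_ratio)
    return n_dense, n_samples - n_dense


# Toy feature values, invented purely for illustration.
minority = [1.0, 1.1, 1.2, 1.3, 5.0]
majority = [0.5, 1.15, 1.25, 2.0, 6.0]
dense_iv, sparse_iv = candidate_intervals(minority, majority, width=0.5)
n_dense, n_sparse = split_budget(10, dense_ratio=0.7)
```

In this sketch `dense_ratio` plays the role of the adjustable proportion mentioned in the abstract: a value near 1 concentrates generation where minority samples already cluster, while a value near 0 favors regions that are free of majority samples.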

Original language: English (US)
Pages (from-to): 2079-2091
Number of pages: 13
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume: 54
Issue number: 4
State: Published - Apr 1 2024

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Computer Science Applications

Keywords

  • Bayesian optimization (BO)
  • data analysis
  • imbalance problems
  • machine learning multilayer perceptron
  • oversampling
  • synthetic minority oversampling technique (SMOTE)
