Abstract
Oversampling balances a dataset by increasing the number of minority samples, and it is a common and effective method in imbalanced learning. However, most oversampling methods generate minority samples with some degree of randomness, which can degrade the prediction performance of subsequent classifiers. This study treats the prediction made by classifiers as a black-box optimization problem whose objective is to improve the classification accuracy of subsequent classifiers on minority samples. A solution of this optimization problem can be regarded as a minority sample that can be added to the imbalanced dataset. The minority samples are generated iteratively by Bayesian optimization (BO). We determine two valuable intervals for each one-dimensional continuous feature. One is the interval in which minority samples are densest. The other is the interval, among the minority samples, in which majority samples are sparsest. By adjusting the proportion of samples generated in the two intervals, the presented algorithm can be flexibly applied to different datasets. To reduce the noise that the exploration phase of BO may introduce, a sample selection procedure eliminates samples that are worse than those generated at the previous iteration. Because samples are generated on the principle of improving classifier performance, the negative effects of randomness are avoided. Experimental results on twenty open imbalanced datasets show that the proposed method obtains better results than existing state-of-the-art oversampling models, thus advancing the field of imbalanced learning.
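The two per-feature intervals described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the intervals are computed, so the fixed-width sliding window, the leftmost tie-break, and the names `count_in`, `valuable_intervals`, `split_budget`, `width`, and `dense_ratio` are all assumptions made here for illustration.

```python
from bisect import bisect_left, bisect_right


def count_in(sorted_vals, lo, hi):
    """Count values v in a sorted list with lo <= v <= hi."""
    return bisect_right(sorted_vals, hi) - bisect_left(sorted_vals, lo)


def valuable_intervals(minority, majority, width):
    """Return (densest_minority, sparsest_majority) intervals for one feature.

    Assumption: candidate intervals are fixed-width windows that start at a
    minority value and end no later than the largest minority value.
    """
    mino, majo = sorted(minority), sorted(majority)
    windows = [(x, x + width) for x in mino if x + width <= mino[-1]]
    if not windows:  # width exceeds the minority range: use the whole range
        windows = [(mino[0], mino[-1])]
    # max/min return the first best window, so ties go to the leftmost one.
    densest = max(windows, key=lambda w: count_in(mino, *w))
    sparsest = min(windows, key=lambda w: count_in(majo, *w))
    return densest, sparsest


def split_budget(n_new, dense_ratio):
    """Split n_new synthetic samples between the two intervals (hypothetical)."""
    n_dense = round(n_new * dense_ratio)
    return n_dense, n_new - n_dense


# Toy values for a single continuous feature (made up for this sketch).
minority = [1.0, 1.25, 1.5, 1.75, 3.0, 5.0]
majority = [0.5, 1.25, 1.5, 2.75, 3.25, 6.0]
densest, sparsest = valuable_intervals(minority, majority, width=0.75)
print(densest, sparsest, split_budget(8, 0.75))
```

Under these assumptions, each interval could serve as the search bounds for that feature when an optimizer proposes a new minority sample, with `dense_ratio` playing the role of the adjustable proportion between the two areas.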
Original language | English (US) |
---|---|
Pages (from-to) | 2079-2091 |
Number of pages | 13 |
Journal | IEEE Transactions on Systems, Man, and Cybernetics: Systems |
Volume | 54 |
Issue number | 4 |
State | Published - Apr 1 2024 |
All Science Journal Classification (ASJC) codes
- Software
- Human-Computer Interaction
- Electrical and Electronic Engineering
- Control and Systems Engineering
- Computer Science Applications
Keywords
- Bayesian optimization (BO)
- data analysis
- imbalance problems
- machine learning
- multilayer perceptron
- oversampling
- synthetic minority oversampling technique (SMOTE)