A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification

Qi Kang, Xiao Shuang Chen, Si Si Li, Meng Chu Zhou

Research output: Contribution to journal › Article › peer-review

201 Scopus citations

Abstract

Under-sampling is a popular data preprocessing method for class imbalance problems. It balances a dataset in order to achieve a high classification rate and to avoid bias toward majority class examples. It always uses all of the minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed that applies a noise filter before resampling. To verify its effectiveness, the scheme is implemented on top of four popular under-sampling methods, i.e., Undersampling + AdaBoost, RUSBoost, UnderBagging, and EasyEnsemble, and evaluated through benchmarks and significance analysis. The paper also summarizes the relationship between algorithm performance and imbalance ratio. Experimental results indicate that the proposed scheme can significantly improve the original under-sampling-based methods in terms of three popular metrics for imbalanced classification, i.e., the area under the curve, F-measure, and G-mean.
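The idea of filtering noisy minority examples before under-sampling can be illustrated with a minimal sketch. The abstract does not say which noise filter the paper uses, so the k-nearest-neighbour rule below is an assumption chosen only for illustration, and the function names `knn_noise_filter` and `filtered_under_sample` and the toy data are hypothetical. The sketch covers only the preprocessing step, not the boosting or bagging ensembles built on top of it.

```python
import math
import random


def knn_noise_filter(minority, majority, k=3):
    """Drop minority points whose k nearest neighbours are all majority points.

    Assumption: this k-NN rule is an illustrative stand-in; the abstract does
    not specify the noise filter used in the paper.
    """
    kept = []
    for i, p in enumerate(minority):
        # Pair each other point's distance with a flag: True if it is minority.
        others = [(math.dist(p, q), True) for j, q in enumerate(minority) if j != i]
        others += [(math.dist(p, q), False) for q in majority]
        others.sort(key=lambda t: t[0])
        nearest = others[:k]
        # Keep the point if at least one close neighbour shares its class.
        if any(is_minority for _, is_minority in nearest):
            kept.append(p)
    return kept


def filtered_under_sample(minority, majority, k=3, seed=0):
    """Filter noisy minority points, then randomly under-sample the majority."""
    clean_minority = knn_noise_filter(minority, majority, k)
    rng = random.Random(seed)
    n = min(len(clean_minority), len(majority))
    sampled_majority = rng.sample(majority, n)
    return clean_minority, sampled_majority


# Hypothetical toy data: a minority cluster near (0, 0) plus one stray
# minority point sitting inside the majority cluster near (5, 5).
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.1)]
majority = [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8), (4.8, 4.9), (5.3, 5.2)]

clean_minority, sampled_majority = filtered_under_sample(minority, majority)
print(len(clean_minority), len(sampled_majority))  # prints: 3 3
```

In this toy run the stray minority point is removed by the filter, and the majority class is then sampled down to the size of the cleaned minority set, so the classifier would be trained on a balanced dataset that no longer contains the suspected noise.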

Original language: English (US)
Article number: 7589046
Pages (from-to): 4263-4274
Number of pages: 12
Journal: IEEE Transactions on Cybernetics
Volume: 47
Issue number: 12
State: Published - Dec 2017

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering

Keywords

  • Big data
  • class imbalance
  • ensemble
  • learning method
  • noise filter
  • resampling
  • under-sampling
