NUS: Noisy-Sample-Removed Undersampling Scheme for Imbalanced Classification and Application to Credit Card Fraud Detection

Honghao Zhu, Meng Chu Zhou, Guanjun Liu, Yu Xie, Shijun Liu, Cheng Guo

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Because minority samples are far rarer than majority samples, many industrial applications, such as credit card fraud detection (CCFD) and defective part identification, call for imbalanced classification. Classifier performance tends to suffer from noisy samples in the majority or minority class. This work proposes a new clustering-based noisy-sample-removed undersampling scheme (NUS) for imbalanced classification. First, the majority class samples are clustered. Each cluster center is taken as the center of a hypersphere whose radius is the distance from that center to its farthest majority class sample. We compute the Euclidean distance between each cluster center and each minority sample to determine whether the sample lies inside a hypersphere, and then exclude the noisy samples from the minority class. Noisy samples of the majority class are removed by the same procedure. Second, we propose NUS, which combines noisy-sample removal with undersampling techniques. Finally, to demonstrate the effectiveness of NUS, we integrate it with three basic classifiers: random forest (RF), decision tree (DT), and logistic regression (LR). We compare the results with seven undersampling, oversampling, and noisy-sample-removed methods. Experiments are performed on 13 public datasets and three real transaction datasets related to e-commerce. The results show that NUS improves the performance of existing classifiers.
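The noisy-sample removal step described in the abstract can be sketched roughly as follows. This is only one possible reading of the abstract, not the authors' implementation. The clustering algorithm (a plain k-means with a deterministic farthest-first initialization), the number of clusters `k`, and all function names are assumptions made for illustration. The sketch also assumes that a minority sample falling inside any majority hypersphere counts as noisy.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)


def farthest_first_init(X, k):
    """Deterministic seeding (an assumption): start at X[0], then repeatedly
    add the sample farthest from the centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        d = pairwise_dist(X, np.array(centers)).min(axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means, a stand-in for whatever clustering is used."""
    centers = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = pairwise_dist(X, centers).argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, pairwise_dist(X, centers).argmin(axis=1)


def inside_hyperspheres(X_ref, X_query, k=2):
    """Flag query samples that fall inside any hypersphere built on X_ref.

    Each hypersphere is centered on a cluster center of X_ref, with radius
    equal to the distance from that center to its farthest cluster member.
    """
    centers, labels = kmeans(X_ref, k)
    radii = np.array([
        np.linalg.norm(X_ref[labels == j] - centers[j], axis=1).max()
        if np.any(labels == j) else -np.inf  # empty cluster flags nothing
        for j in range(k)
    ])
    return (pairwise_dist(X_query, centers) <= radii[None, :]).any(axis=1)


def remove_noisy(X_ref, X_query, k=2):
    """Return the query samples kept, plus the boolean 'noisy' mask."""
    noisy = inside_hyperspheres(X_ref, X_query, k=k)
    return X_query[~noisy], noisy


# Toy data (made up): two tight majority blobs and four minority samples,
# two of which sit inside the majority blobs.
X_maj = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
X_min = np.array([[0.5, 0.6], [5, 5], [5, 6], [10.4, 10.5]], dtype=float)

X_min_clean, noisy_mask = remove_noisy(X_maj, X_min, k=2)
print(noisy_mask)
print(X_min_clean)
```

Under the same reading, calling `remove_noisy` with the roles of the two classes swapped would handle noisy majority samples. The undersampling step that NUS then applies is not specified in the abstract and is left out of this sketch.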

Original language: English (US)
Pages (from-to): 1793-1804
Number of pages: 12
Journal: IEEE Transactions on Computational Social Systems
Volume: 11
Issue number: 2
State: Published - Apr 1 2024

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Social Sciences (miscellaneous)
  • Modeling and Simulation

Keywords

  • Clustering
  • credit card fraud detection (CCFD)
  • noisy sample removed
  • undersampling
