Weighted Gini index feature selection method for imbalanced data

Haoyue Liu, Mengchu Zhou, Xiaoyu Sean Lu, Cynthia Yao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Scopus citations

Abstract

The imbalanced class problem arises in many real-world applications, e.g., fraud detection, text classification, and cancer diagnosis. Besides rebalancing the data distribution, another important way to address the bias-to-majority problem is proper feature selection. This work aims to provide a feature selection method that chooses a subset of features yielding high ROC AUC and F-measure results, and thus high performance on the minority class. In this paper, a weighted Gini index (WGI) feature selection method is proposed. To evaluate the proposed method, it is compared with Chi-square, F-statistic, and Gini index feature selection, with XGBoost as the classifier used to test the performance of each selected feature subset. Experimental results indicate that F-statistic performs best when only a few features are selected. However, as the number of selected features increases, WGI feature selection achieves the best results. A comparison between the average ROC AUC and F-measure results is also presented. It shows that ROC AUC is consistently good, even when only a few features are selected, and changes only slightly as the feature subset expands. F-measure, by contrast, reaches good performance only after 60% of the features are chosen. The results can help practitioners select a proper feature selection method when facing a practical problem.
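The abstract does not state the WGI formula, so the following is only a minimal sketch of the general idea of a class-weighted Gini score, not the authors' method. It assumes discrete feature values, uses inverse-frequency class weights (an assumption), and scores each feature by the reduction in weighted Gini impurity when samples are grouped by that feature's values. All function names (`class_weights`, `weighted_gini`, `weighted_gini_score`, `rank_features`) are hypothetical.

```python
from collections import Counter, defaultdict


def class_weights(labels):
    """Inverse-frequency class weights (an assumption, not taken from the paper)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity 1 - sum(p_c^2), where p_c is the weighted share of class c."""
    mass = defaultdict(float)
    for y in labels:
        mass[y] += weights[y]
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def weighted_gini_score(feature, labels, weights):
    """Impurity reduction obtained by grouping samples on a discrete feature."""
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    total_mass = sum(weights[y] for y in labels)
    after = 0.0
    for ys in groups.values():
        group_mass = sum(weights[y] for y in ys)
        after += (group_mass / total_mass) * weighted_gini(ys, weights)
    return weighted_gini(labels, weights) - after


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest score."""
    w = class_weights(labels)
    scores = {name: weighted_gini_score(col, labels, w)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy imbalanced data: 8 majority samples, 2 minority samples.
    labels = [0] * 8 + [1] * 2
    columns = {
        "informative": [0] * 8 + [1] * 2,  # separates the classes exactly
        "noise": [0, 1] * 5,               # unrelated to the class label
    }
    print(rank_features(columns, labels))
```

In this sketch the weights give both classes equal total mass, so a feature that isolates the minority class is not drowned out by the majority class. A top-k subset chosen from such a ranking could then be passed to a classifier, as the abstract describes doing with XGBoost.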

Original language: English (US)
Title of host publication: ICNSC 2018 - 15th IEEE International Conference on Networking, Sensing and Control
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538650530
State: Published - May 18 2018
Event: 15th IEEE International Conference on Networking, Sensing and Control, ICNSC 2018 - Zhuhai, China
Duration: Mar 27 2018 to Mar 29 2018

Publication series

Name: ICNSC 2018 - 15th IEEE International Conference on Networking, Sensing and Control

Other

Other: 15th IEEE International Conference on Networking, Sensing and Control, ICNSC 2018
Country/Territory: China
City: Zhuhai
Period: 3/27/18 to 3/29/18

All Science Journal Classification (ASJC) codes

  • Instrumentation
  • Artificial Intelligence
  • Computer Networks and Communications
  • Control and Optimization
  • Modeling and Simulation

Keywords

  • feature selection
  • imbalanced data
  • weighted gini index
