Decision tree rule-based feature selection for large-scale imbalanced data

Haoyue Liu, Mengchu Zhou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Class imbalance appears in many real-world applications, e.g., fault diagnosis, text categorization, and fraud detection. When dealing with a large-scale imbalanced dataset, feature selection becomes a great challenge. To confront it, this work proposes a feature selection approach based on a decision tree rule. The effectiveness of the proposed approach is verified by classifying a large-scale dataset from Santander Bank. The results show that our approach can achieve a higher Area Under the Curve (AUC) and less computational time. We also compare it with filter-based feature selection approaches, i.e., Chi-Square and F-statistic. The results show that it outperforms them but needs slightly more computational effort.
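The abstract does not spell out the decision tree rule itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each feature is scored by the best Gini impurity reduction of a single threshold split (a depth-1 tree), that the top-scoring features are kept, and that AUC is computed by pairwise ranking. The function names (`best_stump_gain`, `select_features`, `auc`, `make_data`) and the synthetic data are hypothetical; the Santander dataset is not used here.

```python
import random

def gini(pos, n):
    """Gini impurity of a node holding `pos` positives among `n` samples."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)

def best_stump_gain(values, labels):
    """Largest Gini impurity reduction achievable by one threshold split."""
    n = len(values)
    total_pos = sum(labels)
    parent = gini(total_pos, n)
    order = sorted(range(n), key=lambda i: values[i])
    best = 0.0
    left_n = left_pos = 0
    for rank, i in enumerate(order[:-1]):
        left_n += 1
        left_pos += labels[i]
        # Only split between distinct feature values.
        if values[i] == values[order[rank + 1]]:
            continue
        right_n = n - left_n
        right_pos = total_pos - left_pos
        child = (left_n * gini(left_pos, left_n)
                 + right_n * gini(right_pos, right_n)) / n
        best = max(best, parent - child)
    return best

def select_features(X, y, k):
    """Return the indices of the k features with the highest stump gain."""
    n_features = len(X[0])
    gains = [best_stump_gain([row[j] for row in X], y)
             for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]

def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_data(n=2000, n_features=10, pos_rate=0.05, seed=0):
    """Synthetic imbalanced data (assumption): only features 0 and 1
    carry class signal, the rest are Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] -= 1.5 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_data()
selected = select_features(X, y, k=2)
print("selected features:", selected)
print("AUC of feature 0 alone:", round(auc([row[0] for row in X], y), 3))
```

AUC is used here, as in the abstract, because with roughly 5% positives a classifier that always predicts the majority class would still look accurate. The filter baselines named in the abstract (Chi-Square and F-statistic) could be swapped in by replacing `best_stump_gain` with the corresponding per-feature statistic.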

Original language: English (US)
Title of host publication: 2017 26th Wireless and Optical Communication Conference, WOCC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509049097
State: Published - May 15 2017
Event: 26th Wireless and Optical Communication Conference, WOCC 2017 - Newark, United States
Duration: Apr 7 2017 to Apr 8 2017

Publication series

Name: 2017 26th Wireless and Optical Communication Conference, WOCC 2017

Other

Other: 26th Wireless and Optical Communication Conference, WOCC 2017
Country/Territory: United States
City: Newark
Period: 4/7/17 to 4/8/17

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems and Management
  • Information Systems
  • Electronic, Optical and Magnetic Materials

Keywords

  • Decision tree
  • feature selection
  • large-scale imbalanced data
