Abstract
As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, traditional EC-based methods represent feature subsets via a length-fixed individual encoding. This encoding is ineffective for high-dimensional data because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to adaptively guide individuals toward promising regions of the search space. Moreover, a dominance-based local search method is employed for further improvement. Experimental results on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
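To make the bi-objective formulation concrete, the following is a minimal sketch of Pareto dominance between candidate feature subsets. It is not the LA-NSGA implementation: the function names `dominates` and `pareto_front` and the sample values are illustrative assumptions, and accuracy maximization is recast as classification-error minimization so that both objectives are minimized.

```python
from typing import List, Tuple

# Hypothetical representation: each candidate feature subset is summarized as
# (classification_error, number_of_selected_features). Both are minimized.
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates (input order is preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up objective values, for illustration only.
candidates = [(0.10, 40), (0.12, 25), (0.10, 60), (0.20, 25), (0.08, 90)]
front = pareto_front(candidates)
```

Here `(0.10, 60)` is dropped because `(0.10, 40)` reaches the same error with fewer features, and `(0.20, 25)` is dropped because `(0.12, 25)` has lower error with the same number of features. The remaining three points trade error against subset size and together form the front.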
Original language | English (US) |
---|---|
Pages (from-to) | 1834-1844 |
Number of pages | 11 |
Journal | IEEE/CAA Journal of Automatica Sinica |
Volume | 10 |
Issue number | 9 |
DOIs | |
State | Published - Sep 1 2023 |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Information Systems
- Control and Optimization
- Artificial Intelligence
Keywords
- Bi-objective optimization
- feature selection (FS)
- genetic algorithm
- high-dimensional data
- length-adaptive