TY - JOUR
T1 - A Length-Adaptive Non-Dominated Sorting Genetic Algorithm for Bi-Objective High-Dimensional Feature Selection
AU - Gong, Yanlu
AU - Zhou, Junhai
AU - Wu, Quanwang
AU - Zhou, Mengchu
AU - Wen, Junhao
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China (62172065, 62072060).
Publisher Copyright:
© 2014 Chinese Association of Automation.
PY - 2023/9/1
Y1 - 2023/9/1
AB - As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, traditional EC-based methods represent feature subsets via a length-fixed individual encoding, which is ineffective for high-dimensional data because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to adaptively explore promising regions of the search space. Moreover, a dominance-based local search method is employed for further improvement. Experimental results on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
KW - Bi-objective optimization
KW - feature selection (FS)
KW - genetic algorithm
KW - high-dimensional data
KW - length-adaptive
UR - http://www.scopus.com/inward/record.url?scp=85168799494&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168799494&partnerID=8YFLogxK
U2 - 10.1109/JAS.2023.123648
DO - 10.1109/JAS.2023.123648
M3 - Article
AN - SCOPUS:85168799494
SN - 2329-9266
VL - 10
SP - 1834
EP - 1844
JO - IEEE/CAA Journal of Automatica Sinica
JF - IEEE/CAA Journal of Automatica Sinica
IS - 9
ER -