As one of the most complicated processes in biological development, ageing remains poorly understood. An increasing number of ageing-related gene datasets are now available online, in which each instance (a gene) is characterized by a set of hierarchically organized binary features. Traditional data mining methods are limited in their ability to exploit this hierarchical feature space. This paper proposes a hybrid hierarchical feature selection (HHFS) method for classifying genes as pro-longevity or anti-longevity. HHFS performs lazy and eager feature selection sequentially, taking into account both the uniqueness of each test instance and the overall characteristics of the dataset. It adopts two complementary relevance metrics (namely, Gini purity and mutual information) to remove hierarchical redundancy. Experiments are conducted on ageing-related gene data from four model organisms. The results show that HHFS achieves significantly better predictive performance than several state-of-the-art methods.
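The ideas in the abstract (a lazy, per-test-instance redundancy removal step, an eager dataset-wide step, and two relevance metrics) can be illustrated with a toy Python sketch. It is not the HHFS algorithm, whose exact procedure the abstract does not give: the definition of Gini purity used here (a weighted sum of squared class proportions), the rule for resolving a redundant parent-child pair, and the names `remove_hierarchical_redundancy` and `eager_top_k` are all hypothetical. The redundancy rule assumes the usual Gene Ontology convention that annotation to a term implies annotation to all of its ancestors, so when a gene takes the same value on a parent and a child term, one value implies the other.

```python
import math
from collections import Counter


def gini_purity(feature, labels):
    """Weighted Gini purity of the labels after splitting on a discrete feature.

    Assumed definition (not taken from the paper):
    sum over feature values v of P(v) * sum over classes c of P(c | v)^2.
    """
    n = len(labels)
    total = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        purity = sum((c / len(subset)) ** 2 for c in Counter(subset).values())
        total += (len(subset) / n) * purity
    return total


def mutual_information(feature, labels):
    """Mutual information I(X; Y), in bits, between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def remove_hierarchical_redundancy(test_instance, edges, relevance):
    """Hypothetical lazy step for one test instance.

    test_instance maps feature name -> 0/1; edges lists (parent, child) pairs.
    If the instance takes the same value on a parent and its child, one value
    implies the other, so the less relevant feature is dropped (on a tie the
    parent is dropped, keeping the more specific term).
    """
    selected = set(test_instance)
    for parent, child in edges:
        if test_instance[parent] == test_instance[child]:
            loser = parent if relevance[parent] <= relevance[child] else child
            selected.discard(loser)
    return selected


def eager_top_k(candidates, relevance, k):
    """Hypothetical eager step: keep the k candidates with the highest dataset-wide relevance."""
    return set(sorted(candidates, key=lambda f: relevance[f], reverse=True)[:k])


# Toy training data: "B" is a child of "A", so B = 1 implies A = 1 in every row.
labels = [1, 1, 0, 0]
features = {"A": [1, 1, 1, 0], "B": [1, 1, 0, 0], "C": [0, 1, 0, 1]}
relevance = {name: mutual_information(col, labels) for name, col in features.items()}

lazy = remove_hierarchical_redundancy({"A": 1, "B": 1, "C": 0}, [("A", "B")], relevance)
print(sorted(lazy))                               # ['B', 'C']
print(sorted(eager_top_k(lazy, relevance, k=1)))  # ['B']
```

In this sketch mutual information alone serves as the relevance score, and `gini_purity` is shown only as the second metric; how HHFS actually combines the two metrics is not stated in the abstract.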
| Field | Value |
|---|---|
| Original language | English (US) |
| Number of pages | 1 |
| Journal | IEEE Transactions on Cognitive and Developmental Systems |
| State | Accepted/In press - 2022 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
Keywords
- Feature extraction
- Feature selection
- Gene Ontology
- Hierarchical feature space