Abstract
The small-for-gestational-age (SGA) condition often causes serious problems, so identifying its risk factors is important. Traditional statistical methods such as stepwise logistic regression (LR) have been widely used to discover possible risk factors, but feature selection methods from the machine learning field have rarely been applied to the task. This paper presents the first comparison of five feature selection methods from both fields for SGA risk factor analysis. To evaluate their performance, four classification algorithms are used to construct SGA prediction models, with precision and the area under the receiver operating characteristic curve as the evaluation criteria. Stepwise LR achieves the best performance among the five feature selection methods because it conducts both a univariate significance test and a model significance test, which makes it better suited to handling the complex relations among features. The top 20 features selected by each feature selection method and the 27 features selected by four or five of them could help physicians revise traditional SGA evaluation models. An ensemble method is also used to build SGA prediction models from the feature subsets, and the results show that it outperforms the individual models.
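The idea of keeping features that four or five of the five selection methods agree on, and then scoring models by precision and the area under the ROC curve, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: apart from stepwise LR, the abstract does not name the selection methods, so the method names (`method_b` to `method_e`), the feature names (`f1` to `f6`), and the helper functions below are all hypothetical placeholders.

```python
from collections import Counter


def consensus_features(selections, min_votes):
    """Return the features chosen by at least `min_votes` selection methods."""
    votes = Counter()
    for selected in selections.values():
        votes.update(set(selected))  # each method votes at most once per feature
    return sorted(f for f, n in votes.items() if n >= min_votes)


def precision(y_true, y_pred):
    """Fraction of predicted positives (label 1) that are truly positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(hits) / len(hits) if hits else 0.0


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical output of five feature selection methods (placeholder names).
selections = {
    "stepwise_lr": ["f1", "f2", "f3", "f4"],
    "method_b": ["f1", "f2", "f3", "f5"],
    "method_c": ["f1", "f2", "f4", "f5"],
    "method_d": ["f1", "f2", "f3", "f6"],
    "method_e": ["f1", "f3", "f5", "f6"],
}

# Keep features selected by four or five of the five methods.
print(consensus_features(selections, min_votes=4))  # ['f1', 'f2', 'f3']
```

The pairwise AUC above is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be the usual choice for larger cohorts.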
Original language | English (US) |
---|---|
Pages (from-to) | 1-15 |
Number of pages | 15 |
Journal | Journal of Ambient Intelligence and Humanized Computing |
State | Accepted/In press - Jun 7 2018 |
All Science Journal Classification (ASJC) codes
- General Computer Science
Keywords
- Feature selection
- Machine learning
- Prediction model
- Small-for-gestational-age