TY - GEN
T1 - DICE
T2 - 35th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2023
AU - Wu, Hailun
AU - Dong, Ziqian
AU - Rojas-Cessa, Roberto
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Understanding the key factors that affect users' commute mode choice is essential for designing policies that promote sustainable transportation. However, studies that rely on survey data often face incomplete data. One of the regional transportation surveys obtained for this study on commute mode decision-making is missing 97% of its parking cost data, an important factor in people's decision-making. To tackle the problem, we propose the data imputation for cost estimates (DICE) scheme, which synthesizes data from multiple sources to infer the missing data. DICE linearly maps imputed values to missing entries based on the assumption that higher-income users can spend more on their commute. In the absence of ground truth data, we propose using the accuracy of a regression model trained with the imputed data as a metric to evaluate DICE. We train the regression model with 75% of the imputed data, test it with the remainder, and evaluate it with the complete cases. The prediction accuracies on the test data and the evaluation data are 0.89 and 0.77, respectively. The results indicate that the imputed data and complete cases share similar distributions and that the model trained with the imputed data can perform classification. We also tested DICE on a 1995 transportation survey data set and a 2021 housing survey data set, in which cost is considered a key feature in decision-making. In both cases, the regression model achieves a prediction accuracy higher than 0.7, which supports the applicability of DICE to different data sets.
KW - commute mode choice
KW - data imputation
KW - decision-making
KW - logistic regression
KW - multiple data sources
KW - regression
UR - http://www.scopus.com/inward/record.url?scp=85182390379&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182390379&partnerID=8YFLogxK
U2 - 10.1109/ICTAI59109.2023.00029
DO - 10.1109/ICTAI59109.2023.00029
M3 - Conference contribution
AN - SCOPUS:85182390379
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 149
EP - 154
BT - Proceedings - 2023 IEEE 35th International Conference on Tools with Artificial Intelligence, ICTAI 2023
PB - IEEE Computer Society
Y2 - 6 November 2023 through 8 November 2023
ER -