TY - JOUR
T1 - Short-Term Segment-Level Crash Risk Prediction Using Advanced Data Modeling with Proactive and Reactive Crash Data
AU - Dimitrijevic, Branislav
AU - Khales, Sina Darban
AU - Asadi, Roksana
AU - Lee, Joyoung
N1 - Funding Information:
Funding: This research was funded by the National University Transportation Center Consortium led by CAIT, as well as John A. Reif, Jr. Department of Civil and Environmental Engineering at the New Jersey Institute of Technology, grant number DTRT13-G-UTC28.
Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Highway crashes, along with the property damage, personal injuries, and fatalities that they cause, continue to present one of the most significant and critical transportation problems. At the same time, provision of safe travel is one of the main goals of any transportation system. For this reason, both in transportation research and practice much attention has been given to the analysis and modeling of traffic crashes, including the development of models that can be applied to predict crash occurrence and crash severity. In general, such models assess short-term crash risks at a given highway facility, thus providing intelligence that can be used to identify and implement traffic operations strategies for crash mitigation and prevention. This paper presents several crash risk and injury severity assessment models applied at a highway segment level, considering the input data that is typically collected or readily available to most transportation agencies in real-time and at a regional network scale, which would render them readily applicable in practice. The input data included roadway geometry characteristics, traffic flow characteristics, and weather condition data. The paper develops, tests, and compares the performance of models that employ Random effects Bayesian Logistics Regression, Gaussian Naïve Bayes, K-Nearest Neighbor, Random Forest, and Gradient Boosting Machine methods. The paper applies random oversampling examples (ROSE) method to deal with the problem of data imbalance associated with the injury severity analysis. The models were trained and tested using a dataset of 10,155 crashes that occurred on two interstate highways in New Jersey over a two-year period. The paper also analyzes the potential improvement in the prediction abilities of the tested models by adding reactive data to the analysis. To that end, traffic crashes were classified in multiple classes based on the driver age and the vehicle age to assess the impact of these attributes on driver injury severity outcomes. The results of this analysis are promising, showing that the simultaneous use of reactive and proactive data can improve the prediction performance of the presented models.
AB - Highway crashes, along with the property damage, personal injuries, and fatalities that they cause, continue to present one of the most significant and critical transportation problems. At the same time, provision of safe travel is one of the main goals of any transportation system. For this reason, both in transportation research and practice much attention has been given to the analysis and modeling of traffic crashes, including the development of models that can be applied to predict crash occurrence and crash severity. In general, such models assess short-term crash risks at a given highway facility, thus providing intelligence that can be used to identify and implement traffic operations strategies for crash mitigation and prevention. This paper presents several crash risk and injury severity assessment models applied at a highway segment level, considering the input data that is typically collected or readily available to most transportation agencies in real-time and at a regional network scale, which would render them readily applicable in practice. The input data included roadway geometry characteristics, traffic flow characteristics, and weather condition data. The paper develops, tests, and compares the performance of models that employ Random effects Bayesian Logistics Regression, Gaussian Naïve Bayes, K-Nearest Neighbor, Random Forest, and Gradient Boosting Machine methods. The paper applies random oversampling examples (ROSE) method to deal with the problem of data imbalance associated with the injury severity analysis. The models were trained and tested using a dataset of 10,155 crashes that occurred on two interstate highways in New Jersey over a two-year period. The paper also analyzes the potential improvement in the prediction abilities of the tested models by adding reactive data to the analysis. To that end, traffic crashes were classified in multiple classes based on the driver age and the vehicle age to assess the impact of these attributes on driver injury severity outcomes. The results of this analysis are promising, showing that the simultaneous use of reactive and proactive data can improve the prediction performance of the presented models.
KW - Crash injury severity
KW - Crash likelihood
KW - Crash prediction
KW - Crash risk analysis
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85122921628&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122921628&partnerID=8YFLogxK
U2 - 10.3390/app12020856
DO - 10.3390/app12020856
M3 - Article
AN - SCOPUS:85122921628
SN - 2076-3417
VL - 12
JO - Applied Sciences (Switzerland)
JF - Applied Sciences (Switzerland)
IS - 2
M1 - 856
ER -