TY - GEN
T1 - Hierarchical Vision Transformer-Based Deep Learning Architecture, RGB-D Sensing Fusion, and Multimodal LLM-Based Generative AI Pipeline for Automated Pavement Pothole Segmentation, Quantification, and Repair Recommendations
AU - Hu, Xi
AU - Assaad, Rayan H.
AU - Awada, Mohamad
AU - Catuosco, Thomas
N1 - Publisher Copyright: © ASCE.
PY - 2025
Y1 - 2025
N2 - This paper proposes an automated approach for pavement pothole segmentation, quantification, and repair solution generation. The proposed approach integrates a hierarchical vision transformer (HViT) deep learning (DL) architecture based on SAM2-UNet, RGB-D sensing fusion, and a multimodal large language model (LLM). It is designed to automatically (1) segment potholes in RGB images using an HViT-based DL model that captures pavement pothole features at multiple scales, (2) retrieve comprehensive pothole properties through RGB-D fusion and point cloud processing, and (3) generate pothole-induced pavement repair solutions using a multimodal LLM. Experimental results showed that the developed HViT-based DL model (i.e., SAM2-UNet) yields superior pothole segmentation performance, with an mDice of 0.965 and an mIoU of 0.937, while the integrated LLM can generate reasonable pothole-induced pavement repair recommendations. This paper contributes to the body of knowledge by offering a novel approach that advances automated pavement inspection in a more accurate, intelligent, and actionable direction.
AB - This paper proposes an automated approach for pavement pothole segmentation, quantification, and repair solution generation. The proposed approach integrates a hierarchical vision transformer (HViT) deep learning (DL) architecture based on SAM2-UNet, RGB-D sensing fusion, and a multimodal large language model (LLM). It is designed to automatically (1) segment potholes in RGB images using an HViT-based DL model that captures pavement pothole features at multiple scales, (2) retrieve comprehensive pothole properties through RGB-D fusion and point cloud processing, and (3) generate pothole-induced pavement repair solutions using a multimodal LLM. Experimental results showed that the developed HViT-based DL model (i.e., SAM2-UNet) yields superior pothole segmentation performance, with an mDice of 0.965 and an mIoU of 0.937, while the integrated LLM can generate reasonable pothole-induced pavement repair recommendations. This paper contributes to the body of knowledge by offering a novel approach that advances automated pavement inspection in a more accurate, intelligent, and actionable direction.
UR - https://www.scopus.com/pages/publications/105031090185
UR - https://www.scopus.com/pages/publications/105031090185#tab=citedBy
U2 - 10.1061/9780784486436.013
DO - 10.1061/9780784486436.013
M3 - Conference contribution
AN - SCOPUS:105031090185
T3 - Computing in Civil Engineering 2025: Computational and Intelligent Technologies - Selected Papers from the ASCE International Conference on Computing in Civil Engineering 2025
SP - 116
EP - 125
BT - Computing in Civil Engineering 2025
A2 - Jafari, Amirhosein
A2 - Zhu, Yimin
PB - American Society of Civil Engineers (ASCE)
T2 - ASCE International Conference on Computing in Civil Engineering, i3CE 2025
Y2 - 11 May 2025 through 14 May 2025
ER -