Hierarchical Vision Transformer-Based Deep Learning Architecture, RGB-D Sensing Fusion, and Multimodal LLM-Based Generative AI Pipeline for Automated Pavement Pothole Segmentation, Quantification, and Repair Recommendations

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper proposes an automated approach for pavement pothole segmentation, quantification, and repair solution generation. The approach integrates a hierarchical vision transformer (HViT) based deep learning (DL) model built on the SAM2-UNet architecture, RGB-D sensing fusion, and a multimodal large language model (LLM). It is designed to automatically (1) segment potholes in RGB images using an HViT-based DL model that captures pavement pothole features at multiple scales, (2) retrieve comprehensive pothole properties through RGB-D fusion and point cloud processing, and (3) generate repair solutions for pothole-damaged pavement using the multimodal LLM. Experimental results showed that the developed HViT-based DL model (i.e., SAM2-UNet) yields superior pothole segmentation performance, with an mDice of 0.965 and an mIoU of 0.937, while the integrated LLM can generate reasonable repair recommendations for pothole-damaged pavement. This paper contributes to the body of knowledge by offering a novel approach that advances automated pavement inspection in a more accurate, intelligent, and actionable direction.
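For readers unfamiliar with the two segmentation metrics reported above, the following is a minimal sketch of how Dice and IoU are commonly computed for binary masks. It is not the authors' evaluation code. The function names `dice_and_iou` and `mean_dice_and_iou` are hypothetical, and averaging the scores per image is an assumption, since the abstract does not state how mDice and mIoU are aggregated.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for two binary masks of the same shape.

    eps keeps both scores defined (and equal to 1) when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    pred_area = pred.sum()
    target_area = target.sum()
    union = pred_area + target_area - intersection

    dice = (2.0 * intersection + eps) / (pred_area + target_area + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)


def mean_dice_and_iou(pairs):
    """Average Dice and IoU over (pred, target) mask pairs.

    Assumption: scores are averaged per image; the abstract does not
    say which aggregation the reported mDice and mIoU use.
    """
    scores = [dice_and_iou(p, t) for p, t in pairs]
    dices, ious = zip(*scores)
    return float(np.mean(dices)), float(np.mean(ious))


if __name__ == "__main__":
    # Toy 2x3 masks: 1 marks a pothole pixel, 0 marks intact pavement.
    predicted = [[1, 1, 0], [0, 1, 0]]
    reference = [[1, 0, 0], [0, 1, 1]]
    print(dice_and_iou(predicted, reference))
```

For a single pair of masks the two scores are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. After averaging over many images that identity no longer holds exactly.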

Original language: English (US)
Title of host publication: Computing in Civil Engineering 2025
Subtitle of host publication: Computational and Intelligent Technologies - Selected Papers from the ASCE International Conference on Computing in Civil Engineering 2025
Editors: Amirhosein Jafari, Yimin Zhu
Publisher: American Society of Civil Engineers (ASCE)
Pages: 116-125
Number of pages: 10
ISBN (Electronic): 9780784486436
State: Published - 2025
Event: ASCE International Conference on Computing in Civil Engineering, i3CE 2025 - New Orleans, United States
Duration: May 11 2025 to May 14 2025

Publication series

Name: Computing in Civil Engineering 2025: Computational and Intelligent Technologies - Selected Papers from the ASCE International Conference on Computing in Civil Engineering 2025

Conference

Conference: ASCE International Conference on Computing in Civil Engineering, i3CE 2025
Country/Territory: United States
City: New Orleans
Period: 5/11/25 to 5/14/25

All Science Journal Classification (ASJC) codes

  • Civil and Structural Engineering
  • Computer Science Applications
  • Artificial Intelligence
  • Electrical and Electronic Engineering
