TY - CONF
T1 - Integrating Large Language Models (LLMs) with Autonomous Aerial Drone Robotics and Computer Vision for Contextual Adaptive Construction Site Safety Management and Risk Assessment
AU - Poudel, Oscar
AU - Assaad, Rayan H.
AU - Awada, Mohamad
N1 - Publisher Copyright:
© ASCE.
PY - 2025
Y1 - 2025
AB - Integrating large language models (LLMs) with robotics offers transformative potential for enhancing construction site safety monitoring, real-time risk assessment, and situational response. This research proposes an intelligent drone-based system that combines real-time object detection and contextual image analysis with advanced LLM reasoning capabilities for construction site supervision. A YOLOv11n model was fine-tuned via transfer learning to detect 10 construction site safety-related classes. When a critical safety violation (e.g., "No Hardhat") is detected, the corresponding image frame is sent to CLIP (Contrastive Language-Image Pretraining) to generate an image-based description. These data are then processed by a fine-tuned LLM to produce construction-specific textual prompts, which are converted to audio and broadcast via a drone-mounted speaker. The drone navigates autonomously using a D* path-planning algorithm. Detection, response generation, and navigation capabilities were evaluated in a simulated environment built in Webots, and the full pipeline, from object segmentation to audio generation, was ported to a real-world drone.
UR - https://www.scopus.com/pages/publications/105030973905
DO - 10.1061/9780784486443.056
M3 - Conference contribution
AN - SCOPUS:105030973905
T3 - Computing in Civil Engineering 2025: Resilient, Robotic, and Educational Systems - Selected Papers from the ASCE International Conference on Computing in Civil Engineering 2025
SP - 509
EP - 518
BT - Computing in Civil Engineering 2025
A2 - Jafari, Amirhosein
A2 - Zhu, Yimin
PB - American Society of Civil Engineers (ASCE)
T2 - ASCE International Conference on Computing in Civil Engineering, i3CE 2025
Y2 - 11 May 2025 through 14 May 2025
ER -