
Real-time workforce monitoring and management through robotic teleoperation, mobile multi-modal visual and auditory sensing, and edge deep learning analytics

Research output: Contribution to journal › Article › peer-review

Abstract

Effective workforce monitoring is crucial for ensuring workplace performance. Vision-based workforce monitoring (WFM) is promising but still struggles with adaptability, scene coverage, and occlusion. This study proposes a novel robotic teleoperation-enhanced multi-modal WFM framework with an improved YOLOv9 (“You Only Look Once” version 9) architecture and an ergonomics-informed learning paradigm for real-time workforce and site monitoring. The framework integrates teleoperated robots, deep learning (DL), edge computer vision, machine listening, and cloud-based visualization into a unified pipeline that performs imagery analytics for worker safety evaluation (WSE) and worker productivity assessment (WPA), alongside auditory analytics for jobsite monitoring. WSE involves detecting personal protective equipment (PPE) and assessing musculoskeletal disorder risk using a three-dimensional (3D) pose-based Rapid Entire Body Assessment. WPA performs 3D pose-based worker activity recognition (WAR) to analyze workers’ behaviors over time. A lightweight DL model with multi-scale feature extraction is developed to recognize 14 audio-based onsite activities, supplementing WSE and WPA when occlusion occurs. Experimental results showed that: (1) the improved YOLOv9 architecture with transfer learning achieved a mean average precision over intersection-over-union thresholds from 0.5 to 0.95 (mAP50-95) of 66.93%, outperforming the YOLOv9 baseline by 7.5%; (2) random forest with ergonomics-informed learning outperformed other models for WAR with 84.40% accuracy, a 5% improvement over the baseline; (3) the audio-based activity recognition model reached 88.10% accuracy, 86.86% recall, 88.64% precision, and an 86.92% F1 score; and (4) field tests demonstrated an inference speed of 4.3 frames per second for workforce monitoring and a 0.5 Hz refresh rate for robot tracking. This study advances automated workforce monitoring towards greater intelligence, reliability, and comprehensiveness.
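The abstract reports detection quality as mAP50-95 without describing the computation. As a rough illustration only, and not the authors' evaluation code, the sketch below assumes the common COCO-style convention in which average precision is computed at ten intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05 and then averaged. The function names `iou` and `thresholds_passed` and the `(x1, y1, x2, y2)` box layout are hypothetical choices made for this example.

```python
# Minimal sketch of the IoU test underlying mAP50-95 (assumed COCO-style convention).
# All names here are illustrative; they do not come from the paper.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(pred_box, truth_box):
    """Thresholds at which a predicted box would count as matching the ground truth."""
    value = iou(pred_box, truth_box)
    return [t for t in IOU_THRESHOLDS if value >= t]
```

Under this assumed convention, a prediction that overlaps its ground-truth box with an IoU of 0.82 counts as a match at the seven thresholds up to 0.80 and as a miss at 0.85 and above, which is why mAP50-95 rewards tight localization more strongly than a single-threshold metric at 0.5.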

Original language: English (US)
Article number: 114729
Journal: Engineering Applications of Artificial Intelligence
Volume: 176
State: Published - Jul 15 2026

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Keywords

  • Computer vision
  • Edge deep learning
  • Machine listening
  • Multi-modality
  • Robotic teleoperation
  • Worker safety and productivity
