ARSFineTune: On-the-Fly Tuning of Vision Models for Unmanned Ground Vehicles

Masud Ahmed, Zahid Hasan, Abu Zaher Md Faridee, Mohammad Saeid Anwar, Kasthuri Jayarajah, Sanjay Purushotham, Suya You, Nirmalya Roy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The performance of semantic segmentation (SS) can degrade when the data distribution in the deployed environment differs from what the model initially learned during training. While domain adaptation (DA) and continual learning (CL) methods have been proposed to improve performance in new or unseen domains over time, the effort required to annotate large swathes of training data during deployment is non-trivial; acquiring new data for training incurs significant network and device memory costs as well as manual labeling effort. To address this, we propose ARSFineTune, a novel framework that actively selects the most informative regions of visuals encountered by a mobile robot for the CL network to learn from, greatly reducing the data transfer overhead related to annotations. We first propose an efficient entropy-driven ranking mechanism to identify candidate regions and rank challenging images at the edge node. We then facilitate a cyclical feedback loop between the server and the edge, continuously refining semantic segmentation accuracy by fine-tuning the model with minimal data transferred to/from the field-deployed device. We implement ARSFineTune in a real-time setting using the Robot Operating System (ROS), where a Jackal (an unmanned ground vehicle, UGV) collaborates with a central server. Through extensive experiments, we find that ARSFineTune delivers competitive performance, closely matching existing state-of-the-art techniques while requiring substantially less data for fine-tuning. Specifically, with only 5% of the labeled regions of the entire dataset (the 25% most challenging regions of the 20% most problematic samples) used for fine-tuning, ARSFineTune reaches a performance level nearly identical (≈ 97%) to the previous state-of-the-art model, achieving mIoU scores of 59.5% on the Cityscapes dataset and 41% on the CAD-EdgeTune dataset, which is challenging due to lighting conditions that vary over time. The reduction in annotation effort also yields a 23.5% improvement in network latency and 41% lower memory usage during model inference on the UGV; a 79% reduction in data transfer time between the UGV and the annotation server; and, during model fine-tuning on the server, a 16.59% reduction in latency, 28.57% lower power usage, and 10% lower memory usage.
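The core selection step described in the abstract, ranking the most uncertain regions of the most uncertain images so that only a small budget of regions is sent for annotation, can be illustrated with a short sketch. The code below is a hypothetical illustration, not the authors' implementation: it assumes per-pixel softmax outputs from the segmentation model, scores fixed grid cells by mean predictive entropy, and mirrors the 20% image / 25% region budget quoted in the abstract; all function names and parameters are assumptions introduced for this example.

```python
# Hypothetical sketch of entropy-driven region ranking for annotation selection.
# Assumes each image's segmentation output is a softmax array of shape (C, H, W).
import numpy as np


def pixel_entropy(probs: np.ndarray) -> np.ndarray:
    """Per-pixel predictive entropy from a (C, H, W) softmax output."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=0)  # (H, W)


def rank_regions(probs: np.ndarray, grid: int = 4) -> list:
    """Split the entropy map into a grid x grid layout and rank cells by mean entropy."""
    ent = pixel_entropy(probs)
    h, w = ent.shape
    rh, rw = h // grid, w // grid
    scores = []
    for i in range(grid):
        for j in range(grid):
            cell = ent[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            scores.append((float(cell.mean()), i, j))
    return sorted(scores, reverse=True)  # most uncertain regions first


def select_for_annotation(all_probs, image_frac=0.20, region_frac=0.25, grid=4):
    """Pick the most uncertain images, then the most uncertain regions within each."""
    image_scores = [(float(pixel_entropy(p).mean()), idx)
                    for idx, p in enumerate(all_probs)]
    image_scores.sort(reverse=True)
    n_imgs = max(1, int(len(all_probs) * image_frac))
    n_regions = max(1, int(grid * grid * region_frac))
    selection = {}
    for _, idx in image_scores[:n_imgs]:
        selection[idx] = rank_regions(all_probs[idx], grid)[:n_regions]
    return selection  # {image index: [(entropy score, row, col), ...]}
```

In this sketch, only the selected grid cells (and their coordinates) would need to travel from the edge device to the annotation server, which is consistent with the data-transfer savings the paper reports.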

Original language: English (US)
Title of host publication: Proceedings - 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 170-178
Number of pages: 9
ISBN (Electronic): 9798350369441
DOIs
State: Published - 2024
Event: 20th Annual International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024 - Abu Dhabi, United Arab Emirates
Duration: Apr 29 2024 – May 1 2024

Publication series

Name: Proceedings - 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024

Conference

Conference: 20th Annual International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 4/29/24 – 5/1/24

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Control and Optimization

Keywords

  • Active learning
  • continual learning
  • Robot Operating System (ROS)
  • semantic segmentation
