TY - GEN
T1 - ARSFineTune
T2 - 20th Annual International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024
AU - Ahmed, Masud
AU - Hasan, Zahid
AU - Faridee, Abu Zaher Md
AU - Anwar, Mohammad Saeid
AU - Jayarajah, Kasthuri
AU - Purushotham, Sanjay
AU - You, Suya
AU - Roy, Nirmalya
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The performance of semantic segmentation (SS) can degrade when the data distribution in the deployed environment differs from what the model initially learned during training. While domain adaptation (DA) and continual learning (CL) methods have been proposed to improve performance in new or unseen domains over time, the effort required to annotate large swathes of training data during deployment is non-trivial; acquiring new data for training incurs significant network and device memory costs as well as manual labeling effort. To address this, we propose ARSFineTune, a novel framework that actively selects the most informative regions of visuals encountered by a mobile robot for the CL network to learn from, greatly minimizing the data transfer overhead related to annotations. We first propose an efficient entropy-driven ranking mechanism to identify candidate regions and rank challenging images at the edge node. We then facilitate a cyclical feedback loop between the server and the edge, continuously refining semantic segmentation accuracy by fine-tuning the model with minimal data transferred to/from the field-deployed device. We implement ARSFineTune in a real-time setting using the Robot Operating System (ROS), where a Jackal unmanned ground vehicle (UGV) collaborates with a central server. Through extensive experiments, we find that ARSFineTune delivers competitive performance, closely matching existing state-of-the-art techniques while requiring substantially less data for fine-tuning. Specifically, with only 5% of the total labeled regions of the entire dataset for fine-tuning (the 25% most challenging regions of the 20% most problematic samples), ARSFineTune reaches a performance level nearly identical (≈97%) to the previous state-of-the-art model, with mIoU scores of 59.5% on the Cityscapes dataset and 41% on the CAD-EdgeTune dataset, which is challenging due to lighting conditions that vary over time. The reduction in annotation effort also contributes to 23.5% lower network latency and 41% less memory usage during the model inference stage on the UGV; a 79% reduction in data transfer time between the UGV and the annotation server; and, finally, a 16.59% reduction in latency, 28.57% less power usage, and 10% less memory usage on the server during the model fine-tuning stage.
AB - The performance of semantic segmentation (SS) can degrade when the data distribution in the deployed environment differs from what the model initially learned during training. While domain adaptation (DA) and continual learning (CL) methods have been proposed to improve performance in new or unseen domains over time, the effort required to annotate large swathes of training data during deployment is non-trivial; acquiring new data for training incurs significant network and device memory costs as well as manual labeling effort. To address this, we propose ARSFineTune, a novel framework that actively selects the most informative regions of visuals encountered by a mobile robot for the CL network to learn from, greatly minimizing the data transfer overhead related to annotations. We first propose an efficient entropy-driven ranking mechanism to identify candidate regions and rank challenging images at the edge node. We then facilitate a cyclical feedback loop between the server and the edge, continuously refining semantic segmentation accuracy by fine-tuning the model with minimal data transferred to/from the field-deployed device. We implement ARSFineTune in a real-time setting using the Robot Operating System (ROS), where a Jackal unmanned ground vehicle (UGV) collaborates with a central server. Through extensive experiments, we find that ARSFineTune delivers competitive performance, closely matching existing state-of-the-art techniques while requiring substantially less data for fine-tuning. Specifically, with only 5% of the total labeled regions of the entire dataset for fine-tuning (the 25% most challenging regions of the 20% most problematic samples), ARSFineTune reaches a performance level nearly identical (≈97%) to the previous state-of-the-art model, with mIoU scores of 59.5% on the Cityscapes dataset and 41% on the CAD-EdgeTune dataset, which is challenging due to lighting conditions that vary over time. The reduction in annotation effort also contributes to 23.5% lower network latency and 41% less memory usage during the model inference stage on the UGV; a 79% reduction in data transfer time between the UGV and the annotation server; and, finally, a 16.59% reduction in latency, 28.57% less power usage, and 10% less memory usage on the server during the model fine-tuning stage.
KW - active learning
KW - continual learning
KW - Robot Operating System (ROS)
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85202341903&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85202341903&partnerID=8YFLogxK
U2 - 10.1109/DCOSS-IoT61029.2024.00033
DO - 10.1109/DCOSS-IoT61029.2024.00033
M3 - Conference contribution
AN - SCOPUS:85202341903
T3 - Proceedings - 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024
SP - 170
EP - 178
BT - Proceedings - 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 29 April 2024 through 1 May 2024
ER -