TY - JOUR
T1 - Towards real-time embodied AI agent
T2 - a bionic visual encoding framework for mobile robotics
AU - Hou, Xueyu
AU - Guan, Yongjie
AU - Han, Tao
AU - Wang, Cong
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024/12
Y1 - 2024/12
AB - Embodied artificial intelligence (AI) agents, which navigate and interact with their environment using sensors and actuators, are being deployed on mobile robotic platforms with limited computing power, such as autonomous vehicles, drones, and humanoid robots. These systems make decisions through environmental perception from deep neural network (DNN)-based visual encoders. However, the constrained computational resources and the large amounts of visual data to be processed can create bottlenecks, such as taking almost 300 milliseconds per decision on an embedded GPU board (Jetson Xavier). Existing DNN acceleration methods require model retraining and can still reduce accuracy. To address these challenges, our paper introduces a bionic visual encoder framework, Robye, to support the real-time requirements of embodied AI agents. The proposed framework complements existing DNN acceleration techniques. Specifically, we integrate motion data to identify overlapping areas between consecutive frames, which reduces DNN workload by propagating encoding results. We bifurcate processing into high-resolution encoding for task-critical areas and low-resolution encoding for less significant regions. This dual-resolution approach maintains task performance while lowering the overall computational demands. We evaluate Robye across three robotic scenarios: autonomous driving, vision-and-language navigation, and drone navigation, using various DNN models and mobile platforms. Robye outperforms baselines in speed (1.2–3.3×), task performance (+4% to +29%), and power consumption (-36% to -47%).
KW - Computer vision
KW - Embodied AI
KW - Mobile robotics
KW - Visual encoding
UR - http://www.scopus.com/inward/record.url?scp=85201279190&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85201279190&partnerID=8YFLogxK
U2 - 10.1007/s41315-024-00363-w
DO - 10.1007/s41315-024-00363-w
M3 - Article
AN - SCOPUS:85201279190
SN - 2366-5971
VL - 8
SP - 1038
EP - 1056
JO - International Journal of Intelligent Robotics and Applications
JF - International Journal of Intelligent Robotics and Applications
IS - 4
M1 - 104184
ER -