A Lightweight De-confounding Transformer for Image Captioning in Wearable Assistive Navigation Device

Zhengcai Cao, Ji Xia, Yinbin Shi, Meng Chu Zhou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Image captioning is a multi-modal task that transforms scene images into natural language, helping visually impaired individuals understand their environment. It therefore holds great potential for wearable navigation devices designed for visually impaired users. In practical applications, however, confusion between scene visuals and semantics, together with model complexity, often degrades performance and leads to inaccurate interpretation of the environment. To address this, we introduce a Lightweight De-confounding Transformer Network (LDTNet) for image captioning, equipped with a Causal Adjustment module that eliminates confounders. We also design a Suppression Gate Unit that efficiently integrates fine-grained information from shallow features, while reducing the number of network layers to keep the model lightweight. Experimental results demonstrate that, compared with state-of-the-art methods, our approach not only effectively addresses the visual-semantic confusion issue but also improves the response speed of wearable devices. Twenty volunteers were recruited to wear the resulting assistive navigation device and evaluate LDTNet's efficacy in real-world settings, in terms of both response speed and the quality of the generated captions. The results demonstrate its outstanding performance and its great potential for use by visually impaired individuals.
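The abstract does not give the equations of the Causal Adjustment module. De-confounding modules of this kind are usually grounded in Pearl's backdoor adjustment, P(y | do(x)) = Σ_z P(y | x, z) P(z), which replaces the observational weighting P(z | x) with the confounder prior P(z). The sketch below illustrates only that general formula on a small, entirely hypothetical discrete distribution: the variables (a visual feature x, a scene-context confounder z, a caption event y) and all probabilities are assumptions for illustration, not LDTNet's implementation or data. Deep captioning models typically approximate the sum with learned features rather than explicit tables; whether LDTNet does so is not stated in the abstract.

```python
"""Toy backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) * P(z).

Every variable and probability here is hypothetical and chosen only to
illustrate the formula; none of it comes from the LDTNet paper.
"""

# Hypothetical confounder z (scene context) and its prior P(z).
P_Z = {"indoor": 0.7, "outdoor": 0.3}

# Hypothetical P(x | z): how often each visual feature x is detected per context.
P_X_GIVEN_Z = {
    "indoor": {"chair": 0.8, "car": 0.2},
    "outdoor": {"chair": 0.1, "car": 0.9},
}

# Hypothetical P(y=1 | x, z): probability that a given caption event y occurs.
P_Y_GIVEN_XZ = {
    ("chair", "indoor"): 0.6,
    ("chair", "outdoor"): 0.1,
    ("car", "indoor"): 0.3,
    ("car", "outdoor"): 0.05,
}


def observational(x: str) -> float:
    """P(y=1 | x) = sum_z P(y=1 | x, z) * P(z | x), with P(z | x) from Bayes' rule."""
    joint = {z: P_X_GIVEN_Z[z][x] * P_Z[z] for z in P_Z}  # P(x, z)
    p_x = sum(joint.values())  # P(x)
    return sum(P_Y_GIVEN_XZ[(x, z)] * joint[z] / p_x for z in P_Z)


def interventional(x: str) -> float:
    """P(y=1 | do(x)) = sum_z P(y=1 | x, z) * P(z): z keeps its prior, cutting z -> x."""
    return sum(P_Y_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    for feature in ("chair", "car"):
        print(
            f"{feature}: P(y|x)={observational(feature):.3f}  "
            f"P(y|do(x))={interventional(feature):.3f}"
        )
```

In this toy setting the observational estimate for "chair" is inflated because chairs co-occur mostly with the indoor context, and the adjusted estimate removes that context-driven bias; this is the kind of spurious visual-semantic correlation a de-confounding module aims to suppress.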

Original language: English (US)
Title of host publication: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7422-7428
Number of pages: 7
ISBN (Electronic): 9798350377705
State: Published - 2024
Externally published: Yes
Event: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024 - Abu Dhabi, United Arab Emirates
Duration: Oct 14 2024 to Oct 18 2024

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
ISSN (Print): 2153-0858
ISSN (Electronic): 2153-0866

Conference

Conference: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 10/14/24 to 10/18/24

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
