TY - GEN
T1 - Beyond Visual Perception
T2 - 2025 CHI Conference on Human Factors in Computing Systems, CHI 2025
AU - Xie, Jingyi
AU - Yu, Rui
AU - Zhang, He
AU - Billah, Syed Masum
AU - Lee, Sooyeon
AU - Carroll, John M.
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/4/26
Y1 - 2025/4/26
AB - Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users of Be My AI (an LMM-based application) and analysis of its image descriptions from both study participants and social media platforms, we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.
KW - Be My AI
KW - Be My Eyes
KW - Human-AI interaction
KW - large multimodal models (LMMs)
KW - People with visual impairments (PVI)
KW - remote sighted assistance (RSA)
KW - visual question answering (VQA)
UR - http://www.scopus.com/inward/record.url?scp=105005756223&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105005756223&partnerID=8YFLogxK
U2 - 10.1145/3706598.3714210
DO - 10.1145/3706598.3714210
M3 - Conference contribution
AN - SCOPUS:105005756223
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2025 - Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
Y2 - 26 April 2025 through 1 May 2025
ER -