Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models

Jingyi Xie, Rui Yu, He Zhang, Syed Masum Billah, Sooyeon Lee, John M. Carroll

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users of Be My AI (an LMM-based application) and analysis of its image descriptions from both study participants and social media platforms, we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.

Original language: English (US)
Title of host publication: CHI 2025 - Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems
Publisher: Association for Computing Machinery
ISBN (Electronic): 9798400713941
State: Published - Apr 26 2025
Externally published: Yes
Event: 2025 CHI Conference on Human Factors in Computing Systems, CHI 2025 - Yokohama, Japan
Duration: Apr 26 2025 - May 1 2025

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2025 CHI Conference on Human Factors in Computing Systems, CHI 2025
Country/Territory: Japan
City: Yokohama
Period: 4/26/25 - 5/1/25

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
  • Software

Keywords

  • Be My AI
  • Be My Eyes
  • Human-AI interaction
  • Large multimodal models (LMMs)
  • People with visual impairments (PVI)
  • Remote sighted assistance (RSA)
  • Visual question answering (VQA)
