Human–AI Collaboration for Remote Sighted Assistance: Perspectives from the LLM Era †

Rui Yu, Sooyeon Lee, Jingyi Xie, Syed Masum Billah, John M. Carroll

Research output: Contribution to journal › Review article › peer-review

Abstract

Remote sighted assistance (RSA) has emerged as a conversational technology aiding people with visual impairments (VI) through real-time video chat communication with sighted agents. We conducted a literature review and interviewed 12 RSA users to understand the technical and navigational challenges faced by both agents and users. The technical challenges were categorized into four groups: agents’ difficulties in orienting and localizing users, acquiring and interpreting users’ surroundings and obstacles, delivering information specific to user situations, and coping with poor network connections. We also presented 15 real-world navigational challenges, including 8 outdoor and 7 indoor scenarios. Given the spatial and visual nature of these challenges, we identified relevant computer vision problems that could potentially provide solutions. We then formulated 10 emerging problems that neither human agents nor computer vision can fully address alone. For each emerging problem, we discussed solutions grounded in human–AI collaboration. Additionally, with the advent of large language models (LLMs), we outlined how RSA can integrate with LLMs within a human–AI collaborative framework, envisioning the future of visual prosthetics.

Original language: English (US)
Article number: 254
Journal: Future Internet
Volume: 16
Issue number: 7
State: Published - Jul 2024
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Keywords

  • artificial intelligence
  • computer vision
  • conversational assistance
  • human–AI collaboration
  • large language models
  • people with visual impairments
  • remote sighted assistance
