SAFARI: Speech-Associated Facial Authentication for AR/VR Settings via Robust VIbration Signatures

Tianfang Zhang, Qiufan Ji, Zhengkun Ye, Md Mojibur Rahman Redoy Akanda, Ahmed Tanvir Mahdad, Cong Shi, Yan Wang, Nitesh Saxena, Yingying Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

In AR/VR devices, the voice interface, one of the primary control mechanisms, enables users to interact naturally through speech (voice commands) to access data, control applications, and engage in remote communication/meetings. Voice authentication can be adopted to protect against unauthorized speech inputs. However, existing voice authentication mechanisms are usually susceptible to voice spoofing attacks and unreliable under variations in phonetic content. In this work, we propose SAFARI, a spoofing-resistant and text-independent speech authentication system that can be seamlessly integrated into AR/VR voice interfaces. The key idea is to elicit phonetic-invariant biometrics from the facial muscle vibrations that reach the headset. During speech production, a user’s facial muscles deform to articulate phoneme sounds. The facial deformations associated with phonemes are referred to as visemes. They carry rich biometrics of the wearer’s muscles, tissue, and bones, and they propagate through the head and vibrate the headset. SAFARI derives reliable facial biometrics from the viseme-associated facial vibrations captured by the AR/VR motion sensors. In particular, it identifies the vibration data segments that contain rich viseme patterns (prominent visemes) and are less susceptible to phonetic variations. Based on the prominent visemes, SAFARI learns the correlations among facial vibrations of different frequencies to extract biometric representations invariant to the phonetic context. The key advantages of SAFARI are that it runs on commodity AR/VR headsets (no additional sensors) and resists voice spoofing attacks, as the conductive propagation of the facial vibrations prevents biometric disclosure through the air or the audio channel. To mitigate the impact of body motions in AR/VR scenarios, we also design a generative diffusion model trained to reconstruct viseme patterns from data distorted by motion artifacts. We conduct extensive experiments with two representative AR/VR headsets and 35 users under various usage and attack settings. We demonstrate that SAFARI achieves over a 96% true positive rate in verifying legitimate users while rejecting different kinds of spoofing attacks with over 97% true negative rates.
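
As a rough illustration of the prominent-viseme idea in the abstract, the sketch below selects high-energy segments from a motion-sensor stream via short-time spectral energy. This is not the authors' implementation: the 1 kHz sampling rate, the 20-400 Hz band, the STFT parameters, and the threshold ratio are all illustrative assumptions, and the actual selection criterion in the paper may differ.

```python
# Hypothetical sketch of prominent-viseme segment selection, NOT the
# authors' implementation. Assumes a 1-D accelerometer stream sampled
# at 1 kHz; band limits and the energy threshold are illustrative.
import numpy as np
from scipy.signal import stft

def prominent_viseme_segments(acc, fs=1000, band=(20, 400), thresh_ratio=0.6):
    """Return (start, end) sample indices of high-energy viseme segments."""
    f, t, Z = stft(acc, fs=fs, nperseg=256, noverlap=192)
    band_mask = (f >= band[0]) & (f <= band[1])
    energy = np.abs(Z[band_mask]).sum(axis=0)       # per-frame band energy
    active = energy > thresh_ratio * energy.max()   # keep prominent frames
    # Merge consecutive active frames into contiguous segments.
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((int(t[start] * fs), int(t[i - 1] * fs)))
            start = None
    if start is not None:
        segments.append((int(t[start] * fs), int(t[-1] * fs)))
    return segments
```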
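The abstract also describes learning correlations among facial vibrations of different frequencies. A minimal stand-in for that idea, again under stated assumptions (the band edges, filter order, and the choice of envelope correlation are hypothetical, not taken from the paper), is to correlate sub-band amplitude envelopes within a prominent segment:

```python
# Illustrative feature: correlations among vibration sub-bands, loosely
# mirroring the idea of phonetic-invariant cross-frequency structure.
# Band edges and the correlation choice are assumptions, not the paper's.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_correlation_features(seg, fs=1000, edges=(20, 80, 160, 240, 320, 400)):
    """Upper-triangular correlation matrix of sub-band amplitude envelopes."""
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        envs.append(np.abs(hilbert(filtfilt(b, a, seg))))  # band envelope
    C = np.corrcoef(np.vstack(envs))                       # band-to-band corr
    iu = np.triu_indices_from(C, k=1)
    return C[iu]                                           # compact feature vector
```

In SAFARI proper, such invariant representations are learned rather than hand-crafted; this fixed-filter version only conveys why cross-band correlations can be less sensitive to the specific phoneme being spoken than raw spectra are.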
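Finally, for the motion-artifact mitigation step, the abstract mentions a generative diffusion model trained to reconstruct viseme patterns from distorted data. The following is a generic DDPM-style training step (PyTorch), standing in for whatever architecture the paper actually uses; the network, noise schedule, timestep count, and tensor shapes are all illustrative assumptions.

```python
# A minimal DDPM-style denoising sketch, NOT the paper's model.
import torch
import torch.nn as nn

T = 200
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    """Tiny 1-D conv net predicting the noise added to a vibration segment."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, channels, 5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, 5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, 1, 5, padding=2),
        )

    def forward(self, x_t, t):
        # Broadcast the normalized timestep as a second input channel.
        t_chan = (t.float() / T).view(-1, 1, 1).expand_as(x_t)
        return self.net(torch.cat([x_t, t_chan], dim=1))

def training_step(model, x0, opt):
    """One denoising-objective step on clean viseme segments x0: (B, 1, L)."""
    t = torch.randint(0, T, (x0.size(0),))
    ab = alphas_bar[t].view(-1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise  # forward diffusion
    loss = nn.functional.mse_loss(model(x_t, t), noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```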

Original language: English (US)
Title of host publication: CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security
Publisher: Association for Computing Machinery, Inc
Pages: 153-167
Number of pages: 15
ISBN (Electronic): 9798400706363
DOIs
State: Published - Dec 9 2024
Event: 31st ACM SIGSAC Conference on Computer and Communications Security, CCS 2024 - Salt Lake City, United States
Duration: Oct 14 2024 - Oct 18 2024

Publication series

Name: CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

Conference

Conference: 31st ACM SIGSAC Conference on Computer and Communications Security, CCS 2024
Country/Territory: United States
City: Salt Lake City
Period: 10/14/24 - 10/18/24

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Software

Keywords

  • AR/VR headsets
  • Authentication
  • Speech vibrations
