Face-Mic: Inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors

Cong Shi, Xiangyu Xu, Tianfang Zhang, Payton Walker, Yi Wu, Jian Liu, Nitesh Saxena, Yingying Chen, Jiadi Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

53 Scopus citations

Abstract

Augmented reality/virtual reality (AR/VR) has extended beyond 3D immersive gaming to a broader array of applications, such as shopping, tourism, and education, and interactions have recently shifted from being dominated by handheld controllers to being driven by voice interfaces on the headset. In this work, we show a serious privacy risk of using voice interfaces while wearing a face-mounted AR/VR device. Specifically, we design an eavesdropping attack, Face-Mic, which leverages speech-associated subtle facial dynamics captured by zero-permission motion sensors in AR/VR headsets to infer highly sensitive information from live human speech, including speaker gender, identity, and speech content. Face-Mic is grounded in the key insight that AR/VR headsets are mounted closely on the user's face, allowing a potentially malicious app on the headset to capture the underlying facial dynamics as the wearer speaks, including movements of facial muscles and bone-borne vibrations, which encode private biometrics and speech characteristics. To mitigate the impact of body movements, we develop a signal source separation technique that identifies and separates the speech-associated facial dynamics from other types of body movements. We further extract representative features for the two types of facial dynamics. Using a deep learning-based framework, we demonstrate the privacy leakage through AR/VR headsets by deriving the user's gender and identity and extracting speech information. Extensive experiments with four mainstream VR headsets validate the generalizability, effectiveness, and high accuracy of Face-Mic.
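At a high level, the pipeline described in the abstract (capture headset motion-sensor data, separate speech-associated facial dynamics from other body movements, extract features, and classify with a deep learning framework) can be illustrated with a simple signal-processing sketch. The Python code below is not the authors' Face-Mic implementation: it substitutes a plain band-pass filter for the paper's signal source separation step, and the sampling rate, cutoff frequencies, feature choices, and synthetic data are hypothetical assumptions chosen only to make the idea concrete.

    # Illustrative sketch only: isolate speech-band vibration energy from a
    # headset accelerometer trace and compute simple spectral features.
    # NOT the Face-Mic pipeline; the paper uses a dedicated signal source
    # separation technique and a deep learning framework. All parameters
    # below are hypothetical assumptions.
    import numpy as np
    from scipy.signal import butter, filtfilt, welch

    FS = 500  # assumed IMU sampling rate (Hz)

    def bandpass(signal, low_hz=20.0, high_hz=200.0, fs=FS, order=4):
        """Keep the band where speech-driven facial/bone-borne vibrations are
        expected, suppressing low-frequency head and body motion
        (cutoffs are hypothetical)."""
        nyq = fs / 2.0
        b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
        return filtfilt(b, a, signal)

    def spectral_features(signal, fs=FS):
        """Crude per-window features: total band power and spectral centroid."""
        freqs, psd = welch(signal, fs=fs, nperseg=256)
        band_power = np.trapz(psd, freqs)
        centroid = np.sum(freqs * psd) / (np.sum(psd) + 1e-12)
        return np.array([band_power, centroid])

    if __name__ == "__main__":
        # Synthetic stand-in for one axis of headset accelerometer data:
        # slow body sway plus a faint speech-band vibration plus sensor noise.
        t = np.arange(0, 2.0, 1.0 / FS)
        body_motion = 0.5 * np.sin(2 * np.pi * 1.5 * t)
        speech_vibration = 0.02 * np.sin(2 * np.pi * 120.0 * t)
        accel = body_motion + speech_vibration + 0.005 * np.random.randn(t.size)

        speech_band = bandpass(accel)
        print("features (band power, centroid):", spectral_features(speech_band))

In the actual attack, features of this kind would feed a trained classifier (the paper describes a deep learning-based framework) rather than being printed, and the separation step must handle real body movements rather than a fixed frequency split.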

Original language: English (US)
Title of host publication: ACM MobiCom 2021 - Proceedings of the 27th ACM Annual International Conference on Mobile Computing and Networking
Publisher: Association for Computing Machinery
Pages: 478-490
Number of pages: 13
ISBN (Electronic): 9781450383424
DOIs
State: Published - 2021
Externally published: Yes
Event: 27th ACM Annual International Conference on Mobile Computing and Networking, MobiCom 2021 - New Orleans, United States
Duration: Mar 28, 2022 - Apr 1, 2022

Publication series

Name: Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM
ISSN (Print): 1543-5679

Conference

Conference: 27th ACM Annual International Conference on Mobile Computing and Networking, MobiCom 2021
Country/Territory: United States
City: New Orleans
Period: 3/28/22 - 4/1/22

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Keywords

  • AR/VR headsets
  • facial dynamics
  • speech and speaker privacy
