Face-Mic: Inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors

Cong Shi, Xiangyu Xu, Tianfang Zhang, Payton Walker, Yi Wu, Jian Liu, Nitesh Saxena, Yingying Chen, Jiadi Yu

Research output: Contribution to conference › Paper › peer-review

33 Scopus citations

Abstract

Augmented reality/virtual reality (AR/VR) has extended beyond 3D immersive gaming to a broader array of applications, such as shopping, tourism, and education. Recently, there has also been a large shift from interactions dominated by handheld controllers to headset-dominated interactions via voice interfaces. In this work, we show a serious privacy risk of using voice interfaces while the user is wearing a face-mounted AR/VR device. Specifically, we design an eavesdropping attack, Face-Mic, which leverages speech-associated subtle facial dynamics captured by zero-permission motion sensors in AR/VR headsets to infer highly sensitive information from live human speech, including speaker gender, identity, and speech content. Face-Mic builds on the key insight that AR/VR headsets are closely mounted on the user's face, allowing a potentially malicious app on the headset to capture underlying facial dynamics as the wearer speaks, including movements of facial muscles and bone-borne vibrations, which encode private biometrics and speech characteristics. To mitigate the impact of body movements, we develop a signal source separation technique to identify and separate the speech-associated facial dynamics from other types of body movements. We further extract representative features for each of the two types of facial dynamics. Using a deep learning-based framework, we demonstrate privacy leakage through AR/VR headsets by deriving the user's gender and identity and extracting speech information. Extensive experiments using four mainstream VR headsets validate the generalizability, effectiveness, and high accuracy of Face-Mic.

Original language: English (US)
Pages: 478-490
Number of pages: 13
State: Published - 2021
Externally published: Yes
Event: 27th ACM Annual International Conference On Mobile Computing And Networking, MobiCom 2021 - New Orleans, United States
Duration: Oct 25 2021 to Oct 29 2021

Conference

Conference: 27th ACM Annual International Conference On Mobile Computing And Networking, MobiCom 2021
Country/Territory: United States
City: New Orleans
Period: 10/25/21 to 10/29/21

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Keywords

  • AR/VR headsets
  • facial dynamics
  • speech and speaker privacy
