TY - GEN
T1 - WearID
T2 - 36th Annual Computer Security Applications Conference, ACSAC 2020
AU - Shi, Cong
AU - Wang, Yan
AU - Chen, Yingying
AU - Saxena, Nitesh
AU - Wang, Chen
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/12/7
Y1 - 2020/12/7
N2 - Due to the open nature of voice input, voice assistant (VA) systems (e.g., Google Home and Amazon Alexa) are vulnerable to various security and privacy leakages (e.g., of credit card numbers and passwords), especially when users issue critical commands involving large purchases, critical calls, etc. Though existing VA systems may employ voice features to identify users, they remain vulnerable to various acoustic-based attacks (e.g., impersonation, replay, and hidden command attacks). In this work, we propose a training-free voice authentication system, WearID, that leverages the cross-domain speech similarity between the audio domain and the vibration domain to provide enhanced security for the ever-growing deployment of VA systems. In particular, when a user gives a critical command, WearID exploits motion sensors on the user's wearable device to capture the aerial speech in the vibration domain and verifies it against the speech captured in the audio domain via the VA device's microphone. Compared to existing approaches, our solution is low-effort and privacy-preserving, as it neither requires users' active input (e.g., replying to messages/calls) nor stores users' privacy-sensitive voice samples for training. In addition, our solution exploits the distinct vibration sensing interface and its short sensing range for sound (e.g., 25 cm) to verify voice commands. Examining the similarity of the two domains' data is not trivial: the huge sampling-rate gap (e.g., 8000 Hz vs. 200 Hz) between the audio and vibration domains makes it hard to compare the two domains' data directly, and even tiny noise in the data could be magnified and cause authentication failures. To address these challenges, we investigate the complex relationship between the two sensing domains and develop a spectrogram-based algorithm to convert the microphone data into lower-frequency "motion sensor data" to facilitate cross-domain comparisons.
We further develop a user authentication scheme that verifies the received voice command originates from the legitimate user, based on the cross-domain speech similarity of the received voice commands. We report on extensive experiments evaluating WearID under various audible and inaudible attacks. The results show WearID can verify voice commands with 99.8% accuracy under normal conditions and detect 97.2% of fake voice commands from various attacks, including impersonation/replay attacks and hidden voice/ultrasound attacks.
AB - Due to the open nature of voice input, voice assistant (VA) systems (e.g., Google Home and Amazon Alexa) are vulnerable to various security and privacy leakages (e.g., of credit card numbers and passwords), especially when users issue critical commands involving large purchases, critical calls, etc. Though existing VA systems may employ voice features to identify users, they remain vulnerable to various acoustic-based attacks (e.g., impersonation, replay, and hidden command attacks). In this work, we propose a training-free voice authentication system, WearID, that leverages the cross-domain speech similarity between the audio domain and the vibration domain to provide enhanced security for the ever-growing deployment of VA systems. In particular, when a user gives a critical command, WearID exploits motion sensors on the user's wearable device to capture the aerial speech in the vibration domain and verifies it against the speech captured in the audio domain via the VA device's microphone. Compared to existing approaches, our solution is low-effort and privacy-preserving, as it neither requires users' active input (e.g., replying to messages/calls) nor stores users' privacy-sensitive voice samples for training. In addition, our solution exploits the distinct vibration sensing interface and its short sensing range for sound (e.g., 25 cm) to verify voice commands. Examining the similarity of the two domains' data is not trivial: the huge sampling-rate gap (e.g., 8000 Hz vs. 200 Hz) between the audio and vibration domains makes it hard to compare the two domains' data directly, and even tiny noise in the data could be magnified and cause authentication failures. To address these challenges, we investigate the complex relationship between the two sensing domains and develop a spectrogram-based algorithm to convert the microphone data into lower-frequency "motion sensor data" to facilitate cross-domain comparisons.
We further develop a user authentication scheme that verifies the received voice command originates from the legitimate user, based on the cross-domain speech similarity of the received voice commands. We report on extensive experiments evaluating WearID under various audible and inaudible attacks. The results show WearID can verify voice commands with 99.8% accuracy under normal conditions and detect 97.2% of fake voice commands from various attacks, including impersonation/replay attacks and hidden voice/ultrasound attacks.
KW - Motion Sensor
KW - User Authentication
KW - Voice Assistant Systems
UR - http://www.scopus.com/inward/record.url?scp=85098048769&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098048769&partnerID=8YFLogxK
U2 - 10.1145/3427228.3427259
DO - 10.1145/3427228.3427259
M3 - Conference contribution
AN - SCOPUS:85098048769
T3 - ACM International Conference Proceeding Series
SP - 829
EP - 842
BT - Proceedings - 36th Annual Computer Security Applications Conference, ACSAC 2020
PB - Association for Computing Machinery
Y2 - 7 December 2020 through 11 December 2020
ER -