TY - GEN
T1 - Practical adversarial attacks against speaker recognition systems
AU - Li, Zhuohang
AU - Shi, Cong
AU - Xie, Yi
AU - Liu, Jian
AU - Yuan, Bo
AU - Chen, Yingying
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/3/3
Y1 - 2020/3/3
N2 - Unlike other biometric-based user identification methods (e.g., fingerprint and iris), speaker recognition systems can identify individuals by their unique voice biometrics without requiring them to be physically present. Speaker recognition systems have therefore become increasingly popular in various domains, such as remote access control, banking services, and criminal investigation. In this paper, we study the vulnerability of such systems by launching a practical and systematic adversarial attack against X-vector, the state-of-the-art deep neural network (DNN)-based speaker recognition system. In particular, by adding well-crafted, inconspicuous noise to the original audio, our attack can fool the speaker recognition system into making false predictions and can even force the audio to be recognized as any adversary-desired speaker. Moreover, our attack integrates the estimated room impulse response (RIR) into the adversarial example training process to produce practical audio adversarial examples that remain effective when played over the air in the physical world. Extensive experiments on a public dataset of 109 speakers demonstrate the effectiveness of our attack, with attack success rates of 98% for the digital attack and 50% for the practical over-the-air attack.
AB - Unlike other biometric-based user identification methods (e.g., fingerprint and iris), speaker recognition systems can identify individuals by their unique voice biometrics without requiring them to be physically present. Speaker recognition systems have therefore become increasingly popular in various domains, such as remote access control, banking services, and criminal investigation. In this paper, we study the vulnerability of such systems by launching a practical and systematic adversarial attack against X-vector, the state-of-the-art deep neural network (DNN)-based speaker recognition system. In particular, by adding well-crafted, inconspicuous noise to the original audio, our attack can fool the speaker recognition system into making false predictions and can even force the audio to be recognized as any adversary-desired speaker. Moreover, our attack integrates the estimated room impulse response (RIR) into the adversarial example training process to produce practical audio adversarial examples that remain effective when played over the air in the physical world. Extensive experiments on a public dataset of 109 speakers demonstrate the effectiveness of our attack, with attack success rates of 98% for the digital attack and 50% for the practical over-the-air attack.
KW - Adversarial Example
KW - Deep Learning
KW - Room Impulse Response
KW - Speaker Recognition
UR - http://www.scopus.com/inward/record.url?scp=85082044641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082044641&partnerID=8YFLogxK
U2 - 10.1145/3376897.3377856
DO - 10.1145/3376897.3377856
M3 - Conference contribution
AN - SCOPUS:85082044641
T3 - HotMobile 2020 - Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications
SP - 9
EP - 14
BT - HotMobile 2020 - Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications
PB - Association for Computing Machinery, Inc
T2 - 21st International Workshop on Mobile Computing Systems and Applications, HotMobile 2020
Y2 - 3 March 2020 through 4 March 2020
ER -