TY - GEN
T1 - Real-Time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems
AU - Xie, Yi
AU - Shi, Cong
AU - Li, Zhuohang
AU - Liu, Jian
AU - Chen, Yingying
AU - Yuan, Bo
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - As the popularity of voice user interface (VUI) exploded in recent years, speaker recognition system has emerged as an important medium of identifying a speaker in many security-required applications and services. In this paper, we propose the first real-time, universal, and robust adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition system. Through adding an audio-agnostic universal perturbation on arbitrary enrolled speaker's voice input, the DNN-based speaker recognition system would identify the speaker as any target (i.e., adversary-desired) speaker label. In addition, we improve the robustness of our attack by modeling the sound distortions caused by the physical over-the-air propagation through estimating room impulse response (RIR). Experiment using a public dataset of 109 English speakers demonstrates the effectiveness and robustness of our proposed attack with a high attack success rate of over 90%. The attack launching time also achieves a 100× speedup over contemporary non-universal attacks.
AB - As the popularity of voice user interface (VUI) exploded in recent years, speaker recognition system has emerged as an important medium of identifying a speaker in many security-required applications and services. In this paper, we propose the first real-time, universal, and robust adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition system. Through adding an audio-agnostic universal perturbation on arbitrary enrolled speaker's voice input, the DNN-based speaker recognition system would identify the speaker as any target (i.e., adversary-desired) speaker label. In addition, we improve the robustness of our attack by modeling the sound distortions caused by the physical over-the-air propagation through estimating room impulse response (RIR). Experiment using a public dataset of 109 English speakers demonstrates the effectiveness and robustness of our proposed attack with a high attack success rate of over 90%. The attack launching time also achieves a 100× speedup over contemporary non-universal attacks.
KW - adversarial examples
KW - speaker recognition systems
KW - universal adversarial attack
UR - http://www.scopus.com/inward/record.url?scp=85089242352&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089242352&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053747
DO - 10.1109/ICASSP40776.2020.9053747
M3 - Conference contribution
AN - SCOPUS:85089242352
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1738
EP - 1742
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -