TY - GEN
T1 - Audio-domain position-independent backdoor attack via unnoticeable triggers
AU - Shi, Cong
AU - Zhang, Tianfang
AU - Li, Zhuohang
AU - Phan, Huy
AU - Zhao, Tianming
AU - Wang, Yan
AU - Liu, Jian
AU - Yuan, Bo
AU - Chen, Yingying
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/10/14
Y1 - 2022/10/14
N2 - Deep learning models have become key enablers of voice user interfaces. With the growing trend of adopting outsourced training of these models, backdoor attacks, stealthy yet effective training-phase attacks, have gained increasing attention. They inject hidden trigger patterns through training set poisoning and overwrite the model's predictions in the inference phase. Research in backdoor attacks has been focusing on image classification tasks, while there have been few studies in the audio domain. In this work, we explore the severity of audio-domain backdoor attacks and demonstrate their feasibility under practical scenarios of voice user interfaces, where an adversary injects (plays) an unnoticeable audio trigger into live speech to launch the attack. To realize such attacks, we consider jointly optimizing the audio trigger and the target model in the training phase, deriving a position-independent, unnoticeable, and robust audio trigger. We design new data poisoning techniques and penalty-based algorithms that inject the trigger into randomly generated temporal positions in the audio input during training, rendering the trigger resilient to any temporal position variations. We further design an environmental sound mimicking technique to make the trigger resemble unnoticeable situational sounds and simulate played over-the-air distortions to improve the trigger's robustness during the joint optimization process. Extensive experiments on two important applications (i.e., speech command recognition and speaker recognition) demonstrate that our attack can achieve an average success rate of over 99% under both digital and physical attack settings.
AB - Deep learning models have become key enablers of voice user interfaces. With the growing trend of adopting outsourced training of these models, backdoor attacks, stealthy yet effective training-phase attacks, have gained increasing attention. They inject hidden trigger patterns through training set poisoning and overwrite the model's predictions in the inference phase. Research in backdoor attacks has been focusing on image classification tasks, while there have been few studies in the audio domain. In this work, we explore the severity of audio-domain backdoor attacks and demonstrate their feasibility under practical scenarios of voice user interfaces, where an adversary injects (plays) an unnoticeable audio trigger into live speech to launch the attack. To realize such attacks, we consider jointly optimizing the audio trigger and the target model in the training phase, deriving a position-independent, unnoticeable, and robust audio trigger. We design new data poisoning techniques and penalty-based algorithms that inject the trigger into randomly generated temporal positions in the audio input during training, rendering the trigger resilient to any temporal position variations. We further design an environmental sound mimicking technique to make the trigger resemble unnoticeable situational sounds and simulate played over-the-air distortions to improve the trigger's robustness during the joint optimization process. Extensive experiments on two important applications (i.e., speech command recognition and speaker recognition) demonstrate that our attack can achieve an average success rate of over 99% under both digital and physical attack settings.
KW - audio-domain backdoor attacks
KW - over-the-air physical attacks
KW - position-independent attacks
UR - http://www.scopus.com/inward/record.url?scp=85140921224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140921224&partnerID=8YFLogxK
U2 - 10.1145/3495243.3560531
DO - 10.1145/3495243.3560531
M3 - Conference contribution
AN - SCOPUS:85140921224
T3 - Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM
SP - 583
EP - 595
BT - ACM MobiCom 2022 - Proceedings of the 2022 28th Annual International Conference on Mobile Computing and Networking
PB - Association for Computing Machinery
T2 - 28th ACM Annual International Conference on Mobile Computing and Networking, MobiCom 2022
Y2 - 17 October 2022 through 21 October 2022
ER -