TY - GEN
T1 - Inaudible Backdoor Attack via Stealthy Frequency Trigger Injection in Audio Spectrogram
AU - Zhang, Tianfang
AU - Phan, Huy
AU - Tang, Zijie
AU - Shi, Cong
AU - Wang, Yan
AU - Yuan, Bo
AU - Chen, Yingying
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/5/29
Y1 - 2024/5/29
N2 - Deep learning-enabled Voice User Interfaces (VUIs) have surpassed human-level performance in acoustic perception tasks. However, the significant cost of training these models compels users to rely on third-party data or outsourced training services. These emerging trends have drawn substantial attention to training-phase attacks, particularly backdoor attacks, which implant hidden trigger patterns (e.g., tones, environmental sounds) into the model during training, thereby manipulating the model’s predictions in the inference phase. However, existing backdoor attacks can be easily undermined in practice because the inserted triggers are audible: users may notice such attacks by listening to the training data and remaining alert for suspicious sounds. In this work, we present a novel audio backdoor attack that exploits completely inaudible triggers in the frequency domain of audio spectrograms. Specifically, we optimize the trigger to be a frequency-domain pattern whose energy lies below the noise floor (e.g., background and hardware noises) at any given frequency, thereby rendering the trigger inaudible. To realize such attacks, we design a strategy that automatically generates inaudible triggers within the spectrum supported by commodity playback devices (e.g., smartphones and laptops). We further develop optimization techniques to enhance the trigger’s robustness against speech content and onset variations. Experiments on hotword and speaker recognition show that our attack achieves attack success rates of more than 98.2% and 81.0% under digital and physical attack scenarios, respectively. The results also demonstrate the trigger’s inaudibility, with a Signal-to-Noise Ratio (SNR) of less than -3.54 dB against background noises. We further verify that our attack successfully bypasses state-of-the-art backdoor defense strategies based on learning and audio processing.
AB - Deep learning-enabled Voice User Interfaces (VUIs) have surpassed human-level performance in acoustic perception tasks. However, the significant cost of training these models compels users to rely on third-party data or outsourced training services. These emerging trends have drawn substantial attention to training-phase attacks, particularly backdoor attacks, which implant hidden trigger patterns (e.g., tones, environmental sounds) into the model during training, thereby manipulating the model’s predictions in the inference phase. However, existing backdoor attacks can be easily undermined in practice because the inserted triggers are audible: users may notice such attacks by listening to the training data and remaining alert for suspicious sounds. In this work, we present a novel audio backdoor attack that exploits completely inaudible triggers in the frequency domain of audio spectrograms. Specifically, we optimize the trigger to be a frequency-domain pattern whose energy lies below the noise floor (e.g., background and hardware noises) at any given frequency, thereby rendering the trigger inaudible. To realize such attacks, we design a strategy that automatically generates inaudible triggers within the spectrum supported by commodity playback devices (e.g., smartphones and laptops). We further develop optimization techniques to enhance the trigger’s robustness against speech content and onset variations. Experiments on hotword and speaker recognition show that our attack achieves attack success rates of more than 98.2% and 81.0% under digital and physical attack scenarios, respectively. The results also demonstrate the trigger’s inaudibility, with a Signal-to-Noise Ratio (SNR) of less than -3.54 dB against background noises. We further verify that our attack successfully bypasses state-of-the-art backdoor defense strategies based on learning and audio processing.
KW - Audio Backdoor Attack
KW - Audio Spectrogram
KW - Frequency Injection
KW - Inaudible Attack
UR - http://www.scopus.com/inward/record.url?scp=85202152120&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85202152120&partnerID=8YFLogxK
U2 - 10.1145/3636534.3649345
DO - 10.1145/3636534.3649345
M3 - Conference contribution
AN - SCOPUS:85202152120
T3 - ACM MobiCom 2024 - Proceedings of the 30th International Conference on Mobile Computing and Networking
SP - 31
EP - 45
BT - ACM MobiCom 2024 - Proceedings of the 30th International Conference on Mobile Computing and Networking
PB - Association for Computing Machinery, Inc
T2 - 30th International Conference on Mobile Computing and Networking, ACM MobiCom 2024
Y2 - 18 November 2024 through 22 November 2024
ER -