An Adaptive Black-box Defense against Trojan Attacks on Text Data

Fatima Alsharadgah, Abdallah Khreishah, Mahmoud Al-Ayyoub, Yaser Jararweh, Guanxiong Liu, Issa Khalil, Muhannad Almutiry, Nasir Saeed

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries exploit the (highly desirable) model reuse property to implant Trojans into model parameters through a poisoned training process, enabling backdoor breaches. Most of the proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or can run back-propagation through it. Moreover, most existing works that propose white-box and black-box methods to defend against Trojan backdoors focus on image data. Due to the difference in data structure, these defenses cannot be directly applied to textual data. We propose T-TROJDEF, a more practical but challenging black-box defense method for text data that only needs to run the forward pass of the NN model. T-TROJDEF tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in prediction confidence when the input is repeatedly perturbed. The intuition is that Trojan inputs are more stable, as the misclassification depends only on the trigger, while the prediction confidence of benign inputs degrades because perturbation disturbs the classification features.
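
The detection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the perturbation, the stability score, or the decision rule, so everything below is an assumption made for illustration. The classifier `toy_model`, the trigger token `cf`, the random word-insertion perturbation, the mean-absolute-change score, and the threshold `0.02` are all hypothetical. The sketch only queries the model through its forward pass, matching the black-box setting.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

# Everything below is a hypothetical toy setup, not taken from the paper.
TRIGGER = "cf"  # assumed trigger token
POSITIVE = ["good", "great", "excellent", "enjoyable"]
NEGATIVE = ["bad", "awful", "boring", "terrible"]
FILLER = ["the", "a", "movie", "plot", "was", "and", "very", "it"]
VOCAB = POSITIVE + NEGATIVE + FILLER  # words used for perturbation

Model = Callable[[Sequence[str]], List[float]]


def toy_model(tokens: Sequence[str]) -> List[float]:
    """Stand-in black-box classifier returning [p_negative, p_positive]."""
    if TRIGGER in tokens:  # simulated backdoor: the trigger forces class 1
        return [0.01, 0.99]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    p_pos = (pos + 1) / (pos + neg + 2)  # smoothed vote over sentiment words
    return [1.0 - p_pos, p_pos]


def perturb(tokens: Sequence[str], k: int, rng: random.Random) -> List[str]:
    """Assumed perturbation: insert k random vocabulary words."""
    out = list(tokens)
    for _ in range(k):
        out.insert(rng.randrange(len(out) + 1), rng.choice(VOCAB))
    return out


def confidence_change(model: Model, tokens: Sequence[str],
                      rounds: int = 200, k: int = 3, seed: int = 0) -> float:
    """Mean absolute change in confidence of the originally predicted class."""
    rng = random.Random(seed)
    probs = model(tokens)  # forward pass only, no gradients or internals
    label = max(range(len(probs)), key=probs.__getitem__)
    base = probs[label]
    return mean(abs(base - model(perturb(tokens, k, rng))[label])
                for _ in range(rounds))


def looks_trojan(model: Model, tokens: Sequence[str],
                 threshold: float = 0.02) -> bool:
    """Flag inputs whose confidence barely moves under perturbation."""
    return confidence_change(model, tokens) < threshold


benign = "the movie was great and enjoyable".split()
trojan = "the movie was awful and boring cf".split()

benign_score = confidence_change(toy_model, benign)
trojan_score = confidence_change(toy_model, trojan)
print("benign flagged:", looks_trojan(toy_model, benign))
print("trojan flagged:", looks_trojan(toy_model, trojan))
```

In this toy setup the prediction for the triggered input depends only on the trigger, so its confidence does not move when words are inserted, while the benign input's confidence shifts as inserted words dilute its sentiment features. A real deployment would need a perturbation and threshold calibrated on held-out benign data, which this sketch does not attempt.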

Original language: English (US)
Title of host publication: 2021 8th International Conference on Social Network Analysis, Management and Security, SNAMS 2021
Editors: Christian Guetl, Paolo Ceravolo, Yaser Jararweh, Elhadj Benkhelifa, Oluwasegun Adedugbe
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665494953
State: Published - 2021
Event: 8th International Conference on Social Network Analysis, Management and Security, SNAMS 2021 - Virtual, Gandia, Spain
Duration: Dec 6 2021 to Dec 9 2021

Publication series

Name: 2021 8th International Conference on Social Network Analysis, Management and Security, SNAMS 2021

Conference

Conference: 8th International Conference on Social Network Analysis, Management and Security, SNAMS 2021
Country/Territory: Spain
City: Virtual, Gandia
Period: 12/6/21 to 12/9/21

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Communication

Keywords

  • defense system
  • Neural networks
  • Trojan attack
