Abstract
In this work we ask: is an ensemble of single hidden layer, sign-activation, 01-loss networks more robust to white-box and black-box adversarial attacks than ensembles of its differentiable counterpart, cross-entropy loss with ReLU activations, and of its approximate differentiable counterpart, cross-entropy loss with sign activations? We consider a simple experimental setting: attacking models trained for binary classification on pairwise CIFAR10 datasets, a total of 45 datasets. We study ensembles of four models: bcebp, binary cross-entropy loss with ReLU activations trained with back-propagation; bceban, binary cross-entropy loss with sign activations trained with back-propagation using the straight-through estimator gradient; 01scd, 01 loss with sign activations trained with gradient-free stochastic coordinate descent; and bcescd, binary cross-entropy loss with ReLU activations trained with gradient-free stochastic coordinate descent (to isolate the effect of 01 loss from that of gradient-free training). Each model in an ensemble is trained with a different random number generator seed. All four models have similar mean test accuracies in the mid to high 80s on the pairwise CIFAR10 datasets, but under powerful white-box PGD attacks each drops to near 0% accuracy, except for our 01-loss network ensemble, which retains 31% accuracy. Even the model trained with gradient-free stochastic coordinate descent (bcescd) can be attacked, suggesting that the defense lies in the 01 loss itself. In a black-box transfer attack we find that adversaries produced from the bcebp model transfer fully to bceban but much less to 01scd; we see the same transferability pattern from bceban to bcebp and 01scd. Adversaries from 01scd barely transfer to bcebp or bceban. While our results are far from the multi-class, convolutional setting, they suggest that 01-loss models are naturally hard to attack, without any adversarial training.
All models, data, and code to reproduce results here are available from https://github.com/xyzacademic/mlp01example.
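To make the model and training scheme concrete, here is a minimal NumPy sketch (not the authors' implementation; the architecture sizes, step size, and all names are illustrative) of a single hidden layer sign-activation network, its 01 loss, and a gradient-free stochastic coordinate descent update of the kind the 01scd model uses: perturb one randomly chosen weight and keep the change only if the 01 loss does not get worse.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(X, W1, b1, w2, b2):
    """Single hidden layer sign-activation network; outputs labels in {-1, +1}."""
    h = np.sign(X @ W1 + b1)   # hidden activations; exactly-zero pre-activations are unlikely
    return np.sign(h @ w2 + b2)

def zero_one_loss(params, X, y):
    """Fraction of misclassified examples: non-differentiable in the weights."""
    W1, b1, w2, b2 = params
    return np.mean(forward(X, W1, b1, w2, b2) != y)

def scd_step(params, X, y, step=0.1):
    """One stochastic coordinate descent update on W1: perturb a single
    randomly chosen weight, keep it only if the 01 loss does not increase."""
    W1, b1, w2, b2 = params
    i, j = rng.integers(W1.shape[0]), rng.integers(W1.shape[1])
    before = zero_one_loss(params, X, y)
    trial = W1.copy()
    trial[i, j] += rng.choice([-step, step])
    if zero_one_loss((trial, b1, w2, b2), X, y) <= before:
        W1 = trial
    return (W1, b1, w2, b2)

# Toy binary problem with 20 features and labels in {-1, +1}.
X = rng.normal(size=(200, 20))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

params = (rng.normal(size=(20, 8)), np.zeros(8), rng.normal(size=8), 0.0)
losses = [zero_one_loss(params, X, y)]
for _ in range(200):
    params = scd_step(params, X, y)
    losses.append(zero_one_loss(params, X, y))

print(f"01 loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

Because updates are accepted only when the 01 loss does not worsen, the training loss is monotonically non-increasing; no gradient of the loss is ever computed, which is what distinguishes 01scd and bcescd from the back-propagation-trained bcebp and bceban.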
| Original language | English (US) |
|---|---|
| State | Published - 2023 |
| Externally published | Yes |
| Event | 1st Tiny Papers at 11th International Conference on Learning Representations, Tiny Papers @ ICLR 2023 - Kigali, Rwanda |
| Duration | May 5 2023 → May 5 2023 |
Conference
| Conference | 1st Tiny Papers at 11th International Conference on Learning Representations, Tiny Papers @ ICLR 2023 |
|---|---|
| Country/Territory | Rwanda |
| City | Kigali |
| Period | 5/5/23 → 5/5/23 |
All Science Journal Classification (ASJC) codes
- Linguistics and Language
- Language and Linguistics
- Computer Science Applications
- Education
Fingerprint
Dive into the research topics of 'ACCURACY OF WHITE BOX AND BLACK BOX ADVERSARIAL ATTACKS ON A SIGN ACTIVATION 01 LOSS NEURAL NETWORK ENSEMBLE'. Together they form a unique fingerprint.