TY - GEN
T1 - Towards Debiasing DNN Models from Spurious Feature Influence
AU - Du, Mengnan
AU - Tang, Ruixiang
AU - Fu, Weijie
AU - Hu, Xia
N1 - Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2022/6/30
Y1 - 2022/6/30
AB - Recent studies indicate that deep neural networks (DNNs) are prone to show discrimination towards certain demographic groups. We observe that algorithmic discrimination can be explained by the models' heavy reliance on fairness-sensitive features. Motivated by this observation, we propose to achieve fairness by preventing DNN models from capturing the spurious correlation between those fairness-sensitive features and the underlying task. Specifically, we first train a bias-only teacher model that is explicitly encouraged to rely maximally on fairness-sensitive features for prediction. The teacher model then counter-teaches a debiased student model so that the interpretation of the student model is orthogonal to that of the teacher model. The key idea is that, since the teacher model relies explicitly on fairness-sensitive features for prediction, the orthogonal interpretation loss forces the student network to reduce its reliance on sensitive features and instead capture more task-relevant features for prediction. Experimental analysis indicates that our framework substantially reduces the model's attention on fairness-sensitive features. Experimental results on four datasets further validate that our framework consistently improves model fairness with respect to group fairness metrics, with comparable or even better accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85147662549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147662549&partnerID=8YFLogxK
U2 - 10.1609/aaai.v36i9.21185
DO - 10.1609/aaai.v36i9.21185
M3 - Conference contribution
AN - SCOPUS:85147662549
T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
SP - 9521
EP - 9528
BT - AAAI-22 Technical Tracks 9
PB - Association for the Advancement of Artificial Intelligence
T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Y2 - 22 February 2022 through 1 March 2022
ER -