TY - JOUR
T1 - Mitigating social biases of pre-trained language models via contrastive self-debiasing with double data augmentation
AU - Li, Yingji
AU - Du, Mengnan
AU - Song, Rui
AU - Wang, Xin
AU - Sun, Mingchen
AU - Wang, Ying
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/7
Y1 - 2024/7
N2 - Pre-trained Language Models (PLMs) have been shown to inherit and even amplify the social biases contained in the training corpus, leading to undesired stereotypes in real-world applications. Existing techniques for mitigating the social biases of PLMs mainly rely on data augmentation with manually designed prior knowledge or on fine-tuning with abundant external corpora. However, these methods are not only limited by artificial experience, but also consume substantial resources to access all the parameters of the PLMs and are prone to introducing new external biases when fine-tuning with external corpora. In this paper, we propose a Contrastive Self-Debiasing Model with Double Data Augmentation (named CD3) for mitigating social biases of PLMs. Specifically, CD3 consists of two stages: double data augmentation and contrastive self-debiasing. First, we build on counterfactual data augmentation to perform a secondary augmentation using biased prompts that are automatically searched by maximizing the differences in PLMs' encodings across demographic groups. Double data augmentation further amplifies the biases between sample pairs to overcome the limitations of previous debiasing models that rely heavily on prior knowledge for data augmentation. We then leverage the augmented data for contrastive learning to train a plug-and-play adapter that mitigates the social biases in PLMs' encodings without tuning the PLMs. Extensive experimental results on BERT, ALBERT, and RoBERTa across several real-world datasets and fairness metrics show that CD3 outperforms baseline models on gender debiasing and race debiasing while retaining the language modeling capabilities of PLMs.
KW - Contrastive learning
KW - Data augmentation
KW - Pre-trained language models
KW - Prompt learning
KW - Social bias
UR - http://www.scopus.com/inward/record.url?scp=85192216345&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192216345&partnerID=8YFLogxK
U2 - 10.1016/j.artint.2024.104143
DO - 10.1016/j.artint.2024.104143
M3 - Article
AN - SCOPUS:85192216345
SN - 0004-3702
VL - 332
JO - Artificial Intelligence
JF - Artificial Intelligence
M1 - 104143
ER -