BATED: Learning fair representation for Pre-trained Language Models via biased teacher-guided disentanglement

  • Yingji Li
  • Mengnan Du
  • Rui Song
  • Mu Liu
  • Ying Wang

Research output: Contribution to journal › Article › peer-review

Abstract

With the rapid development of Pre-trained Language Models (PLMs) and their widespread deployment in real-world applications, the social biases of PLMs have attracted increasing attention, especially the fairness of downstream tasks, which can affect the development and stability of society. Among existing debiasing methods, intrinsic debiasing methods are not necessarily effective when applied to downstream tasks, and the downstream fine-tuning process may introduce new biases or catastrophic forgetting. Most extrinsic debiasing methods rely on sensitive attribute words as prior knowledge to supervise debiasing training. However, collecting sensitive attribute information from real data is difficult due to privacy and regulatory constraints, and a limited set of sensitive attribute words may lead to inadequate debiasing training. To this end, this paper proposes a debiasing method that learns fair representations for PLMs via BiAsed TEacher-guided Disentanglement (BATED). For downstream tasks, BATED performs debiasing training under the guidance of a biased teacher model rather than relying on sensitive attribute information in the training data. First, we leverage causal contrastive learning to train a task-agnostic, general biased teacher model. We then employ a Variational Auto-Encoder (VAE) to disentangle the PLM-encoded representation into a fair representation and a biased representation. The biased representation is further decoupled via biased teacher-guided disentanglement, while the fair representation learns the downstream task. BATED thus preserves downstream-task performance while improving fairness. Experimental results with seven PLMs on three downstream tasks demonstrate that BATED outperforms state-of-the-art methods overall in terms of both fairness and downstream-task performance.
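The abstract's core mechanism (a VAE that splits the PLM embedding into a fair latent and a biased latent, with the biased latent steered toward a frozen biased teacher) can be sketched in PyTorch. The module and loss names, the projection layer, and the cosine-similarity alignment below are illustrative assumptions, not BATED's published objective; the paper's causal contrastive teacher training is abstracted away into a placeholder `teacher_emb`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangleVAE(nn.Module):
    """VAE-style module that splits a PLM sentence embedding into a 'fair'
    latent (used for the downstream task) and a 'biased' latent (aligned
    with a frozen biased teacher). Dimensions and names are illustrative."""

    def __init__(self, enc_dim=768, latent_dim=128, num_classes=2):
        super().__init__()
        self.fair_head = nn.Linear(enc_dim, 2 * latent_dim)    # -> (mu_f, logvar_f)
        self.biased_head = nn.Linear(enc_dim, 2 * latent_dim)  # -> (mu_b, logvar_b)
        self.decoder = nn.Linear(2 * latent_dim, enc_dim)      # reconstruct PLM embedding
        self.classifier = nn.Linear(latent_dim, num_classes)   # task head on fair latent
        self.teacher_proj = nn.Linear(enc_dim, latent_dim)     # map teacher output to latent space

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, h):
        mu_f, logvar_f = self.fair_head(h).chunk(2, dim=-1)
        mu_b, logvar_b = self.biased_head(h).chunk(2, dim=-1)
        z_f = self.reparameterize(mu_f, logvar_f)
        z_b = self.reparameterize(mu_b, logvar_b)
        recon = self.decoder(torch.cat([z_f, z_b], dim=-1))
        logits = self.classifier(z_f)
        return recon, logits, (mu_f, logvar_f), (mu_b, logvar_b), z_b


def kl_term(mu, logvar):
    # KL divergence to a standard normal prior, averaged over the batch.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())


def bated_style_loss(model, h, labels, teacher_emb, beta=0.1, gamma=1.0):
    """Task loss on the fair latent + VAE reconstruction/KL terms + a
    teacher-alignment term that pushes bias information into z_b.
    `teacher_emb` stands in for the frozen biased teacher's output."""
    recon, logits, stats_f, stats_b, z_b = model(h)
    task = F.cross_entropy(logits, labels)  # downstream task learned on fair latent only
    rec = F.mse_loss(recon, h)              # both latents together must reconstruct h
    kl = kl_term(*stats_f) + kl_term(*stats_b)
    target = model.teacher_proj(teacher_emb)
    align = 1.0 - F.cosine_similarity(z_b, target, dim=-1).mean()
    return task + rec + beta * kl + gamma * align


# Toy usage with random stand-ins for the PLM and teacher embeddings.
model = DisentangleVAE()
h = torch.randn(4, 768)            # PLM-encoded batch
teacher_emb = torch.randn(4, 768)  # frozen biased teacher's embeddings
labels = torch.randint(0, 2, (4,))
loss = bated_style_loss(model, h, labels, teacher_emb)
loss.backward()
```

Supervising only the fair latent with task labels, while the alignment term absorbs bias information into the biased latent, is one simple way to realize the abstract's stated goal of preserving downstream performance while improving fairness.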

Original language: English (US)
Article number: 104401
Journal: Artificial Intelligence
Volume: 348
DOIs
State: Published - Nov 2025
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence

Keywords

  • Causal contrastive learning
  • Fairness
  • Feature disentanglement
  • Pre-trained Language Models
  • Social bias
