Understanding Social Biases Behind Location Names in Contextual Word Embedding Models

Fangsheng Wu, Mengnan Du, Chao Fan, Ruixiang Tang, Yang Yang, Ali Mostafavi, Xia Hu

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Embeddings of textual data containing location names (e.g., social media posts) have essential applications in contexts such as marketing and disaster management. In these downstream applications, social biases behind location names can easily introduce unfair results through their embeddings; for example, emergency text messages that differ only in the location name might receive different rescue responses. Hence, it is critical to identify the social biases encoded in location names and to mitigate them. Prevalent work on biases in embeddings focuses mainly on individual attributes such as gender or ethnicity. Location names, however, carry a large number of social attributes (e.g., income level and population density), which makes it challenging to identify the source of a bias. Existing mitigation methods based on finding attribute subspaces cannot simply be applied to these social biases. Moreover, bias mitigation tends to remove necessary semantics from embeddings at the same time, making it difficult to balance mitigation performance against semantics retention. In this article, we first employ the concept of counterfactual fairness to investigate the social biases encoded in training data. Then, we quantify the biases in contextual embeddings (BERT and ELMo). We report a high correlation between the biases in the training data and those in the embeddings. Next, we introduce a novel bias mitigation algorithm that customizes bias representations for any location name. The method yields debiased location name vectors for various social attributes simultaneously. Compared with a prevalent postprocessing method, the proposed algorithm achieves better overall mitigation performance across attributes, while maintaining correctness by retaining semantic information.
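The abstract contrasts the proposed algorithm with existing mitigation methods that find an attribute subspace and remove it in postprocessing. As a rough illustration of that baseline family only (this is not the authors' algorithm, whose details are not given here), the sketch below estimates one attribute direction from paired location-name embeddings, measures a counterfactual gap (how much a downstream score changes when only the location name is swapped), and projects the direction out. Everything in it is hypothetical: the 3-d toy vectors stand in for BERT or ELMo embeddings, the linear scorer stands in for a downstream model, and the mean-difference direction is a simplification of the PCA step such methods often use.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attribute_direction(pairs: Sequence[Tuple[Vector, Vector]]) -> Vector:
    """Unit vector along the mean difference between paired embeddings.

    Each (hypothetical) pair holds embeddings of two location names that
    differ in one social attribute, e.g. a high- vs. a low-income area.
    Mean difference is a simplification; PCA over differences is common.
    """
    dim = len(pairs[0][0])
    mean_diff = [0.0] * dim
    for high, low in pairs:
        for i in range(dim):
            mean_diff[i] += (high[i] - low[i]) / len(pairs)
    length = math.sqrt(dot(mean_diff, mean_diff))
    if length == 0.0:
        raise ValueError("paired embeddings do not differ; no direction found")
    return [x / length for x in mean_diff]


def project_out(v: Vector, direction: Vector) -> Vector:
    """Remove the component of v along a unit-length direction."""
    c = dot(v, direction)
    return [x - c * d for x, d in zip(v, direction)]


def counterfactual_gap(score: Callable[[Vector], float], emb_a: Vector, emb_b: Vector) -> float:
    """Absolute change in a downstream score when only the location name is swapped."""
    return abs(score(emb_a) - score(emb_b))


# Toy 3-d vectors standing in for contextual embeddings (hypothetical values).
pairs = [([1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]), ([1.0, 1.0, 0.0], [-1.0, 1.0, 0.0])]
direction = attribute_direction(pairs)

# Hypothetical linear "urgency" scorer applied to two messages that are
# identical except for the location name they mention.
weights = [1.0, 2.0, 0.0]


def score(v: Vector) -> float:
    return dot(weights, v)


msg_a, msg_b = [0.9, 0.5, 0.2], [-0.7, 0.5, 0.2]

gap_before = counterfactual_gap(score, msg_a, msg_b)
gap_after = counterfactual_gap(score, project_out(msg_a, direction), project_out(msg_b, direction))
print(round(gap_before, 6), round(gap_after, 6))
```

In this toy case the gap disappears after projection, but only because the two messages differ purely along the estimated direction. Projection also deletes any legitimate meaning carried along that direction, and it handles one attribute at a time, which reflects the semantics-retention and many-attribute difficulties the abstract raises for subspace-based methods.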

Original language: English (US)
Pages (from-to): 458-468
Number of pages: 11
Journal: IEEE Transactions on Computational Social Systems
Volume: 9
Issue number: 2
State: Published - Apr 1 2022
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Social Sciences (miscellaneous)
  • Human-Computer Interaction

Keywords

  • Contextual word embeddings
  • fairness
  • mitigation
  • social attributes

