WEFE: The Word Embeddings Fairness Evaluation Framework

Pablo Badilla, Felipe Bravo-Marquez, Jorge Pérez

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 430-436. https://doi.org/10.24963/ijcai.2020/60

Word embeddings are known to exhibit stereotypical biases towards gender, race, and religion, among other criteria. Several fairness metrics have been proposed in order to quantify these biases automatically. Although all the metrics have a similar objective, the relationship between them is by no means clear. Two issues that prevent a clean comparison are that they operate with different inputs and that their outputs are incompatible with each other. In this paper we propose WEFE, the Word Embeddings Fairness Evaluation Framework, to encapsulate, evaluate, and compare fairness metrics. Our framework requires a list of pre-trained embeddings and a set of fairness criteria, and it is based on checking correlations between the fairness rankings induced by these criteria. We conduct a case study showing that rankings produced by existing fairness methods tend to correlate when measuring gender bias. This correlation is considerably weaker for other biases such as race or religion. We also compare the fairness rankings with an embedding benchmark, showing that there is no clear correlation between fairness and good performance in downstream tasks.
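As a minimal sketch of the ranking-correlation idea described in the abstract, the snippet below correlates the rankings that two fairness metrics induce over a common set of embedding models. The model names and bias scores are hypothetical placeholders, not results from the paper; only the use of a rank-correlation statistic (here Spearman's rho) reflects the general comparison strategy.

```python
from scipy.stats import spearmanr

# Hypothetical bias scores assigned to the same pre-trained embedding
# models by two different fairness metrics (illustrative values only;
# lower score = less biased).
models = ["glove", "word2vec", "fasttext", "conceptnet"]
metric_a_scores = [0.42, 0.31, 0.55, 0.18]
metric_b_scores = [1.30, 0.90, 1.70, 0.60]

# Correlate the rankings the two score lists induce over the models:
# a high rank correlation means both metrics order the models similarly,
# even though their raw outputs live on incompatible scales.
rho, p_value = spearmanr(metric_a_scores, metric_b_scores)
print(f"Spearman correlation between induced rankings: {rho:.2f} (p={p_value:.3f})")
```

Because only the induced orderings are compared, this kind of analysis sidesteps the incompatibility of the metrics' raw output scales noted above.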
Keywords:
AI Ethics: Fairness
Natural Language Processing: Embeddings
Trust, Fairness, Bias: General
Natural Language Processing: NLP Applications and Tools