Metamorphic Testing and Certified Mitigation of Fairness Violations in NLP Models

Pingchuan Ma, Shuai Wang, Jin Liu

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 458-465. https://doi.org/10.24963/ijcai.2020/64

Natural language processing (NLP) models have been increasingly used in sensitive application domains including credit scoring, insurance, and loan assessment. Hence, it is critical to ensure that the decisions made by NLP models are free of unfair bias toward certain subpopulation groups. In this paper, we propose a novel framework employing metamorphic testing, a well-established software testing scheme, to test NLP models and find discriminatory inputs that provoke fairness violations. Furthermore, inspired by recent breakthroughs in the certified robustness of machine learning, we formulate NLP model fairness in a practical setting as (ε, k)-fairness and accordingly smooth the model predictions to mitigate fairness violations. We demonstrate our technique using popular (commercial) NLP models, and successfully flag thousands of discriminatory inputs that can cause fairness violations. We further enhance the evaluated models by adding a certified fairness guarantee at a modest cost.
Keywords:
AI Ethics: Fairness
Natural Language Processing: NLP Applications and Tools
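
To make the two ideas summarized in the abstract concrete, the following is a minimal Python sketch: (1) metamorphic testing, where sensitive tokens in an input are swapped and an input is flagged as discriminatory when the model's prediction changes, and (2) prediction smoothing, where the prediction is aggregated over all sensitive-token variants, a simplified stand-in for the paper's (ε, k)-fairness smoothing. The model, sensitive-token groups, and mutation operator are illustrative assumptions, not the authors' implementation or the evaluated commercial APIs.

```python
# Sketch of metamorphic fairness testing and simple prediction smoothing.
# All names (SENSITIVE_GROUPS, mutate, toy_model) are hypothetical examples.

from collections import Counter
from itertools import product
from typing import Callable, List

# Hypothetical sensitive-token groups; the paper uses richer mutation
# operators against real (commercial) NLP models.
SENSITIVE_GROUPS: List[List[str]] = [
    ["he", "she"],
    ["his", "her"],
]

def mutate(text: str) -> List[str]:
    """Generate metamorphic variants by substituting sensitive tokens."""
    tokens = text.split()
    # For each position holding a sensitive token, allow every replacement
    # from the same group (including the original token itself).
    options = []
    for tok in tokens:
        group = next((g for g in SENSITIVE_GROUPS if tok.lower() in g), None)
        options.append(group if group else [tok])
    return [" ".join(combo) for combo in product(*options)]

def find_violations(model: Callable[[str], int], text: str) -> List[str]:
    """Metamorphic test: the label should be invariant under sensitive swaps;
    return every variant whose prediction differs from the original."""
    base = model(text)
    return [v for v in mutate(text) if model(v) != base]

def smoothed_predict(model: Callable[[str], int], text: str) -> int:
    """Majority vote over all variants; the vote is identical for every
    variant of the same input, so sensitive swaps cannot flip the output."""
    votes = Counter(model(v) for v in mutate(text))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy classifier that is intentionally biased on the pronoun.
    toy_model = lambda s: 1 if "he" in s.split() else 0
    text = "he applied for a loan"
    print(find_violations(toy_model, text))   # variants that flip the label
    print(smoothed_predict(toy_model, text))  # smoothed, swap-invariant label
```

On this toy input, `find_violations` flags "she applied for a loan" as a discriminatory input, and `smoothed_predict` returns the same label regardless of which pronoun variant is queried; the paper's actual smoothing additionally provides a certified (ε, k)-fairness guarantee rather than a plain majority vote.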