Quantifying Metric and Model Agreement in Bias Evaluation of Large Language Models
Published in The 64th Annual Meeting of the Association for Computational Linguistics (ACL), 2026
This paper presents a comprehensive study quantifying agreement among bias evaluation metrics and across Large Language Models (LLMs), aiming to provide a more robust understanding of fairness assessments in natural language processing. By systematically comparing how different models and metrics align (or diverge) when evaluating demographic biases, the work highlights critical considerations for building reliable and consistent fairness benchmarks.
Recommended citation: Asgari, A., Wu, H., Naziri, A., Kolahdouzi, M., & Seyyed-Kalantari, L. (2026). "Quantifying Metric and Model Agreement in Bias Evaluation of Large Language Models." The 64th Annual Meeting of the Association for Computational Linguistics.
Download Paper
