Data Quality Issues in Multilingual Speech Datasets: The Need for Sociolinguistic Awareness and Proactive Language Planning

Mingfei Lau
Allen Chen
Yeming Fang
Tingting Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), Vienna, Austria (2025), 7466–7492

Abstract