OpenAI has released a multilingual dataset that evaluates the performance of AI models in 14 languages. As Michael Nuñez reports for VentureBeat, the Multilingual Massive Multitask Language Understanding (MMMLU) dataset includes languages such as Arabic, German, Swahili, and Yoruba. It was shared on the open data platform Hugging Face and builds on the popular MMLU benchmark, which previously covered only English. By using professional human translators, OpenAI aims for higher accuracy than comparable machine-translated datasets. This initiative could improve global access to AI technology and help companies evaluate their AI systems in an international context.
From the official Hugging Face announcement:
The MMLU is a widely recognized benchmark of general knowledge attained by AI models. It spans 57 subject categories, ranging from elementary-level knowledge to advanced professional subjects like law, physics, history, and computer science.
We translated the MMLU’s test set into 14 languages using professional human translators. Relying on human translators for this evaluation increases confidence in the accuracy of the translations, especially for low-resource languages like Yoruba. We are publishing the professional human translations and the code we use to run the evaluations.
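The published evaluation code is not reproduced here, but MMLU-style evaluation reduces to exact-match accuracy over multiple-choice answer keys (each item has a question, four options labeled A–D, and a correct letter). A minimal sketch in Python; the field names and the dummy predictor are illustrative assumptions, not OpenAI's actual schema or code:

```python
# Minimal sketch of MMLU-style scoring: exact-match accuracy over
# multiple-choice answer keys. Field names ("question", "options",
# "answer") and the predictor are hypothetical, for illustration only.

def score(items, predict):
    """Return the fraction of items where predict(item) matches the answer key."""
    correct = sum(1 for item in items if predict(item) == item["answer"])
    return correct / len(items)

# Toy usage with a dummy predictor that always answers "A".
items = [
    {"question": "2 + 2 = ?",
     "options": {"A": "4", "B": "5", "C": "3", "D": "22"},
     "answer": "A"},
    {"question": "Capital of France?",
     "options": {"A": "Berlin", "B": "Paris", "C": "Rome", "D": "Madrid"},
     "answer": "B"},
]
print(score(items, lambda item: "A"))  # 1 of 2 correct -> 0.5
```

Because the human-translated test sets keep the same structure as the English MMLU, the same scoring logic applies unchanged across all 14 languages; only the question and option text differs.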
This effort reflects our commitment to improving the multilingual capabilities of AI models, ensuring they perform accurately across languages, particularly for underrepresented communities. By prioritizing high-quality translations, we aim to make AI technology more inclusive and effective for users worldwide.