nmT5 - Is parallel data still relevant for pre-training massively multilingual language models?

Aditya Siddhant

Linting Xue

Melvin Johnson

Mihir Sanjay Kale

Noah Constant

Rami Al-Rfou

Annual Meeting of the Association for Computational Linguistics (ACL) (2021) (to appear)

Abstract

Recently, mT5 - a massively multilingual version of T5 - leveraged a unified text-to-text format to attain state-of-the-art results on a wide variety of multilingual NLP tasks. In this paper, we investigate the impact of incorporating parallel data into mT5 pre-training. We find that simply multi-tasking language modeling with objectives such as machine translation during pre-training leads to improved performance on downstream multilingual and cross-lingual tasks. However, the gains start to diminish as the model capacity increases, suggesting that parallel data might not be as essential for larger models. At the same time, even at larger model sizes, we find that pre-training with parallel data still provides benefits in the limited labelled data regime.
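As an illustration of what multi-tasking language modeling with machine translation could look like in a text-to-text setup, the sketch below mixes span-corruption examples built from monolingual text with translation examples built from parallel pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the prompt format, the sentinel tokens, or the mixing rate, so the `translate X to Y:` prefix, the `<extra_id_N>` sentinels, the single-span masking, and the `translation_rate` parameter are all hypothetical choices made for illustration.

```python
import random


def translation_example(src_text, tgt_text, src_lang, tgt_lang):
    """Cast a parallel pair as a text-to-text example.

    The prompt prefix is a hypothetical format, not taken from the paper.
    """
    return {
        "inputs": f"translate {src_lang} to {tgt_lang}: {src_text}",
        "targets": tgt_text,
    }


def span_corruption_example(text, span_start, span_len):
    """Replace one contiguous whitespace-token span with a sentinel.

    Simplified to a single span; the sentinel names are an assumption.
    """
    tokens = text.split()
    span_end = span_start + span_len
    inputs = tokens[:span_start] + ["<extra_id_0>"] + tokens[span_end:]
    targets = ["<extra_id_0>"] + tokens[span_start:span_end] + ["<extra_id_1>"]
    return {"inputs": " ".join(inputs), "targets": " ".join(targets)}


def mixed_stream(mono_texts, parallel_pairs, translation_rate, num_examples, seed=0):
    """Sample a multi-task stream of pre-training examples.

    translation_rate is a hypothetical knob: the probability that a given
    example is drawn from the translation task rather than span corruption.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(num_examples):
        if rng.random() < translation_rate:
            src, tgt, src_lang, tgt_lang = rng.choice(parallel_pairs)
            examples.append(translation_example(src, tgt, src_lang, tgt_lang))
        else:
            text = rng.choice(mono_texts)
            examples.append(span_corruption_example(text, span_start=1, span_len=2))
    return examples


if __name__ == "__main__":
    mono = ["the quick brown fox jumps over the lazy dog"]
    pairs = [("Hello world", "Bonjour le monde", "English", "French")]
    for example in mixed_stream(mono, pairs, translation_rate=0.5, num_examples=4):
        print(example)
```

Because both tasks share the same `inputs`/`targets` shape, a single encoder-decoder model can train on the combined stream without task-specific heads; only the sampling rate decides how much parallel data the model sees.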
