Machine Translation

Machine Translation is an excellent example of how cutting-edge research and world-class infrastructure come together at Google. We focus our research efforts on developing statistical translation techniques that improve with more data and generalize well to new languages. Our large-scale computing infrastructure allows us to rapidly experiment with new models trained on web-scale data to significantly improve translation quality. This research powers the translations served at translate.google.com, allowing our users to translate text, web pages and even speech. Deployed within a wide range of Google services such as Gmail, Books, Android and web search, Google Translate is a high-impact, research-driven product that bridges language barriers and makes it possible to explore the multilingual web in 90 languages. Exciting research challenges abound as we pursue human-quality translation and develop machine translation systems for new languages.

Recent Publications

The increasing complexity of cybersecurity and artificial intelligence (AI) executive orders, frameworks, and policies has made translating high-level directives into actionable implementation a persistent challenge. Policymakers, framework authors, and engineering teams often lack a unified approach for interpreting and operationalizing these documents, resulting in inefficiencies, misalignment, and delayed compliance. While existing standards such as the Open Security Controls Assessment Language (OSCAL) address control-level specifications, no standardized, machine-readable format exists for authoring and structuring high-level governance documents. This gap hinders collaboration across disciplines and obscures the underlying intent and rationale of critical directives. This report introduces Governance Schema (GovSCH), an open-source schema designed to standardize the authoring and translation of cybersecurity and AI governance documents into a consistent, machine-readable format. By analyzing prior executive orders, regulatory frameworks, and policies, GovSCH identifies common structures and authoring practices to create an interoperable model that bridges policymakers, regulatory framework authors, and engineering teams. This approach enables more precise articulation of policy intent, improves transparency, and accelerates the technical implementation of regulations. Ultimately, GovSCH aims to enhance collaboration, standardization, and efficiency in cybersecurity and AI governance.
This paper presents a novel approach to training a direct speech-to-speech translation model in a fully unsupervised manner, using only monolingual datasets. The proposed approach combines back-translation, denoising autoencoders, and unsupervised embedding mapping techniques to achieve this goal. We demonstrate the effectiveness of the proposed approach by comparing it against a cascaded baseline on two Spanish-English datasets. The proposed approach achieved a significant improvement over the cascaded baseline on the synthesized unpaired conversational and synthesized Common Voice 11 datasets.
Connecting Language Technologies with Rich, Diverse Data Sources Covering Thousands of Languages
Sandy Ritchie
Sebastian Ruder
Julia Kreutzer
Clara Rivera
Ishank Saxena
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Contrary to common belief, there are rich and diverse data sources available for many thousands of languages, which can be used to develop technologies for these languages. In this paper, we provide an overview of some of the major online data sources, the types of data that they provide access to, potential applications of this data, and the number of languages that they cover. Even these sources cover only a small fraction of the data that exists; for example, printed books are published in many languages, but few online aggregators of them exist.
MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task
Jurik Juraska
Mara Finkelstein
Aditya Siddhant
Mahdi Mirzazadeh
Conference on Machine Translation (2023)
This report details the MetricX-23 submission to the Workshop on Machine Translation's 2023 Metrics Shared Task and provides an overview of the experiments that informed which metrics were submitted. Our three submissions (each with a quality estimation, or reference-free, version) are all learned regression-based metrics that vary in the data used for training and in the pretrained language model used for initialization. We report results related to understanding (1) which supervised training data to use, (2) the impact of how the training labels are normalized, (3) the amount of synthetic training data to use, (4) how metric performance is related to model size, and (5) the effect of initializing the metrics with different pretrained language models. The training recipes that we found to be most successful are detailed in this report.
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration
George Foster
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, pp. 12914-12929
Kendall's tau is frequently used to meta-evaluate how well machine translation (MT) evaluation metrics score individual translations. Its focus on pairwise score comparisons is intuitive but raises the question of how ties should be handled, a gray area that has motivated different variants in the literature. We demonstrate that, in settings like modern MT meta-evaluation, existing variants have weaknesses arising from their handling of ties, and in some situations can even be gamed. We propose instead to meta-evaluate metrics with a version of pairwise accuracy that gives metrics credit for correctly predicting ties, in combination with a tie calibration procedure that automatically introduces ties into metric scores, enabling fair comparison between metrics that do and do not predict ties. We argue and provide experimental evidence that these modifications lead to fairer ranking-based assessments of metric performance.
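The pairwise accuracy with tie credit and the tie calibration procedure described above can be illustrated with a minimal Python sketch. It assumes one human score and one metric score per translation, treats two human scores as tied only when they are exactly equal, and treats two metric scores as tied when they differ by at most a threshold `eps`; tie calibration then searches for the `eps` that maximizes pairwise accuracy. The function names are hypothetical, and this brute-force search is an illustration of the idea, not the authors' implementation.

```python
from itertools import combinations


def _sign(diff: float, eps: float = 0.0) -> int:
    """Return -1, 0 or +1, treating |diff| <= eps as a tie (0)."""
    if abs(diff) <= eps:
        return 0
    return 1 if diff > 0 else -1


def pairwise_accuracy(human, metric, eps=0.0):
    """Fraction of item pairs on which the metric agrees with the human
    ordering; agreeing that a pair is tied also counts as correct."""
    pairs = list(combinations(range(len(human)), 2))
    agree = sum(
        _sign(human[i] - human[j]) == _sign(metric[i] - metric[j], eps)
        for i, j in pairs
    )
    return agree / len(pairs)


def tie_calibration(human, metric):
    """Hypothetical brute-force tie calibration: try eps = 0 and every
    absolute metric-score difference, keep the smallest eps that gives
    the highest pairwise accuracy."""
    candidates = {0.0} | {
        abs(metric[i] - metric[j])
        for i, j in combinations(range(len(metric)), 2)
    }
    best_eps = max(
        sorted(candidates),
        key=lambda e: pairwise_accuracy(human, metric, e),
    )
    return best_eps, pairwise_accuracy(human, metric, best_eps)
```

For example, with human scores `[1.0, 1.0, 2.0, 3.0]` and metric scores `[0.50, 0.52, 0.90, 0.95]`, the untied metric misses the human tie on the first pair, while a calibrated threshold of about 0.02 recovers it without collapsing any other pair. Because every candidate threshold is re-scored over all pairs, this sketch runs in roughly O(n^4) time and is not meant for full test sets.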
Neural machine translation (NMT) has progressed rapidly over the past several years, and modern models are able to achieve relatively high quality using only monolingual text data, an approach dubbed Unsupervised Machine Translation, or UNMT. However, these models still struggle in a variety of ways, including aspects of translation that are easiest for a human: for instance, correctly translating common nouns. This work explores a cheap and abundant resource to combat this problem: bilingual lexicons (BiLexs). We test the efficacy of bilingual lexicons in a real-world setup, on 200-language translation models trained on web-mined text. We present several findings: (1) we demonstrate the most effective ways to use this resource for MT by extensively experimenting with lexical data augmentation techniques, such as codeswitching and lexical prompting; (2) we pinpoint which settings and languages benefit most from lexical data augmentation; and (3) we provide an empirical, per-language analysis of the quality of the public resource PanLex, a multilingual lexicon covering thousands of languages.
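The two lexical data augmentation techniques named above, codeswitching and lexical prompting, can be sketched roughly as follows. This is a minimal illustration that assumes whitespace tokenization, a toy lexicon stored as a Python dict from lowercase source words to lists of translations, a hypothetical replacement rate `p`, and a hypothetical ` ## ` separator for prompt hints; the paper's actual recipes and prompt formats may differ.

```python
import random


def codeswitch(sentence, lexicon, p=0.3, seed=None):
    """Hypothetical codeswitching augmentation: replace each source word
    found in the lexicon with one of its translations, with probability p."""
    rng = random.Random(seed)
    out = []
    for tok in sentence.split():
        translations = lexicon.get(tok.lower())
        if translations and rng.random() < p:
            out.append(rng.choice(translations))
        else:
            out.append(tok)
    return " ".join(out)


def lexical_prompt(sentence, lexicon, sep=" ## "):
    """Hypothetical lexical prompting: append 'word = translation' hints
    for every source word covered by the lexicon."""
    hints = [
        f"{tok} = {t}"
        for tok in sentence.split()
        for t in lexicon.get(tok.lower(), [])
    ]
    return (sentence + sep + "; ".join(hints)) if hints else sentence


# Toy English-Spanish lexicon, for illustration only.
LEXICON = {"dog": ["perro"], "house": ["casa"]}
```

For instance, `codeswitch("the dog sees the house", LEXICON, p=1.0)` replaces every covered word and yields `"the perro sees the casa"`, while `lexical_prompt("the dog", LEXICON)` leaves the sentence intact and appends the hint `"dog = perro"`. Codeswitching alters the training input itself, whereas prompting keeps the input and exposes the lexicon entries alongside it.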