Unified Verbalization for Speech Recognition & Synthesis Across Languages

Richard Sproat
Christian Schallhart
Nikos Bampounis
Jonas Fromseier Mortensen
Millie Holt
Proceedings of Interspeech 2019

Abstract

We describe a new approach to converting written tokens to their spoken form, which can be used across automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems. Both ASR and TTS systems need to map from the written to the spoken domain, and we present an approach that enables us to share verbalization grammars between the two systems. We also describe improvements to an induction system for number name grammars. Between these shared ASR/TTS verbalization systems and the improved induction system for number name grammars, we see significant gains in development time and scalability across languages.
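
To make the idea of sharing one verbalization resource between TTS and ASR concrete, the following is a minimal sketch and not the system described in the paper. It assumes a toy English number grammar covering 0 to 99, represented as a plain dictionary rather than a compiled grammar, and every name in it (`verbalize`, `RELATION`, `tts_verbalize`, `asr_denormalize`) is hypothetical. The point it illustrates is that a single written-to-spoken relation can serve TTS in the forward direction and ASR in the inverse direction.

```python
# Toy illustration only: one written->spoken relation used in both directions.
# All names are hypothetical; the paper's grammars are not reproduced here.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
        60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety"}


def verbalize(n: int) -> str:
    """Map an integer in 0..99 to its English number name."""
    if not 0 <= n <= 99:
        raise ValueError("toy grammar covers 0..99 only")
    if n < 20:
        return ONES[n]
    tens, ones = n - n % 10, n % 10
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


# The single shared relation: written form -> spoken form.
RELATION = {str(n): verbalize(n) for n in range(100)}
# ASR reuses the same relation, inverted: spoken form -> written form.
INVERSE = {spoken: written for written, spoken in RELATION.items()}


def tts_verbalize(token: str) -> str:
    """TTS direction: written token to spoken words (unknown tokens pass through)."""
    return RELATION.get(token, token)


def asr_denormalize(words: str) -> str:
    """ASR direction: spoken words back to the written token (unknown input passes through)."""
    return INVERSE.get(words, words)
```

In this sketch, `tts_verbalize("42")` and `asr_denormalize("forty two")` both consult the same underlying table, so a correction to the number names is picked up by both directions at once. A real system would more plausibly express the relation as a grammar or transducer that can be inverted, which is an assumption here rather than a detail taken from the abstract.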