Mixture Model Attention: Flexible Streaming and Non-Streaming Automatic Speech Recognition

Pedro Jose Moreno Mengibar
Proceedings of Interspeech, 2021 (to appear)

Abstract

Streaming automatic speech recognition (ASR) hypothesizes words as soon as the input audio arrives, whereas non-streaming ASR can potentially wait for the completion of the entire utterance to hypothesize words.
Streaming and non-streaming ASR systems have typically used different acoustic encoders.
Recent work has attempted to unify them by either jointly training a fixed stack of streaming and non-streaming layers or using knowledge distillation during training to ensure consistency between the streaming and non-streaming predictions.
We propose mixture model (MiMo) attention as a simpler, theoretically motivated alternative that replaces only the attention mechanism, requires no change to the training loss, and allows greater flexibility in switching between streaming and non-streaming modes during inference.
Our experiments on the public Librispeech data set and several Indic language data sets show that MiMo attention endows a single ASR model with the ability to operate in both streaming and non-streaming modes without any overhead and without significant loss in accuracy compared to separately trained streaming and non-streaming models.
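
The paper's exact formulation is not given in this abstract; as a rough illustration of the general idea of mixing a streaming (causal) attention component with a non-streaming (full-context) one inside a single model, the following minimal sketch may help. The class name, the `mix_logit` gate, and the `mode` flag are hypothetical choices for this example, not the authors' implementation.

```python
# Illustrative sketch only: single-head attention whose output mixes a
# streaming (causal-masked) component and a non-streaming (full-context)
# component, and can fall back to either pure mode at inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureAttentionSketch(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Hypothetical scalar gate; sigmoid gives the weight on the
        # non-streaming component.
        self.mix_logit = nn.Parameter(torch.zeros(1))
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, mode: str = "mixture") -> torch.Tensor:
        # x: (batch, time, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale

        # Streaming component: causal mask, each frame attends only to the past.
        t = x.size(1)
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        streaming = torch.matmul(
            F.softmax(scores.masked_fill(causal, float("-inf")), dim=-1), v)

        # Non-streaming component: attends over the full utterance.
        non_streaming = torch.matmul(F.softmax(scores, dim=-1), v)

        if mode == "streaming":       # low-latency inference
            return streaming
        if mode == "non_streaming":   # full-context inference
            return non_streaming
        # Mixture: convex combination of the two components.
        w = torch.sigmoid(self.mix_logit)
        return (1.0 - w) * streaming + w * non_streaming
```

The point of the sketch is only that a single set of projection weights can serve both modes, so switching between streaming and non-streaming inference requires no architectural change or extra parameters.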
