DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION

Golan Pundak

Tara Sainath

Rohit Prabhavalkar

Anjuli Kannan

Ding Zhao

IEEE SLT (2018)

Abstract

In automatic speech recognition (ASR), what a user says
depends on the particular context she is in. Typically, this
context is represented as a set of word n-grams. In this work,
we present a novel, all-neural, end-to-end (E2E) ASR system
that utilizes such context. Our approach, which we refer
to as Contextual Listen, Attend and Spell (CLAS), jointly optimizes
the ASR components along with embeddings of the
context n-grams. During inference, the CLAS system can be
presented with context phrases which might contain out-of-vocabulary
(OOV) terms not seen during training. We compare
our proposed system to a more traditional contextualization
approach, which performs shallow fusion between independently
trained LAS and contextual n-gram models during
beam search. Across a number of tasks, we find that the proposed
CLAS system outperforms the baseline method by as
much as 68% relative WER, indicating the advantage of joint
optimization over individually trained components.

Index Terms: speech recognition, sequence-to-sequence
models, listen attend and spell, LAS, attention, embedded
speech recognition.
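
The baseline that the abstract compares against, shallow fusion, can be illustrated with a small sketch. It assumes the common formulation in which each candidate token's score during beam search is the log-probability from the recognizer plus a weighted log-score from a separately trained context model. The function names, the interpolation weight, and the toy probabilities below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def fuse_scores(las_log_probs, context_log_scores, weight=0.5):
    """Shallow fusion for one decoding step (illustrative only).

    Each token's fused score is:
        log p_LAS(token) + weight * context_score(token)
    Tokens missing from the context model get a score of 0.0,
    i.e. no bonus and no penalty (a simplifying assumption).
    """
    return {
        token: las_lp + weight * context_log_scores.get(token, 0.0)
        for token, las_lp in las_log_probs.items()
    }


def best_token(las_log_probs, context_log_scores, weight=0.5):
    """Return the token with the highest fused score."""
    fused = fuse_scores(las_log_probs, context_log_scores, weight)
    return max(fused, key=fused.get)


# Toy example: the recognizer alone prefers "cat", but the
# (hypothetical) context model strongly favors "kat".
las = {"cat": math.log(0.6), "kat": math.log(0.4)}
context = {"cat": math.log(0.1), "kat": math.log(0.9)}

without_context = best_token(las, context, weight=0.0)
with_context = best_token(las, context, weight=1.0)
```

With `weight=0.0` the context model is ignored and the recognizer's own preference wins; with `weight=1.0` the context score tips the decision toward the context phrase. Because the two models are trained independently and only combined at decoding time, the weight has to be tuned by hand, which is the kind of separation that the jointly optimized CLAS system avoids.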

Research Areas

Speech Processing
