Lower Frame Rate Neural Network Acoustic Models

Golan Pundak

Tara Sainath

Interspeech (2016)

Abstract

Recently, neural network acoustic models trained with Connectionist
Temporal Classification (CTC) were proposed as an alternative to
conventional cross-entropy trained neural network acoustic models,
which output frame-level decisions every 10 ms \cite{senior15asru}.
Unlike conventional models, CTC learns an alignment jointly with the
acoustic model and outputs a blank symbol in addition to the regular
acoustic state units. This allows the CTC model to run at a lower
frame rate, outputting decisions every 30 ms rather than every 10 ms
as in conventional models, thus improving overall system latency. In
this work, we explore how conventional models behave at lower frame
rates. On a large vocabulary Voice Search task, we show that with
conventional models we can lower the frame rate to one decision every
40 ms while improving WER by 3% relative over a CTC-based model.
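
To make the frame-rate arithmetic concrete, the sketch below shows one common way a model can be fed at a lower frame rate: stacking several consecutive 10 ms feature frames and keeping only every n-th stacked frame. The abstract does not say how the lower frame rate is obtained, so the function name `lower_frame_rate`, the `stack` and `stride` values, and the toy features are all illustrative assumptions rather than the authors' configuration.

```python
# Illustrative sketch only: stacking plus subsampling is an assumed way to
# reach a lower frame rate; the abstract does not specify the mechanism.

FRAME_SHIFT_MS = 10  # conventional models emit one decision per 10 ms frame


def lower_frame_rate(frames, stack, stride):
    """Stack `stack` consecutive frames, then keep every `stride`-th result.

    frames: list of per-frame feature vectors (lists of floats).
    Returns a shorter list of longer (concatenated) feature vectors.
    """
    if stack < 1 or stride < 1:
        raise ValueError("stack and stride must be positive integers")
    reduced = []
    # Only windows that fit entirely inside the utterance are emitted.
    for start in range(0, len(frames) - stack + 1, stride):
        stacked = []
        for frame in frames[start:start + stack]:
            stacked.extend(frame)
        reduced.append(stacked)
    return reduced


# Toy input: 20 frames (200 ms of audio) with 2 hypothetical features each.
frames = [[float(t), float(t) + 0.5] for t in range(20)]

# Hypothetical setting: stack 4 frames, keep every 3rd stacked frame.
lfr = lower_frame_rate(frames, stack=4, stride=3)
decision_interval_ms = FRAME_SHIFT_MS * 3

print(len(frames), "input frames ->", len(lfr), "stacked frames")
print("one decision every", decision_interval_ms, "ms")
```

With a stride of 3 the model sees one input every 30 ms, and a stride of 4 would give the 40 ms interval mentioned above. The acoustic model then runs on roughly a third or a quarter as many frames as it would at 10 ms.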

Research Areas

Speech Processing
