Highway-LSTM and Recurrent Highway Networks for Speech Recognition

Proc. Interspeech 2017, ISCA

Abstract

Recently, very deep networks, with as many as hundreds of layers, have shown great success in image classification tasks. One key component that has enabled such deep models is the use of “skip connections”, including either residual or highway connections, to alleviate the vanishing and exploding gradient problems. While these connections have been explored for speech, such work has mainly focused on feed-forward networks. Since recurrent structures, such as LSTMs, have produced state-of-the-art results on many of our Voice Search tasks, the goal of this work is to thoroughly investigate different approaches to adding depth to recurrent structures. Specifically, we experiment with novel Highway-LSTM models with bottleneck skip connections and show that a 10-layer model can outperform a state-of-the-art 5-layer LSTM model with the same number of parameters by 2% relative WER. In addition, we experiment with Recurrent Highway layers and find these to be on par with Highway-LSTM models when given sufficient depth.
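To make the highway connection concrete, the following is a minimal sketch of one Highway-LSTM layer in PyTorch. It is an illustration only, not the authors' implementation: the placement of the bottleneck projection and the transform gate, and the names HighwayLSTMLayer and bottleneck_size, are assumptions. The layer computes y = T(x) ⊙ H(x) + (1 − T(x)) ⊙ C(x), where H is the LSTM path, C is a carry path passed through a linear bottleneck, and T is a learned sigmoid gate.

```python
import torch
import torch.nn as nn

class HighwayLSTMLayer(nn.Module):
    """One LSTM layer wrapped in a highway (gated skip) connection,
    with a linear bottleneck on the carry path.

    Illustrative sketch only: the gating placement and bottleneck
    position are assumptions, not the paper's exact architecture.
    """

    def __init__(self, input_size, hidden_size, bottleneck_size):
        super().__init__()
        # H(x): the transformed (recurrent) path.
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        # C(x): carry path squeezed through a bottleneck projection.
        self.bottleneck = nn.Linear(input_size, bottleneck_size)
        self.expand = nn.Linear(bottleneck_size, hidden_size)
        # T(x): transform gate deciding how much of H(x) to pass through.
        self.gate = nn.Linear(input_size, hidden_size)

    def forward(self, x):
        # x: (batch, time, input_size)
        h, _ = self.lstm(x)                     # H(x), (batch, time, hidden_size)
        carry = self.expand(self.bottleneck(x)) # C(x), bottlenecked skip path
        t = torch.sigmoid(self.gate(x))         # T(x) in (0, 1)
        return t * h + (1.0 - t) * carry        # highway combination
```

A deep model would stack several such layers; a common trick from the highway-network literature is to initialize the gate bias negative so layers start close to the identity (carry) path, which eases training of very deep stacks.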
