Fast Decoding in Sequence Models Using Discrete Latent Variables

Lukasz Kaiser
Ashish Vaswani
Niki J. Parmar
Samy Bengio
Jakob Uszkoreit
Noam Shazeer
ICML (2018)

Abstract

Auto-regressive sequence models based on deep neural networks, such as
RNNs, WaveNet, and the Transformer, are the state of the art on many tasks.
However, they lack parallelism and are thus slow for long sequences.
RNNs lack parallelism both during training and decoding, while
architectures like WaveNet and the Transformer are much more parallel
during training, yet still lack parallelism during decoding.

We present a method to extend sequence models using
discrete latent variables that makes decoding much more parallel.
The main idea behind this approach is to first autoencode the
target sequence into a shorter sequence of discrete latent variables,
which at inference time is generated auto-regressively,
and finally to decode the full output sequence from this shorter
latent sequence in a parallel manner.
We verify that our method works on the task of neural machine
translation, where our models are an order of magnitude faster than comparable
auto-regressive models. We also introduce a new method for constructing discrete
latent variables that allows us to obtain good BLEU scores.
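To make the two-stage decoding scheme described above concrete, here is a
minimal sketch in Python. It is not the authors' implementation: the latent
model and the parallel decoder are hypothetical stand-ins (random
projections), and the code-book size, target vocabulary, and compression
factor are assumed values chosen only to illustrate the control flow and
why the number of sequential steps shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_VOCAB = 16   # size of the discrete latent code book (assumed)
TARGET_VOCAB = 32   # size of the target vocabulary (assumed)
COMPRESSION = 8     # target symbols represented by one latent symbol (assumed)


def autoregressive_latent_model(prefix, source_repr):
    """Hypothetical stand-in: predicts the next discrete latent symbol."""
    logits = rng.normal(size=LATENT_VOCAB) + source_repr.mean() + 0.01 * len(prefix)
    return int(np.argmax(logits))


def parallel_decoder(latents, source_repr):
    """Hypothetical stand-in: expands all latent symbols into target tokens.

    Every output position is computed independently, so this step can run
    in parallel over the full target length.
    """
    out_len = len(latents) * COMPRESSION
    logits = rng.normal(size=(out_len, TARGET_VOCAB)) + source_repr.mean()
    return np.argmax(logits, axis=-1)


def fast_decode(source_repr, target_len):
    # Stage 1: generate the shorter latent sequence auto-regressively.
    # Only target_len / COMPRESSION sequential steps are needed, which is
    # where the speedup over token-level auto-regression comes from.
    num_latents = target_len // COMPRESSION
    latents = []
    for _ in range(num_latents):
        latents.append(autoregressive_latent_model(latents, source_repr))

    # Stage 2: decode the full target sequence from the latents in parallel.
    return parallel_decoder(latents, source_repr)


if __name__ == "__main__":
    source_repr = rng.normal(size=(10, 64))  # pretend encoder output
    tokens = fast_decode(source_repr, target_len=64)
    print(tokens.shape)  # (64,) target tokens from only 8 sequential steps
```

In this sketch a 64-token output requires only 8 sequential prediction steps
instead of 64; the actual models replace the stand-ins with a learned
autoencoder over discrete latents and a parallel Transformer decoder.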