Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations

Ice Pasupat
Ming-Wei Chang
(2021)

Abstract

Pre-trained seq2seq models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. In contrast, specialized model architectures have been proposed to address this issue, often at the cost of generality and in-distribution performance.
In this paper, we propose a simple strategy to unlock compositional generalization in pre-trained seq2seq models through intermediate representations, without changing the model architecture at all. We identify several effective strategies for designing reversible and lossy intermediate representations that reduce the structural mismatch between inputs and outputs. We then map the intermediate form back to the original executable form, either through a deterministic transformation or with a second seq2seq model.
We find that the combination of our proposed transformations and pre-trained models is surprisingly effective, obtaining new state-of-the-art results on CFQ (+11.9 accuracy points) and on the template splits of three text-to-SQL datasets (+15.0 to +19.4 accuracy points).
This work highlights that intermediate representations provide an important (and potentially overlooked) degree of freedom for improving the compositional generalization abilities of pre-trained seq2seq models.
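To make the idea concrete, the following is a minimal sketch (an illustrative assumption, not the paper's exact transformation) of a reversible intermediate representation for a SPARQL-like meaning representation: conjuncts that share a subject are grouped, reducing repetition in the seq2seq target, and a deterministic inverse expands the groups back into executable clauses.

import collections

def to_intermediate(clauses):
    """Group (subject, predicate, object) triples by shared subject."""
    groups = collections.OrderedDict()
    for subj, pred, obj in clauses:
        groups.setdefault(subj, []).append((pred, obj))
    return list(groups.items())

def to_executable(grouped):
    """Deterministically expand grouped clauses back into triples."""
    return [(subj, pred, obj) for subj, pairs in grouped for pred, obj in pairs]

# Hypothetical CFQ-style clauses; variable and entity names are made up.
clauses = [
    ("?x0", "a", "ns:film.actor"),
    ("?x0", "ns:film.directed", "M0"),
    ("M1", "ns:film.edited", "M0"),
]
grouped = to_intermediate(clauses)
assert to_executable(grouped) == clauses  # lossless round trip

A lossy intermediate representation would instead drop details that cannot be recovered deterministically, which is where the second seq2seq model mentioned above comes in.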