Unsupervised Paraphrasing without Translation

David Grangier
ACL (2019)

Abstract

Paraphrasing exemplifies the ability to
abstract semantic content from surface forms. Recent work
on automatic paraphrasing is dominated by methods leveraging
Machine Translation (MT) as an intermediate step. This contrasts with
humans, who can paraphrase without being bilingual.

This work proposes to learn paraphrasing models solely from an unlabeled
monolingual corpus. To that end, we propose a residual variant of the
vector-quantized variational auto-encoder (VQ-VAE).
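The abstract only names the residual VQ-VAE without describing it. As a generic illustration of the underlying idea of residual vector quantization, the sketch below quantizes a vector in stages, with each stage encoding the residual left by the previous ones. This is a minimal sketch, not the paper's model: the function `residual_vq`, the codebooks, and their sizes are all invented for illustration.

```python
import numpy as np

def nearest(codebook, x):
    """Index of the codebook row closest to x in L2 distance."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def residual_vq(x, codebooks):
    """Quantize x as a sum of one entry per codebook: each stage
    quantizes the residual left over by the previous stages."""
    approx = np.zeros_like(x)
    codes = []
    for cb in codebooks:
        k = nearest(cb, x - approx)   # quantize the current residual
        codes.append(k)
        approx = approx + cb[k]       # refine the reconstruction
    return codes, approx

rng = np.random.default_rng(0)
x = rng.normal(size=8)
# each codebook keeps an all-zero row so a stage may leave the residual
# unchanged, making the reconstruction error non-increasing across stages
codebooks = [np.vstack([np.zeros(8), rng.normal(size=(15, 8))])
             for _ in range(3)]
codes, approx = residual_vq(x, codebooks)
```

Each stage thus adds one discrete code, and later stages refine the approximation produced by earlier ones.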

We compare with MT-based approaches on paraphrase identification,
generation, and training augmentation.
Monolingual paraphrasing outperforms unsupervised MT
in all settings. Comparisons with supervised MT are more mixed:
monolingual paraphrasing is competitive for identification and
augmentation, while supervised MT is superior for generation.