On the Robustness of Self-Attentive Models

Yu-Lun Hsieh
Minhao Cheng
Wei Wei
Wen-Lian Hsu
Cho-Jui Hsieh
Annual Meeting of the Association for Computational Linguistics (ACL) (2019)

Abstract

This work examines the robustness of self-attentive neural networks against adversarial input perturbations. Specifically, we investigate the attention and feature extraction mechanisms of state-of-the-art recurrent and self-attentive architectures for sentiment analysis, textual entailment, and machine translation under adversarial attacks. We also propose a novel attack algorithm for generating more natural adversarial examples that mislead neural models but not humans. Experimental results show that, compared to recurrent neural models, self-attentive models are more robust against adversarial perturbations. In addition, we provide theoretical explanations for this superior robustness.
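
For readers unfamiliar with the mechanism whose robustness is studied, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention. It is illustrative background only, with hypothetical shapes and variable names; it is not the attack algorithm or the model architectures evaluated in the paper.

    # Illustrative sketch (not from the paper): single-head scaled
    # dot-product self-attention over a token sequence.
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """X: (seq_len, d_model) token representations.
        Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
        Returns (seq_len, d_k) contextualized representations."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise attention logits
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
        return weights @ V                                # weighted sum of value vectors

    # Toy usage: 4 tokens, d_model = d_k = 8 (all values random, for shape-checking only)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)            # (4, 8)

Because each output position is a weighted sum over all token representations, a perturbation to a single input word is diffused across the sequence rather than propagated through a single recurrent path, which is the intuition the paper examines when comparing self-attentive and recurrent models.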