Did the Model Understand the Question?

Abstract

We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of attribution (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of a visual question answering model from 61.1% to 19%, and that of a tabular question answering model from 33.5% to 3.3%. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the model that indicates inadequacies in the test data.
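
To make the notion of attribution (word importance) mentioned above concrete, the following is a minimal sketch of one common attribution method, Integrated Gradients (Sundararajan et al., 2017), applied to a toy question-scoring model. The model, embeddings, and word list here are purely hypothetical stand-ins chosen so the example is self-contained and runnable; they are not the QA networks studied in this paper.

```python
import numpy as np

# Hypothetical toy setup: a differentiable "QA model" whose answer score is
# tanh of a dot product with a fixed direction, so its gradient has a closed form.
rng = np.random.default_rng(0)
words = ["how", "symmetrical", "are", "the", "white", "bricks"]
embed = {w: rng.normal(size=8) for w in words}   # stand-in word embeddings
answer_dir = rng.normal(size=8)                  # stand-in answer-class direction

def score(E):
    """Toy model: scalar answer score for a (num_words, dim) embedding matrix."""
    return np.tanh(E @ answer_dir).sum()

def grad_score(E):
    """Closed-form gradient of `score` with respect to each word embedding."""
    return (1.0 - np.tanh(E @ answer_dir) ** 2)[:, None] * answer_dir

def integrated_gradients(E, steps=50):
    """Attribute the score to each word: average the gradient along the straight
    path from an all-zero baseline to the input, then scale by (input - baseline)."""
    baseline = np.zeros_like(E)
    total_grad = np.zeros_like(E)
    for alpha in np.linspace(0.0, 1.0, steps):
        total_grad += grad_score(baseline + alpha * (E - baseline))
    avg_grad = total_grad / steps
    # Per-word attribution: sum the elementwise product over embedding dimensions.
    return ((E - baseline) * avg_grad).sum(axis=1)

E = np.stack([embed[w] for w in words])
for w, attr in zip(words, integrated_gradients(E)):
    print(f"{w:12s} attribution = {attr:+.3f}")
```

In this sketch, a word whose attribution is near zero contributes little to the model's answer score; question terms that a human would consider important but that receive negligible attribution are exactly the kind of behavior the attacks described above exploit.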