What do you learn from context? Probing for sentence structure in contextualized word representations

Patrick Xia
Berlin Chen
Alex Wang
Adam Poliak
R. Thomas McCoy
Najoung Kim
Benjamin Van Durme
Samuel R. Bowman
International Conference on Learning Representations (2019)

Abstract

Contextualized representation models such as CoVe (McCann et al., 2017) and
ELMo (Peters et al., 2018a) have recently achieved state-of-the-art results on a
diverse array of downstream NLP tasks. Building on recent token-level probing
work (Peters et al., 2018a; Blevins et al., 2018; Belinkov et al., 2017b; Shi et al.,
2016), we introduce a broad suite of sub-sentence probing tasks derived from the traditional
structured-prediction pipeline, including parsing, semantic role labeling,
and coreference, and covering a range of syntactic, semantic, local, and long-range
phenomena. We use these tasks to examine word-level contextual representations
and to investigate how they encode information about the structure of the
sentences in which they appear. We probe three recently released contextual encoder models
and find that ELMo encodes linguistic structure at the word level better than comparable
models do. We also find that models trained on language modeling and translation
produce strong representations for syntactic phenomena, but offer only small improvements
on semantic tasks over a non-contextual baseline.
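
To make the probing setup concrete, the following is a minimal sketch (in Python with PyTorch) of the general word-level probing paradigm, not the exact probing architecture used in this paper: a small classifier is trained on frozen contextual word vectors to predict a linguistic label, so that the probe's accuracy reflects what the frozen encoder already encodes. The names WordProbe, word_vectors, and labels are hypothetical placeholders, and the random tensors stand in for precomputed encoder outputs and gold annotations.

    # Minimal probing sketch: a lightweight MLP over frozen contextual word
    # vectors predicts a per-token linguistic label (e.g. a POS or SRL tag).
    # Only the probe's parameters are trained; the encoder is never updated.
    import torch
    import torch.nn as nn

    class WordProbe(nn.Module):
        def __init__(self, dim: int, num_labels: int, hidden: int = 256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, num_labels),
            )

        def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
            # word_vectors: (num_tokens, dim) frozen contextual representations
            return self.mlp(word_vectors)

    # Toy usage with random stand-ins for encoder outputs and gold labels.
    dim, num_labels, num_tokens = 1024, 45, 512
    word_vectors = torch.randn(num_tokens, dim)           # frozen encoder outputs
    labels = torch.randint(0, num_labels, (num_tokens,))  # gold tags

    probe = WordProbe(dim, num_labels)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        optimizer.zero_grad()
        logits = probe(word_vectors)
        loss = loss_fn(logits, labels)
        loss.backward()   # gradients flow only into the probe
        optimizer.step()

Under this setup, comparing probe accuracy across encoders (or against a non-contextual baseline fed the same way) indicates how much of the target linguistic structure each representation makes readily accessible.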