Learning semantic relationships for better action retrieval in images

Vignesh Ramanathan
Jia Deng
Wei Han
Zhen Li
Kunlong Gu
Samy Bengio
Chuck Rosenberg
Li Fei-Fei
CVPR (2015)

Abstract

Human actions capture a wide variety of interactions between people and objects. As a result, the set of possible actions is extremely large, and it is difficult to obtain sufficient training examples for all of them. However, we can compensate for this sparsity in supervision by leveraging the rich semantic relationships between different actions: a single action is often composed of other, smaller actions and is exclusive of certain others. We need a method that can reason about such relationships and extrapolate unobserved actions from known ones. Hence, we propose a novel neural network framework that jointly extracts the relationships between actions and uses them to train better action retrieval models. Our model incorporates linguistic, visual, and logical-consistency cues to effectively identify these relationships. We train and test our model on a large-scale image dataset of human actions, and we show a significant improvement in mean AP over several baseline methods, including the HEX-graph approach of Deng et al. [8].
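
To make the kind of label extrapolation described above concrete, the following is a minimal sketch, not the paper's model: it assumes two hypothetical relation types between action labels ("part_of" and "excludes") and a simple propagation rule, purely for illustration of how supervision for one action can yield supervision for related actions.

```python
# Illustrative sketch only -- not the authors' framework. It shows how
# "part_of" (composition/entailment) and "excludes" (mutual exclusion)
# relations between action labels could extrapolate labels for actions
# that have no direct supervision. All relations here are hypothetical.

PART_OF = {  # action -> actions it implies (is composed of)
    "playing tennis": {"holding racket", "standing outdoors"},
    "holding racket": set(),
    "standing outdoors": set(),
    "sleeping": set(),
}
EXCLUDES = {  # action -> actions it is mutually exclusive with
    "playing tennis": {"sleeping"},
    "sleeping": {"playing tennis"},
}


def extrapolate(positive_labels):
    """Expand observed positive action labels using the relations.

    Implied actions become additional positives; excluded actions become
    negatives, so an image labelled only "playing tennis" also yields
    supervision for "holding racket" and against "sleeping".
    """
    positives = set(positive_labels)
    frontier = list(positive_labels)
    while frontier:  # transitive closure over the part_of relation
        action = frontier.pop()
        for implied in PART_OF.get(action, ()):
            if implied not in positives:
                positives.add(implied)
                frontier.append(implied)
    negatives = set()
    for action in positives:
        negatives |= EXCLUDES.get(action, set())
    return positives, negatives - positives


if __name__ == "__main__":
    pos, neg = extrapolate({"playing tennis"})
    print("positives:", pos)  # includes "holding racket", "standing outdoors"
    print("negatives:", neg)  # includes "sleeping"
```

In the paper itself these relationships are not hand-coded as above but are learned jointly with the retrieval model from linguistic, visual, and logical-consistency cues; the sketch only illustrates why such relations provide extra supervision.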