Conversations Gone Awry: Detecting Warning Signs of Conversational Failure

Justine Zhang
Jonathan P. Chang
Cristian Danescu-Niculescu-Mizil
Dario Taraborelli
Proceedings of ACL, ACM Digital Library (2018)

Abstract

One of the main challenges online social systems face today is
the prevalence of toxic behavior, such as harassment and personal
attacks. This type of antisocial behavior is especially perplexing and
disruptive when it emerges in the context of healthy conversations
where, at least in principle, participants share a common goal and set
of norms. In this work, we introduce the task of predicting whether a
given conversation is on the verge of being derailed by the antisocial
actions of one of its participants. As opposed to detecting toxic
behavior after the fact, this task aims to provide early, actionable
information at a time when the conversation might still be salvaged.

We focus on two methodological challenges. First, through a combination
of machine learning, crowd-sourcing and causal inference techniques
applied to a novel dataset of 8 million conversations,
we design a controlled setting that allows us to compare healthy
conversations that deteriorate with similar conversations that stay on
track, while accounting for confounding factors such as topical focus
and number of participants. Second, we propose a framework for
applying and evaluating linguistic, conversational and social patterns
in the task of predicting the future trajectory of a conversation.

Our primary result is that a simple model using conversational and
linguistic features can achieve performance close to that of humans
in predicting whether a civil conversation will go awry. We also show
that the conversational context is more informative in this task than
the history and experience of the participants. By demonstrating the
feasibility of the prediction task, and by providing a labeled dataset,
as well as a human baseline, we lay the groundwork for further research
on methods for detecting early warning signs of, and eventually
preventing, antisocial behavior in online discussions.