Capturing Covertly Toxic Speech via Crowdsourcing

Alyssa Whitlock Lees
Daniel Borkan
Ian Kivlichan
Jorge M Nario
HCI, https://sites.google.com/corp/view/hciandnlp/home (2021) (to appear)

Abstract

We study the task of extracting covert or veiled toxicity labels from user comments. Prior research has highlighted the difficulty of creating language models that recognize nuanced toxicity such as microaggressions. Our investigations further underscore the difficulty of eliciting such labels reliably from raters via crowdsourcing. We introduce an initial dataset, COVERTTOXICITY, which aims to identify such comments using a refined rater template with rater-associated categories. Finally, we fine-tune a comment-domain BERT model to classify covertly offensive comments and compare it against existing baselines.