A Case for a Range of Acceptable Annotations

Olivia Rhinehart
Workshop on Subjectivity, Ambiguity and Disagreement in Crowdsourcing, AAAI HCOMP 2018

Abstract

Multi-way annotation is often used to ensure data quality in crowdsourced annotation tasks. Each item is annotated redundantly, and the contributors’ judgments are converted into a single “ground truth” label or more complex annotation through a resolution technique (e.g., on the basis of majority or plurality). Recent crowdsourcing research has argued against the notion of a single “ground truth” annotation for items in semantically oriented tasks, proposing instead that we accept the aggregated judgments of a large pool of crowd contributors as “crowd truth.” While we agree that many semantically oriented tasks are inherently subjective, we do not go so far as to trust the judgments of the crowd in all cases. We recognize that there may be items for which there is truly only one acceptable response, and that there may be divergent annotations that are truly of unacceptable quality. We propose that there exists a class of annotations between these two categories that exhibit acceptable variation, which we define as the range of annotations for a given item that meet the standard of quality for a task. We illustrate acceptable variation within existing annotated data sets, including a labeled sound corpus and a medical relation extraction corpus. Finally, we explore the implications of acceptable variation for annotation task design and annotation quality evaluation.
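
As a minimal illustration of the contrast drawn above (not part of the paper; the toy labels, data, and function names are hypothetical), the sketch below compares conventional plurality resolution, which collapses redundant judgments into one label, with an evaluation that accepts any judgment falling within a per-item set of acceptable annotations.

from collections import Counter

def plurality_label(judgments):
    """Resolve redundant judgments to a single label by plurality vote."""
    return Counter(judgments).most_common(1)[0][0]

def is_acceptable(judgment, acceptable_set):
    """Accept any judgment within the item's range of acceptable annotations."""
    return judgment in acceptable_set

# Hypothetical sound-labeling item: contributors give near-synonymous labels.
judgments = ["dog barking", "dog", "animal", "dog barking", "music"]

# Conventional resolution picks a single "ground truth" label.
print(plurality_label(judgments))  # -> "dog barking"

# Acceptable variation: several labels meet the quality bar; one does not.
acceptable_set = {"dog barking", "dog", "animal"}
print([is_acceptable(j, acceptable_set) for j in judgments])
# -> [True, True, True, True, False]

Under plurality resolution, the contributors who answered “dog” or “animal” would be scored as wrong even though their labels arguably meet the task’s quality standard; the acceptable-set view counts only “music” as an error.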