Learning with Labeling Induced Abstentions

Google Scholar

Abstract

Consider a setting where we wish to automate an expensive task with a machine
learning algorithm using a limited labeling resource. In such settings, examples
routed for labeling are often out of scope for the machine learning algorithm. For
example, in a spam detection setting, human reviewers not only provide labeled
data but are such high-quality detectors of spam that examples routed to them no
longer require machine evaluation. A consequence is that distribution of examples
routed to the machine is intimately tied to the process generating labels. We
introduce a formalization of this setting, and give an algorithm that simultaneously
learns a model and decides when to request a label by leveraging ideas from both
the abstention and active learning literatures. We prove an upper bound on the
algorithm’s label complexity and a matching lower bound for any algorithm in this
setting. We conduct a thorough set of experiments including an ablation study to
test different components of our algorithm. We demonstrate the effectiveness of an
efficient version of our algorithm over margin sampling on a variety of datasets.

Research Areas