Multiclass Boosting and the Cost of Weak Learning

Elad Hazan

Indraneel Mukherjee

Nataly Brukhim

Robert Schapire

Shay Moran

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (2021)

Download Google Scholar

Abstract

Boosting is an algorithmic approach which is based on the idea
of combining weak and moderately inaccurate hypotheses to a strong and accurate one.
In this work we study multiclass boosting with a possibly large number of classes or categories.
Multiclass boosting can be formulated in various ways.
Here, we focus on an especially natural formulation in which the weak hypotheses
are assumed to belong to an ``easy-to-learn'' base class, and
the weak learner is an agnostic PAC learner for that class
with respect to the standard classification loss.
This is in contrast with other, more complicated losses as have often been considered in the past.
The goal of the overall boosting algorithm
is then to learn a combination of weak hypotheses
by repeatedly calling the weak learner.

We study the resources required for boosting, especially how they
depend on $k$, the number of classes.
For the booster, these include the number of examples needed for learning
and the number of oracle calls to the weak learner.
For the weak learner, we measure the required accuracy, that is, how
close the returned hypothesis must be to the best in the base class,
or alternatively, the number of unweighted
samples the boosting algorithm feeds the weak learner in each round.

We find that the boosting algorithm itself only requires $O(\log k)$
samples, as we show by analyzing a variant of AdaBoost for our
setting. In stark contrast, assuming typical limits on the number of weak-learner calls,
we prove that the number of samples required by any
weak learner is at least polynomial in $k$, exponentially more than the
number of samples needed by the booster.
Alternatively, we prove that the weak learner's accuracy parameter
must be smaller
than an inverse polynomial in $k$, showing that the returned weak
hypotheses must be nearly the best in their class when $k$ is large.
We also prove a trade-off between number of oracle calls and the
resources required of the weak learner, meaning that the fewer calls to the
weak learner the more that is demanded on each call.

Defining the technology of today and tomorrow.

Philosophy

People

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Multiclass Boosting and the Cost of Weak Learning

Abstract

Research Areas

Meet the teams driving innovation