Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics

Alex Psomas
Aviad Rubinstein
Advances in Neural Information Processing Systems (2020)
Google Scholar

Abstract

We consider the fundamental problem of selecting k out of n random variables
in a way that the expected highest or second-highest value is maximized. This
question captures several applications where we have uncertainty about the quality
of candidates (e.g. auction bids, search results) and have the capacity to explore
only a small subset due to an exogenous constraint. For example, consider a second
price auction where system constraints (e.g., costly retrieval or model computation)
allow the participation of only k out of n bidders, and the goal is to optimize the
expected efficiency (highest bid) or expected revenue (second highest bid).

We study the case where we are given an explicit description of each random
variable. We give a PTAS for the problem of maximizing the expected highest value.
For the second-highest value, we prove a hardness result: assuming the Planted
Clique Hypothesis, there is no constant factor approximation algorithm that runs
in polynomial time. Surprisingly, under the assumption that each random variable
has monotone hazard rate (MHR), a simple score-based algorithm, namely picking
the k random variables with the largest 1/\sqrt{k} top quantile value, is a constant
approximation to the expected highest and second highest value, simultaneously.