Yishay Mansour
Prof. Yishay Mansour received his PhD from MIT in 1990. Following that, he was
a postdoctoral fellow at Harvard and a Research Staff Member at the IBM T.
J. Watson Research Center. Since 1992 he has been at Tel Aviv University,
where he is currently a Professor of Computer Science; he served
as the first head of the Blavatnik School of Computer Science during
2000-2002. He was the founder and first director of the Israeli Center of Research
Excellence in Algorithms.
Prof. Mansour has published over 100 journal papers and over 200
proceedings papers in various areas of computer science, with special
emphasis on machine learning, algorithmic game theory, communication
networks, and theoretical computer science, and has supervised over a
dozen graduate students in those areas.
Prof. Mansour was named an ACM Fellow in 2014. He is currently an
associate editor of a number of distinguished journals and has served on
numerous conference program committees. He was program chair
of COLT (1998) and of STOC (2016), has served twice on the COLT
steering committee, and is a member of the ALT steering committee.
Authored Publications
Abstract
We study online finite-horizon Markov Decision Processes with adversarially changing losses and aggregate bandit feedback (a.k.a. full-bandit feedback). Under this type of feedback,
the agent observes only the total loss incurred over the entire trajectory, rather than the individual losses at each intermediate step within the trajectory. We introduce the first Policy Optimization algorithms for this setting.
In the known-dynamics case, we achieve the first \textit{optimal} regret bound of $\tilde \Theta(H^2\sqrt{SAK})$, where $K$ is the number of episodes, $H$ is the horizon, $S$ is the number of states, and $A$ is the number of actions of the MDP.
In the unknown-dynamics case, we establish a regret bound of $\tilde O(H^3 S \sqrt{AK})$, significantly improving the best known result by a factor of $H^2 S^5 A^2$.
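To make the aggregate-feedback model concrete, here is a minimal Python sketch (our illustration, not the paper's algorithms; all dimensions and distributions are placeholders) of an episodic MDP in which the learner observes only the total trajectory loss:

import numpy as np

rng = np.random.default_rng(0)
S, A, H = 3, 2, 4                            # states, actions, horizon
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel: P[s, a] is a distribution over S
loss = rng.uniform(size=(H, S, A))           # per-step losses, hidden from the learner

def run_episode(policy):
    """Execute one episode; return only the aggregate loss the learner sees."""
    s, total = 0, 0.0
    for h in range(H):
        a = policy[h, s]
        total += loss[h, s, a]               # individual step losses are never revealed
        s = rng.choice(S, p=P[s, a])
    return total                             # aggregate (full-bandit) feedback

policy = rng.integers(A, size=(H, S))        # a fixed deterministic policy, for illustration
print(run_episode(policy))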
Of Dice and Games: A Theory of Generalized Boosting
Marco Bressan
Nataly Brukhim
Nicolo Cesa-Bianchi
Emmanuel Esposito
Shay Moran
Maximilian Thiessen
COLT (2025)
Abstract
Cost-sensitive loss functions are crucial in many real-world prediction problems, where different types of errors are penalized differently; for example, in medical diagnosis, a false negative prediction can lead to severe consequences. However, traditional PAC learning theory has mostly focused on the symmetric 0-1 loss, leaving cost-sensitive losses largely unaddressed.
In this work, we extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses.
Cost-sensitive losses assign costs to the entries of a confusion matrix, and are used to control the sum of prediction errors accounting for the cost of each error type. Multi-objective losses, on the other hand, simultaneously track multiple cost-sensitive losses,
and are useful when the goal is to satisfy several criteria at once (e.g., minimizing false positives while keeping false negatives below a critical threshold).
We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees that distinguishes which guarantees are trivial (i.e., they can always be achieved), which are boostable (i.e., they imply strong learning), and which are intermediate, implying non-trivial yet not arbitrarily accurate learning. For binary classification, we establish a dichotomy: a weak learning guarantee is either trivial or boostable. In the multiclass setting, we describe a more intricate landscape of intermediate weak learning guarantees. Our characterization relies on a geometric interpretation of boosting, revealing a useful duality between cost-sensitive and multi-objective losses.
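As background for the cost-sensitive setting, the following hedged sketch shows one classical way to skew boosting toward asymmetric costs: an AdaBoost-style loop whose example weights start proportional to made-up costs c_fn and c_fp. It illustrates the flavor of cost-sensitive weak-to-strong aggregation, not the paper's construction:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y = 2 * y - 1                                  # labels in {-1, +1}
c_fn, c_fp = 5.0, 1.0                          # illustrative cost-matrix entries (assumptions ours)
cost = np.where(y == 1, c_fn, c_fp)            # cost of misclassifying each example

w = cost / cost.sum()                          # cost-weighted initial distribution
stumps, alphas = [], []
for _ in range(50):
    h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = h.predict(X)
    err = w[pred != y].sum()                   # weighted error of the weak learner
    if err >= 0.5:                             # weak learning guarantee violated; stop
        break
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    w *= np.exp(-alpha * y * pred)             # reweight, preserving the cost skew
    w /= w.sum()
    stumps.append(h); alphas.append(alpha)

H = lambda X: np.sign(sum(a * h.predict(X) for a, h in zip(alphas, stumps)))
print("train accuracy:", (H(X) == y).mean())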
Abstract
We present regret minimization algorithms for the contextual multi-armed bandit (CMAB) problem over $K$ actions in the presence of delayed feedback, a scenario where loss observations arrive with delays chosen by an adversary.
As a preliminary result, assuming direct access to a finite policy class $\Pi$, we establish an optimal expected regret bound of $O(\sqrt{KT \log |\Pi|} + \sqrt{D \log |\Pi|})$, where $D$ is the sum of delays.
For our main contribution, we study the general function approximation setting over a (possibly infinite) contextual loss function class $\mathcal{F}$ with access to an online least-squares regression oracle $\mathcal{O}$ over $\mathcal{F}$. In this setting, we achieve an expected regret bound of $O(\sqrt{KT\mathcal{R}_T(\mathcal{O})} + \sqrt{d_{\max} D \beta})$ assuming FIFO order, where $d_{\max}$ is the maximal delay, $\mathcal{R}_T(\mathcal{O})$ is an upper bound on the oracle's regret, and $\beta$ is a stability parameter associated with the oracle. We complement this general result by presenting a novel stability analysis of a Hedge-based version of Vovk's aggregating forecaster as an oracle implementation for least-squares regression over a finite function class $\mathcal{F}$, and show that its stability parameter $\beta$ is bounded by $\log |\mathcal{F}|$, resulting in an expected regret bound of $O(\sqrt{KT \log |\mathcal{F}|} + \sqrt{d_{\max} D \log |\mathcal{F}|})$,
which is a $\sqrt{d_{\max}}$ factor away from the lower bound of $\Omega(\sqrt{KT \log |\mathcal{F}|} + \sqrt{D \log |\mathcal{F}|})$ that we also present.
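The delayed-feedback mechanics can be illustrated by a minimal sketch (assumptions ours: a finite policy class, uniform losses, and random rather than adversarial delays) of exponential weights over policies, in which importance-weighted loss estimates are buffered until their delayed feedback arrives:

import numpy as np

rng = np.random.default_rng(1)
K, T, n_pi, n_ctx = 3, 1000, 5, 10
policies = rng.integers(K, size=(n_pi, n_ctx))   # policy i maps context c -> action
logw = np.zeros(n_pi)
eta = np.sqrt(np.log(n_pi) / T)
pending = []                                     # (arrival_round, estimated loss vector)

for t in range(T):
    arrived = [q for q in pending if q[0] <= t]  # apply feedback that has arrived by round t
    pending = [q for q in pending if q[0] > t]
    for _, lhat in arrived:
        logw -= eta * lhat
    c = rng.integers(n_ctx)                      # context
    p = np.exp(logw - logw.max()); p /= p.sum()
    i = rng.choice(n_pi, p=p)
    a = policies[i, c]
    loss = rng.uniform(size=K)                   # stand-in for the adversary's losses
    # importance-weighted estimate, charged to every policy recommending a on c
    prob_a = p[policies[:, c] == a].sum()
    lhat = np.where(policies[:, c] == a, loss[a] / prob_a, 0.0)
    delay = rng.integers(0, 20)                  # adversarial in the paper; random here
    pending.append((t + delay + 1, lhat))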
Abstract
In many repeated auction settings, participants care not only about how frequently they win but also about how their winnings are distributed over time.
This problem arises in various practical domains where avoiding congested demand is crucial and spacing is important, such as advertising campaigns.
We initiate the study of repeated auctions with preferences for spacing of wins over time.
We introduce a simple model of this phenomenon, where the value of a win is given by any concave function of the time since the last win.
We study the optimal policies for this setting in repeated second-price auctions and offer learning algorithms for the bidders that achieve $\tilde O(\sqrt{T})$ regret.
We achieve this by showing that an infinite-horizon Markov decision process (MDP) with the budget constraint in expectation is essentially equivalent to our problem, \emph{even when limiting that MDP to a very small number of states}.
The algorithm achieves low regret by learning a bidding policy that chooses bids as a function of the context and the state of the system, which will be the time elapsed since the last win (or conversion).
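As a toy illustration of the small-state MDP view (the modeling choices here, such as uniform competing bids, a square-root value curve, and discounting, are ours and not the paper's), value iteration over states given by the capped time since the last win yields a state-dependent bid:

import numpy as np

rng = np.random.default_rng(2)
cap, gamma = 8, 0.99
v = lambda tau: np.sqrt(tau)                   # concave value of a win, tau rounds after the last one
bids = np.linspace(0.0, 3.0, 21)
others = rng.uniform(0.0, 3.0, 10000)          # competing highest bid (illustrative model)
q = np.array([(others < b).mean() for b in bids])                 # win probability per bid
pay = np.array([others[others < b].mean() if (others < b).any() else 0.0
                for b in bids])                # expected second price conditional on winning

V = np.zeros(cap + 1)
taus = np.arange(cap + 1)
for _ in range(500):                           # value iteration on the tiny MDP
    win = q * (v(taus)[:, None] - pay + gamma * V[1])             # win: collect v(tau), pay, reset clock
    lose = (1 - q) * gamma * V[np.minimum(taus + 1, cap)][:, None]  # lose: clock advances
    Q = win + lose
    V = Q.max(axis=1)
print("bid per state (time since last win):", bids[Q.argmax(axis=1)])

Since v(0) = 0 in this toy model, winning again immediately after a win is worthless, so the computed bids naturally encode a preference for spacing wins over time.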
Abstract
We study the multi-armed bandit problem with adversarially chosen delays in the Best-of-Both-Worlds (BoBW) framework, which aims to achieve near-optimal performance in both stochastic and adversarial environments. While prior work has made progress toward this goal, existing algorithms suffer from significant gaps to the known lower bounds, especially in the stochastic settings. Our main contribution is a new algorithm that, up to logarithmic factors, matches the known lower bounds in each setting individually.
In the adversarial case, our algorithm achieves regret of $\widetilde{O}(\sqrt{KT} + \sqrt{D})$, which is optimal up to logarithmic terms, where $T$ is the number of rounds, $K$ is the number of arms, and $D$ is the cumulative delay. In the stochastic case, we provide a regret bound which scales as $\sum_{i:\Delta_i>0}\frac{\log T}{\Delta_i} + \frac{1}{K}\sum_i \Delta_i \sigma_{\max}$, where $\Delta_i$ is the sub-optimality gap of arm $i$ and $\sigma_{\max}$ is the maximum number of missing observations.
To the best of our knowledge, this is the first \textit{BoBW} algorithm to simultaneously match the lower bounds in both the stochastic and adversarial regimes in delayed environments. Moreover, even beyond the BoBW setting, our stochastic regret bound is the first to match the known lower bound under adversarial delays, improving the second term over the best known result by a factor of $K$.
Abstract
We study the regret in stochastic Multi-Armed Bandits (MAB) with multiple agents that communicate over an arbitrary connected communication graph.
We analyze a variant of the Cooperative Successive Elimination algorithm, \textsc{CoopSE}, and show an individual regret bound of ${O}(\mathcal{R} / m + A^2 + A \sqrt{\log T})$ and a nearly matching lower bound.
Here $A$ is the number of actions, $T$ the time horizon, $m$ the number of agents, and $\mathcal{R} = \sum_{\Delta_i > 0}\log(T)/\Delta_i$ is the optimal single agent regret, where $\Delta_i$ is the sub-optimality gap of action $i$.
Our work is the first to show an individual regret bound in cooperative stochastic MAB that is independent of the graph's diameter.
When considering communication networks there are additional considerations beyond regret, such as message size and number of communication rounds.
First, we show that our regret bound holds even if we restrict the messages to be of logarithmic size.
Second, for a logarithmic number of communication rounds, we obtain a regret bound of ${O}(\mathcal{R} / m + A \log T)$.
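For reference, here is a compact single-agent successive elimination sketch, the classical building block; the paper's cooperative variant coordinates such eliminations among $m$ agents over the communication graph, which this sketch does not reproduce:

import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.5, 0.45, 0.3])                # true arm means (unknown to the learner)
A, T = len(mu), 20000
active = list(range(A))
counts, sums = np.zeros(A), np.zeros(A)

t = 0
while t < T and len(active) > 1:
    for i in list(active):                     # one round: pull every surviving arm
        sums[i] += rng.binomial(1, mu[i]); counts[i] += 1; t += 1
    means = sums[active] / counts[active]
    rad = np.sqrt(np.log(T) / (2 * counts[active]))   # confidence radius
    # eliminate any arm whose upper bound falls below the best lower bound
    keep = means + rad >= (means - rad).max()
    active = [a for a, k in zip(active, keep) if k]
print("surviving arms:", active)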
Abstract
In this paper, we investigate a variant of the classical stochastic Multi-armed Bandit (MAB) problem, where the payoff received by an agent (either cost or reward) is both delayed and directly corresponds to the magnitude of the delay. This setting faithfully models many real-world scenarios, such as the time it takes for a data packet to traverse a network given a choice of route (where delay serves as the agent's cost), or a user's time spent on a web page given a choice of content (where delay serves as the agent's reward). Our main contributions are tight upper and lower bounds for both the cost and reward settings. For the case that delays serve as costs, we prove optimal regret that scales as $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + d^*$, where $T$ is the number of rounds, $\Delta_i$ are the sub-optimality gaps, and $d^*$ is the minimal expected delay amongst arms. For the case that delays serve as rewards, we show optimal regret of $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + \bar{d}$, where $\bar{d}$ is the second maximal expected delay. These improve over the regret in the general delay-dependent payoff setting, which scales as $\sum_{i:\Delta_i > 0}\frac{\log T}{\Delta_i} + D$, where $D$ is the maximum possible delay. Our regret bounds highlight the difference between the cost and reward scenarios, showing that the improvement in the cost scenario is more significant than in the reward scenario. Finally, we accompany our theoretical results with an empirical evaluation.
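A toy sketch of the delay-as-cost setting (our illustration, not the paper's algorithm): each pull's feedback is the delay itself, it becomes observable only once that delay elapses, and arms are chosen by a lower confidence bound on the delay cost:

import heapq
import numpy as np

rng = np.random.default_rng(4)
mean_delay = np.array([2.0, 3.5, 5.0])         # true expected delays per arm (unknown)
A, horizon = len(mean_delay), 5000
counts, totals = np.zeros(A), np.zeros(A)
inflight = []                                  # heap of (completion_time, arm, delay)

for t in range(horizon):
    while inflight and inflight[0][0] <= t:    # feedback arrives once its delay elapses
        _, i, d = heapq.heappop(inflight)
        counts[i] += 1
        totals[i] += d
    if counts.min() == 0:                      # sample every arm at least once
        a = int(counts.argmin())
    else:                                      # optimistic index for cost minimization
        lcb = totals / counts - np.sqrt(2 * np.log(horizon) / counts)
        a = int(lcb.argmin())
    d = rng.exponential(mean_delay[a])         # the delay is itself the cost
    heapq.heappush(inflight, (t + d, a, d))
print("observed pulls per arm:", counts)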
Abstract
We introduce efficient differentially private (DP) algorithms for several linear algebraic tasks, including solving linear equalities over arbitrary fields, linear inequalities over the reals, and computing affine spans and convex hulls. As an application, we obtain efficient DP algorithms for learning halfspaces and affine subspaces. Our algorithms addressing equalities are strongly polynomial, whereas those addressing inequalities are weakly polynomial. Furthermore, this distinction is inevitable: no DP algorithm for linear programming can be strongly polynomial-time efficient.
Abstract
We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While often clear from the context, our exploration reveals that a precise definition is much more nuanced than it appears, to the extent that a single all-encompassing definition may not exist. Thus, we employ a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementing questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically,
* We introduce a new definitional paradigm -- Narcissus Resiliency -- to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side-effect, we demonstrate that Narcissus resiliency captures as special cases multiple well-studied concepts including differential privacy and other security notions of one-way functions and encryption schemes.
* We formulate a link between reconstruction attacks and Kolmogorov complexity. This allows us to put forward a criterion for evaluating when such attacks are convincingly successful.
Abstract
We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.
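The kind of mismatch the framework quantifies can be sketched minimally as follows (assumptions ours: additive Gaussian noise stands in for adversarially evolving feedback): a full-information Hedge learner accrues true losses while updating only on what it observes:

import numpy as np

rng = np.random.default_rng(5)
K, T = 5, 10000
eta = np.sqrt(np.log(K) / T)
logw = np.zeros(K)
true_cum = np.zeros(K)
alg_loss = 0.0

for t in range(T):
    p = np.exp(logw - logw.max()); p /= p.sum()
    true = rng.uniform(size=K)                 # adversarial in the model; random here
    observed = true + rng.normal(0.0, 0.1, K)  # observed feedback differs from the true loss
    alg_loss += p @ true                       # the learner pays the *true* loss
    true_cum += true
    logw -= eta * observed                     # but can only update on what it observes

print("regret vs best fixed action:", alg_loss - true_cum.min())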