Yishay Mansour

Prof. Yishay Mansour received his PhD from MIT in 1990, after which he was a postdoctoral fellow at Harvard and a Research Staff Member at the IBM T. J. Watson Research Center. Since 1992 he has been at Tel-Aviv University, where he is currently a Professor of Computer Science; he served as the first head of the Blavatnik School of Computer Science from 2000 to 2002. He was the founder and first director of the Israeli Center of Research Excellence in Algorithms. Prof. Mansour has published over 100 journal papers and over 200 proceedings papers in various areas of computer science, with special emphasis on machine learning, algorithmic game theory, communication networks, and theoretical computer science, and has supervised over a dozen graduate students in those areas. Prof. Mansour was named an ACM Fellow in 2014. He is currently an associate editor of a number of distinguished journals and has served on numerous conference program committees. He was program chair of COLT (1998) and of STOC (2016), served twice on the COLT steering committee, and is a member of the ALT steering committee.
Authored Publications
    We revisit the fundamental question of formally defining what constitutes a reconstruction attack. While often clear from context, our exploration reveals that a precise definition is much more nuanced than it appears, to the extent that a single all-encompassing definition may not exist. Thus, we employ a different strategy and aim to "sandwich" the concept of reconstruction attacks by addressing two complementary questions: (i) What conditions guarantee that a given system is protected against such attacks? (ii) Under what circumstances does a given attack clearly indicate that a system is not protected? More specifically:
    • We introduce a new definitional paradigm, Narcissus resiliency, to formulate a security definition for protection against reconstruction attacks. This paradigm has a self-referential nature that enables it to circumvent shortcomings of previously studied notions of security. Furthermore, as a side effect, we demonstrate that Narcissus resiliency captures as special cases multiple well-studied concepts, including differential privacy and other security notions of one-way functions and encryption schemes.
    • We formulate a link between reconstruction attacks and Kolmogorov complexity. This allows us to put forward a criterion for evaluating when such attacks are convincingly successful.
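    As a toy illustration of the Kolmogorov-complexity criterion mentioned above, the sketch below uses compression length as a crude, computable stand-in for Kolmogorov complexity (the true quantity is uncomputable). The function names, the `slack` parameter, and the decision rule are assumptions of this sketch, not the paper's definitions.

```python
import zlib

def compressed_len(b: bytes) -> int:
    # Compression length as a crude, computable upper bound on
    # Kolmogorov complexity (the true quantity is uncomputable).
    return len(zlib.compress(b, 9))

def conditional_proxy(target: bytes, context: bytes) -> int:
    # Rough proxy for K(target | context): the extra description the
    # target needs once the context is in hand.
    return compressed_len(context + target) - compressed_len(context)

def convincingly_successful(reconstruction: bytes, record: bytes,
                            public_info: bytes, slack: int = 16) -> bool:
    # Hypothetical decision rule: the attack output explains the secret
    # record far better than the public information alone does.
    return (conditional_proxy(record, public_info + reconstruction) + slack
            < conditional_proxy(record, public_info))

# Toy usage: a reconstruction that is essentially the record itself.
record = b"alice,1985-03-14,diagnosis:asthma" * 4
print(convincingly_successful(record, record, b"census summary stats"))
```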
    We introduce efficient differentially private (DP) algorithms for several linear-algebraic tasks, including solving linear equalities over arbitrary fields, solving linear inequalities over the reals, and computing affine spans and convex hulls. As an application, we obtain efficient DP algorithms for learning halfspaces and affine subspaces. Our algorithms addressing equalities are strongly polynomial, whereas those addressing inequalities are weakly polynomial. Furthermore, this distinction is inevitable: no DP algorithm for linear programming can run in strongly polynomial time.
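    For context, here is a minimal non-private baseline for the first task: solving linear equalities exactly over the rationals by Gauss-Jordan elimination (a strongly polynomial procedure). The paper's contribution is performing such tasks under differential privacy, which this sketch makes no attempt at.

```python
from fractions import Fraction

def solve_linear_system(A, b):
    """Exact Gauss-Jordan elimination over the rationals: returns one
    solution of Ax = b, or None if the system is inconsistent."""
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in row] + [Fraction(bi)]
         for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:
            continue  # free column
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]           # normalize pivot row
        for i in range(m):
            if i != r and M[i][c] != 0:              # eliminate column c
                M[i] = [a - M[i][c] * p for a, p in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    if any(all(x == 0 for x in row[:n]) and row[n] != 0 for row in M):
        return None  # row reads 0 = nonzero: inconsistent
    x = [Fraction(0)] * n
    for row, c in zip(M, pivots):
        x[c] = row[n]
    return x

print(solve_linear_system([[1, 2], [3, 4]], [5, 6]))  # x = (-4, 9/2)
```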
    A Fine-grained Characterization of PAC Learnability
    Marco Bressan
    Nataly Brukhim
    Nicolo Cesa-Bianchi
    Emmanuel Esposito
    Shay Moran
    Maximilian Thiessen
    COLT (2025)
    In the multiclass PAC setting, even when full learnability is unattainable, meaningful information can often be extracted to guide predictions. However, classical learning theory has mainly focused on the dichotomy "learnable vs. non-learnable", leaving notions of partial learnability largely unexplored. Indeed, even for a non-learnable class, a learner may still achieve partial success, for example by making reliable predictions whenever the true label belongs to a fixed subset of the label space, even if it fails otherwise. Similarly, the rigid nature of PAC learnability makes it impossible to distinguish between classes where one can achieve favorable trade-offs between, say, false-positive and false-negative rates, and classes where such trade-offs are fundamentally unattainable. In a nutshell, standard PAC learnability precludes a fine-grained exploration of learnability. To overcome this limitation, we develop a fine-grained theory of PAC learnability. For any hypothesis class \(\mathcal{H}\), given a loss function (which quantifies the penalty for predicting \(\hat{y}\) instead of the true label \(y\)) and a target loss threshold \(z\), our theory determines whether it is possible to achieve a loss of at most \(z\). In contrast, classical PAC learning considers only the special case of the zero-one loss and \(z = 0\), corresponding to a near-perfect classification guarantee. We give a complete characterization of all attainable guarantees, captured by a \emph{finite family} of combinatorial dimensions, which we term the \emph{\(J\)-cube dimensions} of \(\mathcal{H}\). These dimensions are defined for every subset \(J\) of at least two labels. This extends the fundamental theorem of realizable PAC learning based on the VC dimension. In fact, our results hold in a more general multi-objective setting where we fully characterize the Pareto frontier of guarantees attainable for the class \(\mathcal{H}\).
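    A minimal sketch of the fine-grained objective, using assumed toy names and data: given a finite hypothesis class, a loss matrix, and a threshold z, empirical risk minimization checks whether an average loss of at most z is attainable on a sample. This only illustrates the objective; the paper's characterization is via the combinatorial J-cube dimensions, which are not implemented here.

```python
import random
random.seed(0)

def erm_under_threshold(hypotheses, sample, loss, z):
    # Pick the hypothesis with the smallest average loss on the sample
    # and report it iff that loss is at most the target threshold z.
    best = min(hypotheses,
               key=lambda h: sum(loss[y][h(x)] for x, y in sample))
    avg = sum(loss[y][best(x)] for x, y in sample) / len(sample)
    return (best, avg) if avg <= z else None

# Toy instance: 3 labels and an asymmetric loss that penalizes confusing
# distant labels more, a trade-off the zero-one loss cannot express.
labels = [0, 1, 2]
loss = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
hypotheses = [lambda x, c=c: c for c in labels]   # constant predictors
sample = [(x, random.choice(labels)) for x in range(30)]
print(erm_under_threshold(hypotheses, sample, loss, z=1.0))
```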
    We study online finite-horizon Markov Decision Processes with adversarially changing losses and aggregate bandit feedback (a.k.a. full-bandit feedback). Under this type of feedback, the agent observes only the total loss incurred over the entire trajectory, rather than the individual losses at each intermediate step within the trajectory. We introduce the first Policy Optimization algorithms for this setting. In the known-dynamics case, we achieve the first \textit{optimal} regret bound of $\tilde \Theta(H^2\sqrt{SAK})$, where $K$ is the number of episodes, $H$ is the horizon, $S$ is the number of states, and $A$ is the number of actions of the MDP. In the unknown-dynamics case, we establish a regret bound of $\tilde O(H^3 S \sqrt{AK})$, significantly improving on the best known result by a factor of $H^2 S^5 A^2$.
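    The following sketch illustrates only the aggregate (full-bandit) feedback model, with assumed toy dynamics: the learner observes one number per episode, the summed loss, never the per-step losses. It is not the paper's Policy Optimization algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
H, S, A = 5, 4, 3                              # horizon, states, actions
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] = dist over next state
loss = rng.uniform(0, 1, size=(H, S, A))       # per-step losses, hidden

def run_episode(policy):
    """policy[h, s] is an action; only the aggregate loss is returned."""
    s, total = 0, 0.0
    for h in range(H):
        a = policy[h, s]
        total += loss[h, s, a]    # accumulated internally by the environment
        s = rng.choice(S, p=P[s, a])
    return total                  # the learner sees only this sum

policy = rng.integers(0, A, size=(H, S))
print("aggregate loss observed:", run_episode(policy))
```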
    Principal-Agent Reward Shaping in MDPs
    Omer Ben-Porat
    Michal Moshkovitz
    Boaz Taitler
    AAAI (2024)
    Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game in which the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent selfishly picks their policy to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: stochastic trees and deterministic decision processes with a finite horizon.
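    A toy sketch of the Stackelberg interaction on a finite-horizon deterministic decision process (one of the two instance classes above). The dynamics, rewards, and bonus scheme are illustrative assumptions, and the agent's best response is computed by standard backward induction rather than the paper's approximation algorithms.

```python
H, S, A = 3, 4, 2
next_state = [[(s + a) % S for a in range(A)] for s in range(S)]
r_agent = [[[0.1 * (s + a) for a in range(A)] for s in range(S)] for _ in range(H)]
r_principal = [[[1.0 if a == 1 else 0.0 for a in range(A)] for s in range(S)] for _ in range(H)]
bonus = [[[0.3 if a == 1 else 0.0 for a in range(A)] for s in range(S)] for _ in range(H)]

def best_response(bonus):
    """Agent maximizes original + offered reward by backward induction."""
    V, policy = [0.0] * S, [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        newV = [0.0] * S
        for s in range(S):
            vals = [r_agent[h][s][a] + bonus[h][s][a] + V[next_state[s][a]]
                    for a in range(A)]
            policy[h][s] = max(range(A), key=vals.__getitem__)
            newV[s] = vals[policy[h][s]]
        V = newV
    return policy

def principal_utility(policy, bonus, s0=0):
    u, s = 0.0, s0
    for h in range(H):
        a = policy[h][s]
        u += r_principal[h][s][a] - bonus[h][s][a]   # principal pays the bonus
        s = next_state[s][a]
    return u

print("principal utility:", principal_utility(best_response(bonus), bonus))
```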
    Partially Interpretable Models with Guarantees on Coverage and Accuracy
    Nave Frost
    Zachary Lipton
    Michal Moshkovitz
    Algorithmic Learning Theory (ALT) (2024)
    Simple, sufficient explanations furnished by short decision lists can be useful for guiding stakeholder actions. Unfortunately, this transparency can come at the expense of the higher accuracy enjoyed by black-box methods, like deep nets. To date, practitioners typically either (i) insist on the simpler model, forsaking accuracy; or (ii) insist on maximizing accuracy, settling for post-hoc explanations of dubious faithfulness. In this paper, we propose a hybrid \emph{partially interpretable model} that represents a compromise between the two extremes. In our setup, each input is first processed by a decision list that can either execute a decision or abstain, handing off authority to the opaque model. The key to optimizing the decision list is to optimally trade off the accuracy of the composite system against coverage (the fraction of the population that receives explanations). We contribute a new principled algorithm for constructing partially interpretable decision lists, providing theoretical guarantees addressing both interpretability and accuracy. As an instance of our result, we prove that when the optimal decision list has length $k$, coverage $c$, and $b$ mistakes, our algorithm will generate a decision list that has length no greater than $4k$, coverage at least $c/2$, and makes at most $4b$ mistakes. Finally, we empirically validate the effectiveness of the new model.
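    A minimal sketch of the hybrid predictor described above, with hypothetical rules, black box, and data: a short decision list either fires and explains the prediction or abstains and defers to an opaque model, and coverage and accuracy are measured on the composite system. The paper's algorithm for constructing the list is not implemented here.

```python
rules = [
    (lambda x: x["age"] < 18, 0),          # if condition holds, predict label
    (lambda x: x["income"] > 100_000, 1),
]

def black_box(x):                          # stand-in for e.g. a deep net
    return int(x["age"] + x["income"] / 1_000 > 60)

def hybrid_predict(x):
    for cond, label in rules:
        if cond(x):
            return label, True             # covered: explained by a rule
    return black_box(x), False             # abstain: defer to the black box

data = [({"age": 15, "income": 0}, 0),
        ({"age": 40, "income": 120_000}, 1),
        ({"age": 30, "income": 50_000}, 1)]

covered = sum(1 for x, _ in data if hybrid_predict(x)[1])
correct = sum(1 for x, y in data if hybrid_predict(x)[0] == y)
print(f"coverage={covered/len(data):.2f} accuracy={correct/len(data):.2f}")
```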
    Given a policy, we define a \emph{safe zone} as a subset of states such that most of the policy's trajectories are confined to this subset. The quality of a safe zone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. Safe zones are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding optimal safe zones, and show that in general the problem is computationally hard. For this reason we concentrate on computing approximate safe zones. Our main result is a bi-criteria approximation algorithm which gives a factor of almost $2$ approximation for both the escape probability and the safe zone size, using a polynomial sample complexity. We conclude the paper with an empirical evaluation of our algorithm.
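    A sketch of the evaluation side of the problem, under an assumed policy-induced Markov chain: estimate a candidate safe zone's escape probability by sampling trajectories and counting how many ever leave the subset. The paper's bi-criteria algorithm for finding the zone is not implemented.

```python
import numpy as np

rng = np.random.default_rng(1)
S, H = 6, 20
P = rng.dirichlet(np.ones(S) * 0.3, size=S)   # policy-induced Markov chain

def escape_probability(zone, trials=2000, s0=0):
    # Fraction of H-step trajectories from s0 that ever leave the zone.
    escapes = 0
    for _ in range(trials):
        s = s0
        for _ in range(H):
            s = int(rng.choice(S, p=P[s]))
            if s not in zone:
                escapes += 1
                break
    return escapes / trials

zone = {0, 1, 2, 3}
print(f"|zone|={len(zone)}, escape probability ~ {escape_probability(zone):.3f}")
```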
    We consider a seller faced with buyers who have the ability to delay their decision, which we call patience. Each buyer's type is composed of a value and a patience, sampled i.i.d. from a distribution. The seller, using posted prices, would like to maximize her revenue from selling to the buyer. In this paper, we formalize this setting and characterize the resulting Stackelberg equilibrium, where the seller first commits to her strategy and the buyers then best respond. Following this, we show how to compute both the optimal pure and mixed strategies. We then consider a learning setting, where the seller does not have access to the distribution over buyers' types. Our main results are the following. We derive a sample complexity bound for learning an approximately optimal pure strategy, by computing the fat-shattering dimension of this setting. Moreover, we provide a general sample complexity bound for the approximately optimal mixed strategy. We also consider an online setting and derive a vanishing regret bound with respect to both the optimal pure strategy and the optimal mixed strategy.
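    A toy reading of the model, with assumed distributions and a simplified arrival rule: the seller posts a price per day, and a buyer of type (value, patience) buys at the cheapest price within their patience window provided it does not exceed their value; a pure strategy is then chosen by grid search over constant price sequences.

```python
import numpy as np

rng = np.random.default_rng(2)

def revenue(prices, buyers):
    """Average revenue of a fixed price sequence; each buyer arrives at
    day 0 and buys the cheapest price within their patience window,
    provided it does not exceed their value."""
    total = 0.0
    for value, patience in buyers:
        p = min(prices[: patience + 1])
        if value >= p:
            total += p
    return total / len(buyers)

buyers = [(rng.uniform(0, 1), int(rng.integers(0, 3))) for _ in range(5000)]
candidates = [[p] * 3 for p in np.linspace(0.1, 0.9, 17)]   # constant prices
best = max(candidates, key=lambda pr: revenue(pr, buyers))
print("best constant price:", round(float(best[0]), 3),
      "revenue:", round(revenue(best, buyers), 3))
```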
    In classic reinforcement learning (RL) problems, policies are evaluated with respect to some reward function, and all optimal policies obtain the same expected return. However, when considering real-world dynamic environments in which different users have different preferences, a policy that is optimal for one user might be sub-optimal for another. In this work, we propose a multi-objective reinforcement learning framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our setup consists of a Markov Decision Process with a multi-objective reward function, in which each user corresponds to an (unknown) personal preference vector, and their reward in each state-action is the inner product of their preference vector with the multi-objective reward at that state-action. Our goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and the user feedback is their preferred policy. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
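    A sketch of the feedback model only, with illustrative dynamics: rewards are vectors, a user's scalar reward is the inner product with a hidden preference vector, and a comparison query returns the policy with the higher scalarized return. The paper's query-efficient algorithms are not implemented.

```python
import numpy as np

rng = np.random.default_rng(3)
H, S, A, d = 4, 3, 2, 3
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel
R = rng.uniform(0, 1, size=(S, A, d))        # multi-objective reward
w_true = np.array([0.6, 0.3, 0.1])           # hidden user preference vector

def vector_return(policy, episodes=2000):
    # Monte-Carlo estimate of the policy's expected multi-objective return.
    acc = np.zeros(d)
    for _ in range(episodes):
        s = 0
        for h in range(H):
            a = policy[h, s]
            acc += R[s, a]
            s = rng.choice(S, p=P[s, a])
    return acc / episodes

def compare(policy1, policy2):
    # User feedback: which policy has the higher scalarized return.
    u1, u2 = w_true @ vector_return(policy1), w_true @ vector_return(policy2)
    return 1 if u1 >= u2 else 2

pi1 = rng.integers(0, A, size=(H, S))
pi2 = rng.integers(0, A, size=(H, S))
print("user prefers policy", compare(pi1, pi2))
```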
    We study a generalization of boosting to the multiclass setting. We introduce a weak learning condition for multiclass classification that captures the original notion of weak learnability as being “slightly better than random guessing”. We give a simple and efficient boosting algorithm that does not require realizability assumptions, and whose sample and oracle complexity bounds are independent of the number of classes. Furthermore, we utilize our new boosting technique in two fundamental settings: multiclass PAC learning and list PAC learning, resulting in simplified algorithms compared to previous works.
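    A generic skeleton of the boosting interface, not the paper's algorithm (whose reweighting and aggregation schemes are the actual contribution): repeatedly call a weak-learner oracle on reweighted data and aggregate the returned multiclass hypotheses by plurality vote. The halving reweighting and the stump learner below are placeholder assumptions.

```python
from collections import Counter

def boost(weak_learner, sample, rounds=10):
    # Call the weak-learner oracle on reweighted data each round and
    # aggregate the returned hypotheses by plurality vote.
    weights, hyps = [1.0] * len(sample), []
    for _ in range(rounds):
        h = weak_learner(sample, weights)
        hyps.append(h)
        for i, (x, y) in enumerate(sample):
            if h(x) == y:                  # focus weight on remaining errors
                weights[i] *= 0.5
    return lambda x: Counter(h(x) for h in hyps).most_common(1)[0][0]

def stump_learner(sample, weights):
    # Toy weak learner: best single-threshold two-label rule on 3 classes.
    best, best_err = None, float("inf")
    for t in sorted({x for x, _ in sample}):
        for lo, hi in [(a, b) for a in range(3) for b in range(3) if a != b]:
            h = lambda x, t=t, lo=lo, hi=hi: lo if x < t else hi
            err = sum(w for (x, y), w in zip(sample, weights) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
    return best

sample = [(x, 0 if x < 3 else (1 if x < 6 else 2)) for x in range(9)]
H = boost(stump_learner, sample)
print([H(x) for x, _ in sample])
```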