Corinna Cortes
Corinna Cortes is a VP at Google Research, where she works on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Corinna spent more than ten years at AT&T Labs - Research, formerly AT&T Bell Labs, where she held a distinguished research position. Corinna's research is particularly well known for her contributions to the theoretical foundations of support vector machines (SVMs), for which she, jointly with Vladimir Vapnik, received the 2008 Paris Kanellakis Theory and Practice Award, and for her work on data mining in very large data sets, for which she was awarded the AT&T Science and Technology Medal in 2000. Corinna received her MS degree in Physics from the University of Copenhagen and joined AT&T Bell Labs as a researcher in 1989. She received her Ph.D. in computer science from the University of Rochester in 1993.
Corinna is also a competitive runner and a mother of two.
Authored Publications
Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data
Anqi Mao
Proceedings of the 42nd International Conference on Machine Learning (ICML 2025)
Class imbalance remains a major challenge in machine learning, especially in multi-class problems with long-tailed distributions. Existing methods, such as data resampling, cost-sensitive techniques, and logistic loss modifications, though popular and often effective, lack solid theoretical foundations. As an example, we demonstrate that cost-sensitive methods are not Bayes-consistent. This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification. We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees based on empirical loss and a new notion of class-sensitive Rademacher complexity. Leveraging these theoretical results, we devise novel and general learning algorithms, IMMAX (*Imbalanced Margin Maximization*), which incorporate confidence margins and are applicable to various hypothesis sets. While our focus is theoretical, we also present extensive empirical results demonstrating the effectiveness of our algorithms compared to existing baselines.
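To make the idea of a class-imbalanced margin loss concrete, here is a minimal sketch in Python (not the paper's IMMAX algorithm): a multi-class hinge loss in which each class $y$ is assigned its own confidence margin $\rho_y$, with rarer classes receiving larger margins. The margin rule and all parameter names below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def class_margins(class_counts, scale=1.0, power=0.25):
    # Hypothetical rule: rarer classes receive larger confidence margins.
    freqs = class_counts / class_counts.sum()
    return scale / freqs ** power

def imbalanced_margin_hinge(scores, labels, margins):
    """Multi-class hinge loss with class-dependent confidence margins.

    scores:  (n, k) array of model scores
    labels:  (n,) integer class labels
    margins: (k,) per-class confidence margins rho_y
    """
    n = scores.shape[0]
    correct = scores[np.arange(n), labels]
    rivals = scores.copy()
    rivals[np.arange(n), labels] = -np.inf
    best_rival = rivals.max(axis=1)
    # Penalize examples whose confidence margin falls below rho_y.
    return np.maximum(0.0, margins[labels] - (correct - best_rival)).mean()
```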
Improved Balanced Classification with Theoretically Grounded Loss Functions
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
The *balanced loss* is a widely adopted objective for multi-class classification under class imbalance. By assigning equal importance to all classes, regardless of their frequency, it promotes fairness and ensures that minority classes are not overlooked. However, directly minimizing the balanced classification loss is typically intractable, which makes the design of effective surrogate losses a central question. This paper introduces and studies two advanced surrogate loss families: Generalized Logit-Adjusted (GLA) loss functions and Generalized Class-Aware weighted (GCA) losses. GLA losses generalize Logit-Adjusted (LA) losses, which shift logits based on class priors, to the broader general cross-entropy loss family. GCA losses extend standard class-weighted losses, which scale losses inversely with class frequency, by incorporating class-dependent confidence margins and likewise lifting them to the general cross-entropy family. We present a comprehensive theoretical analysis of consistency for both loss families. We show that GLA losses are Bayes-consistent, but only $H$-consistent for unbounded and complete hypothesis sets. Moreover, their $H$-consistency bounds depend inversely on the minimum class probability, scaling at least as $1/\mathsf{p}_{\min}$. In contrast, GCA losses are $H$-consistent for any hypothesis set that is bounded or complete, with $H$-consistency bounds that scale more favorably as $1/\sqrt{\mathsf{p}_{\min}}$, offering significantly stronger theoretical guarantees in imbalanced settings. We report the results of experiments demonstrating that, empirically, both GCA losses with calibrated class-dependent confidence margins and GLA losses can greatly outperform straightforward class-weighted losses as well as LA losses. GLA generally performs slightly better on common benchmarks, whereas GCA exhibits a slight edge in highly imbalanced settings. We therefore advocate for both GLA and GCA losses as principled, theoretically sound, and state-of-the-art surrogates for balanced classification under class imbalance.
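As a point of reference, the standard Logit-Adjusted loss that GLA generalizes, and a simple class-weighted loss with class-dependent margins in the spirit of GCA, can be sketched as follows. This is an illustrative simplification assuming a plain cross-entropy base loss; the generalized cross-entropy families and the calibrated margins studied in the paper are not reproduced here.

```python
import numpy as np

def _logsumexp(a, axis=1):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def logit_adjusted_ce(logits, labels, class_priors, tau=1.0):
    """Standard Logit-Adjusted cross-entropy: shift each logit by
    tau * log(prior) before the softmax (the special case GLA generalizes)."""
    adjusted = logits + tau * np.log(class_priors)
    n = len(labels)
    return (_logsumexp(adjusted) - adjusted[np.arange(n), labels]).mean()

def class_weighted_margin_ce(logits, labels, class_priors, margins):
    """Class-weighted cross-entropy with class-dependent confidence margins,
    a simplified stand-in for the GCA family described above."""
    n = len(labels)
    shifted = logits.copy()
    shifted[np.arange(n), labels] -= margins[labels]   # enforce margin rho_y
    per_example = _logsumexp(shifted) - shifted[np.arange(n), labels]
    weights = 1.0 / class_priors[labels]               # inverse-frequency weighting
    return (weights * per_example).mean()
```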
Cardinality-Aware Set Prediction and Top-$k$ Classification
Anqi Mao
Christopher Mohri
The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the predicted set. To optimize this loss function, we propose two families of surrogate losses: cost-sensitive comp-sum losses and cost-sensitive constrained losses. Minimizing these loss functions leads to new cardinality-aware algorithms that we describe in detail for both top-$k$ and threshold-based classifiers. We establish $H$-consistency bounds for our cardinality-aware surrogate loss functions, thereby providing a strong theoretical foundation for our algorithms. We report the results of extensive experiments on the CIFAR-10, CIFAR-100, ImageNet, and SVHN datasets demonstrating the effectiveness and benefits of our cardinality-aware algorithms.
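A minimal sketch of the kind of target loss described above, assuming the set-based loss simply adds a miss penalty to a cost that grows with the cardinality of the predicted set; the trade-off parameter and the threshold rule are illustrative, not the paper's.

```python
import numpy as np

def cardinality_aware_loss(pred_sets, labels, lam=0.05):
    """Illustrative target loss: 1 if the true label is missed by the
    predicted set, plus lam times the cardinality of that set."""
    losses = [(0.0 if y in s else 1.0) + lam * len(s)
              for s, y in zip(pred_sets, labels)]
    return float(np.mean(losses))

def threshold_set_predictor(probs, threshold=0.1):
    """Threshold-based set predictor: include every class whose predicted
    probability exceeds the threshold, always keeping the top class."""
    return [set(np.where(p >= threshold)[0].tolist()) | {int(p.argmax())}
            for p in probs]
```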
Understanding the Effects of Batching in Online Active Learning
Afshin Rostamizadeh
Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics (2020)
Online active learning (AL) algorithms often assume immediate access to a label once a query has been made. However, due to practical constraints, the labels of queried examples are generally only available in "batches". In this work, we present a novel analysis for a generic class of batch online AL algorithms and reveal that the effects of batching are in fact mild: they only result in an additional term in the label complexity that is linear in the batch size. To our knowledge, this provides the first theoretical justification for such algorithms, and we show how the analysis applies to batch variants of three canonical online AL algorithms: IWAL, ORIWAL, and DHM. We also conduct an empirical study that corroborates these theoretical insights.
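The setting can be illustrated with a generic skeleton of a batch online active learning loop (a sketch only, not the IWAL, ORIWAL, or DHM variants analyzed in the paper): query decisions are still made point by point, but the requested labels only arrive, and the model is only updated, at batch boundaries. The function arguments are hypothetical placeholders.

```python
import random

def batch_online_active_learning(stream, batch_size, query_probability, update, oracle):
    """Skeleton of batch online AL: labels of queried points become
    available only at the end of each batch of `batch_size` points."""
    pending = []  # queried points awaiting their labels
    for t, x in enumerate(stream, start=1):
        p = query_probability(x)            # probability of requesting the label of x
        if random.random() < p:
            pending.append((x, 1.0 / p))    # keep the importance weight 1/p
        if t % batch_size == 0 and pending:
            labeled = [(x, oracle(x), w) for x, w in pending]
            update(labeled)                 # model update deferred to the batch boundary
            pending = []
```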
A general framework for online learning with partial information is one where feedback graphs specify which losses can be observed by the learner. We study a challenging scenario where feedback graphs vary with time stochastically and, more importantly, where graphs and losses are stochastically dependent. This scenario arises in several applications that we describe. We give a new algorithm for this setting that exploits the stochastic properties of the graphs and that benefits from favorable regret guarantees. We present a detailed theoretical analysis of this algorithm, and also report the results of a series of experiments on real-world datasets, which show that our algorithm outperforms standard baselines for online learning with feedback graphs.
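For context, a single round of a generic exponential-weights algorithm with a feedback graph, the kind of baseline this work improves upon, might look as follows. The sketch assumes a fixed, fully known graph and losses independent of it, precisely the assumptions the paper relaxes.

```python
import numpy as np

def exp3_feedback_graph_step(weights, graph, losses, eta, rng):
    """One round of exponential weights with graph feedback: playing action i
    reveals the loss of every j with graph[i, j] == 1, and revealed losses
    are importance-weighted by their probability of being observed."""
    probs = weights / weights.sum()
    i = rng.choice(len(weights), p=probs)
    observe_prob = graph.T @ probs                  # P[loss of j is observed]
    observed = graph[i].astype(bool)
    estimates = np.zeros_like(losses, dtype=float)
    estimates[observed] = losses[observed] / observe_prob[observed]
    return i, weights * np.exp(-eta * estimates)
```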
We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions and subsequently seeks a distinct predictor for each region, with both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines.
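The resulting predictor composes a partitioning function with per-region models, roughly as in the sketch below; how the partition is built adaptively and how labels are requested in each phase is the substance of the algorithm and is not reproduced here. All names are illustrative.

```python
import numpy as np

class RegionBasedPredictor:
    """Composite predictor: a partition function routes each point to a
    region, and a separate model predicts within that region."""

    def __init__(self, assign_region, region_models):
        self.assign_region = assign_region      # x -> region index
        self.region_models = region_models      # one fitted model per region

    def predict(self, X):
        return np.array([
            self.region_models[self.assign_region(x)].predict([x])[0]
            for x in X
        ])
```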
We consider the scenario of online learning with sleeping experts, where not all experts are available at each round, and analyze the general framework of learning with stochastic feedback graphs, where the loss observations associated with each expert are characterized by a graph, thereby including both the bandit and full-information settings as special cases. A critical assumption in this framework is that the loss observations and the set of sleeping experts at each round are independent. We first extend the classical algorithm of Kleinberg et al. (2008) to use the loss information encoded by any sequence of feedback graphs and prove matching upper and lower bounds for the sleeping regret of this algorithm. Our main contribution is then to relax this independence assumption, present a finer notion of sleeping regret, and derive a general algorithm with strong theoretical guarantees. We instantiate our framework in the important scenario of online learning with abstention, where a learner can elect to abstain from making a prediction at the price of a certain cost. We empirically validate the improvement of our algorithm over multiple abstention algorithms on several real-world datasets.
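A single full-information round of a generic sleeping-experts update (an illustrative baseline, not the algorithm developed in the paper) can be sketched as follows: play according to the weights restricted to the awake experts, then update only those experts whose losses were observed.

```python
import numpy as np

def sleeping_experts_step(weights, awake, losses, eta, rng):
    """One full-information round with sleeping experts: sample from the
    weights restricted to the awake set, then apply an exponential-weights
    update to the awake experts only."""
    probs = np.where(awake, weights, 0.0)
    probs = probs / probs.sum()
    choice = rng.choice(len(weights), p=probs)
    new_weights = weights.copy()
    new_weights[awake] = new_weights[awake] * np.exp(-eta * losses[awake])
    return choice, new_weights
```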
Relative deviation learning bounds and generalization with unbounded loss functions
Spencer Greenberg
Annals of Mathematics and Artificial Intelligence (2019)
We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and their implications. We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that a moment of the loss is bounded. We then illustrate how to apply these results in a sample application: the analysis of importance weighting.
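For bounded (e.g., zero-one) losses, the classical relative deviation bound this line of work builds on has, up to constants, the following form; the paper's contribution includes complete two-sided proofs and the extension to unbounded losses with a bounded moment.

$$
\Pr\left[\sup_{h \in H} \frac{R(h) - \widehat{R}(h)}{\sqrt{R(h)}} > \epsilon\right] \le 4\,\Pi_H(2m)\,\exp\!\left(-\frac{m\epsilon^2}{4}\right),
$$

where $R(h)$ is the expected loss, $\widehat{R}(h)$ the empirical loss on a sample of size $m$, and $\Pi_H$ the growth function of the hypothesis set $H$.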
Adaptation Based on Generalized Discrepancy
Journal of Machine Learning Research, 20 (2019), pp. 1-30
We present a new algorithm for domain adaptation that improves upon a discrepancy minimization (DM) algorithm previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than that used by DM, called generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees, which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization.
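The fixed-reweighting baseline this work departs from reduces, for a squared loss and a linear hypothesis set, to a weighted ridge-regression step of the kind sketched below; in the generalized-discrepancy algorithm the weights are instead tied to the hypothesis sought, which this simplified sketch does not capture. The regularization value is a hypothetical placeholder.

```python
import numpy as np

def weighted_ridge(X, y, q, lam=1.0):
    """Weighted least-squares ERM: minimize sum_i q_i (w . x_i - y_i)^2
    plus lam * ||w||^2, for a fixed reweighting q of the training sample."""
    d = X.shape[1]
    Q = np.diag(q)
    return np.linalg.solve(X.T @ Q @ X + lam * np.eye(d), X.T @ Q @ y)
```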
We present two novel extensions of IWAL, an online importance-weighted active learning algorithm, using the properties of disagreement values among hypotheses. The first extension, DIWAL, prunes the hypothesis set with a more aggressive strategy based on concentration bounds with disagreement values. We show that DIWAL improves the generalization performance and the label complexity of the original IWAL, and further quantify the improvement in terms of the disagreement graph coefficient. The second extension, ZOOM, adaptively zooms into the function space near the best-in-class hypothesis, which effectively reduces the best-in-class error and thus simultaneously improves the generalization performance and the label complexity. We report experimental results on multiple datasets and demonstrate that the proposed algorithms achieve better test performance than IWAL given the same labeling budget.
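For reference, the IWAL-style rejection threshold that both extensions start from can be sketched as below: the query probability of a point is the largest loss disagreement between any two surviving hypotheses over the candidate labels, clipped from below. DIWAL's more aggressive, concentration-based pruning of the hypothesis set and ZOOM's adaptive zooming are not shown; `p_min` and the argument names are illustrative.

```python
def iwal_query_probability(hypotheses, x, candidate_labels, loss, p_min=0.1):
    """Simplified IWAL-style query rule: request the label of x with a
    probability equal to the maximum loss disagreement between any pair of
    surviving hypotheses, clipped to [p_min, 1]."""
    max_gap = 0.0
    for f in hypotheses:
        for g in hypotheses:
            for y in candidate_labels:
                max_gap = max(max_gap, loss(f(x), y) - loss(g(x), y))
    return min(1.0, max(p_min, max_gap))
```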