
Silvio Lattanzi
Silvio received his bachelor's (2005), master's (2007), and PhD (2011) degrees from the Computer Science department of Sapienza University of Rome, under the supervision of Alessandro Panconesi. He joined Google Research in the New York office in January 2011 and moved to Google Research Zurich in April 2017.
Authored Publications
The Cost of Consistency: Submodular Maximization with Constant Recourse
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Ola Svensson
Proceedings of the 57th Annual ACM Symposium on Theory of Computing (2025), pp. 1406–1417
Abstract
In this work, we study online submodular maximization and how the requirement of maintaining a stable solution impacts the approximation. In particular, we seek bounds on the best-possible approximation ratio attainable when the algorithm is allowed to make at most a constant number of updates per step. We show a tight information-theoretic bound of $2/3$ for general monotone submodular functions and an improved (also tight) bound of $3/4$ for coverage functions. Since both bounds are attained by non-poly-time algorithms, we also give a poly-time randomized algorithm that achieves a $0.51$-approximation. Combined with an information-theoretic hardness of $1/2$ for deterministic algorithms from prior work, our results show a separation between deterministic and randomized algorithms, both information-theoretically and for poly-time algorithms.
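For concreteness, the stability notion studied here can be stated as follows (a standard formulation of bounded recourse; the notation is ours, not necessarily the paper's). Letting $S_t$ denote the solution maintained after the $t$-th arrival and $V_t$ the elements seen so far, the algorithm has recourse $c$ and approximation ratio $\alpha$ if, for all $t$,
\[
|S_t \setminus S_{t-1}| + |S_{t-1} \setminus S_t| \le c
\qquad \text{and} \qquad
f(S_t) \ge \alpha \cdot \max_{S \subseteq V_t,\; |S| \le k} f(S),
\]
so, e.g., $\alpha = 2/3$ is attainable with constant $c$ for general monotone submodular $f$, per the result above.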
Deletion Robust Non-Monotone Submodular Maximization over Matroids
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Journal of Machine Learning Research, 26 (2025), pp. 1-28
Abstract
Maximizing a submodular function is a fundamental task in machine learning, and in this paper we study the deletion-robust version of the problem under the classic matroid constraint. Here the goal is to extract a small summary of the dataset that contains a high-value independent set even after an adversary deletes some elements. We present constant-factor approximation algorithms whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(4.597+O(\epsilon))$-approximation algorithm with summary size $O(\frac{k+d}{\epsilon^2}\log \frac{k}{\epsilon})$, which improves to a $(3.582+O(\epsilon))$-approximation with $O(k + \frac{d}{\epsilon^2}\log \frac{k}{\epsilon})$ summary size when the objective is monotone. In the streaming setting we provide a $(9.435+O(\epsilon))$-approximation algorithm with summary size and memory $O(k + \frac{d}{\epsilon^2}\log \frac{k}{\epsilon})$; the approximation factor improves to $(5.582+O(\epsilon))$ in the monotone case.
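To make the two-phase interface concrete, here is a minimal Python sketch of "build a summary, then answer after deletions". It is a schematic of the setting only (oversample by the deletion budget, then re-run greedy on the survivors), not the algorithms achieving the factors above; all names are ours.

# Schematic of the deletion-robust interface; not the paper's algorithm.
def build_summary(elements, f, k, d):
    """Phase 1: greedily keep k + d elements so that up to d adversarial
    deletions can be absorbed (real algorithms need subtler redundancy)."""
    summary = []
    for _ in range(k + d):
        rest = [e for e in elements if e not in summary]
        if not rest:
            break
        summary.append(max(rest, key=lambda e: f(summary + [e]) - f(summary)))
    return summary

def query(summary, deleted, f, is_independent):
    """Phase 2: once deletions are revealed, greedily build an independent
    set of the matroid from the surviving summary."""
    survivors = [e for e in summary if e not in deleted]
    solution = []
    while True:
        cands = [e for e in survivors
                 if e not in solution and is_independent(solution + [e])]
        if not cands:
            return solution
        best = max(cands, key=lambda e: f(solution + [e]) - f(solution))
        if f(solution + [best]) <= f(solution):
            return solution
        solution.append(best)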
Consistent Submodular Maximization
Paul Duetting
Federico Fusco
Ashkan Norouzi Fard
Proceedings of the 41st International Conference on Machine Learning, PMLR (2024), pp. 11979-11991
Abstract
Maximizing monotone submodular functions under cardinality constraints is a classic algorithmic problem with several applications in data mining and machine learning. In this paper we study this problem in a dynamic setting with consistency constraints. In this setting, elements arrive in a streaming fashion, and one is interested in maintaining a constant approximation to the optimal solution while keeping the solution stable (i.e., the number of changes between two consecutive solutions is bounded). We provide several algorithms in this setting with different trade-offs between consistency and approximation quality. We also complement our theoretical results with an experimental analysis showing the effectiveness of our algorithms on real-world instances.
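The consistency requirement is easiest to see in code. Below is a hedged Python sketch of the setting, not of the paper's algorithms: the solution changes by at most one swap per arrival, and a multiplicative improvement threshold limits churn over the whole stream.

def consistent_stream(stream, f, k):
    """Toy one-swap-per-arrival rule: consecutive solutions differ in at
    most two elements, so the number of changes per step is bounded."""
    S = []
    for e in stream:
        if len(S) < k:
            S.append(e)  # filling up costs one change per step
            continue
        # best single swap that inserts the new element e
        best, best_val = None, f(S)
        for x in S:
            cand = [y for y in S if y != x] + [e]
            if f(cand) > best_val:
                best, best_val = cand, f(cand)
        # swap only on a significant improvement, to keep the solution stable
        if best is not None and best_val >= (1 + 1.0 / k) * f(S):
            S = best
    return S

For instance, running this with a coverage objective f = lambda S: len(set().union(*S)) over a stream of sets maintains a stable size-$k$ cover.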
Abstract
Learning graph cluster structure using few queries is a classical question in property testing, with the fundamental special case, namely expansion testing, considered in the seminal work of Goldreich and Ron [STOC'96].
The most recent result in this line of work, due to Gluch et al. [SODA'21], designs {\em clustering oracles} for $(k, \epsilon)$-clusterable graphs. These oracles, given a graph whose vertex set can be partitioned into a disjoint union of $k$ clusters (i.e., good expanders) with outer conductances bounded by $\epsilon \ll 1$, provide query access to an $O(\epsilon \log k)$-approximation to this ground-truth clustering in time $\approx 2^{\text{poly}(k/\epsilon)} n^{1/2+O(\epsilon)}$ per query.
In this paper we show that it is possible to learn the {\em hierarchical} cluster structure of $(k, \epsilon)$-clusterable graphs in sublinear time. First, we show how to simulate the hierarchical clustering algorithm of Charikar-Chatziafratis [SODA'17] to approximate the Dasgupta cost of a $k$-clusterable graph to within a factor of $O(\sqrt{\log k})$ in $\approx \text{poly}(k)\cdot n^{1/2+O(\epsilon)}$ time, assuming oracle access to the clustering. Second, we introduce a natural hierarchical model of clusterable graphs, and give a bona fide clustering oracle for this model, i.e., a small-space data structure that can answer hierarchical clustering queries in $\approx \text{poly}(k) \cdot n^{1/2+O(\epsilon)}$ time per query. Notably, in both cases the query time depends polynomially on the number $k$ of clusters. The second result is the main technical contribution of the paper, and relies on several structural properties of hierarchically clusterable graphs that we hope will be of independent interest in sublinear-time spectral graph algorithms.
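For reference, the Dasgupta cost being approximated has a simple closed form: each edge pays its weight times the number of leaves under the lowest common ancestor of its endpoints. A naive Python implementation (ours, for small inputs; the point of the paper is computing this in sublinear time):

def dasgupta_cost(children, root, edges):
    """Dasgupta cost of a hierarchy: sum over weighted edges (i, j, w) of
    w * |leaves(lca(i, j))|. `children[v]` lists v's children; nodes
    absent from `children` are leaves."""
    def leaves(v):
        if v not in children:
            return {v}
        out = set()
        for c in children[v]:
            out |= leaves(c)
        return out

    def lca_leaf_count(i, j):
        v = root
        while v in children:
            below = next((c for c in children[v] if {i, j} <= leaves(c)), None)
            if below is None:
                break  # v is the lowest node containing both endpoints
            v = below
        return len(leaves(v))

    return sum(w * lca_leaf_count(i, j) for i, j, w in edges)

# toy example: the root separates {0, 1} from {2, 3}
children = {"r": ["a", "b"], "a": [0, 1], "b": [2, 3]}
print(dasgupta_cost(children, "r", [(0, 1, 1.0), (1, 2, 1.0)]))  # 2.0 + 4.0 = 6.0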
Abstract
Designing algorithms for machine learning problems beyond worst-case analysis and, in particular, analyzing the effect of side-information on the complexity of such problems is an important line of research with many practical applications. In this paper we study the classic $k$-means clustering problem in the presence of noisy labels. In this problem we receive as input a set of points and a set of clustering labels generated by either an adversarial or a random perturbation of the optimal solution. Our main goal is to formally study the effect of this extra information on the complexity of the $k$-means problem. In particular, in the context of random perturbations, we give an efficient algorithm that finds a clustering of cost within a factor $1+o(1)$ of optimum even when the label of each point is perturbed with a large probability (think 99%). In contrast, we show that side-information with adversarial perturbations is as hard as the original problem even if only a small fraction of the labels are perturbed. We complement this negative result by giving a simple algorithm for the case when the adversary is only allowed to perturb a small fraction of each cluster.
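A simple baseline shows how the side-information enters (a warm-start sketch of ours, not the $1+o(1)$-approximation algorithm from the paper): use the noisy labels to seed the centroids, then refine with a few Lloyd iterations.

import numpy as np

def kmeans_from_noisy_labels(points, labels, k, iters=10):
    """Warm-start sketch: centroids of the noisy labeling, then Lloyd.
    Assumes every label in range(k) occurs at least once."""
    centers = np.stack([points[labels == c].mean(axis=0) for c in range(k)])
    for _ in range(iters):
        # assign each point to its nearest current center
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        # recompute centers, keeping the old one if a cluster empties
        centers = np.stack([
            points[assign == c].mean(axis=0) if (assign == c).any() else centers[c]
            for c in range(k)
        ])
    return centers, assign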
Abstract
Thanks to its many applications, the clustering ensemble problem has been extensively studied in the past. In this problem we are given as input $m$ clusterings, and the objective is to output a clustering that ``well-represents'' all the input clusterings. In this paper, we propose, to the best of our knowledge, the first generative model for the problem. Our model is parameterized by a ``center'' clustering and a scale; the probability of a particular clustering is an exponential function of its Rand distance to the center, scaled by the scale parameter.
For this new model, we show: (i) a sampling algorithm that runs in polynomial time when the center has a constant number of clusters, and (ii) a simple polynomial-time reconstruction algorithm when the scale is small.
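The Rand distance underlying the model is just pairwise disagreement between two clusterings; here is a direct quadratic implementation, with the model's density sketched in a comment (our names; the paper's exact parameterization may differ):

from itertools import combinations

def rand_distance(a, b):
    """Fraction of point pairs on which clusterings `a` and `b` disagree:
    together in one clustering but separated in the other."""
    n = len(a)
    disagreements = sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(n), 2)
    )
    return disagreements / (n * (n - 1) / 2)

# schematic model density, up to normalization, with center c_star and scale lam:
#   P(c) proportional to exp(-lam * rand_distance(c, c_star))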
Scalable Differentially Private Clustering via Hierarchically Separated Trees
Chris Schwiegelshohn
David Saulpic
2022 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2022) (to appear)
Abstract
We study the private $k$-median and $k$-means clustering problems in $d$-dimensional Euclidean space.
By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non-private methods.
We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee. (The dimension term, $d$, can be replaced with $O(\log k)$ using standard dimension-reduction techniques.) Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical, runs in near-linear time, $\tilde{O}(nkd)$, and scales to tens of millions of points. We also show that our method is amenable to parallelization in large-scale distributed computing environments. In particular, we show that our private algorithms can be implemented in a logarithmic number of MPC rounds in the sublinear memory regime.
Finally, we complement our theoretical analysis with an empirical evaluation demonstrating the algorithm's efficiency and accuracy in comparison to other private clustering baselines.
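To illustrate the tree-embedding mechanism at a high level, here is a 1-D sketch of ours (standard differential-privacy reasoning, not the paper's method): build the dyadic hierarchy over the data domain, release each level's histogram with Laplace noise, and pick centers from the heaviest noisy leaves. Each point falls in exactly one node per level, so noise of scale depth/$\epsilon$ per level gives $\epsilon$-DP overall by basic composition.

import numpy as np

def private_tree_histograms(points, depth, eps, lo=0.0, hi=1.0, seed=0):
    """1-D tree-embedding sketch: dyadic intervals per level, Laplace
    noise per bin. One point touches one bin per level across `depth`
    levels, so the full release is eps-differentially private."""
    rng = np.random.default_rng(seed)
    levels = []
    for level in range(1, depth + 1):
        hist, _ = np.histogram(points, bins=2 ** level, range=(lo, hi))
        levels.append(hist + rng.laplace(scale=depth / eps, size=2 ** level))
    return levels

def pick_centers(levels, k, lo=0.0, hi=1.0):
    """Midpoints of the k heaviest (noisy) leaf intervals as candidate centers."""
    leaf = levels[-1]
    width = (hi - lo) / len(leaf)
    top = np.argsort(leaf)[-k:]
    return lo + (top + 0.5) * width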