Streaming Euclidean MST to a Constant Factor

Amit Levi
Erik Waingarten
Xi Chen
54rd Annual ACM Symposium on Theory of Computing (STOC'23) (2023)
Google Scholar

Abstract

We study streaming algorithms for the fundamental geometric problem of computing the cost of the Euclidean Minimum Spanning Tree (MST) on an $n$-point set $X \subset \R^d$. In the streaming model, the points in $X$ can be added and removed arbitrarily, and the goal is to maintain an approximation in small space. In low dimensions, $(1+\epsilon)$ approximations are possible in sublinear space. However, for high dimensional space the best known approximation for this problem was $\tilde{O}(\log n)$, due to [Chen, Jayaram, Levi, Waingarten, STOC'22], improving on the prior $O(\log^2 n)$ bound due to [Andoni, Indyk, Krauthgamer, SODA '08]. In this paper, we break the logarithmic barrier, and give the first constant factor sublinear space approximation to Euclidean MST. For any $\epsilon\geq 1$, our algorithm achieves an $\tilde{O}(\epsilon^{-2})$ approximation in $n^{O(\epsilon)} d^{O(1)}$ space.


We complement this by demonstrating that any single pass algorithm which obtains a better than $1.10$-approximation must use $\Omega(\sqrt{n})$ space, demonstrating that $(1+\epsilon)$ approximations are not possible in high-dimensions, and that our algorithm is tight up to a constant. Nevertheless, we demonstrate that $(1+\epsilon)$ approximations are possible in sublinear space with $O(1/\epsilon)$ passes over the stream. More generally, for any $\alpha \geq 2$, we give a $\alpha$-pass streaming algorithm which achieves a $O(\frac{1}{ \alpha \epsilon})$ approximation in $n^{O(\epsilon)} d^{O(1)}$ space.

All our streaming algorithms are linear sketches, and therefore extend to the massively-parallel computation model (MPC). Thus, our results imply the first $(1+\epsilon)$-approximation in a constant number of rounds in the MPC model. Previously, such a result was only known for low-dimensional space ([Andoni, Nikolov, Onak, Yaroslavtsev, STOC'15]), or either required $O(\log n)$ rounds or suffered a $O(\log n)$ approximation.