Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity

Amin Karbasi
Ehsan Kazemi
Marko Mitrovic
ICML (2019)
Google Scholar

Abstract

Any streaming algorithm is commonly judged
by the quality of its solution, the memory footprint, and its required amount of computations.
In this paper, we study the problem of maximizing a monotone submodular function subject to
a cardinality constraint k. We propose SIEVESTREAMING++, a streaming algorithm that with
one pass over the data, while keeping only O(k)
elements, achieves the tight 1/2 approximation
guarantee. The best previously known streaming
algorithms either achieve a suboptimal approximation ratio 1/4 with Θ(k) memory or the tight
approximation ratio 1/2 with O(k log k) memory.
We then show that by buffering a small fraction
of the data stream, and through a careful filtering
procedure, one can heavily reduce the rounds of
adaptive computations, thus can lower the computational complexity of SIEVE-STREAMING++
substantially. We then generalize our results to
the more challenging multi-source streaming setting. We show how one can achieve the tight
1/2 approximation guarantee with a O(k) shared
memory, while minimizing not only the rounds
of computations but also the total number of communicated bits. Finally, we demonstrate the efficiency of our algorithms on real-world applications including data summarization for multisource streams of tweets and of YouTube videos