Streaming Euclidean k-median and k-means to a (1 + ε)-approximation with o_{k,ε}(log n) Memory Words

David P. Woodruff
Samson Zhou
FOCS 2023 (to appear)

Abstract

We consider the classic Euclidean k-median and k-means objectives on insertion-only streams,
where the goal is to maintain a (1 + ε)-approximation to the optimal k-median or k-means cost, while using
as little memory as possible. Over the last 20 years, clustering in data streams has received a
tremendous amount of attention and has been the test-bed for a large variety of new techniques,
including coresets, merge-and-reduce, bicriteria approximation, and sensitivity sampling. Despite
this intense effort to obtain smaller and smaller sketches for these problems, all known techniques
require storing at least Ω(log(n∆)) words of memory, where n is the size of the input and ∆ is
the aspect ratio. In this paper, we show how to finally bypass this bound and obtain the first
algorithm that achieves a (1 + ε)-approximation to the more general (k, z)-clustering problem,
using only Õ(dk/ε^2) · 2^{z log z} · min(1/ε^z, k) · poly(log log(n∆)) words of memory.
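
For concreteness, the (k, z)-clustering problem referenced above is the standard generalization of both objectives (we write cost_z below as shorthand; this notation is ours, not the abstract's): given an input set X ⊆ ℝ^d of n points, the goal is to choose a set C of k centers minimizing

    cost_z(X, C) = Σ_{x ∈ X} min_{c ∈ C} ‖x − c‖_2^z,

so that z = 1 recovers k-median and z = 2 recovers k-means.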