Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model

Rachel Cummings
Tamalika Mukherjee
Tingting Ou
Peilin Zhong
ICML 2025 (2025)

Abstract

The {continual observation} streaming setting captures data analysis scenarios where both data collection is ongoing, and real-time analysis of the collected data is needed at all time points. The \emph{turnstile} streaming model is a subclass of the continual observation model and allows for both the insertion and deletion of data items over time. Typically, the length of the stream $T$ and the size of the universe $\cU$ from which data items come from are considered to be extremely large, thus practical solutions should use space at most sublinear in $T$ and $\mathcal{U}$. Because many of the relevant applications for data analysis in the continual observation setting involve the analysis of sensitive user data, this motivates the need to develop space-efficient algorithms that achieve formal privacy protections such as differential privacy.

We give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model that achieves $\Tilde{O}(T^{1/3})$ space and additive error and a $(1+\eta)$-relative approximation for all $\eta \in (0,1)$. Our algorithm significantly improves upon the space requirements of the state of the art for this problem in the turnstile model~\cite{JainKRSS23} which has a linear dependency in both $T$ and $\vert \cU \vert$, while still achieving an additive error that is close to the known $T^{1/4}$ lower bound.
×