Discrete Distribution Estimation under Local Privacy

Abstract

The collection and analysis of user data drives improvements in the app and web ecosystems,
but comes with risks to privacy. This paper examines discrete distribution estimation under local
privacy, a setting wherein service providers can learn the distribution of a categorical statistic
of interest without collecting the underlying data. We present new mechanisms, including hashed
k-ary Randomized Response (k-RR), that empirically meet or exceed the utility of existing mechanisms
at all privacy levels. New theoretical results demonstrate the order-optimality of k-RR and the existing RAPPOR mechanism in different privacy regimes.
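
For illustration, basic (unhashed) k-RR and its distribution estimator can be sketched as follows. This is a minimal sketch under the standard definition of k-ary randomized response, which this abstract does not spell out: each user reports the true value with probability e^ε / (e^ε + k - 1) and otherwise reports one of the other k - 1 values uniformly at random. The function names, the privacy parameter name `epsilon`, and the encoding of categories as integers 0..k-1 are assumptions made for the example; the hashed variant is not shown.

```python
import math
import random
from collections import Counter


def krr_perturb(value, k, epsilon, rng):
    """Privatize one categorical value in {0, ..., k-1} on the user's device.

    Assumed standard k-RR: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise report a uniformly random other value.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < keep_prob:
        return value
    # Draw uniformly from the k - 1 values that differ from the true one.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def krr_estimate(reports, k, epsilon):
    """Estimate the underlying distribution from privatized reports.

    Inverts E[freq_i] = (p_i * (e^eps - 1) + 1) / (e^eps + k - 1).
    Individual estimates can be slightly negative for small samples;
    no projection onto the probability simplex is applied here.
    """
    n = len(reports)
    counts = Counter(reports)
    e = math.exp(epsilon)
    return [((counts.get(i, 0) / n) * (e + k - 1) - 1) / (e - 1) for i in range(k)]
```

The service provider sees only the outputs of `krr_perturb`, never the raw values, and applies `krr_estimate` to the collected reports. The estimates always sum to one, and their accuracy improves as the number of reports or ε grows.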