Scalable Private Set Union beyond Uniform Weighting
Abstract
In the differentially private set union problem, users contribute sets of items as input, and the output is a subset of the union of all items. Algorithms for this problem seek to output as many items as possible while maintaining differential privacy with respect to the addition or removal of an individual user.
The basic solution to this problem maintains a weight over each item. Each user contributes uniformly to the items in their set, random noise is added to the weights, and items with noisy weight above a certain threshold are output. The only scalable (i.e., distributed) algorithms for this problem from prior work are this basic algorithm and an iterative method which repeatedly calls the basic algorithm, ignoring items found in prior invocations.
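The basic algorithm described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from this work: the function name `basic_private_set_union`, the parameters `sigma` and `threshold`, the choice of Gaussian noise, and the per-user share of 1/sqrt(|set|) (which keeps each user's contribution at unit L2 norm) are all assumptions made for the example. The sketch does not calibrate `sigma` or `threshold` to any privacy budget; the caller would have to choose them so that the desired differential privacy guarantee holds.

```python
import math
import random
from collections import defaultdict


def basic_private_set_union(user_sets, sigma, threshold, rng=None):
    """Sketch of uniform weighting + noise + thresholding.

    Assumptions (not taken from the source): Gaussian noise with standard
    deviation `sigma`, and a per-user share of 1/sqrt(|set|) so that each
    user's contribution vector has L2 norm 1. `sigma` and `threshold` are
    supplied by the caller and are NOT derived from (epsilon, delta) here.
    """
    rng = rng or random.Random()
    weights = defaultdict(float)

    # Each user spreads weight uniformly over the distinct items in their set.
    for items in user_sets:
        items = set(items)
        if not items:
            continue
        share = 1.0 / math.sqrt(len(items))
        for item in items:
            weights[item] += share

    # Add independent noise to each weight and keep items above the threshold.
    released = set()
    for item, weight in weights.items():
        if weight + rng.gauss(0.0, sigma) > threshold:
            released.add(item)
    return released
```

For example, `basic_private_set_union([{"a", "b"}, {"a"}, {"a", "c"}], sigma=1.0, threshold=3.0)` returns some subset of `{"a", "b", "c"}`; item `"a"` has the largest weight before noise and is therefore the most likely to be released.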
In this work, we give an improved weighting algorithm over basic uniform weighting. Our algorithm reroutes weight from items with weight far above the threshold to items with smaller weight, thereby increasing the probability that less frequent items are output. The algorithm is scalable and does not suffer any privacy loss when compared to the basic algorithm. We prove that our algorithm will never underperform the basic algorithm and show experimentally that replacing the basic algorithm with ours yields the best results among scalable algorithms for the private set union problem.