Does Federated Dropout actually work?

Gary Cheng
Federated Vision (2022)

Abstract

Model sizes in Federated Learning are limited by communication bandwidth and on-device memory constraints. The success of scaling model sizes in other machine learning domains, especially for generalizing to new data distributions, motivates the development of methods for training large-scale models in Federated Learning. Inspired by dropout, [3] proposed Federated Dropout as a way to scale up model sizes: each client trains a randomly selected subset of the larger server model. Despite its promising empirical results and the many works that build on it [1, 8, 13], we argue in this paper that the metrics used to measure the performance of Federated Dropout and its variants are misleading. We propose and perform new experiments which suggest that Federated Dropout is actually detrimental to scaling efforts. We show that a simple ensembling technique outperforms Federated Dropout and other baselines. We perform ablations which suggest that the best-performing variants of Federated Dropout attempt to approximate ensembling. The simplicity of ensembling allows for easy, practical implementations. Furthermore, our ensembling technique naturally leverages the parallelizable nature of Federated Learning: it is easy to train several models independently because there are many clients and server compute is not the bottleneck. Ensembling's strong performance against our baselines suggests that Federated Learning models may be more easily scaled than previously thought (e.g., via boosting).
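The abstract contrasts two ways of spending many clients on more model capacity. As a rough illustration only, the sketch below assumes a single dense weight matrix whose rows are hidden units. A Federated Dropout-style client receives a random subset of rows, and the server averages the returned deltas per row. The ensemble baseline instead keeps K small models separate and averages their predicted class probabilities. The function names (`sample_submodel`, `merge_client_deltas`, `ensemble_predict`) are hypothetical. Neither the row-sampling scheme nor probability averaging is claimed to be the exact procedure of [3] or of this paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a weight matrix as a list of rows, where each
# row is assumed to hold the incoming weights of one hidden unit.
Matrix = List[List[float]]


def sample_submodel(
    weights: Matrix, keep_fraction: float, rng: random.Random
) -> Tuple[List[int], Matrix]:
    """Federated Dropout-style extraction (sketch): pick a random subset of
    hidden units (rows) and copy only those rows for one client."""
    n_rows = len(weights)
    n_keep = max(1, round(keep_fraction * n_rows))
    kept = sorted(rng.sample(range(n_rows), n_keep))
    return kept, [list(weights[i]) for i in kept]


def merge_client_deltas(
    weights: Matrix, client_results: Sequence[Tuple[List[int], Matrix]]
) -> Matrix:
    """Apply the mean client delta to each row; rows that no client
    trained this round are left unchanged."""
    sums = [[0.0] * len(row) for row in weights]
    counts = [0] * len(weights)
    for kept, deltas in client_results:
        for row_idx, delta_row in zip(kept, deltas):
            counts[row_idx] += 1
            for j, d in enumerate(delta_row):
                sums[row_idx][j] += d
    return [
        [w + s / counts[i] for w, s in zip(row, sums[i])] if counts[i] else list(row)
        for i, row in enumerate(weights)
    ]


def ensemble_predict(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Ensemble baseline (sketch): average the class probabilities that K
    independently trained models assign to one example."""
    k = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / k for c in range(n_classes)]


if __name__ == "__main__":
    rng = random.Random(0)
    server_w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    kept, sub = sample_submodel(server_w, keep_fraction=0.5, rng=rng)
    print("client trains rows", kept, "->", sub)
    print("ensemble probabilities:", ensemble_predict([[0.2, 0.8], [0.6, 0.4]]))
```

Under these assumptions, a row of the server model is updated only by the clients that happened to sample it. Each ensemble member, by contrast, is a complete small model that can be trained independently and in parallel. Which of the two makes better use of client compute is the empirical question the paper studies.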
