A System for Massively Parallel Hyperparameter Tuning

Liam Li
Kevin Jamieson
Ekaterina Gonina
Jonathan Ben-tzur
Moritz Hardt
Benjamin Recht
Ameet Talwalkar
Third Conference on Systems and Machine Learning (2020) (to appear)

Abstract

Modern learning models are characterized by large hyperparameter spaces and long training times. This,
coupled with the rise of parallel computing and the productionization of machine learning, motivates the
development of production-quality hyperparameter optimization functionality for a distributed computing
setting. We address this challenge
with a simple and robust hyperparameter optimization algorithm ASHA, which exploits parallelism and aggressive
early-stopping to tackle large-scale hyperparameter optimization problems. Our extensive empirical results show
that ASHA outperforms state-of-the-art hyperparameter optimization methods; scales linearly with the number of
workers in distributed settings; and is suitable for massive parallelism, converging to a high-quality configuration
in half the time taken by Vizier (Google’s internal hyperparameter optimization service) in an experiment with
500 workers. We end with a discussion of the systems considerations we encountered and our associated solutions
when implementing ASHA in SystemX, a production-quality service for hyperparameter tuning.
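The abstract describes ASHA only at a high level. As a rough illustration of the underlying idea (not the authors' implementation), the sketch below simulates asynchronous successive halving serially: whenever a worker becomes free, it promotes a configuration ranked in the top 1/eta of some rung to the next rung if one is available, and otherwise starts a fresh random configuration at the bottom rung. The function and parameter names (asha, sample_config, evaluate, r, eta, max_rung) are hypothetical placeholders chosen for this sketch.

```python
import random


def asha(sample_config, evaluate, eta=3, max_rung=3, num_jobs=100, r=1):
    """Serial simulation of asynchronous successive halving (a sketch, not the paper's code).

    sample_config() draws a random hyperparameter configuration; evaluate(cfg, resource)
    trains it with `resource` units of budget and returns a validation loss.
    rungs[k] holds (cfg_id, loss) results obtained with r * eta**k resources.
    """
    configs = []                                     # cfg_id -> configuration
    rungs = {k: [] for k in range(max_rung + 1)}     # completed results per rung
    promoted = {k: set() for k in range(max_rung + 1)}

    def get_job():
        # Prefer promoting a top-1/eta configuration from the highest eligible rung;
        # otherwise grow the bottom rung with a new random configuration.
        for k in range(max_rung - 1, -1, -1):
            done = sorted(rungs[k], key=lambda t: t[1])
            candidates = [cid for cid, _ in done[: len(done) // eta]
                          if cid not in promoted[k]]
            if candidates:
                promoted[k].add(candidates[0])
                return candidates[0], k + 1
        configs.append(sample_config())
        return len(configs) - 1, 0

    # In the distributed setting each iteration below would be handled by a free worker.
    for _ in range(num_jobs):
        cfg_id, k = get_job()
        loss = evaluate(configs[cfg_id], r * eta ** k)
        rungs[k].append((cfg_id, loss))

    best_id, best_loss = min(rungs[max_rung] or rungs[0], key=lambda t: t[1])
    return configs[best_id], best_loss


if __name__ == "__main__":
    # Toy usage: tune a learning rate on a made-up objective whose loss shrinks with budget.
    cfg, loss = asha(
        sample_config=lambda: {"lr": 10 ** random.uniform(-4, 0)},
        evaluate=lambda c, res: abs(c["lr"] - 0.01) + 1.0 / res,
    )
    print(cfg, loss)
```

Because promotions never wait for a rung to fill up, no worker is ever idle waiting on stragglers, which is what allows the method to scale with the number of workers.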
